NIST Special Publication 500-248
April 2001
Design of a File Format for
Logging Website Interaction
Contribution of the National Institute of Standards and Technology. Not
subject to copyright. Reference to specific commercial products or brands is for
information purposes only; no endorsement or recommendation by the National
Institute of Standards and Technology, explicit or implicit, is intended.
Abstract
The logging of user behavior in support of web usability
testing is constrained by the difficulty of capturing and analyzing large
amounts of logged data. However, there is great potential for the development of
tools to support automated recording and analysis, especially for remote or
large scale testing. In this paper, we propose a format for the representation
of user interaction with a website. A widely accepted format enables the
development of a set of software tools to process the data and the sharing of
data sets for longer-term analysis and research, and provides a common language
for expressing user interaction with a website.
Keywords
log format; user logging; usability testing; web-based
applications
1. Introduction
This paper describes a format called FLUD, Framework
for Logging Usability Data, for representing user interaction with a website. A
common file format enables sharing of logged data and the development of
interoperable tools. The logging of user behavior in support of web usability
testing is constrained by the difficulty of capturing and analyzing large
amounts of logged data. However, there is great potential for the development of
tools to support automated recording and analysis, especially for remote or
large scale testing. The few tools that currently exist do not interoperate and
so cannot exchange data easily. Researchers cannot easily share large data sets
of user logs for further post-mortem analysis and exploration. The file format
that we propose here is designed to address these problems. It can be viewed as
a browser-independent language for expressing user interaction with a website.
Given such a language, one can then formally describe what is being collected
during a usability test.
In Section 2 we provide the background that led to the development of
FLUD. Section 3 summarizes related work. The FLUD design and accompanying tools
we developed for its use are presented in Sections 4 and 5. In Section 6 we list
a number of issues that are factors in the adoption of FLUD by the usability
community.
2. Background and Motivation
2.1 Why FLUD?
The Visualization and Usability Group,
which is part of the Information
Access Division of NIST's Information
Technology Laboratory, has been working on measurement, testing, and
standards relating to usability engineering since 1997. In particular, the NIST Web Metrics Testbed has focused
on automation to support the usability engineering process, resulting in
the development of prototype tools that are publicly available to the
usability community.
One focus of this effort is the recording of the users' interaction with a
web-based application as they attempt to perform given tasks. This captured log
data can be valuable for analyzing and improving usability [Etgen00].
As we developed our recording tool WebVIP [WebVIP],
we found that we did not have a formal specification for the user logs we wanted
to collect. Initially, our output was closely tied to the specific event model
of the browser we were working with. It soon became evident that log data is
quite complex and that a common file format was needed to allow various software
components (such as recorders, parsers, analyzers, and visualizers) to exchange
information. At that point, we focused part of our effort on FLUD.
2.2 Approaches to Usability Testing
There are some purely
automated approaches to usability evaluation (see [Niel99]
for a skeptical view), but the automation provided by most of the NIST Web Metrics
prototypes supports the usability engineer during the course of usability
testing, in which subjects are asked to perform some tasks.
Usability testing encompasses a range of approaches:
- Direct human observation of the subject by a usability engineer, who
records and interprets the subject's behavior. This approach has great
semantic depth, since all the subject's external behavior is available for
analysis by an intelligent observer; indeed, the subject's thought processes
may be queried as well. The process is, however, time-consuming and there
seems to be only slight opportunity for automation.
- High-level automated monitoring. The software under test can be
instrumented by hand so as to report application-specific performance metrics,
such as a score indicating degree of success in achieving a task and time
taken. This approach allows a larger number of subjects to participate in the
test, but is less helpful in analyzing why subjects succeed or fail.
- Automated monitoring of low-level user behavior. This can often be done
with the help of existing software, e.g. by capturing events as reported by a
browser. The problem here is that the higher-level, more meaningful
description of the subjects' behavior is lost.
Our efforts are aimed
at combining the best aspects of the last two approaches: we want to
automatically generate a mid-level description of behavior such that we can
achieve breadth (a large number of subjects), but also enable some
computer-assisted analysis of why individual subjects performed as they
did.
3. Other Related Work on Log Files
W3C has promulgated a Common Log
Format [W3Log]
for server logs, which record requests for web pages from external clients.
These logs are compiled automatically as a byproduct of running a web server,
but at best contain information only about page jumps made by a user. No
intra-page activity is captured. Moreover, it can be difficult to distinguish
the activity of a particular user among requests from several sources and the
use of cache storage by a browser may hide repeat requests. In short, a server
log tracks the activity of a server, not a subject. It is noteworthy, however,
that several analysis and visualization tools for server logs [Analog,
Flash,
Hoch99,
Hoch00,
LogAn,
PWeb,
Webal]
have emerged, encouraged no doubt by the existence of this common log
format.
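To make concrete what a server log does and does not capture, the sketch below parses one line of the Common Log Format into its standard fields (remote host, identity, user, date, request, status, bytes). The pattern and sample line are illustrative; note that everything recoverable is a single page request, with no trace of intra-page activity.

```python
import re

# Common Log Format: remotehost rfc931 authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf_line(line):
    """Parse one Common Log Format line into a dict, or None if malformed."""
    m = CLF_PATTERN.match(line)
    return m.groupdict() if m else None

entry = parse_clf_line(
    '192.0.2.1 - jdoe [10/Apr/2001:13:55:36 -0400] "GET /index.html HTTP/1.0" 200 2326'
)
# Only the page request is visible to the server: clicks, scrolling,
# and widget use within the page never reach this log.
```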
There are several event models, such as those supported by popular browsers,
e.g. Netscape [Netsc]
and Internet Explorer [MSIE],
the widely used X Window System [Nye93],
and the Document Object Model proposed by W3C [DOM].
Because they represent user activity at a low level (e.g. mouse clicks and
keystrokes), they are application-independent. They do not, however, attempt to
represent user behavior at a higher level of abstraction, such as task
performance.
Hilbert and Redmiles [Hilb98,
Hilb99]
have developed a prototype system for gathering information on user activity,
but have not proposed a format for that information. Finally, there has been
some work by Fu [Fu01]
whose goal is to compile low-level information on user activity into
higher-level abstractions.
4. FLUD Design
The FLUD (Framework for Logging Usability Data) format
is intended to provide a representation of user interaction that is general
enough to support a wide range of usability testing. The complete specification
[Cugi01]
of FLUD's syntax and semantics is available on-line.
4.1 Requirements and Scope
In light of the general goals outlined
above, the FLUD format is designed to satisfy the following set of specific
requirements and constraints:
- The format should be machine-readable and machine-writable, but also
somewhat human-readable.
- It should support a database approach to user tests, by identifying
various dimensions typical of usability log data: subject, website, dataset,
task, and date, among others.
- It should define the operations typically encountered by a user
of web-based applications. Concepts such as mouse clicks, keystrokes, radio
buttons, scrolling, and jumping to a new page must all be included.
- It should also encompass the context of interaction, including
window and browser operations as well as activity internal to the webpage.
- However, it will exclude non-interactive user behavior such as
eye gaze, oral reports, and forehead-slapping, for the obvious reason that
capturing such behavior requires special equipment beyond the usual
keyboard, mouse, and display. Also, the formal representation of such behavior
is more problematic.
- Also, it will not define (at least for now) more exotic interaction modes,
such as voice-activated commands or gaze-controlled cursor location.
- However, FLUD should be extensible; it should allow for the representation
of unforeseen types of activity (although such extensions cannot be
meaningfully processed by generic tools). For instance, if an
application makes use of an input device or a widget of a type not pre-defined
by the FLUD specification, grammatical hooks are provided so that the
subject's interaction can nonetheless be represented and parsed, as so-called
ad hoc fields. Of course, any analysis that depended on the semantics
of such fields could be performed only by suitably customized software.
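The extensibility requirement can be illustrated with a small sketch. The field syntax and the set of defined field names below are hypothetical (the real grammar is given in [Cugi01]); the point is that a generic tool can separate the fields it understands from ad hoc fields, carrying the latter through unharmed for customized software downstream.

```python
# Hypothetical "name=value" field syntax, for illustration only;
# the actual FLUD grammar is defined in the specification [Cugi01].
KNOWN_FIELDS = {"subject", "task", "date"}   # assumed set of defined fields

def split_fields(record):
    """Separate defined fields from ad hoc (undefined but parseable) ones."""
    defined, ad_hoc = {}, {}
    for fld in record.split(";"):
        name, _, value = fld.partition("=")
        (defined if name in KNOWN_FIELDS else ad_hoc)[name] = value
    return defined, ad_hoc

defined, ad_hoc = split_fields("subject=S01;task=T3;eyetracker_x=512")
# defined -> {'subject': 'S01', 'task': 'T3'}
# ad_hoc  -> {'eyetracker_x': '512'}  (preserved, though not interpreted)
```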
4.2 Top-level FLUD File Concepts and Definitions
4.2.1 Session - A FLUD file records exactly one session. A session is
defined as the interaction of a single subject with a single fully configured
hardware system during a continuous time interval. A switch of platform or
subject is therefore considered to be a new session, by definition. Within a
session, a subject may visit several websites and webpages and attempt to
perform several tasks.
4.2.2 Task - The FLUD file format is designed for task-oriented
usability testing: the subject is given a task to perform (e.g. find at least
three documents about Iowa, find out how much a Boeing 747 weighs) and then
his/her performance is monitored. Undirected browsing can also be recorded
within a single "dummy" task.
4.2.3 Events - An event is defined to be a nearly instantaneous
occurrence involving the subject, the system under test, or both. The event
model is described in detail below. FLUD events are not quite the same as those
within current, well-known, low-level event models, so a generator may need to
map from the output of such a model to the FLUD level of abstraction. For instance,
FLUD includes higher-level information such as webpage navigation, as well as
typical low-level events (mouse, keyboard, widget).
4.2.4 Questionnaire - A FLUD file can represent the results of a
portion of the session wherein the subject responds to a questionnaire set up by
the tester. The difference between a questionnaire and a task is that a
questionnaire requests information directly from the subject (e.g. "How old are
you?", "Do you think the graphics are helpful or annoying?"), whereas a task is
usually meant to simulate the intended usage of the website and the subject is
monitored to find out such things as whether most people use the website
effectively. Also, only the results of the questionnaire are reported, not the
process by which they were answered (e.g. timing of the responses is not
reported).
4.2.5 Notes - A note record captures information typed in during the
session by the subject or tester or some other author. The idea is that test
session manager software might provide a facility for observations, comments,
complaints, or recommendations by interested parties if they encounter some
unusual situation. This could be invoked at the initiative of the note's author
or prompted by the system.
4.2.6 Conformance - The FLUD specification [Cugi01]
defines conformance, that is, what constitutes a valid FLUD file. Conformance
breaks down into syntactic and semantic requirements.
Syntactically, the file is described in a context-free grammar with a few
context-sensitive constraints. The file is a sequence of records, each of which
is a sequence of fields. Each field has a name (explicit or implicit), a value,
and a type which defines the range of those values. The specification attempts
to define fields so as to cover most common types of user interaction. Some of
these defined fields are required to be present, some are optional. In
addition, the file may contain so-called ad hoc fields, which are not
defined -- this is the mechanism for extensibility.
Semantically, the basic requirement is that the file truly reflect a
subject's behavior: if the file says the subject pressed mouse button #2 at a
given time, then that must really have happened. It is not required
that the file capture all of the subject's interactive behavior, even if that
behavior is representable. Producing a complete record of behavior is probably
beyond the capability of most generators. Furthermore, the generator may
deliberately omit reporting some kinds of activity, e.g. mouse motion. In short,
FLUD requires the truth, and nothing but the truth, but not the whole truth.
4.3 Basic Event Model
Events are occurrences of short duration that are
apparent to the user and involve the system under test. By this definition, we
exclude operations of long duration (which may, however, be represented as a
sequence of several events), purely internal changes to the system, and
non-interactive user behavior. After some analysis, we decided that an event
could have up to three distinct aspects, as listed below. The FLUD syntax marker
is shown next to each component.
- User_action (#U)
- An action performed directly by the user and associated with a particular
input device, typically a mouse or a keyboard.
- This_widget (#W)
- Describes state changes in the widget, if any, to which the user_action
was targeted. Screen objects, such as buttons, textboxes, menus, checkboxes,
and sliders, are typical widgets.
- System_effect
- System_effect is used to describe "everything else"; in particular, how
the state of the system as seen by the user changes (either as a result of the
user_action or autonomously). System_effects are further sub-divided into
three categories:
- Other_widget (#OW)
- State changes in any widget, other than this_widget.
- Window_state (#WN)
- Includes typical window operations, such as open, close, move, re-size,
and iconify.
- Webpage_operation (#OP)
- print: usually available as a browser operation
- newpage: operations involving a new webpage (request, loading, complete)
- page_locate: indicates which part of the webpage is visible within a
window
An event record can contain any combination
of these components. While there can be at most one user_action and this_widget
component, there may be several system_effects. An example would be a user
clicking on a "Clear" button: the mouse-click is the user_action, the triggering
of Clear button is the this_widget aspect, but there may be several other
system_effects such as textboxes cleared to the null string, checkboxes set to
"off", and windows closed. When several components share an event record, it
means that they are causally related, not merely contemporaneous.
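The Clear-button example above can be sketched as an in-memory structure. The marker names (#U, #W, #OW, #WN) follow the paper, but the Python model and the field strings are our own illustration, not part of the FLUD specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# A hypothetical in-memory model of a FLUD event record: at most one
# user_action (#U) and one this_widget (#W), but any number of
# system_effects (#OW, #WN, #OP), all causally related.
@dataclass
class EventRecord:
    user_action: Optional[str] = None
    this_widget: Optional[str] = None
    system_effects: List[str] = field(default_factory=list)

# The "Clear" button example from the text: one click, one target
# widget, several causally related system effects.
clear_event = EventRecord(
    user_action="mouse_click button=1",
    this_widget="button name=Clear",
    system_effects=[
        "other_widget textbox name=address value=''",     # #OW
        "other_widget checkbox name=subscribe state=off",  # #OW
        "window_state close name=popup",                   # #WN
    ],
)
```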
4.4 Example
An example of a complete FLUD file is available on-line.
5. Tools
If the FLUD format is to be more than an academic exercise, it
must be supported by a set of software tools. The potential advantage of course
is that widespread adoption of a common format enables these tools to be generic
and sharable.
5.1 Generators
A FLUD generator is any software that monitors user and
system behavior during a test session, and produces a FLUD file that accurately
represents (some of) that behavior. Thus, the more information about the session
available to the generator, the better. An ideal generator would know about not
only low-level events, but also the broader computing context (browser and
window operations), and the application (task metrics). Two implementation
strategies suggest themselves:
5.1.1 Instrumentation of the website - There are at least two extant
systems, WebVIP [WebVIP]
and WET [Etgen99]
that semi-automatically instrument the pages of a website so as to report user
activity thereon. This approach makes sense when the focus is on design and
review of a particular website, as opposed to study of user behavior on the web
in general. Obviously, user activity outside of the instrumented site is not
captured. WebVIP has recently been modified to generate its output in the FLUD
format.
Website instrumentation supports remote testing and does not require a
special browser. Webpages can be further customized by hand to incorporate task
knowledge. However, there are some implementation difficulties with delivering
the log data back to the website owner. In particular, the generator must either
transmit data back to the server for every event or find storage on the client
side where data can be buffered and managed. Another problem is the issue of
privacy: instrumentation opens the door to the possibility of extensive covert
tracking of user activity.
5.1.2 Instrumentation of the browser - At least one study [Choo00]
has been done to track the way users navigate the web, using special software
called WebTracker that traces browser activity. This technique enables
the researcher to follow a subject's travel throughout the entire web, not just
a chosen website. This approach implies installation overhead for each subject,
rather than per website. In theory, since all of a subject's
interaction with the web is through a browser, an instrumented browser
potentially offers a more complete trace of user activity than instrumentation
of a webpage. This approach, however, seems less oriented towards the
incorporation of task or application knowledge. Customizing a website, by
contrast, may more easily allow automatic reporting of task metrics.
5.2 Parser
We have developed a FLUD parser to support the format. It
checks the syntax of a logfile and can generate three kinds of output as a
result:
- Parse file
- This is a highly stylized rendition of the original logfile. Its intended
purpose is to serve as input to other automated processes, such as analyzers
and visualizers (see below). Instead of processing the logfile directly, they
can invoke the parser to perform low-level syntax checking. The resulting
parse file can then be easily analyzed for higher-level purposes, such as
statistical summaries and the like.
- HTML file
- This file is essentially a pretty-print version of the logfile.
Indentation and color are used to clarify the file structure. Thus, this file
is primarily oriented towards human review. If syntax errors are found, an
error message will be inserted in the HTML file.
- Userpath files
- These files are designed as input to the NIST VisVIP software, [VisVIP]
which presents a 3D visualization of a user's navigation of a website. Each
task within the session generates a separate file. Each file represents the
web pages visited by the subject during a single task, and the length of time
spent on each page.
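The derivation of per-page times in a userpath file can be sketched as follows. The event shape assumed here (a chronological list of page-arrival timestamps) is our own simplification, not the parser's actual input format: the time on each page is simply the gap until the next arrival, or until the end of the task.

```python
# Sketch only: assumes page visits arrive as (timestamp_seconds, url)
# pairs in chronological order, which is a simplification of the
# actual parse-file input consumed by the userpath generator.
def dwell_times(visits, session_end):
    """Return (url, seconds_on_page) for each page visit, in order."""
    path = []
    for (t, url), (t_next, _) in zip(visits, visits[1:] + [(session_end, None)]):
        path.append((url, t_next - t))
    return path

visits = [(0, "/home"), (12, "/products"), (47, "/products/747")]
print(dwell_times(visits, session_end=60))
# -> [('/home', 12), ('/products', 35), ('/products/747', 13)]
```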
5.3 Post-processors
The real payoff for the FLUD format would be a
large, varied suite of software tools available to the usability engineer for
analysis (statistical and otherwise) and visualization of user behavior on the
web. As mentioned above, VisVIP [VisVIP]
is one such tool, but we at NIST are planning to develop others and encourage
the community of usability experts to contribute as well.
6. Open Issues
6.1 Scope
FLUD is targeted at a level of representation appropriate for
usability analysis and evaluation. We hope to get feedback from usability
professionals as to how well this goal has been achieved. In particular, are
some of the entities defined in FLUD of little interest? Conversely, does FLUD
fail to provide a representation for certain valuable kinds of entities and user
behavior?
6.2 Feasibility
How feasible is it to build sophisticated generators of
FLUD files? Are there special difficulties representing user interaction with
dynamically generated pages? What problems are presented by the need to map
between FLUD and other event models? Finally, our experience with WebVIP has
revealed some low-level issues with the website instrumentation approach, such
as events being reported back to the JavaScript code out of chronological
sequence, and inadequate mechanisms for transmitting information back to the
webserver.
6.3 Acceptance
Even given a specification of good technical merit, it
will have little value unless it is adopted among a wide enough circle of users.
The factors governing acceptance of a standard for exchanging information
include ease of use, perceived technical benefit, and, of course, recursively,
acceptance by others with whom one wishes to communicate. The first two factors,
at least, can be addressed by a suite of software tools that are readily
available and confer some advantage on their users.
Figure 2: VisVIP displays a userpath through a website
References
- [Analog]
- Analog (server log analyzer): http://www.analog.cx/
- [Choo00]
- Chun Wei Choo, Brian Detlor, Don Turnbull, "Information Seeking on the Web
- An Integrated Model of Browsing and Searching", First Monday,
Volume 5 no. 2, Feb 7, 2000: http://choo.fis.utoronto.ca/FIS/SSHRC/
and http://firstmonday.org/issues/issue5_2/choo/index.html
- [Cugi01]
- J. Cugini, "The FLUD format: Logging Usability Data from Web-based
Applications", NIST Special Publication 500-247, January 2001: http://www.itl.nist.gov/iad/vug/cugini/webmet/flud/specification.html
- [DOM]
- The DOM (Document Object Model) Level 2 Event Model: http://www.w3.org/TR/DOM-Level-2-Events/events.html
- [Etgen99]
- M.P. Etgen, J. Cantor, "What does getting WET (Web Event-logging Tool)
Mean for Web Usability?", Proceedings of the 5th Conference on Human
Factors and the Web, Gaithersburg, MD, June 1999: http://zing.ncsl.nist.gov/hfweb/proceedings/etgen-cantor/index.html
- [Etgen00]
- M.P. Etgen, J. Cantor, "A Comparison of Two Usability Testing Methods:
Formal Usability Testing and Automated Usability Logging", Proceedings of UPA
2000, Asheville, North Carolina, August 14-18, 2000.
- [Flash]
- FlashStats (server log analyzer): http://www.maximized.com/products/flashstats/
- [Fu01]
- W.-T. Fu (in press), "ACT-PRO: Action protocol tracer -- a tool for
analyzing simple, rule-based tasks", Behavior Research Methods, Instruments,
& Computers.
- [Hilb98]
- D.M. Hilbert and D.F. Redmiles, "Agents for Collecting Application Usage
data Over the Internet", Proceedings of the Second International
Conference on Autonomous Agents, Minneapolis/St. Paul, MN, ACM, May
10-13, 1998.
- [Hilb99]
- D.M. Hilbert and D.F. Redmiles, "Extracting Usability Information from
User Interface Events", Technical Report UCI-ICS-99-40, Department of
Information and Computer Science, University of California, Irvine.
- [Hoch99]
- H. Hochheiser and B. Shneiderman, "Using Interactive Visualizations of WWW
Log Data to Characterize Access Patterns and Inform Site Design", ASIS'99
Proceedings of the 62nd Annual Meeting of the American Society for Information
Science, October 31-November 4, 1999, Vol. 36, 331-344.
- [Hoch00]
- H. Hochheiser and B. Shneiderman, "Coordinating Overviews and Detail Views
of WWW Log Data", Tech report 200-25, Human-Computer Interaction Lab (HCIL) at
the University of Maryland, October 2000.
- [MSIE]
- The Internet Explorer Event Model: http://msdn.microsoft.com/workshop/author/om/event_model.asp
or http://www.webreference.com/js/column10/
- [LogAn]
- HTTPD Log Analyzers (list of server log analyzers): http://www.hypernews.org/HyperNews/get/www/log-analyzers.html
- [Netsc]
- The Netscape Navigator Event Model: http://www.webreference.com/js/column9/
- [Niel99]
- J. Nielsen, "Voodoo Usability", Jakob Nielsen's Alertbox, December 12,
1999: http://www.useit.com/alertbox/991212.html
- [Nye93]
- A. Nye, Xlib Reference Manual, O'Reilly & Associates, 1993.
- [PWeb]
- pwebstats (server log analyzer): http://martin.gleeson.com/pwebstats/
- [VisVIP]
- VisVIP (visualization of user paths through websites): http://www.itl.nist.gov/iad/vug/cugini/webmet/visvip/vv-home.html
- [W3Log]
- "Logging Control In W3C httpd": http://www.w3.org/Daemon/User/Config/Logging.html
- [Webal]
- Webalizer (server log analyzer): http://webalizer.dexa.org/
- [WebVIP]
- WebVIP (website instrumenter): http://www.nist.gov/webmetrics/