NIST Special Publication 500-248
April 2001
Design of a File Format for
Logging Website Interaction
Contribution of the National Institute of Standards and Technology. Not
subject to copyright. Reference to specific commercial products or brands is for
information purposes only; no endorsement or recommendation by the National
Institute of Standards and Technology, explicit or implicit, is intended.
Abstract
The logging of user behavior in support of web usability
testing is constrained by the difficulty of capturing and analyzing large
amounts of logged data. However, there is great potential for the development of
tools to support automated recording and analysis, especially for remote or
large scale testing. In this paper, we propose a format for the representation
of user interaction with a website. A widely accepted format enables the
development of a set of software tools to process the data and the sharing of
data sets for longer-term analysis and research, and provides a common language
for expressing user interaction with a website.
Keywords
log format; user logging; usability testing; web-based
applications
1. Introduction
This paper describes a format called FLUD, Framework
for Logging Usability Data, for representing user interaction with a website. A
common file format enables sharing of logged data and the development of
interoperable tools. The logging of user behavior in support of web usability
testing is constrained by the difficulty of capturing and analyzing large
amounts of logged data. However, there is great potential for the development of
tools to support automated recording and analysis, especially for remote or
large scale testing. The few tools that currently exist do not interoperate and
so cannot exchange data easily. Researchers cannot easily share large data sets
of user logs for further post-mortem analysis and exploration. The file format
that we propose here is designed to address these problems. It can be viewed as
a browser-independent language for expressing user interaction with a website.
Given such a language, one can then formally describe what is being collected
during a usability test.
In Section 2 we provide the background that led to the development of
FLUD. Section 3 summarizes related work. The FLUD design and accompanying tools
we developed for its use are presented in Sections 4 and 5. In Section 6 we list
a number of issues that are factors in the adoption of FLUD by the usability
community.
2. Background and Motivation
2.1 Why FLUD?
The Visualization and Usability Group,
which is part of the Information
Access Division of NIST's Information
Technology Laboratory, has been working on measurement, testing, and
standards relating to usability engineering since 1997. In particular, the NIST Web Metrics Testbed has focused
on automation to support the usability engineering process, resulting in
the development of prototype tools that are publicly available to the
usability community.
One focus of this effort is the recording of the users' interaction with a
web-based application as they attempt to perform given tasks. This captured log
data can be valuable for analyzing and improving usability [Etgen00].
As we developed our recording tool WebVIP [WebVIP],
we found that we did not have a formal specification for the user logs we wanted
to collect. Initially, our output was closely tied to the specific event model
of the browser we were working with. It soon became evident that log data is
quite complex and that a common file format was needed to allow various software
components (such as recorders, parsers, analyzers, and visualizers) to exchange
information. At that point, we focused part of our effort on FLUD.
2.2 Approaches to Usability Testing
There are some purely
automated approaches to usability evaluation (see [Niel99]
for a skeptical view), but the automation provided by most of the NIST Web Metrics
prototypes supports the usability engineer during the course of usability
testing, in which subjects are asked to perform some tasks.
Usability testing encompasses a range of approaches:
- Direct human observation of the subject by a usability engineer, who
records and interprets the subject's behavior. This approach has great
semantic depth, since all the subject's external behavior is available for
analysis by an intelligent observer; indeed, the subject's thought processes
may be queried as well. The process is, however, time-consuming and there
seems to be only slight opportunity for automation.
- High-level automated monitoring. The software under test can be
instrumented by hand so as to report application-specific performance metrics,
such as a score indicating degree of success in achieving a task and time
taken. This approach allows a larger number of subjects to participate in the
test, but is less helpful in analyzing why subjects succeed or fail.
- Automated monitoring of low-level user behavior. This can often be done
with the help of existing software, e.g. by capturing events as reported by a
browser. The problem here is that the higher-level, more meaningful
description of the subjects' behavior is lost.
Our efforts are aimed
at combining the best aspects of the last two approaches: we want to
automatically generate a mid-level description of behavior such that we can
achieve breadth (a large number of subjects), but also enable some
computer-assisted analysis of why individual subjects performed as they
did.
3. Other Related Work on Log Files
W3C has promulgated a Common Log
Format [W3Log]
for server logs, which record requests for web pages from external clients.
These logs are compiled automatically as a byproduct of running a web server,
but at best contain information only about page jumps made by a user. No
intra-page activity is captured. Moreover, it can be difficult to distinguish
the activity of a particular user among requests from several sources and the
use of cache storage by a browser may hide repeat requests. In short, a server
log tracks the activity of a server, not a subject. It is noteworthy, however,
that several analysis and visualization tools for server logs [Analog,
Flash,
Hoch99,
Hoch00,
LogAn,
PWeb,
Webal]
have emerged, encouraged no doubt by the existence of this common log
format.
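To make concrete what a server log does and does not capture, the sketch below parses one line of the Common Log Format into its standard fields (remote host, identity, user, date, request, status, bytes). The pattern and sample line are illustrative; note that everything recoverable is a single page request, with no trace of intra-page activity.

```python
import re

# Common Log Format: remotehost rfc931 authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf_line(line):
    """Parse one Common Log Format line into a dict, or None if malformed."""
    m = CLF_PATTERN.match(line)
    return m.groupdict() if m else None

entry = parse_clf_line(
    '192.0.2.1 - jdoe [10/Apr/2001:13:55:36 -0400] "GET /index.html HTTP/1.0" 200 2326'
)
# Only the page request is visible to the server: clicks, scrolling,
# and widget use within the page never reach this log.
```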
There are several event models, such as those supported by popular browsers,
e.g. Netscape [Netsc]
and Internet Explorer [MSIE],
the widely used X Window System [Nye93],
and the Document Object Model proposed by W3C [DOM].
Because they represent user activity at a low level (e.g. mouse clicks and
keystrokes), they are application-independent. They do not, however, attempt to
represent user behavior at a higher level of abstraction, such as task
performance.
Hilbert and Redmiles [Hilb98,
Hilb99]
have developed a prototype system for gathering information on user activity,
but have not proposed a format for that information. Finally, there has been
some work by Fu [Fu01]
whose goal is to compile low-level information on user activity into
higher-level abstractions.
4. FLUD Design
The FLUD (Framework for Logging Usability Data) format
is intended to provide a representation of user interaction that is general
enough to support a wide range of usability testing. The complete specification
[Cugi01]
of FLUD's syntax and semantics is available on-line.
4.1 Requirements and Scope
In light of the general goals outlined
above, the FLUD format is designed to satisfy the following set of specific
requirements and constraints:
- The format should be machine-readable and machine-writable, but also
somewhat human-readable.
- It should support a database approach to user tests, by identifying
various dimensions typical of usability log data: subject, website, dataset,
task, and date, among others.
- It should define the operations typically encountered by a user
of web-based applications. Concepts such as mouse clicks, keystrokes, radio
buttons, scrolling, and jumping to a new page must all be included.
- It should also encompass the context of interaction, including
window and browser operations as well as activity internal to the webpage.
- However, it will exclude non-interactive user behavior such as
eye gaze, oral reports, and forehead-slapping, for the obvious reason that
capturing such behavior requires special equipment beyond the usual
keyboard, mouse, and display. Also, the formal representation of such behavior
is more problematic.
- Also, it will not define (at least for now) more exotic interaction modes,
such as voice-activated commands or gaze-controlled cursor location.
- However, FLUD should be extensible; it should allow for the representation
of unforeseen types of activity (although such extensions cannot be
meaningfully processed by generic tools). For instance, if an
application makes use of an input device or a widget of a type not pre-defined
by the FLUD specification, grammatical hooks are provided so that the
subject's interaction can nonetheless be represented and parsed, as so-called
ad hoc fields. Of course, any analysis that depended on the semantics
of such fields could be performed only by suitably customized software.
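The extensibility requirement can be illustrated with a small sketch. The field syntax and the set of defined field names below are hypothetical (the real grammar is given in [Cugi01]); the point is that a generic tool can separate the fields it understands from ad hoc fields, carrying the latter through unharmed for customized software downstream.

```python
# Hypothetical "name=value" field syntax, for illustration only;
# the actual FLUD grammar is defined in the specification [Cugi01].
KNOWN_FIELDS = {"subject", "task", "date"}   # assumed set of defined fields

def split_fields(record):
    """Separate defined fields from ad hoc (undefined but parseable) ones."""
    defined, ad_hoc = {}, {}
    for fld in record.split(";"):
        name, _, value = fld.partition("=")
        (defined if name in KNOWN_FIELDS else ad_hoc)[name] = value
    return defined, ad_hoc

defined, ad_hoc = split_fields("subject=S01;task=T3;eyetracker_x=512")
# defined -> {'subject': 'S01', 'task': 'T3'}
# ad_hoc  -> {'eyetracker_x': '512'}  (preserved, though not interpreted)
```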
4.2 Top-level FLUD File Concepts and Definitions
4.2.1 Session - A FLUD file records exactly one session. A session is
defined as the interaction of a single subject with a single fully configured
hardware system during a continuous time interval. A switch of platform or
subject is therefore considered to be a new session, by definition. Within a
session, a subject may visit several websites and webpages and attempt to
perform several tasks.
4.2.2 Task - The FLUD file format is designed for task-oriented
usability testing: the subject is given a task to perform (e.g. find at least
three documents about Iowa, find out how much a Boeing 747 weighs) and then
his/her performance is monitored. Undirected browsing can also be recorded
within a single "dummy" task.
4.2.3 Events - An event is defined to be a nearly instantaneous
occurrence involving the subject, the system under test, or both. The event
model is described in detail below. FLUD events are not quite the same as those
within current, well-known, low-level event models, so a generator may need to
map from the output of such a model to the FLUD level of abstraction. For instance,
FLUD includes higher-level information such as webpage navigation, as well as
typical low-level events (mouse, keyboard, widget).
4.2.4 Questionnaire - A FLUD file can represent the results of a
portion of the session wherein the subject responds to a questionnaire set up by
the tester. The difference between a questionnaire and a task is that a
questionnaire requests information directly from the subject (e.g. "How old are
you?", "Do you think the graphics are helpful or annoying?"), whereas a task is
usually meant to simulate the intended usage of the website and the subject is
monitored to find out such things as whether most people use the website
effectively. Also, only the results of the questionnaire are reported, not the
process by which they were answered (e.g. timing of the responses is not
reported).
4.2.5 Notes - A note record captures information typed in during the
session by the subject or tester or some other author. The idea is that test
session manager software might provide a facility for observations, comments,
complaints, or recommendations by interested parties if they encounter some
unusual situation. This could be invoked at the initiative of the note's author
or prompted by the system.
4.2.6 Conformance - The FLUD specification [Cugi01]
defines conformance, that is, what constitutes a valid FLUD file. Conformance
breaks down into syntactic and semantic requirements.
Syntactically, the file is described in a context-free grammar with a few
context-sensitive constraints. The file is a sequence of records, each of which
is a sequence of fields. Each field has a name (explicit or implicit), a value,
and a type which defines the range of those values. The specification attempts
to define fields so as to cover most common types of user interaction. Some of
these defined fields are required to be present, some are optional. In
addition, the file may contain so-called ad hoc fields, which are not
defined -- this is the mechanism for extensibility.
Semantically, the basic requirement is that the file truly reflect a
subject's behavior: if the file says the subject pressed mouse button #2 at a
given time, then that must really have happened. It is not required
that the file capture all of the subject's interactive behavior, even if that
behavior is representable. Producing a complete record of behavior is probably
beyond the capability of most generators. Furthermore, the generator may
deliberately omit reporting some kinds of activity, e.g. mouse motion. In short,
FLUD requires the truth, and nothing but the truth, but not the whole truth.
4.3 Basic Event Model
Events are occurrences of short duration that are
apparent to the user and involve the system under test. By this definition, we
exclude operations of long duration (which may, however, be represented as a
sequence of several events), purely internal changes to the system, and
non-interactive user behavior. After some analysis, we decided that an event
could have up to three distinct aspects, as listed below. The FLUD syntax marker
is shown next to each component.
- User_action (#U)
- An action performed directly by the user and associated with a particular
input device, typically a mouse or a keyboard.
- This_widget (#W)
- Describes state changes in the widget, if any, to which the user_action
was targeted. Screen objects, such as buttons, textboxes, menus, checkboxes,
and sliders, are typical widgets.
- System_effect
- System_effect is used to describe "everything else"; in particular, how
the state of the system as seen by the user changes (either as a result of the
user_action or autonomously). System_effects are further sub-divided into
three categories:
- Other_widget (#OW)
- State changes in any widget, other than this_widget.
- Window_state (#WN)
- Includes typical window operations, such as open, close, move, re-size,
and iconify.
- Webpage_operation (#OP)
- print: usually available as a browser operation
- newpage: operations involving a new webpage (request, loading, complete)
- page_locate: indicates which part of the webpage is visible within a
window
An event record can contain any combination
of these components. While there can be at most one user_action and this_widget
component, there may be several system_effects. An example would be a user
clicking on a "Clear" button: the mouse-click is the user_action, the triggering
of Clear button is the this_widget aspect, but there may be several other
system_effects such as textboxes cleared to the null string, checkboxes set to
"off", and windows closed. When several components share an event record, it
means that they are causally related, not merely contemporaneous.
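The Clear-button example above can be sketched as an in-memory structure. The marker names (#U, #W, #OW, #WN) follow the paper, but the Python model and the field strings are our own illustration, not part of the FLUD specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# A hypothetical in-memory model of a FLUD event record: at most one
# user_action (#U) and one this_widget (#W), but any number of
# system_effects (#OW, #WN, #OP), all causally related.
@dataclass
class EventRecord:
    user_action: Optional[str] = None
    this_widget: Optional[str] = None
    system_effects: List[str] = field(default_factory=list)

# The "Clear" button example from the text: one click, one target
# widget, several causally related system effects.
clear_event = EventRecord(
    user_action="mouse_click button=1",
    this_widget="button name=Clear",
    system_effects=[
        "other_widget textbox name=address value=''",     # #OW
        "other_widget checkbox name=subscribe state=off",  # #OW
        "window_state close name=popup",                   # #WN
    ],
)
```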
4.4 Example
An example of a complete FLUD file is available on-line.
5. Tools
If the FLUD format is to be more than an academic exercise, it
must be supported by a set of software tools. The potential advantage of course
is that widespread adoption of a common format enables these tools to be generic
and sharable.
5.1 Generators
A FLUD generator is any software that monitors user and
system behavior during a test session, and produces a FLUD file that accurately
represents (some of) that behavior. Thus, the more information about the session
available to the generator, the better. An ideal generator would know about not
only low-level events, but also the broader computing context (browser and
window operations), and the application (task metrics). Two implementation
strategies suggest themselves:
5.1.1 Instrumentation of the website - There are at least two extant
systems, WebVIP [WebVIP]
and WET [Etgen99]
that semi-automatically instrument the pages of a website so as to report user
activity thereon. This approach makes sense when the focus is on design and
review of a particular website, as opposed to study of user behavior on the web
in general. Obviously, user activity outside of the instrumented site is not
captured. WebVIP has recently been modified to generate its output in the FLUD
format.
Website instrumentation supports remote testing and does not require a
special browser. Webpages can be further customized by hand to incorporate task
knowledge. However, there are some implementation difficulties with delivering
the log data back to the website owner. In particular, the generator must either
transmit data back to the server for every event or find storage on the client
side where data can be buffered and managed. Another problem is the issue of
privacy: instrumentation opens the door to the possibility of extensive covert
tracking of user activity.
5.1.2 Instrumentation of the browser - At least one study [Choo00]
has been done to track the way users navigate the web, using special software
called WebTracker that traces browser activity. This technique enables
the researcher to follow a subject's travel throughout the entire web, not just
a chosen website. This approach implies installation overhead for each subject,
rather than per website. In theory, since all of a subject's
interaction with the web is through a browser, an instrumented browser
potentially offers a more complete trace of user activity than instrumentation
of a webpage. This approach, however, seems less oriented towards the
incorporation of task or application knowledge. Customizing a website, by
contrast, may more easily allow automatic reporting of task metrics.
5.2 Parser
We have developed a FLUD parser to support the format. It
checks the syntax of a logfile and can generate three kinds of output as a
result:
- Parse file
- This is a highly stylized rendition of the original logfile. Its intended
purpose is to serve as input to other automated processes, such as analyzers
and visualizers (see below). Instead of processing the logfile directly, they
can invoke the parser to perform low-level syntax checking. The resulting
parse file can then be easily analyzed for higher-level purposes, such as
statistical summaries and the like.
- HTML file
- This file is essentially a pretty-print version of the logfile.
Indentation and color are used to clarify the file structure. Thus, this file
is primarily oriented towards human review. If syntax errors are found, an
error message will be inserted in the HTML file.
- Userpath files
- These files are designed as input to the NIST VisVIP software, [VisVIP]
which presents a 3D visualization of a user's navigation of a website. Each
task within the session generates a separate file. Each file represents the
web pages visited by the subject during a single task, and the length of time
spent on each page.
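The derivation of per-page times in a userpath file can be sketched as follows. The event shape assumed here (a chronological list of page-arrival timestamps) is our own simplification, not the parser's actual input format: the time on each page is simply the gap until the next arrival, or until the end of the task.

```python
# Sketch only: assumes page visits arrive as (timestamp_seconds, url)
# pairs in chronological order, which is a simplification of the
# actual parse-file input consumed by the userpath generator.
def dwell_times(visits, session_end):
    """Return (url, seconds_on_page) for each page visit, in order."""
    path = []
    for (t, url), (t_next, _) in zip(visits, visits[1:] + [(session_end, None)]):
        path.append((url, t_next - t))
    return path

visits = [(0, "/home"), (12, "/products"), (47, "/products/747")]
print(dwell_times(visits, session_end=60))
# -> [('/home', 12), ('/products', 35), ('/products/747', 13)]
```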
5.3 Post-processors
The real payoff for the FLUD format would be a
large, varied suite of software tools available to the usability engineer for
analysis (statistical and otherwise) and visualization of user behavior on the
web. As mentioned above, VisVIP [VisVIP]
is one such tool, but we at NIST are planning to develop others and encourage
the community of usability experts to contribute as well.
6. Open Issues
6.1 Scope
FLUD is targeted at a level of representation appropriate for
usability analysis and evaluation. We hope to get feedback from usability
professionals as to how well this goal has been achieved. In particular, are
some of the entities defined in FLUD of little interest? Conversely, does FLUD
fail to provide a representation for certain valuable kinds of entities and user
behavior?
6.2 Feasibility
How feasible is it to build sophisticated generators of
FLUD files? Are there special difficulties representing user interaction with
dynamically generated pages? What problems are presented by the need to map
between FLUD and other event models? Finally, our experience with WebVIP has
revealed some low-level issues with the website instrumentation approach, such
as events being reported back to the JavaScript code out of chronological
sequence, and inadequate mechanisms for transmitting information back to the
webserver.
6.3 Acceptance
Even given a specification of good technical merit, it
will have little value unless it is adopted among a wide enough circle of users.
The factors governing acceptance of a standard for exchanging information
include ease of use, perceived technical benefit, and, of course, recursively,
acceptance by others with whom one wishes to communicate. The first two factors,
at least, can be addressed by a suite of software tools that are readily
available and confer some advantage on their users.
Figure 2: VisVIP displays a userpath through a website
References
- [Analog]
- Analog (server log analyzer): http://www.analog.cx/
- [Choo00]
- Chun Wei Choo, Brian Detlor, Don Turnbull, "Information Seeking on the Web
- An Integrated Model of Browsing and Searching", First Monday,
Volume 5 no. 2, Feb 7, 2000: http://choo.fis.utoronto.ca/FIS/SSHRC/
and http://firstmonday.org/issues/issue5_2/choo/index.html
- [Cugi01]
- J. Cugini, "The FLUD format: Logging Usability Data from Web-based
Applications", NIST Special Publication 500-247, January 2001: http://www.itl.nist.gov/iad/vug/cugini/webmet/flud/specification.html
- [DOM]
- The DOM (Document Object Model) Level 2 Event Model: http://www.w3.org/TR/DOM-Level-2-Events/events.html
- [Etgen99]
- M.P. Etgen, J. Cantor, "What does getting WET (Web Event-logging Tool)
Mean for Web Usability?", Proceedings of the 5th Conference on Human
Factors and the Web, Gaithersburg, MD, June 1999: http://zing.ncsl.nist.gov/hfweb/proceedings/etgen-cantor/index.html
- [Etgen00]
- M.P. Etgen, J. Cantor, "A Comparison of Two Usability Testing Methods:
Formal Usability Testing and Automated Usability Logging", Proceedings of UPA
2000, Asheville, North Carolina, August 14-18, 2000.
- [Flash]
- FlashStats (server log analyzer): http://www.maximized.com/products/flashstats/
- [Fu01]
- W.-T. Fu (in press), "ACT-PRO: Action protocol tracer -- a tool for
analyzing simple, rule-based tasks", Behavior Research Methods, Instruments,
& Computers.
- [Hilb98]
- D.M. Hilbert and D.F. Redmiles, "Agents for Collecting Application Usage
data Over the Internet", Proceedings of the Second International
Conference on Autonomous Agents, Minneapolis/St. Paul, MN, ACM, May
10-13, 1998.
- [Hilb99]
- D.M. Hilbert and D.F. Redmiles, "Extracting Usability Information from
User Interface Events", Technical Report UCI-ICS-99-40, Department of
Information and Computer Science, University of California, Irvine.
- [Hoch99]
- H. Hochheiser and B. Shneiderman, "Using Interactive Visualizations of WWW
Log Data to Characterize Access Patterns and Inform Site Design", ASIS'99
Proceedings of the 62nd Annual Meeting of the American Society for Information
Science, October 31-November 4, 1999, Vol. 36, 331-344.
- [Hoch00]
- H. Hochheiser and B. Shneiderman, "Coordinating Overviews and Detail Views
of WWW Log Data", Tech report 200-25, Human-Computer Interaction Lab (HCIL) at
the University of Maryland, October 2000.
- [MSIE]
- The Internet Explorer Event Model: http://msdn.microsoft.com/workshop/author/om/event_model.asp
or http://www.webreference.com/js/column10/
- [LogAn]
- HTTPD Log Analyzers (list of server log analyzers): http://www.hypernews.org/HyperNews/get/www/log-analyzers.html
- [Netsc]
- The Netscape Navigator Event Model: http://www.webreference.com/js/column9/
- [Niel99]
- J. Nielsen, "Voodoo Usability", Jakob Nielsen's Alertbox, December 12,
1999: http://www.useit.com/alertbox/991212.html
- [Nye93]
- A. Nye, Xlib Reference Manual, O'Reilly & Associates, 1993.
- [PWeb]
- pwebstats (server log analyzer): http://martin.gleeson.com/pwebstats/
- [VisVIP]
- VisVIP (visualization of user paths through websites): http://www.itl.nist.gov/iad/vug/cugini/webmet/visvip/vv-home.html
- [W3Log]
- "Logging Control In W3C httpd": http://www.w3.org/Daemon/User/Config/Logging.html
- [Webal]
- Webalizer (server log analyzer): http://webalizer.dexa.org/
- [WebVIP]
- WebVIP (website instrumenter): http://www.nist.gov/webmetrics/