Contribution of the National Institute of Standards and Technology.
Not subject to copyright. Reference to specific commercial products
or brands is for information purposes only; no endorsement or
recommendation by the National Institute of Standards and Technology,
explicit or implicit, is intended.
Submitted to the
Tools to Support Faster and Better Usability Engineering
workshop, August 15, 2000, Asheville, NC (held in conjunction with
the UPA 2000 conference).
This paper is accessible at:
http://www.itl.nist.gov/iaui/vvrg/cugini/webmet/paper-aug2000.html
Abstract
Systematic usability studies of web-based applications require
software tools and data standards to support both the capture
and representation of users' activities. Web logging tools
can be based within the browser, the website, or the underlying
server. We compare the advantages and costs of these strategies.
Logging tools produce logfiles, which must then be analyzed by the
usability engineer. A common format for such logfiles would greatly
facilitate information exchange and enable common software for
presentation and analysis.
Keywords
Usability; logging; logfiles; remote testing
1. Motivation
Information technology can help support usability studies of web-based
applications. In particular, the automatic capture and
digital representation of users' activities provide a base of
support for later analysis. Section 2 of this paper describes and
compares various approaches to the automation of user logging.
Ideally, logging tools support rapid and remote usability testing.
Section 3 discusses some issues in the design of a general format for
logfiles. We consider herein neither the problem of "deep"
interpretation of logfiles nor that of capturing data independent of
direct user interaction with the system.
2. Taxonomy of Web Logging Tools
There are at least three plausible instrumentation techniques for
recording user activity on the web. Each is defined and compared
below.

Browser:
The browser can be modified so as to record selected user activity
during a session.
  Pros:
  - Captures all of a user's web activity (across several sites)
  - Probably easier to install per unit than website instrumentation
  - Can capture intrapage activity
  - Can easily capture browser-level operations as well (reload,
    back, etc.)
  - Usually can store the logfile locally and ship it back to a
    central site later
  Cons:
  - Each user has to install the instrumentation
  - Obviously browser-dependent
  - May be difficult to access and/or modify browser code
  - May be hard to relate activity to the semantics of a website,
    since no website-specific information is captured

Website:
The pages of the website under study can be modified to capture
events within the page. Javascript provides a convenient mechanism
(see the sketch following this comparison).
  Pros:
  - Instrument once only
  - Can still track activity per user
  - Can capture intrapage activity
  - More opportunity for smart interpretation of low-level events
  Cons:
  - Installation can be tricky
  - Some technical issues about delivering log data back to the
    website owner
  - Some issues with browser dependence, but solvable
  - May be used for covert tracking of user activity

Server:
The webserver simply watches and records who asks for pages on the
host.
  Pros:
  - Zero effort: the server generates a logfile by default
  - Emerging standards for server logfiles allow for common tools
  - Browser independent
  Cons:
  - Harder to track activity per user
  - No intrapage information
  - Shows only original page requests; local caching can hide
    repeat visits
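As a rough illustration of the website-based technique, the following
Javascript sketch records clicks and form changes within a page and
periodically ships them back to the instrumenting site. It is a
minimal sketch only: the recordEvent() helper, the /usability-log
endpoint, and the delivery strategy are our own assumptions, not the
mechanism used by any particular tool.

    // Minimal sketch of website-based (in-page) instrumentation.
    // recordEvent() and the /usability-log endpoint are hypothetical.
    var eventBuffer = [];

    function recordEvent(type, target) {
      eventBuffer.push({
        time: Date.now(),              // when the event occurred
        type: type,                    // e.g. "click", "change"
        element: target.tagName,       // kind of widget involved
        id: target.id || null,         // identify the widget if possible
        page: window.location.href     // page on which it occurred
      });
    }

    // Capture low-level intrapage events in the capture phase, so one
    // pair of handlers covers every element on the page.
    document.addEventListener('click', function (e) {
      recordEvent('click', e.target);
    }, true);
    document.addEventListener('change', function (e) {
      recordEvent('change', e.target);
    }, true);

    // Ship the buffered log back to the instrumenting site periodically.
    setInterval(function () {
      if (eventBuffer.length === 0) return;
      var payload = JSON.stringify(eventBuffer);
      eventBuffer = [];
      if (navigator.sendBeacon) {
        navigator.sendBeacon('/usability-log', payload);
      } else {
        // Fallback: encode the log into an image request.
        (new Image()).src = '/usability-log?data=' +
                            encodeURIComponent(payload);
      }
    }, 5000);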
The browser-based approach is more "user-centric". It is especially
appropriate for studies on such issues as how people use the web,
whether interaction styles are more strongly determined by the website
or the user, and other questions that are not strongly task-based.
The website-based approach is "application-centric". It is a good
choice when the purpose is to design or improve a specific website so
as to achieve its goals, whether those be sales, information
dissemination, or others. It fits well with task-based usability
testing, although non-directed usage may also be monitored.
The Faculty of Information Studies at the University of Toronto has
developed a tool called WebTracker[1] which
exemplifies the first technique. Tools to assist in the second
approach include WET[2] from AT&T and
WebVIP[3] from NIST.
3. Design Issues for Logfile Standardization
The basic problem in designing a standard format for logfiles is to
reconcile the representation of common, low-level events with that of
higher-level and more meaningful, but application-specific,
operations. Software can detect that the user is clicking a mouse at
a given pixel location; the usability engineer wants to know that the
user is selecting a given menu option, in the course of performing a
specified task. Our group[4] at NIST is currently
experimenting with a draft logfile format to see if this semantic gap
can be bridged. The following subsections give a brief description of
the major design decisions we have made so far.
3.1 Basic Structure
A logfile contains general information about the session first,
followed by blocks of task data and/or questionnaire data. Tasks are
broken down further into events, and a questionnaire can also be
nested within a task. The nesting structure is outlined below
(indentation shows containment):
Logfile
   Task
      Events
      Questionnaire
         Response
   Questionnaire
      Response
In questionnaire blocks, the logfile represents only the
content of the user's responses to questions,
typically concerning demographic information
or user satisfaction. The low-level events evoked in the
course of responding are not of interest.
The task block allows application-specific metrics and other
task-level information to be recorded.
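To make the structure concrete, the sketch below renders one session
as a Javascript object. All field names here are hypothetical; they
illustrate the nesting only, not the actual element names of our
draft format.

    // Hypothetical field names; illustrates the nesting structure only.
    var logfile = {
      session: { user: "P07", date: "2000-08-15",
                 browser: "Netscape 4.7" },              // session info
      tasks: [{
        id: "task-1",
        description: "Find the price of product X",
        metrics: { completionTimeSec: 184, errors: 2 },  // task-level data
        events: [
          { time: 12.4, action: "click",    widget: "link:Products" },
          { time: 15.1, action: "keypress", widget: "input:search" }
        ],
        questionnaire: {                                 // nested in the task
          responses: [{ question: "How easy was this task?", answer: 4 }]
        }
      }],
      questionnaire: {                                   // session-level block
        responses: [{ question: "Age group", answer: "25-34" }]
      }
    };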
There are potentially three kinds of substantive components of an
event: user action, widget state, and system effect. These are
related causally; an example would be a user clicking on a "close
window" button. The user action is the mouse click, the widget is the
labelled button, and the effect is the actual closing of the window.
These components can be reported within a single event record,
or separately. A "smarter" generator would report them together,
thus building into the logfile more integrated knowledge
about user behavior and system response. Conversely, a
simpler generator could report results piecemeal and leave
the integration as a chore for the logfile analyzer.
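The difference between the two styles can be sketched as follows; the
records are hypothetical, but the three components match the
close-window example above.

    // Hypothetical records for the "close window" example above.

    // A "smarter" generator reports all three components in one record:
    var integrated = {
      time: 47.2,
      action: { type: "click", x: 312, y: 88 },          // user action
      widget: { type: "button", label: "Close window" }, // widget state
      effect: "window-closed"                            // system effect
    };

    // A simpler generator reports the same occurrence piecemeal, leaving
    // the correlation of the three records to the logfile analyzer:
    var piecemeal = [
      { time: 47.2, action: { type: "click", x: 312, y: 88 } },
      { time: 47.2, widget: { type: "button", label: "Close window" } },
      { time: 47.3, effect: "window-closed" }
    ];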
3.2 Conformance
Our conformance strategy is to be syntactically inclusive and
semantically lenient. We try to define representations for all the
events and widget interactions commonly seen in web-apps. We do not,
however, require that all the defined elements actually be recorded.
That is, a logfile generator may choose to systematically leave out
certain classes of events, either because they are of no
interest, or because they are difficult to detect.
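For instance, a generator might be written to skip whole classes of
events up front, along the following (hypothetical) lines:

    // Hypothetical: record only the event classes of interest, remaining
    // conformant while systematically omitting the rest (e.g. mousemove).
    var RECORDED_CLASSES = { click: true, change: true, submit: true };

    function maybeRecord(e) {
      if (!RECORDED_CLASSES[e.type]) return;  // omitted class: not logged
      recordEvent(e.type, e.target);  // recordEvent as sketched in Section 2
    }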
3.3 Scope of Events
Users often interact not only with the pages of a web-app,
but also with the screen environment in which they are embedded.
Specifically, a realistic usability study should note when
the user performs browser operations (e.g. reload, print)
and window operations (e.g. resize). In keeping with
our conformance strategy, we define the way in which such
behavior is to be represented.
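As an illustration, in-page Javascript can observe some of these
environment-level operations directly; the handlers below reuse the
hypothetical recordEvent() helper sketched in Section 2.

    // Window operation: the user resizes the browser window.
    window.addEventListener('resize', function () {
      recordEvent('window-resize', document.body);
    });

    // Browser operation: the user prints the page.
    window.addEventListener('beforeprint', function () {
      recordEvent('browser-print', document.body);
    });

    // Reload and "back" cannot be distinguished from within the page;
    // the unload event is the closest in-page proxy for either.
    window.addEventListener('unload', function () {
      recordEvent('page-exit', document.body);
    });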
4. Future Work
We intend to continue refining the WebVIP generator and
logfile format, and also to develop logfile interpreters, such as
analysis and visualization software. We believe that a good logfile
format could serve as a medium of exchange for general-purpose
usability tools.
References
[1] WebTracker information at
http://choo.fis.utoronto.ca/esproject/
[2] WET information at
http://zing.ncsl.nist.gov/hfweb/proceedings/etgen-cantor/index.html
[3] WebVIP information at
http://zing.ncsl.nist.gov/webmet/vip/webvip-process.html
[4] Visualization and Virtual Reality Group information at
http://www.nist.gov/itl/div894/vvrg/