To specify or measure usability, measures of effectiveness, efficiency and satisfaction are required for goals. Usability measures may be specified for overall goals (e.g. produce a letter) or for narrower goals (e.g. perform search and replace). Selecting usability measures for the most important user goals may mean ignoring many functions, but is likely to be the most practical approach. Examples of appropriate measures are given in Table B.1.
Table B.1 Examples of measures of usability
Effectiveness measures: Percentage of goals achieved; Percentage of users successfully completing task; Average accuracy of completed tasks
Efficiency measures: Time to complete a task; Tasks completed per unit time; Monetary cost of performing the task
Satisfaction measures: Rating scale for satisfaction; Usage rate over time; Frequency of complaints
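As an illustration of how some of the measures in Table B.1 could be derived from test data, the following Python sketch summarises a small set of task sessions. The record layout (`Session`), the 1 to 5 satisfaction scale and the sample values are assumptions made for the example only; they are not part of this annex.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One user's attempt at one task (hypothetical record layout)."""
    completed: bool    # whether the user completed the task successfully
    minutes: float     # time spent on the task, in minutes
    satisfaction: int  # rating on an assumed 1 to 5 scale


def summarise(sessions):
    """Return a few Table B.1 style measures for a list of sessions."""
    if not sessions:
        raise ValueError("at least one session is required")
    n = len(sessions)
    done = [s for s in sessions if s.completed]
    total_minutes = sum(s.minutes for s in sessions)
    return {
        # Effectiveness: percentage of users successfully completing the task
        "percent_completing": 100.0 * len(done) / n,
        # Efficiency: mean time to complete, over successful attempts only
        "mean_minutes_to_complete": (
            sum(s.minutes for s in done) / len(done) if done else None
        ),
        # Efficiency: tasks completed per unit time (here, per hour)
        "tasks_per_hour": 60.0 * len(done) / total_minutes,
        # Satisfaction: mean of the rating scale
        "mean_satisfaction": sum(s.satisfaction for s in sessions) / n,
    }


# Illustrative data only.
sessions = [
    Session(True, 8.0, 4),
    Session(True, 12.0, 3),
    Session(False, 15.0, 2),
    Session(True, 5.0, 5),
]
summary = summarise(sessions)
print(summary)
```

Whether time spent on unsuccessful attempts should count towards "tasks completed per unit time" is a design choice; this sketch includes it, so that failed attempts lower the rate.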
B.2 Measures for desired properties of the product
Additional measures may be required for particular desired properties of the product which contribute to usability. Examples of some of these properties and additional specialised measures are given in Table B.2. In addition, where appropriate, the measures given in Table B.1 can also be used for the usability objectives given in Table B.2.
Table B.2 Examples of measures for desired properties of the product
Appropriate for trained users
Effectiveness measures: Number of power tasks performed; Percentage of relevant functions used
Efficiency measures: Relative efficiency compared with an expert user
Satisfaction measures: Rating scale for satisfaction with power features
Appropriate for walk up and use
Effectiveness measures: Percentage of tasks completed successfully on first attempt
Efficiency measures: Time taken on first attempt; Relative efficiency on first attempt
Satisfaction measures: Rate of voluntary use
Appropriate for infrequent or intermittent use
Time spent relearning functions; Number of persistent errors
Frequency of reuse
Minimisation of support requirements
Number of references to documentation; Number of calls to support; Number of accesses to help
Productive time; Time to learn to criterion
Number of functions learned; Percentage of users able to learn to criterion
Time to learn to criterion; Time to relearn to criterion; Relative efficiency while learning
Rating scale for ease of learning
Percentage of errors corrected or reported by the system; Number of user errors tolerated
Time spent on correcting errors
Rating scale for error handling
Percentage of words read correctly at normal viewing distance
B.3 Choosing usability criteria
The choice of criterion values of measures of usability depends on the requirements for the product and the needs of the organisation setting the criteria. Usability objectives may relate to a primary goal (e.g. produce a letter) or a sub-goal (e.g. search and replace) or secondary goals (e.g. learnability or adaptability). Focusing usability objectives on the most important user goals may mean ignoring many functions, but is likely to be the most practical approach. Setting usability objectives for specific sub-goals may permit evaluation earlier in the development process.
It may be necessary to specify criteria both for the minimum acceptable level of usability and for the target level of usability.
When setting criterion values for a group of users, the criteria may be set as an average (e.g. average time for completion of a task to be no more than 10 minutes), for individuals (e.g. all users can complete the task within 10 minutes), or for a percentage of users (e.g. 90% of users are able to complete the task in 10 minutes).
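The three ways of setting a criterion for a group of users can give different verdicts on the same data. The following Python sketch checks one set of completion times against all three. The function name `check_criteria`, its parameters and the sample times are hypothetical and chosen only to mirror the examples in the text.

```python
from statistics import mean


def check_criteria(times, limit_minutes=10.0, required_share=0.90):
    """Check task completion times (in minutes) against three criterion styles."""
    if not times:
        raise ValueError("at least one completion time is required")
    within = [t <= limit_minutes for t in times]
    return {
        # Criterion set as an average over the group
        "average": mean(times) <= limit_minutes,
        # Criterion set for individuals: every user must meet the limit
        "all_individuals": all(within),
        # Criterion set for a percentage of users
        "percentage_of_users": sum(within) / len(times) >= required_share,
    }


# Illustrative data: nine users finish within 10 minutes, one takes 14.
times = [6.0, 7.5, 8.0, 9.0, 9.5, 7.0, 8.5, 6.5, 9.0, 14.0]

target = check_criteria(times, limit_minutes=10.0)   # assumed target level
minimum = check_criteria(times, limit_minutes=15.0)  # assumed minimum acceptable level
print(target)
print(minimum)
```

With these sample times the average and percentage criteria are met at the 10 minute level, but the criterion for individuals is not, because one user exceeds the limit. Running the same check with a looser limit shows how a minimum acceptable level and a target level can be evaluated side by side.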
B.4 Types of measures
Measures of usability should be based on data which reflect the results of users interacting with the product or work system. It is possible to gather data by objective means, such as the measurement of output, of speed of working or of the occurrence of particular events. Alternatively data may be gathered from the subjective responses of the users expressing feelings, beliefs, attitudes or preferences. Objective measures provide direct indications of effectiveness and efficiency while subjective measures can be linked directly with satisfaction.
It should be noted that it is possible to obtain data relating to each component of usability from objective or from subjective measures. For example, satisfaction can also be inferred from objective measures of the behaviour of the users, and estimates of effectiveness and efficiency can be derived from subjective opinions which the users express about their work and its outputs.
How well the gathered data predict the level of usability achieved when a product is actually used depends on two things: the extent to which the users, tasks and context of use are representative of the real situation, and the nature of the measures chosen. At one extreme, measurements may be made in the "field", using a real work situation as the basis for evaluating the usability of a product. At the other end of the continuum, a particular aspect of the product may be evaluated in a "laboratory" setting in which the relevant aspects of the context of use are re-created in a representative and controlled way. The advantage of the laboratory-based approach is that it offers greater control over the variables which are expected to have critical effects on the level of usability achieved, so that more precise measurements can be made. The disadvantage is that the artificial nature of a laboratory environment can produce unrealistic results.
Evaluations may be conducted at different points along the continuum between the field and laboratory settings depending upon the issues which need to be investigated and the completeness of the product which is available for test. The choice of test environment and measures will depend upon the goals of the measurement activity and their relationship with the design cycle.
B.5 Measures of effectiveness and efficiency
B.5.1 Measuring effectiveness
Effectiveness is defined as the accuracy and completeness with which users achieve specified goals.
To measure accuracy and completeness it is necessary to produce an operational specification of the criteria for successful goal achievement. This can be expressed in terms of the quality and quantity of output, for example, the specification of a required format for output documents together with the number and length of documents to be processed.
Accuracy can be measured by the extent to which the quality of the output corresponds to the specified criteria, and completeness can be measured as the proportion of the target quantity which has been achieved.
If a single measure of effectiveness is required, it is possible to combine measures of accuracy and completeness. For example, completeness and accuracy may be calculated as percentages and multiplied together to give a percentage value for effectiveness [2,12]. In cases where it is not appropriate to trade accuracy off against completeness, the two measures should be considered independently.
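The combination described above can be sketched in Python as follows. The function name `effectiveness` and the sample values are illustrative assumptions; the calculation is simply the product of the two percentages, rescaled to a percentage.

```python
def effectiveness(accuracy_pct, completeness_pct):
    """Combine accuracy and completeness (each 0 to 100) into one percentage."""
    for name, value in (("accuracy", accuracy_pct),
                        ("completeness", completeness_pct)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    # Multiply the two percentages and rescale so the result is a percentage.
    return accuracy_pct * completeness_pct / 100.0


# Example: 80 % of the target quantity produced, at 90 % accuracy.
combined = effectiveness(accuracy_pct=90.0, completeness_pct=80.0)
print(combined)
```

Because the measures are multiplied, a low value on either one pulls the combined figure down sharply. This is the trade-off that makes a single figure unsuitable where accuracy and completeness have to be judged independently.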
B.5.2 Measuring efficiency
Efficiency is measured by relating the level of effectiveness achieved to the resources used. Temporal efficiency can be defined as the ratio between the measure of effectiveness in achieving a specified goal, and the time it takes to achieve that goal. This provides an absolute measure of temporal efficiency in a particular context. Similar calculations can be made with respect to efficiency in the use of mental or physical energy, materials or financial cost.
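A minimal Python sketch of the temporal efficiency ratio follows. The function names and sample figures are hypothetical. The second function expresses one user's efficiency as a percentage of an expert's; this is an assumed reading of "relative efficiency compared with an expert user", which the text does not define.

```python
def temporal_efficiency(effectiveness_pct, minutes):
    """Effectiveness achieved per unit of time (percentage points per minute)."""
    if minutes <= 0:
        raise ValueError("task time must be positive")
    return effectiveness_pct / minutes


def relative_efficiency(user_efficiency, expert_efficiency):
    """User efficiency as a percentage of expert efficiency (assumed definition)."""
    if expert_efficiency <= 0:
        raise ValueError("expert efficiency must be positive")
    return 100.0 * user_efficiency / expert_efficiency


# Illustrative figures only.
user = temporal_efficiency(effectiveness_pct=72.0, minutes=12.0)
expert = temporal_efficiency(effectiveness_pct=96.0, minutes=8.0)
relative = relative_efficiency(user, expert)
print(user, expert, relative)
```

The same ratio can be formed with other resources in the denominator, for example cost or materials used, provided the unit is stated alongside the result.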
B.6 Measures of satisfaction
Satisfaction (defined as comfort and acceptability of use) is a subjective response of users to interaction with the product. Satisfaction can be assessed by subjective or objective measures. Objective measures may be based on observation of the behaviour of the user (e.g. body posture, body movement, frequency of absences) or can be based on monitoring the physiological responses of the user.
Subjective measures of satisfaction are produced by quantifying the strength of a user's subjectively expressed reactions, attitudes, or opinions. This process of quantification can be done in a number of ways, for example, by asking the user to give a number corresponding to the strength of their feeling at any particular moment, or by asking users to rank products in order of preference, or by using an attitude scale based on a questionnaire.
Attitude scales, when properly developed, have the advantage that they can be quick to use, have known reliabilities, and do not require special skills to apply. Attitude questionnaires which are developed using psychometric techniques will have known and quantifiable estimates of reliability and validity, and can be resistant to factors such as faking, positive or negative response bias, and social desirability. They also enable results to be compared with established norms for responses obtained in the past. QUIS [3,16] and SUMI [10,11] are examples of attitude questionnaires.
B.7 Measures of cognitive workload
Workload entails both physical and mental aspects of tasks. Physical workload is the load resulting from movements, walking, carrying, etc. and also arising from constrained postural positions. The activities that are of a mental nature (perception, information processing, thinking, etc.) contribute to 'cognitive workload'. The design of computer hardware should take into account physical demands caused by high rates of input and sustained periods of activity. Interactive dialogues result in cognitive demands which in some circumstances can have a significant effect on usability of software products.
It is possible to measure the cognitive workload on the user. Cognitive effort is one resource expended in relation to the accuracy and completeness with which users achieve goals and can therefore contribute to the measurement of efficiency. However, cognitive workload has certain special characteristics in that both under- and over-loading may result in lowered efficiency. A task demanding too little mental effort may result in a lowered efficiency because it leads to boredom and lack of vigilance, which directly lowers effectiveness. In such a case overall efficiency would be enhanced by increasing demand. Excessive cognitive workload may also result in lowered effectiveness, if it causes information to be missed and results in errors. This is a particularly important issue in situations where safety is critical, e.g. air traffic control and process control. Measures of cognitive workload can be used to predict these types of problems.
Two examples of questionnaires which have been validated for measuring mental effort and perceived work demands are the Subjective Mental Effort Questionnaire (SMEQ) and the Task Load Index (TLX). The SMEQ is a unidimensional scale, whereas the TLX distinguishes between six aspects of work demands.
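To illustrate how a multidimensional workload questionnaire can be reduced to a single score, the following Python sketch averages six subscale ratings. It makes several assumptions: the subscale names are those of the published NASA Task Load Index, ratings are on a 0 to 100 scale, and the unweighted mean (often called "raw" TLX) is used in place of the weighted scoring procedure. None of this is prescribed by this annex, and the sample ratings are invented.

```python
# Subscale names as used by the NASA Task Load Index (assumed here).
TLX_SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Unweighted mean of the six subscale ratings (each assumed 0 to 100)."""
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    for name in TLX_SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(ratings[name] for name in TLX_SUBSCALES) / len(TLX_SUBSCALES)


# Illustrative ratings from one user after one task.
ratings = {
    "mental_demand": 70,
    "physical_demand": 10,
    "temporal_demand": 55,
    "performance": 30,
    "effort": 60,
    "frustration": 45,
}
score = raw_tlx(ratings)
print(score)
```

Keeping the six ratings alongside the overall score is usually worthwhile, since the profile across subscales indicates which aspect of the work demands is driving the load.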