Methodology for Usability Test of Personal Antiviruses (July 2012)

Introduction

The methodology and tools for this test were developed at the Department of Psychology of Taganrog Technological Institute of the South Federal University (TTI SFU) and at the information analytical center of SFU. They were created by two Candidates of Psychology together with the teachers and MS of the Department who either hold two diplomas (in Technology and Psychology) or major in Engineering Psychology.

Twelve third-year students of the Department of Psychology (6 men and 6 women, average age 20) took part in the testing as users.

Four personal antiviruses of the Internet Security class took part in the testing. They are the products of the companies that held the largest market shares according to the analysis of the Russian antivirus market for 2010-2012. We also decided to add the Internet Security version of Avast, which is one of the most popular antiviruses in Russia.

The versions with a Russian interface were tested so that a “language barrier” would not affect the users who took part in the testing.

Table 1 lists the antivirus software that took part in the testing (the product versions were current as of the start of the test, June 04, 2012).

 

Table 1. Tested antivirus products and their versions

Product                             Version
Avast! Internet Security 7          7.0.1426
Dr.Web Security Space 7             7.0.0.10140
Eset Smart Security 5               5.2.9.12
Kaspersky Internet Security 2012    12.0.0.374
Norton Internet Security 2012       19.07.2015

 

Usability factors

An analysis of the many factors and metrics in use showed that five factors are sufficient for evaluating an antivirus: the users’ operation speed (hereafter, the work speed), the number of errors, the users’ learnability, satisfaction, and the visual attractiveness of the user interface (the technical aesthetics factor). Measuring these five factors gives a comprehensive usability assessment.

 

Diagram 1. Usability factors

 

 

Two methods are used to assess the factors: usability testing and expert testing.

Twelve people took part in the usability testing (according to experts, a group of 5 to 12 people is enough to find about 90% of the erroneous situations that arise when working with an application). Five experts took part in the expert assessment: two Candidates of Psychology and three MS majoring in Engineering Psychology.

Let’s discuss the content of every evaluation parameter and the way it is processed.

1. Users’ operation speed

The speed of a user’s common operations can be assessed by experts. The most widely accepted approach is based on modeling the user’s operations (GOMS, CPM-GOMS, KLM, NGOMSL, CMN-GOMS, etc.): the common operations of the application are selected, an algorithm is built for each operation, and the average execution time of each algorithm is estimated. For this estimation, every algorithm is broken down into simple operations (pressing a button, moving the cursor across the screen, analyzing the next operation, etc.) whose time values were obtained on a large number of users. For personal antiviruses, the time was estimated for the following 10 operations:

  1. Scanning a separate directory for malware samples.
  2. Launching a full scan of all the drives and areas.
  3. Setting up a scheduled scan (12:00 every Thursday).
  4. Launching an update.
  5. Restoring a file from Quarantine.
  6. Searching the help for information about Quarantine.
  7. Viewing the report on the last scan and the threats found.
  8. Configuring the firewall (adding Adobe Reader to the trusted applications).
  9. Adding an application to the scanning exceptions.
  10. Setting the reaction type for a found threat – “Add to the Quarantine” or “Ask the user”.
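
The kind of estimate described above can be illustrated with a minimal Keystroke-Level Model (KLM) sketch. The operator times are the commonly cited textbook values, not the values used by the experts in this test, and the operator sequence for “Update launch” is purely hypothetical; real sequences depend on each product’s interface.

```python
# Keystroke-Level Model (KLM) operator times in seconds.
# Commonly cited textbook values, NOT the values used in this test.
KLM_SECONDS = {
    "K": 0.28,  # press a key (average typist)
    "P": 1.10,  # point at a target with the mouse
    "B": 0.10,  # press or release a mouse button
    "H": 0.40,  # move a hand between keyboard and mouse
    "M": 1.35,  # mental preparation before an action
}


def klm_estimate(operators: str) -> float:
    """Return the estimated execution time (seconds) for a string of KLM operators."""
    return round(sum(KLM_SECONDS[op] for op in operators), 2)


# Hypothetical algorithm for "Update launch": think, point at an update
# tab, click (press + release), think, point at an update button, click.
update_launch = "MPBB" + "MPBB"
print(klm_estimate(update_launch))
```

Summing such estimates over the 10 operations gives a total operation time for one product, which is the quantity the experts compare between products.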

2. Number of users’ errors

The number of errors is determined by usability testing with a group of users. During the test, the users execute the same operations whose time was estimated by the experts. The screen is recorded on video while the users work, and the algorithms they followed are then reconstructed. The expert counts as an error any operation that was not completed and any deviation in the user’s operation algorithm. The result of the testing is the number of errors for all the operations in total and for every operation separately. The errors are then grouped by type (motor errors, misprints, misunderstanding of the application logic) or by consequence (critical, uncritical).

3. Users’ learnability

Learnability depends on the number of high-quality learning aids and on an integral and consistent information model of the application. The evaluation of the learning aids covers the quality and completeness of the software documentation (description of typical operations, settings, possible errors, etc.), the learning aids built into the user interface (search, context help) and additional support resources (specialized forums, teaching materials on the companies’ official websites, etc.).

The information model is the structure of the software that carries information about its state and functioning. When evaluating the information model, we analyze the following factors:

  1. Structuring of the user interface, settings, documents and software reports.
  2. Availability in the user interface, at every moment, of information about the results of the user’s operations, the software’s reaction to them, and the application state.
  3. Symmetry of the application elements that perform common functions (such as decision buttons or buttons for switching between application windows).
  4. No duplication of functions, settings, application windows and controls in different application components.
  5. No non-functioning application windows, dead links or incorrect terminology.

To assess the teaching materials and the information model, we use questionnaires completed by the experts after their work with the application.

4. Satisfaction

Satisfaction is estimated by questioning the users after their work with the application.

The questioning is aimed at finding out how comfortable the users felt while working with the application. The questionnaire included general questions (“Was the application easy to work with?”) and specific questions about the usability of individual application tools, including antivirus components, help, reports, settings, etc.

The users were also asked some open questions (“Provide a free-form description of the problems that occurred while working with the application”, “What could you suggest to improve the application?”).

5. Technical aesthetics

Technical aesthetics is estimated in a combined way – by an expert assessment and users’ questioning after their work with the application.

This factor covers the fonts, colors, animation, sound signals, pictograms, application elements and their grouping in the user interface. The expert assessment makes it possible to find flaws in the visual and sound design of the interface, while the users’ questioning provides information about the attractiveness, intelligibility and usability of the implemented solutions. The criteria for the expert assessment of technical aesthetics rely on the requirements of the usability GOSTs for user interface design and assessment. For example, the expert assessment of font readability relies on GOST R ISO 9355-2-2009 for the recognized screen resolutions. The users’ assessment of technical aesthetics is performed through questions such as “Were all the text boxes in user interface easy to read?”, “Were all the icons applied in the software intelligible?”, etc.

 

Testing conditions and procedure

A group of users and a group of experts were formed before the testing. The testing of each product consisted of 2 stages:

  1. Usability testing;
  2. Expert testing.

To exclude random factors during usability testing and expert testing, the following requirements were observed: the testing was held in the same room, at the same time and on similar PCs. To reduce the social desirability effect during the test, all the instructions were provided in printed form, the testing procedure was shown on a plasma display, and the instructor only handed out the materials before the test began and announced the time for every stage.

Usability testing of every product was held with the group of users and took 100 minutes: the users read the instructions and completed the questionnaires for 10 minutes, then familiarized themselves with the product for 60 minutes and executed operations with the product for 30 minutes. After the testing was finished, the users completed the questionnaires on their satisfaction with the product and on its visual attractiveness.

To keep the grades given to different products from influencing one another, only one product was tested per day.

The same experts provided independent evaluation for every product.

Every expert spent three hours getting familiar with every product and then completed questionnaires to assess the teaching materials, the information model and the technical aesthetics.

Three separate experts simultaneously estimated the users’ operation speed with GOMS methods, analyzed the videos of all the users’ operations and counted their errors.

 

Results processing and analysis

All the received data was normalized and brought to the range of 0 to 100%. Depending on the type of measured parameters, processing has its own peculiarities:

1. Common operations time measuring

The minimum times for every operation are summed and taken as the work time of an “ideal product” (100%). Every increase of 1 in a product’s total operation time over the time of the ideal product reduces its final grade by 0.5%.
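
The rule above can be sketched as follows. Three things are assumptions rather than statements of the methodology: that the minimum for each operation is taken across the tested products, that the “1” is one unit of whatever the times are measured in (seconds in the example), and that the grade is clamped at 0. The function name and the sample times are hypothetical.

```python
def time_scores(times_by_product, penalty_per_unit=0.5):
    """Map each product to its operation speed grade in percent.

    times_by_product: {product name: [time of operation 1, operation 2, ...]}
    """
    # "Ideal product": the sum of the best (minimum) time for every operation.
    per_operation = zip(*times_by_product.values())
    ideal_total = sum(min(op_times) for op_times in per_operation)

    scores = {}
    for product, times in times_by_product.items():
        excess = sum(times) - ideal_total
        # Assumption: the grade cannot fall below 0.
        scores[product] = max(0.0, 100.0 - penalty_per_unit * excess)
    return scores


# Hypothetical times in seconds for three operations and two products.
example = {
    "Product A": [10.0, 20.0, 30.0],
    "Product B": [12.0, 18.0, 36.0],
}
print(time_scores(example))
```

In this example the ideal total is 10 + 18 + 30 = 58, so Product A (60 in total) gets 99.0 and Product B (66 in total) gets 96.0.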

2. Measuring the number of errors

Zero errors are taken as 100%. The number of errors for all the operations in the group of users is then summed. Every critical error reduces the final grade by 1%, and every uncritical error by 0.5%.
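
A minimal sketch of this calculation is given below. The function name and the sample counts are hypothetical, and clamping the grade at 0 is an assumption that the text does not state.

```python
def error_score(critical: int, uncritical: int) -> float:
    """Return the error grade in percent for a group of users."""
    # 100% for zero errors; 1% off per critical error, 0.5% off per uncritical one.
    grade = 100.0 - 1.0 * critical - 0.5 * uncritical
    # Assumption: the grade cannot fall below 0.
    return max(0.0, grade)


# Hypothetical totals over all operations and all users of one product.
print(error_score(critical=7, uncritical=12))
```

With 7 critical and 12 uncritical errors, the grade is 100 - 7 - 6 = 87.0.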

3. Learnability

The results of two questionnaires are summed separately: the teaching materials assessment and the application information model assessment. The resulting grades are then averaged.

The questionnaires are processed as follows: every requirement that is met adds 100/n to the grade, where n is the number of requirements in the questionnaire, and every requirement that is partially met adds 100/(2*n). Thus, the grade for every questionnaire is normalized to the range of 0 to 100.
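
The questionnaire processing can be sketched like this. The answer labels, function names and sample answers are hypothetical; the sketch also assumes that the learnability grade is the plain mean of the two questionnaire grades.

```python
FULL, PARTIAL, NOT_MET = "full", "partial", "not met"


def questionnaire_score(answers):
    """Return a 0..100 grade for one expert questionnaire."""
    n = len(answers)
    if n == 0:
        raise ValueError("the questionnaire has no requirements")
    full_value = 100.0 / n  # 100/n for a requirement that is met
    points = {FULL: full_value, PARTIAL: full_value / 2, NOT_MET: 0.0}
    return sum(points[answer] for answer in answers)


def learnability_score(teaching_answers, model_answers):
    """Assumption: learnability is the mean of the two questionnaire grades."""
    return (questionnaire_score(teaching_answers)
            + questionnaire_score(model_answers)) / 2


# Hypothetical answers of one expert, four requirements per questionnaire.
teaching = [FULL, FULL, PARTIAL, NOT_MET]  # 25 + 25 + 12.5 + 0
model = [FULL, FULL, FULL, NOT_MET]        # 25 + 25 + 25 + 0
print(questionnaire_score(teaching), questionnaire_score(model))
print(learnability_score(teaching, model))
```

Here the two questionnaires are graded 62.5 and 75.0, which gives a learnability grade of 68.75.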

4. Satisfaction

Processing every user’s questionnaire gives a value in the range of 0 to 50 (10 questions with 5 grades of answer in each). The value for every user is normalized, and then the average grade is calculated for the group of users; it is taken as the final value of this factor.
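
A minimal sketch of this step, assuming that normalization means a linear rescaling of the 0 to 50 total to the range of 0 to 100. The function name and the sample totals are hypothetical.

```python
from statistics import mean

MAX_RAW = 50  # upper bound of the questionnaire total stated in the text


def satisfaction_score(raw_totals):
    """Return the group's satisfaction grade in percent.

    raw_totals: one questionnaire total (0..50) per user.
    """
    # Assumption: normalization is a linear rescaling to 0..100.
    normalized = [100.0 * raw / MAX_RAW for raw in raw_totals]
    return mean(normalized)


# Hypothetical questionnaire totals of three users.
print(satisfaction_score([40, 35, 45]))
```

The three totals are rescaled to 80, 70 and 90, so the group grade is 80.0.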

5. Technical aesthetics

The results for two questionnaires are calculated separately: for the users’ and the experts’ assessment of the user interface. The users’ questionnaire is processed by analogy with the satisfaction questionnaire and the experts’ questionnaire is processed by analogy with the learnability questionnaires.

The obtained data of the users’ and experts’ questionnaires are averaged.

After the data for every assessed factor has been processed, we get 5 normalized values. The final value is calculated as follows:

$E = \sum_{i=1}^{5} A_i B_i$, (1)

where $E$ is the usability value;

$A_i$ is the weight ratio of the i-th assessed factor;

$B_i$ is the normalized value of the i-th assessed factor.

By default, all the weight ratios are assumed to be equal to 0.2. However, many experts point out that for any given type of application two criteria are the most important and the rest matter less. A poll among the teaching staff and MS of the Department of Psychology of TTI SFU made it possible to rank the factors by importance for personal antiviruses and to derive differentiated weight ratios:

  • Operation speed — 0.15,
  • Number of errors — 0.15,
  • Learnability — 0.2,
  • Satisfaction — 0.25,
  • Technical aesthetics — 0.25.
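
Formula (1) with these weight ratios can be sketched as follows. The dictionary keys, the function name and the factor values in the example are hypothetical; only the weights come from the list above.

```python
# Weight ratios from the poll (they sum to 1).
WEIGHTS = {
    "operation_speed": 0.15,
    "number_of_errors": 0.15,
    "learnability": 0.20,
    "satisfaction": 0.25,
    "technical_aesthetics": 0.25,
}


def usability_value(factor_values):
    """Return E, the weighted sum of the five normalized factor values (0..100)."""
    return sum(WEIGHTS[name] * factor_values[name] for name in WEIGHTS)


# Hypothetical normalized grades of one product.
example_product = {
    "operation_speed": 90.0,
    "number_of_errors": 80.0,
    "learnability": 70.0,
    "satisfaction": 60.0,
    "technical_aesthetics": 80.0,
}
print(round(usability_value(example_product), 2))
```

For these grades the sum is 13.5 + 12 + 14 + 15 + 20, that is 74.5. Because the weights sum to 1, a product that scores 100 on every factor gets a usability value of 100.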

Bibliography

  1. V.P. Zinchenko, V.M. Munipov. Basics of Ergonomics. – Moscow: Logos, 2001.
  2. V.V. Golovach. User Interface Design. 2001. – 141 p.
  3. GOST R IEC 60073-2000. Basic and safety principles for man-machine interface, marking and identification. Coding principles for indication devices and actuators.
  4. GOST R ISO 9355-1-2009. Ergonomic requirements for the design of displays and control actuators. Part 1. Interactions with human.
  5. GOST R ISO 9355-2-2009. Ergonomic requirements for the design of displays and control actuators. Part 2. Displays.
  6. GOST R ISO 9241-11-2010. Ergonomic requirements for office work with visual display terminals (VDTs). Part 11. Guidance on usability.
  7. GOST R ISO 9241-110-2009. Ergonomics of human-system interaction. Part 110. Dialogue principles.
  8. GOST R ISO 14915-1-2010. Ergonomics of multimedia user interfaces. Part 1. Design principles and framework.
  9. GOST R ISO 10075-2-2009. Ergonomic principles of assuring of adequacy of mental workload. Part 2. Design principles.
  10. D. Raskin. Interface: New Directions for Designing Interactive Systems. Translated from English. – St. Petersburg: Symbol-Plus, 2004. – 272 p.
  11. I.A. Ponomaryov. Methods of User Interface Quality Assessment. http://it-claim.ru/Library/Books/ITS/wwwbook/ist6/ponomarev2/ponomarev2.htm
  12. T. Mandel. Interface Design. – Moscow: DMK-Press, 2005.
  13. D. Kieras. A Guide to GOMS Model Usability Evaluation using GOMSL and GLEAN3. – University of Michigan (ftp.eecs.umich.edu/people/kieras), 2002.