Speech as Interface in Web Applications for Visually Challenged


Doctoral Thesis / Dissertation, 2013

124 Pages


Excerpt


TABLE OF CONTENTS

Certificate

Abstract

Acknowledgements

List of Tables

List of Figures

List of Symbols and Abbreviations

PART ONE (Introduction, Literature Survey, Problem Identification and Issues)

CHAPTER 1. INTRODUCTION

1.1. Major Objectives

1.2. Domain of Research

1.3. Research Scope

1.4. A Survey for assessment of User Requirements

1.5. Research Problem Statement

1.6. Proposed Solution

1.7. Organization of Thesis

CHAPTER 2. BACKGROUND WORK
2.1. Problems and Issues
2.1.1. Accessibility, Usability and Navigability
2.1.2. Keyboard based Accessibility
2.1.2.1. Windows Application Accessibility Vs Web Accessibility
2.1.2.2. Keys used in accessibility by visually impaired
2.1.2.3. Key Shortcuts based accessibility
2.2. Role of Web Developers
2.3. Role of Assistive Tools
2.4. Role of Visually Challenged Users
2.4.1. Various Strategies to Design Web Accessibility Tools
2.4.2. Universal Vs. Local Installation
2.5. W3C Recommendations on Accessibility
2.5.1. W3C and WAI
2.5.2. WAI Specifications
2.5.3. Web Content Accessibility
2.5.4. Authoring Tool Accessibility
2.5.5. User Agent Accessibility
2.5.6. Accessible Rich Internet Contents (ARIA) Suite
2.5.7. Challenges posed by Web 2.0
2.6. Existing Systems and Related Work
2.7. Text to Speech (TTS) on Web

PART TWO (Problem Solutions, Approaches and Methodologies)

CHAPTER 3. WACTA, THE SPEECH BASED WEB BROWSER
3.1. Introduction
3.2. System Design
3.2.1. Conceptual Architecture
3.2.2. Link Navigation Mode
3.2.3. Navigate-All Mode
3.2.4. Newsreader Mode
3.2.5. Page Analytic Mode
3.2.6. Query Mode
3.2.7. Text Glimpses through Mouse Mode
3.2.8. Switch Over between two Modes
3.2.9. Key Shortcuts
3.2.10. Informed Search
3.3. System Development and Implementation
3.3.1. Platform and Language
3.3.2. .NET Framework
3.3.3. The WebBrowser Class
3.3.4. Web Page Analytics
3.3.5. HtmlDocument Class
3.3.6. HtmlElement Class
3.3.7. HtmlElementCollection Class
3.3.8. SpeechSynthesizer Class
3.3.9. Newsreader Style Navigation
3.3.10. Tab Based Navigation
3.3.11. Up/Down Key Based Navigation
3.3.12. Query within Site
3.3.13. Analytical Mode
3.3.14. Text Glimpse Mode
3.3.15. Narrator Voice Management
3.3.16. Speech Feedback For User Input
3.3.17. Commands and Key Shortcuts in WACTA
3.4. User Evaluation of WACTA Web Browser
3.5. Conclusions

CHAPTER 4. DIRECT SPEECH-ENABLING THE PUBLIC UTILITY WEBSITES
4.1. Introduction
4.2. System Design and Architecture
4.2.1. Design Goals and Decisions
4.2.2. The Conceptual Architecture
4.3. Speech enabling public utility websites: Case study of Indian Railways Website
4.3.1. Navigation related issues
4.3.2. Identify the Key functionalities of the website
4.3.3. Speech enabling the Book a Berth functionality
4.4. System Assessment
4.4.1. Results and Discussion
4.4.2. Conclusions

CHAPTER 5. VOICEXML/VOIP BASED CLIENT INTERFACE FOR INTERACTIVE BROWSING
5.1. Introduction
5.2. Existing Systems
5.2.1. VoiceXML
5.2.2. GUIs, WUIs, VUIs
5.2.3. W3C recommended Speech Interface Framework
5.2.4. Other Framework for speech Interface
5.3. Design of SICE Framework
5.3.1. Enhanced Functionality
5.3.2. Platform and Language
5.3.3. System Architecture
5.3.3.1. HTTP Event Handler
5.3.3.2. VoIP Event Handler
5.3.3.3. Keystroke Handler
5.3.3.4. Speech Recognizer
5.3.3.5. Speech Synthesizer
5.3.3.6. Content Summarizer
5.3.3.7. Interface
5.3.3.8. Encrypter-Decrypter
5.4. Speech based web applications development using SICE Framework
5.4.1. As a Result Checker
5.5. Conclusions

CHAPTER 6. PERFORMANCE EVALUATION OF INTERNET ASSISTIVE TOOLS
6.1. Defining the Framework
6.1.1. Identification of Performance Attributes
6.1.2. Identification of Usage Properties
6.1.3. Identification of Usage Metrics
6.1.4. Ground Case Values of Usage Metrics
6.1.5. Mapping Usage Properties to Performance Attributes
6.1.6. Formulation for Performance Attributes
6.1.7. Formulation for Overall Performance Measure of Assistive Tool
6.1.8. Using the Framework in Performance Evaluation
6.2. Conclusions

PART THREE (Results and Findings, Discussion, Conclusions and Directions for Future Research)

CHAPTER 7. RESULTS AND DISCUSSION
7.1. Introduction
7.2. Evaluation Methodology
7.3. Evaluation Results
7.4. Conclusions

CHAPTER 8. CONCLUSIONS AND FUTURE DIRECTIONS
8.1. Introduction and Work Summary
8.2. Contributions of the Research Work
8.3. Future Directions

References

Appendix-I: WACTA Web Browser Source Code

Bibliography

List of Publications (Out of Present Research Work)

Curriculum Vitae

LIST OF TABLES

Table 2.1 Accessibility, Usability and Navigability from the three perspectives

Table 3.1 The WebBrowser Class

Table 3.2 The HtmlDocument Class

Table 3.3 The HtmlElement Class

Table 3.4 The HtmlElementCollection Class

Table 3.5 The SpeechSynthesizer class

Table 3.6 Newsreader style Navigation

Table 3.7 Tab based Navigation

Table 3.8 Up/ Down key based Navigation

Table 3.9 Query mode of WACTA

Table 3.10 Analytic mode of WACTA

Table 3.11 Text Glimpse Mode of WACTA

Table 3.12 Narrator Voice Management in WACTA

Table 3.13 Speech Feedback for User Inputs

Table 3.14 Lists of Keyboard Commands in WACTA

Table 6.1 Performance Attribute Definitions

Table 6.2 Usage Property Definitions

Table 6.3 Usage Metrics Descriptions

Table 6.4 Mapping of Usage Properties to Performance Attributes

Table 6.5 Formulation for the Performance Attributes

Table 6.6 Equation for Overall Performance Measure

Table 7.1 Test cases to evaluate the WACTA Web browser

Table 7.2 User Evaluation Analysis

LIST OF FIGURES

Fig. 2.1 The three aspects of Accessibility, Usability and Navigability

Fig. 3.1 Schematic diagram of the WACTA Web Browser

Fig. 3.2 Use-Case Diagram of WACTA Web Browser

Fig. 3.3 DOM Structure of a Web Page

Fig. 3.4 The WACTA Browser

Fig. 4.1 Interactions during the website creation

Fig. 4.2 User Initiation

Fig. 4.3 Interactions among User, Host Server and Speech Server

Fig. 4.4 Home Page of www.irctc.co.in

Fig. 4.5 Plan my Travel Page of www.irctc.co.in

Fig. 4.6 List of Trains Page of www.irctc.co.in

Fig. 4.7 Berth Availability Page of www.irctc.co.in

Fig. 4.8 Passenger Details Page of www.irctc.co.in

Fig. 4.9 Ticket Details Page of www.irctc.co.in

Fig. 4.10 Make Payment Page of www.irctc.co.in

Fig. 5.1 Schematic Diagram

Fig. 5.2 Structure of SICE Framework

Fig. 5.3 Summarization Algorithm Process

Fig. 5.4 Requested Page

Fig. 5.5 Source Code for Requested Page

Fig. 5.6 VoiceXML File of Requested Page

Fig. 5.7 Final Page after providing the required input

Fig. 6.1 Schematic Hierarchical Structure

CERTIFICATE

Certified that Mr. Prabhat Verma (Enrollment No. 09001001013) has carried out the research work presented in this thesis entitled "Speech as Interface in Web Applications for Visually Challenged" for the award of Doctor of Philosophy from Uttarakhand Technical University, Dehradun under my supervision. The thesis embodies the results of original work and studies carried out by the student himself, and the contents of the thesis do not form the basis for the award of any other degree to the candidate or to anybody else from this or any other University/Institution.

(Dr. Raghuraj Singh)

Professor

Harcourt Butler Technological Institute,

Kanpur (Uttar Pradesh), India.

Date:

Abstract

This research work addresses some of the important issues related to web accessibility in the context of visually challenged users. Accessibility refers to the ability of a user, despite disabilities or impairments, to use a resource. For Internet-based applications, accessibility means that all users can perceive, understand, navigate, interact with, and contribute to the Web. Speech is a convenient medium of interaction for visually challenged users, and Internet accessibility is made possible for them by providing an alternative speech-based interface for human-computer interaction.

Problems associated with speech-based web interfaces are manifold. Most of the web content available today has been designed for the visual interface of graphical browsers. Sighted individuals can quickly locate the information that is most relevant to them, and the visual layout of a webpage helps them browse it efficiently. For people with visual disabilities, however, the same task can be time-consuming and extremely difficult. Speech-based browsers generally process a page sequentially, so a visually challenged user may need to listen to the whole page in order to reach his/her topic of interest. Such users also receive no information about the layout of the web page.

Assistive Technologies (ATs) such as screen readers make use of the underlying DOM (Document Object Model) structure of the web page to narrate its contents to a visually challenged user. To ensure that ATs work correctly on a webpage, web developers must follow the W3C and other accessibility guidelines while creating websites. Unfortunately, due to a lack of awareness among web developers, this requirement is not adequately met, and as a result a large amount of web content remains inaccessible to ATs and visually challenged users. Web 2.0 has reinforced this trend by giving end users web authoring capabilities. The role of ATs is thus to expose such inaccessible webpage content using clever techniques and to present it to the visually challenged user.

Despite their shortcomings, screen readers have been the primary tool with which visually challenged people use the Internet. Unfortunately, most of the popular and workable screen readers are proprietary and carry a heavy price tag. For example, a single-user license for JAWS, a popular screen reader by Freedom Scientific, costs around $900, about 10 times the cost of the Windows 7 operating system. This is evidently more than an average Indian individual with visual disability can afford. The high cost is partly attributable to the small market for assistive tools. Freeware ATs exist, but they are not popular because most of them do not provide adequate functionality.

ATs for web access pose greater problems than ATs for desktop applications. Microsoft Narrator, integrated in the Windows 7 operating system, works only for desktop applications and not for web access. Findings of a study conducted by Enabling Dimensions, New Delhi, in January 2002 reveal that accessing web content was “frustratingly difficult” for visually challenged users, implying the need for more accessible and usable web content as well as better software for using the Web effectively. Thus, the design and development of usable as well as affordable assistive tools for visually challenged users is an important research issue.

In conformity with the “Research by Application Development” philosophy adopted in this research work, an enhanced speech-based web browser named ‘WACTA’ has been designed and developed with the vision of providing an improved yet affordable and easy-to-use web browser for visually challenged users. The system has been implemented in the C# programming language on the .NET Framework 4.0. Microsoft Speech API (SAPI 5.0) has been used for narration of the text and for feedback on user input.

The WACTA web browser has several unique features that distinguish it from other screen readers. First, the speech-based web browser has been implemented entirely in .NET managed code. So far, screen readers have been implemented using unmanaged code (Ihtml interface). Implementation in managed code offers better reliability, automatic garbage collection, run-time bounds checking, improved security etc. Because the functionality available to managed code is limited, coding in C# poses greater challenges in ensuring the requisite behavior of ATs. Second, speech features have been integrated into the WACTA browser itself, which is implemented using the WebBrowser class. Thus, the web browser can be customized for visually challenged users, and new functionality can be added later. It can be used in several modes depending upon the user’s need and level of disability: link navigation mode, navigate-all mode, interactive mode, newsreader mode, query mode, analytical mode and mouse glimpse mode (for partially blind users). A mode can be chosen from the menu or through key shortcuts, and the user can switch from one mode to another at any time, which helps minimize the time required to complete a given task. WACTA complies with the W3C’s User Agent Accessibility Guidelines (UAAG). The prototype has been tested with 25 skilled blind users, who found it user friendly as it fulfills their basic requirements for accessing the Internet effectively for both routine and important tasks.

Some important tasks, such as booking a berth on the Railways Reservation website or paying taxes on the Income Tax website, are complex in nature and span multiple web pages. Such tasks may not be conveniently performed by a visually challenged user with the aid of assistive technologies like screen readers. Besides, the assistive tools may not be installed on a public terminal. This puts visually challenged users at a disadvantage, since web-based access to these public utilities is needed by them even more than by their sighted counterparts. Such important public utilities should be universally accessible without the need to install any assistive tool on the computer used. At present, fetching MP3 audio from a remote web service is the only standard way of converting text to speech on the Web, and the APIs that provide such text-to-speech services are proprietary. In this context, we have proposed two frameworks with which owners of public utility websites may directly speech-enable the more important functionalities of their existing websites at minimum cost and effort. The first framework, Speech-Enabler, makes use of existing technologies like JavaScript and a speech API, and therefore provides a robust and lightweight solution to the accessibility problem. The other, the SICE Framework, is based on VoiceXML and Voice over Internet Protocol (VoIP). It can be used for developing web-based interactive voice applications without the need for telephony. Thus, complex web data can be conveniently handled in a controlled way using a customized two-way, dialogue-based access system.

To the best of our knowledge, no formal framework is at present available for evaluating, in quantitative terms, the performance of the assistive tools that visually challenged users rely on to use the Internet. This research work takes an initiative in this direction by formulating a hierarchical model for the quantitative evaluation of assistive tools for blind users. After identifying various Performance Attributes and Usage Metrics, we have established relationships among them to obtain the Overall Performance Index of an assistive tool.

This research work has strengthened our belief that knowledge and technology should be used in favor of mankind to every possible extent. The Internet is a wonderful tool with the ability to compensate for visual impairment through technology. The design and development of powerful yet affordable speech-based interfaces would certainly help enhance the overall quality of life of visually challenged people. The work will be taken further in the future, e.g. by enhancing the features and capabilities of the WACTA web browser, by designing and developing it for Android-based tablet computers, and by extending it to websites in Hindi script.

ACKNOWLEDGEMENT

I would like to express my deepest gratitude to my Supervisor, Prof. Raghuraj Singh for his guidance and support throughout this research work. His single mindedness, dedication and enthusiasm towards research have constantly inspired me. It has been an honor working with him.

I am thankful to Uttarakhand Technical University, Dehradun for providing me the opportunity and approval to do this research work.

This research is a part of the Major Research Project entitled “Design and Development of Web Browser for Visually Challenged”, funded by the University Grants Commission, New Delhi and run in the Computer Science & Engineering Department of Harcourt Butler Technological Institute, Kanpur during 2009 - 2012. I would like to thank the University Grants Commission, New Delhi for extending all the support needed to realize this work.

I am thankful to Harcourt Butler Technological Institute, Kanpur for providing necessary facilities to complete this research work.

I am thankful to the Adult Training Centre, National Institute for the Visually Handicapped, Dehradun for their help and support in this research work.

I am greatly indebted to Prof. Padam Kumar, Professor and Head, Department of Electronics and Computer Engineering, Indian Institute of Technology, Roorkee for his valuable suggestions regarding the principles of good research work.

I am thankful to Prof. Hema A Murthi, Indian Institute of Technology, Chennai for her valuable suggestions regarding the scope of this research work at the very beginning.

Finally, I am thankful to my family members, friends, and colleagues for supporting me directly or indirectly in this work.

(Prabhat Verma)

Chapter 1 Introduction

One of the original goals of the proponents of the Internet was to provide equal access to information for all, irrespective of disability. The idea was to represent documents on different platforms and through different user interfaces, including text-based and auditory interfaces, in a single computer network. It was then planned to convert each document into Braille [1]. After more than two decades of the Internet's success, this goal is still only partially met. Findings of a study conducted by Enabling Dimensions, New Delhi, in January 2002 reveal that accessing web content was “frustratingly difficult” for visually challenged users, implying the need for more accessible and usable web content as well as better software for using the Web effectively [2]. Internet technology can become a very effective tool for visually challenged people in compensating for their disability by empowering them with knowledge and information about the choices available regarding employment, independent living etc. [3].

The Web has become an indispensable source of information, and we use it for performing routine tasks as well. The primary mode of interaction with the Web is via graphical browsers, which are designed for visual interaction. As we browse the Web, we have to filter out a lot of irrelevant data. Sighted individuals can process visual data very quickly and locate the information that is most relevant to them; the visual layout of the webpage also helps them browse efficiently. For people with visual disabilities, however, this task can be time-consuming and extremely difficult, because they receive no information about the layout of the web page and speech-based browsers generally process a page sequentially. Therefore, clever techniques must be applied to present the items available on a website according to the needs of the user.
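
The contrast between sequential narration and a more selective presentation can be illustrated with a minimal sketch. It is written in Python purely for brevity and is not taken from the WACTA implementation, which uses C#; the class `PageOutline`, the two helper functions and the sample page are hypothetical. The sketch only shows that a listener who must hear every text item in document order sits through more material than one who can ask for the links alone.

```python
from html.parser import HTMLParser


class PageOutline(HTMLParser):
    """Collects text chunks in document order and marks the ones inside links."""

    def __init__(self):
        super().__init__()
        self.items = []        # list of (kind, text) pairs in reading order
        self._in_link = False  # True while the parser is inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace, drop empty runs
        if text:
            self.items.append(("link" if self._in_link else "text", text))


def _outline(html):
    parser = PageOutline()
    parser.feed(html)
    parser.close()
    return parser.items


def sequential_reading(html):
    """Everything a purely sequential narrator would speak, in order."""
    return [text for _, text in _outline(html)]


def links_only(html):
    """What a link-oriented presentation would speak instead."""
    return [text for kind, text in _outline(html) if kind == "link"]


# Hypothetical sample page, used only for illustration.
SAMPLE_PAGE = """
<html><body>
  <h1>Daily News</h1>
  <p>Welcome to our site. Today's weather is sunny.</p>
  <a href="/sports">Sports</a>
  <p>Advertisement: buy one, get one free.</p>
  <a href="/results">Examination Results</a>
</body></html>
"""

if __name__ == "__main__":
    print(len(sequential_reading(SAMPLE_PAGE)), "items when read sequentially")
    print(links_only(SAMPLE_PAGE))
```

On this small page the sequential reading contains five items while the link-only view contains two; on real pages with navigation bars and advertisements the gap would be expected to be much larger.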

Despite their shortcomings, screen readers have been the primary tool with which visually challenged people use the Internet. The cost of the major commercial screen readers is far from trivial at present, as the following data show [4]:

- JAWS (Standard) Single User License: $895.
- Window-Eyes Single User License: $895.
- HAL Single User License: $795.
- System Access Single User License: $399.

This cost is evidently more than an average Indian individual with visual disability can afford; it is about 10 times the cost of the Windows 7 operating system. The high cost is partly attributable to the small market for assistive tools. Thus, there is a need to design and develop usable as well as affordable assistive tools for visually challenged users.

1.1 Major Objectives

This research work aims to bring new insights in the broad area of Human-Computer Interaction from the view point of visually challenged people. Major objectives of this research are as given below:

- To study and analyze the existing systems of Web browsing for visually challenged and to identify enhancement possibilities.
- To devise the most plausible way of text surfing, searching, querying, Information/data extraction, Form–Filling, mailing, blogging etc. in view of the constraints of visually challenged Users.
- To design and develop the Speech based Browsing System for above usages.
- To formulate a framework for the Performance Evaluation of Speech based Browsing Systems.

1.2 Domain of the Research

This research work encompasses the following domain specific tasks or subtasks related to human computer interaction:

i. Design and development of speech based improved Interfaces,
ii. Web Content Analysis and Management,
iii. Speech based Interactions,
iv. Keyboard based Accessibility.

It must be emphasized that this work makes use of existing speech technologies only; speech processing, speech analysis and signal processing are beyond the scope of this research.

1.3 Research Scope

Some of the identified research issues in this domain are as follows:
i. Finding scope for enhancing the accessibility and usability of inaccessible web content using affordable Assistive Technology (AT),
ii. Designing the framework for an enhanced speech-based web browser/screen reader,
iii. Devising better approaches to intra/inter web page link navigation,
iv. Addressing the need for directly speech-enabled public utility websites for visually challenged users,
v. Defining performance evaluation criteria for Assistive Technology.

1.4 A Survey for assessment of User Requirements

A successful design requires an understanding of the target user groups and their goals, requirements, preferences, difficulties with existing systems etc. To assess the status of and exact requirements for web accessibility and usability among visually disabled users, we surveyed a group of 50 visually disabled people at the Adult Training Centre, National Institute for the Visually Handicapped, Dehradun, India. The participants were comfortable using normal keyboards for providing input to the computer. They were using the screen reader JAWS for web browsing and email. They admired JAWS and admitted that they were able to use the Internet only because of this software. However, they also reported that they were sometimes unable to access all the components of a webpage using JAWS. Most often, they could not find where to click on the webpage and, as a result, could not proceed further. Thus, using JAWS, they were able to perform simple tasks, e.g. newspaper reading, general surfing, knowledge gathering and simple queries, but they were not comfortable performing complex tasks involving the filling of multiple forms. Besides, JAWS is not freeware, and its cost is too high for an average Indian individual to afford. Thus, the participants' web usage was limited to the institute laboratory.

Web usage by a visually disabled user may be categorized as simple, intermediate or complex. A usage is simple if the user browses for a news article or an e-book, or collects information on some topic. Screen readers may serve well for all such simple usages. Tasks like sending or receiving e-mails, or performing simple queries such as finding the examination result of a student by entering his/her roll number, may be considered to be of intermediate complexity. Tasks like reserving a travel ticket or shopping online fall into the complex category because they require the filling of multiple forms that may be spread across several web pages in a complex structure. Each of these categories of usage poses different issues and challenges for visually disabled users; e.g. in the simple category, an intelligent text summarization technique may help the user avoid having to listen from the first word to the last.
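
The three categories can be expressed as a small decision rule. The sketch below is only one possible reading of the description above, written in Python for illustration: the `WebTask` record and the rule of counting the forms a task requires (none, exactly one, more than one) are assumptions made for this example and are not a classification scheme defined in the thesis.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"
    INTERMEDIATE = "intermediate"
    COMPLEX = "complex"


@dataclass(frozen=True)
class WebTask:
    """Hypothetical description of a web task; fields are assumptions."""
    name: str
    forms_to_fill: int  # number of distinct forms the user must complete


def categorize(task: WebTask) -> Complexity:
    """Assumed rule: no form is simple, one form is intermediate, more is complex."""
    if task.forms_to_fill < 0:
        raise ValueError("forms_to_fill cannot be negative")
    if task.forms_to_fill == 0:
        return Complexity.SIMPLE
    if task.forms_to_fill == 1:
        return Complexity.INTERMEDIATE
    return Complexity.COMPLEX


if __name__ == "__main__":
    # Form counts below are illustrative guesses, not measured values.
    for task in (
        WebTask("read a news article", 0),
        WebTask("look up an examination result by roll number", 1),
        WebTask("reserve a travel ticket", 4),
    ):
        print(f"{task.name}: {categorize(task).value}")
```

A rule of this kind is deliberately coarse; a real assessment would probably also consider how many pages the forms span and how the pages are structured.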

1.5 Research Problem Statement

“To address the issues and challenges related to Web Accessibility in context of Visually Challenged Internet Users and to devise the most plausible way of using Internet for their important and routine tasks which could empower them to live their life independently with dignity.”

1.6 Proposed Solution

The thesis addresses the core research problem of the design, development, implementation and evaluation of an enhanced speech-based web browser for visually challenged users. The solution makes use of keyboard-based accessibility by narrating the webpage in a controlled way and by providing speech feedback for user input. Speech features have been integrated into a dedicated web browser itself. Various modes of usage, e.g. navigation, interaction, user input, reading and query, have been provided, with seamless switching from one mode to another. This makes it possible to quickly locate an element or perform a desired task on a web page. The system is implemented using C# and .NET managed code, which helps ensure the requisite robust operation. The WebBrowser class provides the skeleton for our web browser. Microsoft Speech API (SAPI 5.1) has been used for speech synthesis. The solution aims to demonstrate good practice in application design, development and software engineering.
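
The idea of switching modes from the keyboard, with every switch confirmed by speech, can be sketched as follows. This is a minimal illustration in Python and not the WACTA code, which is written in C# on top of the WebBrowser and SpeechSynthesizer classes. The mode names follow the modes listed for Chapter 3, but the key bindings in `SHORTCUTS`, the class `SpeechBrowser` and the `speak` callback (which stands in for a real speech synthesizer) are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    LINK_NAVIGATION = "link navigation"
    NAVIGATE_ALL = "navigate all"
    NEWSREADER = "newsreader"
    PAGE_ANALYTIC = "page analytic"
    QUERY = "query"
    TEXT_GLIMPSE = "text glimpse"


# Hypothetical key bindings; the actual WACTA shortcuts are not given here.
SHORTCUTS = {
    "ctrl+l": Mode.LINK_NAVIGATION,
    "ctrl+a": Mode.NAVIGATE_ALL,
    "ctrl+n": Mode.NEWSREADER,
    "ctrl+p": Mode.PAGE_ANALYTIC,
    "ctrl+q": Mode.QUERY,
    "ctrl+g": Mode.TEXT_GLIMPSE,
}


class SpeechBrowser:
    """Keeps track of the current mode and speaks feedback for each key press."""

    def __init__(self, speak=print):
        self.mode = Mode.LINK_NAVIGATION
        self._speak = speak  # stand-in for a text-to-speech call

    def on_key(self, key: str) -> Mode:
        key = key.lower()
        target = SHORTCUTS.get(key)
        if target is None:
            # Unknown keys leave the mode unchanged but are still announced.
            self._speak(f"Key {key} is not assigned")
        elif target is self.mode:
            self._speak(f"Already in {target.value} mode")
        else:
            self.mode = target
            self._speak(f"{target.value} mode")
        return self.mode


if __name__ == "__main__":
    browser = SpeechBrowser()
    browser.on_key("Ctrl+Q")  # switch to query mode
    browser.on_key("Ctrl+N")  # switch to newsreader mode
```

Passing the speech function in as a parameter keeps the mode logic independent of any particular speech engine, which also makes it easy to check the spoken feedback in a test.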

On the periphery, we address more general issues related to web accessibility for visually challenged users. Some important tasks, such as booking a berth on the Railways Reservation website or paying taxes on the Income Tax website, are complex in nature and span multiple web pages. Such tasks may not be conveniently performed by a visually challenged user using assistive technologies like screen readers. Besides, the assistive tools may not be installed on a public terminal. This puts visually challenged users at a disadvantage, since web-based access to these public utilities is needed by them even more than by their sighted counterparts. Such important public utilities should be universally accessible without the need to install any assistive tool on the computer used. In this context, we have provided two frameworks with which owners of public utility websites may directly speech-enable the more important functionalities of their existing websites at minimum cost and effort. The first framework, Speech-Enabler, makes use of existing technologies like JavaScript and a speech API, and therefore provides a robust and lightweight solution to the accessibility problem. The other, the SICE Framework, is based on VoiceXML and Voice over Internet Protocol (VoIP). It can be used for developing web-based interactive voice applications without the need for telephony. Thus, complex web data can be conveniently handled in a controlled way using a customized two-way, dialogue-based access system. Similarly, we have taken an initiative towards the quantitative evaluation of assistive tools for blind users by formulating a hierarchical model. After identifying various Performance Attributes and Usage Metrics, we have established relationships among them to obtain the Overall Performance Index of an assistive tool.

1.7 Organization of Thesis

This thesis addresses some of the important issues related to web accessibility in the context of visually challenged users. The research first aims to design, develop and implement an improved yet affordable speech-based web browser for visually challenged users. The second objective is to study accessibility problems and issues for visually challenged users that are of a more general nature and need global attention. Our emphasis on directly speech-enabling the more important functionalities of public utility sites is one such issue, and it needs the immediate attention of the owners of big public websites.

The thesis is divided into three parts and eight chapters:

PART 1 (Introduction, Literature Survey, Problem Identification and Issues)

Chapter 1: Introduction

This chapter presents the research problem statement, research objectives and research scope, and narrows down the chosen area. The proposed solution is also discussed.

Chapter 2: Background Work

This chapter explores the issues related to accessibility, usability and navigability. Keyboard-based accessibility for visually challenged users is discussed. The applicability of the W3C Recommendations on accessibility and the challenges posed by Web 2.0 are also highlighted. A brief survey is made of existing tools for browsing by visually challenged users, such as Screen Readers, Transcoders, Web based Interactive Systems, IVR based interactive systems and TTS on the web (e.g. WebAnyWhere). The advantages and weaknesses of these systems are given along with a critical evaluation.

PART 2 (Problem Solutions, Approaches and Methodologies)

Chapter 3: WACTA, the Speech based Web Browser

This chapter describes the design, development, implementation and evaluation of an enhanced speech based web browser for visually challenged users. Architecture, working methodology and unique features of WACTA web browser are discussed.

Chapter 4: Direct Speech-enabling the Public Utility Websites

In this chapter, we provide a framework with which owners of public utility websites may directly speech-enable the more important functionalities of their existing websites at minimum cost and effort. The framework makes use of existing technologies like JavaScript and a speech API, and therefore provides a robust and lightweight solution to the accessibility problem.

Chapter 5: VoiceXML /VOIP based Client Interface for interactive browsing

In this chapter, a framework based on VoiceXML and Voice over Internet Protocol (VoIP) is presented, which can be used for developing web-based interactive voice applications without the need for telephony.

Chapter 6: Performance Evaluation of Internet Assistive Tools

This chapter discusses the problems and issues involved in evaluating the performance of assistive tools for visually challenged users and presents a hierarchical model for performance measurement.

PART 3 (Results and Findings, Discussion, Conclusions and Directions for Future Research)

Chapter 7: Results and Discussion

This chapter deals with the important results and findings of the research work along with the performance evaluation of the developed systems. The suitability of the frameworks is also evaluated on the basis of their applicability for different types of tasks.

Chapter 8: Conclusions and Future Directions

This chapter summarizes the whole work and its impact on academia and industry, and indicates possible scope for future work.

Chapter 2 Background Work

Speech is a convenient medium of interaction for visually challenged users. Internet accessibility is made possible for them by providing an alternative speech-based interface for human-computer interaction. Visually challenged users generally have no difficulty in using ordinary keyboards. Thus, command-based user input through speech recognition is generally not required by visually challenged users, and incorporating it may adversely affect the reliability and usability of the browsing system.

2.1 Problems and Issues

2.1.1 Accessibility, Usability and Navigability

Accessibility, usability and navigability are terms that are frequently encountered in the related literature and create a lot of confusion. It is therefore important to define them unambiguously before proceeding further. Each is tripartite [5], as it relates to three aspects: the web page(s), the assistive tool and the blind user. This is depicted in Fig. 2.1. The properties are defined in Table 2.1 from the perspective of each player concerned.

Accessibility refers to the ability of a user, despite disabilities or impairments, to use a resource. For Internet-based applications, accessibility means that all users can perceive, understand, navigate, interact with, and contribute to the Web [6]. Visually challenged users most often use the Internet for sending e-mails, looking for specific information such as reservation or examination result inquiries, web-based learning, accessing news sites or chatting. They may also wish to make transactions on the Internet, such as purchasing an item or an e-ticket. But for various reasons inherent to the web pages as well as to the tools used to access them, they are not able to perform these tasks independently. As a result, they may have to undertake uncomfortable journeys. The concept of e-learning is well received by visually challenged users, since it helps them overcome the basic problem of commuting by bringing the classroom to their home, thereby also circumventing any bias that human instructors or fellow students might have. Audio-only e-learning through the Internet can be very helpful for them, but realizing such a system poses major challenges in describing visual elements like photographs, graphics, diagrams and charts.

The primary responsibility for ensuring accessibility, usability and navigability of web pages lies with web authors and web developers, who are expected to follow the accessibility guidelines during website creation. Assistive tools can, most often, function correctly if the web page complies with the accessibility guidelines. Unfortunately, this is not always the case, since a large percentage of web pages have inaccessible content. The role of assistive tools therefore becomes important in enhancing the accessibility, usability and navigability of web pages so that blind users are able to access and use them. Blind users, in turn, have to become proficient in using the assistive tool and have to find tricks and ways to use the web against all odds. Thus, they too have a role to play in using the assistive tool efficiently.

Fig. 2.1: The three aspects of Accessibility, Usability and Navigability

Table 2.1: Accessibility, Usability and Navigability defined from the three perspectives

2.1.2 Keyboard based Accessibility

2.1.2.1 Windows Application Accessibility Vs Web Page Accessibility

The key difference between developing accessible desktop applications and developing accessible Web applications is that, while the Microsoft Win32 application program interface allows the writing of accessibility members on arbitrary controls, HTML is read-only for roles and events. A browser maps HTML tags to MSAA values, and the developer has no direct access to them.

It is surprising that a complete suite is available in Microsoft technologies for developing accessible desktop applications (Microsoft Active Accessibility/UI Automation), whereas there is no comparable suite for web page accessibility. The Windows 7 operating system provides a Narrator for the accessibility of desktop controls, but it does not work on web pages.

As far as Web accessibility is concerned, a lot of literature is available on how to design or create accessible web content, but there is little literature on the know-how of accessible technologies for the Web.

2.1.2.2 Keys used in accessibility by visually challenged

To browse web pages effectively, visually challenged users rely on navigation keys. The navigation keys are: TAB, SHIFT+TAB, CTRL+TAB, CTRL+SHIFT+TAB, UPARROW, DOWNARROW, LEFTARROW and RIGHTARROW. In addition, the PGUP, PGDOWN, HOME, CONTROL and ALT keys are used to control the narration of the web page.

2.1.2.3 Key Shortcuts based accessibility

Blind users seldom use the mouse, since it requires coordination between the eyes and the position of the hand. It is more convenient for them to use keyboard shortcuts to input commands in a GUI environment. Screen readers therefore provide a set of keyboard shortcuts through which visually challenged users can reach their various functionalities. Users have to learn these shortcut-based commands to use the web effectively. It is therefore desirable that all screen readers follow a standard notation for the keyboard shortcut commands meant for accessibility.

2.1.3 Role of Web Developers

Assistive Technologies (ATs) like screen readers make use of the underlying DOM (Document Object Model) structure to narrate web page elements to a visually challenged user. To ensure that ATs work correctly on a web page, web developers must follow the W3C and other guidelines while creating websites. Unfortunately, owing to a lack of awareness among web developers, this requirement is not adequately met, and as a result a large amount of web content remains inaccessible to ATs and visually challenged users. An important check that a web developer should perform before launching a website is to navigate through the links and form controls on each page using the keyboard only (for example, using the "Tab" key), making sure that all links and form controls can be reached without the mouse and that the links clearly indicate what they lead to. Overall simplicity of the web page is a prerequisite for ensuring its accessibility [7].
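
Part of such a check can be automated. The following is a minimal Python sketch, not taken from this work, that scans a page for two common problems: links that carry no readable text and images that have no alt attribute. The names AccessibilityAudit and audit are illustrative assumptions, and the sketch covers only a small fraction of what the guidelines require; it does not replace a manual keyboard walk-through.

```python
from html.parser import HTMLParser


class AccessibilityAudit(HTMLParser):
    """Collects links with no readable name and images with no alt attribute."""

    def __init__(self):
        super().__init__()
        self.problems = []       # human-readable findings, in document order
        self._link_href = None   # href of the <a> currently open, if any
        self._link_text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._link_href = attrs["href"]
            self._link_text = []
        elif tag == "img":
            alt = attrs.get("alt")
            if alt is None:
                self.problems.append(f"image without alt: {attrs.get('src', '?')}")
            elif self._link_href is not None:
                # The alt text of an image inside a link serves as the link name.
                self._link_text.append(alt)

    def handle_data(self, data):
        if self._link_href is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._link_href is not None:
            if not "".join(self._link_text).strip():
                self.problems.append(f"link without text: {self._link_href}")
            self._link_href = None


def audit(html):
    """Return the list of findings for one HTML document."""
    parser = AccessibilityAudit()
    parser.feed(html)
    parser.close()
    return parser.problems
```

Calling audit on the markup of a page returns a list of findings that a developer can work through before launch; an empty list means only that these two checks passed.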

2.1.4 Role of Assistive Tools

Various assistive tools that enable blind users to use the web have been designed using approaches such as the context-based approach, the semantic approach, the annotation-based approach and text summarization. These tools try to empower the blind user by doing one or more of the following:

(A) Provide a TTS (text-to-speech) service, i.e. speak out the content of the web page and give speech feedback on user input by echoing each character typed (basic service).
(B) Make the search informed by using heuristics, thereby reducing the time taken to find information on the web page.
(C) Provide better control over web page elements by means of shortcut keys.
(D) Take the user to otherwise inaccessible content.
(E) Take the user to an otherwise unreachable link or form element.
(F) Reduce the number of links (operations) that must be traversed to reach an element on the web page.
(G) Simplify the web page in both structure and content.
(H) Provide a better understanding of the web page layout and structure.
(I) Provide a better understanding of images by reading out their ALT text.
(J) Provide a better understanding of visual diagrams by interpreting them.

2.1.5 Role of Visually Challenged Users

Screen readers are sophisticated programs with considerable functionality; an expert user, whether sighted or blind, is needed to use them effectively. Compared to visual user interfaces, which are self-explanatory to sighted users, screen readers rely on numerous keyboard shortcuts for effective web access, and visually challenged users have to learn them. A screen reader may offer several modes of use depending on the need for access, each activated by a keyboard shortcut. Thus, proper training of visually challenged users is also key to the effective use of assistive technology.

2.1.6 Various strategies to design the Web Accessibility tools

Several strategies are used to address the issues related to speech-based web access for blind users. The first strategy employs a client-based assistive tool (e.g. a screen reader) to speak out the web content in some desired order. Generally, such tools must be installed locally on the user's computer. The second strategy makes use of a proxy server or client-based transcoder [8] that renders the web content after converting it to a more accessible form. A third strategy is for the web author to speech-enable the website directly, so that the blind user requires no assistive tool. None of these strategies provides a perfect solution to the problem, and each has its own merits and drawbacks. The usability of screen readers is mainly constrained by the complex structure and poor accessibility of web pages. Transcoder-based access services may not be applicable to secure sites, as these do not permit a third party to access or modify their code. Directly speech-enabling a site may make it difficult to maintain.

2.1.7 Universal Vs. Local Installation

An assistive tool may need to be installed locally on the user's machine, or it may be provided online as a web service. The first approach restricts use of the tool to machines on which it is installed, whereas the second requires no local installation, so the user can access the web from any public terminal.

2.2 W3C Recommendations on Accessibility

The concrete definition of Web accessibility is established by the Web accessibility standards. The most significant of these are developed and published by the same organization that standardizes the pivotal Web technologies in the first place: the World Wide Web Consortium. In addition to its education and outreach value for the community, this ensures that accessibility features are considered in the standardization process of the core Web technologies.

2.2.1 W3C and WAI

The World Wide Web Consortium (W3C) is an international community that develops standards to ensure the long-term growth of the Web. The W3C mission is to lead the Web to its full potential. Currently, the main W3C activities include Web design and applications, Web architecture, Semantic Web, Extensible Markup Language (XML) technologies, Web of services, Web of devices, and browsers and authoring tools. The W3C vision is the One Web; the Web for all and the Web on everything.

Besides technologies, W3C also develops guidelines for their usage. Following Web development and the interests of its membership, the W3C continuously explores new prominent areas for Web standardization, e.g. in the form of joint events and W3C workshops.

In 1997, W3C launched the Web Accessibility Initiative (WAI), which falls into the Web design and applications domain. In brief, WAI works with organizations around the world to develop strategies, guidelines, and resources to help make the Web accessible to people with disabilities. WAI activities follow the W3C Process, a rigorous standardization process explained in detail in the World Wide Web Consortium Process Document [9].

2.2.2 WAI Specifications

To date, the WAI has published standard guidelines and specifications related to the following aspects of Web accessibility:

1. Web Content Accessibility Guidelines (WCAG)
2. Authoring Tool Accessibility Guidelines (ATAG)
3. User Agent Accessibility Guidelines (UAAG)
4. Evaluation and Report Language (EARL)
5. Accessible Rich Internet Applications (WAI-ARIA)

These consider representing Web content (e.g. HTML pages), authoring tools (e.g. functions supporting accessibility evaluation during authoring), the user agent aspect (e.g. ability to use keyboard for input and to pause dynamic content), publishing evaluation reports (e.g. reporting suggestions how to improve the accessibility of a certain application), and dynamic Web content (e.g. scripting).

Besides the standard guidelines, WAI also produces other significant accessibility resources. In particular, abstract guidelines and technique-specific or explanatory considerations are presented in separate specifications, complemented with informative resources.

2.2.3 Web Content Accessibility

Perhaps the most significant Web Accessibility standard is established by the Web Content Accessibility Guidelines (WCAG) [10]. The current stable version, published in 2008, is the WCAG 2.0.

The WCAG 2.0 standard considers the accessibility of Web content. The specification is organized around four abstract principles: accessible Web content is perceivable, operable, understandable, and robust. Each principle is explained with one or more intuitively understandable design goals, asserted as guidelines.

WCAG 2.0 asserts a total of 12 guidelines:

1. Perceivable
   1.1 Provide text alternatives for any non-text content so that it can be changed into other forms people need, such as large print, braille, speech, symbols or simpler language.
   1.2 Provide alternatives for time-based media.
   1.3 Create content that can be presented in different ways (for example simpler layout) without losing information or structure.
   1.4 Make it easier for users to see and hear content including separating foreground from background.

2. Operable
   2.1 Make all functionality available from a keyboard.
   2.2 Provide users enough time to read and use content.
   2.3 Do not design content in a way that is known to cause seizures.
   2.4 Provide ways to help users navigate, find content, and determine where they are.

3. Understandable
   3.1 Make text content readable and understandable.
   3.2 Make Web pages appear and operate in predictable ways.
   3.3 Help users avoid and correct mistakes.

4. Robust
   4.1 Maximize compatibility with current and future user agents, including assistive technologies.

For purposes of conformance evaluation, each guideline is associated with success criteria. Three levels of conformance are defined: A (lowest), AA, and AAA (highest or "best"). These are designed to meet the needs of different accessibility use cases.
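
The cumulative nature of the levels (AA presupposes A, and AAA presupposes AA) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name conformance_level and the shape of the results mapping are hypothetical, the pass/fail values in the example are invented for illustration, and the full WCAG 2.0 conformance requirements contain further conditions that are not modelled here.

```python
LEVELS = ("A", "AA", "AAA")  # ordered from lowest to highest


def conformance_level(results):
    """Return the highest level whose criteria, and all lower ones, passed.

    `results` maps a success-criterion id to a (level, passed) pair.
    A level with no recorded criteria counts as satisfied.
    Returns None if any Level A criterion failed.
    """
    achieved = None
    for level in LEVELS:
        outcomes = [passed for lvl, passed in results.values() if lvl == level]
        if not all(outcomes):
            break                # a failure here also rules out higher levels
        achieved = level
    return achieved


# Hypothetical evaluation results for one page.
example = {
    "1.1.1": ("A", True),     # Non-text Content
    "2.1.1": ("A", True),     # Keyboard
    "1.4.3": ("AA", True),    # Contrast (Minimum)
    "1.4.6": ("AAA", False),  # Contrast (Enhanced)
}
print(conformance_level(example))
```

With the example results above, the page reaches level AA, because the single failing criterion belongs to level AAA.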

The WCAG 2.0 is also associated with a variety of sufficient and advisory techniques. These explain how to meet the success criteria and beyond, using a specific technology, such as hypertext, scripting, or style sheet content.

When compared to the intuitive notion of accessibility, the main limitation of the WCAG 2.0 is the relatively modest support for evaluating cognitive accessibility. Thus, while the guidelines and the success criteria highlight good things such as the importance of intuitive structures, avoiding unusual words, and the clarity of overall presentation, there is still a need for introducing additional, domain-specific understandability criteria. This is because the WCAG 2.0 considers understandability from a global perspective, without making references to a particular application domain, assumed education, or specific user attributes.

In practice, the WCAG standard provides a normative definition and an evaluation system for accessible Web content: a Web application is accessible if it meets the success criteria of the WCAG 2.0 guidelines at least at conformance level A.

2.2.4 Authoring Tool Accessibility

The Authoring Tool Accessibility Guidelines (ATAG) provide guidelines for software and services that people use to produce Web pages and other Web content. The current stable version is ATAG 1.0, published in 2000.

The purpose of ATAG 1.0 is twofold: to assist developers in designing authoring tools that produce accessible Web content and to assist developers in creating an accessible authoring interface. In particular, since many Web applications include authoring interfaces, the scope of ATAG is much wider than simply the commercial off-the-shelf Web authoring tools. For instance, consider a Web application that keeps a record of registered users and provides them an authoring interface for managing their contact information.

ATAG 1.0 introduces seven guidelines, associated with checkpoints of three priorities and three conformance levels: A (lowest), AA, and AAA (highest). In brief, ATAG 1.0 aims at the authoring of Web content that is accessible with respect to the WCAG (1.0) specification. As a consequence, some ATAG checkpoints have multiple priorities, capturing the relationship with the WCAG (1.0) conformance levels.

The current ATAG 1.0 guidelines are as follows [11]:

1. Support accessible authoring practices.
2. Generate standard markup.
3. Support the creation of accessible content.
4. Provide ways of checking and correcting inaccessible content.
5. Integrate accessibility solutions into the overall "look and feel".
6. Promote accessibility in help and documentation.
7. Ensure that the authoring tool is accessible to authors with disabilities.

It is worth noting that ATAG requires the authoring tool and its documentation to be accessible themselves. This is significant since people who experience accessibility problems are more likely to be interested in authoring accessible content.

The forthcoming version, ATAG 2.0 will probably change the wording and the organization of the guidelines somewhat. In particular, ATAG 2.0 is expected to reflect the abstract principles and the structure of WCAG 2.0.

2.2.5 User Agent Accessibility

The User Agent Accessibility Guidelines (UAAG) consider the accessibility of the Web user agent, in particular with respect to Web content accessibility. In this context, user agents include Web browsers, media players, and assistive technologies.

The purpose of UAAG 1.0 is to provide guidelines for designing accessible Web user agents. As a consequence, the audience of UAAG is much smaller than that of WCAG or ATAG. In brief, the UAAG points out the user agent implementation principles for interacting with accessible content, with a special requirement of being able to communicate with other software, especially assistive technologies.

UAAG 1.0 introduces 12 guidelines, associated with checkpoints of three priorities and three conformance levels: A (lowest), AA, and AAA (highest). Informative resources about different techniques are also available. Unlike the other guidelines, UAAG also briefly considers challenges such as accessible installation and user control over their environment when accessing the Web. In addition, the UAAG 1.0 defines a system called conformance profile labels. This supports developing and documenting (specialized) user agents that conform only to a subset of all conceivable accessibility features.

The UAAG 1.0 guidelines are as follows [12]:

1. Support input and output device-independence.
2. Ensure user access to all content.
3. Allow configuration not to render some content that may reduce accessibility.
4. Ensure user control of rendering.
5. Ensure user control of user interface behavior.
6. Implement interoperable application programming interfaces.
7. Observe operating environment conventions.
8. Implement specifications that benefit accessibility.
9. Provide navigation mechanisms.
10. Orient the user.
11. Allow configuration and customization.
12. Provide accessible user agent documentation and help.

The forthcoming UAAG 2.0 will probably change the wording and the organization of the guidelines a bit. Again, UAAG 2.0 is expected to reflect the abstract principles and structure of WCAG 2.0 and the related documents.

2.2.6 Accessible Rich Internet Applications Suite

The Accessible Rich Internet Applications Suite (WAI-ARIA) defines a way to make dynamic content and advanced user interface controls accessible to people, regardless of disability. This includes content developed with Ajax, HTML, JavaScript, and related technologies [13].

The basic idea of WAI-ARIA is that complex web applications become inaccessible when assistive technologies cannot determine the semantics behind portions of a document. Accessibility problems may also arise when the user is unable to effectively navigate to all parts of documents in a usable way.

2.2.7 Challenges posed by Web 2.0

Web 2.0 is characterized by rich visual content and is user-centric in both form and content. The user, who was earlier at the receiving end, has become the content provider and web author, while the role of the site owner has been reduced to that of manager and provider of business logic. This development poses many challenges to assistive technology. First, end users authoring web content may not follow the accessibility guidelines because they are unaware of them, so some of the content they provide may not be accessible to assistive technology. Second, the dynamic features of Web 2.0 prevent screen readers from accessing web content correctly: the content being narrated may change completely during narration, so that the screen reader cannot render it correctly to the user. Rich content such as embedded image or button links and anchors remains inaccessible to screen readers.

2.3 Existing Systems

Various systems have been developed using approaches like content analysis, document reading rules, context summary, summary/gist based, semantic analysis, sequential information flow in web pages etc. But these systems have a number of issues which make them less usable. First, they are essentially screen readers or their extension. Second, they provide only browsing and do not support other applications like mail, form-filling, transaction, chat etc. A brief survey of some important existing browsing systems has been made in this section.

Some of the most popular screen readers are JAWS [14] and IBM's Home Page Reader [15]. JAWS is a popular, state-of-the-art screen reader developed by Freedom Scientific. JAWS 13.7 is the current stable version, which supports the Windows 7 operating system. Besides sequential access to web content, it has a rich set of key shortcuts that visually impaired users can use to access the web.

“Emacspeak” [16] is a free screen reader for Emacs developed by T. V. Raman and first released in May 1995; it is tightly integrated with Emacs, allowing it to render intelligible and useful content rather than parsing the graphics.

Brookes Talk [17] is a web browser developed at Oxford Brookes University in the 1990s. It provides function keys for accessing the web page and reads out the page using speech synthesis in word, sentence and paragraph modes by parsing the page content. It also provides a mechanism for finding suitable results through search engines and supports a conceptual model of the website. In addition, it supports modeling of the information on a web page and summarizes the page content.

Csurf [18], developed at Stony Brook University, is a context-based browsing system. It brings together content analysis, natural language processing and machine learning algorithms to help visually disabled users quickly identify relevant information. Csurf is composed of an interface manager, a context analyzer, a browser object, a frame tree processor and a dialog generator. The Csurf web browser uses the functionality of VoiceXML, JSAPI, FreeTTS, Sphinx, the JREX API, etc.

Aster (Audio system for technical reading) [19], developed by T. V. Raman, permits visually disabled individuals to manually define their own document reading rules. Aster is implemented using Emacs as the main component for reading. Internally, it recognizes the markup language as the logical structure of the web page. The user can then listen either to the entire document or to any part of it.

Some researchers have also proposed to extract the web content using semantics [20].

Hearsay [21] was developed at Stony Brook University. It is a multimodal dialog system in which the browser reads the web page under the control of the user. It analyzes the web page content (the HTML and the DOM tree), segments the page, and on this basis generates VoiceXML dialogues.

A Vernacular Speech Interface for People with visual Impairment named “Shruti” has been developed at Media Lab Asia research hub at IIT Kharagpur, India. It is an embedded Indian language Text-to-Speech system that accepts text inputs in two Indian languages - Hindi and Bengali, and produces near natural speech output.

Shruti-Drishti [22] is a Computer Aided Text-to-Speech and Text-to-Braille System developed in collaboration with CDAC Pune and Webel Mediatronics Ltd (WML), Kolkata. It is an integrated Text-to-Speech and Text-to-Braille system which enables persons with visual impairment to access the electronic documents from the conference websites in speech and Braille form.

The screen reading software SAFA (Screen Access For All) [23] has been developed in vernacular languages by the Media Lab Asia research hub at IIT Kharagpur in collaboration with the National Association for the Blind, New Delhi, to enable visually disabled persons to use a PC. It enables a person with visual impairment to operate a PC using speech output, and it supports speech output for the Windows environment in both English and Hindi scripts.

As far as general surfing is concerned, the above-mentioned screen readers are important and useful tools for the visually disabled. However, for complex tasks such as information queries, complex navigation, form filling or transactions, they do not work to a satisfactory level. Screen readers provide accessibility through abundant use of shortcut keys, for which visually disabled users have to be trained. Moreover, screen readers need to be purchased and installed on the local machine, which prevents their users from using the Internet on any public terminal.

Despite their shortcomings, screen readers have been the primary tool with which the visually challenged use the Internet. Unfortunately, most of the popular and workable screen readers are proprietary and carry a heavy price tag. For example, a single-user license for JAWS, the popular screen reader by Freedom Scientific, costs around $900. This is 10 times the cost of the Windows 7 operating system. The cost is evidently too high to be afforded by an average Indian individual with visual disability. The high cost is also attributable to the small market for assistive tools. There are freeware ATs, but they are not popular since most of them do not provide adequate functionality.

2.4 Text to Speech (TTS) on Web

The prospects of TTS on the web are gradually gaining momentum. At present, fetching an MP3 file from a remote web service is the only standard way of converting text to speech. The APIs used for this purpose are proprietary; for example, BrowseAloud [24] is a TTS service with which a website can be speech enabled. The Google Translate service also has a TTS feature. Although many websites have a provision for reading out their content, it is limited to playing the content as a single MP3 file, and most of them offer no provision for interactive navigation and form filling. Implementing TTS as a browser extension would go a long way towards simplifying text-to-speech related issues in the future.

WebAnywhere [25] is an open source online TTS tool for surfing the web, developed at the University of Washington. It requires no special software to be installed on the client machine and therefore enables visually disabled people to access the web from any computer. It can also be used as a tool to test the accessibility of a website under construction. WebAnywhere generates speech remotely and uses pre-fetching strategies designed to reduce perceived latency. It also uses a server-side transformational proxy that makes web pages appear to come from the local server in order to overcome cross-site scripting restrictions. On the client side, JavaScript is used to support user interaction by deciding which sound is to be played by the sound player.

Like screen readers, WebAnywhere reads out the elements in sequential order by default. Although a few shortcut keys are assigned to control the page elements, the user has to assess the whole page in order to proceed further. On websites with poor accessibility design, the user may become trapped during a complex navigation.

Although WebAnywhere is a step forward in the direction of online, installation-free accessibility, it has certain limitations. As the content in WebAnywhere is received through a third party, it may not be regarded as reliable, and fear of malware attacks, phishing etc. is associated with such access. Secure sites cannot be accessed using this approach, as they do not allow their content to be manipulated. This is a major drawback, since most important tasks such as bank transactions, filling in examination forms and using e-mail services are performed over secure sites. These drawbacks compromise the usability of WebAnywhere and limit it to an information access tool only.

Chapter 3 WACTA, the Speech based Web Browser

3.1 Introduction

In conformance with the "Research by Application Development" philosophy adopted, and as a part of this research work, an enhanced speech based web browser named 'WACTA' has been designed and developed with the vision of providing an improved yet affordable and easy to use web browser for visually challenged users. The system has been implemented for the Microsoft Windows 7 operating system in the C# programming language on .NET Framework 4.0. The Microsoft Speech API (SAPI 5.0) has been used for narration of the text and for input feedback.

The WACTA web browser has several unique features that distinguish it from other screen readers. First, the speech based web browser has been implemented completely in managed code, whereas web browsers so far have been implemented in unmanaged code. Implementation in managed code offers better reliability, since managed code provides automatic garbage collection. Second, the speech features have been integrated into the WACTA browser itself. It can be used in several modes depending upon the user's need and level of disability: auto sequential, manual sequential, newsreader, interactive, analytical and mouse glimpse (for partially blind users). A mode can be chosen from the menu or by using key shortcuts, and the user can switch from one mode to another at any time. This helps minimize the time required to complete a given task.

3.2 System Design

This section describes the design issues, conceptual architecture, and interactions among various sub components of the proposed system.

3.2.1 Conceptual Architecture

Fig. 3.1 describes the conceptual architecture of the WACTA web browser. The system is based on narration of the web page in a controlled way as well as speech feedback for user input. The requested web page is loaded in the browser and parsed to obtain its constituent elements. The text of these elements is then sent to the Text to Speech (TTS) engine to generate the speech equivalent of the desired text. User input is entered from the keyboard in the web browser address bar. Speech feedback is generated for each key press, thereby assuring the user that the correct key was pressed. Various modes of access have been provided to control the web content in a desired way.
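
The flow from page to speech can be sketched in a few lines. The sketch below is in Python for brevity and is not the WACTA implementation, which is written in C# against the .NET WebBrowser class and SAPI; the names ElementExtractor, narrate and echo_key are assumptions introduced for illustration, and the speak argument stands in for whatever TTS engine is used.

```python
from html.parser import HTMLParser


class ElementExtractor(HTMLParser):
    """Flattens a page into (kind, text) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._in_link = False
        self._skip_depth = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())   # collapse whitespace
        if text and not self._skip_depth:
            self.elements.append(("link" if self._in_link else "text", text))


def narrate(html, speak):
    """Parse the page and hand each element to the TTS callback in order."""
    parser = ElementExtractor()
    parser.feed(html)
    parser.close()
    for kind, text in parser.elements:
        speak(f"Link: {text}" if kind == "link" else text)


def echo_key(key, speak):
    """Speech feedback for a single key press."""
    speak(key)


# A list stands in for the speech engine so that the output can be inspected.
spoken = []
narrate("<h1>News</h1><script>var x = 1;</script>"
        "<p>Read <a href='/a'>more</a></p>", spoken.append)
echo_key("w", spoken.append)
```

Passing the TTS engine in as a callback keeps the parsing step independent of any particular speech library, which also makes the narration order easy to test.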

Fig 3.1: Schematic diagram of the WACTA Web Browser

The proposed System has the following modes of working:

3.2.2 Link Navigation Mode

In this mode, the user can navigate among links to quickly reach the web page that holds the desired information. Link navigation mode can also help the user understand the map of the website and thus grasp its basic structure.
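
The idea of stepping through links only can be illustrated as follows. This is a minimal Python sketch under stated assumptions, not the WACTA code: the classes LinkCollector and LinkNavigator are hypothetical, and a real browser would obtain the links from its own document model rather than by re-parsing the markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (label, href) for every anchor that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = " ".join("".join(self._text).split())
            # Fall back to the address when the link has no text.
            self.links.append((label or self._href, self._href))
            self._href = None


class LinkNavigator:
    """Steps forward and backward through the links of one page."""

    def __init__(self, html):
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        self.links = collector.links
        self.position = -1       # nothing announced yet

    def current(self):
        if not self.links or self.position < 0:
            return None
        return self.links[self.position]

    def next(self):
        if self.position < len(self.links) - 1:
            self.position += 1
        return self.current()

    def previous(self):
        if self.position > 0:
            self.position -= 1
        return self.current()
```

Each call to next or previous returns the label and address of the link to be announced; the navigator stops at the first and last link rather than wrapping around, which is one possible design choice among several.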

3.2.3 Navigate All Mode

In the second mode, each element of the web page is traversed by the user in a controlled manner. This mode provides better control for moving around the web page and is helpful in form filling, making queries etc.

3.2.4 Newsreader Mode

This mode may be useful for news reading, gathering knowledge on some topic, e-learning etc. In newsreader mode, the contents of the webpage are narrated automatically by the browser.

3.2.5 Page Analytics Mode

In this mode, web page details, e.g. the website domain name and details of links, images, forms etc., are narrated in the desired way.
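
A summary of this kind can be derived from the page markup and its address. The following Python sketch is illustrative only; the names PageStats and page_summary and the wording of the summary are assumptions, not the output format of WACTA.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageStats(HTMLParser):
    """Counts the links, images and forms on a page."""

    COUNTED = {"a": "links", "img": "images", "form": "forms"}

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in self.COUNTED:
            self.counts[self.COUNTED[tag]] += 1


def page_summary(url, html):
    """Build a sentence-style summary suitable for narration."""
    stats = PageStats()
    stats.feed(html)
    stats.close()
    return (f"Domain {urlparse(url).netloc}. "
            f"Links: {stats.counts['links']}. "
            f"Images: {stats.counts['images']}. "
            f"Forms: {stats.counts['forms']}.")
```

Hearing such a summary first lets the user judge the size and nature of a page before deciding which mode to use on it.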

3.2.6 Query Mode

In this mode, the user can locate the desired information within the website by entering a keyword in the textbox.
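
A keyword query over the visible text of a page might be sketched as below. The Python code is a simplified illustration with hypothetical names (TextBlocks, find_keyword); it searches a single page with a case-insensitive substring match, which is an assumption, since the matching rule used by WACTA is not described here.

```python
from html.parser import HTMLParser


class TextBlocks(HTMLParser):
    """Collects the visible text runs of a page in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self._skip_depth:
            self.blocks.append(text)


def find_keyword(html, keyword):
    """Return (index, text) for every text block containing the keyword."""
    parser = TextBlocks()
    parser.feed(html)
    parser.close()
    needle = keyword.casefold()
    return [(i, block) for i, block in enumerate(parser.blocks)
            if needle in block.casefold()]
```

The returned index tells the browser where in the narration order each match lies, so that reading can resume from the matching block.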

3.2.7 Text Glimpse through mouse Mode

In this mode, the browser speaks out a glimpse of the portion of the page on which the user right-clicks with the mouse. Visually challenged users rely on the keyboard for input, so the mouse is not normally useful to them. This mode is therefore primarily suitable for users with low vision who may not be able to read the text visible to them. After listening to the underlying text, such a user can go directly to the portion of interest on the web page with a mouse click.

3.2.8 Switch over among modes

A switch to any of the available modes can be made at any time by using the designated key shortcut. Thus, a visually challenged user may combine the modes to navigate around the web page in a more controlled way.
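
Mode switching by shortcut amounts to a lookup from key to mode. The Python sketch below illustrates this; the mode names follow the headings above, but the key bindings, the class ModeSwitcher and the spoken confirmation are hypothetical, since the actual shortcuts are not listed in this section.

```python
from enum import Enum


class Mode(Enum):
    LINK_NAVIGATION = "link navigation"
    NAVIGATE_ALL = "navigate all"
    NEWSREADER = "newsreader"
    PAGE_ANALYTICS = "page analytics"
    QUERY = "query"
    TEXT_GLIMPSE = "text glimpse"


# Hypothetical bindings, chosen only for this illustration.
SHORTCUTS = {
    "ctrl+1": Mode.LINK_NAVIGATION,
    "ctrl+2": Mode.NAVIGATE_ALL,
    "ctrl+3": Mode.NEWSREADER,
    "ctrl+4": Mode.PAGE_ANALYTICS,
    "ctrl+5": Mode.QUERY,
    "ctrl+6": Mode.TEXT_GLIMPSE,
}


class ModeSwitcher:
    """Tracks the active mode and announces every change."""

    def __init__(self, speak, initial=Mode.NAVIGATE_ALL):
        self.speak = speak       # TTS callback
        self.mode = initial

    def on_key(self, key):
        """Switch mode if `key` is a mode shortcut; return True if handled."""
        target = SHORTCUTS.get(key.lower())
        if target is None:
            return False         # not a mode shortcut; let others handle it
        if target is not self.mode:
            self.mode = target
            self.speak(f"{target.value} mode")
        return True
```

Announcing the new mode through the same speech channel confirms the switch to the user, in the same spirit as the key echo described for the conceptual architecture.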

3.2.9 Key Shortcuts

Besides the modes described above, several key shortcuts identical to those available in the screen reader JAWS are also provided. Key shortcuts are a convenient and preferred way of working for visually challenged users. The only problem is that they have to be learnt and remembered; proficiency increases with frequent use. Some of the important functions for which key shortcuts have been provided are: go to the next form element, go to the next heading, go to the top of the next column, go to the beginning of the page, go to the address bar, etc.

3.2.10 Informed Search

By using the various available modes and features intuitively, a visually challenged user may reach the text of interest very quickly. Thus, rather than performing a blind, sequential, depth-first search, the WACTA user makes use of informed search to locate information on the web page.

Fig. 3.2 depicts the use case diagram for the WACTA web browser. The six modes of usage together make up the complete system, with which a visually challenged user can perform an informed search to reach the content of interest. At any time, the user can switch to another mode using the key shortcut assigned to that mode. Thus, a visually challenged user can make intelligent guesses to reach the relevant content with the help of the available modes. Mouse glimpse mode may be used by partially sighted users to listen to a portion of the web page that they find difficult to read normally.

illustration not visible in this excerpt

Fig. 3.2: Use-Case Diagram of WACTA Web Browser

3.3 System Development and Implementation

The following subsections describe the implementation details of the speech based web browser WACTA for visually challenged users.

3.3.1 Platform and Language

Microsoft Windows 7 was chosen as the operating system for the implementation due to its popularity and wide acceptance. Further, because of the flexibility it provides in application development, the .NET platform with the C# language has been used to develop the prototype. Microsoft Language Interface provides a better interface in terms of functionality and processing power.

3.3.2 .NET Framework

Microsoft .NET is software that connects information, people, systems and devices. It spans clients, servers and developer tools. It consists of all kinds of software, including web based applications, smart client applications and XML web services. It also contains components that facilitate the integration and sharing of data and functionality over a network through standard, platform independent protocols such as XML, SOAP and HTTP. Developer tools such as Microsoft Visual Studio 2008 provide an integrated development environment for maximizing developer productivity with the .NET Framework.

3.3.3 The WebBrowser Class

To create the skeleton of the speech based web browser WACTA, the WebBrowser class (Namespace: System.Windows.Forms) has been used. The WebBrowser class offers rich features and functionalities that can be used to control the output of various web page elements in the desired order.

Table 3.1 summarizes the attributes, methods, and events of WebBrowser Class used in the WACTA Web Browsing System.

illustration not visible in this excerpt

Table 3.1(ii): Methods of WebBrowser Class

illustration not visible in this excerpt

Table 3.1(iii): Events of WebBrowser class used

3.3.4 Web Page Analytics

To access the various elements of a web page, its DOM structure is explored. The Document Object Model (DOM) is an Application Programming Interface (API) for valid HTML and well-formed XML documents. It is based on an object structure that closely resembles the structure of the documents it models. It allows applications to dynamically access the content, structure and style of the documents. The DOM is not restricted to a specific platform or programming language. Figure 3.3 represents the DOM structure for a web page.

Fig. 3.3: DOM Structure of a Web Page

The following classes have been used to extract the HTML elements from web pages:

3.3.5 HtmlDocument class

This class provides top level programmatic access to an HTML document hosted by the WebBrowser control. It belongs to the System.Windows.Forms namespace. The properties, methods and events of this class used in the implementation of WACTA are described in Table 3.2(i), (ii) and (iii) respectively. When the document has focus, but no element of the document has been given focus, ActiveElement returns the element corresponding to the <BODY> tag. If the document does not have focus, ActiveElement returns null.

illustration not visible in this excerpt

Table 3.2(iii): Events of HtmlDocument class used

3.3.6 HtmlElement Class

The HtmlElement class represents an HTML element inside a web page. It belongs to the System.Windows.Forms namespace. The properties, methods and events of this class used in the implementation of WACTA are described in Table 3.3(i), (ii) and (iii) respectively.

illustration not visible in this excerpt

Table 3.3(iii): Events of HtmlElement class used

3.3.7 HtmlElementCollection class

The HtmlElementCollection class defines a collection of HtmlElement objects. The properties of this class used in the implementation of WACTA are described in Table 3.4.

illustration not visible in this excerpt

Table 3.4: Properties of HtmlElementCollection class used

3.3.8 SpeechSynthesizer Class

The SpeechSynthesizer class of the .NET Framework has been used to speech-enable web pages in the WACTA web browser. The class belongs to the System.Speech.Synthesis namespace and provides access to the functionality of an installed speech synthesis engine, i.e. Microsoft Speech API 5.0 (SAPI).

To ensure consistent speech output/feedback across the various modes of working, only a single instance of the SpeechSynthesizer class is created and used throughout the application. This approach prevents multiple narrator instances from running simultaneously. The properties, methods and events of this class used in the implementation of WACTA are described in Table 3.5 (i), (ii) and (iii) respectively.

illustration not visible in this excerpt

Table 3.5 (ii): Events of SpeechSynthesizer class used

illustration not visible in this excerpt

Table 3.5(iii): Methods of SpeechSynthesizer class used
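
The single-instance idea described above can be sketched independently of System.Speech. The following Python sketch is illustrative only and is not the thesis code: Narrator, get_narrator and the spoken log are hypothetical names, the class merely records text where the real browser would call the speech engine, and cancelling the previous utterance on every new request is an assumption rather than documented WACTA behaviour.

```python
class Narrator:
    """Hypothetical stand-in for a wrapper around the speech engine."""

    def __init__(self):
        self.spoken = []     # every utterance handed to the stand-in engine
        self.current = None  # utterance being narrated right now, if any

    def speak(self, text):
        # Drop whatever is being narrated so two outputs never overlap.
        self.cancel()
        self.current = text
        self.spoken.append(text)

    def cancel(self):
        self.current = None


_narrator = None  # the one instance shared by all modes


def get_narrator():
    """Return the shared Narrator, creating it on first use."""
    global _narrator
    if _narrator is None:
        _narrator = Narrator()
    return _narrator
```

Every mode would obtain its narrator through get_narrator(), so a request issued from one mode replaces the narration started by another instead of running alongside it.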

3.3.9 Newsreader style Navigation

This module reads out the text displayed on the web page in Newsreader style. The mode neither announces the UI elements on the webpage nor permits interaction with them. At any time, the narration can be paused by pressing the CONTROL key, and it can be abandoned altogether by pressing the ESC key.

The module is implemented using the InnerText property of the HtmlElement class. The InnerText property of the HTML element for the BODY tag returns the complete text content of the web page in sequential order. Table 3.6 shows the code snippet for it.

illustration not visible in this excerpt

Table 3.6: Newsreader Style Navigation
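
Since the code of Table 3.6 is not visible in this excerpt, the idea of reading the body text in sequential order can be approximated as follows. This is a Python sketch built on the standard html.parser module, not the C# InnerText call used in WACTA; BodyTextCollector and newsreader_text are hypothetical names, and skipping script and style content is an assumption about what should not be narrated.

```python
from html.parser import HTMLParser


class BodyTextCollector(HTMLParser):
    """Collect the text found inside <body> in document order."""

    SKIPPED = {"script", "style"}  # content that is never shown as page text

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if self.in_body and self.skip_depth == 0 and text:
            self.chunks.append(text)


def newsreader_text(html):
    """Return the page text as one string, ready to hand to a narrator."""
    collector = BodyTextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)
```

The returned string corresponds to what the newsreader mode would pass to the speech synthesizer in a single narration request.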

3.3.10 Tab based Navigation

The Tab key is the primary mechanism by which a visually challenged user navigates a web page. The Tab key visits only those controls that have a tab stop, so Tab key based navigation traverses all the focusable User Interface (UI) elements. Elements whose tabIndex property is set to a positive integer value are visited in increasing order of that value. Normally, HTML links, anchors and form elements are able to get focus; however, the web designer can enable any HTML element to get focus by assigning it a tabIndex. Tab based navigation is useful for tasks like form filling, choosing a website from search results, etc.

The code for Tab key based navigation has been implemented in the handler for the ‘Focusing’ event of the HtmlElement that currently has the user input focus, i.e. ActiveElement. The handler for the Focusing event is registered in the DocumentCompleted event handler of the WebBrowser class. This can be seen in the code snippet in Table 3.7.

illustration not visible in this excerpt

Table 3.7: Tab based Navigation
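
The order in which the Tab key visits elements can be illustrated with a small, self-contained sketch. It is written in Python with the standard html.parser module and is not the thesis code of Table 3.7; tab_order and FocusableCollector are hypothetical names, the list of elements that are focusable by default is deliberately simplified, and the rule applied (positive tabindex values first in ascending order, then elements with tabindex 0 or none in document order, negative values skipped) follows standard HTML behaviour.

```python
from html.parser import HTMLParser

# Elements that can take focus without an explicit tabindex (simplified list).
DEFAULT_FOCUSABLE = {"a", "input", "select", "textarea", "button"}


def _parse_tabindex(raw):
    """Return the tabindex as an int, or None if it is missing or malformed."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


class FocusableCollector(HTMLParser):
    """Record (tabindex, position, label) for every element with a tab stop."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        index = _parse_tabindex(attrs.get("tabindex"))
        if index is None:
            if tag not in DEFAULT_FOCUSABLE:
                return
            if tag == "a" and "href" not in attrs:
                return  # an anchor without href is not a tab stop
            index = 0
        if index < 0:
            return  # a negative tabindex removes the element from the Tab order
        label = attrs.get("id") or tag
        self.items.append((index, len(self.items), label))


def tab_order(html):
    """Return element labels in the order the Tab key would visit them."""
    collector = FocusableCollector()
    collector.feed(html)
    collector.close()
    positive = sorted(item for item in collector.items if item[0] > 0)
    in_document_order = [item for item in collector.items if item[0] == 0]
    return [label for _, _, label in positive + in_document_order]
```

A browser such as WACTA would announce each element as it receives focus; the sketch only computes the visiting order.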

3.3.11 Up/Down key based Navigation

Up/Down key based traversal allows the user to visit each HTML element of the webpage in forward/backward order of creation. This mode has been implemented in the handler of the PreviewKeyDown event of the WebBrowser class. The handler is registered in the DocumentCompleted event handler, as shown in Table 3.8(a). It stores each element underneath the BODY tag in an object of type HtmlElementCollection. The problem encountered in this implementation is that the InnerText property of an HtmlElement object returns text that includes the text of all its child nodes. It is therefore necessary to separate out the text of the current node alone so that it can be spoken. This issue has been resolved by selecting only those elements of the HtmlElementCollection that have a single child. Some elements may still not be covered by this method; for these, string processing is used. The code is listed in Table 3.8.

illustration not visible in this excerpt

Table 3.8: Up/Down key based Navigation
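
The difficulty of separating an element's own text from the text of its children can be illustrated with the following sketch. It is a Python approximation using the standard html.parser module and takes a different route from the single-child filter and string processing of Table 3.8, which is not visible in this excerpt: text is attributed directly to the innermost open element while parsing. The names OwnTextCollector, speakable_items and move are hypothetical, and clamping the position at the ends of the list is an assumption.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}  # never have children


class OwnTextCollector(HTMLParser):
    """Attribute each piece of text to the innermost element containing it."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements, innermost last
        self.elements = []  # every element in order of creation

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "own_text": []}
        self.elements.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self.stack:
            self.stack[-1]["own_text"].append(text)


def speakable_items(html):
    """Return (tag, own text) pairs for elements that have text of their own."""
    collector = OwnTextCollector()
    collector.feed(html)
    collector.close()
    return [(n["tag"], " ".join(n["own_text"]))
            for n in collector.elements if n["own_text"]]


def move(index, key, count):
    """Return the new position after an Up or Down key press, kept in range."""
    step = 1 if key == "Down" else -1
    return max(0, min(count - 1, index + step))
```

With the list from speakable_items in hand, each Down or Up key press changes the current position through move, and the text at that position is what would be spoken.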

3.3.12 Query Mode

Query Mode can be used by a visually challenged user to search the website for an input text string. This feature makes use of Google’s “Search within the Site” feature. A code glimpse is shown in Table 3.9.

illustration not visible in this excerpt

Table 3.9: Query mode of WACTA
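
One plausible way to restrict a Google search to the current website is the site: search operator. The following Python sketch builds such a query URL; it is an assumption about how the request could be formed and not the code of Table 3.9, which is not visible in this excerpt. The function name site_search_url is hypothetical.

```python
from urllib.parse import urlencode, urlsplit


def site_search_url(page_url, keyword):
    """Build a Google query limited to the host of the page being browsed."""
    host = urlsplit(page_url).hostname
    if not host:
        raise ValueError("page_url must be an absolute URL")
    query = f"site:{host} {keyword}"
    # urlencode escapes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})
```

The browser would then navigate to the returned URL and narrate the result links, for example in Tab based navigation.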

3.3.13 Analytical Mode

This mode speaks out the statistics of the web page, e.g. title, headings, domain, background color, links, image Alt text, etc. This quickly gives the visually challenged user the context of the webpage. The code snippet is shown in Table 3.10.

illustration not visible in this excerpt

Table 3.10: Analytic mode of WACTA
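
The kind of page summary described for this mode can be sketched as follows. The sketch is in Python with the standard html.parser module rather than the C# of Table 3.10, which is not visible in this excerpt; PageStats and page_summary are hypothetical names, and the selection and wording of the reported items (background color is omitted) are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class PageStats(HTMLParser):
    """Gather the title, headings, link count and image alt texts of a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.link_count = 0
        self.image_alts = []
        self.images_without_alt = 0
        self._capture = None  # "title" or "heading" while inside one

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._capture = "title"
        elif tag in self.HEADINGS:
            self._capture = "heading"
            self.headings.append("")
        elif tag == "a" and "href" in attrs:
            self.link_count += 1
        elif tag == "img":
            alt = (attrs.get("alt") or "").strip()
            if alt:
                self.image_alts.append(alt)
            else:
                self.images_without_alt += 1

    def handle_endtag(self, tag):
        if tag == "title" or tag in self.HEADINGS:
            self._capture = None

    def handle_data(self, data):
        if self._capture == "title":
            self.title += data
        elif self._capture == "heading":
            self.headings[-1] += data


def page_summary(url, html):
    """Return the sentences a narrator could speak to describe the page."""
    stats = PageStats()
    stats.feed(html)
    stats.close()
    return [
        f"Domain: {urlsplit(url).hostname}",
        f"Title: {stats.title.strip()}",
        f"Headings: {len(stats.headings)}",
        f"Links: {stats.link_count}",
        f"Images with alt text: {len(stats.image_alts)}",
        f"Images without alt text: {stats.images_without_alt}",
    ]
```

Each returned line is short enough to be spoken as a separate utterance, which lets the user interrupt the summary as soon as the context is clear.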

3.3.14 Text Glimpse Mode

This mode is useful for users with low vision, who cannot read the text on a webpage without difficulty. Although they can see the web page and its various regions, this mode allows them to right-click on a point of the screen; if there is text underlying that point, it is spoken out by the narrator. Thus, a user with low vision can reach the topic of interest without having to listen sequentially to all the text of the web page. The mode has been implemented on the ContextMenuShowing event of the HtmlDocument class. The handler makes use of the GetElementFromPoint method of the HtmlDocument class. The code snippet is shown in Table 3.11.

illustration not visible in this excerpt

Table 3.11: Text Glimpse Mode of WACTA

[...]

Excerpt out of 124 pages

Details

Title: Speech as Interface in Web Applications for Visually Challenged
Course: Computer Science and Engineering - Human Computer Interaction
Author: Prabhat Verma
Year: 2013
Pages: 124
Catalog Number: V311374
ISBN (eBook): 9783668103597
ISBN (Book): 9783668103603
File size: 2661 KB
Language: English
Keywords: speech, interface, applications, visually, challenged
Quote paper: Prabhat Verma (Author), 2013, Speech as Interface in Web Applications for Visually Challenged, Munich, GRIN Verlag, https://www.grin.com/document/311374
