Creating quality and accessible scanned documents

Introduction

Instructors may provide scanned documents as course reading materials, for example by scanning selected pages of a reference book into a PDF for students. Some students may prefer to scan book chapters for their courses because textbooks are often expensive. Compared with physical copies, digital copies of reading materials are more convenient to share among instructors and students.
Scanned materials may be inaccessible to some students with disabilities, such as students with visual impairment. Scanned images of text are inaccessible to assistive technologies such as screen readers, so they need to be converted to make them more accessible to those technologies.
This section offers general resources on creating quality scanned documents. These resources are not exhaustive or definitive.
There are different PDF editing programmes, such as Adobe Acrobat Pro and Foxit PDF Editor Pro. Note that the accessibility-related functions of these programmes may vary across programme versions and/or operating systems.
- For example, Foxit PDF Editor Pro on Windows currently supports both accessibility checking and accessibility-related editing functions (such as tagging). The Mac version supports accessibility checking but no accessibility-related editing functions (such as tagging).
- However, software develops rapidly, and its accessibility functions change with it. Statements on this page may no longer reflect the current state of the software.
In this section, we use Adobe Acrobat Pro on macOS for illustrative purposes. The commands may differ slightly across PDF editing programmes and operating systems, but the general ideas still apply. The recommended practices are not exhaustive.
It is always good to communicate with students, understand their learning and access needs, and modify the teaching and learning materials in response.

Overview of suggested practices

Target materials availability

1. Solicit existing electronic version of the target materials

Scanning and proofreading

2. Scanning techniques
3. Optical character recognition (OCR) and proofreading

Follow-up settings for accessible PDF

4. Tagging and document settings for accessible PDF

Target materials availability

1. Solicit existing electronic version of the target materials

Before scanning, try to obtain an existing electronic version of the target materials wherever possible, through the university library, local community resources, or international resources. An electronic version facilitates conversion into alternative accessible formats, such as audio books and Braille.
Examples of local community resources:
- Centralised Braille Production Centre, The Hong Kong Society for the Blind. It provides Braille transcription services for materials such as textbooks for university students with visual impairment, reference reading materials, and examination papers, and it produces Braille e-books and Braille audio e-books. It maintains the Braille ebook Reservation System.
- The Information Accessibility Centre, The Hong Kong Society for the Blind. It provides library services for people with visual impairment. Its recording studio produces talking books and periodicals.
Examples of international resources:
- International Websites Offering Accessible Books, Accessible Books Consortium (ABC).
- Bookshare, United States. It provides accessible ebooks for people with print disabilities.
- RNIB Bookshare, United Kingdom. It provides accessible textbooks and resources for learners with disabilities.

Scanning and proofreading

2. Scanning techniques

Inspect the quality of the materials to be scanned. If the materials contain a large number of damaged pages and/or unclear text (such as handwritten notes, blurry text, or highlighted text), they may not be suitable for producing quality scanned versions. Consider looking for another copy, or typing in the text manually to create a digital version.
Check whether the scanner supports a resolution and contrast high enough for quality scanning and subsequent format conversion.
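As a rough illustration of the resolution check, the sketch below estimates the effective resolution of a scan from its pixel width and the physical width of the page. The function names and the 300 dpi default are assumptions made for illustration (300 dpi is a commonly quoted minimum for OCR, not a figure taken from this page), so adjust them to the scanner and the materials.

```python
def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Return the dots per inch of a scan with the given pixel width."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches


def is_resolution_sufficient(pixel_width: int,
                             page_width_inches: float,
                             min_dpi: float = 300.0) -> bool:
    """Compare the effective resolution with an assumed minimum."""
    return effective_dpi(pixel_width, page_width_inches) >= min_dpi


# A US Letter page is 8.5 inches wide.
print(effective_dpi(2550, 8.5))             # 300.0
print(is_resolution_sufficient(1275, 8.5))  # False (only 150 dpi)
```

A scan that falls below the chosen minimum is usually better rescanned at a higher setting than enlarged afterwards, since enlarging does not restore lost detail.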
Avoid crooked or unaligned pages, shadows from the spine of bound materials (such as a book), wrongly oriented pages, uncropped pages, reduced page size, and cut-off text.
Ensure that each page is properly aligned with the scanner glass and lies as flat against it as possible, by unbinding the materials and/or applying manual pressure.
- This helps avoid crooked or unaligned pages and shadows from the spine of bound materials. Curvature and shadows distort the page content, and unaligned pages may have parts of their text cut off.
Scan the materials upright so that users do not need to rotate the pages.
Consider cropping a double-page scan into single-page scans. This avoids reducing the page content below its original size.
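The cropping of a double-page scan can be sketched as simple arithmetic on the image dimensions. The function below is a hypothetical helper, not part of any scanning tool: it returns two crop boxes in the (left, upper, right, lower) pixel convention that image libraries such as Pillow accept for cropping, and it assumes the spine sits at the horizontal midpoint of the scan.

```python
def split_spread(width: int, height: int, gutter: int = 0):
    """Return (left_box, right_box) crop boxes for a double-page scan.

    Each box is (left, upper, right, lower) in pixels. `gutter` is the
    width of the strip around the spine to discard, centred on the
    horizontal midpoint (an assumption about where the spine falls).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if gutter < 0 or gutter >= width:
        raise ValueError("gutter must be between 0 and the scan width")
    mid = width // 2
    half_gutter = gutter // 2
    left_box = (0, 0, mid - half_gutter, height)
    right_box = (mid + half_gutter, 0, width, height)
    return left_box, right_box


left, right = split_spread(3400, 2200, gutter=40)
print(left)   # (0, 0, 1680, 2200)
print(right)  # (1720, 0, 3400, 2200)
```

Each resulting page keeps its original pixel dimensions, so the text is not shrunk to fit two pages into one.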
An example of a scanner for mobile devices is the Adobe Scan mobile app. It allows users to scan materials and upload them to Acrobat Pro DC for further editing and accessibility settings.

3. Optical character recognition (OCR) and proofreading

An image PDF created by scanning source documents is not a “searchable PDF”, and its text content is not “real” text. It is inaccessible to assistive technologies such as screen readers. Further procedures are needed to make the image PDF more accessible, such as optical character recognition (OCR) followed by tagging.
A searchable PDF is one whose text can be searched, for example with a shortcut such as Ctrl + F. “Real” text is text that can be selected and highlighted with the cursor, copied and pasted, and edited.
A searchable PDF is more accessible to all users, as they can more easily select, highlight, edit, or reflow its real text content.
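One informal way to spot image-only pages is to extract the text of each page and see which pages come back empty. The sketch below shows that idea under stated assumptions: `pages_without_text` and `extract_page_texts` are hypothetical helper names, and the extraction step relies on the third-party pypdf package being installed. A page that does return text is not necessarily accurate or properly tagged, so this is only a first screening.

```python
def pages_without_text(page_texts, min_chars=1):
    """Return 1-based numbers of pages whose extracted text is (nearly) empty."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(pdf_path):
    """Extract the text of each page with pypdf (assumed to be installed)."""
    from pypdf import PdfReader  # imported lazily so the helper above runs without it

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


# Simulated extraction result: page 1 has a text layer, pages 2 and 3 do not.
sample = ["Chapter 1. Introduction", "", "   \n"]
print(pages_without_text(sample))  # [2, 3]
```

For a real file, `pages_without_text(extract_page_texts("reading.pdf"))` would list the pages that probably still need OCR, where `reading.pdf` is a placeholder file name.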

Screenshots of two different PDFs. The one with only sentences highlighted has captions that read: “PDF document with ‘real’ text. The text can be selected, highlighted by cursor, or edited”. The screenshot with the entire page highlighted has captions that read: “Scanned image of text without ‘real’ text. The text cannot be selected, highlighted by cursor, nor edited”.

Optical Character Recognition (OCR) is the process of converting images of text into “real” text that is searchable, recognizable, readable, and editable.
To perform OCR in Adobe Acrobat Pro:

1. Go to Tools > Enhance Scans (or Scan & OCR) > Recognize Text > In This File. Select In Multiple Files if you need to convert multiple files at the same time.
2. Select Settings, then specify the page range to be converted and the document language. For Output, select Editable Text and Images. Select OK.
3. Select Recognize Text to begin OCR. Note that the resulting text might not correspond completely to the original text. The quality of the OCR depends on various factors, such as the quality of the scan and the font (e.g., handwritten text might be more difficult to recognize accurately).
4. Proofread and fix the converted text in the resulting PDF. Select Recognize Text > Correct Recognized Text. Suspect characters are automatically highlighted. Go through them one by one, accepting or fixing each as needed.
5. Manually proofread the converted text in the resulting PDF once more, and fix any remaining errors.
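The manual proofreading pass can be supported by a simple script that flags words likely to contain OCR errors in text copied out of the converted PDF. The sketch below is a hypothetical helper using one assumed heuristic: a digit sandwiched between letters, as in "c0py". It does not replace proofreading by a person, and it will flag some legitimate words (for example "H2O").

```python
import re

# Assumed heuristic: one or more digits with a letter on each side.
_SUSPECT = re.compile(r"[A-Za-z]\d+[A-Za-z]")


def flag_suspect_words(text):
    """Return (line_number, word) pairs that match the heuristic."""
    flagged = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for word in line.split():
            if _SUSPECT.search(word):
                flagged.append((line_number, word))
    return flagged


ocr_text = "Scanned c0py of the text\nThe t1tle is clear\nPage 12 of 30"
print(flag_suspect_words(ocr_text))  # [(1, 'c0py'), (2, 't1tle')]
```

Plain numbers such as page numbers are not flagged, because the pattern requires a letter on both sides of the digit.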

Follow-up settings for accessible PDF

4. Tagging and document settings for accessible PDF

Refer to this Toolkit's page on creating accessible PDFs for details of the follow-up settings, such as tagging.

References

Creating High Quality Scans. California State University San Marcos.
Good and bad electronic or digital materials. Griffith University.
High-quality scans. University of Washington.
Appendix 3: Tips for using Scanners and OCR, Guidelines for Producing Accessible E-text (2018). Round Table on Information Access for People with Print Disabilities Inc (Round Table).