A better scan
About this copy
This is a scanned copy of the 4th printing, 1998. It's shared for reading, and for improving the Markdown copy in our GitHub repo.
How it was made
@pronoiac had the spine / binding removed and fed the pages through a scanner. Steps and software used:
- the scanner produced 600 dpi grayscale PNG files, 3.6 gigabytes in total
- ScanTailor Advanced (in Docker): deskew the pages and render them as 300 dpi black and white (1-bit) TIFFs, 30 megabytes in total
- tiff2pdf and pdfunite: turn those many TIFFs into one PDF
- OCRmyPDF: OCR with Tesseract, add title and author to the PDF, and apply lossless JBIG2 compression, giving a 24 megabyte file
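The last two steps can be sketched in Python as a list of command lines. This is a rough illustration, not the exact commands @pronoiac ran: the function name `build_commands`, the `pages` working directory, the file names, and the `TITLE` / `AUTHOR` placeholders are all assumptions. The ScanTailor Advanced step is left out, and JBIG2 compression is left to OCRmyPDF's own optimization settings, which this sketch does not set.

```python
import shlex
from pathlib import PurePosixPath


def build_commands(tiff_paths, out_pdf, title, author, work_dir="pages"):
    """Return argv lists for the TIFF -> PDF -> OCR steps, without running them.

    All names and paths here are hypothetical; adjust them to your own layout.
    """
    work = PurePosixPath(work_dir)
    commands = []
    page_pdfs = []

    # tiff2pdf: convert each 1-bit TIFF into its own single-page PDF
    for tiff in tiff_paths:
        page_pdf = work / (PurePosixPath(tiff).stem + ".pdf")
        commands.append(["tiff2pdf", "-o", str(page_pdf), str(tiff)])
        page_pdfs.append(str(page_pdf))

    # pdfunite: merge the per-page PDFs; the last argument is the output file
    merged = work / "merged.pdf"
    commands.append(["pdfunite", *page_pdfs, str(merged)])

    # OCRmyPDF: run OCR and set the title and author metadata
    commands.append(
        ["ocrmypdf", "--title", title, "--author", author, str(merged), str(out_pdf)]
    )
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them
    for cmd in build_commands(["scans/p001.tif", "scans/p002.tif"],
                              "book.pdf", "TITLE", "AUTHOR"):
        print(shlex.join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` once the working directory exists and the tools are installed.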
Other notes
- It's higher resolution than the previous scan, though it's from an older printing (4th printing, 1998, versus 6th printing, 2001).
- OCR is better than in the previous scan: searching for keywords or phrases usually works
- why not the grayscale PNGs: space constraints on GitHub releases, and dubious value for the space they would take up
- ebooks from the Markdown version are getting closer
- see #137 for some of the thoughts behind this release