I want to print digital data, in order to store it on paper. There are many use cases for that, such as backing up cryptographic keys (GPG keys on paper) or sending encrypted messages over regular mail (see Paper encryption technology).

While working on paper storage, I discovered that OCR of digital data is a hard problem. When you OCR digital data, you need perfect recognition, with no errors. Otherwise, the resulting scanned data is corrupted and in many cases unusable (an encrypted message cannot be read, a compressed file cannot be decompressed).

For OCR of digital data, 100% of the characters must be correctly recognized, and two things get in the way:

- OCR is confused by similar-looking characters (e.g. "1" and "l").
- OCR is confused by letters that look very similar in upper case and lower case (e.g. "s" and "S").

This means that most of the usual data encodings cannot be recognized with 100% accuracy:

- base64 is a nightmare, in particular because of the upper-case/lower-case problem.
- base32 does not work because there are a few pairs of confusing characters ("1" and "l", "0" and "o").
- base16/hexadecimal is still tough because a few characters tend to confuse OCR engines ("1" is sometimes recognized as "l", "7" as "J").

To sum up, OCR of digital data is hard, and the problem space of OCR has several dimensions.

After weeks of trial and error, I was finally able to get 100% accuracy for automatically recognizing hexadecimal data. The settings that mattered include:

- Encoding: base16 in lower case (lower-case recognition works better in GOCR).
- Scanning quality: 400 DPI (works better than 600 DPI for GOCR).

With this setup, we can get approximately 2.5 kilobytes per A4 page.

Printing and OCRing an encrypted GPG file encoded in base16
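
To make the lower-case base16 idea from this post concrete, here is a minimal Python sketch. It is not the exact tooling used for the results above; the function names (`encode_for_print`, `decode_from_ocr`, `hex_chars_needed`) and the 64-characters-per-line width are assumptions chosen for illustration. It shows two things: turning arbitrary bytes into lower-case hexadecimal text ready for printing, and refusing OCR output that contains any character outside `0-9a-f`, since a single misread character (such as "l" for "1") corrupts the data.

```python
# Hypothetical helpers for illustration only; names and line width are assumptions.
HEX_ALPHABET = set("0123456789abcdef")


def encode_for_print(data: bytes, chars_per_line: int = 64) -> str:
    """Return data as lower-case base16 text, wrapped into fixed-width lines."""
    hex_text = data.hex()  # bytes.hex() always emits lower-case digits
    lines = [
        hex_text[i:i + chars_per_line]
        for i in range(0, len(hex_text), chars_per_line)
    ]
    return "\n".join(lines)


def decode_from_ocr(ocr_text: str) -> bytes:
    """Turn OCR output back into bytes, rejecting anything that is not lower-case hex."""
    compact = "".join(ocr_text.split())  # drop line breaks and spaces
    bad = sorted(set(compact) - HEX_ALPHABET)
    if bad:
        # e.g. "l" read instead of "1", or "J" read instead of "7"
        raise ValueError(f"non-hex characters in OCR output: {bad}")
    if len(compact) % 2:
        raise ValueError("odd number of hex digits: a character was lost or added")
    return bytes.fromhex(compact)


def hex_chars_needed(num_bytes: int) -> int:
    """Each byte becomes two hexadecimal characters on paper."""
    return 2 * num_bytes
```

Taking 2.5 kilobytes as 2500 bytes, `hex_chars_needed(2500)` gives 5000, so a full A4 page at that density carries about 5000 hexadecimal characters, every one of which has to be recognized correctly. Note that the alphabet check only catches misreads that fall outside `0-9a-f`; a digit misread as another valid digit would pass silently.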