Yesterday I had to take screenshots of a book digitised right at the start of the OCR revolution. I needed to do this because the OCR layer is full of errors, and my PC screen smooths out harsh lines, which makes it easier for OCR readers to recognise blurry characters.
It’s a bit... umm... difficult, as we are still not past the effects of the flooding. So I’m getting real-time cause and effect of stress on physiological processes.
I now have 256 pages as images in a mix of languages and regiolects, and I’m now at the point where I can look at them and say to myself: “Yes. These are words. Words of wordiness. That’s an a. That is definitely an a.”
It’s also a bit tricky, because this is what I want: transcriptions, so I can agree or disagree with translations. But even tidied and formatted… no, there is no translation software trained on 12th–16th century spelling. And grammar. Across three languages.
At any other time this is something I can work around. I use a totally different set of skills and let my mind work, subconsciously and consciously, on a solution.
Luckily, yesterday I was able to lay down another coat of faux-namel on some metal plaques, which did let that happen. There is something so very soothing in dipping a brush into paint (well, a medium I mixed with real metal powders) and brushing and smooshing the layers down.
But it’s still work. It’s still using my mind, and brain, and spine, and muscles, and skills. Even if it is soothing in the moment, I’m here on the other side of the night.
Over time this is progress.
I’ve sorted the pages into a subfolder for each trade, and I’m definitely correct about some overlaps I haven’t read about, so that’s nifty. If only I could go from “these are words made up of letters” to my usual ability to work out what a sentence means.