Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 4.1
- Fix Version/s: 4.2
- Component/s: Core/Parsing
- Labels: None
- Environment: any
- Assignee Priority: P1
- ICEsoft Forum Reference:
Description
What seems to be happening is that we are incorrectly substituting the font for the OCR layer with one that doesn't have the same width as the font used to generate the PDF. I've attached a screenshot that introduces an alpha value into the rendering stack so you can see the OCR text behind the image text.
Activity
Finally had a chance to take a close look at this selection issue. The PDF in question exposes a small bug in the content parser where we were compounding the horizontal text scaling value with the previous value. So if more than one "Tz" was specified per text block we would gradually shrink the text.
For example
81 Tz
65 Tz
With the bug, the first scale was applied as 81% of the font width, and the second as 65% of that previous value (0.81 × 0.65, roughly 53%). The correct handling is to treat each "Tz" as a separate, absolute scale, so the second one simply sets the scaling to 65%. Once the logic was adjusted, the text selection corresponds much more closely with the original graphic/OCR capture.
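To make the difference concrete, here is a rough sketch of the two behaviours. This is not the actual parser code; the function names and the regex-based scan are assumptions made purely for illustration, and a real content stream needs a proper tokenizer.

```python
import re

# Hypothetical, simplified scan for "<number> Tz" operators.
# Illustration only: this is not how the real parser tokenizes a content stream.
TZ_PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s+Tz\b")


def scaling_compounded(content: str) -> float:
    """Buggy behaviour: each Tz is multiplied into the previous scale."""
    scale = 1.0  # Tz defaults to 100%
    for match in TZ_PATTERN.finditer(content):
        scale *= float(match.group(1)) / 100.0
    return scale


def scaling_absolute(content: str) -> float:
    """Corrected behaviour: each Tz replaces the previous scale."""
    scale = 1.0  # Tz defaults to 100%
    for match in TZ_PATTERN.finditer(content):
        scale = float(match.group(1)) / 100.0
    return scale


text_block = "BT 81 Tz 65 Tz ET"
buggy = scaling_compounded(text_block)  # 0.81 * 0.65, text shrinks further with each Tz
fixed = scaling_absolute(text_block)    # last Tz wins: 65%
```

With the compounded version, every extra "Tz" in a block narrows the text a bit more, which would explain the selection drifting away from the glyphs in the image.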
Took a while to find this one.