[PDF-200] OCR font substitution error - ICEsoft JIRA Issue Tracker

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 4.1
Fix Version/s: 4.2
Component/s: Core/Parsing
Labels:
None
Environment:
any

Assignee Priority:
P1
ICEsoft Forum Reference:
http://www.icefaces.org/JForum/posts/list/17114.page

Description

What seems to be happening is that we are incorrectly substituting the font for the OCR layer with one that doesn't have the same widths as the font used to generate the PDF. I've attached a screenshot that introduces an alpha value into the rendering stack so you can see the OCR text behind the image text.
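The debugging view described above can be approximated with plain Java2D. This is a minimal sketch that assumes the OCR text layer and the scanned page image have already been rendered to same-sized BufferedImages. The class and method names (OcrOverlayDebug, overlay) are hypothetical and are not ICEpdf's actual rendering code. The scan is drawn over the OCR layer with a source-over AlphaComposite so the text underneath shows through.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/**
 * Hypothetical debugging aid (not ICEpdf's actual code): composite the scanned
 * page image over the rendered OCR text layer at reduced opacity so both are visible.
 */
class OcrOverlayDebug {

    static BufferedImage overlay(BufferedImage ocrLayer, BufferedImage pageImage, float alpha) {
        BufferedImage out = new BufferedImage(
                ocrLayer.getWidth(), ocrLayer.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            // Bottom: the OCR text layer, drawn fully opaque.
            g.drawImage(ocrLayer, 0, 0, null);
            // Top: the scanned image, blended source-over at the given alpha.
            g.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, alpha));
            g.drawImage(pageImage, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    public static void main(String[] args) {
        // OCR layer: white page with a black bar standing in for a line of OCR text.
        BufferedImage ocr = new BufferedImage(100, 20, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = ocr.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 100, 20);
        g.setColor(Color.BLACK);
        g.fillRect(10, 5, 60, 10);
        g.dispose();

        // Page image: plain white here; a real scan would carry the image text.
        BufferedImage page = new BufferedImage(100, 20, BufferedImage.TYPE_INT_RGB);
        Graphics2D pg = page.createGraphics();
        pg.setColor(Color.WHITE);
        pg.fillRect(0, 0, 100, 20);
        pg.dispose();

        BufferedImage result = overlay(ocr, page, 0.5f);
        // Inside the bar, the black OCR "text" now shows through as mid-grey.
        System.out.println("blue channel inside bar: " + (result.getRGB(20, 10) & 0xFF));
    }
}
```

With an alpha around 0.5, any horizontal drift between the OCR glyphs and the image text becomes visible as doubled or offset characters.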

Activity

Patrick Corless added a comment - 25/Feb/11 4:10 PM

Finally had a chance to take a close look at this selection issue. The PDF in question exposed a small bug in the content parser where we were compounding the horizontal text scaling (Tz) value with the previous value. So if more than one "Tz" was specified per text block, the text would gradually shrink.

For example

81 Tz
65 Tz

The first operator sets the scale to 81% of the font width; the second was then applied as 65% of that previous value (about 52.65% overall) instead of 65%. The correct handling is to treat each Tz as a separate scale that replaces the previous one. Once the logic was adjusted, the text selection seemed to correspond much more closely with the original graphic/OCR capture.
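The difference between the buggy and the corrected behaviour can be shown in a few lines. In the PDF specification, the `Tz` operand is a percentage of normal width (default 100) that sets the text-state horizontal scaling parameter Th, so a later `Tz` replaces the earlier one rather than multiplying into it. This is a minimal sketch; the class and method names are hypothetical, not ICEpdf's actual parser classes, and the glyph advance is simplified to (w0 / 1000) × Tfs × Th, ignoring character spacing, word spacing and TJ adjustments.

```java
import java.util.List;

/**
 * Hypothetical sketch (not ICEpdf's actual parser classes) contrasting the
 * compounding bug with the corrected handling of the "Tz" operator.
 */
class HorizontalScalingDemo {

    /** Buggy behaviour: each Tz is multiplied into the previous scale. */
    static double compoundingScale(List<Double> tzOperands) {
        double th = 1.0; // Th defaults to 100%
        for (double tz : tzOperands) {
            th *= tz / 100.0;
        }
        return th;
    }

    /** Corrected behaviour: each Tz replaces the previous scale outright. */
    static double replacingScale(List<Double> tzOperands) {
        double th = 1.0; // Th defaults to 100%
        for (double tz : tzOperands) {
            th = tz / 100.0;
        }
        return th;
    }

    /**
     * Simplified horizontal advance of one glyph: (w0 / 1000) * Tfs * Th.
     * Character spacing, word spacing and TJ kerning adjustments are ignored.
     */
    static double advance(double glyphWidth, double fontSize, double th) {
        return glyphWidth / 1000.0 * fontSize * th;
    }

    public static void main(String[] args) {
        List<Double> ops = List.of(81.0, 65.0); // "81 Tz" then "65 Tz"
        double bad = compoundingScale(ops);
        double good = replacingScale(ops);
        System.out.printf("compounding Th = %.4f, replacing Th = %.4f%n", bad, good);
        // A 500-unit glyph at 12 pt advances 6 units before scaling.
        System.out.printf("advance: buggy %.3f vs fixed %.3f%n",
                advance(500, 12, bad), advance(500, 12, good));
    }
}
```

Because every glyph advance is multiplied by Th, the compounding version places each character about 19% narrower than intended in this example (0.5265 vs 0.65). The error accumulates across a line, which is consistent with the OCR text drifting away from the image text.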

Took a while to find this one.

People

Assignee:

Patrick Corless

Reporter:

Patrick Corless

Votes:

0

Watchers:

0

Dates

Created:

11/Aug/10 1:06 PM

Updated:

29/Mar/12 11:52 AM

Resolved:

25/Feb/11 4:10 PM