Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 6.3.1
- Fix Version/s: 6.3.3
- Component/s: Core/Parsing
- Labels: None
- Environment: any
- Support Case References: Support Case 14388: https://icesoft.my.salesforce.com/5000g00001wrrDX
Description
A user has reported that extracted text comes out vertically, one character per line:
h
e
l
l
o
instead of hello.
We've seen this issue in the past and have some corrective code to detect and adjust for the shift. Further investigation is needed.
The page rotation generally only states whether the page should be rotated in the viewer. In this particular case, the page is rotated from portrait to landscape.
As you suggested, the Tm operator is suspect: in one case it's 0 1 -1 -0 241.44 126.231 Tm, which is a 90-degree rotation of the text matrix. We have code that tries to detect this type of encoding when sorting the document text for extraction. I'll need to play around with it a bit more, but will hopefully have something soon.
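To make the Tm discussion concrete, here is a minimal sketch of how a rotated text matrix could be detected and compensated for before sorting glyphs. It is not the existing corrective code: the class TextMatrixRotation, the Glyph holder, the method names, the 6-unit glyph advance, and the line tolerance are all hypothetical and chosen only for illustration. It assumes the matrix is a pure rotation by a multiple of 90 degrees, with no skew or scaling.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper for illustration only; not part of any library API. */
final class TextMatrixRotation {

    /** Hypothetical glyph holder: the character and its origin in user space. */
    static final class Glyph {
        final String text;
        final double x;
        final double y;

        Glyph(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    private TextMatrixRotation() {}

    /**
     * Rotation implied by the first two Tm operands (a, b), rounded to the
     * nearest multiple of 90 degrees and normalized to 0, 90, 180 or 270.
     */
    static int quadrantRotation(double a, double b) {
        double degrees = Math.toDegrees(Math.atan2(b, a));
        int rounded = (int) Math.round(degrees / 90.0) * 90;
        return ((rounded % 360) + 360) % 360;
    }

    /**
     * Maps a user-space point into a frame where the baseline runs along +x.
     * This applies the inverse (transpose) of the rotation part of Tm.
     */
    static double[] toReadingFrame(double x, double y, int rotation) {
        switch (rotation) {
            case 90:  return new double[] { y, -x };
            case 180: return new double[] { -x, -y };
            case 270: return new double[] { -y, x };
            default:  return new double[] { x, y };
        }
    }

    /**
     * Sorts glyphs top to bottom, then left to right, in the reading frame.
     * A line break is emitted when the baseline moves by more than lineTolerance.
     * Simplistic on purpose: real extraction also has to handle word spacing.
     */
    static String extract(List<Glyph> glyphs, int rotation, double lineTolerance) {
        List<Glyph> placed = new ArrayList<>();
        for (Glyph g : glyphs) {
            double[] p = toReadingFrame(g.x, g.y, rotation);
            placed.add(new Glyph(g.text, p[0], p[1]));
        }
        placed.sort(Comparator.comparingDouble((Glyph g) -> -g.y)
                .thenComparingDouble(g -> g.x));

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < placed.size(); i++) {
            if (i > 0 && Math.abs(placed.get(i).y - placed.get(i - 1).y) > lineTolerance) {
                out.append('\n');
            }
            out.append(placed.get(i).text);
        }
        return out.toString();
    }
}
```

With the operands from the comment above (a = 0, b = 1), quadrantRotation returns 90. Glyphs that advance along user-space y then share one baseline in the reading frame and sort into a single line. Sorting the same glyphs with rotation 0 puts each one on its own line, which is the symptom described in this issue.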