Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 5.1.1
- Fix Version/s: 5.1.2
- Component/s: Core/Parsing
- Labels: None
- Environment: All
- Support Case References: Support Case #13268 - https://icesoft.my.salesforce.com/50070000010b6bY
Description
When copying/extracting text from a PDF file in the viewer, the pasted text does not keep the layout shown in the viewer: each character ends up on its own line. For example:
Viewer text: 31.12.2013
Pasted text:
3
1
.
1
2
.
2
0
1
3
The PDF in question contains a landscape page view. For some reason the text is laid out using a portrait layout. As a result, the glyph coordinates advance along the y-axis instead of the usual x-axis, which explains why our page extraction algorithm breaks down. I've added a fix which looks for the negative y shear value in the Tm matrix, which is responsible for the rotation.
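
For anyone following along, here is a rough sketch of the kind of check described above. It is not the committed fix: the class and method names (`RotatedTextCheck`, `advancesAlongY`, `lineCoordinate`) and the tolerance value are made up for illustration, and it assumes the six Tm operands `[a b c d e f]` are loaded into a `java.awt.geom.AffineTransform` in that order, so that `b` is reported by `getShearY()`.

```java
import java.awt.geom.AffineTransform;

/** Illustrative helper only; names and tolerance are assumptions. */
final class RotatedTextCheck {

    /** Tolerance for treating a matrix entry as zero (assumed value). */
    private static final double EPSILON = 1e-6;

    private RotatedTextCheck() {
    }

    /**
     * Returns true when the text matrix has a negligible x scale and a
     * negative y shear, i.e. the text is rotated so that successive glyphs
     * advance along the y-axis rather than the x-axis.
     */
    public static boolean advancesAlongY(AffineTransform tm) {
        return Math.abs(tm.getScaleX()) < EPSILON
                && tm.getShearY() < -EPSILON;
    }

    /**
     * Picks the glyph coordinate that stays constant along one line of text:
     * x for rotated text, y for ordinary horizontal text. A line-break
     * heuristic would compare this value between consecutive glyphs.
     */
    public static double lineCoordinate(AffineTransform tm, double x, double y) {
        return advancesAlongY(tm) ? x : y;
    }
}
```

With a check like this, an extraction pass could compare `lineCoordinate(...)` between consecutive glyphs instead of always comparing y, so a date such as 31.12.2013 on the rotated page stays on one line.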