[PDF-854] Copied text from PDF is pasted incorrectly - ICEsoft JIRA Issue Tracker

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 5.1.1
Fix Version/s: 5.1.2
Component/s: Core/Parsing
Labels: None
Environment: All

Support Case References:
Support Case #13268 - https://icesoft.my.salesforce.com/50070000010b6bY

Description

When text is copied or extracted from a PDF file in the viewer, the pasted text does not keep the layout shown in the viewer: each character ends up on its own line. For example:

Viewer text: 31.12.2013

Pasted text:
3
1
.
1
2
.
2
0
1
3

Activity

Patrick Corless added a comment - 03/Feb/15 9:35 AM

The PDF in question contains a landscape page view. For some reason the text is laid out using a portrait layout. As a result the glyph coordinates advance along the y-axis instead of the usual x-axis, which explains why our page extraction algorithm breaks down. I've added a fix which looks for the negative y shear value in the Tm matrix, which is responsible for the rotation.
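
To illustrate the check described in the comment above, here is a minimal sketch, not the actual ICEpdf change. It assumes the text matrix is available as the six PDF operands [a b c d e f], where b is the term that java.awt.geom.AffineTransform exposes as the y shear. The class name TextMatrixCheck and both method names are hypothetical.

```java
// Hypothetical helper; names and thresholds are illustrative assumptions,
// not ICEpdf API.
class TextMatrixCheck {
    // Small tolerance so float noise around zero is not read as a rotation.
    private static final double EPS = 1e-6;

    /** tm holds the six Tm operands in PDF order: a, b, c, d, e, f. */
    static boolean hasNegativeYShear(double[] tm) {
        if (tm.length != 6) {
            throw new IllegalArgumentException("expected 6 matrix operands");
        }
        // b < 0 means a step along the text baseline moves down the page
        // in y instead of across it in x.
        return tm[1] < -EPS;
    }

    /**
     * Returns a value that increases in reading order along the baseline.
     * Upright text advances in +x; text with a negative y shear advances
     * in -y, so the sign of y is flipped to keep the key increasing.
     */
    static double advanceKey(double[] tm, double x, double y) {
        return hasNegativeYShear(tm) ? -y : x;
    }
}
```

With an upright matrix such as {1, 0, 0, 1, 72, 700} the key is simply x. With a matrix such as {0, -1, 1, 0, 72, 700} successive glyphs share the same x and differ only in y, which is the situation that made every character look like the start of a new line.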

Patrick Corless added a comment - 17/Feb/15 3:35 PM

I've reworked the new line detection code to take a few more units of measure into consideration before creating a new line of text. This seems to fix the document in question with regard to text extraction. Triple-clicking on a line now selects all the text that visually represents a line of text.
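
The comment does not say which measures the reworked detection uses, so the following is only a sketch of one plausible approach: start a new line when the baseline shifts by more than a fraction of the glyph height, instead of on any change in y. The LineGrouper class, the Glyph type, and the tolerance parameter are all hypothetical and not taken from ICEpdf.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of tolerance-based line grouping.
class LineGrouper {
    /** Minimal glyph record: character, baseline position, and height in page units. */
    static final class Glyph {
        final char ch;
        final double x;
        final double y;
        final double height;

        Glyph(char ch, double x, double y, double height) {
            this.ch = ch;
            this.x = x;
            this.y = y;
            this.height = height;
        }
    }

    /**
     * Groups glyphs (already in content-stream order) into lines. A new line
     * starts only when the baseline moves by more than tolerance times the
     * taller of the two neighbouring glyphs.
     */
    static List<String> groupIntoLines(Glyph[] glyphs, double tolerance) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Glyph previous = null;
        for (Glyph g : glyphs) {
            if (previous != null) {
                double shift = Math.abs(g.y - previous.y);
                double limit = tolerance * Math.max(g.height, previous.height);
                if (shift > limit) {
                    lines.add(current.toString());
                    current.setLength(0);
                }
            }
            current.append(g.ch);
            previous = g;
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

Scaling the threshold by glyph height keeps small baseline jitter inside one line while still splitting on a real line feed. For rotated text, the same comparison would be applied to whichever coordinate runs perpendicular to the baseline.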

Patrick Corless added a comment - 26/Feb/15 2:53 PM

Marking as resolved.

People

Assignee: Patrick Corless
Reporter: Arran Mccullough
Votes: 0
Watchers: 2

Dates

Created: 27/Jan/15 10:48 AM
Updated: 01/Apr/15 3:00 PM
Resolved: 26/Feb/15 2:53 PM