ICEpdf / PDF-854

Copied text from PDF is pasted incorrectly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.1.1
    • Fix Version/s: 5.1.2
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      All

      Description

      When copying/extracting the text from a PDF file in the viewer, the pasted text is not in the same format as the text on the viewer. For example:

      Viewer text: 31.12.2013

      Pasted text:
       3
       1
      .
       1
       2
      .
       2
       0
       1
      3

        Activity

        Patrick Corless added a comment -

        The PDF in question contains a landscape page view. For some reason the text is laid out using a portrait layout. As a result the glyph coordinates advance along the y-axis instead of the usual x-axis, which explains why our page extraction algorithm breaks down. I've added a fix which looks for the negative y shear value in the Tm matrix, which is responsible for the rotation.
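
Editor's illustration (not the actual ICEpdf change): a minimal sketch of how a negative y shear in a text matrix could be detected, assuming the Tm matrix has already been converted to a java.awt.geom.AffineTransform. The class name, the helper name isRotatedWithNegativeYShear, and the tolerance value are hypothetical.

```java
import java.awt.geom.AffineTransform;

class TextMatrixRotation {

    // Hypothetical helper: reports whether the matrix looks like a quarter
    // turn whose y shear component (PDF "b", AffineTransform m10) is negative.
    static boolean isRotatedWithNegativeYShear(AffineTransform tm) {
        final double eps = 1e-6; // assumed tolerance for floating point noise
        return Math.abs(tm.getScaleX()) < eps
                && Math.abs(tm.getScaleY()) < eps
                && tm.getShearY() < -eps;
    }

    public static void main(String[] args) {
        // PDF matrix [a b c d e f] maps to AffineTransform(m00, m10, m01, m11, m02, m12).
        AffineTransform upright = new AffineTransform(1, 0, 0, 1, 72, 700);
        AffineTransform rotated = new AffineTransform(0, -1, 1, 0, 72, 700);

        System.out.println(isRotatedWithNegativeYShear(upright));
        System.out.println(isRotatedWithNegativeYShear(rotated));
    }
}
```

When such a check succeeds, an extractor would know that consecutive glyphs advance along y rather than x and could pick the comparison axis accordingly.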

        Patrick Corless added a comment -

        I've reworked the new line detection code to take a few more units of measure into consideration before creating a new line of text. This seems to fix the document in question with regard to text extraction. Triple clicking on a line now selects all the text that visually represents a line of text.
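
Editor's illustration (not the actual ICEpdf code): a rough sketch of line-break detection that compares the shift across the writing direction against a tolerance derived from glyph height. The Glyph class, the groupIntoLines method, the advancesAlongY flag and the 0.5 factor are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class LineGrouping {

    // Hypothetical glyph holder: character, origin and height in page units.
    static final class Glyph {
        final char ch;
        final double x, y, height;

        Glyph(char ch, double x, double y, double height) {
            this.ch = ch;
            this.x = x;
            this.y = y;
            this.height = height;
        }
    }

    // Starts a new line only when the glyph moves across the writing
    // direction by more than half the larger glyph height.
    static List<String> groupIntoLines(List<Glyph> glyphs, boolean advancesAlongY) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Glyph previous = null;
        for (Glyph g : glyphs) {
            if (previous != null) {
                double shift = advancesAlongY
                        ? Math.abs(g.x - previous.x)
                        : Math.abs(g.y - previous.y);
                double tolerance = 0.5 * Math.max(g.height, previous.height);
                if (shift > tolerance) {
                    lines.add(current.toString());
                    current.setLength(0);
                }
            }
            current.append(g.ch);
            previous = g;
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Glyphs of "31.12.2013" stepping down the page, as in the reported file.
        List<Glyph> glyphs = new ArrayList<>();
        String text = "31.12.2013";
        for (int i = 0; i < text.length(); i++) {
            glyphs.add(new Glyph(text.charAt(i), 100, 700 - i * 6, 10));
        }
        System.out.println(groupIntoLines(glyphs, false)); // one entry per character
        System.out.println(groupIntoLines(glyphs, true));  // a single line
    }
}
```

Treating the y step as a line change reproduces the one-character-per-line paste from the description, while comparing along the correct axis keeps the date on one line.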

        Patrick Corless added a comment -

        Marking as resolved.


          People

          • Assignee:
            Patrick Corless
            Reporter:
            Arran Mccullough
          • Votes:
            0
            Watchers:
            2
