ICEpdf / PDF-1073

Consolidate Page text extraction sorting calls

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.1.3
    • Fix Version/s: 6.2
    • Component/s: API, Core/Parsing
    • Labels:
      None
    • Environment:
      any

      Description

      A community member is migrating from 4.x to 6.x and has run into a few regressions in the results of the page text extraction calls. After some digging, it appears that the document.getPageText() method does not execute the same extraction algorithms as the page.getPageText() call.

      This bug is a placeholder to review the text extraction API and make sure the non-visual page extraction calls use the same sorting calls as the visual page extraction calls.

        Activity

        Patrick Corless added a comment -

        I've reviewed our code and things seem to be in order. The sorting and formatting take place in the PageText method ArrayList<LineText> getPageLines(). The Document and Page methods getPageText() and getPageViewText() work as the Javadoc describes: each sets up the parser configuration differently, and getPageText() can be a lot faster for straight extraction because it does no page image capture.

        I've also touched up the viewer RI text extraction calls and the extraction examples to use the fontProperties manager, which speeds up the start time of the examples.
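        The property this ticket checks, that the non-visual and visual extraction paths share one ordering routine (the comment above places it in PageText's getPageLines()), can be illustrated with a small, self-contained Java sketch. Everything in it is hypothetical: Line, sortLines(), extractText() and extractViewText() are stand-ins loosely mirroring getPageText() and getPageViewText(), not ICEpdf's API, and the top-to-bottom, left-to-right ordering (assuming PDF user space, where y grows upward) is an assumed rule, not ICEpdf's actual sort. The record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch: both extraction paths delegate ordering to a single
 * sortLines() routine so they cannot drift apart. These types are
 * illustrative stand-ins, not ICEpdf's PageText/LineText classes.
 */
public class TextExtractionSketch {

    /** A line of text with its left x and baseline y (assumed PDF user space). */
    record Line(String text, double x, double y) {}

    /** The one shared ordering: larger y first (top of page), then smaller x first. */
    static List<Line> sortLines(List<Line> raw) {
        List<Line> sorted = new ArrayList<>(raw); // copy so the input is untouched
        sorted.sort(Comparator.comparingDouble(Line::y).reversed()
                .thenComparingDouble(Line::x));
        return sorted;
    }

    /** Stand-in for the fast, non-visual path (no page image capture). */
    static String extractText(List<Line> raw) {
        return join(sortLines(raw));
    }

    /** Stand-in for the visual path; it reuses the same sort rather than its own. */
    static String extractViewText(List<Line> raw) {
        return join(sortLines(raw));
    }

    /** Joins line texts, one per output line. */
    private static String join(List<Line> lines) {
        StringBuilder sb = new StringBuilder();
        for (Line l : lines) {
            sb.append(l.text()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Content-stream order is deliberately scrambled.
        List<Line> raw = List.of(
                new Line("second line", 72, 700),
                new Line("left column", 72, 720),
                new Line("right column", 300, 720));
        String a = extractText(raw);
        String b = extractViewText(raw);
        System.out.print(a);            // left column, right column, second line
        System.out.println(a.equals(b)); // true: both paths share sortLines()
    }
}
```

        Because neither path sorts on its own, a change to the ordering rule reaches both at once, which is the consistency the ticket set out to confirm.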

        Patrick Corless added a comment -

        Marking as fixed.


          People

          • Assignee: Patrick Corless
          • Reporter: Patrick Corless
          • Votes: 0
          • Watchers: 1
