ICEpdf / PDF-438

Extracting text from document doesn't work properly.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.3.2
    • Fix Version/s: 4.3.4
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      ICEpdf PRO 4.3.2, ICEpdf Viewer

      Description

      While extracting text from the attached document, I found that the line:

      "last flight (if one was defined for that flight). Regardless of the data,"

      consists of 2 LineText objects:
      1. "last flight (if one was" and
      2. "defined for that flight). Regardless of the data,".

      It looks like the space between the words "was" and "defined" is missing, so a search for the word "defined" does not find it.

      Adding a space manually between LineText objects causes a problem in a different line:

      "airport reference point latitude/longitude position shows adjacent to the".

      It consists of:
      1. "airport reference p" and
      2. "oint latitude/longitude position shows adjacent to the".

      If I put a space between them, I get "airport reference p oint latitude/longitude position shows adjacent to the", and a search for the word "point" fails.
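      The dilemma described above can be reproduced without ICEpdf. The following is a minimal sketch that uses only the fragment strings quoted in this report; the class `FragmentJoinDemo` and its helpers `join` and `containsWord` are hypothetical names made up for illustration, and the whole-word search is an assumed stand-in for the viewer's search, not ICEpdf's actual implementation.

```java
import java.util.Arrays;
import java.util.List;

class FragmentJoinDemo {
    // Whole-word search (an assumption): split on whitespace and compare tokens exactly.
    static boolean containsWord(String line, String word) {
        return Arrays.asList(line.split("\\s+")).contains(word);
    }

    // Join the text fragments of one visual line with the given separator.
    static String join(List<String> fragments, String separator) {
        return String.join(separator, fragments);
    }

    public static void main(String[] args) {
        // Fragment pairs exactly as quoted in the report.
        List<String> first = Arrays.asList(
                "last flight (if one was",
                "defined for that flight). Regardless of the data,");
        List<String> second = Arrays.asList(
                "airport reference p",
                "oint latitude/longitude position shows adjacent to the");

        // Try both joining strategies: no separator, then a single space.
        for (String separator : new String[] {"", " "}) {
            System.out.println("separator='" + separator + "'"
                    + " defined=" + containsWord(join(first, separator), "defined")
                    + " point=" + containsWord(join(second, separator), "point"));
        }
    }
}
```

      With an empty separator the first line yields "wasdefined", so "defined" is not found; with a single space the second line yields "p oint", so "point" is not found. Neither fixed separator works for both lines, which suggests the fragments should not have been split in the first place.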
      1. example.pdf
        47 kB
        Evgheni Sadovoi

        Activity

        Repository: ICEsoft Public SVN Repository
        Revision: #30469
        Date: Fri Aug 10 14:23:20 MDT 2012
        User: patrick.corless
        Message: PDF-438 updated text extraction new line detection to round to the nearest int. We had a few corner cases where extra line spaces were being inserted, because of float numbers precision issues.
        Files Changed
        MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/ContentParser.java
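        The commit message attributes the spurious line breaks to float precision in new-line detection and says the fix rounds to the nearest int. The sketch below illustrates that idea only; it is not the code from ContentParser.java. The class `NewLineDetectionSketch`, both method names, and the sample Y coordinates are hypothetical, and it assumes a new line is detected by comparing the Y position of consecutive pieces of text.

```java
class NewLineDetectionSketch {
    // Naive check (assumed pre-fix behaviour): any difference in the
    // float Y coordinate is treated as the start of a new line.
    static boolean isNewLineExact(float previousY, float currentY) {
        return previousY != currentY;
    }

    // Rounded check: compare the coordinates after rounding each to the
    // nearest int, so tiny float drift no longer counts as a line change.
    static boolean isNewLineRounded(float previousY, float currentY) {
        return Math.round(previousY) != Math.round(currentY);
    }

    public static void main(String[] args) {
        float previousY = 712.32f;   // hypothetical baseline of the previous text run
        float sameLineY = 712.3201f; // same visual line, with a small precision drift
        float nextLineY = 698.32f;   // a genuinely different line

        System.out.println("exact, same line:   " + isNewLineExact(previousY, sameLineY));
        System.out.println("rounded, same line: " + isNewLineRounded(previousY, sameLineY));
        System.out.println("rounded, next line: " + isNewLineRounded(previousY, nextLineY));
    }
}
```

        Rounding trades one edge case for another: two coordinates that straddle a .5 boundary (for example 712.49 and 712.51) still round to different ints, so a tolerance-based comparison would be an alternative design.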

          People

          • Assignee:
            Patrick Corless
            Reporter:
            Evgheni Sadovoi
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: