ICEpdf / PDF-1173

Document.getPageText hangs on some files

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 6.2.4
    • Fix Version/s: None
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      Windows 10 64-bit, Java 1.8.0_131-b11

      Description

      Hello,

      When I try to extract text from some PDFs, the call to Document.getPageText() hangs and never returns.

      This looks like an infinite loop in the parser code. I found the following similar tickets, which were reported and fixed in the past:
      http://jira.icesoft.org/browse/PDF-846
      http://jira.icesoft.org/browse/PDF-689

      Here is the thread dump:

      "reactor1" #13 prio=5 os_prio=0 tid=0x000000001d488800 nid=0xe1c runnable [0x000000002163e000]
         java.lang.Thread.State: RUNNABLE
      at org.icepdf.core.util.content.a.a(Unknown Source)
      at org.icepdf.core.util.content.a.a(Unknown Source)
      at org.icepdf.core.util.content.a.a(Unknown Source)
      at org.icepdf.core.util.content.a.a(Unknown Source)
      at org.icepdf.core.util.content.a.a(Unknown Source)
      at org.icepdf.core.util.content.a.a(Unknown Source)
      at org.icepdf.core.util.content.a.a(Unknown Source)
      at org.icepdf.core.util.content.a.a(Unknown Source)
      at org.icepdf.core.util.content.a.a(Unknown Source)
      at org.icepdf.core.util.content.a.a(Unknown Source)
      at org.icepdf.core.util.content.a.a(Unknown Source)
      at org.icepdf.core.util.content.NContentParser.parseText(Unknown Source)
      at org.icepdf.core.util.content.NContentParser.parseTextBlocks(Unknown Source)
      at org.icepdf.core.pobjects.Page.getText(Page.java:1571)
      - locked <0x0000000080025d28> (a org.icepdf.core.pobjects.Page)
      at org.icepdf.core.pobjects.Document.getPageText(Document.java:1174)
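      Because the frames in the dump are obfuscated (a.a), a single snapshot cannot show whether the parser is looping at a bounded depth or recursing without limit. A minimal sketch for sampling the stack depth of the stuck thread over time follows; the class and method names are hypothetical, and only standard JDK calls are used. The thread name "reactor1" is taken from the dump above.

```java
import java.util.Map;

// Hypothetical helper, not part of ICEpdf: samples the stack depth of a named thread.
final class StackSampler {
    private StackSampler() {}

    /** Returns the current stack depth of the first live thread with this name, or -1 if none is found. */
    static int stackDepth(String threadName) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (e.getKey().getName().equals(threadName)) {
                return e.getValue().length;
            }
        }
        return -1;
    }

    /** Takes several depth samples, sleeping intervalMillis between consecutive samples. */
    static int[] sampleDepths(String threadName, int samples, long intervalMillis)
            throws InterruptedException {
        int[] depths = new int[samples];
        for (int i = 0; i < samples; i++) {
            depths[i] = stackDepth(threadName);
            if (i < samples - 1) {
                Thread.sleep(intervalMillis);
            }
        }
        return depths;
    }
}
```

      Calling StackSampler.sampleDepths("reactor1", 10, 1000) from another thread while the call is stuck would give ten depth readings. A depth that keeps growing would point toward unbounded recursion, while a depth that stays within a fixed range would be more consistent with a loop; either reading is only a hint, not a diagnosis.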

      Unfortunately, I can't provide a sample document because it contains very sensitive data. However, I'm happy to run any tests locally to debug the issue and report the results to you.
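      Until the root cause is found, one possible way to keep the calling thread from blocking forever is to run the extraction on a worker thread with a timeout. The sketch below is a generic wrapper built only on java.util.concurrent; the class name TimedCall and the thread name are hypothetical, and it does not fix the parser itself.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of ICEpdf: runs a task with an upper time bound.
final class TimedCall {
    private TimedCall() {}

    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        // Daemon thread: a task stuck in a loop that ignores interrupts cannot be
        // stopped, but at least it will not keep the JVM alive on shutdown.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "page-text-worker");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Best effort: this only helps if the task responds to interruption.
                future.cancel(true);
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

      Assuming Document.getPageText takes a page index, a call might look like TimedCall.callWithTimeout(() -> document.getPageText(pageIndex), 30000). Note that the dump shows the stuck thread holding the monitor of the Page object, so if the worker never finishes, later calls for the same page may block as well; the timeout only protects the caller.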

      Thanks.

          People

          • Assignee: Patrick Corless
          • Reporter: Igor R
          • Votes: 0
          • Watchers: 2
