Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 5.0.1
Fix Version/s: 5.0.2
Component/s: Core/Parsing
Labels: None
Environment: Pro, JAI, -Xmx512m
Salesforce Case Reference:
Description
Unless extra memory is allocated, loading the test PDF runs out of Java heap space. I've allocated 512 MB (-Xmx512m), which does seem to help, but eventually the sample PDF viewer crashes with the following:
WARNING: Fatal error parsing PDF file stream.
java.util.EmptyStackException
at java.util.Stack.peek(Stack.java:85)
at java.util.Stack.pop(Stack.java:67)
at org.icepdf.core.util.Parser.getObject(Parser.java:143)
at org.icepdf.core.util.LazyObjectLoader.loadObject(LazyObjectLoader.java:73)
at org.icepdf.core.util.Library.getObject(Library.java:123)
at org.icepdf.core.pobjects.PageTree.getPageOrPagesPotentiallyNotInitedFromReferenceAt(PageTree.java:238)
at org.icepdf.core.pobjects.PageTree.getPagePotentiallyNotInitedByRecursiveIndex(PageTree.java:257)
at org.icepdf.core.pobjects.PageTree.getPage(PageTree.java:326)
at org.icepdf.core.pobjects.Document.getPageText(Document.java:1119)
at test.IcePdfCrashTest.callPageText(IcePdfCrashTest.java:38)
at test.IcePdfCrashTest.main(IcePdfCrashTest.java:24)
java.util.EmptyStackException
at java.util.Stack.peek(Stack.java:85)
at java.util.Stack.pop(Stack.java:67)
at org.icepdf.core.util.Parser.getObject(Parser.java:143)
at org.icepdf.core.util.LazyObjectLoader.loadObject(LazyObjectLoader.java:73)
at org.icepdf.core.util.Library.getObject(Library.java:123)
at org.icepdf.core.pobjects.PageTree.getPageOrPagesPotentiallyNotInitedFromReferenceAt(PageTree.java:238)
at org.icepdf.core.pobjects.PageTree.getPagePotentiallyNotInitedByRecursiveIndex(PageTree.java:257)
at org.icepdf.core.pobjects.PageTree.getPage(PageTree.java:326)
at org.icepdf.core.pobjects.Document.getPageText(Document.java:1119)
at test.IcePdfCrashTest.callPageText(IcePdfCrashTest.java:38)
at test.IcePdfCrashTest.main(IcePdfCrashTest.java:24)
May 30, 2013 4:33:55 PM org.icepdf.core.util.Parser getObject
Is this simply a case of needing more memory? The PDF is 300mb+.
5.0 requires more memory to load this PDF because loading the xref table fails, and as a result a linear traversal of the whole PDF must take place. I've isolated the problem to number parsing in the Parser class. The file in question is quite large at 307 MB, and as a result the number that represents the xref offset is quite large.
In 5.0 a custom number parsing algorithm was introduced that returns a float. Unfortunately, for this file a float does not have enough precision to hold the value exactly, so a double needs to be used. I've updated the parser to return doubles for all numbers. Not optimal, but easier than putting in a special-case implementation.
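To illustrate the precision problem described above, here is a minimal sketch. It is not the ICEpdf Parser code: the class name, the helper methods and the offset value are all made up for illustration, and Float.parseFloat stands in for the custom number parser. It only shows what happens when a byte offset of roughly 307 MB is pushed through a float versus a double.

```java
// Hypothetical illustration only; this is NOT ICEpdf's Parser implementation.
class XrefOffsetPrecision {

    // Parse a decimal offset via float, standing in for a parser that returns float.
    static long viaFloat(String digits) {
        return (long) Float.parseFloat(digits);
    }

    // Parse the same digits via double, as the fix describes.
    static long viaDouble(String digits) {
        return (long) Double.parseDouble(digits);
    }

    public static void main(String[] args) {
        // Made-up byte offset, roughly 307 MB into a file (assumption, not from the ticket).
        String offset = "321912345";
        long exact = Long.parseLong(offset);

        System.out.println("exact : " + exact);
        System.out.println("float : " + viaFloat(offset));
        System.out.println("double: " + viaDouble(offset));

        // Gap between neighbouring float values at this magnitude.
        System.out.println("float spacing here: " + Math.ulp((float) exact));
    }
}
```

A float carries a 24-bit significand, so not every integer above 16,777,216 can be represented; at around 300 million, adjacent float values are 32 apart. An offset rounded this way would point a few bytes away from the real xref position, which is consistent with the xref load failing and the parser falling back to a linear traversal. A double represents every integer up to 2^53 exactly, which covers any realistic file offset.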