Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 4.3
- Fix Version/s: 5.0.0 alpha1, 5.0.0 beta1, 5.0
- Component/s: Core/Parsing
- Labels: None
- Environment: any
Description
When building the PostScript calculator for type 4 function support I did quite a bit of research into parsing techniques. The end result was a relatively quick parsing engine. Once this work was completed I started working on a new PDF content parser system using the same techniques. In theory the new parser should be on the order of 50x faster than the current one.
The ContentParser in ICEpdf is tightly coupled with the generic Parser class. The Parser class feeds the ContentParser tokens for processing. This Parser is multipurpose, handling both stream and dictionary parsing as well as providing tokens in a page content stream. The main problem here is that content stream operand tokens are returned as strings from the parser, and then .equals is used by the ContentParser to find and execute the matching command. There are 90-plus operand tokens, which is a lot of string comparisons that we could be doing more efficiently.
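As an illustration of the kind of saving described above, here is a minimal sketch of one alternative to a chain of .equals calls: pack the raw bytes of a short token into an int and dispatch with a switch. This is not the code that was checked in for this issue. The class name OperatorSketch, the OP_* constants and the describe method are all hypothetical, and only a handful of operators are shown.

```java
import java.nio.charset.StandardCharsets;

public class OperatorSketch {
    // Hypothetical operator codes: the ASCII bytes of the token packed into one int.
    static final int OP_BT = ('B' << 8) | 'T';
    static final int OP_ET = ('E' << 8) | 'T';
    static final int OP_Tj = ('T' << 8) | 'j';
    static final int OP_re = ('r' << 8) | 'e';
    static final int OP_m = 'm';

    // Packs up to four bytes of a token into an int, first byte most significant.
    static int pack(byte[] buf, int start, int len) {
        int code = 0;
        for (int i = 0; i < len; i++) {
            code = (code << 8) | (buf[start + i] & 0xFF);
        }
        return code;
    }

    // A single switch on the packed code replaces a long run of String.equals calls.
    static String describe(int code) {
        switch (code) {
            case OP_BT: return "begin text";
            case OP_ET: return "end text";
            case OP_Tj: return "show text";
            case OP_re: return "rectangle";
            case OP_m:  return "move to";
            default:    return "unknown";
        }
    }

    public static void main(String[] args) {
        byte[] stream = "BT Tj ET".getBytes(StandardCharsets.US_ASCII);
        System.out.println(describe(pack(stream, 0, 2)));
        System.out.println(describe(pack(stream, 3, 2)));
    }
}
```

The point of the sketch is that no String is allocated per token and the comparison cost does not grow with the number of operators, because the compiler can turn the switch into a table or binary lookup.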
One further problem with the Parser class is that it assumes that a content stream is always well formed and that operands, names and numbers will always be whitespace separated. This is not the case, and a new setup should be able to determine tokens even if spaces are not present.
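To make the missing-whitespace case concrete, the following is a simplified sketch of a tokenizer that ends a token at a delimiter character as well as at whitespace. It is an assumption-laden illustration rather than the parser written for this issue: the class name TokenSketch is hypothetical, tokens are returned as strings only for readability, and hex strings, dictionaries, comments and inline images are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSketch {
    static boolean isWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f' || c == 0;
    }

    // Only the delimiters this sketch understands; the full PDF set is larger.
    static boolean isDelimiter(char c) {
        return c == '/' || c == '(' || c == ')' || c == '[' || c == ']';
    }

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (c == '(') {
                // Literal string: track nested parentheses, skip escaped characters.
                int depth = 0;
                do {
                    char d = s.charAt(i);
                    if (d == '\\') {
                        i++;
                    } else if (d == '(') {
                        depth++;
                    } else if (d == ')') {
                        depth--;
                    }
                    i++;
                } while (i < n && depth > 0);
            } else if (c == '[' || c == ']') {
                i++; // array brackets are single-character tokens
            } else {
                // Name, number or operator: runs until whitespace or the next delimiter.
                i++;
                while (i < n && !isWhitespace(s.charAt(i)) && !isDelimiter(s.charAt(i))) {
                    i++;
                }
            }
            tokens.add(s.substring(start, Math.min(i, n)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("BT/F1 12 Tf(Hi)Tj[(A)-5(B)]TJ ET"));
    }
}
```

With this approach an input such as `BT/F1 12 Tf(Hi)Tj` still splits into `BT`, `/F1`, `12`, `Tf`, `(Hi)` and `Tj`, even though several of those tokens are not separated by spaces.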
I've already done quite a bit of work on this. I will likely create a 4.3 branch and use the trunk to start checking in work for this optimization.
Activity
Patrick Corless
created issue -
Patrick Corless
made changes -
Field | Original Value | New Value |
---|---|---|
Salesforce Case | [] | |
Fix Version/s | 5.0 [ 10314 ] |
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #32658 | Fri Dec 07 11:24:14 MST 2012 | patrick.corless | |
Files Changed | ||||
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/Parser.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/ContentParser.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/ImageUtility.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/Lexer.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/TilingPattern.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/LexerTest.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #32677 | Sun Dec 09 10:35:38 MST 2012 | patrick.corless | |
Files Changed | ||||
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/ContentParser.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/ImageUtility.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/Lexer.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/LexerTest.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #33684 | Wed Feb 27 17:25:29 MST 2013 | patrick.corless | |
Files Changed | ||||
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/annotations/TextAnnotation.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #33685 | Wed Feb 27 17:28:06 MST 2013 | patrick.corless | |
Files Changed | ||||
ADD /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/acroform/FieldDictionary.java
ADD /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/annotations/WidgetAnnotation.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/annotations/Annotation.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/TilingPattern.java
ADD /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/acroform
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/annotations/LinkAnnotation.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #33686 | Wed Feb 27 17:34:30 MST 2013 | patrick.corless | |
Files Changed | ||||
DEL /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/OperandNames.java
DEL /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/Lexer.java
DEL /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/OperatorFactory.java
DEL /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/LexerTest.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #33721 | Fri Mar 01 14:09:10 MST 2013 | patrick.corless | |
Files Changed | ||||
MODIFY /icepdf/trunk/icepdf/viewer/src/org/icepdf/ri/common/views/PageViewComponentImpl.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/Page.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/Library.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #33771 | Tue Mar 05 13:36:28 MST 2013 | patrick.corless | |
Files Changed | ||||
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/io/SeekableInputConstrainedWrapper.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #33780 | Tue Mar 05 16:32:50 MST 2013 | patrick.corless | |
Files Changed | ||||
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/Document.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #33792 | Wed Mar 06 08:21:24 MST 2013 | patrick.corless | |
Files Changed | ||||
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/AbstractContentParser.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #33793 | Wed Mar 06 08:22:53 MST 2013 | patrick.corless | |
Files Changed | ||||
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/Form.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #33803 | Wed Mar 06 13:51:07 MST 2013 | patrick.corless | |
Files Changed | ||||
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/AbstractContentParser.java
Patrick Corless
made changes -
Status | Open [ 1 ] | Resolved [ 5 ] |
Fix Version/s | 5.0.0 beta1 [ 10677 ] | |
Fix Version/s | 5.0.0 alpha1 [ 10676 ] | |
Resolution | Fixed [ 1 ] |
Patrick Corless
made changes -
Status | Resolved [ 5 ] | Closed [ 6 ] |
A rather massive check-in will follow these comments. A new content parser has been cut into the core. New features include, but are not limited to:
PDF-13 has been solved and multiple threads can now access a document's pages.