Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 5.1.1
Fix Version/s: 5.1.2
Component/s: Core/Parsing
Labels: None
Environment: N/A
Description
I have a PDF whose page Contents array contains two streams. The first one consists of
{code}
/Basemap_Form Do
{code}
the second one starts with
{code}
q
1.0 0.0 0.0 1.0 103.680000 51.840000 cm
/LGIT:W Do
Q
{code}
OContentParser#parse receives these streams as two byte[] objects, which are then concatenated by a ByteDoubleArrayInputStream that presents them as a single InputStream. This causes incorrect parsing.
During parsing, the parser sees the following sequence of tokens:
{code}
/Basemap_Form
Doq
1.0
0.0
...
{code}
The Doq token results from concatenating the two streams without a white-space character to separate the Do and q tokens. The PDF specification for the page Contents entry states that the division between streams may occur only at lexical token boundaries, so the parser must insert a token boundary (for example, a white-space character) between consecutive streams.
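A minimal, self-contained reproduction in plain Java is sketched here under stated assumptions: the JDK's java.io.SequenceInputStream stands in for ICEpdf's ByteDoubleArrayInputStream (both join the byte arrays back to back with nothing in between), the hypothetical tokenize method splits only on PDF white-space bytes instead of using ICEpdf's real lexer, and the first stream is assumed to end right after Do with no trailing white space.

{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NaiveConcatDemo {

    // PDF white-space bytes: NUL, HT, LF, FF, CR and SP.
    static boolean isPdfWhitespace(int b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    // Hypothetical, deliberately simplified tokenizer: splits on white space only
    // and ignores delimiters, strings and comments. It is NOT ICEpdf's lexer.
    static List<String> tokenize(InputStream in) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (isPdfWhitespace(b)) {
                    if (current.length() > 0) {
                        tokens.add(current.toString());
                        current.setLength(0);
                    }
                } else {
                    current.append((char) b);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Joins the two content streams back to back with no separator,
    // which is the behaviour the ticket describes.
    static List<String> naiveTokens(byte[] first, byte[] second) {
        InputStream joined = new SequenceInputStream(
                new ByteArrayInputStream(first), new ByteArrayInputStream(second));
        return tokenize(joined);
    }

    public static void main(String[] args) {
        // Assumption: the first stream ends right after "Do" with no trailing white space.
        byte[] first = "/Basemap_Form Do".getBytes(StandardCharsets.US_ASCII);
        byte[] second = ("q\n1.0 0.0 0.0 1.0 103.680000 51.840000 cm\n"
                + "/LGIT:W Do\nQ").getBytes(StandardCharsets.US_ASCII);
        // Prints [/Basemap_Form, Doq, 1.0, 0.0, 0.0, 1.0, 103.680000, 51.840000, cm, /LGIT:W, Do, Q]
        System.out.println(naiveTokens(first, second));
    }
}
{code}

The second token comes out as Doq, matching the token sequence reported above.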
Using org.icepdf.core.io.SequenceInputStream with a ' ' separator character resolves the parsing problem.
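The separator approach can be sketched as an InputStream that reads the streams in order and emits one separator byte at each boundary between them. The class SeparatedSequenceInputStream and its of(...) factory are hypothetical names used only for illustration; this is not the source or API of org.icepdf.core.io.SequenceInputStream, just an assumption about how such a stream could behave.

{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; not the API of org.icepdf.core.io.SequenceInputStream.
// Reads the given streams in order and emits one separator byte between
// consecutive streams, so no token can span a stream boundary.
public class SeparatedSequenceInputStream extends InputStream {
    private final Iterator<InputStream> streams;
    private final int separator;
    private InputStream current;
    private boolean pendingSeparator;

    public SeparatedSequenceInputStream(List<InputStream> streams, char separator) {
        this.streams = streams.iterator();
        this.separator = separator & 0xFF; // treated as a single byte
        this.current = this.streams.hasNext() ? this.streams.next() : null;
    }

    // Convenience factory for the Page/Contents case: one decoded byte[] per stream.
    public static SeparatedSequenceInputStream of(char separator, byte[]... parts) {
        List<InputStream> list = new ArrayList<>();
        for (byte[] part : parts) {
            list.add(new ByteArrayInputStream(part));
        }
        return new SeparatedSequenceInputStream(list, separator);
    }

    @Override
    public int read() throws IOException {
        while (true) {
            if (pendingSeparator) {
                pendingSeparator = false;
                return separator;          // token boundary between two streams
            }
            if (current == null) {
                return -1;                 // all streams consumed
            }
            int b = current.read();
            if (b != -1) {
                return b;
            }
            // Current stream exhausted: advance, and emit the separator before its first byte.
            current.close();
            if (streams.hasNext()) {
                current = streams.next();
                pendingSeparator = true;
            } else {
                current = null;
            }
        }
    }

    @Override
    public void close() throws IOException {
        pendingSeparator = false;
        if (current != null) {
            current.close();
            current = null;
        }
        while (streams.hasNext()) {
            streams.next().close();
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = SeparatedSequenceInputStream.of(' ',
                "/Basemap_Form Do".getBytes(StandardCharsets.US_ASCII),
                "q\n/LGIT:W Do\nQ".getBytes(StandardCharsets.US_ASCII))) {
            String text = new String(in.readAllBytes(), StandardCharsets.US_ASCII);
            // Prints "/Basemap_Form Do q|/LGIT:W Do|Q" (newlines shown as |)
            System.out.println(text.replace('\n', '|'));
        }
    }
}
{code}

Emitting the separator unconditionally should be harmless: extra white space between tokens carries no meaning in a content stream, so it does not matter whether a stream already ended with a newline or was empty.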
Activity
||Repository||Revision||Date||User||Files Changed||
|ICEsoft Public SVN Repository|#44159|Tue Mar 03 10:21:34 MST 2015|patrick.corless|MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/OContentParser.java|
|ICEsoft Public SVN Repository|#44134|Fri Feb 27 13:26:54 MST 2015|patrick.corless|MODIFY /icepdf/branches/icepdf-5.0.1/icepdf/core/src/org/icepdf/core/util/content/OContentParser.java \\ MODIFY /icepdf/branches/icepdf-5.0.1/icepdf/core/src/org/icepdf/core/pobjects/Page.java|
|ICEsoft Public SVN Repository|#44131|Fri Feb 27 08:50:34 MST 2015|patrick.corless|MODIFY /icepdf/branches/icepdf-5.0.1/icepdf/core/src/org/icepdf/core/pobjects/Page.java|