Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 5.1.1
- Fix Version/s: 5.1.2
- Component/s: Core/Parsing
- Labels: None
- Environment: N/A
Description
I have a PDF where the page's Contents entry contains two streams. The first one consists of
{code}
/Basemap_Form Do
{code}
the second one starts with
{code}
q
1.0 0.0 0.0 1.0 103.680000 51.840000 cm
/LGIT:W Do
Q
{code}
OContentParser#parse receives these streams as two byte[] objects, which are then concatenated using a ByteDoubleArrayInputStream that presents the byte[]s as a single InputStream. This causes incorrect parsing.
During parsing, the parser sees the following sequence of tokens:
{code}
/Basemap_Form
Doq
1.0
0.0
...
{code}
The Doq token is the result of concatenating the two streams without introducing a white-space character to separate the Do and q tokens. The PDF spec on Page/Contents states that the division between streams is always at a lexical token boundary, so the parser needs to insert a token boundary between the streams somehow.
Using org.icepdf.core.io.SequenceInputStream with a ' ' separator character resolves the parsing problem.
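The effect of the separator can be reproduced outside ICEpdf with a minimal sketch. The assumptions are loud ones: the class and method names (ContentStreamJoin, join, tokens) are hypothetical, the standard java.io.SequenceInputStream stands in for org.icepdf.core.io.SequenceInputStream (whose constructor is not shown in this report), and the tokenizer only splits on whitespace, which is far simpler than the real content parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of ICEpdf.
class ContentStreamJoin {

    // Presents several content streams as one InputStream.
    // When insertSeparator is true, a single space is placed between
    // consecutive streams so that tokens cannot fuse across the boundary.
    static InputStream join(List<byte[]> streams, boolean insertSeparator) {
        List<InputStream> parts = new ArrayList<>();
        for (int i = 0; i < streams.size(); i++) {
            if (i > 0 && insertSeparator) {
                parts.add(new ByteArrayInputStream(new byte[] {' '}));
            }
            parts.add(new ByteArrayInputStream(streams.get(i)));
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    // Simplified stand-in for the parser's lexer: splits on whitespace only.
    static List<String> tokens(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        String text = new String(buffer.toByteArray(), StandardCharsets.US_ASCII).trim();
        List<String> result = new ArrayList<>();
        if (!text.isEmpty()) {
            result.addAll(Arrays.asList(text.split("\\s+")));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> streams = Arrays.asList(
            "/Basemap_Form Do".getBytes(StandardCharsets.US_ASCII),
            "q\n1.0 0.0 0.0 1.0 103.680000 51.840000 cm\n/LGIT:W Do\nQ"
                .getBytes(StandardCharsets.US_ASCII));

        // Plain concatenation: the boundary tokens fuse into "Doq".
        System.out.println(tokens(join(streams, false)));
        // With a separator: "Do" and "q" stay distinct tokens.
        System.out.println(tokens(join(streams, true)));
    }
}
```

With the two streams from this report, plain concatenation yields a token list beginning /Basemap_Form, Doq, while the separated version yields /Basemap_Form, Do, q. A single space is a safe choice here because extra whitespace between operators does not change the meaning of a content stream.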
Activity
Added the extra space while we assemble the streams[]. Marking as resolved.