Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 5.1.1
- Fix Version/s: 5.1.2
- Component/s: Core/Parsing
- Labels: None
- Environment: N/A
Description
I have a PDF where the page's Contents entry contains two streams. The first one consists of
{code}
/Basemap_Form Do
{code}
the second one starts with
{code}
q
1.0 0.0 0.0 1.0 103.680000 51.840000 cm
/LGIT:W Do
Q
{code}
OContentParser#parse receives these streams as two byte[] objects, which are then joined using a ByteDoubleArrayInputStream that presents the byte[]s as a single concatenated InputStream. This causes incorrect parsing.
During parsing the parser sees the following sequence of tokens
{code}
/Basemap_Form
Doq
1.0
0.0
...
{code}
The Doq token is the result of concatenating the two streams without introducing a white-space character to separate the Do and q tokens. The PDF spec on Page/Contents states that the division between streams is always at a lexical token boundary, so the parser needs to insert a token boundary between the streams somehow.
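The failure can be reproduced outside ICEpdf with a minimal sketch. It makes several assumptions: the first stream ends without trailing white-space, the JDK's java.io.SequenceInputStream stands in for ByteDoubleArrayInputStream, and a hypothetical white-space-only tokenizer (NaiveConcatDemo.tokenize) stands in for the real content parser's lexer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NaiveConcatDemo {

    // Hypothetical, simplified tokenizer: splits only on PDF white-space bytes.
    static List<String> tokenize(InputStream in) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1) {
                boolean whiteSpace = b == ' ' || b == '\n' || b == '\r'
                        || b == '\t' || b == '\f' || b == 0;
                if (whiteSpace) {
                    if (current.length() > 0) {
                        tokens.add(current.toString());
                        current.setLength(0);
                    }
                } else {
                    current.append((char) b);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Joins the two content streams back to back, with no separator byte.
    static List<String> tokenizeNaively(byte[] first, byte[] second) {
        InputStream joined = new SequenceInputStream(
                new ByteArrayInputStream(first), new ByteArrayInputStream(second));
        return tokenize(joined);
    }

    public static void main(String[] args) {
        byte[] first = "/Basemap_Form Do".getBytes(StandardCharsets.US_ASCII);
        byte[] second = "q\n1.0 0.0 0.0 1.0 103.680000 51.840000 cm\n/LGIT:W Do\nQ\n"
                .getBytes(StandardCharsets.US_ASCII);
        // The second token comes out as "Doq" instead of "Do" followed by "q".
        System.out.println(tokenizeNaively(first, second));
    }
}
```

With the sample streams from this report, the last byte of the first stream and the first byte of the second are both non-white-space, so the tokenizer has no way to tell where one operator ends and the next begins.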
Using org.icepdf.core.io.SequenceInputStream with a ' ' separator character resolves the parsing problem.
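The separator approach can be sketched as follows. This is not the ICEpdf implementation: it uses the JDK's java.io.SequenceInputStream rather than org.icepdf.core.io.SequenceInputStream, whose constructor is not shown in this report, and the class and method names (SeparatedConcatDemo, joinWithSeparator, joinToString) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SeparatedConcatDemo {

    // Presents the content streams as one InputStream, with a single
    // space byte inserted between each pair of adjacent streams.
    static InputStream joinWithSeparator(List<byte[]> streams) {
        List<InputStream> parts = new ArrayList<>();
        for (int i = 0; i < streams.size(); i++) {
            if (i > 0) {
                parts.add(new ByteArrayInputStream(new byte[] {' '}));
            }
            parts.add(new ByteArrayInputStream(streams.get(i)));
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    // Reads the joined stream fully so the result can be inspected as text.
    static String joinToString(List<byte[]> streams) {
        try (InputStream in = joinWithSeparator(streams)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] first = "/Basemap_Form Do".getBytes(StandardCharsets.US_ASCII);
        byte[] second = "q\n1.0 0.0 0.0 1.0 103.680000 51.840000 cm\n/LGIT:W Do\nQ\n"
                .getBytes(StandardCharsets.US_ASCII);
        // "Do" and "q" are now separated by the inserted space.
        System.out.println(joinToString(Arrays.asList(first, second)));
    }
}
```

Because the spec only allows stream divisions at token boundaries, an extra white-space byte at each boundary should be harmless even when a stream already ends with white-space.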
Activity
Patrick Corless
made changes -
Field | Original Value | New Value |
---|---|---|
Status | Resolved [ 5 ] | Closed [ 6 ] |
Patrick Corless
made changes -
Field | Original Value | New Value |
---|---|---|
Status | Open [ 1 ] | Resolved [ 5 ] |
Resolution | | Fixed [ 1 ] |
Patrick Corless
made changes -
Field | Original Value | New Value |
---|---|---|
Fix Version/s | | 5.1.2 [ 11872 ] |
Pepijn Van Eeckhoudt
created issue -