[PDF-841] OContentParser concatenates content streams incorrectly - ICEsoft JIRA Issue Tracker

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 5.1.1
Fix Version/s: 5.1.2
Component/s: Core/Parsing
Labels:
None
Environment:
N/A

Description

I have a PDF in which the page's /Contents array contains two streams. The first one consists of
{code}
/Basemap_Form Do
{code}
the second one starts with
{code}
q
1.0 0.0 0.0 1.0 103.680000 51.840000 cm
/LGIT:W Do
Q
{code}

OContentParser#parse receives these streams as two byte[] objects, which are then joined by a ByteDoubleArrayInputStream that presents the byte[]s as a single concatenated InputStream. This causes incorrect parsing.

During parsing the parser sees the following sequence of tokens
{code}
/Basemap_Form
Doq
1.0
0.0
...
{code}

The Doq token is the result of concatenating the two streams without a white-space character to separate the Do and q tokens. The PDF spec's description of the page Contents entry states that the division between streams is always at a lexical token boundary, so the parser needs to insert a token boundary between the streams itself.
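The effect can be reproduced outside ICEpdf with a minimal sketch. The class NaiveConcatDemo and its white-space-only tokenizer are hypothetical stand-ins for illustration, not the OContentParser lexer; they only show that joining the two byte[]s directly fuses Do and q, while a single space between them keeps the tokens apart.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of ICEpdf.
final class NaiveConcatDemo {

    // Simplified tokenizer: splits on white-space only (an assumption,
    // the real content parser has more delimiter rules).
    static List<String> tokens(byte[] data) {
        String text = new String(data, StandardCharsets.ISO_8859_1);
        List<String> result = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Joins two content streams, optionally placing a separator between them.
    static byte[] concat(byte[] first, byte[] second, byte[] separator) {
        byte[] out = new byte[first.length + separator.length + second.length];
        System.arraycopy(first, 0, out, 0, first.length);
        System.arraycopy(separator, 0, out, first.length, separator.length);
        System.arraycopy(second, 0, out, first.length + separator.length, second.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] s1 = "/Basemap_Form Do".getBytes(StandardCharsets.ISO_8859_1);
        byte[] s2 = "q\n1.0 0.0 0.0 1.0 103.680000 51.840000 cm\n/LGIT:W Do\nQ"
                .getBytes(StandardCharsets.ISO_8859_1);

        // No separator: the last token of s1 and the first token of s2 merge.
        System.out.println(tokens(concat(s1, s2, new byte[0])));
        // One space as separator: Do and q remain distinct tokens.
        System.out.println(tokens(concat(s1, s2, new byte[] {' '})));
    }
}
```

The first printed list contains Doq as its second token; the second list contains Do followed by q.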

Using org.icepdf.core.io.SequenceInputStream with a ' ' separator character resolves the parsing problem.
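A rough sketch of the same idea using only the JDK follows. It relies on java.io.SequenceInputStream rather than org.icepdf.core.io.SequenceInputStream, whose constructor is not shown in this issue, and the SeparatedStreams class with its join and readAll helpers is hypothetical. It is not the committed fix, only an illustration of interleaving a one-byte ' ' stream between content streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class; not part of ICEpdf.
final class SeparatedStreams {

    // Presents several content streams as one InputStream, with a single
    // space between consecutive streams so that tokens cannot merge.
    static InputStream join(List<byte[]> streams) {
        List<InputStream> parts = new ArrayList<>();
        for (int i = 0; i < streams.size(); i++) {
            if (i > 0) {
                parts.add(new ByteArrayInputStream(new byte[] {' '}));
            }
            parts.add(new ByteArrayInputStream(streams.get(i)));
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    // Drains the stream into a String (ISO-8859-1 maps bytes one-to-one).
    static String readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] s1 = "/Basemap_Form Do".getBytes(StandardCharsets.ISO_8859_1);
        byte[] s2 = "q\n1.0 0.0 0.0 1.0 103.680000 51.840000 cm\n/LGIT:W Do\nQ"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readAll(join(Arrays.asList(s1, s2))));
    }
}
```

An extra space is harmless when a stream already ends in white-space, since runs of white-space between tokens are treated as a single separator in content streams.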

Activity

Repository	Revision	Date	User	Message
ICEsoft Public SVN Repository	#44159	Tue Mar 03 10:21:34 MST 2015	patrick.corless	~~PDF-841~~ fixed stream concatenation issue with missing spaces between streams.
				Files Changed
				MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/OContentParser.java

Repository	Revision	Date	User	Message
ICEsoft Public SVN Repository	#44134	Fri Feb 27 13:26:54 MST 2015	patrick.corless	~~PDF-841~~ fixed stream concatenation issue with missing spaces between streams.
				Files Changed
				MODIFY /icepdf/branches/icepdf-5.0.1/icepdf/core/src/org/icepdf/core/util/content/OContentParser.java
				MODIFY /icepdf/branches/icepdf-5.0.1/icepdf/core/src/org/icepdf/core/pobjects/Page.java

Repository	Revision	Date	User	Message
ICEsoft Public SVN Repository	#44131	Fri Feb 27 08:50:34 MST 2015	patrick.corless	~~PDF-841~~ touched up the content stream appending to include a space between each stream.
				Files Changed
				MODIFY /icepdf/branches/icepdf-5.0.1/icepdf/core/src/org/icepdf/core/pobjects/Page.java

People

Assignee:

Patrick Corless

Reporter:

Pepijn Van Eeckhoudt

Votes:

0

Watchers:

2

Dates

Created:

16/Dec/14 4:40 AM

Updated:

01/Apr/15 3:01 PM

Resolved:

27/Feb/15 8:56 AM