ICEpdf / PDF-841

OContentParser concatenates content streams incorrectly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.1.1
    • Fix Version/s: 5.1.2
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      N/A

      Description

      I have a PDF whose page Contents entry contains two streams. The first one consists of
      {code}
      /Basemap_Form Do
      {code}
      the second one starts with
      {code}
      q
      1.0 0.0 0.0 1.0 103.680000 51.840000 cm
      /LGIT:W Do
      Q
      {code}

      OContentParser#parse receives these streams as two byte[] objects, which are then joined by a ByteDoubleArrayInputStream that presents the byte[]s as a single concatenated InputStream. This causes incorrect parsing.

      During parsing, the parser sees the following sequence of tokens:
      {code}
      /Basemap_Form
      Doq
      1.0
      0.0
      ...
      {code}

      The Doq token is the result of concatenating the two streams without a white-space character to separate the Do and q tokens. The PDF spec on Page/Contents states that the division between streams is always at a lexical token boundary, so the parser needs to introduce a token boundary wherever one stream ends and the next begins.
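
      The fused token can be reproduced outside ICEpdf with a few lines of Java. This is only a sketch: NaiveConcatDemo and its methods are hypothetical names, java.io.SequenceInputStream stands in for ByteDoubleArrayInputStream, and the tokenizer splits on PDF white space only, which is far simpler than what OContentParser does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of ICEpdf.
class NaiveConcatDemo {

    // Simplified stand-in for a content-stream lexer: splits on PDF white space only.
    static List<String> tokenize(InputStream in) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1) {
                boolean whiteSpace = b == 0 || b == '\t' || b == '\n'
                        || b == '\f' || b == '\r' || b == ' ';
                if (!whiteSpace) {
                    current.append((char) b);
                } else if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Joins the two streams back to back, with nothing in between.
    static List<String> tokenizeBackToBack(byte[] first, byte[] second) {
        return tokenize(new SequenceInputStream(
                new ByteArrayInputStream(first), new ByteArrayInputStream(second)));
    }

    public static void main(String[] args) {
        byte[] first = "/Basemap_Form Do".getBytes(StandardCharsets.US_ASCII);
        // Only the beginning of the second stream, as quoted in the description.
        byte[] second = "q\n1.0 0.0 0.0 1.0 103.680000 51.840000 cm\n/LGIT:W Do\nQ"
                .getBytes(StandardCharsets.US_ASCII);
        // The second token comes out as "Doq" instead of "Do" followed by "q".
        System.out.println(tokenizeBackToBack(first, second));
    }
}
```

      The first stream ends directly after Do and the second begins directly with q, so nothing in the joined byte sequence tells a tokenizer where one operator stops and the next starts.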

      Using org.icepdf.core.io.SequenceInputStream with a ' ' separator character resolves the parsing problem.
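
      A minimal sketch of the separator approach follows. It uses the JDK's java.io.SequenceInputStream rather than org.icepdf.core.io.SequenceInputStream, because the constructor of the ICEpdf class is not shown in this issue; SeparatedConcat, joinWithSeparator and readAll are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not the actual ICEpdf fix.
class SeparatedConcat {

    // Presents all streams as one InputStream, with a separator byte between
    // consecutive streams so tokens cannot fuse across a stream boundary.
    static InputStream joinWithSeparator(List<byte[]> streams, char separator) {
        List<InputStream> parts = new ArrayList<>();
        for (int i = 0; i < streams.size(); i++) {
            if (i > 0) {
                parts.add(new ByteArrayInputStream(new byte[] {(byte) separator}));
            }
            parts.add(new ByteArrayInputStream(streams.get(i)));
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    // Drains the stream into an ASCII string so the result can be inspected.
    static String readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        List<byte[]> streams = Arrays.asList(
                "/Basemap_Form Do".getBytes(StandardCharsets.US_ASCII),
                "q\n1.0 0.0 0.0 1.0 103.680000 51.840000 cm\n/LGIT:W Do\nQ"
                        .getBytes(StandardCharsets.US_ASCII));
        // "Do" and "q" are now separated by a space.
        System.out.println(readAll(joinWithSeparator(streams, ' ')));
    }
}
```

      Because the spec only permits stream divisions at token boundaries, adding white space at each junction cannot split a valid token, and additional white space between tokens does not change how a content stream is interpreted.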

        Activity

        Pepijn Van Eeckhoudt created issue -
        Patrick Corless made changes -
        Fix Version/s: (empty) → 5.1.2 [ 11872 ]
        Repository: ICEsoft Public SVN Repository
        Revision: #44131
        Date: Fri Feb 27 08:50:34 MST 2015
        User: patrick.corless
        Message: PDF-841 touched up the content stream appending to include a space between each stream.
        Files Changed:
        MODIFY /icepdf/branches/icepdf-5.0.1/icepdf/core/src/org/icepdf/core/pobjects/Page.java
        Patrick Corless added a comment -

        Added the extra space while we assemble the streams[]. Marking as resolved.

        Patrick Corless made changes -
        Status: Open [ 1 ] → Resolved [ 5 ]
        Resolution: (empty) → Fixed [ 1 ]
        Repository: ICEsoft Public SVN Repository
        Revision: #44134
        Date: Fri Feb 27 13:26:54 MST 2015
        User: patrick.corless
        Message: PDF-841 fixed stream concatenation issue with missing spaces between streams.
        Files Changed:
        MODIFY /icepdf/branches/icepdf-5.0.1/icepdf/core/src/org/icepdf/core/util/content/OContentParser.java
        MODIFY /icepdf/branches/icepdf-5.0.1/icepdf/core/src/org/icepdf/core/pobjects/Page.java
        Repository: ICEsoft Public SVN Repository
        Revision: #44159
        Date: Tue Mar 03 10:21:34 MST 2015
        User: patrick.corless
        Message: PDF-841 fixed stream concatenation issue with missing spaces between streams.
        Files Changed:
        MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/OContentParser.java
        Patrick Corless made changes -
        Status: Resolved [ 5 ] → Closed [ 6 ]

          People

          • Assignee: Patrick Corless
          • Reporter: Pepijn Van Eeckhoudt
          • Votes: 0
          • Watchers: 2

            Dates

            • Created:
              Updated:
              Resolved: