ICEpdf / PDF-563

Number object in object stream is incorrectly read as a String object

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.3.2, 5.0
    • Fix Version/s: 5.0.1
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      Windows 7 x64 (N/A)
    • Workaround Description:
      We can work around the specific crash described below by checking the page dictionary's /Rotate entry for a String and replacing it with a Number. This is a recurring problem at a client's site because of the peculiar structure of the PDFs their scanner generates. The underlying parse error, however, could have broader repercussions.
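A minimal sketch of that workaround follows. It assumes the page dictionary is available as a plain `Map<String, Object>` keyed by entry name without the leading slash. The class `RotateWorkaround` and its method `normalizeRotate` are hypothetical and are not part of the ICEpdf API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical workaround sketch. The page dictionary is modelled as a plain
 * Map keyed by entry name; this is NOT the ICEpdf API.
 */
final class RotateWorkaround {

    /**
     * If /Rotate holds a String such as "0", replace it with the Integer it
     * spells so that code casting the value to Number no longer fails.
     * Values that are already numbers, and strings that do not parse, are left untouched.
     */
    static void normalizeRotate(Map<String, Object> pageDict) {
        Object rotate = pageDict.get("Rotate");
        if (rotate instanceof String) {
            try {
                pageDict.put("Rotate", Integer.valueOf(((String) rotate).trim()));
            } catch (NumberFormatException e) {
                // Not an integer: leave the entry as-is rather than guess a value.
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> page = new HashMap<>();
        page.put("Rotate", "0"); // what the mis-parsed object stream yields
        normalizeRotate(page);
        System.out.println(page.get("Rotate") instanceof Number); // true
    }
}
```

This only treats the symptom at one call site. Any other entry that resolves to a number stored alone at the end of an object stream would still come back as a String.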

      Description

      The attached PDF was generated by an MFD scanner and structures a page's /Rotate directive in a rather unwieldy way, so that an attempt to render it produces this exception:

      java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
       at org.icepdf.core.pobjects.Page.getPageRotation(Page.java:1166)
       at org.icepdf.core.pobjects.Page.getTotalRotation(Page.java:1139)
       at org.icepdf.core.pobjects.Page.getSize(Page.java:966)
       at org.icepdf.core.pobjects.Page.getSize(Page.java:936)
       at org.icepdf.core.pobjects.Document.getPageDimension(Document.java:873)
       at ourcode.renderPage
       
      The document structure is as follows. Rather than having a nice and easy '/Rotate 0' entry (or no entry at all!), the page dictionary has /Rotate 14 0 R, a reference to an object residing in an object stream (11) that contains no other objects. The target object's text is simply a single '0' character. A bug in the parser means that this is read as a String object rather than a Number object. getPageRotation then fails because it expects (quite rightly!) a Number.
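The following explains why the '0' sits at the very end of the parser's buffer. Per the PDF specification, a decoded object stream begins with /N pairs of integers: an object number, then a byte offset relative to /First. The objects themselves follow. No length is recorded for each object, so the last one simply runs to the end of the stream. The sketch uses a hypothetical class `ObjectStreamSlicer` and works on an already-decoded stream held as a String. The sample content `14 0 0` with /First 5 is an illustrative guess at the layout and is not taken from the attached file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: slices a decoded object stream into per-object text. */
final class ObjectStreamSlicer {

    /**
     * @param decoded the decoded (already decompressed) stream content
     * @param n       value of the stream dictionary's /N entry
     * @param first   value of the stream dictionary's /First entry
     * @return object number mapped to that object's raw text
     */
    static Map<Integer, String> slice(String decoded, int n, int first) {
        // The header holds n pairs: object number, offset relative to /First.
        String[] header = decoded.substring(0, first).trim().split("\\s+");
        if (header.length != 2 * n) {
            throw new IllegalArgumentException("expected " + 2 * n + " header integers");
        }
        Map<Integer, String> objects = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            int objectNumber = Integer.parseInt(header[2 * i]);
            int start = first + Integer.parseInt(header[2 * i + 1]);
            // Assumes offsets appear in increasing order, as the specification requires.
            // An object ends where the next one starts; the last one ends at
            // the end of the stream, with no delimiter required after it.
            int end = (i + 1 < n)
                    ? first + Integer.parseInt(header[2 * i + 3])
                    : decoded.length();
            objects.put(objectNumber, decoded.substring(start, end));
        }
        return objects;
    }

    public static void main(String[] args) {
        // Illustrative layout: one object (14) whose entire body is "0".
        Map<Integer, String> objects = slice("14 0 0", 1, 5);
        System.out.println("[" + objects.get(14) + "]"); // [0]
    }
}
```

Whatever the exact bytes are, the last (here, only) object in a stream need not be followed by any whitespace. The tokenizer can therefore reach end-of-data in the middle of reading it.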

      Looking at the code in Parser, the problem appears to be that the Number case isn't handled correctly when a number coincides with the end of a buffer in Parser.getToken(). Instead, the number is caught by the code at line 806, which is there to avoid losing the last grouping of tokens:

                  // if there are no more bytes (-1) then we should return previous
                  // stringBuffer value, otherwise the last grouping of tokens will
                  // be ignored, which is very bad.
                  if (currentByte >= 0) {
                      currentChar = (char) currentByte;
                  } else {
                      return stringBuffer.toString(); // <<-- here
                  }
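The effect of that early return can be reproduced with a deliberately simplified tokenizer. `BuggyTokenizer` is hypothetical and is not ICEpdf's `Parser`; it keeps only the relevant shape. A token followed by whitespace goes through the number check. A token that runs into the end of the buffer is returned as raw text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, heavily simplified tokenizer that mimics only the
 * end-of-buffer early return quoted above; it is not ICEpdf's Parser.
 */
final class BuggyTokenizer {
    private final ByteArrayInputStream in;

    BuggyTokenizer(String text) {
        in = new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII));
    }

    /**
     * Meant to return a Float for numeric tokens and a String otherwise
     * (null at end of data); the early return breaks this for the last token.
     */
    Object getToken() {
        int currentByte = in.read();
        while (currentByte >= 0 && Character.isWhitespace((char) currentByte)) {
            currentByte = in.read();
        }
        if (currentByte < 0) {
            return null;
        }
        StringBuilder stringBuffer = new StringBuilder();
        while (true) {
            stringBuffer.append((char) currentByte);
            currentByte = in.read();
            char currentChar;
            if (currentByte >= 0) {
                currentChar = (char) currentByte;
            } else {
                // BUG: the raw text is returned here, so the number check
                // below never runs for a token that ends the buffer.
                return stringBuffer.toString();
            }
            if (Character.isWhitespace(currentChar)) {
                break;
            }
        }
        String token = stringBuffer.toString();
        try {
            return Float.valueOf(token);
        } catch (NumberFormatException e) {
            return token;
        }
    }

    public static void main(String[] args) {
        System.out.println(new BuggyTokenizer("0 ").getToken().getClass().getSimpleName()); // Float
        System.out.println(new BuggyTokenizer("0").getToken().getClass().getSimpleName());  // String
    }
}
```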

      I'm attaching a patch against 5.0.0 that fixes the situation and seems to work on the couple of regular PDFs I threw at it, but I'm sure your testing procedures are much more robust! Our patch simply breaks out of the character-reading loop at this point instead. It then tries to identify a number token from its first character and uses that to decide whether to attempt a Float parse or to return the token as a string. The patch might also cause an improperly terminated string literal token to be returned, but I'm not sure that's important.
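Here is a sketch of the approach described above, using the same simplified tokenizer model. Reaching the end of the buffer just ends the character loop, and the finished token is then classified by its first character. `PatchedTokenizer` and `classify` are hypothetical names. The code illustrates the idea only and is not the attached diff.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the reporter's approach (not the attached diff):
 * end of data just ends the character loop, and the token is classified afterwards.
 */
final class PatchedTokenizer {
    private final ByteArrayInputStream in;

    PatchedTokenizer(String text) {
        in = new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII));
    }

    /** Returns a Float for numeric tokens, a String otherwise, or null at end of data. */
    Object getToken() {
        int currentByte = in.read();
        while (currentByte >= 0 && Character.isWhitespace((char) currentByte)) {
            currentByte = in.read();
        }
        if (currentByte < 0) {
            return null;
        }
        StringBuilder stringBuffer = new StringBuilder();
        // -1 (end of data) now simply stops the loop instead of returning early.
        while (currentByte >= 0 && !Character.isWhitespace((char) currentByte)) {
            stringBuffer.append((char) currentByte);
            currentByte = in.read();
        }
        return classify(stringBuffer.toString());
    }

    /** Attempts a Float parse only when the first character can start a PDF number. */
    static Object classify(String token) {
        char first = token.charAt(0);
        if (Character.isDigit(first) || first == '+' || first == '-' || first == '.') {
            try {
                return Float.valueOf(token);
            } catch (NumberFormatException e) {
                // Looked numeric but was not, e.g. "-abc": fall back to the text.
            }
        }
        return token;
    }
}
```

For example, reading `14 0 R` yields 14.0f, 0.0f, "R" and then null, and a lone `0` at end of data now yields 0.0f. Gating on the first character also stops strings that Java's `Float.valueOf` would otherwise accept, such as "NaN" or "Infinity", from turning into numbers.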

      We're currently using 5.0.0 pro but 4.3.2 has the same issue too.
      1. acceptable.pdf
        69 kB
        Ben Day
      2. PDF-563-Parser-patch.diff
        2 kB
        Ben Day

        Activity

        Patrick Corless added a comment -

        Thanks for the patch, Ben. I'll run it through QA and see if there are any side effects.

        Patrick Corless added a comment -

        Closing


          People

          • Assignee: Patrick Corless
          • Reporter: Ben Day
          • Votes: 0
          • Watchers: 2

            Dates

            • Created:
              Updated:
              Resolved: