ICEpdf / PDF-571

Linear traversal offsets are incorrect

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.0
    • Fix Version/s: 6.0
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      ICEpdf 5.0.0 open source release

      Description

      When a PDF file is decoded using linear traversal, the object offsets in the file are calculated during parsing rather than taken from the xref tables. There appear to be errors in this offset calculation, which cause parsing errors later on.
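      The following minimal sketch shows what "calculating offsets during parsing" means: it recovers object offsets by scanning the bytes linearly instead of reading the xref table. LinearObjectScanner, scan and baseOffset are hypothetical names and do not come from ICEpdf's parser. The regular expression is a simplification. It only finds "N G obj" headers at the start of a line, and it could also match look-alike bytes inside stream data.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of linear traversal: find "N G obj" headers by scanning
// the bytes and record where each one starts. Not ICEpdf's implementation.
final class LinearObjectScanner {
    // "N G obj" at the start of a line, e.g. "12 0 obj"
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?m)^(\\d+)\\s+(\\d+)\\s+obj\\b");

    /** Maps "N G" to the absolute file offset of that object's header. */
    static Map<String, Long> scan(byte[] bytes, long baseOffset) {
        // ISO-8859-1 maps every byte to exactly one char, so a char index in
        // the decoded string equals a byte index in 'bytes'.
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        Map<String, Long> offsets = new LinkedHashMap<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            // baseOffset must be the file offset at which 'bytes' begins;
            // any error in it shifts every recorded offset by the same amount.
            offsets.put(m.group(1) + " " + m.group(2), baseOffset + m.start());
        }
        return offsets;
    }
}
```

      Each recovered offset is only as correct as baseOffset. If the scanned bytes do not actually start at that file offset, every entry is shifted by the same amount, and a later seek to one of those offsets lands in the wrong place.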

      The snippet below mimics a scenario in which a file was decoded, its soft references were cleared, and the LazyObjectLoader subsequently attempts to reload an object. This results in an EmptyStackException because the file offset is incorrect.

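      {code}import java.lang.reflect.Field;
      import org.icepdf.core.pobjects.Document;
      import org.icepdf.core.pobjects.Page;
      import org.icepdf.core.pobjects.Reference;
      import org.icepdf.core.util.LazyObjectLoader;
      import org.icepdf.core.util.Library;

      // Open the test file and initialize the first page.
      Document d = new Document();
      d.setFile( "USGS NJ_Jersey_City_20110609_TM_geo.pdf" );
      Page page = d.getPageTree().getPage( 0 );
      page.init();

      // Load object 496 0 R once through the library.
      Library library = page.getLibrary();
      Reference reference = new Reference( 496, 0 );
      library.getObject( reference );

      // Mimic a reload after the soft reference was cleared by calling the
      // library's private LazyObjectLoader directly (throws EmptyStackException).
      Field lolField = library.getClass().getDeclaredField( "lazyObjectLoader" );
      lolField.setAccessible( true );
      LazyObjectLoader lol = ( LazyObjectLoader ) lolField.get( library );
      lol.loadObject( reference );{code}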

        Activity

        Pepijn Van Eeckhoudt added a comment -

        The test file USGS NJ_Jersey_City_20110609_TM_geo.pdf is too large to attach to this issue. It can be obtained from the USGS at http://ims.er.usgs.gov/gda_services/download?item_id=5268881&quad=Jersey%20City&state=NJ&grid=7.5X7.5&series=TNM%20GeoPDF

        Patrick Corless added a comment -

        Thanks for posting this one, very cool file. There is a parsing error in the document's trailer; I'll try to sneak this into 5.0.1, as it's a great test for layers.

        Pepijn Van Eeckhoudt added a comment - edited

        Related to layers, I think the current visibility determination is not quite correct. In 5.0.0, only the visibility of the leaf nodes is checked; the visibility of their parents is not taken into account. I've created PDF-573 for this issue.
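        Here is a minimal sketch of the check proposed in the comment above: a layer counts as visible only if it and every one of its ancestors are switched on. It assumes a simple parent-linked tree. LayerNode and isEffectivelyVisible are hypothetical names and are not part of ICEpdf's optional content classes. The sketch illustrates the rule described in the comment; it is not a statement of how ICEpdf resolves layer visibility.

```java
// Hypothetical layer tree node; not ICEpdf's optional content API.
final class LayerNode {
    final String name;
    final boolean visible;   // this node's own on/off state
    final LayerNode parent;  // null for a top-level layer

    LayerNode(String name, boolean visible, LayerNode parent) {
        this.name = name;
        this.visible = visible;
        this.parent = parent;
    }

    /** True only if this node and all of its ancestors are switched on. */
    boolean isEffectivelyVisible() {
        for (LayerNode n = this; n != null; n = n.parent) {
            if (!n.visible) {
                return false; // a hidden ancestor hides the whole subtree
            }
        }
        return true;
    }
}
```

        A leaf-only check corresponds to reading just `visible` on the leaf. In that case a visible child under a hidden parent would still be drawn, which is the behavior the comment reports.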

        Patrick Corless added a comment -

        Checked in a change that fixes the xref table so that this file can be traversed correctly. I'll check the offset calculation with a simpler file.

        Pepijn Van Eeckhoudt added a comment - edited

        There were a couple of things in the code that looked suspicious and seem to affect the offsets. I didn't have enough time to get things to a working state.

        • BufferedMarkedInputStream#reset contains 'if (markpos > 0)'; this should be 'if (markpos >= 0)'. You hit this case when reset is called immediately after a buffer fill.
        • Document has two variants of skipPastAnyPrefixJunk that return different results. In the test document, objectsOffset ended up being 9 (if I recall correctly), whereas it should be zero. It would be best to have only one variant of that function.
        • Parser creates an instance of BufferedMarkedInputStream but does not initialize fillCount with the current position in the stream. I guess objectsOffset could compensate for that, but it seems like a fairly brittle solution.
        • BufferedMarkedInputStream#fillCount is a pretty bad name for that field.
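        The first and third points can be illustrated with a simplified buffered stream. OffsetTrackingStream is a hypothetical stand-in and not ICEpdf's BufferedMarkedInputStream. It assumes -1 as the "no mark" sentinel, so a mark at buffer index 0 is valid and must pass the reset check. It also seeds its position counter with the file offset at which the wrapped stream starts, so that position() is absolute rather than relative to the wrapper.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical, simplified buffered stream with mark/reset and an absolute
// position counter; not ICEpdf's BufferedMarkedInputStream.
final class OffsetTrackingStream {
    private final InputStream in;
    private final byte[] buf;
    private int count;          // number of valid bytes in buf
    private int pos;            // index of the next byte to return from buf
    private int markpos = -1;   // -1 means "no mark"; 0 is a valid mark
    private long bufferStart;   // absolute file offset of buf[0]

    OffsetTrackingStream(InputStream in, int size, long startOffset) {
        this.in = in;
        this.buf = new byte[size];
        // Seed with where 'in' currently sits in the file, so offsets are
        // absolute instead of counting from zero at construction time.
        this.bufferStart = startOffset;
    }

    /** Absolute file offset of the next byte that read() will return. */
    long position() {
        return bufferStart + pos;
    }

    int read() throws IOException {
        if (pos >= count) {
            fill();
            if (pos >= count) {
                return -1; // end of stream
            }
        }
        return buf[pos++] & 0xff;
    }

    void mark() throws IOException {
        if (pos >= count) {
            fill(); // refill first, so the mark lands at buffer index 0
        }
        markpos = pos;
    }

    void reset() throws IOException {
        // A mark taken right after a fill sits at index 0; a "markpos > 0"
        // check would skip the rewind in exactly that case.
        if (markpos < 0) {
            throw new IOException("reset() without a valid mark");
        }
        pos = markpos;
    }

    private void fill() throws IOException {
        // Simplification: refilling discards the old buffer and any mark.
        bufferStart += count;
        pos = 0;
        count = 0;
        markpos = -1;
        int n = in.read(buf, 0, buf.length);
        if (n > 0) {
            count = n;
        }
    }
}
```

        Consider a version whose reset silently does nothing when markpos is 0. After a mark taken right after a fill, such a reset would leave the stream past the marked byte. Every offset derived from position() afterwards would then be off by the number of bytes read since the mark.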
        Patrick Corless added a comment -

        Marking as fixed.

        Patrick Corless added a comment -

        Marking as closed.


          People

          • Assignee: Patrick Corless
          • Reporter: Pepijn Van Eeckhoudt
          • Votes: 0
          • Watchers: 2
