Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 4.4.1, 5.0
- Fix Version/s: 5.0.3
- Component/s: Core/Parsing
- Labels: None
- Environment: any
- Salesforce Case Reference:
Description
The PDF in question (on the support drive) contains a compressed object stream that is technically malformed:
bytes 7 0 5 136 4 194 3 243 9 275 10 323 1 330 2 420<</CreationDate<443A32303133303332363038333732392D303827303027>/Creator(HP Smart Document Scan Software 3.60)/Producer(OmniPageCSDK18)>><</Dest[1 0 R/XYZ 0 785 null]/Parent 4 0 R/Title(Page 1)>><</Count 1/First 5 0 R/Last 5 0 R/Type/Outlines>>[/PDF/Text/ImageB/ImageC/ImageI]<</ProcSet 3 0 R/XObject<</ImagePart_0 6 0 R>>>>[8 0 R]<</Contents 10 0 R/MediaBox[0 0 612 785]/Parent 2 0 R/Resources 9 0 R/Rotate 0/Type/Page>><</Count 1/Kids[1 0 R]/Type/Pages>>
Our parser expects a space between 420 and <<. Because there is none, it keeps reading until the next number is encountered and, as a result, reports an incorrect byte offset for object 2.
Good news: there is a fairly easy parser tweak to make sure we stop parsing a number as soon as a non-digit character is encountered. Closing.
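The tweak described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the product's actual parser: the helpers `read_int` and `read_objstm_header` are hypothetical, and the pair count `n` is assumed to come from the object stream's /N entry. Each integer ends at the first non-digit byte, so `420<<` yields 420 and leaves the read position on `<<` instead of running on to the next number.

```python
from __future__ import annotations

# Hypothetical sketch, not the product's parser: read the N
# (object number, offset) pairs at the start of a decoded object stream.
# Each integer ends at the first non-digit byte, so no whitespace is
# required between the last offset and the first object.

# PDF whitespace characters: NUL, HT, LF, FF, CR, SP.
WHITESPACE = b"\x00\t\n\f\r "


def read_int(data: bytes, pos: int) -> tuple[int, int]:
    """Parse an unsigned decimal integer at data[pos:]; return (value, next_pos)."""
    # Skip any whitespace before the number.
    while pos < len(data) and data[pos] in WHITESPACE:
        pos += 1
    start = pos
    # Consume digits only; stop at the first non-digit (e.g. the '<' of '<<').
    while pos < len(data) and 0x30 <= data[pos] <= 0x39:
        pos += 1
    if pos == start:
        raise ValueError(f"expected a digit at offset {start}")
    return int(data[start:pos]), pos


def read_objstm_header(data: bytes, n: int) -> tuple[list[tuple[int, int]], int]:
    """Return the n (object number, offset) pairs and the position just past them."""
    pairs = []
    pos = 0
    for _ in range(n):
        obj_num, pos = read_int(data, pos)
        offset, pos = read_int(data, pos)
        pairs.append((obj_num, offset))
    return pairs, pos
```

Applied to the header from this file with `n = 8`, the sketch returns the pairs (7, 0), (5, 136), (4, 194), (3, 243), (9, 275), (10, 323), (1, 330), (2, 420), and the returned position points at `<<`. Per the PDF specification, each offset is relative to the stream's /First value, not to the start of the header.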