Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.1
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      ICEpdf OS; PRO version is OK.

      Description

      The PDF in question has a CID font and a corresponding ToUnicode CMap. A small amount of text is not being mapped correctly from CID to Unicode. Further investigation is needed to find the cause of the mapping issue. My guess is that the CMap parser is incomplete.

        Activity

        Patrick Corless created issue -
        Patrick Corless added a comment -

        CID test file

        Patrick Corless made changes -
        Field Original Value New Value
        Attachment Test document for decoding issue.pdf [ 11749 ]
        Patrick Corless made changes -
        Salesforce Case []
        Fix Version/s 3.1 [ 10181 ]
        Ken Fyten made changes -
        Salesforce Case []
        Assignee Priority P1
        Repository Revision Date User Message
        ICEsoft Public SVN Repository #19368 Wed Oct 07 17:58:57 MDT 2009 patrick.corless PDF-17 - fixed issues with parsing multiple entries for beginbfchar and beginbfrange definitions in the same cmap definition and added support for the cmap beginbfrange notation <src1> <srcn> [<dest1> <dest2> ...]
        Files Changed
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/fonts/ofont/CMap.java
        Patrick Corless made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Patrick Corless added a comment -

        This turned out to be a very interesting bug. The file in question exposed a couple of issues with our CMap parsers in both the Pro and OS versions of ICEpdf.

        Neither the Pro nor the OS CMap parser correctly handled beginbfrange entries in the format <src1> <srcn> [<dest1> <dest2> ...]

        Also, the OS version did not correctly handle multiple beginbfchar and beginbfrange sections in the same CMap definition.

        With these fixes, both the OS and Pro versions do a much better job at text extraction and font substitution. I've also increased the severity of the CMap parsing errors so that they are more visible when they occur; hopefully this will help identify any future issues.
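For readers unfamiliar with the two cases described above, here is a minimal sketch of a ToUnicode parser that accepts repeated beginbfchar/beginbfrange sections and both beginbfrange forms. It is an illustration only, not the code committed to CMap.java: the class name ToUnicodeSketch and its parse method are hypothetical, and it assumes every destination is a hex string of UTF-16BE code units (a multiple of four hex digits) and that the input is well formed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the ICEpdf CMap class.
final class ToUnicodeSketch {

    // A token is a hex string <...>, an array bracket, or a bare word/number.
    private static final Pattern TOKEN =
            Pattern.compile("<[0-9A-Fa-f]*>|\\[|\\]|[^\\s<>\\[\\]]+");

    static Map<Integer, String> parse(String cmap) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(cmap);
        while (m.find()) {
            tokens.add(m.group());
        }

        Map<Integer, String> map = new HashMap<>();
        int i = 0;
        // Keep scanning after each section ends, so repeated sections are all read.
        while (i < tokens.size()) {
            String t = tokens.get(i++);
            if (t.equals("beginbfchar")) {
                while (i < tokens.size() && !tokens.get(i).equals("endbfchar")) {
                    int src = code(tokens.get(i++));
                    map.put(src, text(tokens.get(i++)));
                }
            } else if (t.equals("beginbfrange")) {
                while (i < tokens.size() && !tokens.get(i).equals("endbfrange")) {
                    int lo = code(tokens.get(i++));
                    int hi = code(tokens.get(i++));
                    String dst = tokens.get(i++);
                    if (dst.equals("[")) {
                        // Array form: one explicit destination per source code.
                        for (int c = lo; i < tokens.size() && !tokens.get(i).equals("]"); c++) {
                            map.put(c, text(tokens.get(i++)));
                        }
                        i++; // skip the closing "]"
                    } else {
                        // Single-destination form: bump the last code unit per step.
                        String base = text(dst);
                        for (int c = lo; c <= hi; c++) {
                            map.put(c, shift(base, c - lo));
                        }
                    }
                }
            }
        }
        return map;
    }

    private static String strip(String token) {
        return token.substring(1, token.length() - 1);
    }

    private static int code(String token) {
        return Integer.parseInt(strip(token), 16);
    }

    // Decodes the hex digits as UTF-16BE code units, four digits per unit.
    private static String text(String token) {
        String hex = strip(token);
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k + 4 <= hex.length(); k += 4) {
            sb.append((char) Integer.parseInt(hex.substring(k, k + 4), 16));
        }
        return sb.toString();
    }

    // Assumes a non-empty base string.
    private static String shift(String base, int offset) {
        int last = base.length() - 1;
        return base.substring(0, last) + (char) (base.charAt(last) + offset);
    }
}
```

With this sketch, a range such as <0050> <0052> [<0078> <0079> <007A>] maps codes 0x50, 0x51 and 0x52 to "x", "y" and "z", while <0024> <0026> <0041> maps 0x24, 0x25 and 0x26 to "A", "B" and "C". A production parser would also need to handle malformed input and report errors, which is omitted here.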

        Patrick Corless made changes -
        Status In Progress [ 3 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Patrick Corless added a comment -

        ICEpdf 3.1.0 has been released, closing issues.

        Patrick Corless made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Patrick Corless
            Reporter:
            Patrick Corless
          • Votes:
            1
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: