Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2
    • Fix Version/s: 6.0
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      PRO

      Description

      The attached file combines CIDFontType2 Asian and Roman glyphs. For some reason the encoding is not being applied correctly, and the wrong glyphs are being rendered.

      Further investigation is needed but this should be fixable.
      1. assian_test.pdf
        1.09 MB
        Patrick Corless

        Activity

        Patrick Corless created issue -
        Patrick Corless added a comment -

        sample file

        Patrick Corless made changes -
        Field Original Value New Value
        Attachment assian_test.pdf [ 13083 ]
        Patrick Corless added a comment -

        I've taken a closer look at this issue, and it comes down to name objects that encode hex digits. For example, the font names in question are encoded as follows:

        #b7#bd#d5#fd#b3#ac#b4#d6#ba#da_GBK+ZEMJ7y-1

        Here each #XX represents a 2-digit hexadecimal code. The current code parses the hex digits into an integer and inserts the resulting character code into the string. There doesn't seem to be anything wrong with this approach, but Java Strings don't treat the results as Unicode.
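        A minimal sketch of the per-byte approach described above, written as a standalone class for illustration only (NameHexDecodeSketch and decodeHexAsChars are hypothetical names, not ICEpdf's actual parser). Each #XX pair becomes exactly one Java char with that numeric value, so a multi-byte sequence ends up as separate characters in the U+0080 to U+00FF range instead of combining into anything larger.

```java
/**
 * Hypothetical illustration of per-byte "#XX" decoding for PDF name objects.
 * This is not ICEpdf's actual code.
 */
public class NameHexDecodeSketch {

    /** Replaces each valid "#XX" escape with the single char whose value is 0xXX. */
    static String decodeHexAsChars(String name) {
        StringBuilder out = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '#' && i + 2 < name.length()) {
                int hi = Character.digit(name.charAt(i + 1), 16);
                int lo = Character.digit(name.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    // one escape becomes one char in the range U+0000..U+00FF
                    out.append((char) ((hi << 4) | lo));
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            // not a valid escape: copy the character through unchanged
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeHexAsChars("#41#42C")); // ABC

        String raw = "#b7#bd#d5#fd#b3#ac#b4#d6#ba#da_GBK+ZEMJ7y-1";
        String decoded = decodeHexAsChars(raw);
        // ten escapes became ten separate chars
        System.out.println(decoded.length());                   // 23
        System.out.printf("U+%04X%n", (int) decoded.charAt(0)); // U+00B7
    }
}
```

        The catch is visible in the length: the ten escapes survive as ten independent chars, so any encoding that needs two bytes per glyph is lost at this step.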

        I have workaround code that formats each #XX hex code as a standard Java Unicode escape, for example #b7 = \u00b7. However, I don't know if this is what the end user expects.

        The class org.icepdf.core.pobjects.Name would be updated as follows:

        /**
         * Utility method converting Name object hex notation to ASCII. For
         * example, #41 should be represented as 'A'. The hex format will always
         * be #XX, where XX is a 2-digit hex value. The spec says that # can't be
         * used in a string, but I guess we'll see.
         *
         * @param name PDF name object string to be checked for hex codes.
         * @return full ASCII encoded name string.
         */
        private String convertHexChars(StringBuilder name) {
            // HEX_CHAR and logger are existing members of the Name class.
            // keep a copy so we can bail out with the unmodified name.
            String original = name.toString();
            // we need to search for an instance of # and try and convert to hex
            try {
                for (int i = 0; i < name.length(); i++) {
                    if (name.charAt(i) == HEX_CHAR) {
                        // read the two hex digits first, then replace the
                        // whole three-character #XX sequence.
                        String hex = name.substring(i + 1, i + 3);
                        name.replace(i, i + 3, convert(hex));
                    }
                }
            } catch (RuntimeException e) {
                logger.warning("Error parsing hexadecimal characters.");
                // we are going to bail on any exception and just return the
                // original string.
                return original;
            }
            return name.toString();
        }

        /**
         * Converts a hex string to a formatted Unicode escape string.
         *
         * @param hex 2-digit hex number.
         * @return the hex value as a backslash-u escape padded to four digits.
         */
        private String convert(String hex) {
            StringBuilder output = new StringBuilder();
            output.append("\\u"); // standard unicode format.
            for (int j = 0, max = 4 - hex.length(); j < max; j++) {
                output.append("0");
            }
            output.append(hex.toLowerCase());
            return output.toString();
        }

        Any feedback on this potential workaround would be appreciated. If it's a valid fix, I can add it to the core code base.
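        For comparison, the _GBK suffix in the font name hints that the #XX pairs are raw bytes of a multi-byte Chinese encoding rather than individual characters. The sketch below is a hypothetical helper (NameCharsetDecodeSketch and decodeHexAsCharset are illustrative names, and treating the bytes as GBK is an assumption drawn only from that suffix). It collects the escaped and literal bytes first and only then decodes the whole sequence with java.nio.charset, so two-byte characters stay together.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

/** Hypothetical alternative: decode "#XX" escapes as bytes in a given charset. */
public class NameCharsetDecodeSketch {

    static String decodeHexAsCharset(String name, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '#' && i + 2 < name.length()) {
                int hi = Character.digit(name.charAt(i + 1), 16);
                int lo = Character.digit(name.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    bytes.write((hi << 4) | lo); // raw byte, not a char yet
                    i += 2;
                    continue;
                }
            }
            // literal name characters are assumed to be single-byte ASCII
            bytes.write(c);
        }
        // decode the complete byte sequence so multi-byte glyphs stay intact
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String raw = "#b7#bd#d5#fd#b3#ac#b4#d6#ba#da_GBK+ZEMJ7y-1";
        if (Charset.isSupported("GBK")) {
            System.out.println(decodeHexAsCharset(raw, Charset.forName("GBK")));
        }
    }
}
```

        On a JDK that ships the GBK charset, the sample name decodes to 方正超粗黑_GBK+ZEMJ7y-1, readable Chinese followed by the ASCII suffix. The trade-off against the escape-based workaround is that the charset must be known or guessed; nothing in the name object itself declares it.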

        Patrick Corless made changes -
        Workaround Exists [Yes]
        Salesforce Case []
        Fix Version/s 4.3 [ 10266 ]
        Fix Version/s 4.2.2 [ 10265 ]
        Repository Revision Date User Message
        ICEsoft Public SVN Repository #27164 Thu Jan 12 08:53:12 MST 2012 patrick.corless PDF-288 updated name class to convert the "#Hex" notation to unicode, so font names can be more easily read.
        Files Changed
        MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/Name.java
        Patrick Corless added a comment -

        I've applied the name parsing change, but the document in question still has a mix of embedded and non-embedded CID fonts, so getting it to fully render will be difficult without the fonts used to encode it.

        Marking issue as won't fix for now.

        Patrick Corless made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Won't Fix [ 2 ]
        Ken Fyten made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Patrick Corless made changes -
        Resolution Won't Fix [ 2 ]
        Status Closed [ 6 ] Reopened [ 4 ]
        Patrick Corless made changes -
        Fix Version/s 5.2 [ 10970 ]
        Fix Version/s 4.3 [ 10266 ]
        Patrick Corless added a comment -

        After a bunch of work, we are now correctly rendering most Japanese and Chinese documents, regardless of whether the fonts are embedded. The underlying operating system still needs to have fonts that can render the Unicode characters.

        Patrick Corless made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Patrick Corless made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Patrick Corless
            Reporter:
            Patrick Corless
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: