I've taken a closer look at this issue and it comes down to how name objects encode hex digits. For example, the font names in question are encoded as follows:
#b7#bd#d5#fd#b3#ac#b4#d6#ba#da_GBK+ZEMJ7y-1
Where each #XX represents a 2-digit hexadecimal code. The current code parses the hex format into an integer and inserts the resulting character code into the string. There doesn't seem to be anything wrong with this approach, but Java Strings don't treat them as Unicode.
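For reference, the approach described above can be sketched roughly as follows. This is a minimal standalone illustration, not the actual ICEpdf code: the class name HexNameSketch and the method decodeHex are made up for the example, and it assumes that a # not followed by two hex digits should be left untouched.

```java
// Sketch only: HexNameSketch and decodeHex are hypothetical names, not ICEpdf API.
final class HexNameSketch {
    private static final char HEX_CHAR = '#';

    /** Replaces each #XX sequence with the char whose code is the hex value XX. */
    static String decodeHex(String name) {
        StringBuilder out = new StringBuilder(name.length());
        int i = 0;
        while (i < name.length()) {
            char c = name.charAt(i);
            // Only treat # as an escape when two characters follow it.
            if (c == HEX_CHAR && i + 3 <= name.length()) {
                int hi = Character.digit(name.charAt(i + 1), 16);
                int lo = Character.digit(name.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    // Each #XX becomes a single char in the range 0x00 to 0xFF.
                    out.append((char) (hi * 16 + lo));
                    i += 3;
                    continue;
                }
            }
            // Assumption: anything that is not a valid #XX is copied as-is.
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

With this sketch, decodeHex("#41") gives "A", while for the font name above each #XX turns into its own char with a code between 0x80 and 0xFF (the first one is 0xB7), which is where the trouble starts.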
I have workaround code that formats the #xx hex into the standard Java Unicode notation, for example #b7 = \u00b7. However, I don't know if this is what the end user is expecting.
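To make the #b7 = \u00b7 formatting concrete, here is one possible reading of it as a standalone sketch. It assumes the workaround writes the escape out as literal text (a backslash, a 'u' and four hex digits); the names UnicodeEscapeSketch and toUnicodeEscapes are hypothetical, and this is not the convert helper used in the method further down.

```java
// Sketch only: hypothetical names, and an assumption about the output format.
final class UnicodeEscapeSketch {
    private static final char HEX_CHAR = '#';

    /** Rewrites each #XX as six literal characters: backslash, 'u', then 00XX. */
    static String toUnicodeEscapes(String name) {
        StringBuilder out = new StringBuilder(name.length() * 2);
        int i = 0;
        while (i < name.length()) {
            char c = name.charAt(i);
            if (c == HEX_CHAR && i + 3 <= name.length()) {
                int hi = Character.digit(name.charAt(i + 1), 16);
                int lo = Character.digit(name.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    // Zero-pad to four lowercase hex digits after the prefix.
                    out.append(String.format("\\u%04x", hi * 16 + lo));
                    i += 3;
                    continue;
                }
            }
            // Assumption: anything that is not a valid #XX is copied as-is.
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

Under that assumption toUnicodeEscapes("#b7") returns a six-character string, so the name grows rather than shrinks, which is part of why I'm unsure it is what an end user would want to see.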
The class org.icepdf.core.pobjects.Name would be updated as follows:
/**
 * Utility method converting Name object hex notation to ASCII. For
 * example #41 should be represented as 'A'. The hex format will always
 * be #XX where XX is a 2-digit hex value. The spec says that # can't be
 * used in a string but I guess we'll see.
 *
 * @param name PDF name object string to be checked for hex codes.
 * @return full ASCII encoded name string.
 */
private String convertHexChars(StringBuilder name) {
    // we need to search for an instance of # and try and convert to hex
    try {
        for (int i = 0; i < name.length(); i++) {
            if (name.charAt(i) == HEX_CHAR) {
                // read the two hex digits before removing the #XX sequence,
                // otherwise the substring would pick up the characters after it.
                String hex = name.substring(i + 1, i + 3);
                name.delete(i, i + 3);
                // convert digits to hex.
                name.insert(i, convert(hex));
            }
        }
    } catch (Throwable e) {
        logger.warning("Error parsing hexadecimal characters.");
        // we are going to bail on any exception and just return the string
        // as converted so far.
        return name.toString();
    }
    return name.toString();
}
Any feedback on this potential workaround would be appreciated. If it's a valid fix I can add it to the core code base.
sample file