Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.0.5
Component/s: Core/Parsing, Font Engine
Labels: None
Environment: PRO
Salesforce Case Reference:
Description
The file in question contains fairly simple text content that is readable ASCII. For some reason, the font engine is not correctly mapping the character IDs to the proper glyph IDs.
This one took quite a while to narrow down. As it turns out, the incorrect encoding was being set: the fonts in question use an Identity-H encoding and define their ToUnicode CMap as a name rather than as the usual embedded stream. Our font code doesn't parse a named ToUnicode CMap, so the fallback code assigns an identity mapping for both the font and the toUnicodeMap. That is incorrect, and it is responsible for the jumbled characters. I've added code to detect a named ToUnicode value and assign the correct CMap for it. This fixes both the rendering issue and the text extraction issue with the document in question.
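For illustration, here is a minimal sketch of the decision logic described above. It assumes a parser that exposes a font's /ToUnicode entry either as a stream or as a name. `Name`, `Stream`, `resolve_to_unicode`, the `predefined` registry and the `Example-UCS2` CMap name are all hypothetical. This is not the project's actual font code, and the sketch does not decide which CMap is "correct" for a given name; that is delegated to the registry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for parsed PDF objects; a real parser has its own types.
@dataclass(frozen=True)
class Name:
    value: str  # PDF name without the leading slash, e.g. "Identity-H"

@dataclass(frozen=True)
class Stream:
    data: bytes  # decoded CMap program from a ToUnicode stream

# In this sketch a CMap is just a function from a character code to Unicode text.
CMap = Callable[[int], Optional[str]]

def identity_cmap(code: int) -> Optional[str]:
    """Fallback that treats the character code as a Unicode code point."""
    return chr(code)

def resolve_to_unicode(
    entry: object,
    predefined: Dict[str, CMap],
    parse_stream: Callable[[bytes], CMap],
) -> CMap:
    """Pick a ToUnicode mapping for a font dictionary's /ToUnicode entry."""
    if isinstance(entry, Stream):
        # Usual case: the CMap program is embedded in the file.
        return parse_stream(entry.data)
    if isinstance(entry, Name):
        # Case from this ticket: the entry names a CMap instead of embedding one.
        cmap = predefined.get(entry.value)
        if cmap is not None:
            return cmap
    # Missing or unrecognized entry: fall back to the identity mapping.
    return identity_cmap

# Toy data for illustration only; "Example-UCS2" is a made-up CMap name.
predefined: Dict[str, CMap] = {"Example-UCS2": {1: "H", 2: "i"}.get}

def parse_stream(data: bytes) -> CMap:
    # Placeholder parser; real CMap parsing is out of scope for this sketch.
    return lambda code: None

cmap = resolve_to_unicode(Name("Example-UCS2"), predefined, parse_stream)
print(cmap(1) + cmap(2))  # prints "Hi"
```

Without the `Name` branch, the named entry would drop straight through to `identity_cmap`, and codes 1 and 2 would come out as control characters instead of "Hi", which mirrors the jumbled output described in this ticket.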