Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 4.0
- Fix Version/s: 4.0.1
- Component/s: Core/Parsing
- Labels: None
- Environment: OS rendering core
- ICEsoft Forum Reference:
- Workaround Description: See the comments in the main issue tracker thread.
Description
The forum poster noticed that the text extracted from a PDF document was missing some punctuation. I dug a little deeper and found that we were not correctly applying the character encoding when a toUnicode CMap does not exist.
The following is a quick patch for the problem, but it will not be the final fix. I will update the interface code to make sure we handle this in a more generic manner.
Step One.
Create a new method in org.icepdf.core.pobjects.fonts.ofont.OFont.java
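    public char toUnicode(char c1) {
        // With no toUnicode CMap, apply the font's encoding differences first.
        char c = toUnicode == null ? getCharDiff(c1) : c1;
        c = getCMapping(c);
        // If the font cannot display the character, retry in the 0xF000 range.
        if (!awtFont.canDisplay(c)) {
            c |= 0xF000;
        }
        // As a last resort, look for an alternate symbol.
        if (!awtFont.canDisplay(c)) {
            c = findAlternateSymbol(c);
        }
        return c;
    }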
Step Two.
Call the new toUnicode method from the content parser. Just after 'charValue' is defined in drawString(..), around line 2160, add the following code.
    if (textState.currentfont instanceof org.icepdf.core.pobjects.fonts.ofont.OFont) {
        charValue = ((org.icepdf.core.pobjects.fonts.ofont.OFont)
                textState.currentfont).toUnicode(unmodifiedDisplayText.charAt(i));
    }
Once again, this is not an official patch, just a workaround.
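To illustrate the order of the fallbacks in the patch above, here is a minimal standalone sketch. It is not ICEpdf code: the class ToUnicodeFallbackSketch, the resolve method, the differences map, and the canDisplay predicate are all hypothetical stand-ins (for getCharDiff and awtFont.canDisplay respectively), and the getCMapping and findAlternateSymbol steps are left out for brevity.

```java
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical model of the fallback order in the patch; not ICEpdf code.
final class ToUnicodeFallbackSketch {

    // differences: assumed stand-in for the font's encoding differences table.
    // canDisplay:  assumed stand-in for awtFont.canDisplay(char).
    static char resolve(char code,
                        boolean hasToUnicodeCMap,
                        Map<Character, Character> differences,
                        IntPredicate canDisplay) {
        // Without a ToUnicode CMap, translate the code through the differences table.
        char c = hasToUnicodeCMap ? code : differences.getOrDefault(code, code);
        // If the font has no glyph for the result, retry in the 0xF000 range.
        if (!canDisplay.test(c)) {
            c |= 0xF000;
        }
        return c;
    }

    public static void main(String[] args) {
        // Example table entry: code 0x92 maps to a right single quotation mark.
        Map<Character, Character> differences = Map.of((char) 0x92, '\u2019');
        char mapped = resolve((char) 0x92, false, differences, cp -> true);
        System.out.printf("U+%04X%n", (int) mapped);
    }
}
```

The point of the sketch is only that the differences lookup happens when no ToUnicode CMap is present, which is the case the missing punctuation fell into.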
Activity
Issue appears to be resolved.