Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 4.0
Fix Version/s: 4.0.1
Component/s: Core/Parsing
Labels: None
Environment: OS rendering core
ICEsoft Forum Reference:
Workaround Description: See the comments in the main issue tracker thread.
Description
The forum poster noticed that the extracted text in a PDF document was missing some punctuation. I dug a little deeper and found that we were not correctly applying the character encoding when a ToUnicode CMap does not exist.
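For context: in PDF, a simple font that lacks a /ToUnicode CMap may still carry an /Encoding dictionary whose /Differences array reassigns character codes to glyph names (for example, code 39 to /quoteright), and text extraction then has to turn those glyph names into Unicode. Skipping that step is one plausible way punctuation goes missing. The minimal sketch below illustrates the lookup; the class name, the four-entry glyph table (a small subset of the Adobe Glyph List) and the sample entry are illustrative assumptions, not ICEpdf code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration (not ICEpdf code): resolving a character code
 * through an /Encoding /Differences array when no /ToUnicode CMap exists.
 */
class DifferencesSketch {

    // A tiny subset of the Adobe Glyph List (glyph name -> Unicode).
    private static final Map<String, Character> GLYPH_TO_UNICODE = new HashMap<>();
    static {
        GLYPH_TO_UNICODE.put("quoteright", '\u2019');
        GLYPH_TO_UNICODE.put("quotedblleft", '\u201C');
        GLYPH_TO_UNICODE.put("quotedblright", '\u201D');
        GLYPH_TO_UNICODE.put("endash", '\u2013');
    }

    // Character code -> glyph name, as built from a /Differences array.
    private final Map<Integer, String> differences = new HashMap<>();

    /** Records one /Differences entry, e.g. [39 /quoteright]. */
    void addDifference(int code, String glyphName) {
        differences.put(code, glyphName);
    }

    /**
     * Uses the /Differences glyph name when one exists and is known;
     * otherwise returns the raw code unchanged (a simplification: a real
     * decoder would fall back to the font's base encoding).
     */
    char toUnicode(char code) {
        String glyphName = differences.get((int) code);
        if (glyphName != null) {
            Character mapped = GLYPH_TO_UNICODE.get(glyphName);
            if (mapped != null) {
                return mapped;
            }
        }
        return code;
    }

    public static void main(String[] args) {
        DifferencesSketch enc = new DifferencesSketch();
        enc.addDifference(39, "quoteright");
        System.out.println((int) enc.toUnicode('\'')); // 8217
        System.out.println(enc.toUnicode('A'));         // A
    }
}
```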
The following is a quick patch for the problem, but it will not be the final fix. I will update the interface code to make sure we handle this in a more generic manner.
Step 1.
Create a new method in org.icepdf.core.pobjects.fonts.ofont.OFont.java:
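public char toUnicode(char c1) {
    // Apply the font's encoding differences only when no ToUnicode CMap exists.
    char c = toUnicode == null ? getCharDiff(c1) : c1;
    c = getCMapping(c);
    // Symbol fonts often expose their glyphs in the 0xF000 range.
    if (!awtFont.canDisplay(c)) {
        c |= 0xF000;
    }
    // Still not displayable: fall back to an alternate symbol.
    if (!awtFont.canDisplay(c)) {
        c = findAlternateSymbol(c);
    }
    return c;
}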
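To make the fallback order of the Step 1 method easier to follow outside of ICEpdf, here is a self-contained sketch that reproduces it with stand-ins: the maps and the IntPredicate replace getCharDiff(..), getCMapping(..), findAlternateSymbol(..) and awtFont.canDisplay(..), whose real implementations are not shown in this issue. The class name and constructor are hypothetical.

```java
import java.util.Map;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch of the Step 1 fallback order. Stand-ins:
 *   encodingDiffs -> getCharDiff(..), cMapping -> getCMapping(..),
 *   alternates -> findAlternateSymbol(..), canDisplay -> awtFont.canDisplay(..).
 */
class GlyphFallbackSketch {
    private final boolean hasToUnicodeCMap;
    private final Map<Character, Character> encodingDiffs;
    private final Map<Character, Character> cMapping;
    private final Map<Character, Character> alternates;
    private final IntPredicate canDisplay;

    GlyphFallbackSketch(boolean hasToUnicodeCMap,
                        Map<Character, Character> encodingDiffs,
                        Map<Character, Character> cMapping,
                        Map<Character, Character> alternates,
                        IntPredicate canDisplay) {
        this.hasToUnicodeCMap = hasToUnicodeCMap;
        this.encodingDiffs = encodingDiffs;
        this.cMapping = cMapping;
        this.alternates = alternates;
        this.canDisplay = canDisplay;
    }

    char toUnicode(char raw) {
        // Apply encoding differences only when no ToUnicode CMap exists.
        char c = hasToUnicodeCMap ? raw : encodingDiffs.getOrDefault(raw, raw);
        // Apply the font's own character mapping (identity if no entry).
        c = cMapping.getOrDefault(c, c);
        // Try the 0xF000 offset commonly used by symbol fonts.
        if (!canDisplay.test(c)) {
            c |= 0xF000;
        }
        // Last resort: substitute a known alternate symbol, if any.
        if (!canDisplay.test(c)) {
            c = alternates.getOrDefault(c, c);
        }
        return c;
    }

    public static void main(String[] args) {
        GlyphFallbackSketch text = new GlyphFallbackSketch(
                false, Map.of('\'', '\u2019'), Map.of(), Map.of(), cp -> true);
        System.out.println((int) text.toUnicode('\'')); // 8217

        GlyphFallbackSketch symbol = new GlyphFallbackSketch(
                false, Map.of(), Map.of(), Map.of(), cp -> cp >= 0xF000 && cp <= 0xF0FF);
        System.out.println(Integer.toHexString(symbol.toUnicode('a'))); // f061
    }
}
```

Injecting the displayability test as a predicate keeps the sketch testable without loading real AWT fonts; in the workaround itself that check is java.awt.Font.canDisplay(char).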
Step 2.
Call the new toUnicode method from the content parser. In drawString(..), just after 'charValue' is defined (around line 2160), add the following code:
if (textState.currentfont instanceof org.icepdf.core.pobjects.fonts.ofont.OFont) {
    charValue = ((org.icepdf.core.pobjects.fonts.ofont.OFont)
            textState.currentfont).toUnicode(unmodifiedDisplayText.charAt(i));
}
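The Step 2 code ties the parser to OFont through an instanceof check and cast. The description says a more generic fix would follow; one way to get there, sketched below under the assumption of a hypothetical CharMapper interface and mapDisplayText helper, is to expose the mapping on a shared font type so drawString(..) can call it without knowing the concrete class. This is an illustration only, not the interface change made in revision #20973.

```java
/** Hypothetical interface: any font that can map raw codes to Unicode. */
interface CharMapper {
    char toUnicode(char raw);
}

class DrawStringSketch {
    /** Applies the current font's mapping to every character of a shown string. */
    static String mapDisplayText(CharMapper currentFont, String unmodifiedDisplayText) {
        StringBuilder out = new StringBuilder(unmodifiedDisplayText.length());
        for (int i = 0; i < unmodifiedDisplayText.length(); i++) {
            // Interface dispatch replaces the instanceof OFont check and cast.
            char charValue = currentFont.toUnicode(unmodifiedDisplayText.charAt(i));
            out.append(charValue);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A font with no special mapping simply returns its input.
        CharMapper identity = raw -> raw;
        // Hypothetical font whose encoding maps code 39 to a right single quote.
        CharMapper curly = raw -> raw == '\'' ? '\u2019' : raw;
        System.out.println(mapDisplayText(identity, "don't"));                   // don't
        System.out.println(mapDisplayText(curly, "don't").equals("don\u2019t")); // true
    }
}
```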
Once again, this is not an official patch, just a workaround.
Activity
Patrick Corless
created issue -
Patrick Corless
made changes -
Field | Original Value | New Value |
---|---|---|
Salesforce Case | [] | |
Fix Version/s | 4.0.1 [ 10228 ] | |
Affects Version/s | 4.0 [ 10222 ] |
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #20973 | Tue Mar 16 11:00:52 MDT 2010 | patrick.corless | |
Files Changed
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/ContentParser.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/fonts/FontFile.java
MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/fonts/ofont/OFont.java
Patrick Corless
made changes -
Status | Open [ 1 ] | Resolved [ 5 ] |
Resolution | Fixed [ 1 ] |
Ken Fyten
made changes -
Status | Resolved [ 5 ] | Closed [ 6 ] |
Issue appears to be resolved.