Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 4.0 - Beta
- Fix Version/s: 4.0
- Component/s: Core/Parsing
- Labels: None
- Environment: Windows, Mac
Description
The PDF specification says that strings inside dictionaries can be either in PDFDocEncoding or in 16-bit big-endian (BE) Unicode. To complicate things, certain dictionary strings may be in UTF-8; that has to be investigated. Right now our parser simply builds strings from the raw bytes, which means we're only handling ASCII correctly. Accented characters that use the top 8th bit are not necessarily being handled right: Java defaults to the platform encoding, so WinAnsi on Windows and MacRoman on the Mac. We have to see what the default is on Linux. Some documentation shows PDFDocEncoding to be similar to, if not the same as, Latin-1. We have to investigate whether the specification provides a way to override the PDFDocEncoding default and specify a particular encoding. Then the Parser needs to use the correct encoding when creating Java strings, so we're not corrupting the inputs.
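A minimal sketch of what encoding-aware decoding could look like. This is a hypothetical helper, not ICEpdf's actual API: per the PDF specification, a text string starting with the byte order mark 0xFE 0xFF is UTF-16BE; otherwise it is PDFDocEncoding, which this sketch approximates with Latin-1 (they agree for most characters, but not all).

```java
import java.nio.charset.StandardCharsets;

public class PdfStringDecoder {

    /**
     * Decode the raw bytes of a PDF dictionary string.
     * A leading BOM (0xFE 0xFF) signals 16-bit big-endian Unicode;
     * anything else is treated as PDFDocEncoding, approximated here
     * by Latin-1 rather than the platform default charset
     * (WinAnsi on Windows, MacRoman on the Mac).
     */
    public static String decode(byte[] bytes) {
        if (bytes.length >= 2
                && (bytes[0] & 0xFF) == 0xFE
                && (bytes[1] & 0xFF) == 0xFF) {
            // Skip the two BOM bytes, decode the rest as UTF-16BE.
            return new String(bytes, 2, bytes.length - 2,
                              StandardCharsets.UTF_16BE);
        }
        // Fixed charset, so accented (high-bit) bytes decode the same
        // way on every platform instead of being corrupted.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf16 = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41}; // BOM + 'A'
        System.out.println(decode(utf16));
        byte[] latin1 = {(byte) 0xE9}; // 'é' in Latin-1
        System.out.println(decode(latin1));
    }
}
```

The key design point is never falling back to `new String(bytes)` with the platform default charset, which is exactly what produces different results on Windows, Mac, and Linux. A full fix would also map the handful of PDFDocEncoding code points that differ from Latin-1 through a lookup table.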
Issue Links
Activity
Mark Collette created issue
Mark Collette made changes
Mark Collette made changes
Patrick Corless made changes
| Field | Original Value | New Value |
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Fix Version/s | | 4.0 [ 10222 ] |
| Resolution | | Fixed [ 1 ] |
Patrick Corless made changes
| Field | Original Value | New Value |
| Status | Resolved [ 5 ] | Closed [ 6 ] |