Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 6.2.5
- Fix Version/s: 6.3
- Component/s: Core/Parsing, Viewer RI
- Labels: None
- Environment: any
Description
A client has provided a test case: a document exported from OpenOffice and saved in both PDF and PDF/A formats. Searching the PDF/A document returned no results, whereas the PDF version worked more or less as expected.
Activity
Patrick Corless
created issue -
Patrick Corless
made changes -
Field | Original Value | New Value |
---|---|---|
Fix Version/s | | 6.3 [ 13093 ] |
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #52043 | Tue Oct 31 11:57:57 MDT 2017 | patrick.corless | |
Files Changed
MODIFY /icepdf/trunk/icepdf/core/core-awt/src/main/java/org/icepdf/core/search/DocumentSearchController.java
MODIFY /icepdf/trunk/icepdf/core/core-awt/src/main/java/org/icepdf/core/pobjects/OptionalContents.java
MODIFY /icepdf/trunk/icepdf/core/core-awt/src/main/java/org/icepdf/core/pobjects/OptionalContentGroup.java
MODIFY /icepdf/trunk/icepdf/core/core-awt/src/main/java/org/icepdf/core/pobjects/graphics/text/WordText.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #52044 | Tue Oct 31 12:01:47 MDT 2017 | patrick.corless | Highlight Annotation tool now inserts selected text into content by default. Also add code to trim 160 and other line breaks from the annotations /content entry. |
Files Changed
MODIFY /icepdf/trunk/icepdf/viewer/viewer-awt/src/main/java/org/icepdf/ri/common/utility/search/SearchPanel.java
MODIFY /icepdf/trunk/icepdf/viewer/viewer-awt/src/main/java/org/icepdf/ri/common/search/DocumentSearchControllerImpl.java
MODIFY /icepdf/trunk/icepdf/viewer/viewer-awt/src/main/java/org/icepdf/ri/util/PropertiesManager.java
MODIFY /icepdf/trunk/icepdf/viewer/viewer-awt/src/main/java/org/icepdf/ri/common/tools/HighLightAnnotationHandler.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #52048 | Tue Oct 31 20:18:49 MDT 2017 | patrick.corless | |
Files Changed
MODIFY /icepdf/trunk/icepdf/viewer/viewer-awt/src/main/java/org/icepdf/ri/common/AnnotationColorPropertyPanel.java
MODIFY /icepdf/trunk/icepdf/viewer/viewer-awt/src/main/java/org/icepdf/ri/common/views/annotations/LockIcon.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #52140 | Mon Dec 11 13:05:05 MST 2017 | patrick.corless | |
Files Changed
MODIFY /icepdf/trunk/icepdf/core/core-awt/src/main/java/org/icepdf/core/pobjects/graphics/ICCBased.java
MODIFY /icepdf/trunk/icepdf/viewer/viewer-awt/src/main/java/org/icepdf/ri/common/utility/search/SearchPanel.java
MODIFY /icepdf/trunk/icepdf/viewer/viewer-awt/src/main/java/org/icepdf/ri/common/search/DocumentSearchControllerImpl.java
MODIFY /icepdf/trunk/icepdf/core/core-awt/src/main/java/org/icepdf/core/pobjects/Page.java
MODIFY /icepdf/trunk/icepdf/core/core-awt/src/main/java/org/icepdf/core/pobjects/graphics/text/PageText.java
Repository | Revision | Date | User | Message |
ICEsoft Public SVN Repository | #52168 | Thu Dec 14 11:24:34 MST 2017 | patrick.corless | |
Files Changed
MODIFY /icepdf/trunk/icepdf/viewer/viewer-awt/src/main/java/org/icepdf/ri/common/search/DocumentSearchControllerImpl.java
Patrick Corless
made changes -
Status | Open [ 1 ] | Resolved [ 5 ] |
Resolution | | Fixed [ 1 ] |
Patrick Corless
made changes -
Status | Resolved [ 5 ] | Closed [ 6 ] |
After quite a bit of debugging it was found that the PDF/A document used the non-breaking space character (code 160) instead of the usual space (code 32). I took a good look at how search is currently set up and have made some fairly far-reaching changes. Search terms no longer take white space into account when applying matches. The end result is some high-quality search results. However, given the huge number of variations in PDF notation, we'll need to kick the tires a bit to make sure we didn't break anything.
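To illustrate the idea of ignoring white space during matching, here is a minimal sketch. It is not the code committed to `DocumentSearchController`; the class and method names are hypothetical, and it assumes a simple case-sensitive substring match. The point it shows is that `Character.isWhitespace` does not report the non-breaking space (160), so that character has to be handled explicitly.

```java
// Hypothetical helper, not part of ICEpdf: compares text with all white space removed.
final class WhitespaceInsensitiveMatch {

    // Character.isWhitespace() excludes the non-breaking space U+00A0 (decimal 160),
    // so it is checked explicitly alongside ordinary white space such as code 32.
    static boolean isIgnorable(char c) {
        return c == '\u00A0' || Character.isWhitespace(c);
    }

    // Returns a copy of the input with every ignorable character dropped.
    static String strip(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isIgnorable(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // True when the term occurs in the page text once white space is ignored on both sides.
    static boolean matches(String pageText, String term) {
        String needle = strip(term);
        return !needle.isEmpty() && strip(pageText).contains(needle);
    }
}
```

With this approach a search for "open office" typed with a regular space would still match page text in which the two words are separated by character 160. One trade-off of the sketch is that it can also match across word boundaries that the user did not intend.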
The PDF/A document also had another peculiarity: all text in the document was written as marked content, with two names, "Marked" and "Artifact". While debugging the problems in general I noticed that the OptionalContentGroup class was not generating the correct hash value based on the optional content name, and the same was true of its equals method. This single issue was causing a lot of duplicate text to be stored in the PageText model.
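The equals/hash issue described above can be sketched as follows. This is a simplified stand-in, not the actual `OptionalContentGroup` source; the `NamedGroup` class and the `distinctCount` helper are hypothetical and assume that identity is defined by the name alone. When `equals` and `hashCode` both derive from the name, hash-based collections treat repeated names as the same key instead of storing a new entry each time.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a content group whose identity is its name.
final class NamedGroup {
    private final String name;

    NamedGroup(String name) {
        this.name = name;
    }

    // Two groups are equal when their names are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof NamedGroup)) {
            return false;
        }
        return Objects.equals(name, ((NamedGroup) o).name);
    }

    // The hash is derived from the same field used by equals, keeping the two consistent.
    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }

    // Counts how many distinct groups a hash-based set keeps for the given names.
    static int distinctCount(String... names) {
        Set<NamedGroup> groups = new HashSet<>();
        for (String n : names) {
            groups.add(new NamedGroup(n));
        }
        return groups.size();
    }
}
```

Without the two overrides, every `new NamedGroup("Marked")` would be a distinct key, which is the kind of behaviour that could lead to the same text being stored more than once.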