Details
Description
A client has submitted a patch to improve the detection of duplicated words that sometimes occur in PDF documents created using Crystal Reports. The PDFs in question plot out a block of text followed by the same text plotted out again.
We had added some experimental code that was activated with -Dorg.icepdf.core.views.page.text.trim.duplicates=true. This code tried to detect duplicate text by comparing text based on a midpoint. The client has come back with an improved algorithm in which a key is generated from each word's bounds and text. Any text that has a duplicate key is trimmed.
This code should work just fine going forward. We'll have to run a QA test for text extraction to be sure though.
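For reference, the key-based trimming described above can be sketched roughly as follows. This is not the client's patch: the `Word` class, the `keyFor` helper, and the choice to round the bounds before building the key are all assumptions made for illustration, and the real PageText.java code may build its key differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateWordTrimmer {

    /** Hypothetical stand-in for an extracted word: its text plus its bounds on the page. */
    public static final class Word {
        final String text;
        final float x, y, width, height;

        public Word(String text, float x, float y, float width, float height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Builds a key from the word's bounds and text.
     * Rounding to whole units is an assumption, meant to absorb small
     * floating-point differences between the two plotted copies.
     */
    static String keyFor(Word w) {
        return Math.round(w.x) + ":" + Math.round(w.y) + ":"
                + Math.round(w.width) + ":" + Math.round(w.height) + ":" + w.text;
    }

    /** Keeps the first word seen for each key and drops any later word with the same key. */
    public static List<Word> trimDuplicates(List<Word> words) {
        Set<String> seenKeys = new HashSet<>();
        List<Word> kept = new ArrayList<>();
        for (Word w : words) {
            // Set.add returns false when the key is already present.
            if (seenKeys.add(keyFor(w))) {
                kept.add(w);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Word> words = new ArrayList<>();
        words.add(new Word("Total", 10f, 20f, 30f, 8f));
        words.add(new Word("Total", 10f, 20f, 30f, 8f)); // same bounds and text: trimmed
        words.add(new Word("Total", 10f, 40f, 30f, 8f)); // same text, different line: kept
        System.out.println(trimDuplicates(words).size()); // prints 2
    }
}
```

Because the bounds are part of the key, a word that legitimately repeats elsewhere on the page is kept. Only a copy plotted at the same location is dropped.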
Activity
Field | Original Value | New Value |
---|---|---|
Fix Version/s | | 5.0.7 [ 11470 ] |
Attachment | | PageText.java.patch [ 17202 ] |
Fix Version/s | 5.1 [ 10675 ] | |
Fix Version/s | | 5.0.7 [ 11470 ] |
Status | Open [ 1 ] | Resolved [ 5 ] |
Fix Version/s | 5.0.7 [ 11470 ] | |
Fix Version/s | 5.1 [ 10675 ] | |
Resolution | | Fixed [ 1 ] |
Status | Resolved [ 5 ] | Closed [ 6 ] |