Details
Description
A client has submitted a patch to improve the detection of duplicated words that sometimes occur in PDF documents created using Crystal Reports. The PDFs in question plot out a run of text, followed by the same text plotted out again.
We had previously added some experimental code, activated with -Dorg.icepdf.core.views.page.text.trim.duplicates=true, that tried to find duplicate text by comparing text based on a midpoint. The client has come back with an improved algorithm in which a key is generated from each word's bounds and text. Any text that has a duplicate key is trimmed.
This code should work just fine going forward. We'll have to run a QA test for text extraction to be sure though.
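The key-based approach described above can be sketched roughly as follows. This is only an illustration and not the client's patch or ICEpdf code: the `DuplicateWordTrimmer` class, its `Word` type, and the choice to round bounds to whole units when building the key are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; class and method names are hypothetical, not ICEpdf API. */
public class DuplicateWordTrimmer {

    /** Hypothetical stand-in for a positioned word extracted from a page. */
    public static final class Word {
        final String text;
        final float x, y, width, height;

        public Word(String text, float x, float y, float width, float height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Builds a key from the word's bounds and text. Rounding the bounds is an
     * assumption here, so that tiny floating-point differences between the two
     * plotted copies still produce the same key.
     */
    static String keyFor(Word w) {
        return Math.round(w.x) + ":" + Math.round(w.y) + ":"
                + Math.round(w.width) + ":" + Math.round(w.height) + ":" + w.text;
    }

    /** Keeps the first word seen for each key and trims any later duplicates. */
    public static List<Word> trimDuplicates(List<Word> words) {
        Set<String> seenKeys = new HashSet<>();
        List<Word> kept = new ArrayList<>();
        for (Word w : words) {
            // Set.add returns false when the key is already present.
            if (seenKeys.add(keyFor(w))) {
                kept.add(w);
            }
        }
        return kept;
    }
}
```

Because the key includes the bounds, the same word appearing legitimately at a different position on the page is left alone; only text plotted twice in the same place is trimmed.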
Activity
Patch has been applied and was shipped with 5.0.7.