[PDF-957] Search highlight not working for some text in PDF - ICEsoft JIRA Issue Tracker

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 6.0.1, 6.0.2
Fix Version/s: 6.1.1
Component/s: Core/Parsing
Labels:
None
Environment:
All

Support Case References:
Support Case #13657 - https://icesoft.my.salesforce.com/5007000001YEpXG

Description

When searching for text with the provided PDF file in the ICEpdf Viewer, it highlights more area than just the specific text.

Options
- Sort By Name
- Sort By Date
- Ascending
- Descending
- Download All

Attachments

demo_pdffile.pdf

06/Jan/16 8:37 AM

100 kB

Flextronics International

exampler8.png

227 kB

06/Jan/16 8:37 AM
exampleu11png.png

234 kB

06/Jan/16 8:37 AM

Activity

Ascending order - Click to sort in descending order

Hide

Permalink

Patrick Corless added a comment - 13/Jan/16 1:25 PM

In this selection case the postscript looks as follows:

/F1 12.84 Tf 0 1 -1 0 0 0 Tm 240.1 -698.6 TD[(R)]TJ
9.12 0 TD[(3)]TJ
-9.12 14.4 TD[(R)]TJ
9.12 0 TD[(4)]TJ
-9.12 14.4 TD[(R)]TJ
9.12 0 TD[(5)]TJ
-9.12 14.52 TD[(R)]TJ
9.12 0 TD[(6)]TJ
-9.12 14.4 TD[(R)]TJ
9.12 0 TD[(7)]TJ
-9.12 -72.24 TD[(R)]TJ
9.12 0 TD[(2)]TJ
-25.2 -28.08 TD[(R)]TJ
9.12 0 TD[(8)]TJ
6.96 129.2 TD[(R)]TJ
9.12 0 TD[(9)]TJ

We have code that tries to property detect vertical writing but in this case I think the issue might be around new line detection as the letters are plotted out one by one and we need to look at Y to figure out if a word break is needed.

Show

Patrick Corless added a comment - 13/Jan/16 1:25 PM In this selection case the postscript looks as follows: /F1 12.84 Tf 0 1 -1 0 0 0 Tm 240.1 -698.6 TD [(R)] TJ 9.12 0 TD [(3)] TJ -9.12 14.4 TD [(R)] TJ 9.12 0 TD [(4)] TJ -9.12 14.4 TD [(R)] TJ 9.12 0 TD [(5)] TJ -9.12 14.52 TD [(R)] TJ 9.12 0 TD [(6)] TJ -9.12 14.4 TD [(R)] TJ 9.12 0 TD [(7)] TJ -9.12 -72.24 TD [(R)] TJ 9.12 0 TD [(2)] TJ -25.2 -28.08 TD [(R)] TJ 9.12 0 TD [(8)] TJ 6.96 129.2 TD [(R)] TJ 9.12 0 TD [(9)] TJ We have code that tries to property detect vertical writing but in this case I think the issue might be around new line detection as the letters are plotted out one by one and we need to look at Y to figure out if a word break is needed.

Hide

Permalink

Patrick Corless added a comment - 22/Mar/16 2:24 PM

This is a trick PDF but I think I have a solution that should have minimal impact on other users. As suspect the text layout change from horizontal to vertical introduce a couple corner cases when we try and find words. The first fix was to try and detect a change in orientation on the 'tm' change. The other corner case was related to our word space detection with the vertically oriented text. In such a case the x values with not change only the y value. This is the riskiest change and will require quite a bit of testing. But the PDF in question does behave correctly now with regards to search.

Show

Patrick Corless added a comment - 22/Mar/16 2:24 PM This is a trick PDF but I think I have a solution that should have minimal impact on other users. As suspect the text layout change from horizontal to vertical introduce a couple corner cases when we try and find words. The first fix was to try and detect a change in orientation on the 'tm' change. The other corner case was related to our word space detection with the vertically oriented text. In such a case the x values with not change only the y value. This is the riskiest change and will require quite a bit of testing. But the PDF in question does behave correctly now with regards to search.

Hide

Permalink

Patrick Corless added a comment - 22/Mar/16 2:33 PM

Marking a fixed.

Show

Patrick Corless added a comment - 22/Mar/16 2:33 PM Marking a fixed.

People

Assignee:

Patrick Corless

Reporter:

Arran Mccullough

Votes:

1 Vote for this issue

Watchers:

3 Start watching this issue

Dates

Created:

05/Jan/16 10:01 AM

Updated:

25/Jan/18 12:58 PM

Resolved:

22/Mar/16 2:33 PM