ICEpdf / PDF-957

Search highlight not working for some text in PDF

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.0.1, 6.0.2
    • Fix Version/s: 6.1.1
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      All

      Description

      When searching for text in the provided PDF file with the ICEpdf Viewer, the search highlight covers a larger area than the matched text.
      1. demo_pdffile.pdf
        100 kB
        Flextronics International
      1. exampler8.png
        227 kB
      2. exampleu11png.png
        234 kB

        Activity

        Arran Mccullough created issue -
        Flextronics International made changes -
        Field Original Value New Value
        Attachment demo_pdffile.pdf [ 22031 ]
        Attachment exampler8.png [ 22032 ]
        Attachment exampleu11png.png [ 22033 ]
        Patrick Corless added a comment -

        In this selection case the content stream looks as follows:

        /F1 12.84 Tf 0 1 -1 0 0 0 Tm 240.1 -698.6 TD[(R)]TJ
        9.12 0 TD[(3)]TJ
        -9.12 14.4 TD[(R)]TJ
        9.12 0 TD[(4)]TJ
        -9.12 14.4 TD[(R)]TJ
        9.12 0 TD[(5)]TJ
        -9.12 14.52 TD[(R)]TJ
        9.12 0 TD[(6)]TJ
        -9.12 14.4 TD[(R)]TJ
        9.12 0 TD[(7)]TJ
        -9.12 -72.24 TD[(R)]TJ
        9.12 0 TD[(2)]TJ
        -25.2 -28.08 TD[(R)]TJ
        9.12 0 TD[(8)]TJ
        6.96 129.2 TD[(R)]TJ
        9.12 0 TD[(9)]TJ

        We have code that tries to properly detect vertical writing, but in this case I think the issue might be around new line detection: the letters are plotted one by one, so we need to look at Y to figure out whether a word break is needed.
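
        To make the geometry concrete, here is a minimal sketch of how the operators above place each glyph on the page. It is an illustration only, not ICEpdf code: the class name `TextMatrixSketch` and the method `moveTo` are hypothetical, and the sketch ignores the leading that TD also sets. With the matrix `0 1 -1 0 0 0` from the Tm operator, a `9.12 0 TD` step changes only the page-space y value, which is why "R" and the digit that follows it share the same x.

```java
/** Hypothetical illustration of Tm/TD positioning; not part of ICEpdf. */
final class TextMatrixSketch {

    // Text line matrix [a b c d e f], as set by the Tm operator.
    private final double a, b, c, d;
    private double e, f;

    TextMatrixSketch(double a, double b, double c, double d, double e, double f) {
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
        this.e = e;
        this.f = f;
    }

    /**
     * Applies "tx ty TD": translates the line matrix by (tx, ty) in text space
     * and returns the new origin {x, y} in page space.
     * (TD also sets the text leading, which this sketch ignores.)
     */
    double[] moveTo(double tx, double ty) {
        e += tx * a + ty * c;
        f += tx * b + ty * d;
        return new double[] {e, f};
    }

    public static void main(String[] args) {
        // "0 1 -1 0 0 0 Tm": text rotated 90 degrees counter-clockwise.
        TextMatrixSketch tm = new TextMatrixSketch(0, 1, -1, 0, 0, 0);
        // First four TD operands and glyphs from the stream quoted above.
        double[][] moves = {{240.1, -698.6}, {9.12, 0}, {-9.12, 14.4}, {9.12, 0}};
        String[] glyphs = {"R", "3", "R", "4"};
        for (int i = 0; i < moves.length; i++) {
            double[] p = tm.moveTo(moves[i][0], moves[i][1]);
            System.out.printf("%s at x=%.2f y=%.2f%n", glyphs[i], p[0], p[1]);
        }
    }
}
```

        Running `main` lists the page-space origin of the first four glyphs; each "R"/digit pair differs only in y, while consecutive pairs differ in x.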

        Patrick Corless made changes -
        Fix Version/s 6.1.1 [ 12975 ]
        Patrick Corless added a comment -

        This is a tricky PDF, but I think I have a solution that should have minimal impact on other users. As suspected, the text layout change from horizontal to vertical introduces a couple of corner cases when we try to find words. The first fix was to detect a change in orientation when the 'Tm' matrix changes. The other corner case was related to our word space detection with vertically oriented text: in such a case the x values will not change, only the y value. This is the riskiest change and will require quite a bit of testing, but the PDF in question now behaves correctly with regard to search.
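
        The two corner cases can be sketched roughly as follows. This is not the committed ICEpdf change; the class `WordBreakSketch`, its methods, and the `tolerance` parameter are hypothetical names chosen for illustration. The idea is to read the baseline direction from the text matrix and then measure the gap between glyphs along that direction instead of always along x.

```java
/** Hypothetical illustration of orientation-aware word breaks; not ICEpdf's code. */
final class WordBreakSketch {

    /**
     * The text-space x axis maps to the page-space direction (a, b), where a and b
     * are the first two Tm operands. The baseline is treated as vertical when that
     * direction is closer to the page's y axis than to its x axis.
     */
    static boolean isVertical(double a, double b) {
        return Math.abs(b) > Math.abs(a);
    }

    /**
     * Decides whether a word break is needed between two consecutive glyph origins.
     *
     * @param prevAdvance assumed distance the previous glyph covers along the baseline
     * @param tolerance   assumed slack before a gap counts as a break
     */
    static boolean needsWordBreak(double prevX, double prevY, double x, double y,
                                  double prevAdvance, boolean vertical, double tolerance) {
        // Movement along the baseline and across it (across means a new line).
        double along = vertical ? y - prevY : x - prevX;
        double across = vertical ? x - prevX : y - prevY;
        if (Math.abs(across) > tolerance) {
            return true; // jumped to a different line of text
        }
        if (along < 0) {
            return true; // moved backwards along the baseline
        }
        return along - prevAdvance > tolerance; // gap wider than the glyph itself
    }
}
```

        With the page-space origins implied by the stream in the earlier comment, "R" at (698.6, 240.1) and "3" at (698.6, 249.22), the vertical check keeps the two glyphs in one word, whereas a check that only measures along x sees no horizontal advance and a 9.12 unit jump across the line, and would judge the pair differently.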

        Patrick Corless added a comment -

        Marking as fixed.

        Patrick Corless made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Repository Revision Date User Message
        ICEsoft Public SVN Repository #48541 Tue Mar 22 14:38:23 MDT 2016 patrick.corless PDF-957 added some coner case logic for better word detection of vertically layed out text.
        Files Changed
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/PageText.java
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/LineText.java
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/util/content/AbstractContentParser.java
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/WordText.java
        ICEsoft Public SVN Repository #48543 Tue Mar 22 14:39:07 MDT 2016 patrick.corless PDF-957 added some coner case logic for better word detection of vertically layed out text.
        Files Changed
        Commit graph MODIFY /icepdf/branches/icepdf-6.1.0/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/LineText.java
        Commit graph MODIFY /icepdf/branches/icepdf-6.1.0/icepdf/core/src/org/icepdf/core/util/content/AbstractContentParser.java
        Commit graph MODIFY /icepdf/branches/icepdf-6.1.0/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/WordText.java
        Commit graph MODIFY /icepdf/branches/icepdf-6.1.0/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/PageText.java
        ICEsoft Public SVN Repository #48592 Fri Apr 01 09:50:13 MDT 2016 patrick.corless PDF-957 added some coner case logic for better word detection of vertically layed out text.
        Files Changed
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/PageText.java
        ICEsoft Public SVN Repository #48593 Fri Apr 01 09:50:27 MDT 2016 patrick.corless PDF-957 added some coner case logic for better word detection of vertically layed out text.
        Files Changed
        Commit graph MODIFY /icepdf/branches/icepdf-6.1.0/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/PageText.java
        ICEsoft Public SVN Repository #48596 Fri Apr 01 10:55:18 MDT 2016 patrick.corless PDF-957 further tweaks to detecting text rotatiotn and word breaks.
        Files Changed
        Commit graph MODIFY /icepdf/branches/icepdf-6.1.0/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/LineText.java
        Commit graph MODIFY /icepdf/branches/icepdf-6.1.0/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/PageText.java
        ICEsoft Public SVN Repository #48598 Fri Apr 01 14:01:12 MDT 2016 patrick.corless PDF-957 added some coner case logic for better word detection of vertically layed out text.
        Files Changed
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/PageText.java
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/LineText.java
        ICEsoft Public SVN Repository #49466 Mon Nov 07 15:23:18 MST 2016 patrick.corless PDF-957 added some coner case logic for better word detection of vertically layed out text.
        Files Changed
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/PageText.java
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/LineText.java
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/WordText.java
        ICEsoft Public SVN Repository #49527 Tue Nov 08 13:14:40 MST 2016 patrick.corless PDF-957 add som corner case logic for better word detection.
        Files Changed
        Commit graph MODIFY /icepdf/trunk/icepdf/core/src/org/icepdf/core/pobjects/graphics/text/LineText.java
        Patrick Corless made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Patrick Corless
            Reporter:
            Arran Mccullough
          • Votes:
            1
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved: