ICEpdf / PDF-994

Question: For italic strings in documents (.docx/.doc) converted to PDF, PageText returns the string split into individual characters instead of words.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 6.1
    • Fix Version/s: 6.4
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      Windows and CentOS

      Description

      For italic strings in documents (.docx/.doc) converted to PDF, PageText returns the string split into individual characters instead of words.

      The scenario is something like below:

      In our application, we upload a .doc/.docx file containing italic text and convert it to PDF using the Aspose library. The application then highlights specific words, such as URLs and IP addresses, in the converted PDF. The PDF renders the italic fonts correctly. However, the find function in ICEpdf does not work for italic words: it returns 0 results, so our application cannot highlight them. We noticed that ICEpdf returns the PageText for italic fonts as an array of single characters, i.e. www.google.com comes back as PageText: w,w,w,.,g,o,o,g,l,e,.,c,o,m instead of www,.,google,.,com. This is the issue. Please suggest what might be causing this. Your help is highly appreciated.
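
      One possible application-side workaround, independent of how ICEpdf groups the text, is to search across the fragments rather than inside each one. The sketch below is plain Java and does not call any ICEpdf API: `FragmentSearch`, `find`, and the list-of-strings input are hypothetical names chosen for illustration, and it assumes the page text has already been copied into an ordered list of fragments. It joins the fragments, searches the joined string, and maps each hit back to the first and last fragment index it covers.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical helper: not part of ICEpdf.
      class FragmentSearch {

          /**
           * Finds every occurrence of term in the concatenation of fragments.
           * Each result is {firstFragmentIndex, lastFragmentIndex} (inclusive).
           * The match is case-sensitive.
           */
          static List<int[]> find(List<String> fragments, String term) {
              StringBuilder joined = new StringBuilder();
              // owner.get(i) is the index of the fragment that supplied character i.
              List<Integer> owner = new ArrayList<>();
              for (int f = 0; f < fragments.size(); f++) {
                  String s = fragments.get(f);
                  joined.append(s);
                  for (int k = 0; k < s.length(); k++) {
                      owner.add(f);
                  }
              }

              List<int[]> hits = new ArrayList<>();
              if (term.isEmpty()) {
                  return hits;
              }
              int from = 0;
              while (true) {
                  int at = joined.indexOf(term, from);
                  if (at < 0) {
                      break;
                  }
                  int last = at + term.length() - 1;
                  hits.add(new int[] {owner.get(at), owner.get(last)});
                  from = at + 1; // continue after the start of this match
              }
              return hits;
          }
      }
      ```

      With the character-level fragments from the report (w,w,w,.,g,o,o,g,l,e,.,c,o,m), searching for "google" yields the fragment range 4 to 9, which the caller could then translate into highlight bounds. The same call works unchanged when the fragments are whole words.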
      1. IcePdf_with_italics.pdf
        13 kB
        Madhavi Katreddy
      1. Highlight.JPG
        58 kB

        Activity

        Patrick Corless added a comment -

        It's hard to say for sure what is happening without a sample file, but you could try one of these system properties:

        -Dorg.icepdf.core.views.page.text.autoSpace=false
        or
        -Dorg.icepdf.core.views.page.text.spaceFraction=1
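
        If passing -D flags on the command line is not convenient, the same two properties can be set from code. This is a minimal sketch: the property names are the ones given above, while the class and method names are hypothetical. It assumes ICEpdf reads these properties when its classes are first loaded, so the calls should run before any ICEpdf class is touched. Note that System.setProperty takes String values only.

        ```java
        // Hypothetical helper: only the property names come from this ticket.
        class IcePdfTextProperties {

            static final String AUTO_SPACE =
                    "org.icepdf.core.views.page.text.autoSpace";
            static final String SPACE_FRACTION =
                    "org.icepdf.core.views.page.text.spaceFraction";

            /** Equivalent to -Dorg.icepdf.core.views.page.text.autoSpace=false */
            static void disableAutoSpace() {
                System.setProperty(AUTO_SPACE, "false");
            }

            /** Equivalent to -Dorg.icepdf.core.views.page.text.spaceFraction=<n> */
            static void setSpaceFraction(int fraction) {
                // The value must be passed as a String, not an int.
                System.setProperty(SPACE_FRACTION, Integer.toString(fraction));
            }
        }
        ```

        Since the suggestion is to try one property or the other, call only one of the two methods per run and compare the resulting PageText.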

        Madhavi Katreddy added a comment -

        Thank you for your suggestion. I have implemented it; the sample file and a screenshot are attached. The screenshot was taken after setting System.setProperty("org.icepdf.core.views.page.text.spaceFraction", "1");

        The attached screenshot shows that the highlight on the italic text is longer overall than the highlight on the regular text, and that the highlights overlap. Please suggest.

        Patrick Corless added a comment -

        The bounding box for italic fonts is a bit wider than one might expect, which results in the darker, overlapping highlight. We have tried moving to a blending mode for the highlight but have had issues on some Linux implementations. We'll keep this in mind for a future enhancement.
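
        For applications that paint their own highlights, one generic Java2D way to avoid the darker overlap is to merge the per-glyph rectangles into a single shape and fill it once. The sketch below is not how ICEpdf paints highlights and is not the blending-mode approach mentioned above; `HighlightUnion` and `union` are hypothetical names, and the input rectangles stand in for whatever glyph bounds the application has collected.

        ```java
        import java.awt.geom.Area;
        import java.awt.geom.Rectangle2D;
        import java.util.List;

        // Hypothetical helper: plain Java2D, no ICEpdf dependency.
        class HighlightUnion {

            /**
             * Merges overlapping glyph bounds into one Area so that a translucent
             * fill covers each pixel only once.
             */
            static Area union(List<Rectangle2D> glyphBounds) {
                Area merged = new Area();
                for (Rectangle2D r : glyphBounds) {
                    merged.add(new Area(r));
                }
                return merged;
            }
        }
        ```

        The returned Area can be passed to Graphics2D.fill with a translucent color; because the overlapping regions are part of one shape, they are not painted twice.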


          People

          • Assignee:
            Patrick Corless
            Reporter:
            Madhavi Katreddy
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated: