[PDF-994] Question: For italic strings in documents (.docx/.doc) converted to PDF, PageText returns the string chunked into characters instead of strings/words - ICEsoft JIRA Issue Tracker

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 6.1
Fix Version/s: 6.4
Component/s: Core/Parsing
Labels: None
Environment: Windows and CentOS

Description

For italic strings in documents (.docx/.doc) converted to PDF, PageText returns the string chunked into individual characters instead of strings/words.

The scenario is as follows:

In our application, we upload a .doc/.docx file containing italic fonts and convert it to PDF using the Aspose library. Our application then highlights specific strings, such as URLs and IP addresses, in the converted PDF. The PDF renders correctly with the italic fonts. However, the find function in ICEpdf does not work for italic words: searching for an italic word returns 0 results, so our application cannot highlight it. We noticed that ICEpdf's PageText returns italic text as an array of individual characters. For example, www.google.com is returned as PageText: w,w,w,.,g,o,o,g,l,e,.,c,o,m instead of www,.,google,.,com. This is the issue. Please suggest what might be causing it. Your help is highly appreciated.
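To illustrate why a word-based find can miss text that was extracted one character at a time, and one possible application-side workaround, here is a minimal sketch. It assumes the extracted tokens are already available as a plain list of strings, for example the text of each word reported for a page. `FragmentSearch` and `find` are hypothetical names invented for this illustration and are not part of the ICEpdf API. The sketch joins the tokens, searches the joined string with a case-sensitive match, and maps each hit back to the range of tokens that spell it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of ICEpdf): finds a term across fragmented text tokens. */
public class FragmentSearch {

    /**
     * Returns one int[]{startToken, endTokenExclusive} per occurrence of term
     * in the concatenation of tokens. Matching is case-sensitive.
     */
    public static List<int[]> find(List<String> tokens, String term) {
        List<int[]> matches = new ArrayList<>();
        if (term.isEmpty()) {
            return matches;
        }
        // Join all tokens and remember which token each character came from.
        StringBuilder joined = new StringBuilder();
        List<Integer> owner = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            joined.append(token);
            for (int c = 0; c < token.length(); c++) {
                owner.add(i);
            }
        }
        // Scan the joined text and map each hit back to token indices.
        String haystack = joined.toString();
        int from = 0;
        while (true) {
            int at = haystack.indexOf(term, from);
            if (at < 0) {
                break;
            }
            int first = owner.get(at);
            int last = owner.get(at + term.length() - 1);
            matches.add(new int[] {first, last + 1});
            from = at + term.length();
        }
        return matches;
    }

    public static void main(String[] args) {
        // Tokens as described in the report for italic text (one per character).
        List<String> perChar = List.of(
                "w", "w", "w", ".", "g", "o", "o", "g", "l", "e", ".", "c", "o", "m");
        // Tokens as expected for non-italic text.
        List<String> perWord = List.of("www", ".", "google", ".", "com");

        System.out.println(Arrays.toString(find(perChar, "google").get(0))); // [4, 10]
        System.out.println(Arrays.toString(find(perWord, "google").get(0))); // [2, 3]
    }
}
```

The returned token ranges could then be used to look up the bounds of the matching tokens for highlighting. The sketch ignores spaces between words and line breaks, so a real workaround would need to decide where word boundaries fall before joining.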

Attachments

- IcePdf_with_italics.pdf (13 kB, 09/May/16 9:46 PM, Madhavi Katreddy)
- Highlight.JPG (58 kB, 09/May/16 9:46 PM)

People

Assignee: Patrick Corless
Reporter: Madhavi Katreddy

Votes: 0
Watchers: 2

Dates

Created: 05/May/16 8:56 PM
Updated: 28/Oct/16 8:05 AM