ICEpdf / PDF-1316

Explore enhancement request to text extraction

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 6.3.2
    • Fix Version/s: 6.3.3
    • Component/s: API, Core/Parsing, Viewer RI
    • Labels:
      None
    • Environment:
      any

      Description

      STEPS TO REPRODUCE:
      1. Open the attached pdf
      2. Go to page 7
      3. Highlight the fractions after question 11
      4. Listen to the speech

      ACTUAL RESULTS:
      The numerators "nine, six, eighteen, ten, fifteen" are all read before the denominators "twelve, eight, twenty four, sixteen, twenty".

      EXPECTED RESULTS:
      It should read "nine over twelve, six over eight, eighteen over twenty four, ten over sixteen, fifteen over twenty".

      ADDITIONAL INFORMATION:
      Alt text is present on the spreadsheet and should override the way the text is spoken.

      Further information from our developer:
      To speak the selected text, we call `DocumentViewController.getSelectedText()`. For words that have an "<rdf:Alt>" property (in "<?xpacket ...><x:xmpmeta...>"), we would like to speak the user-defined alternative text instead of the printed text.
      Is there any way to extract metadata for `PageText`, `LineText`, `WordText` and `GlyphText`?
      If not, could you please add a `getMetadata()` method to those classes so that we can access the "<?xpacket>" dictionaries? Even a raw `byte[]` result would work for us; we will parse the XML on our side.
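
      To illustrate the "parse the XML on our side" step, here is a minimal sketch that uses only the JDK. It assumes the requested `getMetadata()` method would return the raw XMP packet as a `byte[]`. That method does not exist in ICEpdf today, and the class name `XmpAltText`, the method `extractAltTexts`, and the sample packet are hypothetical names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of ICEpdf.
class XmpAltText {

    // Standard RDF namespace used by XMP packets.
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Made-up packet standing in for the bytes a future getMetadata() might return.
    static final String SAMPLE_PACKET =
        "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>"
        + "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
        + "<rdf:RDF xmlns:rdf='" + RDF_NS + "'>"
        + "<rdf:Description rdf:about='' xmlns:dc='http://purl.org/dc/elements/1.1/'>"
        + "<dc:description><rdf:Alt>"
        + "<rdf:li xml:lang='x-default'>nine over twelve</rdf:li>"
        + "</rdf:Alt></dc:description>"
        + "</rdf:Description></rdf:RDF></x:xmpmeta>"
        + "<?xpacket end='w'?>";

    /** Returns the text of every rdf:li nested inside an rdf:Alt container. */
    static List<String> extractAltTexts(byte[] xmpPacket) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the namespace lookups below
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmpPacket));

            List<String> result = new ArrayList<>();
            NodeList alts = doc.getElementsByTagNameNS(RDF_NS, "Alt");
            for (int i = 0; i < alts.getLength(); i++) {
                NodeList items =
                        ((Element) alts.item(i)).getElementsByTagNameNS(RDF_NS, "li");
                for (int j = 0; j < items.getLength(); j++) {
                    result.add(items.item(j).getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XMP packet", e);
        }
    }

    public static void main(String[] args) {
        byte[] packet = SAMPLE_PACKET.getBytes(StandardCharsets.UTF_8);
        System.out.println(extractAltTexts(packet));
    }
}
```

      With something like this on our side, the speech layer could substitute the returned alternative text for the printed glyphs of the matching `WordText`. How a packet would be associated with a particular word is exactly what this request leaves open.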

          People

          • Assignee: Patrick Corless
          • Reporter: Patrick Corless
          • Votes: 0
          • Watchers: 1
