Details
Description
Hello,
I regularly use the searchPage method of the DocumentSearchController, and it generally works fine.
Unfortunately, it fails in the following case:
Searching for "Article 45 of the Constitution" returns 0 matches.
I tried both the API (i.e. DocumentSearchController.searchPage) and the Search tab of the viewer.
This term exists in the document and Acrobat Reader found several occurrences.
To make sure that this is not an issue with the cmap, I checked the cids and unics of a page containing this pattern (p. 239)
and they seem correct:
cid=0x41:unic=A:0x41
cid=0x72:unic=r:0x72
cid=0x74:unic=t:0x74
cid=0x69:unic=i:0x69
cid=0x63:unic=c:0x63
cid=0x6c:unic=l:0x6c
cid=0x65:unic=e:0x65
cid=0x20:unic= :0x20
cid=0x34:unic=4:0x34
cid=0x35:unic=5:0x35
cid=0x20:unic= :0x20
cid=0x6f:unic=o:0x6f
cid=0x66:unic=f:0x66
cid=0x20:unic= :0x20
cid=0x74:unic=t:0x74
cid=0x68:unic=h:0x68
cid=0x65:unic=e:0x65
cid=0x20:unic= :0x20
cid=0x43:unic=C:0x43
cid=0x6f:unic=o:0x6f
cid=0x6e:unic=n:0x6e
cid=0x73:unic=s:0x73
cid=0x74:unic=t:0x74
cid=0x69:unic=i:0x69
cid=0x74:unic=t:0x74
cid=0x75:unic=u:0x75
cid=0x74:unic=t:0x74
cid=0x69:unic=i:0x69
cid=0x6f:unic=o:0x6f
cid=0x6e:unic=n:0x6e
Moreover, this term is not split over multiple lines.
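For anyone who wants to repeat the check, here is a minimal sketch of how a dump like the one above could be reassembled into a string and compared with the search phrase. It assumes each line has the form `cid=0x..:unic=<char>:0x..`. The class and method names (`CidDumpCheck`, `decodeLine`, `decode`) are hypothetical and not part of ICEpdf.

```java
import java.util.List;

final class CidDumpCheck {

    // Parses one dump line such as "cid=0x41:unic=A:0x41" and returns the
    // character given by the trailing hexadecimal code point.
    static char decodeLine(String line) {
        int idx = line.lastIndexOf(":0x");
        if (idx < 0) {
            throw new IllegalArgumentException("Unexpected dump line: " + line);
        }
        return (char) Integer.parseInt(line.substring(idx + 3).trim(), 16);
    }

    // Concatenates the decoded characters of all dump lines, in order.
    static String decode(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(decodeLine(line));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First word of the dump only, to keep the example short.
        List<String> dump = List.of(
                "cid=0x41:unic=A:0x41",
                "cid=0x72:unic=r:0x72",
                "cid=0x74:unic=t:0x74",
                "cid=0x69:unic=i:0x69",
                "cid=0x63:unic=c:0x63",
                "cid=0x6c:unic=l:0x6c",
                "cid=0x65:unic=e:0x65");
        String decoded = decode(dump);
        System.out.println(decoded + " matches: " + "Article".equals(decoded));
    }
}
```

If the decoded string equals the phrase, the character mapping itself is probably not the cause of the failed search.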
The PDF is available at http://www.itu.int/dms_pub/itu-s/oth/02/02/S02020000244501PDFE.pdf
I am really puzzled. Thank you very much for your help.
Activity
Field | Original Value | New Value |
---|---|---|
Fix Version/s | | 5.1.2 [ 11872 ] |
Status | Open [ 1 ] | Resolved [ 5 ] |
Resolution | | Fixed [ 1 ] |

Repository | Revision | Date | User | Message |
---|---|---|---|---|
ICEsoft Public SVN Repository | #44074 | Tue Feb 17 09:21:50 MST 2015 | patrick.corless | |
ICEsoft Public SVN Repository | #44075 | Tue Feb 17 09:22:10 MST 2015 | patrick.corless | |

Field | Original Value | New Value |
---|---|---|
Status | Resolved [ 5 ] | Closed [ 6 ] |
Thanks for posting this issue. We're currently working with a customer on improving our text extraction ordering. In the document in question, an extra space is being added to a word during extraction, which prevents the phrase from matching. A fix for this should be available soon.
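As an illustration of why a stray space defeats the search, the sketch below compares an exact phrase match with a whitespace-insensitive one. This is not the fix committed to ICEpdf. The class `PhraseMatch`, its methods, and the position of the extra space in the sample text are assumptions made for the example.

```java
final class PhraseMatch {

    // Removes every whitespace character so stray spaces are ignored.
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // Plain substring match: fails as soon as the extracted text
    // contains a space that the phrase does not.
    static boolean containsExact(String text, String phrase) {
        return text.contains(phrase);
    }

    // Compares text and phrase after stripping whitespace from both.
    static boolean containsIgnoringWhitespace(String text, String phrase) {
        return stripWhitespace(text).contains(stripWhitespace(phrase));
    }

    public static void main(String[] args) {
        String phrase = "Article 45 of the Constitution";
        // Hypothetical extraction result with an extra space inside a word.
        String extracted = "under Article 45 of the Con stitution and";

        System.out.println("exact: " + containsExact(extracted, phrase));
        System.out.println("ignoring whitespace: "
                + containsIgnoringWhitespace(extracted, phrase));
    }
}
```

A whitespace-insensitive comparison can work as a stopgap on the caller's side, but it may also report matches that span unrelated word boundaries, so correcting the extraction itself is the cleaner solution.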