ICEpdf / PDF-186

Out of memory error when converting large GeometricPath to an Area

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 4.0.1
    • Fix Version/s: 5.0
    • Component/s: Core/Parsing
    • Labels:
      None
    • Environment:
      windows xp, jdk 1.5

      Description

      I have a reasonably sized pdf file at http://www.localhappeningsmagazine.com/20100410APR-MAYFlipbook/April.May_all_pages.pdf

      I am trying to convert this PDF file to images using the document.getPageImage() API. At page 43 of the PDF, it simply hangs indefinitely.
      Is there a way to know what is causing it?

      Should there be a configurable timeout option so that it doesn't hang forever?
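
The timeout idea can be approximated on the caller's side without any change to the library. The following is a minimal sketch, assuming Java 8 or later (on JDK 1.5 the lambdas would have to be anonymous classes). `RenderWithTimeout` and `callWithTimeout` are hypothetical names, not ICEpdf API, and the task passed in stands in for a call such as document.getPageImage().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of ICEpdf.
class RenderWithTimeout {

    /**
     * Runs the task on a worker thread and waits at most timeoutMillis.
     * Returns null if the task did not finish in time.
     */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "render-worker");
            worker.setDaemon(true); // a stuck render must not keep the JVM alive
            return worker;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // only interrupts; the task has to cooperate to stop
            return null;
        } catch (ExecutionException e) {
            // Errors thrown by the task, including OutOfMemoryError, arrive here as the cause.
            throw new IllegalStateException("Rendering failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a slow page render.
        String result = callWithTimeout(() -> {
            Thread.sleep(5_000);
            return "image";
        }, 200);
        System.out.println(result == null ? "timed out" : result);
    }
}
```

A timeout of this kind only stops the caller from waiting. A render that ignores interruption keeps consuming CPU and heap on the daemon thread until it finishes or fails.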

      The code is here: http://www.icefaces.org/JForum/posts/list/17077.page#63300

      Thanks

        Activity

        Sandeep M created issue -
        Patrick Corless added a comment -

        Page 44 of the document in question has a few very complicated geometric paths containing more than 6000 points. For some reason Java runs out of memory when converting the geometric path to an area. The complex geometric path is used to clip graphics on the page, which gives a cut-out effect.

        Some time will have to be spent to understand why Java's geom classes have problems with data sets of this size or type. It might be possible to divide the geometric shape into smaller areas and then intersect those areas, thus avoiding the bottleneck of processing them all at once.
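
To check how large a path is before it is converted, its segments can be counted with a PathIterator. This is a minimal sketch; `PathComplexity`, `countSegments` and `isTooComplex` are hypothetical names, and the threshold used by a caller is an assumption, not a value taken from ICEpdf.

```java
import java.awt.Shape;
import java.awt.geom.GeneralPath;
import java.awt.geom.PathIterator;

// Hypothetical helper, not part of ICEpdf.
class PathComplexity {

    /** Counts every segment in the shape's outline (moveTo, lineTo, curves, close). */
    static int countSegments(Shape shape) {
        int count = 0;
        double[] coords = new double[6]; // large enough for a cubic curve segment
        for (PathIterator it = shape.getPathIterator(null); !it.isDone(); it.next()) {
            it.currentSegment(coords);
            count++;
        }
        return count;
    }

    /** True when the shape has more segments than the caller is willing to process. */
    static boolean isTooComplex(Shape shape, int maxSegments) {
        return countSegments(shape) > maxSegments;
    }

    public static void main(String[] args) {
        GeneralPath square = new GeneralPath();
        square.moveTo(0f, 0f);
        square.lineTo(10f, 0f);
        square.lineTo(10f, 10f);
        square.lineTo(0f, 10f);
        square.closePath();
        System.out.println(countSegments(square) + " segments");
    }
}
```

A check like this could be used to log or special-case unusually large clip paths. What to do with them once detected is left open here.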

        Patrick Corless made changes -
        Field | Original Value | New Value
        Summary | Pdf 2 image convertor hangs indefinitely | Out of memory error when converting large GeometricPath to an Area
        ICEsoft Forum Reference | | http://www.icefaces.org/JForum/posts/list/0/17077.page
        Salesforce Case | | []
        Component/s | | Core [ 10022 ]
        Sandeep M added a comment -

        Hi Patrick, is it possible for you to share how you went about debugging and identifying this issue? I have some spare time and can take a look. I am just not aware of how to turn on the logs for ICEpdf.
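
For reference, verbose output from a library that logs through java.util.logging can usually be enabled by raising the level on a parent logger. This is a minimal sketch under two assumptions: that the loggers are named after the classes (the logger name "org.icepdf" is inferred from the package names in the stack trace, not from ICEpdf documentation), and that `VerboseLogging` is a hypothetical helper.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of ICEpdf.
class VerboseLogging {

    // Keep a strong reference: java.util.logging may otherwise garbage-collect
    // the logger and lose the configuration applied to it.
    // "org.icepdf" is an assumed logger name.
    private static final Logger ICEPDF_LOGGER = Logger.getLogger("org.icepdf");

    /** Sets the level on the assumed parent logger and prints its records to the console. */
    static Logger enable(Level level) {
        ICEPDF_LOGGER.setLevel(level);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(level); // handlers filter too; the default is INFO
        ICEPDF_LOGGER.addHandler(handler);
        ICEPDF_LOGGER.setUseParentHandlers(false); // avoid duplicate lines from the root handler
        return ICEPDF_LOGGER;
    }

    public static void main(String[] args) {
        enable(Level.FINER);
        // Child loggers inherit the level and publish to the parent's handler.
        Logger.getLogger("org.icepdf.core.util.ContentParser").finer("logging enabled");
    }
}
```

The same effect can be had with a logging.properties file passed through the java.util.logging.config.file system property.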

        Patrick Corless added a comment -

        This one is actually pretty hard to track down, as the PDF content stream is big enough to cut out java.util.logging. I had to do some pausing and stepping with the debugger to figure out where the bottleneck was.

        As mentioned earlier, it is page 44 that is causing the problem. There are several content streams that define the clipping area around the images on the page and produce the cut-out effect. In these content streams there are a large number of geometric operators that are added to our shapes stack when the 'W' token is encountered; this takes place in the ContentParser class. You can search for "PdfOps.W_TOKEN", which is where we handle the geometric path that has been built up by the previous PostScript operands.

        At the end of the PdfOps.W_TOKEN block, graphicState.setClip is called, which is where things blow up. In the setClip method, Area area = new Area(newClip); causes the out of memory error. I had fooled around with some code that would copy the newClip operands into smaller shapes, but it turned out that the ContentParser was spending an enormous amount of time just trying to process all the geometric path information.

        I'm guessing there must be another way to handle Area area = new Area(newClip); and the later transforms, using a different combination of java.awt.geom classes.
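
One candidate for such an alternative is to hand the path to Graphics2D.setClip(Shape) as it is, since the method accepts any Shape and the calling code then never constructs an Area. This is a minimal sketch only: `DirectClipSketch` and its methods are hypothetical, the zig-zag ring is a stand-in for the clip path on page 44, and whether this fits ICEpdf's GraphicsState (which also combines and transforms clips) is not established in this thread.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.GeneralPath;
import java.awt.image.BufferedImage;

// Hypothetical sketch, not ICEpdf code.
class DirectClipSketch {

    /** Builds a closed zig-zag ring with the given number of points as a stand-in clip path. */
    static GeneralPath zigZagRing(double cx, double cy, double outer, double inner, int points) {
        GeneralPath path = new GeneralPath();
        for (int i = 0; i < points; i++) {
            double angle = 2 * Math.PI * i / points;
            double radius = (i % 2 == 0) ? outer : inner; // alternate radii to create the zig-zag
            float x = (float) (cx + radius * Math.cos(angle));
            float y = (float) (cy + radius * Math.sin(angle));
            if (i == 0) {
                path.moveTo(x, y);
            } else {
                path.lineTo(x, y);
            }
        }
        path.closePath();
        return path;
    }

    /** Paints a white background, then fills red through the clip. */
    static BufferedImage paintClipped(GeneralPath clip, int size) {
        BufferedImage image = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, size, size);
            g.setClip(clip); // the path is passed as a Shape; no new Area(...) in this code
            g.setColor(Color.RED);
            g.fillRect(0, 0, size, size);
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) {
        GeneralPath clip = zigZagRing(100, 100, 90, 60, 6000);
        BufferedImage image = paintClipped(clip, 200);
        System.out.printf("center pixel: %06X%n", image.getRGB(100, 100) & 0xFFFFFF);
    }
}
```

Combining such a clip with an existing non-rectangular clip may still need an Area-style intersection somewhere, so this only addresses the case where the path can be used as the clip on its own.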

        Sandeep M added a comment -

        I guess there is a memory leak or inefficient handling inside Area area = new Area(newClip);
        After increasing the heap space to -Xmx1024m, the processing went through but was very slow.

        Also, the Java community suggests avoiding the slow Area class whenever possible. So one solution would be to rewrite the parts of the code that use the Area classes; the other is to live with it by increasing the heap size and accepting the slowness. I don't know what JDK alternatives are available for clipping.

        The exception trace with the default heap size is below:
        ----------------------------------------------------------------------------------------
        Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at sun.awt.geom.AreaOp.resolveLinks(AreaOp.java:434)
        at sun.awt.geom.AreaOp.pruneEdges(AreaOp.java:374)
        at sun.awt.geom.AreaOp.calculate(AreaOp.java:141)
        at java.awt.geom.Area.pathToCurves(Area.java:177)
        at java.awt.geom.Area.<init>(Area.java:108)
        at org.icepdf.core.pobjects.graphics.GraphicsState.setClip(GraphicsState.java:568)
        at org.icepdf.core.util.ContentParser.parse(ContentParser.java:512)
        at org.icepdf.core.pobjects.Page.init(Page.java:369)
        at org.icepdf.core.pobjects.Page.paint(Page.java:449)
        at org.icepdf.core.pobjects.Page.paint(Page.java:421)
        at org.icepdf.core.pobjects.Page.paint(Page.java:402)
        at org.icepdf.core.pobjects.Document.getPageImage(Document.java:1039)
        at com.test.common.util.IcePdf2ImageUtil.pdfToImage(IcePdf2ImageUtil.java:79)
        at com.test.common.util.IcePdf2ImageUtil.main(IcePdf2ImageUtil.java:29)
        --------------------------------------------------------------------------------------------------------------

        Patrick Corless added a comment -

        Thanks for posting your findings. I'll look into alternatives to using the Area class.

        Patrick Corless made changes -
        Salesforce Case | | []
        Fix Version/s | | 5.0 [ 10314 ]
        Patrick Corless added a comment -

        Closing due to inactivity and the likelihood that we are looking at a JDK bug. Will reopen if requested by the client.

        Patrick Corless made changes -
        Status | Open [ 1 ] | Resolved [ 5 ]
        Resolution | | Won't Fix [ 2 ]
        Patrick Corless made changes -
        Status | Resolved [ 5 ] | Closed [ 6 ]

          People

          • Assignee:
            Patrick Corless
            Reporter:
            Sandeep M
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: