java - XSLT 3.0 chained burst-streaming with saxon - memory consumption considerations - Stack Overflow
I am in the process of optimizing an existing XML-to-XML transformation in terms of memory consumption. We are transforming large multi-GB XML files into a much smaller internal XML structure - the result is less than 10% of the original size. The transformation is implemented in 4 different XSLT stages - basically we do:
sourcefile.xml -> xslt1 -> xslt2 -> xslt3 -> xslt4 -> targetfile.xml
This was implemented as a non-streamable chained transformation using the SAX API with Saxon. By now all 4 transformations have been adjusted to be burst-streamable. As part of this we have also changed the calling Java implementation to use the Saxon s9api with Xslt30Transformer (tests have been done using Saxon 10.6).
We are seeing the following pattern with a test sourcefile of 500 MB.
- Memory need with non-streamable chained SAX process: -Xmx 2GB
- Memory need for a single streaming trafo phase: less than 200 MB, runs nicely with -Xmx200M - that would be perfect.
- If we chain the streamable trafos using multiple
trafo.asDocumentDestination(nextTrafo)
calls, we need 4 GB of memory to run it; otherwise it stops with "GC overhead limit exceeded".
We can reproduce this by chaining only 2 of the 4 trafos:
- Separate execution with -Xmx200M runs for both.
- Chained execution needs -Xmx4G to run.
We are asking ourselves whether this is to be expected and normal - with chained streaming there might be a need to buffer the data between the trafos - or whether there is an issue in our implementation...
So we could obviously save to disk between the trafos and have 4 separate steps, each consuming 200 MB (see the sketch below) - but is streaming to disk in between the trafos really optimal?
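For reference, such a disk-buffered fallback would look roughly like this (untested sketch; trafo1..trafo4 as in the simplified code further down, temp-file cleanup and exception handling elided):
// Each stage runs as its own streamed transformation with a temporary
// file in between, so no stage should need more than its own ~200 MB.
Path tmp1 = Files.createTempFile("stage1-", ".xml");
Path tmp2 = Files.createTempFile("stage2-", ".xml");
Path tmp3 = Files.createTempFile("stage3-", ".xml");
trafo1.applyTemplates(new StreamSource(Files.newInputStream(inFile)),
        trafo1.newSerializer(Files.newOutputStream(tmp1)));
trafo2.applyTemplates(new StreamSource(Files.newInputStream(tmp1)),
        trafo2.newSerializer(Files.newOutputStream(tmp2)));
trafo3.applyTemplates(new StreamSource(Files.newInputStream(tmp2)),
        trafo3.newSerializer(Files.newOutputStream(tmp3)));
trafo4.applyTemplates(new StreamSource(Files.newInputStream(tmp3)),
        trafo4.newSerializer(Files.newOutputStream(outFile)));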
Is this behaviour to be expected or is there something wrong in our implementation?
--- Edit: added more information that seems to indicate that streaming is also working in the chained case:
- I see a partially written result.xml even in the chained case when I set -Xmx500M and get an OOM: about 10% of the expected result.xml is there on the filesystem.
- I also see in the logs that both chained trafos are started almost at the same time - immediately after compiling - and I see a running log of the second trafo (a line for each processed document).
- The running log starts fast, then about 15 seconds later gets slower (as GC starts hitting), and finally stops shortly before the OOM exception.
- I only see the 3 small trees (2x 500 nodes, 1x 30000 nodes) being built for loading static mapping tables; the same trees are built when I successfully run a single streaming trafo with a 500 MB source file and -Xmx200M - no other trees are visible in the logs.
I will now try with the newest Saxon 12.x release (I was using 10.6 for the tests as the customer is still using this in prod currently).
If this does not change anything, I will reduce/isolate the case and file a support ticket as Michael suggested below.
Our (simplified) code looks like this:
// trafo1..trafo4 are Xslt30Transformer instances obtained via xsltCompiler.compile(...).load30()
Serializer finalDest = trafo4.newSerializer(Files.newOutputStream(outFile));
// Files.newInputStream returns an InputStream, so it needs to be wrapped in a StreamSource
StreamSource input = new StreamSource(Files.newInputStream(inFile));
// push the source through all 4 trafos; each stage forwards its result document to the next
trafo1.applyTemplates(input,
        trafo2.asDocumentDestination(
                trafo3.asDocumentDestination(
                        trafo4.asDocumentDestination(finalDest))));
PS: Is there an easy way to see/measure the memory needs of a transformation? So far the only way I found was to play with -Xmx until I got an OOM exception.
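One thing I may try is reading the JVM's peak heap usage via the standard management beans after the run - a rough sketch (this measures the whole JVM, not just the transformation, so it only gives an upper bound):
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

// Call pool.resetPeakUsage() on all heap pools before the transformation,
// then sum the peaks afterwards to approximate its high-water mark.
static long peakHeapBytes() {
    long peak = 0;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
        if (pool.getType() == MemoryType.HEAP) {
            peak += pool.getPeakUsage().getUsed();
        }
    }
    return peak;
}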
Comments:
- 4 GB memory use for 500 MB sample files sounds a bit as if streaming is not done. Not sure what could be the reason for that; however, a license not found/not working and falling back to HE could be one reason. But you say a single transformation does streaming and consumes 200 MB, so that suggests the license is found and working. Not sure why that would fail for the Java code you have shown, unless you unintentionally don't use the license there. All kinds of speculation, however. – Martin Honnen
- Thanks @MartinHonnen - I added more info to the question that indicates to me that streaming is actually working even in the chained case. I will try to reduce/isolate the issue now. – Reto
2 Answers
It certainly looks as if your attempt to achieve a streamed pipeline using Xslt30Transformer.asDocumentDestination() isn't working, and that the intermediate trees (or at least one of them) are being built in memory. It's not obvious from the information supplied why this should be. The key test being applied by Saxon is initialMode().isDeclaredStreamable(); if that returns false then it's going to build the tree.
I would suggest creating a repro that demonstrates the problem and sending it over to us, preferably as a support request at saxonica.plan.io so it doesn't get lost.
Setting the Processor/Configuration property Feature.TIMING should result in messages indicating that a tree is being built, if this is the case.
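In s9api terms that would be roughly (sketch):
import net.sf.saxon.lib.Feature;
import net.sf.saxon.s9api.Processor;

Processor processor = new Processor(true);   // true: licensed (EE) configuration, needed for streaming
processor.setConfigurationProperty(Feature.TIMING, true);
// compile and run the pipeline with this Processor; Saxon will then log
// timing information, including messages whenever a source tree is built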
Following the suggestion of @Michael Kay above in the comments, I created a Java heap dump on OOM using the parameter -XX:+HeapDumpOnOutOfMemoryError and analyzed it using the Eclipse Memory Analyzer. I quickly identified that the problem is with an accumulator, which used 95% of the memory.
There are only 4 xsl:accumulator statements in our code, so it was easy to simply comment them out one by one to find the offending one.
It looks like what causes the OOM is the accumulator I copied from Michael's answer to my last SO question when I tried his suggestion. The code causing the OOM is this:
<xsl:accumulator name="DruckDatum" initial-value="'empty'" streamable="yes" as="xs:string">
    <xsl:accumulator-rule match="ExportKopf/DruckDatum/text()" select="string(.)"/>
</xsl:accumulator>

<xsl:attribute name="ErstellZt"><xsl:sequence select="tfu:getIsoDateFromString(accumulator-after('DruckDatum'))"/>T00:00:00</xsl:attribute>

<xsl:function name="tfu:getIsoDateFromString" as="xs:date?" visibility="public">
    <xsl:param name="param1" as="xs:string?"/>
    <xsl:if test="exists($param1) and matches($param1, '([0-9]{2}\.){2}[0-9]{4}')">
        <xsl:sequence select="xs:date(concat(substring($param1, 7, 4), '-', substring($param1, 4, 2), '-', substring($param1, 1, 2)))"/>
    </xsl:if>
</xsl:function>
As soon as this accumulator is in the code AND there is at least one accumulator-after() call somewhere, I get the OOM.
Replacing select="string(.)" with select="$value" or select="string()" does not change the memory behaviour (but $value no longer gives the correct value).
The only weird thing is that the OOM only happens in a "streamed" AND "chained" transformation; when called as a single streamed trafo it was working fine...
Heap dump using Saxon EE 10.6