Alex Corrigan

Ditching Traditional XML Parsing

A simple alternative for manipulating XML

I have been working on some test automation that involves publishing various XML messages over JMS and checking what the system under test did with them, and how. There are about 25 of them at last count, and each is distinct in that it comes from a different source system. They all share the same base schema and some common elements, but each has something unique about it.

There is not a lot I care about in these messages other than a handful of specific field values, perhaps no more than 10 or so, which I want to manipulate before publishing in order to run different scenarios: for example, prices, instruments, currencies and trading counterparties. It is not an uncommon task when testing APIs or other message-based interfaces.

There are some established Java libraries out there for this kind of thing:

  • You can use something like JAXB to unmarshal an XML string to an object, change the necessary fields, marshal it back to a new string and then publish it. The classes need to be defined first, typically by generating them from the XML schema.

  • Or you can go the XPath route: parse the XML string to a document object and edit the fields that way.
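
For comparison, the document-and-path route might look roughly like the sketch below. It uses only the XML APIs that ship with the JDK; the class name XPathEdit, the setText helper and the sample trade message are made up for illustration and are not taken from my actual test code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: parse a message, set one field by path, serialise it again.
class XPathEdit {

    static String setText(String xml, String path, String value) throws Exception {
        // Parse the XML string into a DOM document.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Locate the first node matching the path and overwrite its text.
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate(path, doc, XPathConstants.NODE);
        if (node == null) {
            throw new IllegalArgumentException("No node at path: " + path);
        }
        node.setTextContent(value);

        // Serialise the document back to a string for publishing.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample message; real templates are far larger.
        String xml = "<trade><price>0</price></trade>";
        System.out.println(setText(xml, "/trade/price", "101.25"));
    }
}
```

Every field you want to change needs its own path expression, and that path has to be correct for the particular message you are editing.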

I only wanted to touch a few fields in each message, so I originally went with the XPath approach. It worked for a while, but I eventually got frustrated with the number of paths I was having to manage. Despite each of my template messages being based on the same schema, the fields were not located at the same path from one message to the next.

Eventually, probably in an actual fit of rage (I don’t recall now), I ripped up the rule book and chose to just treat each of the messages as a string. Pure and simple. In each message I placed tokens that represented the particular values to be substituted when processing it. So for those prices I used a token like {{price}} and, with the power of regular expressions, just replaced it with a real price from the scenario being tested.
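
A minimal sketch of that substitution, using only java.util.regex, could look something like this. The class name TokenTemplate, the fill method and the sample message are illustrative stand-ins rather than my real code, and it assumes token names are plain word characters.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: swap {{token}} placeholders in a message for scenario values.
class TokenTemplate {

    // Matches tokens such as {{price}}; the name inside the braces is group 1.
    private static final Pattern TOKEN = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    static String fill(String template, Map<String, String> values) {
        Matcher matcher = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!values.containsKey(name)) {
                // Fail loudly rather than publish a message with a token left in it.
                throw new IllegalArgumentException("No value supplied for token: " + name);
            }
            // quoteReplacement stops '$' and '\' in a value being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(values.get(name)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample template; the real ones are full messages kept as text.
        String template = "<trade><price>{{price}}</price><currency>{{currency}}</currency></trade>";
        System.out.println(fill(template, Map.of("price", "101.25", "currency", "GBP")));
    }
}
```

The same token can appear anywhere in any message, so there is no path to maintain per template. One caveat of a sketch like this is that values are inserted verbatim, so anything containing characters such as & or < would need escaping first.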

Since implementing this I haven’t regretted it once. Sometimes I feel I have broken an unspoken rule by not using one of the other approaches, but at the end of the day my solution works. It is simple, reliable and allows me to move on with less pain.

The inspiration for this, at the time, came from HTML templating engines like Jinja and Handlebars.js. What my code is doing, though, is probably most like Mustache, which I only discovered more recently. I have yet to investigate whether I could actually use its Java library instead.

It is possible that there are some sacrifices with regard to the performance of processing and manipulating those strings. I haven’t measured it. To be honest, performance was not a major concern for me; it is fast enough for my needs.