If All Else Fails, Try Sikuli!

I'm not a massive fan of automating GUIs in general, but if I really have to...

Published: 2017-03-06 21:00:00

I have often tried to avoid automating GUIs, believing there are quicker, just as valid approaches to automation via APIs or other back-end services. But there comes a time when you run out of options and just cannot avoid the GUI. Perhaps that is the only interface that exposes a particular data set, or the only entry point to trigger some action in the application. It happens.

With specific regard to Java Swing GUIs, a situation I find myself in currently, there are perhaps only three options available for automation. Good luck whichever of them you go with. The options line up like this:

  • Employ an established (expensive) automation test tool
  • Build a Java agent
  • Automate whatever you see on the screen

You could put your money where your mouth is and buy something to get the job done. There are a few tools out there that can do this, even when working with Java Swing clients: anything from Hewlett Packard's UFT (previously QTP, if you buy the Java plugin as well; is that still the case?) to Marathon. There is nothing wrong with these or any of the many others in between. I personally just find them bloated, expensive and difficult to integrate with other frameworks. And if I have a problem to solve now, I want to solve it now, not engage with a sales team.

Building a Java agent is, I am sure, a fun pastime. God knows I have tried! I have been trying to do this for a couple of years now and have not yet found the magic formula to get it to work. You need to build a Java agent that hooks into the JVM hosting the application under test and then enumerates all the GUI objects. Then you need to interact with them somehow, and that involves some low-level client/server architecture with RPC. And... And... And it is too much effort, unless you know what you're doing. I don't... Yet (I am still working on this, and I know there are people out there who have done it already!).
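To give a flavour of the first half of that problem, here is a minimal sketch of an agent that lists the Swing component tree of whatever JVM it is attached to. The class name SwingInspectorAgent and the output format are my own inventions for illustration. It assumes the agent is packaged in a jar whose manifest names it in an Agent-Class entry and that it is attached to the already running application. It only reads the tree; the interaction and RPC parts are not covered at all.

```java
import java.awt.Component;
import java.awt.Container;
import java.awt.Window;
import java.lang.instrument.Instrumentation;
import java.util.ArrayList;
import java.util.List;
import javax.swing.SwingUtilities;

// Hypothetical agent class; the name and output format are illustrative only.
class SwingInspectorAgent {

    // Called by the JVM when the agent is attached to a running application.
    public static void agentmain(String args, Instrumentation inst) {
        // Swing is single-threaded, so read the component tree on the event dispatch thread.
        SwingUtilities.invokeLater(() -> {
            for (Window window : Window.getWindows()) {
                for (String line : describe(window)) {
                    System.out.println(line);
                }
            }
        });
    }

    // Returns one line per component, indented by its depth in the tree.
    static List<String> describe(Component root) {
        List<String> lines = new ArrayList<>();
        walk(root, 0, lines);
        return lines;
    }

    private static void walk(Component component, int depth, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        line.append(component.getClass().getSimpleName());
        if (component.getName() != null) {
            line.append(" name=").append(component.getName());
        }
        out.add(line.toString());

        // Every Swing component is a Container, so recurse into its children.
        if (component instanceof Container) {
            for (Component child : ((Container) component).getComponents()) {
                walk(child, depth + 1, out);
            }
        }
    }
}
```

The describe method can be tried out on its own, without any agent plumbing, by handing it a JPanel you have built yourself. Getting from a printed tree to something you can drive remotely is where the real effort starts.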

The third option does not care about the technology of what you are trying to control, as long as you can see it on your screen. It is open source. It is the true alternative to options 1 and 2. Take a look at SikuliX. I will not go into the details of how it works; you can find that out. Suffice to say it uses image recognition to locate the on-screen coordinates of "things", along with the Java Robot API (I believe). All I can say is that it works! And you can have a proof of concept up and running in an afternoon, which is important. There are times you will struggle with it, and other times you will need to be a bit creative. My advice for more reliable automation would be to prefer keyboard shortcuts over clicking things you see on screen. Only go down the image recognition route as a last resort, when you really cannot get what you want any other way.
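To show what "prefer keyboard shortcuts" can look like at the lowest level, here is a small sketch using java.awt.Robot directly. This is not SikuliX's own API, and the ShortcutSender class and its "ctrl+s" shortcut notation are made up for this example. It assumes the application under test already has keyboard focus and that you are running on a machine with a real display.

```java
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.util.Locale;

// Hypothetical helper; the class name and the "ctrl+s" notation are illustrative only.
class ShortcutSender {

    // Translates a shortcut such as "ctrl+s" into AWT key codes, in press order.
    static int[] keyCodes(String shortcut) {
        String[] parts = shortcut.trim().toLowerCase(Locale.ROOT).split("\\+");
        int[] codes = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            codes[i] = keyCode(parts[i]);
        }
        return codes;
    }

    private static int keyCode(String token) {
        switch (token) {
            case "ctrl":  return KeyEvent.VK_CONTROL;
            case "alt":   return KeyEvent.VK_ALT;
            case "shift": return KeyEvent.VK_SHIFT;
            case "enter": return KeyEvent.VK_ENTER;
            case "tab":   return KeyEvent.VK_TAB;
            default:
                if (token.length() == 1) {
                    char c = token.charAt(0);
                    // VK_A..VK_Z and VK_0..VK_9 share their values with ASCII 'A'..'Z' and '0'..'9'.
                    if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
                        return Character.toUpperCase(c);
                    }
                }
                throw new IllegalArgumentException("Unknown key: " + token);
        }
    }

    // Presses the keys in order, then releases them in reverse order.
    static void send(Robot robot, String shortcut) {
        int[] codes = keyCodes(shortcut);
        for (int code : codes) {
            robot.keyPress(code);
        }
        for (int i = codes.length - 1; i >= 0; i--) {
            robot.keyRelease(codes[i]);
        }
    }
}
```

Calling ShortcutSender.send(new Robot(), "ctrl+s") would then fire a save in most applications. Note that new Robot() throws an AWTException on a machine without a display. There are no images to capture and no similarity thresholds to tune, which is exactly why shortcuts tend to be the more reliable route.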

If you absolutely must automate a GUI, especially if it is a Java Swing client, do take a look at SikuliX first.