February 10, 2005

Useful but probably common XPath function

I don't work with XML often enough to have this always at the ready, so: normalize-space() collapses each run of whitespace (spaces, tabs, newlines) into a single space and trims leading and trailing whitespace. So when you're trying to match something like:

<p>This is my
Great Page Number Three
summary
</p>

You can just use something like this and it will do the right thing (using dom4j if that matters):

 String path = "//p[contains( normalize-space( text() ), 'This is my Great Page Number' )]";
 List matching = doc.selectNodes( path );
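
To see the difference normalize-space() makes, here is a minimal, self-contained sketch. It assumes the JDK's built-in javax.xml.xpath API rather than dom4j, purely so it runs without extra jars; the class name, the helper method and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: uses the JDK XPath engine, not dom4j/jaxen.
class NormalizeSpaceDemo {

    // Parses the XML string and returns how many nodes the XPath expression selects.
    static int countMatches(String xml, String path) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the paragraph above: the text is broken across lines.
        String xml = "<html><body><p>This is my\nGreat Page Number Three\nsummary\n</p></body></html>";

        // Without normalize-space() the newline after "my" prevents a match.
        String plain = "//p[contains( text(), 'This is my Great Page Number' )]";
        // With normalize-space() the newline is collapsed to a single space.
        String normalized = "//p[contains( normalize-space( text() ), 'This is my Great Page Number' )]";

        System.out.println("plain:      " + countMatches(xml, plain));      // prints 0
        System.out.println("normalized: " + countMatches(xml, normalized)); // prints 1
    }
}
```

The same two expressions should behave the same way when passed to dom4j's selectNodes(), assuming the XPath engine underneath implements normalize-space() correctly.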

(...and when I say 'XPath function' I do mean it: normalize-space() is part of the XPath 1.0 core function library, which is also why you can use it from XSLT.)

UPDATE: There seems to be a bug with 'normalize-space' in the XPath engine (jaxen) shipped with dom4j -- sometimes I get a 'StringIndexOutOfBoundsException', sometimes I don't -- it might have to do with single-space strings (referenced as JAXEN-22), but I'm not sure. In any case, you can grab the code from CVS, modify the ant 'jar' task to not depend on 'test' since one or more tests fail, run 'ant' and replace your jaxen jar with the newly built one. It's worked so far...
