Parsing XML

24 February, 2015 - 11:15

Here is a simple application that parses some XML and extracts some data elements from it:

import xml.etree.ElementTree as ET

data = '''<person>
    <name>Chuck</name>
    <phone type="intl">
        +1 734 303 4456
    </phone>
    <email hide="yes"/>
</person>'''

# Parse the string into a tree of XML nodes
tree = ET.fromstring(data)
# Print the text of the name node and the hide attribute of the email node
print('Name:', tree.find('name').text)
print('Attr:', tree.find('email').get('hide'))

Calling fromstring converts the string representation of the XML into a “tree” of XML nodes. Once the XML is in a tree, we can call a series of methods to extract portions of data from it.

The find method searches the children of a node and retrieves the first one that matches the specified tag. Each node can have some text, some attributes (such as hide), and some “child” nodes. Each node can be the top of a tree of nodes.

Name: Chuck
Attr: yes
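The node properties described above (text, attributes, and child nodes) can be inspected one at a time. The following is a minimal sketch in Python 3 that reuses the same person XML; the variable names and the lookup of a nonexistent address tag are illustrative assumptions, not part of the original program.

```python
import xml.etree.ElementTree as ET

data = '''<person>
    <name>Chuck</name>
    <phone type="intl">
        +1 734 303 4456
    </phone>
    <email hide="yes"/>
</person>'''

root = ET.fromstring(data)

# Iterating over a node yields its direct child nodes in document order.
child_tags = [child.tag for child in root]

phone = root.find('phone')
phone_type = phone.get('type')     # attribute lookup; None if the attribute is absent
phone_text = phone.text.strip()    # text keeps its surrounding whitespace, so strip it

email = root.find('email')
email_attrs = email.attrib         # all attributes of the node as a dictionary
email_text = email.text            # None, because the empty element has no text

# Hypothetical tag that is not in the data: find returns None when nothing matches.
missing = root.find('address')

print(child_tags)
print(phone_type, phone_text)
print(email_attrs, email_text, missing)
```

Because find returns None when no child matches, it is usually worth checking the result before reading .text or calling .get on it.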

Although the XML in this example is quite simple, there are many rules regarding valid XML. Using an XML parser such as ElementTree allows us to extract data from XML without worrying about the rules of XML syntax.
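Two of those rules can be seen in a small sketch, written in Python 3. Both input strings below are hypothetical samples invented for illustration. The first contains an escaped ampersand, which the parser decodes for us. The second leaves a tag unclosed, which the parser rejects with a ParseError.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a literal ampersand must be written as &amp; inside XML.
good = '<person><name>Chuck &amp; Co</name></person>'
name = ET.fromstring(good).find('name').text   # the parser decodes the entity

# Hypothetical malformed sample: the <name> tag is never closed.
bad = '<person><name>Chuck</person>'
try:
    ET.fromstring(bad)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False

print(name)
print(parsed_ok)
```

Handling these cases by hand with string searches would mean reimplementing part of the XML rules, which is the work the parser saves.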