The lxml.objectify sub-package is extremely handy for parsing and creating XML. In this article, we will show how to create XML using the lxml package. We’ll start with some simple XML and then try to replicate it. Let’s get started!
In past articles, I have used the following silly example XML for demonstration purposes:
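```xml
<zAppointment reminder="15">
    <appointment>
        <begin>1181251680</begin>
        <uid>040000008200E000</uid>
        <alarmTime>1181572063</alarmTime>
        <state></state>
        <location></location>
        <duration>1800</duration>
        <subject>Bring pizza home</subject>
    </appointment>
    <appointment>
        <begin>1234360800</begin>
        <duration>1800</duration>
        <subject>Check MS Office website for updates</subject>
        <location></location>
        <uid>604f4792-eb89-478b-a14f-dd34d3cc6c21-1234360800</uid>
        <state>dismissed</state>
    </appointment>
</zAppointment>
```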
Let’s see how we can use lxml.objectify to recreate this XML:
```python
from lxml import etree, objectify

#----------------------------------------------------------------------
def create_appt(data):
    """
    Create an appointment XML element
    """
    appt = objectify.Element("appointment")
    appt.begin = data["begin"]
    appt.uid = data["uid"]
    appt.alarmTime = data["alarmTime"]
    appt.state = data["state"]
    appt.location = data["location"]
    appt.duration = data["duration"]
    appt.subject = data["subject"]
    return appt

#----------------------------------------------------------------------
def create_xml():
    """
    Create an XML file
    """

    xml = '''<zAppointment>
    </zAppointment>
    '''

    root = objectify.fromstring(xml)
    root.set("reminder", "15")

    appt = create_appt({"begin": 1181251680,
                        "uid": "040000008200E000",
                        "alarmTime": 1181572063,
                        "state": "",
                        "location": "",
                        "duration": 1800,
                        "subject": "Bring pizza home"}
                       )
    root.append(appt)

    uid = "604f4792-eb89-478b-a14f-dd34d3cc6c21-1234360800"
    appt = create_appt({"begin": 1234360800,
                        "uid": uid,
                        "alarmTime": 1181572063,
                        "state": "dismissed",
                        "location": "",
                        "duration": 1800,
                        "subject": "Check MS Office website for updates"}
                       )
    root.append(appt)

    # remove lxml annotation
    objectify.deannotate(root)
    etree.cleanup_namespaces(root)

    # create the xml string
    obj_xml = etree.tostring(root,
                             pretty_print=True,
                             xml_declaration=True)

    try:
        with open("example.xml", "wb") as xml_writer:
            xml_writer.write(obj_xml)
    except IOError:
        pass

#----------------------------------------------------------------------
if __name__ == "__main__":
    create_xml()
```
Let’s break this down a bit. We will start with the create_xml function. In it we create an XML root object using the objectify module’s fromstring function. The root object will contain zAppointment as its tag. We set the root’s reminder attribute and then call our create_appt function with a dictionary as its argument. In the create_appt function, we create an instance of an Element (technically, it’s an ObjectifiedElement) that we assign to our appt variable. Here we use dot-notation to create the child elements: assigning to appt.begin, for example, creates a begin tag that holds the assigned value. Finally, we return the appt element and append it to our root object. We repeat the process for the second appointment instance.
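To make the dot-notation step concrete, here is a minimal sketch that builds one small appointment and reads it back. The variable names are only for illustration, and it uses just two of the fields from the example above.

```python
from lxml import objectify

# Build a single appointment element using attribute assignment.
appt = objectify.Element("appointment")
appt.begin = 1181251680             # creates a <begin> child holding an int
appt.subject = "Bring pizza home"   # creates a <subject> child holding a str

# The same dot-notation reads the children back.
begin_value = appt.begin.pyval      # the value as a Python int
subject_text = appt.subject.text    # the element text as a str
child_tags = [child.tag for child in appt.iterchildren()]

print(begin_value, subject_text, child_tags)
```

Each assignment adds a child in document order, which is why the tags in the output file follow the order of the assignments in create_appt.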
The next section of the create_xml function will remove the lxml annotation. If you do not do this, your XML will end up looking like the following:
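```xml
<zAppointment reminder="15">
    <appointment py:pytype="TREE">
        <begin py:pytype="int">1181251680</begin>
        <uid py:pytype="str">040000008200E000</uid>
        <alarmTime py:pytype="int">1181572063</alarmTime>
        <state py:pytype="str"/>
        <location py:pytype="str"/>
        <duration py:pytype="int">1800</duration>
        <subject py:pytype="str">Bring pizza home</subject>
    </appointment>
    <appointment py:pytype="TREE">
        <begin py:pytype="int">1234360800</begin>
        <uid py:pytype="str">604f4792-eb89-478b-a14f-dd34d3cc6c21-1234360800</uid>
        <alarmTime py:pytype="int">1181572063</alarmTime>
        <state py:pytype="str">dismissed</state>
        <location py:pytype="str"/>
        <duration py:pytype="int">1800</duration>
        <subject py:pytype="str">Check MS Office website for updates</subject>
    </appointment>
</zAppointment>
```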
To remove all that unwanted annotation, we call the following two functions:
```python
objectify.deannotate(root)
etree.cleanup_namespaces(root)
```
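If you want to see the effect of those two calls for yourself, the following sketch serializes a tiny element before and after the cleanup. It uses a single made-up appointment rather than the full example, so treat it as an illustration only.

```python
from lxml import etree, objectify

appt = objectify.Element("appointment")
appt.duration = 1800

# Serialized as-is, the element still carries objectify's type annotation.
annotated = etree.tostring(appt).decode("utf-8")

# Strip the py:pytype attributes, then drop the now-unused namespace declarations.
objectify.deannotate(appt)
etree.cleanup_namespaces(appt)
clean = etree.tostring(appt).decode("utf-8")

print(annotated)
print(clean)
```

deannotate removes the attributes, while cleanup_namespaces removes the xmlns declarations that are left behind once nothing refers to them.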
The last piece of the puzzle is to get lxml to generate the XML itself. Here we use lxml’s etree module to do the hard work:
obj_xml = etree.tostring(root, pretty_print=True)
The tostring function will return the serialized XML (as a byte string in Python 3), and if you set pretty_print to True, it will indent the output too, as long as the tree does not already contain whitespace-only text between its elements.
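As a quick sketch of what tostring gives you back, the snippet below serializes a small tree twice. The tree is built with plain etree calls to keep it short, and the element names are borrowed from the example.

```python
from lxml import etree

root = etree.Element("zAppointment", reminder="15")
appt = etree.SubElement(root, "appointment")
etree.SubElement(appt, "subject").text = "Bring pizza home"

# By default tostring returns bytes, which is why the file is opened in "wb" mode.
xml_bytes = etree.tostring(root, pretty_print=True)

# Passing encoding="unicode" returns a str instead.
xml_text = etree.tostring(root, pretty_print=True, encoding="unicode")

print(xml_text)
```

Because this tree was built in memory and has no whitespace text of its own, pretty_print is free to indent it.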
Update 11/2020 for Python 3
The code in the previous section does not output the “prettified” XML to the file in Python 3. You must jump through a couple more hoops to make it work correctly. Here is an updated version of the code that does work. Tested in Python 3.9 on Mac OS:
```python
from lxml import etree, objectify
from io import BytesIO


def create_appt(data):
    """
    Create an appointment XML element
    """
    appt = objectify.Element("appointment")
    appt.begin = data["begin"]
    appt.uid = data["uid"]
    appt.alarmTime = data["alarmTime"]
    appt.state = data["state"]
    appt.location = data["location"]
    appt.duration = data["duration"]
    appt.subject = data["subject"]
    return appt


def create_xml():
    """
    Create an XML file
    """

    xml = '''<zAppointment>
    </zAppointment>
    '''

    root = objectify.fromstring(xml)
    root.set("reminder", "15")

    appt = create_appt({"begin": 1181251680,
                        "uid": "040000008200E000",
                        "alarmTime": 1181572063,
                        "state": "",
                        "location": "",
                        "duration": 1800,
                        "subject": "Bring pizza home"}
                       )
    root.append(appt)

    uid = "604f4792-eb89-478b-a14f-dd34d3cc6c21-1234360800"
    appt = create_appt({"begin": 1234360800,
                        "uid": uid,
                        "alarmTime": 1181572063,
                        "state": "dismissed",
                        "location": "",
                        "duration": 1800,
                        "subject": "Check MS Office website for updates"}
                       )
    root.append(appt)

    # remove lxml annotation
    objectify.deannotate(root)
    etree.cleanup_namespaces(root)

    # create the xml string
    parser = etree.XMLParser(remove_blank_text=True)
    file_obj = BytesIO(etree.tostring(root))
    tree = etree.parse(file_obj, parser)

    try:
        with open("example.xml", "wb") as xml_writer:
            tree.write(xml_writer, pretty_print=True)
    except IOError:
        pass


if __name__ == "__main__":
    create_xml()
```
This is based on a solution found on StackOverflow.
You need to add a new import at the top of the file to get BytesIO. Then at the end of your code, you need to modify your code to look like this:
```python
# create the xml string
parser = etree.XMLParser(remove_blank_text=True)
file_obj = BytesIO(etree.tostring(root))
tree = etree.parse(file_obj, parser)

try:
    with open("example.xml", "wb") as xml_writer:
        tree.write(xml_writer, pretty_print=True)
except IOError:
    pass
```
This turns the root into a byte string, wraps that in a file-like object using BytesIO, and then re-parses it with a new XML parser object that removes blank (whitespace-only) text. With the blank text gone, pretty_print is able to indent the tree when it gets written out. Give it a shot and you should end up with a file that contains properly indented XML code.
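Here is a small sketch of why the extra parse step helps. It assumes a root that was parsed from text containing a newline between its tags. The names direct and reparsed are just for illustration.

```python
from io import BytesIO
from lxml import etree

# A root parsed from text that has whitespace between its opening and closing tags.
root = etree.fromstring("<zAppointment>\n</zAppointment>")
etree.SubElement(root, "appointment").text = "demo"

# The leftover whitespace text stops pretty_print from indenting the children.
direct = etree.tostring(root, pretty_print=True).decode("utf-8")

# Re-parsing with remove_blank_text=True drops that whitespace first.
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(BytesIO(etree.tostring(root)), parser)
reparsed = etree.tostring(tree, pretty_print=True).decode("utf-8")

print(direct)
print(reparsed)
```

Only the second result should come out indented, which matches what ends up in the file in the updated code.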
Wrapping Up
Now you know how to use lxml’s objectify module to create XML. It’s a pretty handy interface and quite Pythonic for the most part.
Mike, you can write:
obj_xml = etree.tostring(root, pretty_print=True, xml_declaration=True)
to output the first declaration line. You can also add encoding="UTF-8" to be sure of your encoding.
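A minimal sketch of that suggestion, using a bare root element so the declaration line is easy to spot:

```python
from lxml import etree

root = etree.Element("zAppointment", reminder="15")

# xml_declaration=True adds the <?xml ...?> line; encoding names the output encoding.
obj_xml = etree.tostring(
    root,
    pretty_print=True,
    xml_declaration=True,
    encoding="UTF-8",
)

first_line = obj_xml.decode("utf-8").splitlines()[0]
print(first_line)
```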
I was trying to find that command as I knew I had seen it, but for some reason I couldn’t find it when I was writing the article. Thanks for the assist!
I use lxml often but never have used objectify. What is the benefit of using it over creating an ElementTree and adding elements with Element, SubElement? thanks for the article!
I think the thing I like most about objectify is that it turns XML documents into objects so I can access the tags in a very Pythonic way. Creating XML in objectify is actually pretty similar to ElementTree.
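To illustrate the difference in access style, here is a short sketch that reads the same value both ways. The XML string is a cut-down version of the example, and the extra 60 seconds is made up for the demonstration.

```python
from lxml import etree, objectify

xml = (
    "<zAppointment><appointment>"
    "<duration>1800</duration>"
    "</appointment></zAppointment>"
)

# objectify: walk the tree with attributes and get typed values back.
obj_root = objectify.fromstring(xml)
total = obj_root.appointment.duration + 60   # the element behaves like an int

# Plain etree: look the element up by path and convert the text yourself.
et_root = etree.fromstring(xml)
duration_text = et_root.find("appointment/duration").text

print(total, int(duration_text) + 60)
```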
I am encountering the following error, and it seems to be caused by the UTF-8 encoding declaration passed in through the xml variable. Any idea how to preserve the UTF-8 encoding without the error?
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
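As the error message itself suggests, one way around this is to pass bytes rather than a str when the XML carries an encoding declaration. A minimal sketch, assuming a made-up xml string with such a declaration:

```python
from lxml import objectify

xml = '<?xml version="1.0" encoding="UTF-8"?><zAppointment></zAppointment>'

try:
    # A str that carries an encoding declaration is what triggers the ValueError.
    objectify.fromstring(xml)
except ValueError as err:
    print("str input failed:", err)

# Encoding the text to bytes first lets lxml honor the declaration.
root = objectify.fromstring(xml.encode("utf-8"))
print(root.tag)
```

The other option the message mentions is to leave the declaration out of the string and only add it on output with xml_declaration=True.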
I don't want to have the
In the xml, but when I try to remove them, I get this error: lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 2, column 5
Any idea?