RapidXML is a header-only C++ XML library that is perfect for most non-enterprise uses of XML. I wouldn’t call this a tutorial, but I hope it ends up helping someone. The documentation isn’t very explicit about how to output an XML declaration, for example.
Creating the following XML from scratch, with an XML declaration, and then outputting it to a string:
<?xml version="1.0" encoding="utf-8"?>
<rootnode version="1.0" type="example">
    <childnode/>
</rootnode>
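#include <iterator>  // std::back_inserter
#include <string>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"  // print() lives in this separate header
using namespace rapidxml;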
xml_document<> doc;
// xml declaration
xml_node<>* decl = doc.allocate_node(node_declaration);
decl->append_attribute(doc.allocate_attribute("version", "1.0"));
decl->append_attribute(doc.allocate_attribute("encoding", "utf-8"));
doc.append_node(decl);
// root node
xml_node<>* root = doc.allocate_node(node_element, "rootnode");
root->append_attribute(doc.allocate_attribute("version", "1.0"));
root->append_attribute(doc.allocate_attribute("type", "example"));
doc.append_node(root);
// child node
xml_node<>* child = doc.allocate_node(node_element, "childnode");
root->append_node(child);
std::string xml_as_string;
// watch for name collisions here, print() is a very common function name!
print(std::back_inserter(xml_as_string), doc);
// xml_as_string now contains the XML in string form, indented
// (in all its angle bracket glory)
std::string xml_no_indent;
// print_no_indenting is the only flag that print() knows about
print(std::back_inserter(xml_no_indent), doc, print_no_indenting);
// xml_no_indent now contains non-indented XML
Parsing and traversing an XML document like this one:
<?xml version="1.0" encoding="utf-8"?>
<rootnode version="1.0" type="example">
    <childnode entry="1">
        <evendeepernode attr1="cat" attr2="dog"/>
        <evendeepernode attr1="lion" attr2="wolf"/>
    </childnode>
    <childnode entry="2">
    </childnode>
</rootnode>
#include <string>
#include <vector>
#include "rapidxml.hpp"
using namespace rapidxml;
void traverse_xml(const std::string& input_xml)
{
// (input_xml contains the above XML)
// make a safe-to-modify copy of input_xml, because parse() modifies its input in place
// (you should never modify the contents of a std::string directly)
std::vector<char> xml_copy(input_xml.begin(), input_xml.end());
xml_copy.push_back('\0');
// only use xml_copy from here on!
xml_document<> doc;
// we are choosing to parse the XML declaration
// parse_no_data_nodes prevents RapidXML from using the somewhat surprising
// behavior of having both values and data nodes, and having data nodes take
// precedence over values when printing
// >>> note that this will skip parsing of CDATA nodes <<<
doc.parse<parse_declaration_node | parse_no_data_nodes>(&xml_copy[0]);
// alternatively, use one of the two commented lines below to parse CDATA nodes,
// but please note the above caveat about surprising interactions between
// values and data nodes (also read http://www.ffuts.org/blog/a-rapidxml-gotcha/)
// if you use one of these two parse calls, try to use data nodes exclusively and
// avoid using value()
//doc.parse<parse_declaration_node>(&xml_copy[0]); // just get the XML declaration
//doc.parse<parse_full>(&xml_copy[0]); // parses everything (slowest)
// since we have parsed the XML declaration, it is the first node
// (otherwise the first node would be our root node)
std::string encoding = doc.first_node()->first_attribute("encoding")->value();
// encoding == "utf-8"
// we didn't keep track of our previous traversal, so let's start again
// we can match nodes by name, skipping the xml declaration entirely
xml_node<>* cur_node = doc.first_node("rootnode");
std::string rootnode_type = cur_node->first_attribute("type")->value();
// rootnode_type == "example"
// go straight to the first evendeepernode
cur_node = cur_node->first_node("childnode")->first_node("evendeepernode");
std::string attr2 = cur_node->first_attribute("attr2")->value();
// attr2 == "dog"
// and then to the second evendeepernode
cur_node = cur_node->next_sibling("evendeepernode");
attr2 = cur_node->first_attribute("attr2")->value();
// now attr2 == "wolf"
}