In this post, I review the XPath semantics of LINQ to XML, and show some concrete examples of how I use those semantics in the XPath-in-Content-Controls approach to Open XML WordprocessingML document generation.
This post is the 15th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series
In the post Generating Open XML WordprocessingML Documents using XPath in Content Controls I show an example XML document that I use to drive the document generation process. To run the examples in this post, copy and save that XML document as Data.xml in the bin/debug directory, so that the example code can load that XML document.
To use the XPath extensions of LINQ to XML, in addition to the using directive for System.Xml.Linq, you need a using directive for System.Xml.XPath. This brings four extension methods into scope: CreateNavigator, XPathEvaluate, XPathSelectElement, and XPathSelectElements. The following example demonstrates the XPathSelectElements extension method (an extension method on the XNode class, so you can call it on any XElement).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        XElement data = XElement.Load("Data.xml");
        IEnumerable<XElement> customers = data.XPathSelectElements("./Customer");
        foreach (var customer in customers)
            Console.WriteLine("{0}:{1}", customer.Element("CustomerID").Value,
                customer.Element("Name").Value);
    }
}
When you run this example with the data file shown in Generating Open XML WordprocessingML Documents using XPath in Content Controls, it outputs:
1:Andrew
2:Bob
3:Celcin
If you were to write this snippet using only LINQ to XML (not using the XPath extensions), it would look like this:
XElement data = XElement.Load("Data.xml");
IEnumerable<XElement> customers = data.Elements("Customer");
foreach (var customer in customers)
    Console.WriteLine("{0}:{1}", customer.Element("CustomerID").Value,
        customer.Element("Name").Value);
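The other extension methods in System.Xml.XPath follow the same pattern. The sketch below is an illustration only: it uses a small inline document whose shape is assumed (a hypothetical Customers root with Customer children) so that it runs without Data.xml, and the class and method names are made up for the example. It shows XPathSelectElement, which returns the first match or null, and XPathEvaluate, which returns an object whose runtime type depends on the expression (a double for count()).

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

static class XPathExtensionsSketch
{
    // Hypothetical sample data; the real Data.xml may be shaped differently.
    public static XElement SampleData()
    {
        return XElement.Parse(
            "<Customers>" +
              "<Customer><CustomerID>1</CustomerID><Name>Andrew</Name></Customer>" +
              "<Customer><CustomerID>2</CustomerID><Name>Bob</Name></Customer>" +
            "</Customers>");
    }

    // XPathSelectElement returns the first matching element, or null if none.
    public static string NameOfCustomer(XElement data, int id)
    {
        XElement customer = data.XPathSelectElement(
            String.Format("./Customer[CustomerID='{0}']", id));
        return customer == null ? null : customer.Element("Name").Value;
    }

    // XPathEvaluate returns a boxed double for a numeric expression
    // such as count().
    public static int CountCustomers(XElement data)
    {
        return Convert.ToInt32((double)data.XPathEvaluate("count(./Customer)"));
    }
}

class Program
{
    static void Main(string[] args)
    {
        XElement data = XPathExtensionsSketch.SampleData();
        Console.WriteLine(XPathExtensionsSketch.NameOfCustomer(data, 2));
        Console.WriteLine(XPathExtensionsSketch.CountCustomers(data));
    }
}
```

Returning null from NameOfCustomer when nothing matches mirrors the behavior of XPathSelectElement itself, which is worth remembering before dereferencing .Value on its result.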
Performance of the XPath Extension Methods
With the small example that I show in the video of the performance of the XPath-in-Content-Controls approach, we are still IO bound! A couple of years ago, I did some rudimentary performance analysis of the XPath extension methods, and props to Ion Vasilian, the LINQ to XML developer who wrote those extension methods. They perform really well. If I recall correctly, on the particular tests that I selected, they were only about 30% slower than using the LINQ to XML axis methods (which are amazingly fast).
As an aside, Microsoft employees are often restricted from discussing actual performance numbers for various APIs. There are lots of good reasons for this: unless you do extensive analysis, making sure that you are covering the actual use cases that customers will encounter, you might make claims that do not hold true in real-world situations, raising liability issues, etc., etc. There are cases where Microsoft employees do discuss actual performance characteristics using specific numbers, but you can be sure that there were lots of meetings where architects, program managers, developers, and test developers discussed all possible ramifications ad nauseam. But hey, I’m no longer a Microsoft employee, so I can tell you that in my off-the-cuff measurements, I saw that the XPath extension methods were only maybe 30% slower than the LINQ to XML axis methods.
Given that we are still IO bound on a 4-core laptop that is using an Intel solid state drive, and given that this approach can generate literally thousands of documents per minute, the XPath extension methods are fast enough! Good job, Ion!
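For readers who want to repeat that kind of off-the-cuff measurement, here is a rough sketch using Stopwatch. It is not the test that produced the 30% figure: the synthetic document, the element names, the helper names, and the iteration counts are all assumptions, and the ratio it reports will vary with machine, runtime, and query shape.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class XPathTimingSketch
{
    // Build a synthetic document with the given number of Customer elements.
    public static XElement BuildData(int customerCount)
    {
        return new XElement("Customers",
            Enumerable.Range(1, customerCount).Select(i =>
                new XElement("Customer",
                    new XElement("CustomerID", i),
                    new XElement("Name", "Name" + i))));
    }

    // Run the query repeatedly and return the total elapsed time.
    public static TimeSpan Time(Func<int> query, int iterations)
    {
        query(); // warm up once so JIT cost is not part of the measurement
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            query();
        sw.Stop();
        return sw.Elapsed;
    }
}

class Program
{
    static void Main(string[] args)
    {
        XElement data = XPathTimingSketch.BuildData(1000);

        // Count() forces full enumeration of each lazy query.
        TimeSpan xpath = XPathTimingSketch.Time(
            () => data.XPathSelectElements("./Customer").Count(), 1000);
        TimeSpan axis = XPathTimingSketch.Time(
            () => data.Elements("Customer").Count(), 1000);

        Console.WriteLine("XPath extension method: {0} ms", xpath.TotalMilliseconds);
        Console.WriteLine("LINQ to XML axis method: {0} ms", axis.TotalMilliseconds);
    }
}
```

Both queries are lazy, so the sketch calls Count() to make sure the work is actually done inside the timed loop.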
Evaluating XPath Expressions in Context
In the post where I introduce the XPath-in-Content-Controls approach to document generation, Generating Open XML WordprocessingML Documents using XPath in Content Controls, I discuss how the approach that I take in the template document is analogous to putting together an XSLT style sheet using the ‘push’ approach. That approach can be summarized as follows:
- The SelectDocuments XPath expression is evaluated in the context of the root element of the source XML document.
- The SelectValue XPath expression is evaluated in the context of each one of the elements in the result set of the SelectDocuments XPath expression.
- The SelectRows XPath expression is also evaluated in the context of each one of the elements in the result set of the SelectDocuments XPath expression.
- The XPath expressions in the prototype row (the second row) of the table are evaluated in the context of each one of the elements in the result set returned by the SelectRows XPath expression.
- As usual with XML, the value of an element is the concatenation of all of the text nodes that the element contains, including those of its descendants; in other words, the textual content of the element. And, as usual with LINQ to XML, you can determine the value of an element by using the XElement.Value property.
The following example shows how each one of the XPath expressions is evaluated.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        XElement data = XElement.Load("Data.xml");

        // The following XPath expression is evaluated in the context of
        // the root element of the XML document.
        IEnumerable<XElement> customers = data.XPathSelectElements("./Customer");

        // Each document would be generated in one iteration of the following
        // loop.
        foreach (var customer in customers)
        {
            // Assemble the filename for the document. We get the string format
            // and the XPath expression from the Config content control.
            string fileName = String.Format("File{0}.docx",
                customer.XPathSelectElement("./CustomerID").Value);
            Console.WriteLine("Generating document: {0}", fileName);

            // Retrieve the values referenced by the SelectValue content controls.
            // The XPath expression is evaluated in the context of the Customer
            // element.
            string name = customer.XPathSelectElement("./Name").Value;
            string customerID = customer.XPathSelectElement("./CustomerID").Value;
            Console.WriteLine("CustomerID:{0}", customerID);
            Console.WriteLine("Name:{0}", name);

            // Retrieve the set of rows for the table.
            IEnumerable<XElement> rows =
                customer.XPathSelectElements("./Orders/Order");
            foreach (var row in rows)
            {
                // Retrieve the values for each row, based on the XPath
                // expressions in the prototype row.
                string productDescription =
                    row.XPathSelectElement("./ProductDescription").Value;
                string quantity =
                    row.XPathSelectElement("./Quantity").Value;
                string orderDate =
                    row.XPathSelectElement("./OrderDate").Value;
                Console.WriteLine(
                    " ProductDescription:{0} Quantity:{1} OrderDate:{2}",
                    productDescription, quantity, orderDate);
            }
            Console.WriteLine();
        }
    }
}
When you run this example, you will see the following output, which parallels exactly the generated documents.
Generating document: File1.docx
CustomerID:1
Name:Andrew
 ProductDescription:Bike Quantity:2 OrderDate:5/1/2002
 ProductDescription:Sleigh Quantity:2 OrderDate:11/1/2000
 ProductDescription:Plane Quantity:2 OrderDate:2/19/2000

Generating document: File2.docx
CustomerID:2
Name:Bob
 ProductDescription:Boat Quantity:2 OrderDate:8/9/2000
 ProductDescription:Boat Quantity:4 OrderDate:3/25/2001
 ProductDescription:Bike Quantity:1 OrderDate:6/5/2002

Generating document: File3.docx
CustomerID:3
Name:Celcin
 ProductDescription:Bike Quantity:2 OrderDate:2/24/2001
 ProductDescription:Boat Quantity:4 OrderDate:5/6/2001
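The reason the evaluation context matters can be seen with a relative expression such as ./Name: it finds an element only when it is evaluated against a node that has a Name child. The sketch below illustrates this, along with the rule that an element's value is its concatenated text. The inline document is a hypothetical stand-in for Data.xml, not the actual file, and the helper names are invented for the example.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

static class ContextSketch
{
    // Hypothetical stand-in for Data.xml, written without whitespace so
    // that element values contain only the text shown.
    public static XElement SampleData()
    {
        return XElement.Parse(
            "<Customers>" +
              "<Customer><CustomerID>1</CustomerID><Name>Andrew</Name></Customer>" +
            "</Customers>");
    }

    // Evaluate a relative XPath against a context node; null means no match.
    public static string ValueOrNull(XElement context, string xpath)
    {
        XElement match = context.XPathSelectElement(xpath);
        return match == null ? null : match.Value;
    }
}

class Program
{
    static void Main(string[] args)
    {
        XElement root = ContextSketch.SampleData();
        XElement customer = root.XPathSelectElement("./Customer");

        // No match: the root element has no Name child.
        Console.WriteLine(ContextSketch.ValueOrNull(root, "./Name") ?? "(no match)");

        // Match: Name is a child of the Customer context node.
        Console.WriteLine(ContextSketch.ValueOrNull(customer, "./Name"));

        // The value of Customer concatenates the text of its descendants.
        Console.WriteLine(customer.Value);
    }
}
```

With this sample data the three lines printed are (no match), Andrew, and 1Andrew. The first result is why each XPath expression in the template has to be evaluated against the right context element.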