Querying for Fields in an Open XML WordprocessingML Document

May 3, 2011 at 4:22 pm · Filed under Open XML, WordprocessingML

I’ve written a blog post at OpenXMLDeveloper.org that presents some code to query an Open XML WordprocessingML document for fields. The code returns the field code for each field in the document. Using this code, it becomes trivial to find all hyperlinks in a document, which will be the subject of my next post at OpenXMLDeveloper.org.
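The code from that post is not reproduced here. As a rough illustration of the idea only, the following is a minimal sketch that assumes the markup of a document part is already loaded as an XElement. It is not the code from the OpenXMLDeveloper.org post: the class name, method name, and sample markup are hypothetical. It reads simple fields from the w:instr attribute of w:fldSimple, and complex fields from the w:instrText elements that appear between the begin and separate w:fldChar markers.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml.Linq;

public static class FieldCodeSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical stand-in for the body of a main document part.
    public const string SampleMarkup =
@"<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
  <w:p>
    <w:fldSimple w:instr=' PAGE '><w:r><w:t>1</w:t></w:r></w:fldSimple>
  </w:p>
  <w:p>
    <w:r><w:fldChar w:fldCharType='begin'/></w:r>
    <w:r><w:instrText> HYPERLINK ""http://example.com"" </w:instrText></w:r>
    <w:r><w:fldChar w:fldCharType='separate'/></w:r>
    <w:r><w:t>Example</w:t></w:r>
    <w:r><w:fldChar w:fldCharType='end'/></w:r>
  </w:p>
</w:body>";

    // Tracks one complex field that has begun but not yet ended.
    private sealed class OpenField
    {
        public readonly StringBuilder Code = new StringBuilder();
        public bool InResult; // true once the "separate" marker has been seen
    }

    // Returns the field codes found in the tree. A complex field is reported
    // when its "end" marker is reached, so nested fields come out first.
    public static List<string> GetFieldCodes(XElement root)
    {
        var codes = new List<string>();
        var open = new Stack<OpenField>();

        foreach (XElement e in root.DescendantsAndSelf())
        {
            if (e.Name == W + "fldSimple")
            {
                codes.Add(((string)e.Attribute(W + "instr") ?? "").Trim());
            }
            else if (e.Name == W + "fldChar")
            {
                string type = (string)e.Attribute(W + "fldCharType");
                if (type == "begin")
                    open.Push(new OpenField());
                else if (type == "separate" && open.Count > 0)
                    open.Peek().InResult = true;
                else if (type == "end" && open.Count > 0)
                    codes.Add(open.Pop().Code.ToString().Trim());
            }
            else if (e.Name == W + "instrText"
                     && open.Count > 0 && !open.Peek().InResult)
            {
                // Field code text may be split across several runs.
                open.Peek().Append(e.Value);
            }
        }
        return codes;
    }

    public static void Main()
    {
        foreach (string code in GetFieldCodes(XElement.Parse(SampleMarkup)))
            Console.WriteLine(code);
    }
}
```

With the sample markup, this prints PAGE followed by HYPERLINK "http://example.com". A real document also keeps fields in headers, footers, footnotes, and other parts, each of which would need the same treatment.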


Deep Dive into Open XML WordprocessingML Fields and Hyperlinks – Part 2

April 27, 2011 at 6:27 pm · Filed under Open XML, WordprocessingML

I’ve posted the second video in this series on Open XML WordprocessingML fields and hyperlinks. In this video, I show how the MarkupSimplifier application makes the markup for fields more understandable. In addition, I explore the ways in which fields can be nested inside other fields. This is a powerful technique, but the markup requires a bit of explanation. Before watching this video, watch the first video of this deep-dive into Open XML WordprocessingML fields and hyperlinks.

Part 2 shows how MarkupSimplifier can be used to good effect, and what the markup looks like when you nest fields within other fields.


Microsoft Interoperability: Open Specifications Developer Center

April 26, 2011 at 4:59 pm · Filed under Interoperability, Open XML

Microsoft is continually upgrading and improving its interoperability documentation. Recently, I received word of an update to the Open Specifications Developer Center. There is a lot of updated material on the Learn tab.

Some interesting learning modules:

Interoperability 101: The Basics: Learning module landing page
Introduction to Office Interoperability: Learning module landing page
Introduction to Office Open XML: Learning module landing page
Understanding Office Binary File Formats: Learning module landing page

Here are some new technical articles:


Deep Dive into Open XML WordprocessingML Fields and Hyperlinks

April 25, 2011 at 5:45 pm · Filed under Open XML, WordprocessingML

Fields are one of the most powerful components of WordprocessingML markup. You will see field markup in hyperlinks, the TOC, dates, page references, calculated values, and much more. I’ve been asked a few questions lately about fields in WordprocessingML markup. Fields are perhaps one of the least understood aspects of WordprocessingML markup, but they are really not very hard. I’ve embarked on a four-part series to explain field markup, show some example code that makes it easier to work with fields, and then show some code that reliably retrieves all hyperlinks in an Open XML WordprocessingML document. The following video is the first of this four-part series:

Deep dive into Open XML WordprocessingML markup


Ease your WordprocessingML Research using the Open XML Markup Simplifier Application

April 22, 2011 at 11:58 am · Filed under Open XML, PowerTools, WordprocessingML

Sometimes when researching Open XML WordprocessingML markup, extraneous markup gets in the way of your research. The extraneous markup makes it harder to see and understand the markup issues at hand. The MarkupSimplifier class (which is part of the PowerTools for Open XML project) can help a lot, but as downloaded from CodePlex, it is only a class. You need to write code to use the class, and if you want to use the markup simplifier as part of your research process, it is inconvenient. I’ve written a small WinForm application that uses the MarkupSimplifier class, and makes the use of the simplifier class much more seamless in your research.

You can download the Markup Simplifier Application at OpenXMLDeveloper.org. The code is attached to the blog post.

I’ve recorded a six minute video that shows the Markup Simplifier application in action:

This video shows how to build and run the Markup Simplifier application.


Release of Cross-Platform C Library for Open Packaging Conventions

April 21, 2011 at 11:57 pm · Filed under Open Packaging Conventions, Open XML

Last week, Doug Mahugh announced the release on CodePlex of libOPC version 0.0.1, a new API for Open XML development. From his blog post:

This API is the first open-source cross-platform API for developers working with Open Packaging Convention (OPC) packages as used by Open XML, XPS, and other formats. Full source code is available, and it’s written in portable C99, so can be used on all popular variants of Linux/Unix, Mac OS, Windows, Android, and many other platforms. The API uses other common cross-platform open-source APIs for some of the low-level details, including ZLIB for opening ZIP-compressed packages and libXML for parsing the XML streams from the parts in the package.

This is excellent news!


Change the Schema for Simple Free Doc Generation System

April 18, 2011 at 7:33 am · Filed under Document Generation Series, Open XML, WordprocessingML

I’ve posted a short (3 minute) screen-cast that shows how easy it is to change the schema for my simple document generation system that uses XPath expressions in Open XML WordprocessingML content controls. It was super-easy to do – I didn’t rehearse – just sat down and recorded the screen-cast in a single take.

This post is the 16th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Demonstrates changing the schema for this simple document generation system that you configure by writing XPath expressions in content controls.


Creating an Open XML Protected Spreadsheet with Locked and Unlocked Cells

April 15, 2011 at 2:47 pm · Filed under Open XML, SpreadsheetML

Today, in response to a question at OpenXMLDeveloper.org, I put together a 7 minute screen-cast that shows how to create a protected spreadsheet with locked and unlocked cells. Similar to the screen-cast that I presented in How to Research Open XML Markup, this screen-cast also is a good example of the approach that I take to research Open XML markup.

Explores the markup necessary to create a protected spreadsheet with locked and unlocked cells.


Iterate through all Content Controls in an Open XML WordprocessingML Document

April 11, 2011 at 7:35 am · Filed under Open XML, WordprocessingML

I’ve written a short blog post and example at OpenXMLDeveloper.org that shows how to iterate through all content controls in a word-processing document.
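The example itself lives at OpenXMLDeveloper.org. To give a flavor of what such an iteration can look like, here is a minimal sketch using LINQ to XML. It is not the code from that post, and it assumes the markup of one document part is already loaded as an XElement; the class name, method name, and sample markup are hypothetical. Content controls are w:sdt elements, so the sketch selects every w:sdt descendant and reports its alias (falling back to its tag) along with the text it contains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ContentControlSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical stand-in for the body of a main document part:
    // one block-level content control and one inline content control.
    public const string SampleMarkup =
@"<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
  <w:sdt>
    <w:sdtPr><w:alias w:val='Customer'/><w:tag w:val='cust'/></w:sdtPr>
    <w:sdtContent><w:p><w:r><w:t>Andrew</w:t></w:r></w:p></w:sdtContent>
  </w:sdt>
  <w:p>
    <w:sdt>
      <w:sdtPr><w:tag w:val='OrderDate'/></w:sdtPr>
      <w:sdtContent><w:r><w:t>5/1/2002</w:t></w:r></w:sdtContent>
    </w:sdt>
  </w:p>
</w:body>";

    // Returns one "name: text" line per content control, in document order.
    // Descendants finds nested controls too, so the text of an outer control
    // includes the text of any controls nested inside it.
    public static List<string> Describe(XElement root)
    {
        return root.Descendants(W + "sdt").Select(sdt =>
        {
            var props = sdt.Element(W + "sdtPr");
            string name =
                (string)props?.Element(W + "alias")?.Attribute(W + "val")
                ?? (string)props?.Element(W + "tag")?.Attribute(W + "val")
                ?? "(unnamed)";
            string text = string.Concat(
                sdt.Elements(W + "sdtContent")
                   .Descendants(W + "t")
                   .Select(t => t.Value));
            return name + ": " + text;
        }).ToList();
    }

    public static void Main()
    {
        foreach (string line in Describe(XElement.Parse(SampleMarkup)))
            Console.WriteLine(line);
    }
}
```

With the sample markup, this prints Customer: Andrew and then OrderDate: 5/1/2002. Content controls can also appear in headers, footers, footnotes, and endnotes, which are separate parts and would each need to be loaded and searched in the same way.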


Review of XPath Semantics of LINQ to XML

April 8, 2011 at 5:55 am · Filed under Document Generation Series, Open XML, WordprocessingML

In this post, I review the XPath semantics of LINQ to XML, and show some concrete examples of how I use those semantics in the XPath-in-Content-Controls approach to Open XML WordprocessingML document generation.

This post is the 15th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

In the post Generating Open XML WordprocessingML Documents using XPath in Content Controls I show an example XML document that I use to drive the document generation process. To run the examples in this post, copy and save that XML document as Data.xml in the bin/debug directory, so that the example code can load that XML document.

To use the XPath extensions of LINQ to XML, in addition to the using directive for System.Xml.Linq, you need a using directive for System.Xml.XPath. This brings a few extension methods into scope. The following example demonstrates the XPathSelectElements extension method (an extension method on XNode, a base class of XElement, so you can call it on any XElement).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        XElement data = XElement.Load("Data.xml");
        IEnumerable<XElement> customers = data.XPathSelectElements("./Customer");
        foreach (var customer in customers)
            Console.WriteLine("{0}:{1}",
                customer.Element("CustomerID").Value,
                customer.Element("Name").Value);
    }
}

When you run this example with the data file shown in Generating Open XML WordprocessingML Documents using XPath in Content Controls, it outputs:

1:Andrew
2:Bob
3:Celcin

If you were to write this snippet using only LINQ to XML (not using the XPath extensions), it would look like this:

XElement data = XElement.Load("Data.xml");
IEnumerable<XElement> customers = data.Elements("Customer");
foreach (var customer in customers)
    Console.WriteLine("{0}:{1}",
        customer.Element("CustomerID").Value,
        customer.Element("Name").Value);
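If you do not have Data.xml at hand, the same comparison can be tried against inline data. The following is a minimal sketch: the root element name Customers and the class and method names are assumptions made for illustration, and only the Customer, CustomerID, and Name element names come from the examples in this post. It adds an XPath predicate to show how a filter looks in each style.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathVersusAxisSketch
{
    // Hypothetical data shaped like the elements used in this post.
    public static XElement SampleData()
    {
        return XElement.Parse(
@"<Customers>
  <Customer><CustomerID>1</CustomerID><Name>Andrew</Name></Customer>
  <Customer><CustomerID>2</CustomerID><Name>Bob</Name></Customer>
  <Customer><CustomerID>3</CustomerID><Name>Celcin</Name></Customer>
</Customers>");
    }

    // Filter with an XPath predicate, evaluated in the context of the root.
    public static List<string> NamesViaXPath(XElement data)
    {
        return data.XPathSelectElements("./Customer[CustomerID > 1]/Name")
                   .Select(e => e.Value)
                   .ToList();
    }

    // The same filter written with LINQ to XML axis methods and Where.
    public static List<string> NamesViaAxis(XElement data)
    {
        return data.Elements("Customer")
                   .Where(c => (int)c.Element("CustomerID") > 1)
                   .Elements("Name")
                   .Select(e => e.Value)
                   .ToList();
    }

    public static void Main()
    {
        XElement data = SampleData();
        Console.WriteLine(string.Join(",", NamesViaXPath(data)));
        Console.WriteLine(string.Join(",", NamesViaAxis(data)));
    }
}
```

Both lines of output are Bob,Celcin. The XPath form keeps the whole query in a single string, which is what makes it practical to store in a content control; the axis form is checked by the compiler.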

Performance of the XPath Extension Methods

With the small example that I show in the video of the performance of the XPath-in-Content-Controls approach, we are still IO bound! A couple of years ago, I did some rudimentary performance analysis of the XPath extension methods, and props to Ion Vasilian, the LINQ to XML developer who wrote those extension methods. They perform really well. If I recall correctly, on the particular tests that I selected, they were only about 30% slower than using the LINQ to XML axis methods (which are amazingly fast).

As an aside, Microsoft employees are often restricted from discussing actual performance numbers for various APIs. There are lots of good reasons for this – unless you do extensive analysis, making sure that you are covering the actual use cases that customers will encounter, you might make claims that do not hold true in real-world situations, raising liability issues, etc., etc. There are cases where Microsoft employees do discuss actual performance characteristics using specific numbers, but you can be sure that there were lots of meetings where architects, program managers, developers, and test developers discussed all possible ramifications ad nauseam. But hey, I’m no longer a Microsoft employee, so I can tell you that in my off-the-cuff measurements, I saw that the XPath extension methods were only maybe 30% slower than the LINQ to XML axis methods.

Given that we are still IO bound on a 4-core laptop that is using an Intel solid state drive, and given that this approach can generate literally thousands of documents per minute, the XPath extension methods are fast enough! Good job, Ion!
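If you want to take your own off-the-cuff measurement, something along the following lines would do. This is a minimal sketch, not the tests I ran back then: the generated data, the class and method names, and the iteration counts are all assumptions, and the numbers will vary with the machine, the runtime, and the shape of your XML.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathTimingSketch
{
    // Builds a hypothetical in-memory tree so that no file IO is measured.
    public static XElement BuildData(int customerCount)
    {
        return new XElement("Customers",
            Enumerable.Range(1, customerCount).Select(i =>
                new XElement("Customer",
                    new XElement("CustomerID", i),
                    new XElement("Name", "Name" + i))));
    }

    // Runs the query once to warm up, then times the given number of runs.
    public static long TimeMs(Func<int> query, int iterations)
    {
        query();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            query();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        XElement data = BuildData(10000);

        // Count() forces the lazy queries to be fully enumerated.
        long xpathMs = TimeMs(
            () => data.XPathSelectElements("./Customer/Name").Count(), 100);
        long axisMs = TimeMs(
            () => data.Elements("Customer").Elements("Name").Count(), 100);

        Console.WriteLine("XPath extension methods: {0} ms", xpathMs);
        Console.WriteLine("LINQ to XML axis methods: {0} ms", axisMs);
    }
}
```

A stopwatch loop like this is only a rough guide. For anything you intend to rely on, measure with queries and documents that resemble your real workload.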

Evaluating XPath Expressions in Context

In the post where I introduce the XPath-in-Content-Controls approach to document generation, Generating Open XML WordprocessingML Documents using XPath in Content Controls, I discuss how the approach that I take in the template document is analogous to putting together an XSLT style sheet using the ‘push’ approach. That approach can be summarized as follows:

The SelectDocuments XPath expression is evaluated in the context of the root element of the source XML document.
The SelectValue XPath expression is evaluated in the context of each one of the elements in the result set of the SelectDocuments XPath expression.
The SelectRows XPath expression is also evaluated in the context of each one of the elements in the result set of the SelectDocuments XPath expression.
The XPath expressions in the prototype row (the second row) of the table are evaluated in the context of each one of the elements in the result set returned by the SelectRows XPath expression.
As usual with XML, the value of an element is the concatenated text nodes of the element, in other words, the textual content of the element. And, as usual with LINQ to XML, you can determine the value of an element by using the XElement.Value property.

The following example shows how each one of the XPath expressions is evaluated.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        XElement data = XElement.Load("Data.xml");

        // The following XPath expression is evaluated in the context of
        // the root element of the XML document.
        IEnumerable<XElement> customers = data.XPathSelectElements("./Customer");

        // Each document would be generated in one iteration of the following
        // loop.
        foreach (var customer in customers)
        {
            // Assemble the filename for the document.  We get the string format
            // and the XPath expression from the Config content control.
            string fileName = String.Format("File{0}.docx",
                customer.XPathSelectElement("./CustomerID").Value);
            Console.WriteLine("Generating document: {0}", fileName);

            // Retrieve the values referenced by the SelectValue content controls.
            // The XPath expression is evaluated in the context of the Customer
            // element.
            string name = customer.XPathSelectElement("./Name").Value;
            string customerID = customer.XPathSelectElement("./CustomerID").Value;
            Console.WriteLine("CustomerID:{0}", customerID);
            Console.WriteLine("Name:{0}", name);

            // Retrieve the set of rows for the table.
            IEnumerable<XElement> rows = customer.XPathSelectElements("./Orders/Order");
            foreach (var row in rows)
            {
                // Retrieve the values for each row, based on the XPath
                // expressions in the prototype row.
                string productDescription = row.XPathSelectElement("./ProductDescription").Value;
                string quantity = row.XPathSelectElement("./Quantity").Value;
                string orderDate = row.XPathSelectElement("./OrderDate").Value;
                Console.WriteLine(
                    "  ProductDescription:{0} Quantity:{1} OrderDate:{2}",
                    productDescription, quantity, orderDate);
            }
            Console.WriteLine();
        }
    }
}

When you run this example, you will see the following output, which parallels exactly the generated documents.

Generating document: File1.docx
CustomerID:1
Name:Andrew
  ProductDescription:Bike Quantity:2 OrderDate:5/1/2002
  ProductDescription:Sleigh Quantity:2 OrderDate:11/1/2000
  ProductDescription:Plane Quantity:2 OrderDate:2/19/2000

Generating document: File2.docx
CustomerID:2
Name:Bob
  ProductDescription:Boat Quantity:2 OrderDate:8/9/2000
  ProductDescription:Boat Quantity:4 OrderDate:3/25/2001
  ProductDescription:Bike Quantity:1 OrderDate:6/5/2002

Generating document: File3.docx
CustomerID:3
Name:Celcin
  ProductDescription:Bike Quantity:2 OrderDate:2/24/2001
  ProductDescription:Boat Quantity:4 OrderDate:5/6/2001


Eric White's Blog

Archive for Open XML
