Generating Open XML WordprocessingML Documents
Generating word-processing documents is perhaps the single most compelling use of Open XML. The archetypal case is an insurance company or bank that needs to generate tens of thousands of documents per month, archive them, and then make them available online, send them electronically, or print them and send them by post. But there are about a million variations on this theme. In this blog series, I am going to examine the various approaches to document generation and present code that demonstrates each one.
This post is the first in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series
I have some goals for the code that I’ll be publishing:
- First and foremost, I want the document generation process to be data-driven from content controls that you configure in a template document.
- The approach that I want to take is that the template designer creates a document, inserts content controls with specific tags, and then inserts specific instructions into each content control.
- The data that we will supply to the document generation process will be a data-centric XML document. I’ll place a few constraints on this document. Some time ago, I wrote about Document-Centric Transforms using LINQ to XML. That post discusses data-centric vs. document-centric XML documents. When generating documents from another data source, such as a SQL database or an internal or secure Web service, the task will be to generate a data-centric XML document from that source, and then kick off the document generation process.
- This code should be short and sweet. I don’t want to create some monolithic code base that would require a design process, formalized coding and testing procedures, and the like. The question is: how simple and how powerful can such a system be made? I’m hoping to stay under 1,000 lines of code. But we have some powerful tools at our disposal, most importantly LINQ to XML used in a functional style. Also, I will probably code a few recursive functional transforms.
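To make the template idea a little more concrete, here is a minimal sketch of what "content controls with specific tags" look like in the markup, and how a generator might discover them. It is written in Python with the standard library rather than the C# and LINQ to XML that this series will use, and everything specific in it is an assumption for illustration: the tag names `CustomerName` and `PolicyNumber`, the instruction text inside each control, and the helper `list_content_controls` are all hypothetical. The only parts taken from the Open XML format itself are the `w:sdt`, `w:sdtPr`, `w:tag`, `w:sdtContent`, and `w:t` elements and the WordprocessingML namespace.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (from the Open XML format).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical fragment of a template's document body: two content
# controls, each with a tag and an instruction typed in as its text.
TEMPLATE_BODY = f"""
<w:body xmlns:w="{W}">
  <w:sdt>
    <w:sdtPr><w:tag w:val="CustomerName"/></w:sdtPr>
    <w:sdtContent><w:p><w:r><w:t>./Name</w:t></w:r></w:p></w:sdtContent>
  </w:sdt>
  <w:sdt>
    <w:sdtPr><w:tag w:val="PolicyNumber"/></w:sdtPr>
    <w:sdtContent><w:p><w:r><w:t>./Policy/Number</w:t></w:r></w:p></w:sdtContent>
  </w:sdt>
</w:body>
"""


def list_content_controls(body_xml):
    """Return (tag, instruction_text) pairs for every content control, in document order."""
    root = ET.fromstring(body_xml.strip())
    controls = []
    for sdt in root.iter(f"{{{W}}}sdt"):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        tag_value = tag.get(f"{{{W}}}val") if tag is not None else None
        content = sdt.find("w:sdtContent", NS)
        # Concatenate all text runs inside the control's content.
        text = ""
        if content is not None:
            text = "".join(t.text or "" for t in content.iter(f"{{{W}}}t"))
        controls.append((tag_value, text))
    return controls


for tag_value, instruction in list_content_controls(TEMPLATE_BODY):
    print(tag_value, "->", instruction)
```

In a real document the markup lives in the main document part of the package, and the text of a control can be split across several runs, which is why the sketch joins every `w:t` it finds.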
I am contemplating four approaches for the instructions that the template designer will place in the content controls. The content controls could contain:
- Parameterized XPath expressions: This approach might be the easiest for the template designer to configure.
- XSLT sequence constructors: This approach might possibly be the easiest to code. It might be very, very short if you exclude existing code such as transforming OPC back and forth to Flat OPC, OpenXmlCodeTester, and the axes I detailed in Mastering Text in Open XML WordprocessingML Documents. I am contemplating using XSLT 2.0.
- .NET code (either VB or C#): This approach reminds me of code that I presented in OpenXmlCodeTester: Validating Code in Open XML Documents. It might be cool to put a LINQ expression in a content control that projects a collection of rows and columns that become a table in the word-processing document. There could be some cool and easy ways to supply formatting.
- Some XML dialect that I invent as I go along.
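As a rough illustration of the first option, here is one possible reading of a "parameterized XPath expression": an XPath with named placeholders that are filled in at generation time and then evaluated against the data-centric XML. This is a sketch under stated assumptions, not the design this series will settle on. The `{id}` placeholder syntax, the `evaluate` helper, and the `Customers` sample data are all hypothetical, and the example is in Python using the limited XPath subset that the standard library supports rather than in C# with LINQ to XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-centric XML: regular, record-like structure.
DATA = """
<Customers>
  <Customer id="1"><Name>Ann</Name><Balance>120.50</Balance></Customer>
  <Customer id="2"><Name>Bob</Name><Balance>75.00</Balance></Customer>
</Customers>
"""


def evaluate(instruction, params, data_root):
    """Fill the placeholders in an XPath instruction and return the text
    of the first matching element, or None if nothing matches."""
    # Assumption: placeholders use Python's str.format syntax, e.g. {id}.
    xpath = instruction.format(**params)
    node = data_root.find(xpath)
    return node.text if node is not None else None


root = ET.fromstring(DATA.strip())

# The instruction is what a template designer might type into a content control.
print(evaluate("./Customer[@id='{id}']/Name", {"id": "2"}, root))
print(evaluate("./Customer[@id='{id}']/Balance", {"id": "1"}, root))
```

Plain string substitution is the simplest thing that could work here, but it does no escaping, so a real implementation would need to validate parameter values before splicing them into an expression.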
I’m not sure which approach I’ll take. I want to play around with all four approaches, and see which one is easiest to use, and which one is easiest to develop. As I start playing around with these (and posting the code as I go along), I’ll make some design decisions, and list my reasons for the decisions.
By the way, I really love to have discussions about these things. If you agree or disagree with any of my design decisions, feel free to chime in. You can register so we can have more of a discussion, or post anonymously, as you like.
In the next post, I’m going to examine template documents, and define exactly what I mean by a template document.