Archive for Document Generation Series

Repeating Content in Document Generation System that uses XPath Expressions in Content Controls

I recently received a very good request for an enhancement to this document generation system.  The request was for a “Repeat” control that works in a similar way to tables, but instead of putting child records into a table, the document generation system generates a repeating section of content.

To review, here is what the template document looked like in the last iteration of this document generation system.  Below, you can see a screen-shot of the template document.  Following that screen-shot, there is a listing of the XML file that contains the data that will be used in the document generation process.

  • The green oval in the template document contains the XPath expression that selects the XML elements that contain the data for each of the documents.  That XPath expression selects the Customer elements in the XML document (also circled with a green oval).
  • Then, having selected the records for documents, the XPath expression in the blue oval selects the child records for the rows in the table.  The context nodes for that XPath expression are the Customer elements selected by the XPath expression in the green oval.  The selected elements in the XML document are encircled by a blue rounded rectangle.
  • And then finally, the XPath expressions circled in red select the values to place in the cells in the table.  In the XML document, the first set of nodes selected by those XPath expressions is also circled with a red oval.

Template1

XML1

The Repeat construct is parallel to that of a table. The following template document is similar in structure to the above template document, except that instead of generating a table, it generates repeating content.

Template2

When generated with the above XML document, the first of the generated documents looks as follows. I have encircled the repeating content with green rounded rectangles:

GenDocs1

Of course, due to the recursive implementation, you can get really elaborate with this setup. You can, for instance, have repeating content within repeating content, or conditional content that contains a table within repeating content, and so on.
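To make the recursion concrete, here is a minimal sketch. It does not process real content controls. Instead, it assumes a hypothetical, simplified template expressed as XML, in which Repeat and Value elements stand in for the content controls and a Select attribute holds the XPath expression. The names RepeatSketch and Expand are mine, for illustration only, and the actual implementation in the download may be organized quite differently.

```csharp
using System;
using System.Text;
using System.Xml.Linq;
using System.Xml.XPath;

static class RepeatSketch
{
    // Hypothetical stand-in for a template: Repeat and Value elements play the
    // role of content controls; literal text is copied through unchanged.
    public static string Expand(XElement template, XElement context)
    {
        var sb = new StringBuilder();
        foreach (XNode node in template.Nodes())
        {
            if (node is XText text)
                sb.Append(text.Value);
            else if (node is XElement e && e.Name == "Value")
                sb.Append(context.XPathSelectElement((string)e.Attribute("Select"))?.Value);
            else if (node is XElement r && r.Name == "Repeat")
                foreach (XElement item in context.XPathSelectElements((string)r.Attribute("Select")))
                    sb.Append(Expand(r, item)); // recurse; each selected element becomes the new context
        }
        return sb.ToString();
    }
}

class Program
{
    static void Main()
    {
        XElement data = XElement.Parse(
            "<Customers>" +
            "<Customer><Name>Andrew</Name><Orders>" +
            "<Order><ProductDescription>Bike</ProductDescription></Order>" +
            "<Order><ProductDescription>Sleigh</ProductDescription></Order>" +
            "</Orders></Customer>" +
            "<Customer><Name>Bob</Name><Orders>" +
            "<Order><ProductDescription>Boat</ProductDescription></Order>" +
            "</Orders></Customer>" +
            "</Customers>");

        // A Repeat nested inside another Repeat.
        XElement template = XElement.Parse(
            "<T><Repeat Select='./Customer'><Value Select='./Name'/>:" +
            "<Repeat Select='./Orders/Order'>[<Value Select='./ProductDescription'/>]</Repeat>;" +
            "</Repeat></T>");

        Console.WriteLine(RepeatSketch.Expand(template, data));
        // Prints: Andrew:[Bike][Sleigh];Bob:[Boat];
    }
}
```

Because Expand calls itself with each selected element as the new context, nesting to any depth needs no extra code.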

Download: 12-02-21-Gen-Docs-XPath


Change the Schema for Simple Free Doc Generation System

I’ve posted a short (3 minute) screen-cast that shows how easy it is to change the schema for my simple document generation system that uses XPath expressions in Open XML WordprocessingML content controls.  It was super-easy to do – I didn’t rehearse – just sat down and recorded the screen-cast in a single take.

This post is the 16th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Demonstrates changing the schema for this simple document generation system that you configure by writing XPath expressions in content controls.


Review of XPath Semantics of LINQ to XML

In this post, I review the XPath semantics of LINQ to XML, and show some concrete examples of how I use those semantics in the XPath-in-Content-Controls approach to Open XML WordprocessingML document generation.

This post is the 15th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

In the post Generating Open XML WordprocessingML Documents using XPath in Content Controls I show an example XML document that I use to drive the document generation process. To run the examples in this post, copy and save that XML document as Data.xml in the bin/debug directory, so that the example code can load that XML document.

To use the XPath extensions of LINQ to XML, in addition to the using directive for System.Xml.Linq, you need a using directive for System.Xml.XPath. This brings a few extension methods into scope. The following example demonstrates the XPathSelectElements extension method (it is defined on XNode, so it is available on the XElement class).


using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        XElement data = XElement.Load("Data.xml");
        IEnumerable<XElement> customers = data.XPathSelectElements("./Customer");
        foreach (var customer in customers)
            Console.WriteLine("{0}:{1}", customer.Element("CustomerID").Value,
                customer.Element("Name").Value);
    }
}

When you run this example with the data file shown in Generating Open XML WordprocessingML Documents using XPath in Content Controls, it outputs:


1:Andrew
2:Bob
3:Celcin

If you were to write this snippet using only LINQ to XML (not using the XPath extensions), it would look like this:


XElement data = XElement.Load("Data.xml");
IEnumerable<XElement> customers = data.Elements("Customer");
foreach (var customer in customers)
    Console.WriteLine("{0}:{1}", customer.Element("CustomerID").Value,
        customer.Element("Name").Value);

Performance of the XPath Extension Methods

With the small example that I show in the video of the performance of the XPath-in-Content-Controls approach, we are still IO bound! A couple of years ago, I did some rudimentary performance analysis of the XPath extension methods, and props to Ion Vasilian, the LINQ to XML developer who wrote those extension methods. They perform really well. If I recall correctly, on the particular tests that I selected, they were only about 30% slower than using the LINQ to XML axis methods (which are amazingly fast).

As an aside, Microsoft employees are often restricted from discussing actual performance numbers for various APIs. There are lots of good reasons for this – unless you do extensive analysis, making sure that you are covering the actual use cases that customers will encounter, you might make claims that do not hold true in real-world situations, raising liability issues, and so on. There are cases where Microsoft employees do discuss actual performance characteristics using specific numbers, but you can be sure that there were lots of meetings where architects, program managers, developers, and test developers discussed all possible ramifications ad nauseam. But hey, I’m no longer a Microsoft employee, so I can tell you that in my off-the-cuff measurements, I saw that the XPath extension methods were only maybe 30% slower than the LINQ to XML axis methods.

Given that we are still IO bound on a 4-core laptop that is using an Intel solid state drive, and given that this approach can generate literally thousands of documents per minute, the XPath extension methods are fast enough! Good job, Ion!
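If you want to try a rough comparison yourself, the following sketch times the same query written both ways. It is not the test that I ran back then. The synthetic data, the iteration count, and the names XPathTiming, BuildData, CountWithXPath, CountWithAxes, and Time are all assumptions made for illustration, and a Stopwatch loop like this is only a crude indicator. Your numbers will vary with the machine, the runtime, and the shape of the data.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class XPathTiming
{
    // Builds synthetic data shaped like the Customers/Customer/Orders/Order document.
    public static XElement BuildData(int customers, int ordersPerCustomer)
    {
        return new XElement("Customers",
            Enumerable.Range(1, customers).Select(c =>
                new XElement("Customer",
                    new XElement("CustomerID", c),
                    new XElement("Orders",
                        Enumerable.Range(1, ordersPerCustomer).Select(o =>
                            new XElement("Order", new XElement("Quantity", o)))))));
    }

    // The query expressed with the XPath extension method.
    public static int CountWithXPath(XElement data)
    {
        return data.XPathSelectElements("./Customer/Orders/Order").Count();
    }

    // The same query expressed with the LINQ to XML axis methods.
    public static int CountWithAxes(XElement data)
    {
        return data.Elements("Customer").Elements("Orders").Elements("Order").Count();
    }

    // Runs the query repeatedly and returns the total elapsed time.
    public static TimeSpan Time(Func<int> query, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            query();
        return sw.Elapsed;
    }
}

class Program
{
    static void Main()
    {
        XElement data = XPathTiming.BuildData(1000, 3);
        Console.WriteLine("XPath: {0}", XPathTiming.Time(() => XPathTiming.CountWithXPath(data), 100));
        Console.WriteLine("Axes:  {0}", XPathTiming.Time(() => XPathTiming.CountWithAxes(data), 100));
    }
}
```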

Evaluating XPath Expressions in Context

In the post where I introduce the XPath-in-Content-Controls approach to document generation, Generating Open XML WordprocessingML Documents using XPath in Content Controls, I discuss how the approach that I take in the template document is analogous to putting together an XSLT style sheet using the ‘push’ approach. That approach can be summarized as follows:

  • The SelectDocuments XPath expression is evaluated in the context of the root element of the source XML document.
  • The SelectValue XPath expression is evaluated in the context of each one of the elements in the result set of the SelectDocuments XPath expression.
  • The SelectRows XPath expression is also evaluated in the context of each one of the elements in the result set of the SelectDocuments XPath expression.
  • The XPath expressions in the prototype row (the second row) of the table are evaluated in the context of each one of the elements in the result set returned by the SelectRows XPath expression.
  • As usual with XML, the value of an element is the concatenated text nodes of the element, in other words, the textual content of the element. And, as usual with LINQ to XML, you can determine the value of an element by using the XElement.Value property.

The following example shows how each one of the XPath expressions is evaluated.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        XElement data = XElement.Load("Data.xml");

        // The following XPath expression is evaluated in the context of
        // the root element of the XML document.
        IEnumerable<XElement> customers = data.XPathSelectElements("./Customer");

        // Each document would be generated in one iteration of the following
        // loop.
        foreach (var customer in customers)
        {
            // Assemble the filename for the document.  We get the string format
            // and the XPath expression from the Config content control.
            string fileName = String.Format("File{0}.docx",
                customer.XPathSelectElement("./CustomerID").Value);
            Console.WriteLine("Generating document: {0}", fileName);

            // Retrieve the values referenced by the SelectValue content controls.
            // The XPath expression is evaluated in the context of the Customer
            // element.
            string name = customer.XPathSelectElement("./Name").Value;
            string customerID = customer.XPathSelectElement("./CustomerID").Value;
            Console.WriteLine("CustomerID:{0}", customerID);
            Console.WriteLine("Name:{0}", name);

            // Retrieve the set of rows for the table.
            IEnumerable<XElement> rows =
                customer.XPathSelectElements("./Orders/Order");
            foreach (var row in rows)
            {
                // Retrieve the values for each row, based on the XPath
                // expressions in the prototype row.
                string productDescription =
                    row.XPathSelectElement("./ProductDescription").Value;
                string quantity =
                    row.XPathSelectElement("./Quantity").Value;
                string orderDate =
                    row.XPathSelectElement("./OrderDate").Value;
                Console.WriteLine(
                    "  ProductDescription:{0} Quantity:{1} OrderDate:{2}",
                    productDescription, quantity, orderDate);
            }
            Console.WriteLine();
        }
    }
}

When you run this example, you will see the following output, which parallels exactly the generated documents.


Generating document: File1.docx
CustomerID:1
Name:Andrew
  ProductDescription:Bike Quantity:2 OrderDate:5/1/2002
  ProductDescription:Sleigh Quantity:2 OrderDate:11/1/2000
  ProductDescription:Plane Quantity:2 OrderDate:2/19/2000

Generating document: File2.docx
CustomerID:2
Name:Bob
  ProductDescription:Boat Quantity:2 OrderDate:8/9/2000
  ProductDescription:Boat Quantity:4 OrderDate:3/25/2001
  ProductDescription:Bike Quantity:1 OrderDate:6/5/2002

Generating document: File3.docx
CustomerID:3
Name:Celcin
  ProductDescription:Bike Quantity:2 OrderDate:2/24/2001
  ProductDescription:Boat Quantity:4 OrderDate:5/6/2001


Release of V2 of Doc Gen System: XPath in Content Controls

Today I’m posting the release of version 2 of my simple document generation system.  In this example, you configure the document generation process by creating a template document that contains content controls.  You then enter XPath expressions in those content controls.  Those XPath expressions specify the data that the document generator pulls from the source data.  The source data is an XML document that contains data for each and every document that you generate.  The source XML document can also contain detail (child records) that populates tables in the generated document.  I detailed how the template document works in the post Generating Open XML WordprocessingML Documents using XPath Expressions in Content Controls.

This post is the 14th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Download: Generate Open XML WordprocessingML Documents using XPath Expressions in Content Controls

In my opinion, the use of XPath expressions in content controls is a superior approach to the one of entering C# code in content controls.  The code is cleaner and smaller (this first example is less than 240 lines of code).

I’ve recorded a short (2 minute) screen-cast that demonstrates this example in action.

Demonstrates the XPath-in-Content-Controls approach to document generation

So please download the example, try it out, and give me feedback.


Generating Open XML WordprocessingML Documents using XPath Expressions in Content Controls

Over the last few days, I have completed a new prototype of an approach to Open XML WordprocessingML document generation. In this approach, I control the document generation process by placing XPath expressions in content controls. In contrast, the previous approach in this series of posts on document generation was controlled by writing C# code in content controls.

This post is the 13th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

When I started down this path of discovery around document generation, I would not have predicted it, but the XPath-in-Content-Controls approach is, in my opinion, much superior to the C#-in-Content-Controls approach. Going forward, I am going to abandon the C#-in-Content-Controls approach, and focus on this approach using XPath. There are some very cool places that we can take this approach.

To compare and contrast, the C#-in-Content-Controls prototype consists of less than 400 lines of code. While it was not fully fleshed-out, and there remain many necessary refinements, I would expect that a finished version would be perhaps 3000 lines of code.

The XPath-in-Content-Controls prototype that I am introducing in this post is even smaller. It is less than 240 lines of code. It is simpler, more robust, and more amenable to polishing. I expect that the finished example, including integration into a document-level add-in for Word 2010 will be less than 1000 lines of code. I’ll be posting V1 of the prototype with the next post in this series.

Driven from an XML Document

One of the nice things about the C#-in-Content-Controls approach is that you could drive the document generation process from literally any data you could get your hands on from the .NET framework. In contrast, with this approach, there is one and only one form of data source, which is an XML document. And in this first prototype, I am restricting the data to an XML document that contains XML in no namespace. Allowing for namespaces in the XML means that I would need to provide mapping between namespaces and namespace prefixes, and that would get in the way of discussing the architecture and merits of this approach. I’ll deal with this in the future.

In the meantime, if you have XML that uses namespaces (or any other kind of data source), your first task is to transform that data into XML in no namespace.
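One possible way to do that transformation for namespaced XML is sketched below. It rebuilds the tree with LINQ to XML and keeps only the local name of each element and attribute. The names NamespaceStripper and StripNamespaces are mine, and the sketch assumes that no two attributes on an element differ only by namespace (if they do, the XElement constructor will throw on the duplicate).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class NamespaceStripper
{
    // Returns a copy of the tree in which every element and attribute is in no namespace.
    public static XElement StripNamespaces(XElement element)
    {
        return new XElement(element.Name.LocalName,
            // Drop xmlns declarations; keep other attributes under their local names.
            element.Attributes()
                .Where(a => !a.IsNamespaceDeclaration)
                .Select(a => new XAttribute(a.Name.LocalName, a.Value)),
            // Recurse into child elements; text and other nodes are copied as they are.
            element.Nodes().Select(n =>
                n is XElement child ? (object)StripNamespaces(child) : n));
    }
}

class Program
{
    static void Main()
    {
        XElement source = XElement.Parse(
            "<c:Customers xmlns:c='urn:example'>" +
            "<c:Customer id='1'><c:Name>Andrew</c:Name></c:Customer>" +
            "</c:Customers>");

        XElement plain = NamespaceStripper.StripNamespaces(source);

        // The unqualified XPath expressions used in the template now match.
        Console.WriteLine(plain.XPathSelectElement("./Customer/Name").Value); // Andrew
    }
}
```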

The XML document should look something like this:

<Customers>
  <Customer>
    <CustomerID>1</CustomerID>
    <Name>Andrew</Name>
    <HighValueCustomer>True</HighValueCustomer>
    <Orders>
      <Order>
        <ProductDescription>Bike</ProductDescription>
        <Quantity>2</Quantity>
        <OrderDate>5/1/2002</OrderDate>
      </Order>
      <Order>
        <ProductDescription>Sleigh</ProductDescription>
        <Quantity>2</Quantity>
        <OrderDate>11/1/2000</OrderDate>
      </Order>
      <Order>
        <ProductDescription>Plane</ProductDescription>
        <Quantity>2</Quantity>
        <OrderDate>2/19/2000</OrderDate>
      </Order>
    </Orders>
  </Customer>
  <Customer>
    <CustomerID>2</CustomerID>
    <Name>Bob</Name>
    <HighValueCustomer>False</HighValueCustomer>
    <Orders>
      <Order>
        <ProductDescription>Boat</ProductDescription>
        <Quantity>2</Quantity>
        <OrderDate>8/9/2000</OrderDate>
      </Order>
      <Order>
        <ProductDescription>Boat</ProductDescription>
        <Quantity>4</Quantity>
        <OrderDate>3/25/2001</OrderDate>
      </Order>
      <Order>
        <ProductDescription>Bike</ProductDescription>
        <Quantity>1</Quantity>
        <OrderDate>6/5/2002</OrderDate>
      </Order>
    </Orders>
  </Customer>
  <Customer>
    <CustomerID>3</CustomerID>
    <Name>Celcin</Name>
    <HighValueCustomer>False</HighValueCustomer>
    <Orders>
      <Order>
        <ProductDescription>Bike</ProductDescription>
        <Quantity>2</Quantity>
        <OrderDate>2/24/2001</OrderDate>
      </Order>
      <Order>
        <ProductDescription>Boat</ProductDescription>
        <Quantity>4</Quantity>
        <OrderDate>5/6/2001</OrderDate>
      </Order>
    </Orders>
  </Customer>
</Customers>

While it isn’t required, it is more convenient to use a form where the Orders element is a child of the Customer element. The reason for this will become clear.

The XPath-in-Content-Controls Template Document

The next step in introducing this approach is to take a look at the template document that will drive document generation. While looking at this template, you can compare and contrast it to the template that contains C# code in content controls.

In this template document, I am going to borrow some nomenclature from XSLT. One of the attributes of the xsl:apply-templates element is the select attribute. If you place an XPath expression in the optional select attribute, XSLT will apply templates to the set of nodes that are selected by the XPath expression. The XPath expression is applied relative to the current context of the node that is currently being transformed by the sequence constructor. I am going to use a very similar approach in the template document. In effect, I am going to turn an Open XML WordprocessingML document into something that is analogous to an XSLT style sheet. Don’t worry if this is not immediately clear. It will be before the end of this blog post series. The point of this paragraph is that I’m going to use the term Select to indicate an XPath expression that will be evaluated, and the results of the evaluation will become the current context for other operations.

As usual, I am going to show content controls in design mode. Here is the template document, in its entirety. Of course, the circles and arrows are added by me to aid in explanation.

[Image: the template document, with numbered callouts (*1 through *5) marking the content controls]

The Config Content Control (*1)

Starting at the bottom of the document, there is the Config content control, which contains XML, with a root element of Config.

The DataFileName element specifies the source XML document that contains the data that drives the document generation process.

The SelectDocuments element specifies an XPath expression that, when evaluated against the root element of the document, returns a collection of elements, each of which represents a document to be generated. In the case of the XML data file that I presented earlier, the XPath expression “./Customer” returns a collection of the Customer child elements of the root Customers element. Given that source data file, the document generation process will generate three documents.

The DocumentGenerationInfo element and its child elements contain the information needed to control the actual physical generation of the documents: the directory where the documents will be placed, and a .NET format string that works in conjunction with the SelectDocumentName XPath expression to assemble the generated FileName.
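As a rough illustration of how such a configuration could be read, here is a sketch. The element names Config, DataFileName, SelectDocuments, DocumentGenerationInfo, and SelectDocumentName come from the template described above. The child element names Directory and FileNameFormat are placeholders that I made up for this sketch, as are ConfigSketch and PlanFileNames, so check the actual template in the download for the real names. The sketch also uses inline data instead of loading the file named by DataFileName.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class ConfigSketch
{
    // Computes the file name of each document that would be generated.
    public static List<string> PlanFileNames(XElement config, XElement data)
    {
        XElement info = config.Element("DocumentGenerationInfo");
        string directory = (string)info.Element("Directory");      // hypothetical element name
        string format = (string)info.Element("FileNameFormat");    // hypothetical element name
        string selectName = (string)info.Element("SelectDocumentName");

        // SelectDocuments is evaluated in the context of the root element of the data.
        return data.XPathSelectElements((string)config.Element("SelectDocuments"))
            // SelectDocumentName is evaluated in the context of each selected element.
            .Select(doc => Path.Combine(directory,
                String.Format(format, doc.XPathSelectElement(selectName).Value)))
            .ToList();
    }
}

class Program
{
    static void Main()
    {
        XElement config = XElement.Parse(@"<Config>
  <DataFileName>Data.xml</DataFileName>
  <SelectDocuments>./Customer</SelectDocuments>
  <DocumentGenerationInfo>
    <Directory>Generated</Directory>
    <FileNameFormat>File{0}.docx</FileNameFormat>
    <SelectDocumentName>./CustomerID</SelectDocumentName>
  </DocumentGenerationInfo>
</Config>");

        XElement data = XElement.Parse(
            "<Customers>" +
            "<Customer><CustomerID>1</CustomerID></Customer>" +
            "<Customer><CustomerID>2</CustomerID></Customer>" +
            "<Customer><CustomerID>3</CustomerID></Customer>" +
            "</Customers>");

        foreach (string fileName in ConfigSketch.PlanFileNames(config, data))
            Console.WriteLine(fileName);
    }
}
```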

As an aside, I initially played around with nested content controls instead of having a single content control that contains XML. While this approach works, maintaining nested content controls using the Word 2007 or Word 2010 user interface is idiosyncratic. I could write a pretty detailed bug report around the maintainability of nested content controls. Maintaining the XML in a single content control is a more satisfactory approach.

The SelectValue Content Control (*2)

At the top of the template document, you can see the SelectValue content controls. As mentioned in the last section, the SelectDocuments XPath expression selects multiple Customer elements. While generating each document in turn, each Customer element becomes the current context. The SelectValue XPath expression is then evaluated in the context of each Customer element in turn. One of the circled SelectValue XPath expressions selects the Name child element of the Customer element. The other circled SelectValue XPath expression selects the CustomerID child element of the Customer element. In XML, the value of an element is defined to be the concatenated descendant text nodes (in other words, its textual content). The document generation engine retrieves the value of the selected element and replaces the content control with the value.
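The following small sketch shows what “the value of the selected element” means in LINQ to XML terms, including the case where the selected element has child elements of its own. The Nick element and the names SelectValueSketch and Evaluate are invented for the illustration. Returning null when the expression selects nothing is an assumption of this sketch, not necessarily what the document generator does.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

static class SelectValueSketch
{
    // Returns the text that would replace a SelectValue content control,
    // or null when the expression selects nothing (an assumption of this sketch).
    public static string Evaluate(XElement context, string xpath)
    {
        XElement selected = context.XPathSelectElement(xpath);
        return selected == null ? null : selected.Value;
    }
}

class Program
{
    static void Main()
    {
        // The Name element here has mixed content, to show the concatenation.
        XElement customer = XElement.Parse(
            "<Customer><CustomerID>1</CustomerID>" +
            "<Name>Andrew <Nick>Andy</Nick> Smith</Name></Customer>");

        Console.WriteLine(SelectValueSketch.Evaluate(customer, "./Name"));       // Andrew Andy Smith
        Console.WriteLine(SelectValueSketch.Evaluate(customer, "./CustomerID")); // 1
    }
}
```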

The Table Content Control (*3)

Just as the SelectValue content control is evaluated in the context of a Customer element, the SelectRows content control is also evaluated in the context of a Customer element. The difference is that SelectValue is expected to select a single element, whereas the SelectRows expression is expected to select a collection of elements, one for each row in the table. For customer #1 (Andrew), the SelectRows XPath expression selects three Order elements. The XPath expressions (pointed to by *4) stored in the prototype row (the second row in the table) are evaluated in the context of each row selected by the SelectRows expression.

You also often see a similar pattern in properly written XSLT style sheets. One template is evaluated in the context of the root element, which selects a set of elements. An xsl:apply-templates causes an XPath expression to be evaluated in the context of each element selected by the first template. And an xsl:apply-templates in the sequence constructor of the second template causes an XPath expression to be evaluated in the context of each element selected by the second template, thereby causing a third set of templates to be applied.

Once you are familiar with this approach (sometimes called the ‘push’ approach), you never write XSLT style sheets in any other way. Inexperienced XSLT developers sometimes try to write style sheets by using loops and calling templates explicitly, instead of letting the pattern matching power of XSLT do the heavy lifting. This incorrect approach is sometimes called the ‘pull’ approach.
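For readers who have not seen a style sheet written this way, here is a small sketch that runs one from C# using XslCompiledTransform. The style sheet is my own illustration, written against the Customers data shown earlier, and the names PushStyleSketch and Transform are made up. Note that each template only says what to do with the element it matches, and xsl:apply-templates selects the next set of elements, each of which becomes the context in turn.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class PushStyleSketch
{
    // Three templates, chained by xsl:apply-templates with a select attribute.
    const string Stylesheet =
@"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <xsl:strip-space elements='*'/>
  <xsl:template match='Customers'><xsl:apply-templates select='Customer'/></xsl:template>
  <xsl:template match='Customer'><xsl:value-of select='Name'/>:<xsl:apply-templates select='Orders/Order'/>;</xsl:template>
  <xsl:template match='Order'>[<xsl:value-of select='ProductDescription'/>]</xsl:template>
</xsl:stylesheet>";

    public static string Transform(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(styleReader);

        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}

class Program
{
    static void Main()
    {
        string xml =
            "<Customers>" +
            "<Customer><Name>Andrew</Name><Orders>" +
            "<Order><ProductDescription>Bike</ProductDescription></Order>" +
            "<Order><ProductDescription>Sleigh</ProductDescription></Order>" +
            "</Orders></Customer>" +
            "<Customer><Name>Bob</Name><Orders>" +
            "<Order><ProductDescription>Boat</ProductDescription></Order>" +
            "</Orders></Customer>" +
            "</Customers>";

        Console.WriteLine(PushStyleSketch.Transform(xml));
        // Prints: Andrew:[Bike][Sleigh];Bob:[Boat];
    }
}
```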

To summarize, the SelectDocuments expression selects multiple elements, one for each document. The SelectRows expression, evaluated in the context of the elements selected by SelectDocuments, selects multiple elements, one for each row. The XPath expressions in the prototype row are evaluated in the context of the row elements selected by SelectRows.

The Conditional Content Control (*5)

The conditional content control works in exactly the same way as SelectValue and SelectRows. The SelectTestValue expression is evaluated in the context of the Customer element. The retrieved value is compared to the contents of the Match content control. If there is a match, the Conditional content control is replaced by the contents of the Content content control in the generated document.
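In code, the logic could look something like the sketch below. The parameter names mirror the SelectTestValue, Match, and Content content controls, while ConditionalSketch and Resolve are made-up names. Two things are assumptions on my part: that the comparison is an exact, case-sensitive string comparison, and that the control produces nothing when there is no match.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

static class ConditionalSketch
{
    // Returns the content to place in the generated document,
    // or an empty string when the test value does not match (assumed behavior).
    public static string Resolve(XElement context, string selectTestValue,
        string match, string content)
    {
        // SelectTestValue is evaluated in the context of the current element.
        XElement selected = context.XPathSelectElement(selectTestValue);
        bool isMatch = selected != null && selected.Value == match;
        return isMatch ? content : String.Empty;
    }
}

class Program
{
    static void Main()
    {
        XElement andrew = XElement.Parse(
            "<Customer><Name>Andrew</Name><HighValueCustomer>True</HighValueCustomer></Customer>");
        XElement bob = XElement.Parse(
            "<Customer><Name>Bob</Name><HighValueCustomer>False</HighValueCustomer></Customer>");

        string content = "Thank you for being a valued customer.";
        Console.WriteLine("[{0}]", ConditionalSketch.Resolve(andrew, "./HighValueCustomer", "True", content));
        Console.WriteLine("[{0}]", ConditionalSketch.Resolve(bob, "./HighValueCustomer", "True", content));
        // Prints:
        // [Thank you for being a valued customer.]
        // []
    }
}
```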

Advantages of the XPath-in-Content-Controls Approach

There are several advantages to the XPath-in-Content-Controls approach over the C#-in-Content-Controls approach:

  • We eliminate the two-step process for generating documents. The program that processes the template (and processes all of the XPath expressions in the template) does the actual document generation. We don’t need to generate code, and then compile and run the generated code.
  • We can catch errors in the XPath expressions, and supply the template designer with good error messages that indicate the specific XPath expression that contains the error.
  • We eliminate all of the issues associated with typing C# code into content controls. When entering C# code in Word, of course there is no Intellisense. It could be difficult to catch errors in the C# code. The issues associated with replacing single or double quotes with smart quotes are significantly reduced. Note that the issues around quotes are not entirely eliminated. There are circumstances where the template designer may need to use single or double quotes in XPath expressions.
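As an illustration of the second point, here is one way that a template processor could check an XPath expression up front and report which expression is at fault. This is only a sketch. The names XPathValidation and Check and the wording of the message are mine, and the error handling in the actual example may work differently. It relies on XPathExpression.Compile, which throws an XPathException when the expression is not valid.

```csharp
using System;
using System.Xml.XPath;

static class XPathValidation
{
    // Returns null when the expression compiles, otherwise a message for the template designer.
    public static string Check(string expression)
    {
        try
        {
            XPathExpression.Compile(expression);
            return null;
        }
        catch (XPathException ex)
        {
            return String.Format("Invalid XPath expression '{0}': {1}", expression, ex.Message);
        }
    }
}

class Program
{
    static void Main()
    {
        // The second expression has an unclosed predicate.
        foreach (string expression in new[] { "./Orders/Order", "./Customer[" })
        {
            string error = XPathValidation.Check(expression);
            Console.WriteLine(error ?? String.Format("OK: {0}", expression));
        }
    }
}
```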

In the next post, I’ll show a video of this approach in action.

Future posts:

  • Show this approach at scale
  • Review XPath semantics of LINQ to XML
  • Examine the issues around namespaces in the source XML document
  • Show the process of changing the schema
  • Add robustness and error handling
  • Integrate as a document-level managed add-in for Word 2010.

This is fun!


Changing the Schema for this Open XML Document Generation System

Flexibility in a document generation system is very important to its usability.  We all know how it works.  You’ve been commissioned by the marketing department to put together a mailing to 50,000 customers.  After doing the work of putting together the template document, the marketing department *will* come ask for changes to the data and to the template document.  In the following screen cast, I show the process of adjusting the XML data that drives the document generation system, as well as adjusting the template document to use that data.

This post is the 12th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Shows changing the schema and the template document.

There are lots of disadvantages to this approach of editing C# code in content controls in a Word document:

  • It requires a developer to put together the template document.
  • If you write C# code that doesn’t compile in a content control, you don’t see any errors until you try to compile the generated program.
  • There is no Intellisense when editing this code.  In a couple of places, I ended up first getting a snippet of code to work in Visual Studio, and then pasting that code into the content control.  This is far from ideal.
  • The code that generates C# code from the template document is not long – only about 390 lines of code (see ProcessTemplate.cs in the Zip file).  However, it is a bit gnarly, particularly the bits that make it so that you can have Value, Table, or Conditional content controls within a Conditional content control.  To be clear, the C# code that you write inside the template document is not so complex; it is only the code that generates the code that is gnarly.

There are advantages too:

  • The code is directly associated and stored with the document.  This is called ‘lexical proximity’ – you don’t need to find code in another file somewhere, and you don’t need to keep code and the template document in sync.
  • You can pull data from *any* data source.  I could easily modify the template document to use OData or the Managed Client Object Model to pull data from a SharePoint list.  I could also write some ADO.NET code to pull data from any SQL database.

It is not clear that the advantages outweigh the disadvantages.  In the next post in this series, I’m going to limit the data source to XML, and use XPath in the content controls.


Release of V1 of Simple DOCX Generation System

I have completed a preliminary version of this simple DOCX generation system, which you can download, unzip, and try.  You can find the zip file that contains all necessary bits here.

This post is the eleventh in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

The following 90 second screen-cast shows how to run the doc gen system after you download and unzip the zip file.

Demonstrates minimum number of steps to run the Open XML WordprocessingML document generator system

The following 2 1/2 minute video shows using the document generation system at scale.  I show generating 3000 documents in under a minute.


Video of use of Document Generation Example

I have completed a rough first version of this document generation system that is driven by C# code that you write in content controls in a Word document.  As an intro, I’ve recorded a small screen-cast that shows the doc gen system in action.

This post is the tenth in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Demonstrates an approach to Open XML WordprocessingML document generation that uses C# code in content controls.



V1 of the code that enables this approach to document generation is less than 400 lines of code, so this counts as simply an example program.  This shows the value of using functional programming, meta programming, and Open XML to reduce program size.

I have to note at this point – the example program contains almost no error handling.  If you mistype code in the content controls, you will encounter interesting compiler errors after loading the generated program.  In the long run, I expect to resolve these issues in an interesting way.  While at this point, I’m just playing around with document generation ideas, in the future, I want to build a system that is easy and convenient for non-developers to use.

I plan on posting this code sometime early next week, as well as a video that explains in more detail how the doc gen system works.


A Super-Simple Template System

In the last post, I explored Text Templates (T4), and determined that using T4 text templates for my code generation needs would add complexity and not yield sufficient ROI (although I did determine that a doc gen example using T4 is interesting in its own right). However, my exploration into T4 text templates yielded one important point, which is that delimiting blocks of code using <# and #> is a good approach. This post details my super-simple template system, which will be more than adequate for building this first version of a doc gen system.

This post is the ninth in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

For what it’s worth, I did a fair amount of reading of the C#, VB, and XML specs and determined to my own satisfaction that the <# and #> character combinations are fine to use as delimiters. I could go on for about two pages, detailing exactly where the hash mark is allowed in all three languages, and why <# and #> are safe, but I’ll spare you the ordeal. In any case, these are the combinations that the T4 architects and program managers decided on, and I’m certain that an extraordinary amount of time was spent designing the T4 syntax.

I am going to make one more simplification, which is that in my super-simple template system, the <# must be the first two non-whitespace characters on a line, and that #> must be the last two non-whitespace characters. You will see that this makes the LINQ projection that processes the template very simple. Ultimately, this template system would be best implemented by defining a grammar and writing or using a real parser, but my main objective is to build a small example that enables us to explore document generation, so a shortcut is in order here.

To allow for further enhancements in the future, I’m going to specify that the contents will be a small XML document. While this makes the syntax a bit more verbose, we gain such advantages as XML schema validation, extensibility, and a familiar syntax. Following is an example of a template using this system. It contains two insertion blocks, one with the name of Using, and the other with the name of GeneratorMain:

<# <Insert Name="Using"/> #>

namespace GenDocs
{
    class Generator
    {
        static void Main(string[] args)
        {
            <# <Insert Name="GeneratorMain"/> #>
        }
    }
}

The Using insertion block might be replaced with this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

The GeneratorMain insertion block could be replaced with this:

Console.WriteLine("Hello world");

Processing this template would then result in the following C# program:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace GenDocs
{
    class Generator
    {
        static void Main(string[] args)
        {
Console.WriteLine("Hello world");
        }
    }
}
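
Because the contents of an insertion block are a small XML document, reading the block's name can be done defensively with LINQ to XML. The following is a minimal sketch, not a definitive implementation; the class InsertionBlock, the method GetName, and the choice of FormatException are assumptions made for illustration:

```csharp
using System;
using System.Xml.Linq;

static class InsertionBlock
{
    // Parses the XML found between <# and #> and returns the value of its
    // Name attribute. Throws FormatException if the element is not named
    // Insert or has no usable Name attribute. Malformed XML surfaces as the
    // XmlException thrown by XElement.Parse.
    public static string GetName(string body)
    {
        XElement insert = XElement.Parse(body);
        if (insert.Name.LocalName != "Insert")
            throw new FormatException(
                "Expected an Insert element but found " + insert.Name.LocalName + ".");

        // The explicit conversion yields null when the attribute is absent.
        string name = (string)insert.Attribute("Name");
        if (string.IsNullOrEmpty(name))
            throw new FormatException("The Insert element needs a Name attribute.");
        return name;
    }
}

class InsertionBlockDemo
{
    static void Main()
    {
        Console.WriteLine(InsertionBlock.GetName(" <Insert Name=\"Using\"/> "));
        try
        {
            InsertionBlock.GetName("<Insert/>");
        }
        catch (FormatException ex)
        {
            Console.WriteLine("Rejected: " + ex.Message);
        }
    }
}
```

A check like this is a lightweight stand-in for the XML schema validation mentioned above; a real schema would be the place to add further elements or attributes later.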

The LINQ projection that processes this template, in its entirety, is:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

class Program
{
    // Simulated method that returns the text of a tagged content control.
    static string GetTextFromContentControl(string tag)
    {
        if (tag == "Using")
            return
@"using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;";
        if (tag == "GeneratorMain")
            return @"Console.WriteLine(""Hello world"");";
        return "error";
    }

    static void Main(string[] args)
    {
        string[] templateCode = File.ReadAllLines("template.txt");
        var filledTemplate = templateCode
            .Select(l =>
            {
                string trimmed = l.Trim();
                if (trimmed.StartsWith("<#") && trimmed.EndsWith("#>"))
                {
                    XElement insert = XElement.Parse(trimmed.Substring(2, trimmed.Length - 4));
                    string tag = insert.Attribute("Name").Value;
                    return GetTextFromContentControl(tag);
                }
                else
                    return l;
            })
            .ToArray();
        File.WriteAllLines("GeneratedDocGenProgram.cs", filledTemplate);

        // Print out the generated program for demonstration purposes.
        Console.WriteLine(File.ReadAllText("GeneratedDocGenProgram.cs"));
    }
}

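To run this example, create a new C# console application that contains the code above. Save the template as a file named template.txt in the directory from which the program runs (for a default Visual Studio debug build, that is bin\Debug), and then run it. The example produces the following output: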

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace GenDocs
{
    class Generator
    {
        static void Main(string[] args)
        {
Console.WriteLine("Hello world");
        }
    }
}
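
Notice that the inserted Console.WriteLine line is not indented, because the replacement text is emitted exactly as it was returned. If that matters, one option is to carry the indentation of the insertion line over to every inserted line. The following is a minimal sketch under that assumption; the class IndentingTemplate, the method ExpandLine, and the dictionary that stands in for the content controls are all hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class IndentingTemplate
{
    // Expands one template line. Ordinary lines pass through unchanged. An
    // insertion line is replaced by the named text, and each non-empty
    // inserted line is prefixed with the insertion line's leading whitespace.
    public static IEnumerable<string> ExpandLine(
        string line, IDictionary<string, string> blocks)
    {
        string trimmed = line.Trim();
        if (trimmed.Length < 4
            || !trimmed.StartsWith("<#", StringComparison.Ordinal)
            || !trimmed.EndsWith("#>", StringComparison.Ordinal))
            return new[] { line };

        string indent = line.Substring(0, line.Length - line.TrimStart().Length);
        XElement insert = XElement.Parse(trimmed.Substring(2, trimmed.Length - 4));
        string name = (string)insert.Attribute("Name");
        string text;
        if (name == null || !blocks.TryGetValue(name, out text))
            throw new KeyNotFoundException(
                "No text for insertion block '" + name + "'.");

        return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
                   .Select(l => l.Length == 0 ? l : indent + l);
    }
}

class IndentingTemplateDemo
{
    static void Main()
    {
        var blocks = new Dictionary<string, string>
        {
            { "GeneratorMain", "Console.WriteLine(\"Hello world\");" }
        };
        string[] template =
        {
            "static void Main(string[] args)",
            "{",
            "    <# <Insert Name=\"GeneratorMain\"/> #>",
            "}"
        };
        foreach (string l in template.SelectMany(
            t => IndentingTemplate.ExpandLine(t, blocks)))
            Console.WriteLine(l);
    }
}
```

This variant also throws for an unknown block name instead of writing a placeholder string into the generated program, so that a misspelled name is caught before the generated code is compiled.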


Text Templates (T4) and the Code Generation Process

As I was contemplating the process of generating the C# code that will do the document generation, I was drawn to the idea of using text templates, also known as T4. Text templates are a .NET code generation technology. I have never used text templates before, so I spent a few hours researching them to see their applicability in the Open XML WordprocessingML document generation process. The short version of this post is that I have decided against using text templates in this particular iteration of a document generation system. However, text templates are very cool, and have applicability in the Open XML document generation process. This post details my notes and thoughts on text templates, and gives my reasons for deciding against using them, although I am going to steal some ideas from them.

This post is the eighth in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Text templates are a very cool technology (introduced in VS2008, I believe) that makes it easy to generate code as part of the application build process in Visual Studio. In addition, you can use text templates to generate files at runtime. You code text templates in a way that is similar to coding ASP.NET pages. Some portions of a text template contain literal text that is copied verbatim to the generated file, while other portions (similar to code blocks and expression holes) contain C# or VB code that you can use to programmatically generate portions of the generated file. For example, the following text template generates a simple XML document:

<#@ template debug="false" hostspecific="true" language="C#" #>
<#@ output extension=".xml" #>
<#@ assembly name="System.Xml" #>
<#@ assembly name="System.Xml.Linq" #>
<#@ import namespace="System.Xml.Linq" #>

<#
XElement e = new XElement("ChildElement", "with some data");
#>

<Root>
  <#= e #>
</Root>

Here is the generated XML document:

<Root>
  <ChildElement>with some data</ChildElement>
</Root>

There is a fair amount to learn about text templates. The lines that start with <#@ are directives. The assembly directives tell the text template to link with the specified assemblies. The import directive serves the same purpose as the using directive in a C# program. The code between <# and #> is executed when the template is evaluated. The line <#= e #> is an expression block: the value of e is converted to text and written into the output, which serves the same purpose as an expression hole in other similar technologies.

The principal reason that I am not going to use text templates in this current effort is that I am writing a pure functional transform from a WordprocessingML document to a bunch of pure functional C# code that will generate a number of WordprocessingML documents. I would not be using the most powerful feature of text templates, expression holes, at least not with the intent with which they were designed. The text template then becomes simply a mechanism to combine some boiler-plate code with some code generated by the functional transform. Pulling in the additional complexity of text templates doesn’t pay.

That said, text templates are interesting in the domain of Open XML document generation. Instead of approaching the problem as I am in this current series of posts, the approach using text templates would be generating a Flat OPC document. You can use LINQ to XML handily in text templates. The development of a doc gen system using text templates becomes one of finding the interesting markup in the Flat OPC document and writing some expression holes to generate the variable parts of the document. This is definitely going on my list of blog posts to write in the near future. It will be an easy post to write – perhaps only an hour or two will be required to build a rudimentary doc gen system (one with quite different characteristics from the one I’m currently building).
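
As a rough idea of what "finding the interesting markup" could look like, the following sketch uses LINQ to XML to list the tags of the content controls in a Flat OPC document. It is an illustration only: it assumes that tagged content controls (w:sdt elements) are the interesting markup, the sample document is a trimmed-down fragment without the parts a real package needs, and the tag values CustomerName and OrderTable are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class FlatOpcTags
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // A trimmed-down Flat OPC fragment; the tag values are hypothetical.
    public const string Sample =
@"<pkg:package xmlns:pkg='http://schemas.microsoft.com/office/2006/xmlPackage'>
  <pkg:part pkg:name='/word/document.xml'>
    <pkg:xmlData>
      <w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
        <w:body>
          <w:sdt>
            <w:sdtPr><w:tag w:val='CustomerName'/></w:sdtPr>
            <w:sdtContent><w:p/></w:sdtContent>
          </w:sdt>
          <w:p/>
          <w:sdt>
            <w:sdtPr><w:tag w:val='OrderTable'/></w:sdtPr>
            <w:sdtContent><w:p/></w:sdtContent>
          </w:sdt>
        </w:body>
      </w:document>
    </pkg:xmlData>
  </pkg:part>
</pkg:package>";

    // Returns, in document order, the w:tag value of every content control
    // that has one. Content controls without a tag are skipped.
    public static List<string> GetContentControlTags(XDocument flatOpc)
    {
        return flatOpc.Descendants(W + "sdt")
            .Select(sdt => sdt.Elements(W + "sdtPr")
                              .Elements(W + "tag")
                              .Attributes(W + "val")
                              .Select(a => a.Value)
                              .FirstOrDefault())
            .Where(tag => tag != null)
            .ToList();
    }
}

class FlatOpcTagsDemo
{
    static void Main()
    {
        XDocument doc = XDocument.Parse(FlatOpcTags.Sample);
        foreach (string tag in FlatOpcTags.GetContentControlTags(doc))
            Console.WriteLine(tag);
    }
}
```

In a text-template approach, each element located this way would be a candidate for replacement by an expression hole that generates the variable content.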

There are a few more interesting points to note about text templates. The most powerful and common use of text templates is to facilitate code generation from within a Visual Studio project. The text template is evaluated whenever it is saved, and the resulting generated C# or VB source file is compiled whenever you build the project. This allows you to generate code as part of the editing process, and then use the generated code from other modules seamlessly. This is super-interesting, but not really relevant to the problem of document generation. Document generation should ultimately be in the hands of the domain experts – the marketing folks, the customer relationship departments, and whoever has industrial-strength document generation requirements. We don’t want to require Visual Studio for the design process.

You can generate text files at run-time by using a pre-processed text template. There are limitations on what you can do with this approach. Effectively, you can define additional properties for the generated class behind the template. You can then use those properties in a non-dynamic way in the generated file. This is somewhat interesting, but doesn’t justify the additional complexity.

One feature of text templates is that you can write a ‘Custom Host’ that allows you to kick off the transformation process programmatically. There is an interesting note in the topic Processing Text Templates by using a Custom Host:

We do not recommend using text template transformations in server applications. We do not recommend using text template transformations except in a single thread. This is because the text templating Engine re-uses a single AppDomain to translate, compile, and execute templates. The translated code is not designed to be thread-safe. The Engine is designed to process files serially, as they are in a Visual Studio project at design time.

It is probably possible to design a robust document generation system using text templates, but you would have to take care to avoid any complications related to the above warning. You would also want to do a lot of testing at scale.
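
One way to respect the single-thread recommendation is to funnel every transformation request through one dedicated worker thread. The following is a minimal sketch of that idea, not a tested server design: the class SingleThreadTransformer is hypothetical, and the Func<string, string> delegate is only a stand-in for whatever actually transforms a template (it does not call the text templating Engine).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

sealed class SingleThreadTransformer : IDisposable
{
    private readonly BlockingCollection<Action> _queue =
        new BlockingCollection<Action>();
    private readonly Thread _worker;

    public SingleThreadTransformer()
    {
        // The worker drains the queue one item at a time, so queued
        // transformations run serially and always on this one thread.
        _worker = new Thread(() =>
        {
            foreach (Action work in _queue.GetConsumingEnumerable())
                work();
        });
        _worker.IsBackground = true;
        _worker.Start();
    }

    // Can be called from any thread; the transform runs on the worker.
    public Task<string> Enqueue(Func<string, string> transform, string template)
    {
        var completion = new TaskCompletionSource<string>();
        _queue.Add(() =>
        {
            try { completion.SetResult(transform(template)); }
            catch (Exception ex) { completion.SetException(ex); }
        });
        return completion.Task;
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // let the worker finish queued items and exit
        _worker.Join();
        _queue.Dispose();
    }
}

class SingleThreadDemo
{
    static void Main()
    {
        using (var transformer = new SingleThreadTransformer())
        {
            var results = new Task<string>[8];
            // Requests arrive from several threads at once.
            Parallel.For(0, results.Length, i =>
            {
                results[i] = transformer.Enqueue(
                    t => t + " handled on thread "
                        + Thread.CurrentThread.ManagedThreadId,
                    "doc" + i);
            });
            Task.WaitAll(results);
            foreach (Task<string> r in results)
                Console.WriteLine(r.Result);
        }
    }
}
```

Every line printed by the demo reports the same thread id, whichever thread queued the request. This only addresses the threading part of the warning; testing at scale would still be needed.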

For more info about text templates, see Code Generation and Text Templates (T4).
