Archive for WordprocessingML

Release of V1 of Simple DOCX Generation System

I have completed a preliminary version of this simple DOCX generation system, which you can download, unzip, and try.  You can find the zip file that contains all necessary bits here.

This post is the eleventh in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

The following 90-second screencast shows how to run the doc gen system after you download and unzip the zip file.

Demonstrates the minimum steps needed to run the Open XML WordprocessingML document generation system

The following 2½-minute video shows the document generation system in use at scale. I generate 3000 documents in under a minute.


Video of use of Document Generation Example

I have completed a rough first version of this document generation system, which is driven by C# code that you write in content controls in a Word document. As an introduction, I’ve recorded a short screencast that shows the doc gen system in action.

This post is the tenth in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Demonstrates an approach to Open XML WordprocessingML document generation that uses C# code in content controls.

V1 of the code that enables this approach to document generation is under 400 lines, so it counts as simply an example program. This shows the value of using functional programming, metaprogramming, and Open XML to reduce program size.

I should note at this point that the example program contains almost no error handling. If you mistype code in the content controls, you will encounter interesting compiler errors when you load the generated program. In the long run, I expect to resolve these issues in an interesting way. For now, I’m just playing around with document generation ideas, but in the future, I want to build a system that is easy and convenient for non-developers to use.

I plan on posting this code sometime early next week, as well as a video that explains in more detail how the doc gen system works.


A Super-Simple Template System

In the last post, I explored Text Templates (T4), and determined that using T4 text templates for my code generation needs would add complexity and not yield sufficient ROI (although I did determine that a doc gen example using T4 is interesting in its own right). However, my exploration into T4 text templates yielded one important point, which is that delimiting blocks of code using <# and #> is a good approach. This post details my super-simple template system, which will be more than adequate for building this first version of a doc gen system.

This post is the ninth in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

For what it’s worth, I read the relevant parts of the C#, VB, and XML specifications and satisfied myself that these delimiter combinations are safe. I could go on for about two pages detailing exactly where the hash mark is allowed in all three languages and why <# and #> are safe, but I’ll spare you the ordeal. In any case, these are the combinations that the T4 architects and program managers chose, and I’m sure they spent a great deal of time designing the T4 syntax.

I am going to make one more simplification, which is that in my super-simple template system, the <# must be the first two non-whitespace characters on a line, and that #> must be the last two non-whitespace characters. You will see that this makes the LINQ projection that processes the template very simple. Ultimately, this template system would be best implemented by defining a grammar and writing or using a real parser, but my main objective is to build a small example that enables us to explore document generation, so a shortcut is in order here.

To allow for further enhancements in the future, I’m going to specify that the contents of each <# #> block are a small XML document. While this makes the syntax a bit more verbose, we gain such advantages as XML schema validation, extensibility, and a familiar syntax. Following is an example of a template using this system. It contains two insertion blocks, one named Using and the other named GeneratorMain:

<# <Insert Name="Using"/> #>

namespace GenDocs
{
    class Generator
    {
        static void Main(string[] args)
        {
            <# <Insert Name="GeneratorMain"/> #>
        }
    }
}

The Using insertion block might be replaced with this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

The GeneratorMain insertion block could be replaced with this:

Console.WriteLine("Hello world");

Processing this template would then result in the following C# program:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace GenDocs
{
    class Generator
    {
        static void Main(string[] args)
        {
Console.WriteLine("Hello world");
        }
    }
}

The LINQ projection that processes this template, in its entirety, is:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

class Program
{
    // Simulated method that returns the text of a tagged content control.
    static string GetTextFromContentControl(string tag)
    {
        if (tag == "Using")
            return
@"using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;";
        if (tag == "GeneratorMain")
            return @"Console.WriteLine(""Hello world"");";
        return "error";
    }

    static void Main(string[] args)
    {
        string[] templateCode = File.ReadAllLines("template.txt");
        var filledTemplate = templateCode
            .Select(l =>
            {
                string trimmed = l.Trim();
                if (trimmed.StartsWith("<#") && trimmed.EndsWith("#>"))
                {
                    XElement insert = XElement.Parse(trimmed.Substring(2, trimmed.Length - 4));
                    string tag = insert.Attribute("Name").Value;
                    return GetTextFromContentControl(tag);
                }
                else
                    return l;
            })
            .ToArray();
        File.WriteAllLines("GeneratedDocGenProgram.cs", filledTemplate);

        // Print the generated program for demonstration purposes.
        Console.WriteLine(File.ReadAllText("GeneratedDocGenProgram.cs"));
    }
}

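To run this example, create a new C# console application and replace the contents of Program.cs with the code above. Save the template as a file named template.txt in the program’s working directory (when you run from Visual Studio, this is the output directory, such as bin\Debug), and then run the program. The example produces the following output: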

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace GenDocs
{
    class Generator
    {
        static void Main(string[] args)
        {
Console.WriteLine("Hello world");
        }
    }
}
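Because the contents of each <# #> block are a small XML document, they can be validated before the template is processed. This is one of the advantages mentioned above. The following is a minimal sketch and is not part of the system described here. InsertBlockValidator and its schema are hypothetical: the schema allows a single empty Insert element with a required Name attribute. The sketch uses the LINQ to XML Validate extension method from System.Xml.Schema. Malformed XML is still reported as an XmlException thrown by XDocument.Parse.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical helper, not part of the template system described above.
// It checks the XML inside one <# ... #> block against a small schema.
public static class InsertBlockValidator
{
    // Assumed schema for this sketch: one empty Insert element
    // with a required Name attribute.
    private const string Xsd =
@"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Insert'>
    <xs:complexType>
      <xs:attribute name='Name' type='xs:string' use='required'/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    private static readonly XmlSchemaSet Schemas = CreateSchemaSet();

    private static XmlSchemaSet CreateSchemaSet()
    {
        var set = new XmlSchemaSet();
        using (XmlReader reader = XmlReader.Create(new StringReader(Xsd)))
        {
            // null means "use the target namespace declared in the schema" (none here).
            set.Add(null, reader);
        }
        set.Compile();
        return set;
    }

    // Returns null when the block is valid; otherwise, the first validation error message.
    public static string Validate(string blockContents)
    {
        XDocument doc = XDocument.Parse(blockContents.Trim());
        string firstError = null;
        doc.Validate(Schemas, (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error && firstError == null)
                firstError = e.Message;
        });
        return firstError;
    }
}
```

For example, the Using block from the template above passes validation. An Insert element without a Name attribute, or with an undeclared extra attribute, returns an error message instead.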


Text Templates (T4) and the Code Generation Process

As I was contemplating the process of generating the C# code that will do the document generation, I was drawn to the idea of using text templates, also known as T4. Text templates are a .NET code generation technology. I have never used text templates before, so I spent a few hours researching them to see their applicability in the Open XML WordprocessingML document generation process. The short version of this post is that I have decided against using text templates in this particular iteration of a document generation system. However, text templates are very cool, and have applicability in the Open XML document generation process. This post details my notes and thoughts on text templates, and gives my reasons for deciding against using them, although I am going to steal some ideas from them.

This post is the eighth in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Text templates are a very cool technology (introduced in VS2008, I believe) that makes it easy to generate code as part of the application build process in Visual Studio. In addition, you can use text templates to generate files at runtime. You code text templates in a way that is similar to coding ASP.NET pages. Some portions of a text template contain literal text that is copied verbatim to the generated file, while other portions (similar to code blocks and expression holes) contain C# or VB code that you can use to programmatically generate portions of the generated file. For example, the following text template generates a simple XML document:

<#@ template debug="false" hostspecific="true" language="C#" #>
<#@ output extension=".xml" #>
<#@ assembly name="System.Xml" #>
<#@ assembly name="System.Xml.Linq" #>
<#@ import namespace="System.Xml.Linq" #>

<#
XElement e = new XElement("ChildElement", "with some data");
#>

<Root>
  <#= e #>
</Root>

Here is the generated XML document:

<Root>
  <ChildElement>with some data</ChildElement>
</Root>

There is a fair amount to learn about text templates. The lines that start with <#@ are directives. The assembly directives tell the text template to reference the specified assemblies, and the import directive serves the same purpose as the using directive in a C# program. The code between <# and #> is executed when the template is evaluated. The line <#= e #> serves the same purpose as an expression hole in other similar technologies: it writes the value of e into the generated output.

The principal reason that I am not going to use text templates in this current effort is that I am writing a pure functional transform from a WordprocessingML document to a bunch of pure functional C# code that will generate a number of WordprocessingML documents. I would not be using the most powerful feature of text templates, expression holes, at least not in the way they were designed to be used. The text template would then become simply a mechanism to combine some boilerplate code with some code generated by the functional transform. Pulling in the additional complexity of text templates doesn’t pay.

That said, text templates are interesting in the domain of Open XML document generation. Instead of approaching the problem as I am in this series of posts, a text-template approach would generate a Flat OPC document. You can use LINQ to XML handily in text templates. Developing a doc gen system with text templates then becomes a matter of finding the interesting markup in the Flat OPC document and writing some expression holes to generate the variable parts of the document. This is definitely going on my list of blog posts to write in the near future. It will be an easy post to write – perhaps only an hour or two will be required to build a rudimentary doc gen system (one with quite different characteristics from the one I’m currently building).

There are a few more interesting points to note about text templates. The most powerful and common use of text templates is to facilitate code generation from within a Visual Studio project. The text template is evaluated whenever it is saved, and the resulting generated C# or VB source file is compiled whenever you build the project. This allows you to generate code as part of the editing process, and then use the generated code from other modules seamlessly. This is super-interesting, but not really relevant to the problem of document generation. Document generation should ultimately be in the hands of the domain experts – the marketing folks, the customer relationship departments, and whoever has industrial-strength document generation requirements. We don’t want to require Visual Studio for the design process.

You can generate text files at run-time by using a pre-processed text template. There are limitations on what you can do with this approach. Effectively, you can define additional properties for the generated class behind the template. You can then use those properties in a non-dynamic way in the generated file. This is somewhat interesting, but doesn’t justify the additional complexity.

One feature of text templates is that you can write a ‘Custom Host’ that allows you to kick off the transformation process programmatically. There is an interesting note in the topic Processing Text Templates by using a Custom Host:

We do not recommend using text template transformations in server applications. We do not recommend using text template transformations except in a single thread. This is because the text templating Engine re-uses a single AppDomain to translate, compile, and execute templates. The translated code is not designed to be thread-safe. The Engine is designed to process files serially, as they are in a Visual Studio project at design time.

It is probably possible to design a robust document generation system using text templates, but you would have to take care to avoid any complications related to the above warning. You would also want to do a lot of testing at scale.

For more info about text templates, see Code Generation and Text Templates (T4).


Refinement: Generating C# code from an XML Tree using Virtual Extension Methods

I’ve made great progress on this example, which generates WordprocessingML documents from a template driven by C# code written inside content controls. It took me longer than I thought it would, but I’ve been mostly on vacation this past week, so I had to fit coding in around lots of other activities. Also, there is Hofstadter’s law to take into account.

This post is the seventh in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Along the way, I determined that I needed to refine the approach for generating C# code from an XML tree.  There were two refinements:

  • I needed to use the @"" syntax for strings (called verbatim string literals). This handles cases such as insignificant white space that contains new lines. My document generation example doesn’t need this functionality, but I dislike incomplete solutions.
  • I added an approach for injecting arbitrary code into the code that creates the XML tree.  I need to inject the code that the developer types into content controls into the code that creates the XML tree.  My approach is that if the transform encounters an element with the fully qualified name of {http://www.ericwhite.com/xmlcodegeneration}Literal then the text contents of that element are directly injected into the code to create the XML tree.

The following code shows what I mean.  The code creates an XML tree that contains an element with the special namespace/name:

XNamespace ewx = "http://www.ericwhite.com/xmlcodegeneration";
XDocument root = new XDocument(
    new XElement("Root",
        new XElement("Child", 123),
        new XElement(ewx + "Literal",
          @"new XElement(""Data"", 123),  // injected code" + Environment.NewLine),
        new XElement("Child", 345)));
Console.WriteLine(LtxToCode.XDocumentToCode(root));

When you run this code, it generates the following code that contains directly injected code:

new XDocument(
  new XElement("Root",
    new XElement("Child",
      new XText(@"123")
    ),
new XElement("Data", 123),  // injected code
    new XElement("Child",
      new XText(@"345")
    )
  )
)
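The verbatim-string refinement comes down to one escaping rule. Inside @"...", the only character that needs escaping is the double quote, which is written twice. New lines and backslashes are taken literally. The following minimal sketch isolates that rule in a standalone helper. VerbatimLiteral.Encode is a hypothetical name: the ToCode methods in the listing apply the same Replace call inline rather than calling a helper.

```csharp
// Hypothetical helper (not part of LtxToCode) showing the escaping rule that
// the ToCode methods apply inline: wrap text in a C# verbatim string literal.
public static class VerbatimLiteral
{
    public static string Encode(string text)
    {
        // Inside @"...", only the double quote needs escaping, by doubling it.
        // Backslashes and new lines are kept exactly as they are.
        return "@\"" + text.Replace("\"", "\"\"") + "\"";
    }
}
```

For example, encoding the text say "hi" yields @"say ""hi""", which compiles back to the original string.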

To avoid having two versions of the code posted on my blog, I’ve altered the previous post to contain the corrected code. In the next post, I’ll discuss pros and cons of text templates.


Simulating Virtual Extension Methods

When considering the problem of how to generate code that will create some arbitrary XML tree, it is interesting to examine the LINQ to XML class hierarchy, which uses polymorphism. The XObject class is an abstract base class of both the XAttribute and XNode classes. The XNode class is an abstract base class of XContainer, XComment, XDocumentType, XProcessingInstruction, and XText. XContainer is the base class for XElement and XDocument.

This post is the sixth in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

The following diagram shows the class hierarchy of the LINQ to XML classes that are important when considering serializing a LINQ to XML tree as code. There are other classes in LINQ to XML, but they don’t impact this discussion.

[Diagram: class hierarchy of the LINQ to XML classes]

Software systems such as LINQ to XML are typically written using a recursive approach. The XNode class declares an abstract method, WriteTo. Each derived class must override WriteTo and provide its own serialization logic. When you serialize an XElement object by calling WriteTo, LINQ to XML internally calls the WriteTo method of each child node of the XElement object. This is the approach that we want to use when generating code to create an arbitrary XML tree.

What would be ideal is to write ‘Virtual Extension Methods’. There would be an abstract virtual extension method on XObject named ToCode, which would be overridden for each of the concrete classes. Classes that contain child objects, such as XElement and XDocument, could then iterate through their children and use the virtual extension method to convert each node in the XML tree to the appropriate C# code to create the node.

However, C# does not support virtual extension methods. A call to an extension method is bound at compile time based on the static type of the expression it is called on (for a variable, its declared type), not the runtime type of the object. To show exactly what I mean, consider the following snippet:

using System;
using System.Xml.Linq;

public static class MyExtensions
{
    public static string ToCode(this XObject o)
    {
        return "Called extension method on XObject";
    }

    public static string ToCode(this XElement e)
    {
        return "Called extension method on XElement";
    }
}

class Program
{
    static void Main(string[] args)
    {
        XElement e = new XElement("Root");
        Console.WriteLine(e.ToCode());
        XObject o = e;
        Console.WriteLine(o.ToCode());
    }
}

The above snippet calls the ToCode extension method twice on the same object. The object is an XElement. It can be assigned to a variable of type XObject because XElement derives from XContainer, which derives from XNode, which in turn derives from XObject. When the program calls ToCode through that XObject variable, the extension method for XElement is not called. Instead, the extension method for XObject is called.

Because the LINQ to XML programming interface is fairly simple, we can simulate virtual extension methods by implementing an extension method on XObject that simply dispatches to the appropriate extension method based on the actual type of the object. The following listing shows the implementation of the ToCode extension method for XObject.

public static string ToCode(this XObject xObject)
{
    XAttribute a = xObject as XAttribute;
    if (a != null)
        return a.ToCode();
    XElement element = xObject as XElement;
    if (element != null)
        return element.ToCode();
    XCData cdata = xObject as XCData;
    if (cdata != null)
        return cdata.ToCode();
    XText text = xObject as XText;
    if (text != null)
        return text.ToCode();
    XComment comment = xObject as XComment;
    if (comment != null)
        return comment.ToCode();
    XProcessingInstruction pi = xObject as XProcessingInstruction;
    if (pi != null)
        return pi.ToCode();
    throw new CodeGenerationException("Internal error");
}
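On C# 7 or later, the same simulated dispatch can be written more compactly with a type-pattern switch. This is a sketch, not the code from the example. Describe is a hypothetical method that returns a short label so that the dispatch is easy to see. The ordering rule from the listing above still applies: XCData derives from XText, so its case must come before the XText case. The compiler rejects the reverse order as an unreachable case.

```csharp
using System;
using System.Xml.Linq;

// Sketch of simulated virtual extension methods using a type-pattern switch.
// Describe is a hypothetical stand-in for ToCode.
public static class DescribeExtensions
{
    // Type-specific "overrides".
    public static string Describe(this XAttribute a) => "attribute " + a.Name;
    public static string Describe(this XElement e) => "element " + e.Name;
    public static string Describe(this XCData cd) => "cdata";
    public static string Describe(this XText t) => "text";
    public static string Describe(this XComment cm) => "comment";
    public static string Describe(this XProcessingInstruction pi) => "pi " + pi.Target;

    // The "virtual" entry point: dispatches on the runtime type of the object.
    // Inside each case, the variable has the derived static type, so the call
    // binds to the more specific overload above.
    public static string Describe(this XObject xObject)
    {
        switch (xObject)
        {
            case null:
                throw new ArgumentNullException(nameof(xObject));
            case XAttribute a: return a.Describe();
            case XElement e: return e.Describe();
            // XCData derives from XText, so it must be tested first.
            case XCData cd: return cd.Describe();
            case XText t: return t.Describe();
            case XComment cm: return cm.Describe();
            case XProcessingInstruction pi: return pi.Describe();
            default:
                throw new InvalidOperationException(
                    "Unsupported type: " + xObject.GetType().Name);
        }
    }
}
```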

Next, we can examine the ToCode method that is implemented on XAttribute. The code calls an Indentation method that determines the number of spaces to precede the ‘new XAttribute’. This enables the code to be properly indented, so that it is easy to examine the generated code. Other than this, it is pretty straightforward code to generate the code to new up an XAttribute object.

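public static string ToCode(this XAttribute attribute)
{
    // Emit a verbatim string literal, doubling any embedded quotes so that
    // the generated code compiles.
    return Indentation(attribute) +
        String.Format("new XAttribute(\"{0}\", @\"{1}\")," + Environment.NewLine,
        attribute.Name, attribute.Value.Replace("\"", "\"\""));
}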

As initially written, the code generated for each XAttribute or XElement object ends with a comma followed by a new line. This leads to one minor issue that must be solved. When you pass a number of arguments to a method (for example, one that takes a params array), a comma immediately before the closing parenthesis is a syntax error. The comma after the last argument must therefore be removed.


The approach I took to solve this problem is that after assembling all of the code to create the child nodes of an XElement object, the code calls a method to trim off the final comma.

public static string ToCode(this XElement element)
{
    var c = element.Attributes().Cast<XObject>().Concat(element.Nodes().Cast<XObject>());
    if (c.Count() == 0)
        return Indentation(element) +
            String.Format("new XElement(\"{0}\")," + Environment.NewLine, element.Name);
    else
        return Indentation(element) +
            String.Format("new XElement(\"{0}\"," + Environment.NewLine, element.Name) +
            TrimFinalComma(c.Select(n => n.ToCode()).StringConcatenate()) +
            Indentation(element) + ")," + Environment.NewLine;
}

TrimFinalComma looks like this:

private static string TrimFinalComma(string code)
{
    if (code.EndsWith("," + Environment.NewLine))
        return code.Substring(0, code.Length - ("," + Environment.NewLine).Length) +
            Environment.NewLine;
    return code;
}
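An alternative design avoids producing the trailing comma in the first place. Each child fragment is emitted without punctuation, and the fragments are joined with a separator. This is not the approach used in the example. ArgumentListSketch.FormatCall and its parameters are hypothetical names chosen for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Alternative sketch: instead of emitting a comma after every child and
// trimming the last one, join comma-free fragments with separators.
public static class ArgumentListSketch
{
    // callee is the text before the arguments, for example: new XElement("Root"
    public static string FormatCall(string callee, IEnumerable<string> argumentFragments)
    {
        string[] args = argumentFragments.ToArray();
        if (args.Length == 0)
            return callee + ")";
        string separator = "," + Environment.NewLine;
        // Commas appear only between arguments, so none precedes the ')'.
        return callee + separator +
            string.Join(separator, args) +
            Environment.NewLine + ")";
    }
}
```

One trade-off to consider: the Literal element injection shown earlier relies on the injected text supplying its own trailing comma. A join-based design would need to handle injected code differently.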

You can see the rest of the ToCode extension methods in Generating C# code from an XML Tree using Virtual Extension Methods.

One Final Note

Some readers will notice the similarity in functionality between the Paste XML as LINQ sample and the code presented in this post and the last one. The Paste XML as LINQ sample is imperative code that iterates through the nodes and writes out code as it goes. I need a completely different structure for my code. By writing it as a recursive functional transform, I can easily alter the transform as appropriate for special-purpose content controls that contain C#.


Generating C# code from an XML Tree using Virtual Extension Methods

One integral part of my scheme for building a document generation system is to write some code that generates C# code to create an arbitrary XML tree. I want to transform the markup for the main document part into C# code that will produce that main document part, with the exception that at various points where I find content controls, I want to alter the transformation as appropriate. The first task is to write code that produces C# code that will create any arbitrary XML tree.

This post is the fifth in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

To demonstrate what I mean by code that generates code, here is a small snippet that parses a string, creates an XML tree, and then prints the code that the XElementToCode method produces:

XElement e = XElement.Parse(
@"<Root xmlns='http://www.ericwhite.com'>
  <Child>This is a text node.</Child>
  <!--Here is a comment.-->
</Root>");
Console.WriteLine("var z = {0};", LtxToCode.XElementToCode(e));

This produces the following automatically written code:

var z = new XElement("{http://www.ericwhite.com}Root",
  new XAttribute("xmlns", @"http://www.ericwhite.com"),
  new XElement("{http://www.ericwhite.com}Child",
    new XText(@"This is a text node.")
  ),
  new XComment(@"Here is a comment.")
);

The code that I present in this post uses expanded XML names, which deserve a bit of an explanation.

Expanded XML Names

In LINQ to XML, an expanded name is a single string that specifies both a namespace and a local name. As you can see in the example above, the namespace is enclosed in curly braces and followed by the local name.

The normal idiom when working with names and namespaces in LINQ to XML is to declare and initialize an XNamespace object, and then use the overload of the ‘+’ operator to combine the namespace with a local name to create a fully qualified name:

XNamespace ew = "http://www.ericwhite.com";
XElement root = new XElement(ew + "Root");
Console.WriteLine(root);

This snippet is identical in functionality to the following example, which uses an expanded name:

XElement root = new XElement("{http://www.ericwhite.com}Root");
Console.WriteLine(root);

While the second approach is perhaps marginally slower than the first, it is far easier to generate code that uses it. If I used the first approach, I would need to set up a dictionary that maps namespace names to XNamespace variable names, and then generate code that uses the correct XNamespace variables. That is a fair amount of housekeeping, so instead, the generated code specifies fully qualified names by using expanded names.

One note about the LINQ to XML programming interface: When you call the ToString method on an XName object, the returned string is an expanded name. For instance, the following code prints the fully qualified name of an element:

XElement root = XElement.Parse("<Root xmlns='http://www.ericwhite.com'/>");
Console.WriteLine(root.Name);

This outputs the expanded name:

{http://www.ericwhite.com}Root
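The equivalence between the two idioms can be checked directly. In this sketch, ExpandedNames is a hypothetical class. One XName is built from an XNamespace plus a local name. The other is parsed from an expanded name with XName.Get. Because LINQ to XML atomizes XName objects, both calls return the same object.

```csharp
using System.Xml.Linq;

// Hypothetical helper showing the two ways to obtain a fully qualified name.
public static class ExpandedNames
{
    public static XName FromParts(string namespaceName, string localName)
    {
        XNamespace ns = namespaceName;   // implicit conversion from string
        return ns + localName;           // overloaded + operator
    }

    public static XName FromExpanded(string expandedName)
    {
        return XName.Get(expandedName);  // parses "{namespace}localName"
    }
}
```

Calling NamespaceName and LocalName on the result takes the expanded name apart again, and ToString reassembles it.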

About the Example

The code in the following example uses extension methods to implement a recursive transform from an XML tree to code that will create that XML tree. The example contains a few sample XML documents that it converts to code. It produces a C# program that you can compile and run. For each sample, the generated program instantiates two XML trees: one using the XElement.Parse method, and another using the C# code that is generated by the example. It then uses DeepEquals to validate that the two trees are identical.
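The validation step rests on XNode.DeepEquals, which compares two trees node by node. It compares element names, attributes and their values (namespace declarations count as attributes), and text, comment, and other nodes. In the following minimal sketch, RoundTripCheck is a hypothetical name. The second tree in the usage check is written by hand in the shape the generator produced for the earlier Root/Child example, not produced by running the generator.

```csharp
using System.Xml.Linq;

// Hypothetical helper: parse the XML, then compare it with a tree that was
// built in code (for example, by the generated C#).
public static class RoundTripCheck
{
    public static bool Matches(string xml, XElement builtInCode)
    {
        // LoadOptions.None (the default) drops insignificant white space,
        // which matches the generated code: it contains no white-space nodes.
        XElement parsed = XElement.Parse(xml);
        return XNode.DeepEquals(parsed, builtInCode);
    }
}
```

Including the xmlns attribute in the hand-built tree matters. Without it, the attribute lists differ and DeepEquals returns false even though the element names match.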

To run the example:

Create a new C# console application.

Copy and paste the following code into Program.cs.

Run the example.  This produces a new file, GeneratedTestProgram.cs.  You can examine the generated code for each XML tree in the generated program.

Next, we want to validate that the generated code actually produces the XML trees that it should. Create another C# console application, replace its Program.cs with GeneratedTestProgram.cs, and then run it to confirm that the generated code produces the correct XML trees.

This example simulates the use of virtual extension methods, which made the example very easy to write.  In the next post, I’ll explain virtual extension methods.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class LocalExtensions
{
    public static string StringConcatenate(this IEnumerable<string> source)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string item in source)
            sb.Append(item);
        return sb.ToString();
    }
}

public static class LtxToCode
{
    public static XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static XNamespace ew = "http://www.ericwhite.com/xmlcodegeneration";

    private static string Indentation(XObject xObject)
    {
        XAttribute attribute = xObject as XAttribute;
        if (attribute != null)
            return "".PadRight((attribute.Parent.Ancestors().Count() + 1 +
                (attribute.Parent.Document != null ? 1 : 0)) * 2, ' ');
        XElement element = xObject as XElement;
        if (element != null)
            return "".PadRight((element.Ancestors().Count() +
                (element.Document != null ? 1 : 0)) * 2, ' ');
        XDocument document = xObject as XDocument;
        if (document != null)
            return "";
        XProcessingInstruction pi = xObject as XProcessingInstruction;
        if (pi != null)
            return "".PadRight((pi.Ancestors().Count() +
                (pi.Document != null ? 1 : 0)) * 2, ' ');
        XNode node = xObject as XNode;
        if (node != null)
            if (node.Parent != null)
                return "".PadRight((node.Parent.Ancestors().Count() +
                    1 + (node.Document != null ? 1 : 0)) * 2, ' ');
            else
                return "";
        throw new CodeGenerationException("Internal error");
    }

    public static string ToCode(this XObject xObject)
    {
        XAttribute a = xObject as XAttribute;
        if (a != null)
            return a.ToCode();
        XElement element = xObject as XElement;
        if (element != null)
            return element.ToCode();
        XCData cdata = xObject as XCData;
        if (cdata != null)
            return cdata.ToCode();
        XText text = xObject as XText;
        if (text != null)
            return text.ToCode();
        XComment comment = xObject as XComment;
        if (comment != null)
            return comment.ToCode();
        XProcessingInstruction pi = xObject as XProcessingInstruction;
        if (pi != null)
            return pi.ToCode();
        throw new CodeGenerationException("Internal error");
    }

    public static string ToCode(this XDocument document)
    {
        var s = "new XDocument(" + Environment.NewLine +
            (document.Declaration != null ?
                String.Format("  new XDeclaration(\"{0}\", \"{1}\", \"{2}\")," +
                Environment.NewLine,
                document.Declaration.Version, document.Declaration.Encoding,
                document.Declaration.Standalone) :
                "") +
            TrimFinalComma(document
                .Nodes().Select(n => n.ToCode()).StringConcatenate()) +
            ")";
        return s;
    }

    public static string ToCode(this XElement element)
    {
        var c = element
            .Attributes()
            .Cast<XObject>()
            .Concat(element.Nodes().Cast<XObject>());
        if (element.Name == ew + "Literal")
            return element.Value;
        if (c.Count() == 0)
            return Indentation(element) +
                String.Format("new XElement(\"{0}\")," + Environment.NewLine,
                element.Name);
        else
            return Indentation(element) +
                String.Format("new XElement(\"{0}\"," + Environment.NewLine,
                    element.Name) +
                TrimFinalComma(c.Select(n => n.ToCode()).StringConcatenate()) +
                Indentation(element) + ")," + Environment.NewLine;
    }

    public static string ToCode(this XAttribute attribute)
    {
        return Indentation(attribute) +
            String.Format("new XAttribute(\"{0}\", @\"{1}\")," + Environment.NewLine,
            attribute.Name, attribute.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XText text)
    {
        return Indentation(text) +
            String.Format("new XText(@\"{0}\")," + Environment.NewLine, text.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XComment comment)
    {
        return Indentation(comment) +
            String.Format("new XComment(@\"{0}\")," + Environment.NewLine, comment.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XProcessingInstruction pi)
    {
        return Indentation(pi) +
            String.Format("new XProcessingInstruction(@\"{0}\", @\"{1}\")," +
                Environment.NewLine, pi.Target.Replace("\"", "\"\""), pi.Data.Replace("\"", "\"\""));
    }

    public static string ToCode(this XCData cdata)
    {
        return Indentation(cdata) +
            String.Format("new XCData(@\"{0}\")," + Environment.NewLine, cdata.Value.Replace("\"", "\"\""));
    }

    private static string TrimFinalComma(string code)
    {
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length) +
                Environment.NewLine;
        return code;
    }

    public static string XElementToCode(XElement element)
    {
        string code = element.ToCode();
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length);
        return code;
    }

    public static string XDocumentToCode(XDocument document)
    {
        string code = document.ToCode();
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length);
        return code;
    }

    public class CodeGenerationException : Exception
    {
        public CodeGenerationException(string message)
            : base(message)
        {
        }
    }
}

public class GenerateDocumentException : Exception
{
    public GenerateDocumentException(string message)
        : base(message)
    {
    }
}

class Program
{
    static string[] testXml = new[] {
@"<Root a=""1"" b=""2""/>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root>
  <Child> abc </Child>
  <Child xmlns:space=""preserve""> abc </Child>
</Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root><![CDATA[foo]]></Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root><Child/></Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root/>",

@"<Root xmlns=""http://www.ericwhite.com/aaaaa"">
  <Child xmlns=""http://www.ericwhite.com/child"">
    <Element att=""1""
             b:att2=""2""
             xmlns:b=""http://www.ericwhite.com/bbbbb"">abc</Element>
  </Child>
</Root>",

@"<a:Root xmlns:a=""http://www.ericwhite.com"">abc</a:Root>",

@"<a:Root xmlns:a=""http://www.ericwhite.com"">abc<!--a comment -->def</a:Root>",

@"<Root>abc</Root>",

@"<Root att1=""1"" att2=""2""/>",

@"<Root/>",

@"<Root att1=""1"">
  <Child>
    <Gc1>abc<b/>def</Gc1>
  </Child>
  <Child>
    <Gc2>abc</Gc2>
  </Child>
</Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>

<?mso-application progid=""Word.Document""?>

<!--This is a comment at the root level.  There are also white space nodes at the root level.-->

<pkg:package xmlns:pkg=""http://schemas.microsoft.com/office/2006/xmlPackage"">
  <pkg:part pkg:name=""/_rels/.rels""
            pkg:contentType=""application/vnd.openxmlformats-package.relationships+xml""
            pkg:padding=""512"">
    <pkg:xmlData>
      <Relationships
          xmlns=""http://schemas.openxmlformats.org/package/2006/relationships"">
        <Relationship Id=""rId3""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""docProps/app.xml""/>
        <Relationship Id=""rId2""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""docProps/core.xml""/>
        <Relationship Id=""rId1""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""word/document.xml""/>
      </Relationships>
    </pkg:xmlData>
  </pkg:part>
</pkg:package>",
};

    static void Main(string[] args)
    {
        string st1 = (@"using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
");
        StringBuilder sb = new StringBuilder();
        // test as XElement
        for (int i = 0; i < testXml.Length; i++)
        {
            sb.Append(
                String.Format("var xElementSourceTree{0} = XElement.Parse(@\"{1}\");",
                i, testXml[i].Replace("\"", "\"\"")));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format("var xElementCodeTree{0} = {1};",
                i, LtxToCode.XElementToCode(XElement.Parse(testXml[i]))));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "if (XNode.DeepEquals(xElementSourceTree{0}, xElementCodeTree{0}))", i));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XElement Test {0} Passed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append("else");
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XElement Test {0} Failed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append(Environment.NewLine);
        }
        // test as XDocument
        for (int i = 0; i < testXml.Length; i++)
        {
            sb.Append(String.Format(
                "var xDocumentSourceTree{0} = XDocument.Parse(@\"{1}\");",
                i, testXml[i].Replace("\"", "\"\"")));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format("var xDocumentCodeTree{0} = {1};",
                i, LtxToCode.XDocumentToCode(XDocument.Parse(testXml[i]))));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
              "if (XNode.DeepEquals(xDocumentSourceTree{0}, xDocumentCodeTree{0}))", i));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XDocument Test {0} Passed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append("else");
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XDocument Test {0} Failed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append(Environment.NewLine);
        }
        string st2 = @"        }
    }
}";
        string fullProgram = st1 + sb.ToString() + st2;
        File.WriteAllText("GeneratedTestProgram.cs", fullProgram);
    }
}
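
The harness above does not compare the trees itself. It writes GeneratedTestProgram.cs, which you then compile and run. Here is a rough, in-process illustration of what one of those generated test cases boils down to. It covers two inputs: the att1/att2 input from testXml, and a variant of the a:Root input without its text. The functional-construction expressions are written by hand to mirror what I would expect LtxToCode to emit, ignoring indentation; they are not its actual output. The class and method names are hypothetical.

```csharp
using System.Xml.Linq;

// Hypothetical in-process version of two of the generated test cases.
// The "code trees" are hand-written to mirror what LtxToCode should emit
// for these inputs (ignoring indentation); they are not its real output.
static class RoundTripSketch
{
    // testXml input: <Root att1="1" att2="2"/>
    public static bool PlainAttributes()
    {
        XElement source = XElement.Parse(@"<Root att1=""1"" att2=""2""/>");
        XElement code =
            new XElement("Root",
                new XAttribute("att1", @"1"),
                new XAttribute("att2", @"2"));
        return XNode.DeepEquals(source, code);
    }

    // A namespaced element that carries a prefix declaration.
    public static bool PrefixedNamespace()
    {
        XElement source = XElement.Parse(
            @"<a:Root xmlns:a=""http://www.ericwhite.com""/>");
        // XName.ToString() produces "{namespace}local", and the implicit
        // string-to-XName conversion parses that form, so names round-trip.
        // The xmlns:a declaration is an ordinary attribute in the
        // http://www.w3.org/2000/xmlns/ namespace.
        XElement code =
            new XElement("{http://www.ericwhite.com}Root",
                new XAttribute("{http://www.w3.org/2000/xmlns/}a",
                    @"http://www.ericwhite.com"));
        return XNode.DeepEquals(source, code);
    }
}
```

Comparing trees with XNode.DeepEquals, rather than comparing serialized strings, sidesteps differences in how the two trees would be serialized, such as attribute quote style.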

More Enhancements to the Document Template

The next step in building this document generation system is to define a few content controls that allow the template designer to write other necessary C# code for the document generation process.  As an example, the template designer needs to write code that executes when the document generation process starts, i.e. the code that will go in Main.  There is other code the template designer needs to write – details below.

This post is the fourth in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

For this particular approach to building a document generation system, I am planning on the following:

  • The document generation process will first process the document template and generate a new C# program.
  • After generating the new program, the document generation process will compile it and then run it. The running program then generates all documents for this document generation run.
  • There are, of course, pros and cons to this approach. I’ll mention some of them as I go along.

The generated code will be a combination of:

  • Code to create the Open XML SDK package, and to create an in-memory clone of the template document.
  • LINQ to XML code to update the contents of the main document part in the package. I’ll have more to say about this in the next post. I’m going to write a pure functional transformation from an XML tree to C# code that creates the tree. The generated code will use functional construction, not a DOM-style approach.
  • Code extracted from the template document and inserted at appropriate places so that it generates the proper WordprocessingML markup. The code that extracts the contents of the content controls and inserts that code in the appropriate places will be just a few lines.
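
To make the contrast in the second bullet concrete, here is a small sketch that builds the same WordprocessingML paragraph two ways. The first is DOM style: create a node, then attach children one at a time. The second is LINQ to XML functional construction, a single nested expression. The third method hints at why functional construction matters for code generation: because the tree is an expression, a query can stand in for any child. The w namespace is the real WordprocessingML namespace; the class and method names are illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class ConstructionStyles
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // DOM style: create each node, then attach it to its parent step by step.
    public static XElement DomStyle(string text)
    {
        XElement p = new XElement(w + "p");
        XElement r = new XElement(w + "r");
        XElement t = new XElement(w + "t");
        t.Value = text;
        r.Add(t);
        p.Add(r);
        return p;
    }

    // Functional construction: one nested expression whose shape mirrors the tree.
    public static XElement Functional(string text) =>
        new XElement(w + "p",
            new XElement(w + "r",
                new XElement(w + "t", text)));

    // Because construction is an expression, a query can stand in for any
    // child: here, one run per string in the sequence.
    public static XElement FromQuery(IEnumerable<string> texts) =>
        new XElement(w + "p",
            texts.Select(s =>
                new XElement(w + "r",
                    new XElement(w + "t", s))));
}
```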

This sounds more complicated than it is. In upcoming posts, I’ll walk through how the code works in detail.  Also, when completed, this example will be super-easy to run.  It will be educational to play around with code in the content controls to generate a wide variety of documents.

I have further refined the C# code that the template designer will write inside of content controls. In addition, I have added four more content controls that the template designer can place anywhere in the document. These content controls contain code that we need to make the process work; they don’t contain code directly associated with generating content.

The Generated Program Structure

Before discussing the code in the various content controls, I want to discuss the structure of the generated program. There will be a class named Generator, which will be generated in the code generation process. This class will have a couple of properties or fields, and will have one method (beyond the constructor). There will be one instance of this class for each generated document.

The code in the Value, Table, and Conditional content controls executes in the context of an instance of the Generator class. Therefore, the code can access instance fields and properties.
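
Here is a minimal sketch of what that means in practice. It assumes, for illustration only, that the generated GenerateDocument method splices the text of a Value content control (here, Cust.Name) directly into a functional-construction expression. That expression compiles because it sits inside an instance method of Generator. The sketch also creates a fresh Generator for each customer, one per generated document. The GeneratedBody field and GenerateAll method are hypothetical stand-ins for writing out a real DOCX.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

class Customer
{
    public int CustomerID;
    public string Name;
    public bool IsHighValueCustomer;
}

class Generator
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public Customer Cust;            // declared in the GeneratorMembers control
    public XElement GeneratedBody;   // hypothetical: stands in for the saved DOCX

    public void GenerateDocument()
    {
        GeneratedBody =
            new XElement(w + "body",
                new XElement(w + "p",
                    new XElement(w + "r",
                        // Spliced in from a Value content control. The expression
                        // compiles because this is an instance method of Generator.
                        new XElement(w + "t", Cust.Name))));
    }

    // One Generator instance for each generated document.
    public static List<XElement> GenerateAll(IEnumerable<Customer> customers)
    {
        var bodies = new List<XElement>();
        foreach (var customer in customers)
        {
            var g = new Generator { Cust = customer };
            g.GenerateDocument();
            bodies.Add(g.GeneratedBody);
        }
        return bodies;
    }
}
```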

New Version of the Template Document

First, I’ll examine the new version of the template document. The titles of the content controls are the same, but the code inside is different from the last post. As before, there are the Value content controls that contain values to be inserted into the document.  I agree with feedback that Svetlin provided – the value derived from this content control will use the formatting of the underlying run or paragraph.  No need to specify a style.

The following code accesses an instance property of the Generator class.  The instance property is Cust.

[Screenshot: code in the Value content control]

The Table content control looks about the same as in the last post, except that it is now written to access fields in the Generator object:

[Screenshot: the Table content control]

Same thing with Conditional – it can access the Cust instance property.

[Screenshot: the Conditional content control]
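
One property of LINQ to XML makes Conditional content controls straightforward to generate: functional construction silently ignores null content. A generated conditional can therefore be an ordinary conditional expression that yields either the content or null. Here is a small sketch, assuming the condition is Cust.IsHighValueCustomer (passed in as a bool here). The paragraph text and names are made up.

```csharp
using System.Xml.Linq;

static class ConditionalSketch
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static XElement Para(string text) =>
        new XElement(w + "p", new XElement(w + "r", new XElement(w + "t", text)));

    // The second paragraph is present only when the condition holds.
    // When the conditional expression yields null, the XElement constructor
    // skips it, so no special-case code is needed.
    public static XElement Body(bool isHighValueCustomer) =>
        new XElement(w + "body",
            Para("Dear customer,"),
            isHighValueCustomer ? Para("Thank you for being a valued customer.") : null);
}
```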

The design of the Ask content control is the same as in the last iteration of this template document:

[Screenshot: the Ask content control]

There are four new content controls in this template that enable the template designer to write the necessary code so that the document generation process can generate a C# program that is complete and can compile:

  • Using
  • Classes
  • GeneratorMembers
  • GeneratorMain
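
Whatever their purpose, the document generation process has to find these controls by title. In WordprocessingML markup, a content control is a w:sdt element. Its title is stored in the w:val attribute of w:sdtPr/w:alias, and its contents are in w:sdtContent. Here is a minimal LINQ to XML sketch that pulls the code out of a control by title. It assumes block-level controls whose code is plain text in w:t elements, one paragraph per line of code. GetCode is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ContentControlCode
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the code in the first content control whose title (w:alias)
    // matches, or null if there is no such control.
    public static string GetCode(XDocument mainDocumentPart, string title)
    {
        XElement sdt = mainDocumentPart
            .Descendants(w + "sdt")
            .FirstOrDefault(s => (string)s
                .Elements(w + "sdtPr")
                .Elements(w + "alias")
                .Attributes(w + "val")
                .FirstOrDefault() == title);
        if (sdt == null)
            return null;
        // Each paragraph's text becomes one line of code.
        var lines = sdt
            .Elements(w + "sdtContent")
            .Descendants(w + "p")
            .Select(p => string.Concat(p.Descendants(w + "t").Select(t => (string)t)));
        return string.Join(Environment.NewLine, lines);
    }
}
```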

The Using content control contains the using statements for the generated program. The generated program may be so simple that it only uses LINQ to XML, as in the example I’m working up. However, it could be more interesting – it might access an OData feed. It might connect to any arbitrary database or web service. It can connect to any data source that you can get to with .NET. Therefore, we need the ability to specify the using statements for the generated program:

[Screenshot: the Using content control]

The Classes content control enables the template designer to define some classes that contain the data that the program reads from the data source.  In this first example, there is a Customer class and an Order class:

[Screenshot: the Classes content control]

The GeneratorMembers content control enables the template designer to specify members of the Generator class.  In this example, it declares a single field, Cust, which contains the customer that a particular Generator object is generating a document for.

[Screenshot: the GeneratorMembers content control]

Last, but not least, there will be the GeneratorMain content control, which contains the code that will go in the Main method in the generated program:

[Screenshot: the GeneratorMain content control]

This GeneratorMain is expecting to read an XML document that looks like this:

[Screenshot: sample XML data document]
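
The screenshot of that XML document is not reproduced here. Judging only from the queries in the generated Main below, a Data.xml along the following lines should satisfy GeneratorMain. The root element name, the customer names, and all values are my assumptions; only the element names under Customers and Orders are implied by those queries. The helper methods use the same element navigation and explicit conversions against an inline copy.

```csharp
using System.Linq;
using System.Xml.Linq;

static class DataXmlSketch
{
    // Hypothetical sample data. Only the element names under Customers and
    // Orders are implied by the queries in the generated Main.
    public const string SampleData = @"<Data>
  <Customers>
    <Customer>
      <CustomerID>1</CustomerID>
      <Name>Andrew</Name>
      <IsHighValueCustomer>true</IsHighValueCustomer>
    </Customer>
    <Customer>
      <CustomerID>2</CustomerID>
      <Name>Bob</Name>
      <IsHighValueCustomer>false</IsHighValueCustomer>
    </Customer>
  </Customers>
  <Orders>
    <Order>
      <CustomerID>1</CustomerID>
      <ProductDescription>Widget</ProductDescription>
      <Quantity>3.5</Quantity>
      <OrderDate>2010-06-01</OrderDate>
    </Order>
  </Orders>
</Data>";

    // Same navigation and (bool) conversion as the Customer query in Main.
    public static int CountHighValueCustomers(XElement data) =>
        data.Elements("Customers").Elements("Customer")
            .Count(c => (bool)c.Element("IsHighValueCustomer"));

    // Same navigation and (decimal) conversion as the Order query in Main.
    public static decimal TotalQuantity(XElement data) =>
        data.Elements("Orders").Elements("Order")
            .Sum(o => (decimal)o.Element("Quantity"));
}
```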

And the generated program, after everything is said and done, will look something like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

namespace DocumentGenerator
{
    class Customer
    {
        public int CustomerID;
        public string Name;
        public bool IsHighValueCustomer;
    }

    class Order
    {
        public int CustomerID;
        public string ProductDescription;
        public decimal Quantity;
        public DateTime OrderDate;
    }

    class Generator
    {
        Customer Cust;

        void GenerateDocument()
        {
            // This method will be automatically generated during the generation process.
            // There will be a lot of code in this method.
            Console.WriteLine("Generating document for customer {0}", Cust.CustomerID);
        }

        static void Main(string[] args)
        {
            XElement data = XElement.Load("Data.xml");
            var customers = data
                .Elements("Customers")
                .Elements("Customer")
                .Select(c => new Customer() {
                    CustomerID = (int)c.Element("CustomerID"),
                    Name = (string)c.Element("Name"),
                    IsHighValueCustomer = (bool)c.Element("IsHighValueCustomer"),
                });
            var orders = data
                .Elements("Orders")
                .Elements("Order")
                .Select(o => new Order() {
                    CustomerID = (int)o.Element("CustomerID"),
                    ProductDescription = (string)o.Element("ProductDescription"),
                    Quantity = (decimal)o.Element("Quantity"),
                    OrderDate = (DateTime)o.Element("OrderDate"),
                });
            foreach (var customer in customers)
            {
                // One Generator instance for each generated document.
                Generator p = new Generator();
                p.Cust = customer;
                p.GenerateDocument();
            }
        }
    }
}
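
The listing above can be read as a fixed skeleton with the text of the Using, Classes, GeneratorMembers, and GeneratorMain content controls dropped into it. Here is a rough sketch of that assembly step. It assumes the code of each control has already been extracted as a string and that the body of GenerateDocument is produced separately by the XML-to-code transformation. Assemble and its parameters are hypothetical.

```csharp
using System.Text;

static class ProgramAssembler
{
    // Builds the generated program's source text from the contents of the
    // Using, Classes, GeneratorMembers, and GeneratorMain content controls.
    public static string Assemble(string usingCode, string classesCode,
        string generatorMembersCode, string generatorMainCode,
        string generateDocumentBody)
    {
        var sb = new StringBuilder();
        sb.AppendLine(usingCode);
        sb.AppendLine();
        sb.AppendLine("namespace DocumentGenerator");
        sb.AppendLine("{");
        sb.AppendLine(classesCode);
        sb.AppendLine("    class Generator");
        sb.AppendLine("    {");
        sb.AppendLine(generatorMembersCode);
        sb.AppendLine("        void GenerateDocument()");
        sb.AppendLine("        {");
        sb.AppendLine(generateDocumentBody);
        sb.AppendLine("        }");
        sb.AppendLine("        static void Main(string[] args)");
        sb.AppendLine("        {");
        sb.AppendLine(generatorMainCode);
        sb.AppendLine("        }");
        sb.AppendLine("    }");
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

The spliced text is not re-indented. The compiler does not care, although this makes the generated program harder to read.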

In the next blog post, I’m going to discuss generating C# code from an XML tree.  This code needs to use functional construction, so that the code generation process can insert queries at all appropriate points in the generated code.  That is going to be fun code to write!!!

The Second Iteration of the Template Document

After great feedback from Svetlin, and after some more contemplation about tables, this post presents the second iteration of a template document to be used in a document generation process.

This post is the third in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

One additional goal that I have for these document templates is that, if necessary, the template designer can specify formatting for a field or for a cell in a table. To facilitate this, I’m going to add the capability to specify the style in a separate nested content control.

In the following template, there are five content controls. The first is a value with a style. The second is a value that uses the style of the containing paragraph. The third generates a table from the query. The table is formatted with the table style of the sample table. The fourth shows conditional content. The last specifies that the user should be asked a question, the answer to which must be shorter than 256 characters.

I am certain that the design for this document template will be refined over the next couple of weeks.

Using a WordprocessingML Document as a Template in the Document Generation Process

In this post, I examine the approaches for building a template document for the document generation process.

This post is the second in a series of blog posts.  Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

In my approach to document generation, a template document is a DOCX document that contains content controls that will control the document generation process.  The document template designer can format this document as desired, and the document generation process will generate documents that have the format of the template document.

When working with content controls, first of all, remember that you need to turn on the Developer tab in the ribbon.  Click File => Options => Customize Ribbon, and then turn on the Developer tab:

Turning on the Developer Tab

Another point that will make it easier to work with content controls is to turn on design mode.  If design mode is turned off (which is the default), content controls have a square boxed appearance with a tab at the top that contains the title of the content control:

Content Control - not in Design Mode

This is not a problem, except that if the focus is not in a content control, there is no visual indication that the content control is there.  Instead, turn on design mode:

Turning on design mode

With design mode turned on, content controls have blue tags that indicate the beginning and end of the location of a content control.  With design mode turned on, a template document will look something like the following:

Sample template document with content controls

In this document, plain text content controls contain a LINQ query that returns a single value.  Formatting is easy – the value returned by the query takes on the formatting of the containing run or paragraph.
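
One way to get that behavior is sketched here for a run-level plain text content control. Take the run properties (w:rPr) of the placeholder run inside w:sdtContent, and replace the whole w:sdt with a new run that carries those properties and the value. Paragraph formatting comes along automatically, because the new run stays in the same paragraph. The sketch assumes the formatting of the control's first run is the one to keep; ReplaceWithValue is a hypothetical name.

```csharp
using System.Linq;
using System.Xml.Linq;

static class ValueControlSketch
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces a run-level w:sdt with a run that reuses the formatting (w:rPr)
    // of the first run inside the control.
    public static void ReplaceWithValue(XElement sdt, string value)
    {
        XElement firstRun = sdt.Elements(w + "sdtContent").Elements(w + "r").FirstOrDefault();
        XElement rPr = firstRun?.Element(w + "rPr");
        sdt.ReplaceWith(
            new XElement(w + "r",
                rPr,   // already parented, so LINQ to XML adds a copy; null is ignored
                new XElement(w + "t",
                    // Keep any leading or trailing spaces in the value.
                    new XAttribute(XNamespace.Xml + "space", "preserve"),
                    value)));
    }
}
```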

In this document, the rich text content control with Table as its title contains a LINQ query that returns a collection of anonymous types.  The results of the query will be inserted into the document as a WordprocessingML table.  The inserted table will have the formatting of the empty table that the designer placed inside the rich text content control.
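
Here is a sketch of that conversion under a few assumptions. The query results are read through their public properties with reflection. Each item becomes a w:tr and each property a w:tc. The w:tblPr and w:tblGrid of the empty template table are copied over so that the table style carries across. The sketch emits no header row. It also relies on GetProperties returning properties in declaration order, which .NET does not guarantee even though it is the usual behavior. ToTable is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class TableSketch
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a w:tbl from a sequence of objects (for example, anonymous types).
    // Table-level formatting is copied from the template table.
    public static XElement ToTable<T>(IEnumerable<T> rows, XElement templateTable)
    {
        var props = typeof(T).GetProperties();
        return new XElement(w + "tbl",
            templateTable.Element(w + "tblPr"),
            templateTable.Element(w + "tblGrid"),
            rows.Select(row =>
                new XElement(w + "tr",
                    props.Select(prop =>
                        new XElement(w + "tc",
                            // A table cell must contain at least one paragraph.
                            new XElement(w + "p",
                                new XElement(w + "r",
                                    new XElement(w + "t",
                                        (prop.GetValue(row, null) ?? "").ToString()))))))));
    }
}
```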

Other uses of the word ‘Template’ in Microsoft Office

One minor issue around the idea of creating a template WordprocessingML document is that the term ‘template’ is overloaded.  Microsoft Word has the notion of ‘Document Templates’, which are saved with the dotx extension.  These are WordprocessingML documents with one special characteristic – when the user opens one of these documents, the user cannot directly save back to the dotx file – the user must instead supply a new filename, and Word will append docx as the extension.
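
At the package level, the difference between the two file types shows up in the content type of the main document part, as recorded in [Content_Types].xml. A dotx uses application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml, and a docx uses application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml. Here is a small sketch that checks this. It assumes you have already read [Content_Types].xml out of the package and that the main part has its conventional name, /word/document.xml. Strictly speaking, the package relationships identify the main part. IsDotx is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class TemplateCheck
{
    static readonly XNamespace ct =
        "http://schemas.openxmlformats.org/package/2006/content-types";

    const string TemplateMain =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml";

    // True if [Content_Types].xml declares /word/document.xml as a template
    // main part. Part names are compared case-insensitively, as in OPC.
    public static bool IsDotx(XDocument contentTypes) =>
        contentTypes.Root
            .Elements(ct + "Override")
            .Any(o => string.Equals((string)o.Attribute("PartName"), "/word/document.xml",
                          StringComparison.OrdinalIgnoreCase)
                   && (string)o.Attribute("ContentType") == TemplateMain);
}
```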

In addition, related to dotx document templates are ‘document template projects’ in Visual Studio 2010 (and 2008).  These are template-based document-level projects (see Architecture of Document-Level Customizations) that consist of managed code that is attached to a document template instead of a document.  The user opens the template, uses the managed customization to do whatever it does, and then saves as a docx document.  The docx document can have a managed customization, or it can be stripped of the customization, leaving a plain old docx.

For this document generation project, we don’t need to use either of these facilities.  Instead, the template document that the designer creates is, as far as Word is concerned, an ordinary word-processing document.
