Archive for Document Generation Series

Refinement: Generating C# code from an XML Tree using Virtual Extension Methods

February 18, 2011 at 1:55 pm · Filed under Document Generation Series, Open XML, WordprocessingML

I’ve made great progress on this example, which generates WordprocessingML documents from a template driven by C# code written inside content controls. It took me longer than I thought it would, but I’ve been mostly on vacation this past week, so I had to fit coding in around lots of other activities. Also, there is Hofstadter’s law to take into account.

This post is the seventh in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Along the way, I determined that I needed to refine the approach for generating C# code from an XML tree. There were two refinements:

I needed to use the @”” syntax for strings (called verbatim string literals). This handles cases such as insignificant white space that contains new lines. My document generation example doesn’t need this functionality, but I dislike incomplete solutions.
I added an approach for injecting arbitrary code into the code that creates the XML tree. I need to inject the code that the developer types into content controls into the code that creates the XML tree. My approach is that if the transform encounters an element with the fully qualified name of {http://www.ericwhite.com/xmlcodegeneration}Literal then the text contents of that element are directly injected into the code to create the XML tree.
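As a minimal sketch of the first refinement, the following shows how a value can be wrapped in a verbatim string literal. The class and method names (VerbatimLiteralSketch, ToVerbatimLiteral) are made up for this illustration; the full listing in the earlier post does the same escaping inline with Replace.

```csharp
using System;

public static class VerbatimLiteralSketch
{
    // Hypothetical helper: wraps a value in a C# verbatim string literal.
    // Inside @"..." the only character that needs escaping is the double
    // quote, which is written twice. New lines can appear as-is.
    public static string ToVerbatimLiteral(string value)
    {
        return "@\"" + value.Replace("\"", "\"\"") + "\"";
    }

    public static void Main()
    {
        string text = "He said \"hi\"" + Environment.NewLine + "  and left";
        Console.WriteLine(ToVerbatimLiteral(text));
    }
}
```

Because a verbatim literal may span lines, white space nodes that contain new lines need no special handling.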

The following code shows what I mean. The code creates an XML tree that contains an element with the special namespace/name:

XNamespace ewx = "http://www.ericwhite.com/xmlcodegeneration";
XDocument root = new XDocument(
    new XElement("Root",
        new XElement("Child", 123),
        new XElement(ewx + "Literal",
            @"new XElement(""Data"", 123), // injected code" + Environment.NewLine),
        new XElement("Child", 345)));
Console.WriteLine(LtxToCode.XDocumentToCode(root));

When you run this code, it generates the following code that contains directly injected code:

new XDocument(
  new XElement("Root",
    new XElement("Child",
      new XText(@"123")
    ),
new XElement("Data", 123), // injected code
    new XElement("Child",
      new XText(@"345")
    )
  )
)

To avoid having two versions of the code posted on my blog, I’ve altered the previous post to contain the corrected code. In the next post, I’ll discuss pros and cons of text templates.


Simulating Virtual Extension Methods

February 11, 2011 at 3:49 pm · Filed under Document Generation Series, Functional Programming, Open XML, WordprocessingML

When considering the problem of how to generate code that will create some arbitrary XML tree, it is interesting to examine the LINQ to XML class hierarchy, which uses polymorphism. The XObject class is an abstract base class of both the XAttribute and XNode classes. The XNode class is an abstract base class of XContainer, XComment, XDocumentType, XProcessingInstruction, and XText. XContainer is the base class for XElement and XDocument.

This post is the sixth in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

The following diagram shows the class hierarchy of the LINQ to XML classes that are important when considering serializing a LINQ to XML tree as code. There are other classes in LINQ to XML, but they don’t impact this discussion.
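The same relationships can be checked with reflection. This is only a sketch: HierarchySketch and BaseChain are names invented for this illustration, and the code simply walks Type.BaseType for a few LINQ to XML classes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class HierarchySketch
{
    // Returns the chain of base types for a class, stopping before System.Object.
    public static string BaseChain(Type type)
    {
        var names = new List<string>();
        for (Type t = type; t != null && t != typeof(object); t = t.BaseType)
            names.Add(t.Name);
        return String.Join(" -> ", names);
    }

    public static void Main()
    {
        Type[] types =
        {
            typeof(XAttribute), typeof(XElement), typeof(XDocument),
            typeof(XText), typeof(XCData), typeof(XComment),
            typeof(XProcessingInstruction)
        };
        foreach (Type t in types)
            Console.WriteLine(BaseChain(t));
    }
}
```

Note that XCData derives from XText, which matters for the order of type checks when dispatching on the actual type of a node.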

Software systems such as LINQ to XML are typically written using a recursive approach. The XNode class declares an abstract method named WriteTo (an abstract method is implicitly virtual). Each concrete derived class must override the WriteTo method and provide its own implementation for serializing. When serializing an XElement object by calling WriteTo, internally, LINQ to XML calls the WriteTo method for each of the child nodes of the XElement object. This is the approach that we want to use when generating code to create an arbitrary XML tree.

What would be ideal is to write ‘Virtual Extension Methods’. There would be an abstract virtual extension method for XObject named ToCode. This extension method would be overridden in each of the concrete classes. Classes that include child objects, such as XElement and XDocument can then iterate through their children and use the virtual extension method to convert each node in the XML tree to the appropriate C# code to create the node.

However, C# does not include support for the idea of a virtual extension method. When you call an extension method, it is bound at compile time to an extension method based on the type of the variable, not based on the type of the object in the variable. To show exactly what I mean, consider the following snippet:
using System;
using System.Xml.Linq;

public static class MyExtensions
{
    public static string ToCode(this XObject o)
    {
        return "Called extension method on XObject";
    }

    public static string ToCode(this XElement e)
    {
        return "Called extension method on XElement";
    }
}

class Program
{
    static void Main(string[] args)
    {
        XElement e = new XElement("Root");
        Console.WriteLine(e.ToCode());   // variable type is XElement
        XObject o = e;
        Console.WriteLine(o.ToCode());   // variable type is XObject
    }
}

The above snippet calls the ToCode extension method twice for the same object. However, even though the type of the object is XElement, when the object is assigned to a variable with type XObject (which is valid because XElement is derived from XNode, which derives from XObject), and the program calls the ToCode extension method, the extension method on XElement is not called. Instead, the extension method on XObject is called.

Because the LINQ to XML programming interface is fairly simple, we can simulate virtual extension methods by implementing an extension method on XObject that simply dispatches to the appropriate extension method based on the actual type of the object. The following listing shows the implementation of the ToCode extension method for XObject.
public static string ToCode(this XObject xObject)
{
    XAttribute a = xObject as XAttribute;
    if (a != null)
        return a.ToCode();
    XElement element = xObject as XElement;
    if (element != null)
        return element.ToCode();
    XCData cdata = xObject as XCData;
    if (cdata != null)
        return cdata.ToCode();
    XText text = xObject as XText;
    if (text != null)
        return text.ToCode();
    XComment comment = xObject as XComment;
    if (comment != null)
        return comment.ToCode();
    XProcessingInstruction pi = xObject as XProcessingInstruction;
    if (pi != null)
        return pi.ToCode();
    throw new CodeGenerationException("Internal error");
}
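For comparison, here is a sketch of the same dispatch idea written with type patterns in a switch statement. This assumes a C# 7 or later compiler, which was not available when this post was written, so it is an alternative rather than the code used in this series. The names DispatchSketch, Dispatch, and Describe are hypothetical stand-ins for the real ToCode methods.

```csharp
using System;
using System.Xml.Linq;

public static class DispatchSketch
{
    // Hypothetical stand-ins for the per-type ToCode methods.
    static string Describe(XAttribute a) { return "attribute " + a.Name; }
    static string Describe(XElement e) { return "element " + e.Name; }
    static string Describe(XCData c) { return "cdata"; }
    static string Describe(XText t) { return "text"; }

    // Dispatches on the actual type of the object, not the type of the variable.
    // XCData derives from XText, so the XCData case must come first.
    public static string Dispatch(this XObject xObject)
    {
        switch (xObject)
        {
            case XAttribute a: return Describe(a);
            case XElement e: return Describe(e);
            case XCData c: return Describe(c);
            case XText t: return Describe(t);
            default: throw new NotSupportedException("Unhandled node type");
        }
    }

    public static void Main()
    {
        XObject o = new XElement("Root");
        Console.WriteLine(o.Dispatch());
        XObject d = new XCData("x");
        Console.WriteLine(d.Dispatch());
    }
}
```

Either way, the important point is the same: the dispatch happens at run time, inside one method, because the compiler binds extension methods by the static type of the variable.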

Next, we can examine the ToCode method that is implemented on XAttribute. The code calls an Indentation method that determines the number of spaces to precede the ‘new XAttribute’. This enables the code to be properly indented, so that it is easy to examine the generated code. Other than this, it is pretty straightforward code to generate the code to new up an XAttribute object.
public static string ToCode(this XAttribute attribute)
{
    return Indentation(attribute) +
        String.Format("new XAttribute(\"{0}\", \"{1}\")," + Environment.NewLine,
        attribute.Name, attribute.Value);
}

As initially written, the generated code to create an XAttribute or XElement object includes appending the comma before the new line. One minor issue that must be solved is that when passing a number of parameters to a function that takes a params array, a comma immediately before the closing parenthesis is invalid.

The approach I took to solve this problem is that after assembling all of the code to create the child nodes of an XElement object, the code calls a method to trim off the final comma.
public static string ToCode(this XElement element)
{
    var c = element
        .Attributes()
        .Cast<XObject>()
        .Concat(element.Nodes().Cast<XObject>());
    if (c.Count() == 0)
        return Indentation(element) +
            String.Format("new XElement(\"{0}\")," + Environment.NewLine,
            element.Name);
    else
        return Indentation(element) +
            String.Format("new XElement(\"{0}\"," + Environment.NewLine,
                element.Name) +
            TrimFinalComma(c.Select(n => n.ToCode()).StringConcatenate()) +
            Indentation(element) + ")," + Environment.NewLine;
}

TrimFinalComma looks like this:
private static string TrimFinalComma(string code)
{
    if (code.EndsWith("," + Environment.NewLine))
        return code.Substring(0, code.Length - ("," + Environment.NewLine).Length) +
            Environment.NewLine;
    return code;
}
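To see the effect in isolation, here is a small sketch that applies the same trimming logic to a hand-written block of child code. The TrimSketch class and the sample strings are made up for this demonstration, and the method is public only so that it can be called from outside the class.

```csharp
using System;

public static class TrimSketch
{
    // Same logic as TrimFinalComma above: drop the comma from a trailing
    // "," + new line, but keep the new line itself.
    public static string TrimFinalComma(string code)
    {
        string suffix = "," + Environment.NewLine;
        if (code.EndsWith(suffix, StringComparison.Ordinal))
            return code.Substring(0, code.Length - suffix.Length) + Environment.NewLine;
        return code;
    }

    public static void Main()
    {
        // Each child ends with a comma and a new line, as the ToCode methods emit them.
        string children =
            "  new XAttribute(\"a\", \"1\")," + Environment.NewLine +
            "  new XText(@\"abc\")," + Environment.NewLine;

        // Only the comma after the last child is removed.
        Console.WriteLine("new XElement(\"Root\"," + Environment.NewLine +
            TrimFinalComma(children) + ")");
    }
}
```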

You can see the rest of the ToCode extension methods in Generating C# code from an XML Tree using Virtual Extension Methods.

One Final Note

Some people will recognize the similarity in functionality between the Paste XML as LINQ sample and the code presented in this and the last post. The code presented in the Paste XML as LINQ sample is imperative code that iterates through the nodes outputting code. I need a completely different structure for my code. By coding as a recursive functional transform, I can easily alter the transform as appropriate for special purpose content controls that contain C#.


Generating C# code from an XML Tree using Virtual Extension Methods

February 7, 2011 at 8:38 pm · Filed under Document Generation Series, Open XML, WordprocessingML

One integral part of my scheme for building a document generation system is to write some code that generates C# code to create an arbitrary XML tree. I want to transform the markup for the main document part into C# code that will produce that main document part, with the exception that at various points where I find content controls, I want to alter the transformation as appropriate. The first task is to write code that produces C# code that will create any arbitrary XML tree.

This post is the fifth in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

To demonstrate what I mean by code that generates code, here is a small snippet that parses a string, creates an XML tree, and then prints the code that the XElementToCode method produces:
XElement e = XElement.Parse(
@"<Root xmlns='http://www.ericwhite.com'>
  <Child>This is a text node.</Child>
  <!--Here is a comment.-->
</Root>");
Console.WriteLine("var z = {0};", LtxToCode.XElementToCode(e));

This produces the following automatically written code:
var z = new XElement("{http://www.ericwhite.com}Root",
  new XAttribute("xmlns", @"http://www.ericwhite.com"),
  new XElement("{http://www.ericwhite.com}Child",
    new XText(@"This is a text node.")
  ),
  new XComment(@"Here is a comment.")
);

The code that I present in this post uses expanded XML names, which deserve a bit of an explanation.

Expanded XML Names

In LINQ to XML, an expanded name is an approach that enables specification of a namespace and local name in a single string. The gist of it (which you can see in the example above) is that the namespace is enclosed in curly braces, followed by the local name.

The normal idiom when working with names and namespaces in LINQ to XML is to declare and initialize an XNamespace object, and then use the overload of the ‘+’ operator to combine the namespace with a local name to create a fully qualified name:
XNamespace ew = "http://www.ericwhite.com";
XElement root = new XElement(ew + "Root");
Console.WriteLine(root);

This snippet is identical in functionality to the following example, which uses an expanded name:
XElement root = new XElement("{http://www.ericwhite.com}Root");
Console.WriteLine(root);

While the second approach is perhaps marginally slower than the first approach, it is far easier to generate code that uses the second approach. If I used the first approach, I would need to set up a dictionary that maps namespace names to XNamespace object names, and then appropriately generate code that uses the correct XNamespace objects. It is a fair amount of housekeeping. So instead of using that approach, the generated code specifies fully qualified names using expanded names.
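A quick way to convince yourself that the two forms name the same thing is to compare the resulting XName objects. In this sketch, ExpandedNameSketch and NamesMatch are names invented for the illustration; the conversions and properties used are part of LINQ to XML.

```csharp
using System;
using System.Xml.Linq;

public static class ExpandedNameSketch
{
    // Builds a name both ways and compares the results.
    public static bool NamesMatch(string namespaceName, string localName)
    {
        XNamespace ns = namespaceName;
        XName viaNamespace = ns + localName;
        XName viaExpandedName = "{" + namespaceName + "}" + localName; // implicit string to XName
        return viaNamespace == viaExpandedName;
    }

    public static void Main()
    {
        Console.WriteLine(NamesMatch("http://www.ericwhite.com", "Root"));

        XName name = "{http://www.ericwhite.com}Root";
        Console.WriteLine(name.NamespaceName);
        Console.WriteLine(name.LocalName);
    }
}
```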

One note about the LINQ to XML programming interface: When you call the ToString method on an XName object, the returned string is an expanded name. For instance, the following code prints the fully qualified name of an element:
XElement root = XElement.Parse("<Root xmlns='http://www.ericwhite.com'/>");
Console.WriteLine(root.Name);

This outputs the expanded name:
{http://www.ericwhite.com}Root

About the Example

The code in the following example uses extension methods to implement a recursive transform from the XML tree to code that will create the XML tree. The example contains a few sample XML documents that it converts to code. It produces a C# program that you can compile and run. For each sample, the generated program instantiates two XML trees, one using the XElement.Parse method and another using the C# code that is generated by the example. It then uses XNode.DeepEquals to validate that the two trees are identical.
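The validation step amounts to something like the following sketch. The sample XML and the names DeepEqualsSketch and RoundTrips are made up for this illustration; the constructed tree is written by hand here in the style of the generated code.

```csharp
using System;
using System.Xml.Linq;

public static class DeepEqualsSketch
{
    // Compares a parsed tree with a tree created through functional construction.
    public static bool RoundTrips()
    {
        XElement parsed = XElement.Parse("<Root a=\"1\"><Child>abc</Child></Root>");
        XElement built =
            new XElement("Root",
                new XAttribute("a", @"1"),
                new XElement("Child",
                    new XText(@"abc")
                )
            );
        return XNode.DeepEquals(parsed, built);
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrips() ? "Test Passed" : "Test Failed");
    }
}
```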

To run the example:

Create a new C# console application.

Copy and paste the following code into Program.cs.

Run the example. This produces a new file, GeneratedTestProgram.cs. You can examine the generated code for each XML tree in the generated program.

Next, we want to validate that the generated code actually generates the XML tree that it should. Create a new C# console application, replace Program.cs in the new program with the generated program, and then run it to validate that the generated code produced the correct XML tree.

This example simulates the use of virtual extension methods, which made the example very easy to write. In the next post, I’ll explain virtual extension methods.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class LocalExtensions
{
    public static string StringConcatenate(this IEnumerable<string> source)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string item in source)
            sb.Append(item);
        return sb.ToString();
    }
}

public static class LtxToCode
{
    public static XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static XNamespace ew = "http://www.ericwhite.com/xmlcodegeneration";

    private static string Indentation(XObject xObject)
    {
        XAttribute attribute = xObject as XAttribute;
        if (attribute != null)
            return "".PadRight((attribute.Parent.Ancestors().Count() + 1 +
                (attribute.Parent.Document != null ? 1 : 0)) * 2, ' ');
        XElement element = xObject as XElement;
        if (element != null)
            return "".PadRight((element.Ancestors().Count() +
                (element.Document != null ? 1 : 0)) * 2, ' ');
        XDocument document = xObject as XDocument;
        if (document != null)
            return "";
        XProcessingInstruction pi = xObject as XProcessingInstruction;
        if (pi != null)
            return "".PadRight((pi.Ancestors().Count() +
                (pi.Document != null ? 1 : 0)) * 2, ' ');
        XNode node = xObject as XNode;
        if (node != null)
            if (node.Parent != null)
                return "".PadRight((node.Parent.Ancestors().Count() +
                    1 + (node.Document != null ? 1 : 0)) * 2, ' ');
            else
                return "";
        throw new CodeGenerationException("Internal error");
    }

    public static string ToCode(this XObject xObject)
    {
        XAttribute a = xObject as XAttribute;
        if (a != null)
            return a.ToCode();
        XElement element = xObject as XElement;
        if (element != null)
            return element.ToCode();
        XCData cdata = xObject as XCData;
        if (cdata != null)
            return cdata.ToCode();
        XText text = xObject as XText;
        if (text != null)
            return text.ToCode();
        XComment comment = xObject as XComment;
        if (comment != null)
            return comment.ToCode();
        XProcessingInstruction pi = xObject as XProcessingInstruction;
        if (pi != null)
            return pi.ToCode();
        throw new CodeGenerationException("Internal error");
    }

    public static string ToCode(this XDocument document)
    {
        var s = "new XDocument(" + Environment.NewLine +
            (document.Declaration != null ?
                String.Format("  new XDeclaration(\"{0}\", \"{1}\", \"{2}\")," +
                Environment.NewLine,
                document.Declaration.Version, document.Declaration.Encoding,
                document.Declaration.Standalone) :
                "") +
            TrimFinalComma(document
                .Nodes().Select(n => n.ToCode()).StringConcatenate()) +
            ")";
        return s;
    }

    public static string ToCode(this XElement element)
    {
        var c = element
            .Attributes()
            .Cast<XObject>()
            .Concat(element.Nodes().Cast<XObject>());
        if (element.Name == ew + "Literal")
            return element.Value;
        if (c.Count() == 0)
            return Indentation(element) +
                String.Format("new XElement(\"{0}\")," + Environment.NewLine,
                element.Name);
        else
            return Indentation(element) +
                String.Format("new XElement(\"{0}\"," + Environment.NewLine,
                    element.Name) +
                TrimFinalComma(c.Select(n => n.ToCode()).StringConcatenate()) +
                Indentation(element) + ")," + Environment.NewLine;
    }

    public static string ToCode(this XAttribute attribute)
    {
        return Indentation(attribute) +
            String.Format("new XAttribute(\"{0}\", @\"{1}\")," + Environment.NewLine,
            attribute.Name, attribute.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XText text)
    {
        return Indentation(text) +
            String.Format("new XText(@\"{0}\")," + Environment.NewLine, text.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XComment comment)
    {
        return Indentation(comment) +
            String.Format("new XComment(@\"{0}\")," + Environment.NewLine, comment.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XProcessingInstruction pi)
    {
        return Indentation(pi) +
            String.Format("new XProcessingInstruction(@\"{0}\", @\"{1}\")," +
                Environment.NewLine, pi.Target.Replace("\"", "\"\""), pi.Data.Replace("\"", "\"\""));
    }

    public static string ToCode(this XCData cdata)
    {
        return Indentation(cdata) +
            String.Format("new XCData(@\"{0}\")," + Environment.NewLine, cdata.Value.Replace("\"", "\"\""));
    }

    private static string TrimFinalComma(string code)
    {
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length) +
                Environment.NewLine;
        return code;
    }

    public static string XElementToCode(XElement element)
    {
        string code = element.ToCode();
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length);
        return code;
    }

    public static string XDocumentToCode(XDocument document)
    {
        string code = document.ToCode();
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length);
        return code;
    }

    public class CodeGenerationException : Exception
    {
        public CodeGenerationException(string message)
            : base(message)
        {
        }
    }
}

public class GenerateDocumentException : Exception
{
    public GenerateDocumentException(string message)
        : base(message)
    {
    }
}

class Program
{
    static string[] testXml = new[] {
@"<Root a=""1"" b=""2""/>",
@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root>
  <Child> abc </Child>
  <Child xmlns:space=""preserve""> abc </Child>
</Root>",
@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root><![CDATA[foo]]></Root>",
@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root><Child/></Root>",
@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root/>",
@"<Root xmlns=""http://www.ericwhite.com/aaaaa"">
  <Child xmlns=""http://www.ericwhite.com/child"">
    <Element att=""1""
             b:att2=""2""
             xmlns:b=""http://www.ericwhite.com/bbbbb"">abc</Element>
  </Child>
</Root>",
@"<a:Root xmlns:a=""http://www.ericwhite.com"">abc</a:Root>",
@"<a:Root xmlns:a=""http://www.ericwhite.com"">abc<!--a comment -->def</a:Root>",
@"<Root>abc</Root>",
@"<Root att1=""1"" att2=""2""/>",
@"<Root/>",
@"<Root att1=""1"">
  <Child>
    <Gc1>abc<b/>def</Gc1>
  </Child>
  <Child>
    <Gc2>abc</Gc2>
  </Child>
</Root>",
@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<?mso-application progid=""Word.Document""?>
<!--This is a comment at the root level.  There are also white space nodes at the root level.-->
<pkg:package xmlns:pkg=""http://schemas.microsoft.com/office/2006/xmlPackage"">
  <pkg:part pkg:name=""/_rels/.rels""
            pkg:contentType=""application/vnd.openxmlformats-package.relationships+xml""
            pkg:padding=""512"">
    <pkg:xmlData>
      <Relationships
          xmlns=""http://schemas.openxmlformats.org/package/2006/relationships"">
        <Relationship Id=""rId3""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""docProps/app.xml""/>
        <Relationship Id=""rId2""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""docProps/core.xml""/>
        <Relationship Id=""rId1""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""word/document.xml""/>
      </Relationships>
    </pkg:xmlData>
  </pkg:part>
</pkg:package>",
};

    static void Main(string[] args)
    {
        string st1 = (@"using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
");
        StringBuilder sb = new StringBuilder();

        // test as XElement
        for (int i = 0; i < testXml.Length; i++)
        {
            sb.Append(
                String.Format("var xElementSourceTree{0} = XElement.Parse(@\"{1}\");",
                i, testXml[i].Replace("\"", "\"\"")));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format("var xElementCodeTree{0} = {1};", i,
                LtxToCode.XElementToCode(XElement.Parse(testXml[i]))));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "if (XNode.DeepEquals(xElementSourceTree{0}, xElementCodeTree{0}))", i));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XElement Test {0} Passed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append("else");
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XElement Test {0} Failed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append(Environment.NewLine);
        }

        // test as XDocument
        for (int i = 0; i < testXml.Length; i++)
        {
            sb.Append(String.Format(
                "var xDocumentSourceTree{0} = XDocument.Parse(@\"{1}\");",
                i, testXml[i].Replace("\"", "\"\"")));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format("var xDocumentCodeTree{0} = {1};", i,
                LtxToCode.XDocumentToCode(XDocument.Parse(testXml[i]))));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "if (XNode.DeepEquals(xDocumentSourceTree{0}, xDocumentCodeTree{0}))", i));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XDocument Test {0} Passed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append("else");
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XDocument Test {0} Failed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append(Environment.NewLine);
        }

        string st2 = @"
        }
    }
}";
        string fullProgram = st1 + sb.ToString() + st2;
        File.WriteAllText("GeneratedTestProgram.cs", fullProgram);
    }
}


The Generated Program Structure

Before discussing the code in the various content controls, I want to discuss the structure of the generated program. The code generation process will generate a class named Generator. This class will have a couple of properties or fields, and will have one method (beyond the constructor). There will be one instance of this class for each generated document.

The code in the Value, Table, and Conditional content controls executes in the context of an instance of the Generator class. Therefore, the code can access instance fields and properties.

New Version of the Template Document

First, I’ll examine the new version of the template document. The titles of the content controls are the same, but the code inside is different from the last post. As before, there are the Value content controls that contain values to be inserted into the document. I agree with feedback that Svetlin provided – the value derived from this content control will use the formatting of the underlying run or paragraph. No need to specify a style.

The following code accesses an instance property of the Generator class. The instance property is Cust.

The Table content control looks about the same as in the last post, except that it is now written to access fields in the Generator object:

Same thing with Conditional – it can access the Cust instance property.

The design of the Ask content control is the same as in the last iteration of this template document:

There are four new content controls in this template that enable the template designer to write the necessary code so that the document generation process can generate a C# program that is complete and can compile:

Using
Classes
GeneratorMembers
GeneratorMain

The Using content control contains the using statements for the generated program. The generated program may be so simple that it only uses LINQ to XML, as in the example I’m working up. However, it could be more interesting – it might access an OData feed. It might connect to any arbitrary database or web service. It can connect to any data source that you can get to with .NET. Therefore, we need the ability to specify the using statements for the generated program:

The Classes content control enables the template designer to define some classes that contain the data that the program reads from the data source. In this first example, there is a Customer class and an Order class:

The GeneratorMembers content control enables the template designer to specify members of the Generator class. This field contains the customer that a particular Generator object is generating a document for.

Last, but not least, there will be the GeneratorMain content control, which contains the code that will go in the Main method in the generated program:

This GeneratorMain is expecting to read an XML document that looks like this:
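As a rough sketch of the expected shape, the element names below are inferred from the queries in the Main method of the generated program listing. The root element name (Data) and all of the values are assumptions made for this illustration, as are the names DataShapeSketch and SampleData.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DataShapeSketch
{
    // Hypothetical sample data. Only the names of the Customers, Customer,
    // Orders, and Order elements and of their children come from the queries;
    // the root name and the values are invented.
    public static XElement SampleData()
    {
        return XElement.Parse(
@"<Data>
  <Customers>
    <Customer>
      <CustomerID>1</CustomerID>
      <Name>Andrew</Name>
      <IsHighValueCustomer>true</IsHighValueCustomer>
    </Customer>
  </Customers>
  <Orders>
    <Order>
      <CustomerID>1</CustomerID>
      <ProductDescription>Bike</ProductDescription>
      <Quantity>2</Quantity>
      <OrderDate>2011-01-28</OrderDate>
    </Order>
  </Orders>
</Data>");
    }

    public static void Main()
    {
        XElement data = SampleData();

        // The same explicit conversions that the generated Main method uses.
        foreach (XElement c in data.Elements("Customers").Elements("Customer"))
            Console.WriteLine("{0}: {1}, high value: {2}",
                (int)c.Element("CustomerID"),
                (string)c.Element("Name"),
                (bool)c.Element("IsHighValueCustomer"));

        foreach (XElement o in data.Elements("Orders").Elements("Order"))
            Console.WriteLine("{0} x {1} on {2:d}",
                (decimal)o.Element("Quantity"),
                (string)o.Element("ProductDescription"),
                (DateTime)o.Element("OrderDate"));
    }
}
```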

And the generated program, after everything is said and done, will look something like this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

namespace DocumentGenerator
{
    class Customer
    {
        public int CustomerID;
        public string Name;
        public bool IsHighValueCustomer;
    }

    class Order
    {
        public int CustomerID;
        public string ProductDescription;
        public decimal Quantity;
        public DateTime OrderDate;
    }

    class Generator
    {
        Customer Cust;

        void GenerateDocument()
        {
            // This method will be automatically generated during the generation process.
            // There will be a lot of code in this method.
            Console.WriteLine("Generating document for customer {0}", Cust.CustomerID);
        }

        static void Main(string[] args)
        {
            XElement data = XElement.Load("Data.xml");
            var customers = data
                .Elements("Customers")
                .Elements("Customer")
                .Select(c => new Customer()
                {
                    CustomerID = (int)c.Element("CustomerID"),
                    Name = (string)c.Element("Name"),
                    IsHighValueCustomer = (bool)c.Element("IsHighValueCustomer"),
                });
            var orders = data
                .Elements("Orders")
                .Elements("Order")
                .Select(o => new Order()
                {
                    CustomerID = (int)o.Element("CustomerID"),
                    ProductDescription = (string)o.Element("ProductDescription"),
                    Quantity = (decimal)o.Element("Quantity"),
                    OrderDate = (DateTime)o.Element("OrderDate"),
                });
            Generator p = new Generator();
            foreach (var customer in customers)
            {
                p.Cust = customer;
                p.GenerateDocument();
            }
        }
    }
}

In the next blog post, I’m going to discuss generating C# code from an XML tree. This code needs to use functional construction, so that the code generation process can insert queries at all appropriate points in the generated code. That is going to be fun code to write!!!


The Second Iteration of the Template Document

January 28, 2011 at 2:48 pm · Filed under Document Generation Series, Open XML, WordprocessingML

After great feedback from Svetlin, and after some more contemplation about tables, this post presents the second iteration of a template document to be used for a document generation process.

This post is the third in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

One additional goal that I have for these document templates is that if necessary, the template designer can specify formatting for a field or for a cell in a table. To facilitate this, I’m going to add the capability to specify the style in a separate nested content control.

In the following template, there are five content controls. The first is a value with a style. The second is a value that uses the style of the containing paragraph. The third generates a table from the query. The table is formatted with the table style of the sample table. The fourth shows conditional content. The last specifies that the user should be asked a question, the answer to which must be shorter than 256 characters.

I am certain that the design for this document template will be refined over the next couple of weeks.


Using a WordprocessingML Document as a Template in the Document Generation Process

January 26, 2011 at 3:05 pm · Filed under Document Generation Series, Open XML, WordprocessingML

In this post, I examine the approaches for building a template document for the document generation process.

This post is the second in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

In my approach to document generation, a template document is a DOCX document that contains content controls that will control the document generation process. The document template designer can format this document as desired, and the document generation process will generate documents that have the format of the template document.

When working with content controls, first of all, remember that you need to turn on the developer tab in the ribbon. Click File => Options => Customize Ribbon, and then turn on the developer tab:

Turning on the Developer Tab

Another point that will make it easier to work with content controls is to turn on design mode. If design mode is turned off (which is the default), content controls have a square boxed appearance with a tab at the top that contains the title of the content control:

Content Control - not in Design Mode

This is not a problem, except that if the focus is not in a content control, there is no visual indication that the content control is there. Instead, turn on design mode:

Turning on design mode

With design mode turned on, content controls have blue tags that indicate the beginning and end of the location of a content control. With design mode turned on, a template document will look something like the following:

Sample template document with content controls

In this document, plain text content controls contain a LINQ query that returns a single value. Formatting is easy – the value returned by the query takes on the formatting of the containing run or paragraph.
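As an illustration of a query that returns a single value, the following sketch runs two such expressions against a small data-centric XML tree. The Customer data, its element names, and the class name SingleValueQuery are assumptions made up for this example; they are not taken from the template shown above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class SingleValueQuery
{
    // Hypothetical data-centric XML; the element names are invented for this sketch.
    public static readonly XElement Customer = XElement.Parse(
@"<Customer>
  <Name>Andrew</Name>
  <Orders>
    <Order><ProductDescription>Bike</ProductDescription><Quantity>5</Quantity></Order>
    <Order><ProductDescription>Boat</ProductDescription><Quantity>2</Quantity></Order>
  </Orders>
</Customer>");

    // The kind of expression a designer might type into a plain text content control.
    public static string CustomerName()
    {
        return (string)Customer.Element("Name");
    }

    // A query can also compute its single value from several elements.
    public static int TotalQuantity()
    {
        return Customer.Descendants("Order").Sum(o => (int)o.Element("Quantity"));
    }

    static void Main()
    {
        Console.WriteLine(CustomerName());   // Andrew
        Console.WriteLine(TotalQuantity());  // 7
    }
}
```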

In this document, the rich text content control with Table as its title contains a LINQ query that returns a collection of anonymous types. The results of the query will be inserted into the document as a WordprocessingML table. The inserted table will have the formatting of the empty table that is inserted into the rich text content control.
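A query of this shape might look like the following sketch, which projects each order into an anonymous type and then flattens the results into rows of cell text. The data, the class name TableQuery, and the reflection-based ToRows helper are assumptions for illustration only; this is one possible way to get from anonymous types to table cells, not necessarily how the finished example does it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class TableQuery
{
    // Hypothetical data-centric XML; the element names are invented for this sketch.
    public static readonly XElement Customer = XElement.Parse(
@"<Customer>
  <Orders>
    <Order><ProductDescription>Bike</ProductDescription><Quantity>5</Quantity></Order>
    <Order><ProductDescription>Boat</ProductDescription><Quantity>2</Quantity></Order>
  </Orders>
</Customer>");

    // Converts each projected object to a row of cell text by reading its public
    // properties through reflection. Assumption: properties come back in declaration
    // order, which is the usual behavior but is not formally guaranteed.
    public static List<string[]> ToRows<T>(IEnumerable<T> items)
    {
        return items
            .Select(item => typeof(T).GetProperties()
                .Select(p => Convert.ToString(p.GetValue(item, null)))
                .ToArray())
            .ToList();
    }

    static void Main()
    {
        // The kind of query a designer might type into the Table content control.
        var query = Customer.Descendants("Order")
            .Select(o => new
            {
                Product = (string)o.Element("ProductDescription"),
                Quantity = (int)o.Element("Quantity"),
            });

        // Each row would become one row of the generated table.
        foreach (string[] row in ToRows(query))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

Each property of the anonymous type corresponds to one column, so the designer controls the column count and order simply by changing the projection.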

Other uses of the word ‘Template’ in Microsoft Office

One minor issue around the idea of creating a template WordprocessingML document is that the term ‘template’ is overloaded. Microsoft Word has the notion of ‘Document Templates’, which are saved with the dotx extension. These are WordprocessingML documents with one special characteristic – when the user opens one of these documents, the user cannot directly save back to the dotx file – the user must instead supply a new filename, and Word will append docx as the extension.

In addition, related to dotx document templates are ‘document template projects’ in Visual Studio 2010 (and 2008). These are template-based document-level projects (see Architecture of Document-Level Customizations) that consist of managed code that is attached to a document template instead of a document. The user opens the template, uses the managed customization to do whatever it does, and then saves as a docx document. The docx document can have a managed customization, or it can be stripped of the customization, leaving a plain old docx.

For this document generation project, we don’t need to use either of these facilities. Instead, the template document that the designer creates is, as far as Word is concerned, an ordinary word-processing document.


Generating Open XML WordprocessingML Documents

January 24, 2011 at 3:39 pm · Filed under Document Generation Series, Open XML, WordprocessingML

Generating word-processing documents is perhaps the single most compelling use of Open XML. The archetypical case is an insurance company or bank that needs to generate tens of thousands of documents per month, archive them, and then make them available online, send them electronically, or print them and send them via post. But there are about a million variations on this theme. In this blog series, I am going to examine the various approaches for document generation. I’m going to present code that demonstrates the various approaches.

This post is the first in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

I have some goals for the code that I’ll be publishing:

First and foremost, I want the document generation process to be data-driven from content controls that you configure in a template document.
The approach that I want to take is that the template designer creates a document, inserts content controls with specific tags, and then inserts specific instructions into each content control.
The data that we will supply to the document generation process will be a data-centric XML document. I’ll place a few constraints on this document. Some time ago, I wrote about Document-Centric Transforms using LINQ to XML. That post discusses data-centric vs. document-centric XML documents. When generating documents from another data source, such as a SQL database or an internal or secure Web service, the task will be to generate a data-centric XML document from that source, and then kick off the document generation process.
This code should be short and sweet. I don’t want to create some monolithic code base that would require a design process, formalized coding and testing procedures, and the like. The question is: how simple and how powerful can such a system be made? I’m hoping to stay under 1,000 lines of code. But we have some powerful tools at our disposal, most importantly using LINQ to XML in a functional style. Also, I probably will code a few recursive functional transforms.

I am contemplating four approaches for the instructions that the template designer will place in the content controls. The content controls could contain:

Parameterized XPath expressions: This approach might be the easiest for the template designer to configure.
XSLT sequence constructors: This approach might be the easiest to code. It might be very, very short if you exclude existing code such as transforming OPC back and forth to Flat OPC, OpenXmlCodeTester, and the axes I detailed in Mastering Text in Open XML WordprocessingML Documents. I am contemplating using XSLT 2.0.
.NET code (either VB or C#): This approach reminds me of code that I presented in OpenXmlCodeTester: Validating Code in Open XML Documents. It might be cool to put a LINQ expression in a content control that projects a collection of rows and columns that become a table in the word-processing document. There could be some cool and easy ways to supply formatting.
Some XML dialect that I invent as I go along.
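To give a feel for the first of these options, here is a minimal sketch of evaluating an XPath expression, stored as plain text the way it would be in a content control, against a data-centric XML tree. It uses the XPathEvaluate extension method from System.Xml.XPath. The data, the class name XPathInstruction, and the Evaluate helper are assumptions for illustration, and the sketch does not attempt any parameterization of the expressions.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

static class XPathInstruction
{
    // Evaluates an XPath expression relative to the data element and returns
    // the string value of the first node it selects.
    public static string Evaluate(XElement data, string xpath)
    {
        // Wrapping the expression in string(...) makes XPathEvaluate return a string.
        return (string)data.XPathEvaluate("string(" + xpath + ")");
    }

    static void Main()
    {
        // Hypothetical data-centric XML; the element names are invented for this sketch.
        XElement customer = XElement.Parse(
@"<Customer>
  <Name>Andrew</Name>
  <Orders>
    <Order><ProductDescription>Bike</ProductDescription></Order>
    <Order><ProductDescription>Boat</ProductDescription></Order>
  </Orders>
</Customer>");

        // Expressions as a template designer might type them into content controls.
        Console.WriteLine(Evaluate(customer, "Name"));
        Console.WriteLine(Evaluate(customer, "Orders/Order[2]/ProductDescription"));
    }
}
```

The appeal of this option is that the designer only needs to know the shape of the data, not a programming language.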

I’m not sure which approach I’ll take. I want to play around with all four approaches, and see which one is easiest to use, and which one is easiest to develop. As I start playing around with these (and posting the code as I go along), I’ll make some design decisions, and list my reasons for the decisions.

By the way, I really love to have discussions about these things. If you agree or disagree with any of my design decisions, feel free to chime in. You can register so we can have more of a discussion, or post anonymously, as you like.

In the next post, I’m going to examine template documents, and define exactly what I mean by a template document.

