Generating C# code from an XML Tree using Virtual Extension Methods

One integral part of my scheme for building a document generation system is to write some code that generates C# code to create an arbitrary XML tree. I want to transform the markup for the main document part into C# code that will produce that main document part, with the exception that at various points where I find content controls, I want to alter the transformation as appropriate. The first task is to write code that produces C# code that will create any arbitrary XML tree.

This post is the fifth in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

To demonstrate what I mean by code that generates code, here is a small snippet that parses a string, creates an XML tree, and then prints the code that the XElementToCode method produces:

XElement e = XElement.Parse(
@"<Root xmlns='http://www.ericwhite.com'>
  <Child>This is a text node.</Child>
  <!--Here is a comment.-->
</Root>");
Console.WriteLine("var z = {0};", LtxToCode.XElementToCode(e));

This produces the following automatically written code:

var z = new XElement("{http://www.ericwhite.com}Root",
  new XAttribute("xmlns", @"http://www.ericwhite.com"),
  new XElement("{http://www.ericwhite.com}Child",
    new XText(@"This is a text node.")
  ),
  new XComment(@"Here is a comment.")
);

The code that I present in this post uses expanded XML names, which deserve a bit of an explanation.

Expanded XML Names

In LINQ to XML, an expanded name is a notation that specifies a namespace and a local name in a single string. The gist of it (which you can see in the example above) is that the namespace is enclosed in curly braces, followed by the local name.

The normal idiom when working with names and namespaces in LINQ to XML is to declare and initialize an XNamespace object, and then use the overload of the ‘+’ operator to combine the namespace with a local name to create a fully qualified name:

XNamespace ew = "http://www.ericwhite.com";
XElement root = new XElement(ew + "Root");
Console.WriteLine(root);

This snippet is identical in functionality to the following example, which uses an expanded name:

XElement root = new XElement("{http://www.ericwhite.com}Root");
Console.WriteLine(root);

While the second approach is perhaps marginally slower than the first approach, it is far easier to generate code that uses the second approach. If I used the first approach, I would need to set up a dictionary that maps namespace names to XNamespace object names, and then generate code that refers to the correct XNamespace objects. It is a fair amount of housekeeping. So instead of using that approach, the generated code specifies fully qualified names using expanded names.
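To make the equivalence of the two idioms concrete, here is a minimal sketch that builds the same name both ways and compares the results. The class name ExpandedNameDemo and the method SameName are made up for this illustration and are not part of the example in this post.

```csharp
using System;
using System.Xml.Linq;

static class ExpandedNameDemo
{
    // Hypothetical helper: returns true when both idioms yield the same XName.
    public static bool SameName(string namespaceName, string localName)
    {
        // Idiom 1: an XNamespace combined with a local name via the '+' operator.
        XNamespace ns = namespaceName;
        XName viaOperator = ns + localName;

        // Idiom 2: an expanded name; the string converts implicitly to XName.
        XName viaExpandedName = "{" + namespaceName + "}" + localName;

        return viaOperator == viaExpandedName;
    }

    static void Main()
    {
        Console.WriteLine(SameName("http://www.ericwhite.com", "Root"));
    }
}
```

Because both idioms end up as the same XName, generated code that uses expanded names builds the same tree as hand-written code that uses XNamespace objects.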

One note about the LINQ to XML programming interface: When you call the ToString method on an XName object, the returned string is an expanded name. For instance, the following code prints the fully qualified name of an element:

XElement root = XElement.Parse("<Root xmlns='http://www.ericwhite.com'/>");
Console.WriteLine(root.Name);

This outputs the expanded name:

{http://www.ericwhite.com}Root
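This behavior is what lets a generator drop an XName straight into a format string. The following sketch, with the made-up names NameRoundTrip and ElementConstructor, shows the round trip: XName.Get parses the expanded name back into the same XName, and String.Format renders the name through ToString.

```csharp
using System;
using System.Xml.Linq;

static class NameRoundTrip
{
    // Hypothetical helper: formats a constructor call for an element name.
    // String.Format calls ToString on the XName, which yields the expanded name.
    public static string ElementConstructor(XElement element)
    {
        return String.Format("new XElement(\"{0}\")", element.Name);
    }

    static void Main()
    {
        XElement root = XElement.Parse("<Root xmlns='http://www.ericwhite.com'/>");

        // Parse the expanded name back into an XName and compare with the original.
        XName reparsed = XName.Get(root.Name.ToString());
        Console.WriteLine(reparsed == root.Name);

        Console.WriteLine(ElementConstructor(root));
    }
}
```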

About the Example

The code in the following example uses extension methods to implement a recursive transform from an XML tree to code that will create that tree. The example contains a few sample XML documents that it converts to code. It produces a C# program that you can compile and run. That program instantiates each XML tree twice: once using the XElement.Parse (or XDocument.Parse) method, and once using the C# code that is generated by the example. It then uses XNode.DeepEquals to validate that the two trees are identical.
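As a rough illustration of that validation step, the sketch below compares the small tree from the start of this post with the code that was generated for it. The names RoundTripCheck and TreesMatch are invented for this sketch; the generated test program emits the equivalent statements inline rather than calling a helper.

```csharp
using System;
using System.Xml.Linq;

static class RoundTripCheck
{
    // Hypothetical helper: builds the same tree by parsing and by
    // construction, then compares the two with XNode.DeepEquals.
    public static bool TreesMatch()
    {
        XElement parsed = XElement.Parse(
@"<Root xmlns='http://www.ericwhite.com'>
  <Child>This is a text node.</Child>
  <!--Here is a comment.-->
</Root>");

        // The generated code shown earlier in this post, pasted as-is.
        XElement constructed = new XElement("{http://www.ericwhite.com}Root",
            new XAttribute("xmlns", @"http://www.ericwhite.com"),
            new XElement("{http://www.ericwhite.com}Child",
                new XText(@"This is a text node.")
            ),
            new XComment(@"Here is a comment.")
        );

        return XNode.DeepEquals(parsed, constructed);
    }

    static void Main()
    {
        Console.WriteLine(TreesMatch() ? "Passed" : "Failed");
    }
}
```

Note that XElement.Parse discards insignificant white space by default, which is why the constructed tree needs no white space text nodes between the child nodes.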

To run the example:

1. Create a new C# console application.

2. Copy and paste the following code into Program.cs.

3. Run the example.  This produces a new file, GeneratedTestProgram.cs.  You can examine the generated code for each XML tree in the generated program.

4. Validate that the generated code actually produces the XML trees that it should.  Create a second C# console application, replace Program.cs in the new program with the generated program, and then run it.  It prints a Passed or Failed line for each test.

This example simulates the use of virtual extension methods, which made the example very easy to write.  In the next post, I’ll explain virtual extension methods.
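As a small preview of the pattern, and only as a sketch, here is the shape of the dispatch that the listing below relies on. Extension methods are bound at compile time, so a call through an XObject or XNode reference always lands in the overload that takes XObject; that overload then tests the run-time type and forwards to the more specific overload. The names DispatchSketch and Describe are hypothetical and stand in for the ToCode methods.

```csharp
using System;
using System.Xml.Linq;

static class DispatchSketch
{
    // Entry point: chooses an overload based on the run-time type.
    public static string Describe(this XObject xObject)
    {
        XElement element = xObject as XElement;
        if (element != null)
            return element.Describe();
        // XCData derives from XText, so it must be tested first.
        XCData cdata = xObject as XCData;
        if (cdata != null)
            return cdata.Describe();
        XText text = xObject as XText;
        if (text != null)
            return text.Describe();
        throw new NotSupportedException("Unhandled node type");
    }

    public static string Describe(this XElement element)
    {
        return "element " + element.Name;
    }

    public static string Describe(this XText text)
    {
        return "text";
    }

    public static string Describe(this XCData cdata)
    {
        return "cdata";
    }

    static void Main()
    {
        XObject[] items = { new XElement("Root"), new XText("abc"), new XCData("def") };
        foreach (XObject item in items)
            Console.WriteLine(item.Describe());
    }
}
```

The order of the type tests matters: if the XText test came before the XCData test, every CDATA node would be treated as plain text.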

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class LocalExtensions
{
    public static string StringConcatenate(this IEnumerable<string> source)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string item in source)
            sb.Append(item);
        return sb.ToString();
    }
}

public static class LtxToCode
{
    public static XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static XNamespace ew = "http://www.ericwhite.com/xmlcodegeneration";

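    // Returns two spaces per level of nesting, plus one extra level when the
    // tree is rooted in an XDocument.
    private static string Indentation(XObject xObject)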
    {
        XAttribute attribute = xObject as XAttribute;
        if (attribute != null)
            return "".PadRight((attribute.Parent.Ancestors().Count() + 1 +
                (attribute.Parent.Document != null ? 1 : 0)) * 2, ' ');
        XElement element = xObject as XElement;
        if (element != null)
            return "".PadRight((element.Ancestors().Count() +
                (element.Document != null ? 1 : 0)) * 2, ' ');
        XDocument document = xObject as XDocument;
        if (document != null)
            return "";
        XProcessingInstruction pi = xObject as XProcessingInstruction;
        if (pi != null)
            return "".PadRight((pi.Ancestors().Count() +
                (pi.Document != null ? 1 : 0)) * 2, ' ');
        XNode node = xObject as XNode;
        if (node != null)
            if (node.Parent != null)
                return "".PadRight((node.Parent.Ancestors().Count() +
                    1 + (node.Document != null ? 1 : 0)) * 2, ' ');
            else
                return "";
        throw new CodeGenerationException("Internal error");
    }

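    // Simulated virtual dispatch: forwards to the ToCode overload that matches
    // the run-time type.  XCData is tested before XText because XCData derives
    // from XText.
    public static string ToCode(this XObject xObject)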
    {
        XAttribute a = xObject as XAttribute;
        if (a != null)
            return a.ToCode();
        XElement element = xObject as XElement;
        if (element != null)
            return element.ToCode();
        XCData cdata = xObject as XCData;
        if (cdata != null)
            return cdata.ToCode();
        XText text = xObject as XText;
        if (text != null)
            return text.ToCode();
        XComment comment = xObject as XComment;
        if (comment != null)
            return comment.ToCode();
        XProcessingInstruction pi = xObject as XProcessingInstruction;
        if (pi != null)
            return pi.ToCode();
        throw new CodeGenerationException("Internal error");
    }

    public static string ToCode(this XDocument document)
    {
        var s = "new XDocument(" + Environment.NewLine +
            (document.Declaration != null ?
                String.Format("  new XDeclaration(\"{0}\", \"{1}\", \"{2}\")," +
                Environment.NewLine,
                document.Declaration.Version, document.Declaration.Encoding,
                document.Declaration.Standalone) :
                "") +
            TrimFinalComma(document
                .Nodes().Select(n => n.ToCode()).StringConcatenate()) +
            ")";
        return s;
    }

    public static string ToCode(this XElement element)
    {
        var c = element
            .Attributes()
            .Cast<XObject>()
            .Concat(element.Nodes().Cast<XObject>());
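        // The text of an ew:Literal element is emitted verbatim as code
        // instead of being wrapped in a constructor call.
        if (element.Name == ew + "Literal")
            return element.Value;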
        if (c.Count() == 0)
            return Indentation(element) +
                String.Format("new XElement(\"{0}\")," + Environment.NewLine,
                element.Name);
        else
            return Indentation(element) +
                String.Format("new XElement(\"{0}\"," + Environment.NewLine,
                    element.Name) +
                TrimFinalComma(c.Select(n => n.ToCode()).StringConcatenate()) +
                Indentation(element) + ")," + Environment.NewLine;
    }

    public static string ToCode(this XAttribute attribute)
    {
        return Indentation(attribute) +
            String.Format("new XAttribute(\"{0}\", @\"{1}\")," + Environment.NewLine,
            attribute.Name, attribute.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XText text)
    {
        return Indentation(text) +
            String.Format("new XText(@\"{0}\")," + Environment.NewLine, text.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XComment comment)
    {
        return Indentation(comment) +
            String.Format("new XComment(@\"{0}\")," + Environment.NewLine, comment.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XProcessingInstruction pi)
    {
        return Indentation(pi) +
            String.Format("new XProcessingInstruction(@\"{0}\", @\"{1}\")," +
                Environment.NewLine, pi.Target.Replace("\"", "\"\""), pi.Data.Replace("\"", "\"\""));
    }

    public static string ToCode(this XCData cdata)
    {
        return Indentation(cdata) +
            String.Format("new XCData(@\"{0}\")," + Environment.NewLine, cdata.Value.Replace("\"", "\"\""));
    }

    // Removes the comma that follows the last argument, keeping the line break.
    private static string TrimFinalComma(string code)
    {
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length) +
                Environment.NewLine;
        return code;
    }

    public static string XElementToCode(XElement element)
    {
        string code = element.ToCode();
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length);
        return code;
    }

    public static string XDocumentToCode(XDocument document)
    {
        string code = document.ToCode();
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length);
        return code;
    }

    public class CodeGenerationException : Exception
    {
        public CodeGenerationException(string message)
            : base(message)
        {
        }
    }
}

public class GenerateDocumentException : Exception
{
    public GenerateDocumentException(string message)
        : base(message)
    {
    }
}

class Program
{
    static string[] testXml = new[] {
@"<Root a=""1"" b=""2""/>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root>
  <Child> abc </Child>
  <Child xmlns:space=""preserve""> abc </Child>
</Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root><![CDATA[foo]]></Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root><Child/></Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root/>",

@"<Root xmlns=""http://www.ericwhite.com/aaaaa"">
  <Child xmlns=""http://www.ericwhite.com/child"">
    <Element att=""1""
             b:att2=""2""
             xmlns:b=""http://www.ericwhite.com/bbbbb"">abc</Element>
  </Child>
</Root>",

@"<a:Root xmlns:a=""http://www.ericwhite.com"">abc</a:Root>",

@"<a:Root xmlns:a=""http://www.ericwhite.com"">abc<!--a comment -->def</a:Root>",

@"<Root>abc</Root>",

@"<Root att1=""1"" att2=""2""/>",

@"<Root/>",

@"<Root att1=""1"">
  <Child>
    <Gc1>abc<b/>def</Gc1>
  </Child>
  <Child>
    <Gc2>abc</Gc2>
  </Child>
</Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>

<?mso-application progid=""Word.Document""?>

<!--This is a comment at the root level.  There are also white space nodes at the root level.-->

<pkg:package xmlns:pkg=""http://schemas.microsoft.com/office/2006/xmlPackage"">
  <pkg:part pkg:name=""/_rels/.rels""
            pkg:contentType=""application/vnd.openxmlformats-package.relationships+xml""
            pkg:padding=""512"">
    <pkg:xmlData>
      <Relationships
          xmlns=""http://schemas.openxmlformats.org/package/2006/relationships"">
        <Relationship Id=""rId3""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""docProps/app.xml""/>
        <Relationship Id=""rId2""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""docProps/core.xml""/>
        <Relationship Id=""rId1""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""word/document.xml""/>
      </Relationships>
    </pkg:xmlData>
  </pkg:part>
</pkg:package>",
};

    static void Main(string[] args)
    {
        string st1 = (@"using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
");
        StringBuilder sb = new StringBuilder();
        // test as XElement
        for (int i = 0; i < testXml.Length; i++)
        {
            sb.Append(
                String.Format("var xElementSourceTree{0} = XElement.Parse(@\"{1}\");",
                i, testXml[i].Replace("\"", "\"\"")));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format("var xElementCodeTree{0} = {1};",
                i, LtxToCode.XElementToCode(XElement.Parse(testXml[i]))));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "if (XNode.DeepEquals(xElementSourceTree{0}, xElementCodeTree{0}))", i));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XElement Test {0} Passed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append("else");
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XElement Test {0} Failed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append(Environment.NewLine);
        }
        // test as XDocument
        for (int i = 0; i < testXml.Length; i++)
        {
            sb.Append(String.Format(
                "var xDocumentSourceTree{0} = XDocument.Parse(@\"{1}\");",
                i, testXml[i].Replace("\"", "\"\"")));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format("var xDocumentCodeTree{0} = {1};",
                i, LtxToCode.XDocumentToCode(XDocument.Parse(testXml[i]))));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
              "if (XNode.DeepEquals(xDocumentSourceTree{0}, xDocumentCodeTree{0}))", i));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XDocument Test {0} Passed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append("else");
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XDocument Test {0} Failed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append(Environment.NewLine);
        }
        string st2 = @"        }
    }
}";
        string fullProgram = st1 + sb.ToString() + st2;
        File.WriteAllText("GeneratedTestProgram.cs", fullProgram);
    }
}
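The listing writes every attribute value, text node, comment, and processing instruction as a C# verbatim string literal, doubling any embedded quotation marks. The sketch below isolates that escaping rule in a made-up helper, ToVerbatimLiteral, which does not appear in the listing itself.

```csharp
using System;

static class VerbatimLiteralSketch
{
    // Hypothetical helper: wraps a value in a C# verbatim string literal,
    // doubling each embedded quotation mark.
    public static string ToVerbatimLiteral(string value)
    {
        return "@\"" + value.Replace("\"", "\"\"") + "\"";
    }

    static void Main()
    {
        // Backslashes and line breaks need no escaping in a verbatim literal.
        Console.WriteLine(ToVerbatimLiteral("say \"hi\" in C:\\temp"));
    }
}
```

Verbatim literals are a convenient choice for generated code because the quotation mark is the only character that needs escaping, so text nodes that contain backslashes or line breaks can be emitted unchanged.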


6 Comments »

  1. Otaku said,

    February 9, 2011 @ 5:45 pm

    I’m probably going to catch a lot of grief over this comment, but why not use VB instead? XML Literals are significantly easier to use, especially when it comes to intellisense (compared to the “strings of XML” above), construction and namespaces.

    But secondly, can CodeDOM do the above? I haven’t had much luck with CodeDOM doing anything with new .NET features like lambdas, but it seems like it would be what should be used for dynamic code creation.

  2. Eric White said,

    February 9, 2011 @ 9:33 pm

    Hey Otaku, glad to see you here… Regarding VB literals, they are certainly interesting, but they don’t simplify the code generation problem. You still have the problem of namespaces (particularly default namespaces). Generating the ‘expression holes’ in VB XML literals would be non-trivial. Actually, I think that generating C# code to generate the XML tree using expanded XML names is by far the easiest approach.

    Regarding using CodeDOM – my problem is that I am going to take C# code that is in a content control and inject that C# code into the generated code. Using CodeDOM would mean that I would have to compile the code entered into content controls into expression trees. This is a non-trivial project.

    My goal is to create an example that is < 300 lines of code that generates a program that can generate hundreds of documents from any data source that you can get to from .NET. Bringing CodeDOM into the picture significantly complicates this. This first version of a doc gen system that uses C# code in content controls is really serving the purpose of nailing down exactly what a doc generation template document should look like. Entering C# code into content controls will be very much a developer activity, not an end-user activity. After refining this version of a doc generation system, I plan on putting together a new version where the template designer enters XML into content controls. That will be somewhat easier to use. But the final version will be a managed add-in that uses a task pane to enable a power user (not necessarily a developer) to create the document template. I think (hope) that this all will make more sense as we go along. -Eric

  3. Otaku said,

    February 10, 2011 @ 2:28 am

    Point well-taken, Eric. You never cease to amaze me with your pragmatic approach towards things. Maybe that’s why I take 10x the amount of time to code anything – taking too much time to search for the most manageable solution when in fact it’s hiding in plain sight.

  4. Eric White said,

    February 13, 2011 @ 1:39 am

    Thanks for the kind words, Otaku – Time will tell whether some variation on this approach to document generation will be useful. I’m just sitting down now to write the code to pull this all together. We’ll see how it goes!

  5. Dave Black said,

    September 30, 2011 @ 5:43 pm

    Hi Eric,

    I noticed you’re using ‘DeepEquals’ here in this code. Are you using the customized version that you wrote or the errant “built-in” version in .NET?

    BTW, I still have to send you the DeepEquals implementation of yours that I extended….

  6. Eric White said,

    October 1, 2011 @ 4:20 am

    Hi Dave,

    In this case, the built-in DeepEquals is sufficient, as the intent of the code is to determine if the *exact* same tree was created by the code as the source XML tree.

    Please remind me again, what additional functionality do you have in your deep equals? BTW, if you want to, feel free to write me directly at eric at ericwhite.com.

    -Eric
