Archive for Functional Programming

Introducing a new class for PowerTools for Open XML: TextReplacer

Recently I wrote some code that implemented search-and-replace for Open XML WordprocessingML documents.  I wrote that code for an Open XML developer who needed to implement that functionality using XML DOM, although in a language other than C#.  Because XML DOM is standardized, translating the code to another language and another implementation of XML DOM is relatively straightforward.

I want to introduce search-and-replace functionality in a cmdlet in PowerTools for Open XML, but I have been moving PowerTools code away from XmlDocument, so I rewrote the search-and-replace code in LINQ to XML as a functional transform.  It was an interesting and fun project.  The video below introduces the TextReplacer class and compares it to the XmlDocument-based code that I presented.  It is an interesting comparison of imperative code (using XmlDocument) and functional code (using LINQ to XML).

You can download the TextReplacer class from this blog post (in an attachment at the bottom).

Introduces TextReplacer, which is LINQ to XML code that replaces text in WordprocessingML documents.


Simulating Virtual Extension Methods

When considering the problem of how to generate code that will create some arbitrary XML tree, it is interesting to examine the LINQ to XML class hierarchy, which uses polymorphism. The XObject class is an abstract base class of both the XAttribute and XNode classes. The XNode class is an abstract base class of XContainer, XComment, XDocumentType, XProcessingInstruction, and XText. XContainer is the base class for XElement and XDocument.

This post is the sixth in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

The following diagram shows the class hierarchy of the LINQ to XML classes that are important when considering serializing a LINQ to XML tree as code. There are other classes in LINQ to XML, but they don’t impact this discussion.

[Image: class hierarchy of the LINQ to XML classes]

Software systems such as LINQ to XML are typically written using a recursive approach. The XNode class declares an abstract method, WriteTo, and each concrete derived class overrides it to provide its own serialization. When you serialize an XElement object by calling WriteTo, LINQ to XML internally calls the WriteTo method for each of the child nodes of that XElement object. This is the approach that we want to use when generating code to create an arbitrary XML tree.

The ideal solution would be ‘Virtual Extension Methods’. There would be an abstract extension method for XObject named ToCode, and it would be overridden for each of the concrete classes. Classes that contain child objects, such as XElement and XDocument, could then iterate through their children and use the virtual extension method to convert each node in the XML tree to the C# code that creates that node.

However, C# does not support virtual extension methods. A call to an extension method is bound at compile time, based on the declared type of the variable rather than the runtime type of the object that the variable refers to. To show exactly what I mean, consider the following snippet:

public static class MyExtensions
{
    public static string ToCode(this XObject o)
    {
        return "Called extension method on XObject";
    }

    public static string ToCode(this XElement e)
    {
        return "Called extension method on XElement";
    }
}

class Program
{
    static void Main(string[] args)
    {
        XElement e = new XElement("Root");
        Console.WriteLine(e.ToCode());
        XObject o = e;
        Console.WriteLine(o.ToCode());
    }
}

The above snippet calls the ToCode extension method twice for the same object. The first call prints "Called extension method on XElement", and the second prints "Called extension method on XObject". Even though the runtime type of the object is XElement, once the object is assigned to a variable of type XObject (which is valid because XElement derives from XContainer, which derives from XNode, which derives from XObject), a call to ToCode through that variable does not reach the extension method on XElement. Instead, the extension method on XObject is called.

Because the LINQ to XML programming interface is fairly simple, we can simulate virtual extension methods by implementing an extension method on XObject that simply dispatches to the appropriate extension method based on the actual type of the object. Note that XCData is tested before XText: XCData derives from XText, so testing XText first would also match every XCData node. The following listing shows the implementation of the ToCode extension method for XObject.

public static string ToCode(this XObject xObject)
{
    XAttribute a = xObject as XAttribute;
    if (a != null)
        return a.ToCode();
    XElement element = xObject as XElement;
    if (element != null)
        return element.ToCode();
    XCData cdata = xObject as XCData;
    if (cdata != null)
        return cdata.ToCode();
    XText text = xObject as XText;
    if (text != null)
        return text.ToCode();
    XComment comment = xObject as XComment;
    if (comment != null)
        return comment.ToCode();
    XProcessingInstruction pi = xObject as XProcessingInstruction;
    if (pi != null)
        return pi.ToCode();
    throw new CodeGenerationException("Internal error");
}
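
In C# 7 and later, the same type-based dispatch can also be written with pattern matching. The following is only a sketch, not the code from this post: the class name ToCodeDispatchSketch and the method Describe are hypothetical, and each branch returns a label where the real code would call the corresponding ToCode overload. The order of the cases still matters, because XCData derives from XText.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical sketch: type-based dispatch written with C# 7 pattern matching.
public static class ToCodeDispatchSketch
{
    // Returns a label naming the branch that was selected for the runtime type.
    // In the real code, each branch would call the matching ToCode overload.
    public static string Describe(XObject xObject)
    {
        switch (xObject)
        {
            case XAttribute attribute:
                return "XAttribute";
            case XElement element:
                return "XElement";
            case XCData cdata:          // XCData derives from XText, so it is tested first
                return "XCData";
            case XText text:
                return "XText";
            case XComment comment:
                return "XComment";
            case XProcessingInstruction pi:
                return "XProcessingInstruction";
            default:
                throw new InvalidOperationException(
                    "Unsupported node type: " + xObject.GetType().Name);
        }
    }
}
```

For example, calling ToCodeDispatchSketch.Describe on an XCData object that is held in an XObject variable returns "XCData", because the switch examines the runtime type of the object rather than the declared type of the variable.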

Next, we can examine the ToCode method that is implemented on XAttribute. The code calls an Indentation method that determines the number of spaces that precede the ‘new XAttribute’. This indents the generated code properly, so that it is easy to examine. Other than this, it is pretty straightforward code to generate the code to new up an XAttribute object. Note that, as shown here, the method does not escape quotation marks or backslashes in the attribute value.

public static string ToCode(this XAttribute attribute)
{
    return Indentation(attribute) +
        String.Format("new XAttribute(\"{0}\", \"{1}\")," + Environment.NewLine,
        attribute.Name, attribute.Value);
}
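
The Indentation method is not shown in this post. One way it could work is to count the ancestors of the object and emit a fixed number of spaces per level. The sketch below assumes four spaces per level; the class name IndentationSketch and that indent width are assumptions, not the original implementation.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical sketch of an Indentation helper; the original is not shown in the post.
public static class IndentationSketch
{
    // Assumption: four spaces of indentation per nesting level.
    private const int SpacesPerLevel = 4;

    // Counts the ancestor elements of the object and returns that many levels
    // of indentation. XObject.Parent is the containing XElement, or null for a
    // root element, so a root element gets an empty string.
    public static string Indentation(XObject xObject)
    {
        int depth = 0;
        for (XElement parent = xObject.Parent; parent != null; parent = parent.Parent)
            depth++;
        return new string(' ', depth * SpacesPerLevel);
    }
}
```

With this sketch, an attribute on the root element is indented by four spaces, and an attribute on a child of the root element by eight.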

As initially written, the generated code for each XAttribute or XElement object ends with a comma followed by a new line.  One minor issue that must be solved is that when passing a number of parameters to a function that takes a params array, a comma immediately before the closing parenthesis is invalid.

[Image: illustration of the trailing-comma problem]

To solve this problem, after assembling all of the code that creates the child nodes of an XElement object, the code calls a method that trims off the final comma.

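public static string ToCode(this XElement element)
{
    // Attributes come first, followed by the child nodes.
    var c = element.Attributes().Cast<XObject>().Concat(element.Nodes().Cast<XObject>());
    // An element with no attributes and no child nodes fits on a single line.
    if (!c.Any())
        return Indentation(element) +
            String.Format("new XElement(\"{0}\")," + Environment.NewLine, element.Name);
    else
        // Generate code for each child through the dispatching ToCode method,
        // then trim the comma that follows the last child.
        return Indentation(element) +
            String.Format("new XElement(\"{0}\"," + Environment.NewLine, element.Name) +
            TrimFinalComma(c.Select(n => n.ToCode()).StringConcatenate()) +
            Indentation(element) + ")," + Environment.NewLine;
}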

TrimFinalComma looks like this:

private static string TrimFinalComma(string code)
{
    if (code.EndsWith("," + Environment.NewLine))
        return code.Substring(0, code.Length - ("," + Environment.NewLine).Length) +
            Environment.NewLine;
    return code;
}
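
To see the trimming in isolation, here is a small self-contained sketch. The StringConcatenate extension method used in the XElement listing is not shown in this post, so the version below is an assumption (a simple StringBuilder loop), and the class name TrimSketch is hypothetical. TrimFinalComma is the same as the method above, made public so that it can be called directly.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: joining generated lines and trimming the final comma.
public static class TrimSketch
{
    // Assumed implementation of StringConcatenate: appends every string in order.
    public static string StringConcatenate(this IEnumerable<string> source)
    {
        var sb = new StringBuilder();
        foreach (string s in source)
            sb.Append(s);
        return sb.ToString();
    }

    // Same logic as TrimFinalComma above: if the code ends with a comma and a
    // new line, drop the comma and keep the new line.
    public static string TrimFinalComma(string code)
    {
        string ending = "," + Environment.NewLine;
        if (code.EndsWith(ending))
            return code.Substring(0, code.Length - ending.Length) +
                Environment.NewLine;
        return code;
    }
}
```

Given the two generated lines new XAttribute("a", "1"), and new XElement("Child"), (each followed by a new line), TrimSketch.TrimFinalComma(lines.StringConcatenate()) keeps the comma after the first line and removes only the one after the second, so the result can be placed directly before the closing parenthesis of the parent element.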

You can see the rest of the ToCode extension methods in Generating C# code from an XML Tree using Virtual Extension Methods.

One Final Note

Some people will recognize the similarity in functionality between the Paste XML as LINQ sample and the code presented in this and the previous post.  The code presented in the Paste XML as LINQ sample is imperative code that iterates through the nodes, outputting code.  I need a completely different structure for my code.  By coding it as a recursive functional transform, I can easily alter the transform as appropriate for special-purpose content controls that contain C#.
