Using XML DOM to Detect Tracked Revisions in an Open XML WordprocessingML Document

Tracked revisions are one of the more involved features of Open XML WordprocessingML. There are 28 elements associated with tracked revisions, each with its own semantics. In some cases, such as with content controls and deleted paragraph marks, the semantics for tracked revisions are (of necessity) very involved.

Some time ago, I wrote an article, Accepting Revisions in Open XML Word-Processing Documents, which details the exact semantics for each of the elements that comprise revision tracking.

Many scenarios do not require that you process tracked revisions. For example, you may have an internal publishing process where articles with tracked revisions cannot be submitted for publishing. For a variety of business reasons, you do not want to expose tracked revisions. They may reflect internal debates that are not appropriate for disclosure outside of your company. Your internal policies may consider the editing process incomplete until all tracked revisions are accepted or rejected.

In these scenarios, if you are writing code to process those documents, you do not need to complicate your code by handling all of those elements and attributes. However, if your code does not process tracked revision markup, you do not want to blindly process documents that may contain tracked revisions. Your code may fail spectacularly. Instead, it is appropriate to include code to validate that a document to be processed does not contain tracked revisions.

Some time ago, I wrote a short article, Identifying Open XML Word-Processing Documents with Tracked Revisions, which shows how to use LINQ to XML or the strongly-typed object model of the Open XML SDK V2 to detect whether a document contains tracked revisions. However, many developers do not have the option of using LINQ to process XML, and instead must use one of a variety of implementations of XML DOM, such as System.Xml.XmlDocument in the .NET Framework, or an implementation of XML DOM for PHP.

This post presents a bit of XmlDocument code to detect tracked revisions. The important parts are those that show which Open XML parts to process, and the XPath expression that detects tracked revision markup. Because the semantics of XPath and XML DOM Document are carefully defined, it is pretty easy to translate this code to another language and implementation of XML DOM Document.

using System;
using System.IO;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

class Process
{
    // Loads the part's XML into an XmlDocument and attaches it to the part as an annotation.
    public static XmlDocument GetXmlDocument(OpenXmlPart part)
    {
        XmlDocument xmlDoc = new XmlDocument();
        using (Stream partStream = part.GetStream())
        using (XmlReader partXmlReader = XmlReader.Create(partStream))
            xmlDoc.Load(partXmlReader);
        part.AddAnnotation(xmlDoc);
        return xmlDoc;
    }

    public static bool PartHasTrackedRevisions(OpenXmlPart part)
    {
        XmlDocument doc = GetXmlDocument(part);
        string wordNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XmlNamespaceManager nsmgr =
            new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("w", wordNamespace);
        // Union of every tracked revision element; any match means the part has revisions.
        string xpathExpression =
            "descendant::w:cellDel|" +
            "descendant::w:cellIns|" +
            "descendant::w:cellMerge|" +
            "descendant::w:customXmlDelRangeEnd|" +
            "descendant::w:customXmlDelRangeStart|" +
            "descendant::w:customXmlInsRangeEnd|" +
            "descendant::w:customXmlInsRangeStart|" +
            "descendant::w:del|" +
            "descendant::w:delInstrText|" +
            "descendant::w:delText|" +
            "descendant::w:ins|" +
            "descendant::w:moveFrom|" +
            "descendant::w:moveFromRangeEnd|" +
            "descendant::w:moveFromRangeStart|" +
            "descendant::w:moveTo|" +
            "descendant::w:moveToRangeEnd|" +
            "descendant::w:moveToRangeStart|" +
            "descendant::w:numberingChange|" +
            "descendant::w:pPrChange|" +
            "descendant::w:rPrChange|" +
            "descendant::w:sectPrChange|" +
            "descendant::w:tcPrChange|" +
            "descendant::w:tblGridChange|" +
            "descendant::w:tblPrChange|" +
            "descendant::w:tblPrExChange|" +
            "descendant::w:trPrChange";
        XmlNodeList descendants = doc.SelectNodes(xpathExpression, nsmgr);
        return descendants.Count > 0;
    }

    public static bool HasTrackedRevisions(WordprocessingDocument doc)
    {
        if (PartHasTrackedRevisions(doc.MainDocumentPart))
            return true;
        foreach (var part in doc.MainDocumentPart.HeaderParts)
            if (PartHasTrackedRevisions(part))
                return true;
        foreach (var part in doc.MainDocumentPart.FooterParts)
            if (PartHasTrackedRevisions(part))
                return true;
        if (doc.MainDocumentPart.EndnotesPart != null)
            if (PartHasTrackedRevisions(doc.MainDocumentPart.EndnotesPart))
                return true;
        if (doc.MainDocumentPart.FootnotesPart != null)
            if (PartHasTrackedRevisions(doc.MainDocumentPart.FootnotesPart))
                return true;
        return false;
    }

    static void Main(string[] args)
    {
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open("Test1.docx", false))
        {
            Console.WriteLine("Test1 {0} tracked revisions", HasTrackedRevisions(doc) ?
                "has" :
                "does not have");
        }
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open("Test2.docx", false))
        {
            Console.WriteLine("Test2 {0} tracked revisions", HasTrackedRevisions(doc) ?
                "has" :
                "does not have");
        }
        }
    }
}
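
To try the XPath technique without a .docx file at hand, here is a minimal sketch that runs an abbreviated version of the expression against two in-memory XML fragments. It depends only on System.Xml, not on the Open XML SDK. The class name XPathSketch, the method HasRevisionMarkup, and the sample fragments are illustrative assumptions rather than part of the listing above, and the three-element expression is shortened for brevity. It uses SelectSingleNode instead of SelectNodes only because a yes/no answer is all that is needed here.

```csharp
using System;
using System.Xml;

public class XPathSketch
{
    const string WordNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns true if the XML text contains any of the
    // (abbreviated) tracked revision elements.
    public static bool HasRevisionMarkup(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // The prefix used in the XPath must be registered explicitly;
        // it does not have to match the prefix used in the document.
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("w", WordNamespace);

        // Abbreviated for illustration; a real check needs the full element list.
        string xpath =
            "descendant::w:ins|" +
            "descendant::w:del|" +
            "descendant::w:rPrChange";

        // SelectSingleNode returns null when nothing matches.
        return doc.SelectSingleNode(xpath, nsmgr) != null;
    }

    public static void Main()
    {
        // A paragraph with no revision markup.
        string clean =
            "<w:document xmlns:w='" + WordNamespace + "'>" +
            "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>" +
            "</w:document>";

        // The same paragraph with its run wrapped in an inserted-run element.
        string revised =
            "<w:document xmlns:w='" + WordNamespace + "'>" +
            "<w:body><w:p>" +
            "<w:ins w:id='1' w:author='A' w:date='2010-01-01T00:00:00Z'>" +
            "<w:r><w:t>Hello</w:t></w:r></w:ins>" +
            "</w:p></w:body></w:document>";

        Console.WriteLine(HasRevisionMarkup(clean));    // prints "False"
        Console.WriteLine(HasRevisionMarkup(revised));  // prints "True"
    }
}
```

The same three steps (load the XML, register the w prefix, evaluate the union expression from the document node) are what would carry over to another XML DOM implementation.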