Review of XPath Semantics of LINQ to XML

In this post, I review the XPath semantics of LINQ to XML, and show some concrete examples of how I use those semantics in the XPath-in-Content-Controls approach to Open XML WordprocessingML document generation.

This post is the 15th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

In the post Generating Open XML WordprocessingML Documents using XPath in Content Controls I show an example XML document that I use to drive the document generation process. To run the examples in this post, copy and save that XML document as Data.xml in the bin/debug directory, so that the example code can load that XML document.

To use the XPath extensions of LINQ to XML, in addition to the using directive for System.Xml.Linq, you need a using directive for System.Xml.XPath. This brings a few extension methods into scope, including XPathSelectElement, XPathSelectElements, and XPathEvaluate. These methods extend XNode, so they are available on XElement (and XDocument). The following example demonstrates the XPathSelectElements extension method.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        XElement data = XElement.Load("Data.xml");
        IEnumerable<XElement> customers = data.XPathSelectElements("./Customer");
        foreach (var customer in customers)
            Console.WriteLine("{0}:{1}", customer.Element("CustomerID").Value,
                customer.Element("Name").Value);
    }
}

When you run this example with the data file shown in Generating Open XML WordprocessingML Documents using XPath in Content Controls, it outputs:


1:Andrew
2:Bob
3:Celcin

If you were to write this snippet using only LINQ to XML (not using the XPath extensions), it would look like this:


XElement data = XElement.Load("Data.xml");
IEnumerable<XElement> customers = data.Elements("Customer");
foreach (var customer in customers)
    Console.WriteLine("{0}:{1}", customer.Element("CustomerID").Value,
        customer.Element("Name").Value);
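To make the correspondence between the two styles concrete, here is a small sketch that checks that "./Customer" selects the very same element objects as the Elements("Customer") axis, and that shows XPathSelectElement and XPathEvaluate alongside it. The inlined data is a trimmed-down stand-in for Data.xml (an assumption made so the sketch runs without the file).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class XPathVersusAxis
{
    // A trimmed-down stand-in for Data.xml, inlined so the sketch runs on its own.
    public static readonly XElement Data = XElement.Parse(
        "<Customers>" +
        "<Customer><CustomerID>1</CustomerID><Name>Andrew</Name></Customer>" +
        "<Customer><CustomerID>2</CustomerID><Name>Bob</Name></Customer>" +
        "</Customers>");

    // "./Customer" selects the same element objects as the Elements("Customer") axis.
    public static bool SameCustomers()
    {
        return Data.XPathSelectElements("./Customer")
            .SequenceEqual(Data.Elements("Customer"));
    }

    // XPathSelectElement returns the first match in document order, or null.
    public static string FirstName()
    {
        XElement name = Data.XPathSelectElement("./Customer/Name");
        return name == null ? null : name.Value;
    }

    // XPathEvaluate returns object; a numeric XPath result arrives as a boxed double.
    public static double CustomerCount()
    {
        return (double)Data.XPathEvaluate("count(./Customer)");
    }

    static void Main()
    {
        Console.WriteLine(SameCustomers());
        Console.WriteLine(FirstName());
        Console.WriteLine(CustomerCount());
    }
}
```

Because the XPath extensions return the actual nodes of the tree rather than copies, SequenceEqual (which uses reference equality for XElement) is a fair test of equivalence here.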

Performance of the XPath Extension Methods

In the small example that I show in the video on the performance of the XPath-in-Content-Controls approach, the process is still IO bound! A couple of years ago, I did some rudimentary performance analysis of the XPath extension methods, and props to Ion Vasilian, the LINQ to XML developer who wrote those extension methods: they perform really well. If I recall correctly, on the particular tests that I selected, they were only about 30% slower than the LINQ to XML axis methods (which are amazingly fast).

As an aside, Microsoft employees are often restricted from discussing actual performance numbers for various APIs. There are lots of good reasons for this. Unless you do extensive analysis, making sure that you cover the use cases that customers will actually encounter, you might make claims that do not hold true in real-world situations, raising liability issues, and so on. There are cases where Microsoft employees do discuss actual performance characteristics using specific numbers, but you can be sure that there were lots of meetings where architects, program managers, developers, and test developers discussed all possible ramifications ad nauseam. But hey, I’m no longer a Microsoft employee, so I can tell you that in my off-the-cuff measurements, the XPath extension methods were maybe only 30% slower than the LINQ to XML axis methods.

Given that we are still IO bound on a 4-core laptop that is using an Intel solid state drive, and given that this approach can generate literally thousands of documents per minute, the XPath extension methods are fast enough! Good job, Ion!
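If you want to check the relative cost on your own data, a rough Stopwatch comparison is easy to put together. This is not the test harness I used; the synthetic document size, the iteration count, and the single expression being timed are arbitrary assumptions, and the timings will vary by machine and runtime. The sketch only guarantees that both approaches select the same number of elements.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class XPathTiming
{
    // Builds a synthetic document with the same shape as Data.xml.
    public static XElement BuildData(int customerCount)
    {
        return new XElement("Customers",
            Enumerable.Range(1, customerCount).Select(i =>
                new XElement("Customer",
                    new XElement("CustomerID", i),
                    new XElement("Name", "Customer" + i))));
    }

    // Times the XPath extension method against the equivalent axis method and
    // returns the element counts each one produced.
    public static Tuple<int, int> Compare(XElement data, int iterations)
    {
        int xpathCount = 0, axisCount = 0;

        // One untimed pass of each so JIT compilation is not counted.
        data.XPathSelectElements("./Customer").Count();
        data.Elements("Customer").Count();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            xpathCount = data.XPathSelectElements("./Customer").Count();
        sw.Stop();
        Console.WriteLine("XPathSelectElements: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            axisCount = data.Elements("Customer").Count();
        sw.Stop();
        Console.WriteLine("Elements:            {0} ms", sw.ElapsedMilliseconds);

        return Tuple.Create(xpathCount, axisCount);
    }

    static void Main()
    {
        Tuple<int, int> counts = Compare(BuildData(10000), 100);
        Console.WriteLine("{0} / {1}", counts.Item1, counts.Item2);
    }
}
```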

Evaluating XPath Expressions in Context

In the post where I introduce the XPath-in-Content-Controls approach to document generation, Generating Open XML WordprocessingML Documents using XPath in Content Controls, I discuss how the approach that I take in the template document is analogous to putting together an XSLT style sheet using the ‘push’ approach. That approach can be summarized as follows:

  • The SelectDocuments XPath expression is evaluated in the context of the root element of the source XML document.
  • The SelectValue XPath expression is evaluated in the context of each one of the elements in the result set of the SelectDocuments XPath expression.
  • The SelectRows XPath expression is also evaluated in the context of each one of the elements in the result set of the SelectDocuments XPath expression.
  • The XPath expressions in the prototype row (the second row) of the table are evaluated in the context of each one of the elements in the result set returned by the SelectRows XPath expression.
  • As usual with XML, the value of an element is the concatenation of all of its descendant text nodes; in other words, the textual content of the element. And, as usual with LINQ to XML, you can determine the value of an element by using the XElement.Value property.

The following example shows how each one of the XPath expressions is evaluated.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class Program
{
    static void Main(string[] args)
    {
        XElement data = XElement.Load("Data.xml");

        // The following XPath expression is evaluated in the context of
        // the root element of the XML document.
        IEnumerable<XElement> customers = data.XPathSelectElements("./Customer");

        // Each document would be generated in one iteration of the following
        // loop.
        foreach (var customer in customers)
        {
            // Assemble the filename for the document.  We get the string format
            // and the XPath expression from the Config content control.
            string fileName = String.Format("File{0}.docx",
                customer.XPathSelectElement("./CustomerID").Value);
            Console.WriteLine("Generating document: {0}", fileName);

            // Retrieve the values referenced by the SelectValue content controls.
            // The XPath expression is evaluated in the context of the Customer
            // element.
            string name = customer.XPathSelectElement("./Name").Value;
            string customerID = customer.XPathSelectElement("./CustomerID").Value;
            Console.WriteLine("CustomerID:{0}", customerID);
            Console.WriteLine("Name:{0}", name);

            // Retrieve the set of rows for the table.
            IEnumerable<XElement> rows =
                customer.XPathSelectElements("./Orders/Order");
            foreach (var row in rows)
            {
                // Retrieve the values for each row, based on the XPath
                // expressions in the prototype row.
                string productDescription =
                    row.XPathSelectElement("./ProductDescription").Value;
                string quantity =
                    row.XPathSelectElement("./Quantity").Value;
                string orderDate =
                    row.XPathSelectElement("./OrderDate").Value;
                Console.WriteLine(
                    "  ProductDescription:{0} Quantity:{1} OrderDate:{2}",
                    productDescription, quantity, orderDate);
            }
            Console.WriteLine();
        }
    }
}

When you run this example, you will see the following output, which exactly parallels the content of the generated documents.


Generating document: File1.docx
CustomerID:1
Name:Andrew
  ProductDescription:Bike Quantity:2 OrderDate:5/1/2002
  ProductDescription:Sleigh Quantity:2 OrderDate:11/1/2000
  ProductDescription:Plane Quantity:2 OrderDate:2/19/2000

Generating document: File2.docx
CustomerID:2
Name:Bob
  ProductDescription:Boat Quantity:2 OrderDate:8/9/2000
  ProductDescription:Boat Quantity:4 OrderDate:3/25/2001
  ProductDescription:Bike Quantity:1 OrderDate:6/5/2002

Generating document: File3.docx
CustomerID:3
Name:Celcin
  ProductDescription:Bike Quantity:2 OrderDate:2/24/2001
  ProductDescription:Boat Quantity:4 OrderDate:5/6/2001


How to Research Open XML Markup

Recently, there was a question on the forums at OpenXMLDeveloper.org: How do you draw a horizontal line below a paragraph?  I used this question as the focus for an 11-minute video on the general approach that I use to research Open XML markup.  The approach that I take is:

Shows my general approach for solving Open XML developer problems and issues.


Release of V2 of Doc Gen System: XPath in Content Controls

Update August 26, 2015: I have enhanced this document generation system, and published it as part of Open-Xml-PowerTools, which you can find at https://github.com/OfficeDev/Open-Xml-PowerTools. Going forward, I will be enhancing and maintaining that document generation system. Please feel free to clone / fork that repo, report issues on GitHub, and interact with me there.

Today I’m posting the release of version 2 of my simple document generation system.  In this example, you configure the document generation process by creating a template document that contains content controls.  You then enter XPath expressions in those content controls.  Those XPath expressions specify the data that the document generator pulls from the source data.  The source data is an XML document that contains data for each and every document that you generate.  The source XML document can also contain detail (children records) that populate tables in the generated document.  I detailed how the template document works in the post Generating Open XML WordprocessingML Documents using XPath Expressions in Content Controls.

This post is the 14th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Download: Generate Open XML WordprocessingML Documents using XPath Expressions in Content Controls

In my opinion, the use of XPath expressions in content controls is a superior approach to the one of entering C# code in content controls.  The code is cleaner and smaller (this first example is fewer than 240 lines of code).

I’ve recorded a short (2 minute) screen-cast that demonstrates this example in action.

Demonstrates the XPath-in-Content-Controls approach to document generation

So please download the example, try it out, and give me feedback.


Update Data behind an embedded Chart in an Open XML WordprocessingML Document

I’ve been lurking over at OpenXMLDeveloper.org, answering questions.  A fairly involved question came up recently, which is: If you have an embedded chart in a word-processing document, how do you update the data behind the chart?  As it turns out, you have to update the data in two different places.  You have to update the data in the embedded spreadsheet, and you have to update cached values in the word-processing document.

I’ve written a post on OpenXMLDeveloper.org that contains example code to update both the data in the embedded spreadsheet and the cached values.  In addition, I recorded the following screencast, which walks through the process:

Walks through the process of updating the data behind a chart that is embedded in an Open XML WordprocessingML document.
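As an illustration of the second half of that work, here is a minimal LINQ to XML sketch that rewrites the cached values of one series in a chart part. It assumes the series values are stored under c:val/c:numRef/c:numCache with a c:ptCount element present, and the chart XML is heavily abbreviated. It does not touch the embedded workbook, which still has to be updated separately; this is not the code from the OpenXMLDeveloper.org post.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

class ChartCache
{
    public static readonly XNamespace C = "http://schemas.openxmlformats.org/drawingml/2006/chart";

    // A heavily abbreviated chart part with one series of two cached values.
    public static XElement Sample()
    {
        return XElement.Parse(
            "<c:chartSpace xmlns:c='http://schemas.openxmlformats.org/drawingml/2006/chart'>" +
            "<c:chart><c:plotArea><c:barChart><c:ser><c:val><c:numRef>" +
            "<c:f>Sheet1!$B$2:$B$3</c:f>" +
            "<c:numCache><c:formatCode>General</c:formatCode><c:ptCount val='2'/>" +
            "<c:pt idx='0'><c:v>4.3</c:v></c:pt><c:pt idx='1'><c:v>2.5</c:v></c:pt>" +
            "</c:numCache></c:numRef></c:val></c:ser></c:barChart></c:plotArea></c:chart>" +
            "</c:chartSpace>");
    }

    // Replaces the cached values of one series (the embedded workbook is NOT updated).
    public static void UpdateCachedValues(XElement chartSpace, int seriesIndex, double[] values)
    {
        XElement numCache = chartSpace.Descendants(C + "ser").ElementAt(seriesIndex)
            .Element(C + "val").Element(C + "numRef").Element(C + "numCache");

        numCache.Elements(C + "pt").Remove();
        XElement ptCount = numCache.Element(C + "ptCount");
        ptCount.SetAttributeValue("val", values.Length);

        // New points go right after c:ptCount to keep the schema's element order.
        ptCount.AddAfterSelf(values.Select((v, i) =>
            new XElement(C + "pt",
                new XAttribute("idx", i),
                new XElement(C + "v", v.ToString(CultureInfo.InvariantCulture)))));
    }

    // Reads back the cached values of one series as strings.
    public static string[] CachedValues(XElement chartSpace, int seriesIndex)
    {
        return chartSpace.Descendants(C + "ser").ElementAt(seriesIndex)
            .Descendants(C + "numCache").Elements(C + "pt")
            .Select(pt => (string)pt.Element(C + "v")).ToArray();
    }

    static void Main()
    {
        XElement chart = Sample();
        UpdateCachedValues(chart, 0, new[] { 10.0, 20.5 });
        Console.WriteLine(string.Join(", ", CachedValues(chart, 0)));
    }
}
```

Formatting with the invariant culture matters here: the cached values are XML data, so they should use a period as the decimal separator regardless of the machine's locale.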


Replacing a Picture in a Picture Content Control in an Open XML WordprocessingML Document

You may have a picture content control where you want to replace the picture with a different picture.  This post shows the Open XML SDK V2 code that is necessary to find a picture content control with an alias of “MyPicture”.  It then finds the ImagePart, and then replaces the contents of the image part with a different image.


using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml.Drawing;

class Program
{
    static void Main(string[] args)
    {
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open("Test1.docx", true))
        {
            SdtBlock cc = doc.MainDocumentPart.Document.Body.Descendants<SdtBlock>()
                .FirstOrDefault(c =>
                    {
                        SdtProperties p = c.Elements<SdtProperties>().FirstOrDefault();
                        if (p != null)
                        {
                            // Is it a picture content control?
                            SdtContentPicture pict =
                                p.Elements<SdtContentPicture>().FirstOrDefault();
                            // Get the alias; a content control need not have one.
                            SdtAlias a = p.Elements<SdtAlias>().FirstOrDefault();
                            if (pict != null && a != null && a.Val != null &&
                                a.Val.Value == "MyPicture")
                                return true;
                        }
                        return false;
                    });
            string embed = null;
            if (cc != null)
            {
                Drawing dr = cc.Descendants<Drawing>().FirstOrDefault();
                if (dr != null)
                {
                    Blip blip = dr.Descendants<Blip>().FirstOrDefault();
                    if (blip != null)
                        embed = blip.Embed;
                }
            }
            if (embed != null)
            {
                IdPartPair idpp = doc.MainDocumentPart.Parts
                    .Where(pa => pa.RelationshipId == embed).FirstOrDefault();
                if (idpp != null)
                {
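                    ImagePart ip = (ImagePart)idpp.OpenXmlPart;
                    // FeedData replaces the part's bytes but not its content type,
                    // so After.jpg should be in the same format as the original image.
                    using (FileStream fileStream =
                        File.Open("After.jpg", FileMode.Open))
                        ip.FeedData(fileStream);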
                    Console.WriteLine("done");
                }
            }
        }
    }
}
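The search for the relationship ID can also be written directly against the markup with LINQ to XML, which some readers may find easier to follow. The sketch below assumes the main document part has already been loaded into an XDocument, and the sample markup is heavily abbreviated (a real w:drawing nests a:blip much more deeply). Unlike the SDK code, it matches both block-level and run-level content controls. With the ID in hand, MainDocumentPart.GetPartById in the Open XML SDK returns the corresponding part.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class PictureControl
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";
    static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Returns the r:embed ID of the first picture content control with the given
    // alias, or null if there is no such control or it contains no image.
    public static string FindPictureEmbedId(XDocument mainPart, string alias)
    {
        XElement sdt = mainPart.Descendants(W + "sdt").FirstOrDefault(s =>
        {
            XElement sdtPr = s.Element(W + "sdtPr");
            return sdtPr != null &&
                sdtPr.Element(W + "picture") != null &&
                (string)sdtPr.Elements(W + "alias").Attributes(W + "val").FirstOrDefault() == alias;
        });
        if (sdt == null)
            return null;
        XElement blip = sdt.Descendants(A + "blip").FirstOrDefault();
        return blip == null ? null : (string)blip.Attribute(R + "embed");
    }

    // Abbreviated main document part containing one picture content control.
    public static XDocument Sample()
    {
        return XDocument.Parse(
            "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'" +
            " xmlns:a='http://schemas.openxmlformats.org/drawingml/2006/main'" +
            " xmlns:r='http://schemas.openxmlformats.org/officeDocument/2006/relationships'>" +
            "<w:body><w:sdt><w:sdtPr><w:alias w:val='MyPicture'/><w:picture/></w:sdtPr>" +
            "<w:sdtContent><w:p><w:r><w:drawing><a:blip r:embed='rId4'/></w:drawing></w:r></w:p>" +
            "</w:sdtContent></w:sdt></w:body></w:document>");
    }

    static void Main()
    {
        Console.WriteLine(FindPictureEmbedId(Sample(), "MyPicture"));
    }
}
```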


Importing HTML that contains Numbering using altChunk

It is possible to import HTML that contains bullets or numbering using altChunk.  Word 2007 or 2010 imports the numbered items and creates the appropriate WordprocessingML markup, as well as necessary numbering styles, to create a word-processing document that looks as close as possible to the original HTML.  The following example alters a document by adding an altChunk element at the end of the document.  The HTML that is imported contains an ordered list.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        XNamespace w =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XNamespace r =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

        using (WordprocessingDocument myDoc =
            WordprocessingDocument.Open("Test3.docx", true))
        {
            string html =
@"<html>
<head/>
<body>
<h1>Html Heading</h1>
<ol>
<li>one.</li>
<li>two.</li>
<li>three.</li>
</ol>
</body>
</html>";
            string altChunkId = "AltChunkId1";
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                "application/xhtml+xml", altChunkId);
            using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
            using (StreamWriter stringStream = new StreamWriter(chunkStream))
                stringStream.Write(html);
            XElement altChunk = new XElement(w + "altChunk",
                new XAttribute(r + "id", altChunkId)
            );
            XDocument mainDocumentXDoc = GetXDocument(myDoc);
            mainDocumentXDoc.Root
                .Element(w + "body")
                .Elements(w + "p")
                .Last()
                .AddAfterSelf(altChunk);
            SaveXDocument(myDoc, mainDocumentXDoc);
        }
    }

    private static void SaveXDocument(WordprocessingDocument myDoc,
        XDocument mainDocumentXDoc)
    {
        // Serialize the XDocument back into the part
        using (Stream str = myDoc.MainDocumentPart.GetStream(
            FileMode.Create, FileAccess.Write))
        using (XmlWriter xw = XmlWriter.Create(str))
            mainDocumentXDoc.Save(xw);
    }

    private static XDocument GetXDocument(WordprocessingDocument myDoc)
    {
        // Load the main document part into an XDocument
        XDocument mainDocumentXDoc;
        using (Stream str = myDoc.MainDocumentPart.GetStream())
        using (XmlReader xr = XmlReader.Create(str))
            mainDocumentXDoc = XDocument.Load(xr);
        return mainDocumentXDoc;
    }
}
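The example above inserts the altChunk after the last w:p, which throws if the body has no paragraphs and lands in the wrong place if the body ends with a table. A slightly more defensive placement, sketched below as an alternative (not the approach in the listing), inserts the altChunk just before the body-level w:sectPr, which must be the last child of w:body, or appends it when there is no sectPr. The helper name AppendAltChunk is hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class AltChunkPlacement
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Adds a w:altChunk as the last block-level element of the body, keeping the
    // body-level w:sectPr (if any) at the very end.
    public static void AppendAltChunk(XElement body, string altChunkId)
    {
        XElement altChunk = new XElement(W + "altChunk", new XAttribute(R + "id", altChunkId));
        XElement sectPr = body.Elements(W + "sectPr").LastOrDefault();
        if (sectPr != null)
            sectPr.AddBeforeSelf(altChunk);
        else
            body.Add(altChunk);
    }

    static void Main()
    {
        // A body that ends with a table followed by the section properties.
        XElement body = XElement.Parse(
            "<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:p/><w:tbl/><w:sectPr/></w:body>");
        AppendAltChunk(body, "AltChunkId1");
        Console.WriteLine(body);
    }
}
```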


Generating Open XML WordprocessingML Documents using XPath Expressions in Content Controls

Over the last few days, I have completed a new prototype of an approach to Open XML WordprocessingML document generation. In this approach, I control the document generation process by placing XPath expressions in content controls. In contrast, the previous approach in this series of posts on document generation was controlled by writing C# code in content controls.

This post is the 13th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

When I started down this path of discovery around document generation, I would not have predicted it, but the XPath-in-Content-Controls approach is, in my opinion, much superior to the C#-in-Content-Controls approach. Going forward, I am going to abandon the C#-in-Content-Controls approach, and focus on this approach using XPath. There are some very cool places that we can take this approach.

To compare and contrast, the C#-in-Content-Controls prototype consists of fewer than 400 lines of code. While it was not fully fleshed out, and many necessary refinements remain, I would expect that a finished version would be perhaps 3000 lines of code.

The XPath-in-Content-Controls prototype that I am introducing in this post is even smaller. It is fewer than 240 lines of code. It is simpler, more robust, and more amenable to polishing. I expect that the finished example, including integration into a document-level add-in for Word 2010, will be fewer than 1000 lines of code. I’ll be posting V1 of the prototype with the next post in this series.

Driven from an XML Document

One of the nice things about the C#-in-Content-Controls approach is that you could drive the document generation process from literally any data you could get your hands on from the .NET Framework. In contrast, with this approach, there is one and only one form of data source, which is an XML document. And in this first prototype, I am restricting the data to an XML document that contains XML in no namespace. Allowing for namespaces in the XML means that I would need to provide a mapping between namespaces and namespace prefixes, and that would get in the way of discussing the architecture and merits of this approach. I’ll deal with this in the future.

In the meantime, if you have XML that uses namespaces (or any other variety of data sources), your first task is to transform that data source to XML in no namespace.
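For XML that is otherwise shaped correctly, that transformation can be as simple as rebuilding the tree with local names only. Here is a minimal recursive sketch using LINQ to XML; the helper name StripNamespaces and the sample namespace URIs are illustrative. It drops namespace declarations and assumes no element carries two attributes with the same local name in different namespaces (that case would throw a duplicate-attribute exception).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class NamespaceStripper
{
    // Recursively copies an element, keeping only local names for elements
    // and attributes, and dropping namespace declarations.
    public static XElement StripNamespaces(XElement e)
    {
        return new XElement(e.Name.LocalName,
            e.Attributes()
                .Where(a => !a.IsNamespaceDeclaration)
                .Select(a => new XAttribute(a.Name.LocalName, a.Value)),
            e.Nodes().Select(n => n is XElement ? (object)StripNamespaces((XElement)n) : n));
    }

    public static XElement Sample()
    {
        return XElement.Parse(
            "<c:Customers xmlns:c='urn:example:customers'>" +
            "<c:Customer x:id='1' xmlns:x='urn:example:extra'>" +
            "<c:CustomerID>1</c:CustomerID><c:Name>Andrew</c:Name></c:Customer>" +
            "</c:Customers>");
    }

    static void Main()
    {
        XElement plain = StripNamespaces(Sample());
        // Plain XPath expressions such as "./Customer" now match.
        foreach (XElement customer in plain.XPathSelectElements("./Customer"))
            Console.WriteLine(customer.Element("Name").Value);
    }
}
```

Text and other non-element nodes already have a parent, so LINQ to XML clones them when they are added to the new tree; the source document is left unchanged.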

The XML document should look something like this:

<Customers>
  <Customer>
    <CustomerID>1</CustomerID>
    <Name>Andrew</Name>
    <HighValueCustomer>True</HighValueCustomer>
    <Orders>
      <Order>
        <ProductDescription>Bike</ProductDescription>
        <Quantity>2</Quantity>
        <OrderDate>5/1/2002</OrderDate>
      </Order>
      <Order>
        <ProductDescription>Sleigh</ProductDescription>
        <Quantity>2</Quantity>
        <OrderDate>11/1/2000</OrderDate>
      </Order>
      <Order>
        <ProductDescription>Plane</ProductDescription>
        <Quantity>2</Quantity>
        <OrderDate>2/19/2000</OrderDate>
      </Order>
    </Orders>
  </Customer>
  <Customer>
    <CustomerID>2</CustomerID>
    <Name>Bob</Name>
    <HighValueCustomer>False</HighValueCustomer>
    <Orders>
      <Order>
        <ProductDescription>Boat</ProductDescription>
        <Quantity>2</Quantity>
        <OrderDate>8/9/2000</OrderDate>
      </Order>
      <Order>
        <ProductDescription>Boat</ProductDescription>
        <Quantity>4</Quantity>
        <OrderDate>3/25/2001</OrderDate>
      </Order>
      <Order>
        <ProductDescription>Bike</ProductDescription>
        <Quantity>1</Quantity>
        <OrderDate>6/5/2002</OrderDate>
      </Order>
    </Orders>
  </Customer>
  <Customer>
    <CustomerID>3</CustomerID>
    <Name>Celcin</Name>
    <HighValueCustomer>False</HighValueCustomer>
    <Orders>
      <Order>
        <ProductDescription>Bike</ProductDescription>
        <Quantity>2</Quantity>
        <OrderDate>2/24/2001</OrderDate>
      </Order>
      <Order>
        <ProductDescription>Boat</ProductDescription>
        <Quantity>4</Quantity>
        <OrderDate>5/6/2001</OrderDate>
      </Order>
    </Orders>
  </Customer>
</Customers>

While it isn’t required, it is more convenient to use a form where the Orders element is a child of the Customer element. The reason for this will become clear.

The XPath-in-Content-Controls Template Document

The next step in introducing this approach is to take a look at the template document that will drive document generation. While looking at this template, you can compare and contrast it to the template that contains C# code in content controls.

In this template document, I am going to borrow some nomenclature from XSLT. One of the attributes of the xsl:apply-templates element is the select attribute. If you place an XPath expression in the optional select attribute, XSLT applies templates to the set of nodes that the XPath expression selects. The XPath expression is evaluated relative to the context node, that is, the node currently being processed by the sequence constructor. I am going to use a very similar approach in the template document. In effect, I am going to turn an Open XML WordprocessingML document into something that is analogous to an XSLT style sheet. Don’t worry if this is not immediately clear; it will be before the end of this blog post series. The point of this paragraph is that I’m going to use the term Select to indicate an XPath expression that will be evaluated, where the results of the evaluation become the current context for other operations.

As usual, I am going to show content controls in design mode. Here is the template document, in its entirety. Of course, the circles and arrows are added by me to aid in explanation.

[Figure: the template document in design mode, with numbered callouts *1 through *5]

The Config Content Control (*1)

Starting at the bottom of the document, there is the Config content control, which contains XML, with a root element of Config.

The DataFileName element specifies the source XML document that contains the data that drives the document generation process.

The SelectDocuments element specifies an XPath expression that, when evaluated against the root element of the document, returns a collection of elements, each of which represents a document to be generated. In the case of the XML data file that I presented earlier, the XPath expression “./Customer” returns a collection of the Customer child elements of the root Customers element. Given that source data file, the document generation process will generate three documents.

The DocumentGenerationInfo element and its child elements contain the information necessary to control the physical generation of the documents: the directory where the documents will be placed, and a .NET format string (as used by String.Format) that works in conjunction with the SelectDocumentName XPath expression to assemble the generated file name.
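Here is a sketch of how those Config values might combine to decide which documents get generated and what they are called. The layout of the Config XML is an assumption: DataFileName, SelectDocuments, DocumentGenerationInfo, and SelectDocumentName come from the description above, but GeneratedDocumentsDirectory and DocumentNameFormat are hypothetical element names invented for this sketch, and the data is inlined instead of being loaded from DataFileName.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;
using System.Xml.XPath;

class ConfigSketch
{
    // Hypothetical Config layout; GeneratedDocumentsDirectory and
    // DocumentNameFormat are invented names for illustration only.
    public static readonly XElement Config = XElement.Parse(
        "<Config>" +
        "<DataFileName>Data.xml</DataFileName>" +
        "<SelectDocuments>./Customer</SelectDocuments>" +
        "<DocumentGenerationInfo>" +
        "<GeneratedDocumentsDirectory>GeneratedDocuments</GeneratedDocumentsDirectory>" +
        "<DocumentNameFormat>File{0}.docx</DocumentNameFormat>" +
        "<SelectDocumentName>./CustomerID</SelectDocumentName>" +
        "</DocumentGenerationInfo>" +
        "</Config>");

    // Returns the path of every document that would be generated.
    public static List<string> PlanDocuments(XElement config, XElement dataRoot)
    {
        XElement info = config.Element("DocumentGenerationInfo");
        string directory = (string)info.Element("GeneratedDocumentsDirectory");
        string nameFormat = (string)info.Element("DocumentNameFormat");
        string selectName = (string)info.Element("SelectDocumentName");

        List<string> paths = new List<string>();
        // SelectDocuments is evaluated against the root element of the data.
        foreach (XElement docElement in
            dataRoot.XPathSelectElements((string)config.Element("SelectDocuments")))
        {
            // SelectDocumentName is evaluated in the context of each selected element.
            string name = docElement.XPathSelectElement(selectName).Value;
            paths.Add(Path.Combine(directory, String.Format(nameFormat, name)));
        }
        return paths;
    }

    static void Main()
    {
        // A real run would load the file named by DataFileName with XElement.Load.
        XElement data = XElement.Parse(
            "<Customers><Customer><CustomerID>1</CustomerID></Customer>" +
            "<Customer><CustomerID>2</CustomerID></Customer></Customers>");
        foreach (string path in PlanDocuments(Config, data))
            Console.WriteLine(path);
    }
}
```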

As an aside, I initially played around with nested content controls instead of having a single content control that contains XML. While this approach works, maintaining nested content controls using the Word 2007 or Word 2010 user interface is idiosyncratic. I could write a pretty detailed bug report around the maintainability of nested content controls. Maintaining the XML in a single content control is a more satisfactory approach.

The SelectValue Content Control (*2)

At the top of the template document, you can see the SelectValue content controls. As mentioned in the last section, the SelectDocuments XPath expression selects multiple Customer elements. While generating each document in turn, each Customer element becomes the current context. The SelectValue XPath expression is then evaluated in the context of each Customer element in turn. One of the circled SelectValue XPath expressions selects the Name child element of the Customer element. The other circled SelectValue XPath expression selects the CustomerID child element of the Customer element. In XML, the value of an element is defined to be the concatenated descendant text nodes (in other words, its textual content). The document generation engine retrieves the value of the selected element and replaces the content control with the value.
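The "concatenated descendant text nodes" rule is easy to see when an element's text is split across a child element. In this small sketch (the markup is contrived for illustration), XElement.Value and the XPath string() function both produce the same string-value.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

class ElementValueSketch
{
    // The Name element's text is deliberately split across a child element.
    public static readonly XElement Customer =
        XElement.Parse("<Customer><Name>An<Part>dr</Part>ew</Name></Customer>");

    // XElement.Value concatenates all descendant text nodes.
    public static string ViaValue()
    {
        return Customer.XPathSelectElement("./Name").Value;
    }

    // XPath's string() function computes the same string-value.
    public static string ViaXPathString()
    {
        return (string)Customer.XPathEvaluate("string(./Name)");
    }

    static void Main()
    {
        Console.WriteLine(ViaValue());        // prints Andrew
        Console.WriteLine(ViaXPathString());  // prints Andrew
    }
}
```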

The Table Content Control (*3)

Just as the SelectValue content control is evaluated in the context of a Customer element, the SelectRows content control is also evaluated in the context of a Customer element. The difference is that SelectValue is expected to select a single element, whereas SelectRows is expected to select a collection of elements, one for each row in the table. For customer #1 (Andrew), the SelectRows XPath expression selects three Order elements. The XPath expressions (pointed to by *4) stored in the prototype row (the second row in the table) are evaluated in the context of each element selected by the SelectRows expression.

You also often see a similar pattern in properly written XSLT style sheets. One template is evaluated in the context of the root element, which selects a set of elements. An xsl:apply-templates causes an XPath expression to be evaluated in the context of each element selected by the first template. And an xsl:apply-templates in the sequence constructor of the second template causes an XPath expression to be evaluated in the context of each element selected by the second template, thereby causing a third set of templates to be applied.

Once you are familiar with this approach (sometimes called the ‘push’ approach), you never write XSLT style sheets in any other way. Inexperienced XSLT developers sometimes try to write style sheets by using loops and calling templates explicitly, instead of letting the pattern-matching power of XSLT do the heavy lifting. This incorrect approach is sometimes called the ‘pull’ approach.

To summarize, the SelectDocuments expression selects multiple elements, one for each document. The SelectRows expression, evaluated in the context of the elements selected by SelectDocuments, selects multiple elements, one for each row. The XPath expressions in the prototype row are evaluated in the context of the row elements selected by SelectRows.

The Conditional Content Control (*5)

The conditional content control works in exactly the same way as SelectValue and SelectRows. The SelectTestValue expression is evaluated in the context of the Customer element. The retrieved value is compared to the contents of the Match content control. If there is a match, the Conditional content control is replaced by the contents of the Content content control in the generated document.
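The test at the heart of the Conditional content control can be sketched in a few lines. This is an illustration rather than the prototype's code: the comparison rule (an exact, case-sensitive string comparison) and the treatment of an expression that selects nothing (no match) are assumptions, and the helper name IsMatch is hypothetical. The sample uses the HighValueCustomer element from the data file with a Match value of True.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

class ConditionalSketch
{
    public static readonly XElement Data = XElement.Parse(
        "<Customers>" +
        "<Customer><Name>Andrew</Name><HighValueCustomer>True</HighValueCustomer></Customer>" +
        "<Customer><Name>Bob</Name><HighValueCustomer>False</HighValueCustomer></Customer>" +
        "</Customers>");

    // Evaluates SelectTestValue in the context element and compares the value
    // to Match; true means the Content would be kept in the generated document.
    public static bool IsMatch(XElement context, string selectTestValue, string match)
    {
        XElement tested = context.XPathSelectElement(selectTestValue);
        return tested != null && tested.Value == match;
    }

    static void Main()
    {
        foreach (XElement customer in Data.XPathSelectElements("./Customer"))
            Console.WriteLine("{0}: include={1}",
                customer.Element("Name").Value,
                IsMatch(customer, "./HighValueCustomer", "True"));
    }
}
```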

Advantages of the XPath-in-Content-Controls Approach

There are several advantages to the XPath-in-Content-Controls approach over the C#-in-Content-Controls approach:

  • We eliminate the two-step process for generating documents. The program that processes the template (and processes all of the XPath expressions in the template) does the actual document generation. We don’t need to generate code, and then compile and run the generated code.
  • We can catch errors in the XPath expressions, and supply the template designer with good error messages that indicate the specific XPath expression that contains the error.
  • We eliminate all of the issues associated with typing C# code into content controls. When entering C# code in Word, of course there is no IntelliSense, and it can be difficult to catch errors in the C# code. The issues associated with Word replacing straight single or double quotes with smart quotes are significantly reduced. Note that the issues around quotes are not entirely eliminated: there are circumstances where the template designer may need to use single or double quotes in XPath expressions.
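The second advantage, reporting bad XPath expressions to the template designer, relies on the fact that an expression that fails to parse throws System.Xml.XPath.XPathException. Here is a minimal sketch of turning that into a message that names the offending expression; the helper TryCount and the message wording are hypothetical, and only syntax errors (XPathException) are caught here.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

class XPathErrorReporting
{
    public static readonly XElement Data = XElement.Parse(
        "<Customers><Customer><Name>Andrew</Name></Customer>" +
        "<Customer><Name>Bob</Name></Customer></Customers>");

    // Evaluates a node-selecting expression. On a syntax error, returns a message
    // naming the offending expression instead of letting the exception escape.
    public static string TryCount(XElement context, string xpath, out int count)
    {
        try
        {
            count = context.XPathSelectElements(xpath).Count();
            return null;
        }
        catch (XPathException e)
        {
            count = 0;
            return String.Format("Invalid XPath expression \"{0}\": {1}", xpath, e.Message);
        }
    }

    static void Main()
    {
        foreach (string xpath in new[] { "./Customer", "./Customer[" })
        {
            int count;
            string error = TryCount(Data, xpath, out count);
            Console.WriteLine(error ?? String.Format("{0} selects {1} element(s)", xpath, count));
        }
    }
}
```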

In the next post, I’ll show a video of this approach in action.

Future posts:

  • Show this approach at scale
  • Review XPath semantics of LINQ to XML
  • Examine the issues around namespaces in the source XML document
  • Show the process of changing the schema
  • Add robustness and error handling
  • Integrate as a document-level managed add-in for Word 2010.

This is fun!


Changing the Schema for this Open XML Document Generation System

Flexibility in a document generation system is very important to its usability.  We all know how it works.  You’ve been commissioned by the marketing department to put together a mailing to 50,000 customers.  After doing the work of putting together the template document, the marketing department *will* come ask for changes to the data and to the template document.  In the following screen cast, I show the process of adjusting the XML data that drives the document generation system, as well as adjusting the template document to use that data.

This post is the 12th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Shows changing the schema and the template document.

There are lots of disadvantages to this approach of editing C# code in content controls in a Word document:

  • It requires a developer to put together the template document.
  • If you write C# code that doesn’t compile in a content control, you don’t see any errors until you try to compile the generated program.
  • There is no Intellisense when editing this code.  In a couple of places, I ended up first getting a snippet of code to work in Visual Studio, and then pasting that code into the content control.  This is far from ideal.
  • The code that generates C# code from the template document is not long, only about 390 lines of code (see ProcessTemplate.cs in the Zip file).  However, it is a bit gnarly, particularly the bits that make it possible to have Value, Table, or Conditional content controls within a Conditional content control.  By contrast, the C# code that you write inside the template document is not so complex; it is just the code to generate the code.

There are advantages too:

  • The code is directly associated and stored with the document.  This is called ‘lexical proximity’ – you don’t need to find code in another file somewhere, and you don’t need to keep code and the template document in sync.
  • You can pull data from *any* data source.  I could easily modify the template document to use OData or the Managed Client Object Model to pull data from a SharePoint list.  I could also write some ADO.NET code to pull data from any SQL database.

It is not clear that the advantages outweigh the disadvantages.  In the next post in this series, I’m going to limit the data source to XML, and use XPath in the content controls.


Data Warehouse Book Recommendations

Some time ago, while still at Microsoft, I was involved in a small business intelligence project.  Many, many years ago, I was a database application developer, but times have changed.  I had to get up to speed on how to build and use a data warehouse in a hurry.  Fortunately, one of my best friends is a data warehouse developer for a large insurance company, and knew exactly which books to study.  Recently, I was asked for book recommendations, so I’m passing along the books that Bob McClellan recommended to me.

The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling

The Microsoft Data Warehouse Toolkit: With SQL Server 2008 R2 and the Microsoft Business Intelligence Toolset


Getting Started with Open XML PowerTools Markup Simplifier

On OpenXmlDeveloper.org, in one of the forums, there is a thread about how to clean the clutter of Word proofing-error markup out of an Open XML WordprocessingML document.  In PowerTools, in the HtmlConverter project, there is a class called MarkupSimplifier, which can remove proofing errors.  In addition, it can simplify WordprocessingML markup in a variety of ways, including removal of comments, content controls, and so on.  The blog post, Enabling Better Transformations by Simplifying Open XML WordprocessingML Markup, describes MarkupSimplifier in more detail.

Here is a small screen-cast that shows the use of MarkupSimplifier.  In the screen-cast, I use Open XML Package Editor Power Tool for Visual Studio 2010.

Walks through the process of downloading and compiling a sample for MarkupSimplifier.

Here is the listing of the small program that uses MarkupSimplifier:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using OpenXmlPowerTools;
using DocumentFormat.OpenXml.Packaging;

class Program
{
    static void Main(string[] args)
    {
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open("Test.docx", true))
        {
            SimplifyMarkupSettings settings = new SimplifyMarkupSettings
            {
                RemoveComments = true,
                RemoveContentControls = true,
                RemoveEndAndFootNotes = true,
                RemoveFieldCodes = false,
                RemoveLastRenderedPageBreak = true,
                RemovePermissions = true,
                RemoveProof = true,
                RemoveRsidInfo = true,
                RemoveSmartTags = true,
                RemoveSoftHyphens = true,
                ReplaceTabsWithSpaces = true,
            };
            MarkupSimplifier.SimplifyMarkup(doc, settings);
        }
    }
}
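To see what the proofing clutter looks like at the markup level, here is a rough LINQ to XML sketch that removes the w:proofErr elements Word uses to mark the start and end of spelling and grammar error ranges. It is only an approximation for illustration, not necessarily how MarkupSimplifier implements RemoveProof, and it assumes the main document part has already been loaded into an XDocument.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class ProofErrRemoval
{
    public static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Removes every w:proofErr element and returns how many were removed.
    // The marked text itself (in w:r/w:t) is left untouched.
    public static int RemoveProofErrors(XDocument mainPart)
    {
        List<XElement> proofErrs = mainPart.Descendants(W + "proofErr").ToList();
        foreach (XElement e in proofErrs)
            e.Remove();
        return proofErrs.Count;
    }

    // An abbreviated paragraph with a misspelled word wrapped in proofErr markers.
    public static XDocument Sample()
    {
        return XDocument.Parse(
            "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:body><w:p><w:proofErr w:type='spellStart'/>" +
            "<w:r><w:t>Teh</w:t></w:r><w:proofErr w:type='spellEnd'/></w:p></w:body>" +
            "</w:document>");
    }

    static void Main()
    {
        XDocument doc = Sample();
        Console.WriteLine("Removed {0} proofErr element(s)", RemoveProofErrors(doc));
        Console.WriteLine(doc);
    }
}
```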

