Opening and Modifying Embedded Documents in WordprocessingML Documents

There are several scenarios where you have an entire Open XML document or spreadsheet embedded in another document.  For example, you may have a chart that contains data from an Excel worksheet, and this chart and Excel worksheet are embedded in a WordprocessingML document.  Another scenario is where you are using altChunk to import several WordprocessingML documents into another WordprocessingML document.

In the case of the embedded Excel workbook, you may want to update the data for the chart.  After you have updated the cached data for the chart, you also need to update the data in the embedded Excel workbook, so that when the user double-clicks the chart, he or she can edit the same data that the chart displays.

The main idea is that you first open the containing WordprocessingML document, find the part that contains the embedded spreadsheet or document, read the embedded Open XML package from that part’s stream into a byte array, and then open the embedded package for modification.  After completing your modifications, you stream the modified package back into the part, and finally close the containing document.  Getting this code to work properly takes a bit of effort, so once I had it working, I decided to write a blog post: I don’t want to have to work it out again!
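
If you are not using PowerTools, the same round trip can be sketched with the Open XML SDK alone. This is a minimal sketch, not the code from the download: EmbeddedPackageHelper and ModifyEmbedded are hypothetical names, and it assumes the embedded package is small enough to hold in memory. It copies the part's stream into a resizable MemoryStream, hands that stream to a callback that opens and modifies the embedded package, and then feeds the modified bytes back into the part.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper (not part of the Open XML SDK or PowerTools).
static class EmbeddedPackageHelper
{
    // Copies the package stored in 'part' into memory, lets 'modify' change it,
    // and then replaces the part's contents with the modified bytes.
    public static void ModifyEmbedded(OpenXmlPart part, Action<MemoryStream> modify)
    {
        // new MemoryStream() (rather than new MemoryStream(byte[])) is resizable,
        // which matters because saving the embedded package can change its length.
        using (var ms = new MemoryStream())
        {
            using (Stream partStream = part.GetStream(FileMode.Open, FileAccess.Read))
            {
                partStream.CopyTo(ms);
            }

            ms.Position = 0;
            modify(ms);

            // Rewind and replace the part's data with the modified package.
            ms.Position = 0;
            part.FeedData(ms);
        }
    }
}
```

For example, to modify each .docx altChunk, call ModifyEmbedded on each AlternativeFormatImportPart and, inside the callback, open the stream with WordprocessingDocument.Open(ms, true) in a using block. Disposing the inner document saves it back into the MemoryStream before the helper rewinds the stream and calls FeedData.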

The following little program opens a WordprocessingML document and looks for runs (w:r elements) that contain more than one run properties element (w:rPr), which is invalid.  It fixes each such run by keeping the first w:rPr and removing the rest.  The same problem could also exist in documents that are imported using the altChunk technique, so the program also goes through every such embedded document and fixes the same issue there.
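
The fix itself needs nothing more than LINQ to XML. As a minimal sketch, assuming you already have a part's XML loaded as an XDocument (PowerTools' GetXDocument() gives you one), the following keeps the first w:rPr in each w:r and removes the rest. RunPropertiesFixer and RemoveExtraRunProperties are hypothetical names, and the WordprocessingML namespace is declared inline instead of using the PowerTools W class.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper illustrating the fix with plain LINQ to XML.
static class RunPropertiesFixer
{
    // The WordprocessingML main namespace.
    static readonly XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Keeps the first w:rPr child of each w:r and removes any additional ones.
    // Returns the number of w:rPr elements removed.
    public static int RemoveExtraRunProperties(XDocument doc)
    {
        var extras = doc
            .Descendants(w + "r")
            .SelectMany(r => r.Elements(w + "rPr").Skip(1))
            .ToList(); // materialize first so the tree is not modified during the query

        extras.ForEach(e => e.Remove());
        return extras.Count;
    }
}
```

Run against a run that contains both <w:rPr><w:b/></w:rPr> and <w:rPr><w:i/></w:rPr>, it returns 1 and leaves only the bold run properties; running it a second time returns 0.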

It would be a minor variation on this program to open a WordprocessingML document, then open an embedded workbook, make modifications to the embedded workbook, and finally make modifications to the containing document.
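
The embedded-workbook variation follows the same pattern, using a chart part's EmbeddedPackagePart and SpreadsheetDocument instead of an altChunk part and WmlDocument. This is a minimal sketch with the Open XML SDK only, assuming the charts live in the main document part and embed (rather than link to) their workbooks; EmbeddedWorkbookUpdater, UpdateChartWorkbooks, and the edit callback are hypothetical names. It only updates the workbook: the cached values in the chart part still have to be updated separately.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper: edits the workbook embedded in each chart of a .docx.
static class EmbeddedWorkbookUpdater
{
    public static void UpdateChartWorkbooks(string docxPath, Action<SpreadsheetDocument> edit)
    {
        using (var wDoc = WordprocessingDocument.Open(docxPath, true))
        {
            foreach (var chartPart in wDoc.MainDocumentPart.ChartParts)
            {
                var embedded = chartPart.EmbeddedPackagePart;
                if (embedded == null)
                    continue; // the chart may link to an external workbook instead

                using (var ms = new MemoryStream())
                {
                    // Copy the embedded .xlsx into a resizable MemoryStream.
                    using (var s = embedded.GetStream(FileMode.Open, FileAccess.Read))
                        s.CopyTo(ms);
                    ms.Position = 0;

                    // Disposing the SpreadsheetDocument saves the workbook into ms.
                    using (var xlDoc = SpreadsheetDocument.Open(ms, true))
                    {
                        edit(xlDoc);
                    }

                    // Write the modified workbook back into the chart's embedded part.
                    ms.Position = 0;
                    embedded.FeedData(ms);
                }
            }
        }
    }
}
```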

This little program makes use of three modules from PowerTools for Open XML: PtUtil.cs, PtOpenXmlUtil.cs, and PtOpenXmlDocument.cs.

Download: Example code that contains the entire project

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

class Program
{
    static void Main(string[] args)
    {
        var fileToFix = new FileInfo("../../../FixToFix.docx");
        var newWmlDoc = DocumentFixer(new WmlDocument(fileToFix.FullName));
        var fi = new FileInfo("../../../FixedDocument.docx");
        if (fi.Exists)
            fi.Delete();
        newWmlDoc.SaveAs(fi.FullName);
    }

    static WmlDocument DocumentFixer(WmlDocument wmlDoc)
    {
        using (OpenXmlMemoryStreamDocument streamDoc = new OpenXmlMemoryStreamDocument(wmlDoc))
        {
            using (WordprocessingDocument wDoc = streamDoc.GetWordprocessingDocument())
            {
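                // Fix every content part (main document, headers, footers, and so on):
                // keep the first w:rPr in each run and remove any additional ones.
                foreach (var part in wDoc.ContentParts())
                {
                    var runsWithMultipleRpr = part
                        .GetXDocument()
                        .Descendants(W.r)
                        .Where(r => r.Elements(W.rPr).Count() > 1)
                        .ToList();
                    foreach (var run in runsWithMultipleRpr)
                    {
                        run.Elements(W.rPr).Skip(1).Remove();
                    }
                    part.PutXDocument();
                }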
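                // Recursively fix documents imported with altChunk.  An altChunk can also
                // hold HTML, RTF, or plain text, so only process parts that contain a .docx.
                foreach (var altChunkPart in wDoc.MainDocumentPart.AlternativeFormatImportParts)
                {
                    if (altChunkPart.ContentType !=
                        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml")
                        continue;

                    // Read the embedded package into a byte array and fix it.
                    // The file name is only a placeholder; the document exists in memory.
                    WmlDocument fixedDoc;
                    using (Stream partStream = altChunkPart.GetStream())
                    {
                        byte[] ba = ReadFully(partStream);
                        fixedDoc = DocumentFixer(new WmlDocument("file.docx", ba));
                    }

                    // Stream the fixed package back into the altChunk part.
                    using (MemoryStream ms = new MemoryStream(fixedDoc.DocumentByteArray))
                    {
                        altChunkPart.FeedData(ms);
                    }
                }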
            }
            return streamDoc.GetModifiedWmlDocument();
        }
    }

    public static byte[] ReadFully(Stream input)
    {
        byte[] buffer = new byte[16 * 1024];
        using (MemoryStream ms = new MemoryStream())
        {
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                ms.Write(buffer, 0, read);
            }
            return ms.ToArray();
        }
    }
}