Workaround for Bad Link Issue in OpenXML SDK

A number of people have reported the OpenXML SDK throwing the exception “Invalid URI: The hostname could not be parsed.” This exception actually comes from the Uri class. The OpenXML SDK uses System.IO.Packaging to read the relationships for the document, and System.IO.Packaging throws when it tries to create a Uri object for a malformed link target, so the document cannot be opened or modified with either library. I have created a workaround in C# that allows you to modify or remove the bad links so that the document can then be opened successfully with the OpenXML SDK. If you just want the workaround without all the explanation, scroll down toward the end of this post and you will see the code there. You can also see a video that I have created that quickly explains the situation and workaround.

First, if you type the following into a Word document and press Enter, it will automatically make it a link:

http://(www.bing.com)

As explained above, the Uri class will throw an exception when System.IO.Packaging processes the relationship for that link. This is seen when using the following code to open the document using the OpenXML SDK:

    document = WordprocessingDocument.Open(fileName, true);
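The failure can also be checked without Word or the SDK. Here is a minimal sketch, separate from the workaround itself, that passes the same string to Uri.TryCreate and Uri.IsWellFormedUriString. The class name UriCheck is just for illustration. I would expect both calls to report the link as invalid, but the exact behavior may depend on your .NET version, so run it rather than take my word for it.

```csharp
using System;

class UriCheck
{
    static void Main()
    {
        string target = "http://(www.bing.com)";

        // TryCreate reports failure through its return value instead of
        // throwing UriFormatException the way the Uri constructor does.
        Uri parsed;
        bool created = Uri.TryCreate(target, UriKind.Absolute, out parsed);
        Console.WriteLine("TryCreate succeeded: " + created);

        // This is the same test the Clean method later in this post uses
        // to decide which relationship targets need fixing.
        bool wellFormed = Uri.IsWellFormedUriString(target, UriKind.Absolute);
        Console.WriteLine("IsWellFormedUriString: " + wellFormed);
    }
}
```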

If you were to try using System.IO.Packaging directly, the exception wouldn’t occur immediately. For example:

    Package pkg = Package.Open(fileName);
    PackagePart docPart = pkg.GetPart(PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), new Uri("word/document.xml", UriKind.Relative)));
    foreach (PackageRelationship item in docPart.GetRelationships())
    {
        Uri test = item.TargetUri;
    }

This code won’t throw the exception until the GetRelationships method is called. Unfortunately, because the relationships cannot even be enumerated, System.IO.Packaging cannot be used to find and correct the bad link.

In order to fix the file so that it can be opened, I needed to access the relationship file in the docx package directly. I used the DotNetZip library (http://dotnetzip.codeplex.com/) to extract that file, modify it, and save it so that the document could be opened normally using the OpenXML SDK. Here is the C# code that does that:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.IO;
    using System.Text;
    using DocumentFormat.OpenXml.Packaging;
    using Ionic.Zip;
    using System.Xml.Linq;

    namespace BadLinkTest
    {
        class Program
        {
            static void Main(string[] args)
            {
                string fileName = "../../Link.docx";
                WordprocessingDocument document;
                try
                {
                    document = WordprocessingDocument.Open(fileName, true);
                }
                catch (UriFormatException)
                {
                    // A relationship target is malformed: repair the file, then retry.
                    Clean(fileName, FixIt);
                    document = WordprocessingDocument.Open(fileName, true);
                }
                using (document)
                {
                    // Process document here
                }
            }

            // Removes parentheses, a common cause of the exception.
            static string FixIt(string old)
            {
                return old.Replace("(", "").Replace(")", "");
            }

            static void Clean(string fileName, Func<string, string> fixUri)
            {
                using (ZipFile zip = ZipFile.Read(fileName))
                {
                    // Read the main document's relationship part straight from the zip.
                    ZipEntry item = zip["word/_rels/document.xml.rels"];
                    MemoryStream stream = new MemoryStream();
                    item.Extract(stream);
                    stream.Position = 0;
                    XElement doc = XElement.Load(new StreamReader(stream));
                    bool changed = false;
                    // Only external targets that are not well-formed absolute URIs are touched.
                    foreach (XElement el in doc.Descendants()
                        .Where(n => n.Attribute("TargetMode") != null && n.Attribute("TargetMode").Value == "External"
                            && !Uri.IsWellFormedUriString(n.Attribute("Target").Value, UriKind.Absolute)))
                    {
                        el.Attribute("Target").Value = fixUri(el.Attribute("Target").Value);
                        changed = true;
                    }
                    if (changed)
                    {
                        // Write as UTF-8 without a BOM rather than the machine's default encoding.
                        zip.UpdateEntry(item.FileName, doc.ToString(), new UTF8Encoding(false));
                        zip.Save();
                    }
                }
            }
        }
    }

Notice that I am using Ionic.Zip.dll from DotNetZip. There are three methods here.

The Main method is an example of how the code would look in your own program. It catches the UriFormatException and calls the Clean method to fix it. The Clean method requires a delegate to the function that will actually change the link. My version of the delegate, FixIt, removes any parentheses from the link, since they are a common cause of the exception. Another option would be to return a dummy link, like “http://dummy.com”, no matter what the original link is.

The Clean method opens the document as a Zip file and extracts the document relationships into a MemoryStream. That can then be loaded into an XElement object for processing. The foreach loop finds all external relationships whose targets are not well-formed URIs. Those are modified by calling the delegate method, then the modified file is updated in the document and saved. As long as FixIt returns a modified link that is valid, the document can then be opened using the OpenXML SDK, as shown in the Main method.
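Since everything depends on the delegate returning a valid link, one option is to combine the two approaches described above: strip the parentheses first and fall back to the dummy link only if the result is still not well formed. This is just a sketch, and the names LinkFixers, FixOrReplace, and Placeholder are ones I made up for illustration.

```csharp
using System;

static class LinkFixers
{
    // Hypothetical placeholder; substitute whatever link suits your documents.
    const string Placeholder = "http://dummy.com";

    // Strips parentheses first. If the result is still not a well-formed
    // absolute URI, returns the placeholder instead.
    public static string FixOrReplace(string old)
    {
        string candidate = old.Replace("(", "").Replace(")", "");
        return Uri.IsWellFormedUriString(candidate, UriKind.Absolute)
            ? candidate
            : Placeholder;
    }
}
```

It can be passed to Clean in place of FixIt, for example Clean(fileName, LinkFixers.FixOrReplace). The tradeoff is that a link that cannot be repaired is silently replaced, so you may want to log the original value before discarding it.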

It is possible to have invalid links in other relationship files (e.g. those for headers and footers). I expect that you would be able to extend this code to include other relationship files as needed, but I could extend this example to handle them in a general way if there is interest. That code would be a lot more involved than this simple example.
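As a rough starting point for that extension, here is a sketch that applies the same fix to every relationship part in the package. It assumes that every relationship part is a zip entry whose name ends in ".rels", and that your version of DotNetZip has the UpdateEntry overload that takes an Encoding. The names RelsCleaner and CleanAll are hypothetical. I have not tried it against a wide range of documents, so treat it as an outline rather than a finished solution.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using Ionic.Zip;

static class RelsCleaner
{
    // Hypothetical helper: applies fixUri to every malformed external target
    // in every relationship part (*.rels) of the package.
    public static void CleanAll(string fileName, Func<string, string> fixUri)
    {
        using (ZipFile zip = ZipFile.Read(fileName))
        {
            // Collect the rewritten XML first; entries are only replaced
            // after all of them have been read.
            var updates = new Dictionary<string, string>();
            foreach (ZipEntry entry in zip.Entries.ToList())
            {
                if (!entry.FileName.EndsWith(".rels", StringComparison.OrdinalIgnoreCase))
                    continue;

                XElement doc;
                using (var stream = new MemoryStream())
                {
                    entry.Extract(stream);
                    stream.Position = 0;
                    doc = XElement.Load(new StreamReader(stream));
                }

                bool changed = false;
                foreach (XElement el in doc.Descendants())
                {
                    XAttribute mode = el.Attribute("TargetMode");
                    XAttribute target = el.Attribute("Target");
                    if (mode == null || target == null || mode.Value != "External")
                        continue;
                    if (Uri.IsWellFormedUriString(target.Value, UriKind.Absolute))
                        continue;

                    target.Value = fixUri(target.Value);
                    changed = true;
                }

                if (changed)
                    updates[entry.FileName] = doc.ToString();
            }

            // Assumed overload: UpdateEntry(string, string, Encoding).
            foreach (KeyValuePair<string, string> update in updates)
                zip.UpdateEntry(update.Key, update.Value, new UTF8Encoding(false));

            if (updates.Count > 0)
                zip.Save();
        }
    }
}
```

Calling RelsCleaner.CleanAll(fileName, FixIt) from the catch block would then take the place of the call to Clean.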

I hope this helps out everyone who has been struggling with this issue. Please let me know if you still have problems after trying the workaround.