Handling Invalid Hyperlinks (OpenXmlPackageException) in the Open XML SDK

One of the pernicious problems with the Open XML SDK is that if a document contains a hyperlink with an invalid Uri, Word, Excel, or PowerPoint will happily open the document, whereas System.IO.Packaging (and therefore the Open XML SDK) will throw an exception.

The difficulty is that we can’t fix this in the Open XML SDK itself; the exception is thrown from deep inside the classes in System.IO.Packaging.  The problem occurs because System.IO.Packaging creates a Uri object for every external relationship, and if the target of a relationship is not a valid Uri, the exception is thrown.  System.IO.Packaging has been frozen for quite some time, and it currently isn’t possible to submit this as a bug and expect it to be fixed.  This is understandable: hundreds of thousands of programs rely on these classes, and changing their semantics in even the smallest way might well cause compatibility problems.
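To make the failure concrete, here is a minimal sketch of what happens when a string cannot be turned into a Uri object.  The class name, method name, and sample strings are assumptions made for illustration; they do not come from a real document, and the sketch does not claim to reproduce the exact call that System.IO.Packaging makes.

```csharp
using System;

static class UriCheckDemo
{
    // Returns true when the single-argument Uri constructor accepts the string.
    // That constructor expects an absolute URI and throws UriFormatException otherwise.
    public static bool ParsesAsUri(string target)
    {
        try
        {
            var unused = new Uri(target);
            return true;
        }
        catch (UriFormatException)
        {
            return false;
        }
    }

    static void Main()
    {
        // Illustrative targets only: well-formed ones and malformed ones.
        string[] targets =
        {
            "http://example.com/",
            "mailto:someone@example.com",
            "http://",
            "not a uri"
        };

        foreach (var target in targets)
            Console.WriteLine("{0} -> {1}", target, ParsesAsUri(target));
    }
}
```

A relationship target that fails a check like this is the kind of value that needs to be replaced before the package can be opened.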

However, there is a fairly clean way to work around this.  I have written a small method, FixInvalidUri, that uses the classes in System.IO.Compression to open an Open XML document (of any type: DOCX, XLSX, or PPTX) and examine all of its external relationships.  If a relationship does not contain a valid Uri, the method calls a callback with the invalid target.  You can write this callback to return any valid Uri that you want; a possible candidate is http://broken-link/.  The FixInvalidUri method then updates the target of the external relationship with the valid Uri, and you can open and process the document as usual.
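The callback does not have to be a one-liner.  As a sketch of one possible design, the hypothetical BrokenLinkCollector class below (the name and its members are my own invention, not part of the Open XML SDK) records every invalid target it is handed and returns the placeholder Uri, so that you can report the broken links afterwards.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the Open XML SDK.
class BrokenLinkCollector
{
    private static readonly Uri Placeholder = new Uri("http://broken-link/");

    // Every invalid target that the handler has been asked to replace.
    public List<string> BrokenTargets { get; } = new List<string>();

    // Has the Func<string, Uri> shape: record the target, return a valid Uri.
    public Uri Handle(string brokenTarget)
    {
        BrokenTargets.Add(brokenTarget);
        return Placeholder;
    }

    static void Main()
    {
        var collector = new BrokenLinkCollector();
        Func<string, Uri> handler = collector.Handle;

        // Simulate the fixer reporting one invalid target (sample value only).
        Uri replacement = handler("http://[bad target");

        Console.WriteLine(replacement);
        Console.WriteLine(collector.BrokenTargets.Count);
    }
}
```

Because Handle has the same signature as the callback, it can be passed directly, for example as UriFixer.FixInvalidUri(fs, collector.Handle).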

Following is the complete listing of the class UriFixer, as well as the code to use it.  The approach that you take when using this class is to first attempt to open the document as usual, catching OpenXmlPackageException.  If that exception is thrown, and if the text of that exception contains “Invalid Hyperlink”, then the code calls UriFixer.FixInvalidUri.  After calling FixInvalidUri, the code then opens the fixed document (or spreadsheet / presentation) as usual.

using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

class Program
{
    static void Main(string[] args)
    {
        var fileName = @"..\..\..\Test.docx";
        var newFileName = @"..\..\..\Fixed.docx";
        var newFileInfo = new FileInfo(newFileName);

        // Work on a fresh copy so that the original document is left untouched.
        if (newFileInfo.Exists)
            newFileInfo.Delete();

        File.Copy(fileName, newFileName);

        WordprocessingDocument wDoc;
        try
        {
            // First attempt: open the document as usual.
            using (wDoc = WordprocessingDocument.Open(newFileName, true))
            {
                ProcessDocument(wDoc);
            }
        }
        catch (OpenXmlPackageException e)
        {
            if (e.ToString().Contains("Invalid Hyperlink"))
            {
                // Repair the invalid relationship targets in place, then try again.
                using (FileStream fs = new FileStream(newFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
                {
                    UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
                }
                using (wDoc = WordprocessingDocument.Open(newFileName, true))
                {
                    ProcessDocument(wDoc);
                }
            }
        }
    }

    // Callback for UriFixer: replaces every invalid target with a placeholder Uri.
    private static Uri FixUri(string brokenUri)
    {
        return new Uri("http://broken-link/");
    }

    private static void ProcessDocument(WordprocessingDocument wDoc)
    {
        var elementCount = wDoc.MainDocumentPart.Document.Descendants().Count();
        Console.WriteLine(elementCount);
    }
}

public static class UriFixer
{
    public static void FixInvalidUri(Stream fs, Func<string, Uri> invalidUriHandler)
    {
        XNamespace relNs = "http://schemas.openxmlformats.org/package/2006/relationships";
        using (ZipArchive za = new ZipArchive(fs, ZipArchiveMode.Update))
        {
            // ToList() takes a snapshot, because entries are deleted and re-created inside the loop.
            foreach (var entry in za.Entries.ToList())
            {
                // Only relationship parts (.rels) can contain external targets.
                if (!entry.Name.EndsWith(".rels"))
                    continue;
                bool replaceEntry = false;
                XDocument entryXDoc = null;
                using (var entryStream = entry.Open())
                {
                    try
                    {
                        entryXDoc = XDocument.Load(entryStream);
                        if (entryXDoc.Root != null && entryXDoc.Root.Name.Namespace == relNs)
                        {
                            var urisToCheck = entryXDoc
                                .Descendants(relNs + "Relationship")
                                .Where(r => r.Attribute("TargetMode") != null && (string)r.Attribute("TargetMode") == "External");
                            foreach (var rel in urisToCheck)
                            {
                                var target = (string)rel.Attribute("Target");
                                if (target != null)
                                {
                                    try
                                    {
                                        // new Uri(string) accepts only absolute URIs, so relative
                                        // targets are also passed to the handler.
                                        Uri uri = new Uri(target);
                                    }
                                    catch (UriFormatException)
                                    {
                                        Uri newUri = invalidUriHandler(target);
                                        rel.Attribute("Target").Value = newUri.ToString();
                                        replaceEntry = true;
                                    }
                                }
                            }
                        }
                    }
                    catch (XmlException)
                    {
                        // Not well-formed XML; leave this entry as it is.
                        continue;
                    }
                }
                if (replaceEntry)
                {
                    // Rewrite the relationship part under the same name with the fixed targets.
                    var fullName = entry.FullName;
                    entry.Delete();
                    var newEntry = za.CreateEntry(fullName);
                    using (StreamWriter writer = new StreamWriter(newEntry.Open()))
                    using (XmlWriter xmlWriter = XmlWriter.Create(writer))
                    {
                        entryXDoc.WriteTo(xmlWriter);
                    }
                }
            }
        }
    }
}

We are considering including this method in the Open XML SDK itself.  We would add a few overloads of the WordprocessingDocument.Open method, the SpreadsheetDocument.Open method, and the PresentationDocument.Open method.  These overloads would take the callback as an argument, just as in the above example.  These new methods would first attempt to open the document in the normal way.  If the attempt to open is successful, then these methods would return the newly opened document.  However, if System.IO.Packaging throws the OpenXmlPackageException, and if the document was opened for writing, then the method would open, modify, and save a fixed document.  It would then attempt to open again, and return the newly opened document.

With this approach, the idiom to open the document would be almost identical to the current approach to opening a document.  The only difference would be the inclusion of the callback method as an argument.

If the document was opened for read-only access, then the various methods would create a copy of the document in memory, fix the broken Uri objects, and then open and return the fixed document (for read-only access).
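To give a feel for how such an overload might behave, here is a rough sketch written as a standalone helper.  The class name OpenWithFix and the method name OpenWord are hypothetical; nothing like them exists in the Open XML SDK today, and the real overloads could well look different.  The sketch assumes the UriFixer class from the listing above and a reference to the Open XML SDK.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper for illustration; not an Open XML SDK API.
public static class OpenWithFix
{
    public static WordprocessingDocument OpenWord(
        string path, bool isEditable, Func<string, Uri> invalidUriHandler)
    {
        try
        {
            // Normal case: the document opens without any fixing.
            return WordprocessingDocument.Open(path, isEditable);
        }
        catch (OpenXmlPackageException e)
        {
            // Only the invalid hyperlink case is handled here.
            if (!e.ToString().Contains("Invalid Hyperlink"))
                throw;
        }

        if (isEditable)
        {
            // Opened for writing: fix the file in place, then open it again.
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
            {
                UriFixer.FixInvalidUri(fs, invalidUriHandler);
            }
            return WordprocessingDocument.Open(path, true);
        }

        // Read-only: leave the file alone and fix a copy held in memory.
        // An expandable MemoryStream is used because the archive is rewritten.
        var buffer = new MemoryStream();
        using (var source = File.OpenRead(path))
        {
            source.CopyTo(buffer);
        }
        buffer.Position = 0;

        // FixInvalidUri disposes the stream it is given (via ZipArchive),
        // but MemoryStream.ToArray still works on a closed stream.
        UriFixer.FixInvalidUri(buffer, invalidUriHandler);
        var fixedCopy = new MemoryStream(buffer.ToArray(), false);
        return WordprocessingDocument.Open(fixedCopy, false);
    }
}
```

A caller would then write something like using (var wDoc = OpenWithFix.OpenWord(fileName, false, FixUri)) and process the document as usual.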

Please feel free to comment about how this approach would work for you.  If we have agreement on this approach, then in a month or two, we will make the change to the open source version of the Open XML SDK.