Screen-Cast: Ultra High Performance Open XML Document Modification

Some applications need a high level of performance when accessing or modifying Open XML documents.  For example, you might have a search crawler for your CMS that regularly needs to process hundreds of thousands of Open XML documents.  Or you may need to sanitize internal company documents before releasing them for public consumption: searching for banned phrases, checking that trademarks and service marks are used correctly, and so on.  In either case, you need to architect a system that can process Open XML documents as fast as possible.  You might be tempted to write a multi-threaded application.  However…

The Open XML SDK is not thread safe.  The reason is that the Open XML SDK uses System.IO.Packaging, which in turn uses System.IO.IsolatedStorage.  IsolatedStorage stores information in a secret directory that is unique based on the strong name of the assembly or EXE that is using it.  If two threads (or two instances of the same EXE) use the Open XML SDK at the same time, both will attempt to modify content in this secret directory, and the result is spurious ObjectDisposedExceptions thrown from deep inside System.IO.Packaging.

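The solution presented in this screen-cast is to use multiple processes rather than multiple threads, with safeguards for invalid documents that cause code to throw exceptions and for documents that cause code to hang.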

The following screen-cast discusses the problem in depth, discusses one approach to mitigating the problem, and demonstrates an example program (which is attached to this post).

The explanation of the basic issues is repeated in videos #1, #3, and #4, so that you can go directly to the video that most closely addresses your scenario.

Link Summary
1. Handling ObjectDisposedExceptions in the Open XML SDK: Discusses the root cause of the spurious ObjectDisposedExceptions and one approach to mitigating them. This screen-cast focuses on the fix for the scenario where you have a web site with Open XML functionality, and two users may be accessing the Open XML SDK at the same time.
2. Walkthrough of Code that avoids ObjectDisposedExceptions: Walks through the code that I introduced in the first screen-cast.
3. Ultra High-Performance Open XML Document Generation: Demonstrates an approach that uses multiple processes to enable ultra-high performance Open XML document generation, while avoiding the ObjectDisposedExceptions that you would see if you took a naive multi-threaded approach.
4. Ultra High-Performance Open XML Document Modification and Processing: In a similar way to screen-cast #3 in this series, this screen-cast demonstrates an approach that uses multiple processes to enable ultra-high performance Open XML document modification and processing, while avoiding the ObjectDisposedExceptions that you would see if you took a naive multi-threaded approach. Processing existing documents (as opposed to generating new ones) complicates issues a bit. We need to be prepared for invalid documents that cause code to throw exceptions, and for documents that cause code to hang.
5. Ultra High-Performance Open XML Document Modification and Processing Code Walkthrough: Walks through the code that I introduced in the fourth screen-cast.
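
To make the idea in screen-cast #4 a little more concrete, here is a minimal sketch of the supervising side of a multi-process design: each document is handed to its own child process, and the parent classifies the outcome as processed, failed (the worker threw an exception), or hung (the worker exceeded a time limit and was killed). This is not the program attached to this post. It is written in Python purely as a language-neutral illustration, and the names `WORKER_SOURCE`, `run_worker`, and `process_all`, as well as the fake worker that only looks at the file name, are hypothetical. It also does not show how the worker processes avoid sharing the same isolated storage location, which is the subject of the screen-casts.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real worker executable. A real worker would open
# one Open XML document, process it, and exit. This one only inspects the name.
WORKER_SOURCE = """
import sys, time
path = sys.argv[1]
if "corrupt" in path:
    raise ValueError("invalid document: " + path)  # simulates an invalid document
if "hang" in path:
    time.sleep(60)                                 # simulates a document that hangs
print("processed " + path)
"""


def run_worker(path, timeout_seconds=5.0):
    """Process one document in its own child process and classify the outcome."""
    command = [sys.executable, "-c", WORKER_SOURCE, path]
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return (path, "hung")
    if completed.returncode != 0:
        # An unhandled exception in the worker shows up as a non-zero exit code.
        return (path, "failed")
    return (path, "ok")


def process_all(paths, max_workers=4, timeout_seconds=5.0):
    """Run up to max_workers child processes at a time, one per document."""
    # The threads below only wait on child processes. The document work itself
    # happens in separate processes, never in these threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = pool.map(lambda p: run_worker(p, timeout_seconds), paths)
        return dict(outcomes)
```

Calling `process_all(["a.docx", "corrupt.docx", "hang.docx"])` returns a dictionary that maps each path to its outcome, so one bad document costs at most one worker process and one timeout rather than the whole run.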