How the OpenXmlRegex Class Works

In a previous blog post, I introduced a new class (OpenXmlRegex) that is part of PowerTools for Open XML. This class enables you to search for content using a regular expression and, when it finds a match, optionally replace it with new content. Further, this class supports revision tracking. It can search through content that contains tracked revisions, and it can optionally record its replacements as tracked revisions in the document.

The OpenXmlRegex class is useful in a variety of scenarios. You could, for instance, define specific patterns such as <# KeyWord #>, search for those patterns, and then replace them with customized content based on the key word. Another interesting scenario is where you want to search for content that doesn't meet your corporate standards. You might have very specific ways to refer to certain products, or to divisions of your enterprise, and you might want to write a utility that searches for the various incorrect ways that document authors refer to them.
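To make the <# KeyWord #> scenario concrete, here is a minimal sketch of the idea in Python. It is not the OpenXmlRegex API: it works on a plain string only, and the names PLACEHOLDER and fill_placeholders, along with the sample key words, are made up for illustration. In a real DOCX the matched text can be split across several runs, so a plain string replace like this does not stand in for the class.

```python
import re

# Hypothetical pattern for placeholders of the form <# KeyWord #>.
# The key word is captured so it can be looked up in a dictionary.
PLACEHOLDER = re.compile(r"<#\s*(\w+)\s*#>")


def fill_placeholders(text, values):
    """Replace each <# KeyWord #> with values[KeyWord].

    Placeholders whose key word is not in `values` are left as they are.
    """
    def substitute(match):
        key = match.group(1)
        # Fall back to the original placeholder text for unknown keys.
        return values.get(key, match.group(0))

    return PLACEHOLDER.sub(substitute, text)


template = "Dear <# CustomerName #>, your order <# OrderId #> has shipped."
print(fill_placeholders(template, {"CustomerName": "Ada", "OrderId": "A-1027"}))
# prints: Dear Ada, your order A-1027 has shipped.
```

Leaving unknown key words untouched is a design choice for the sketch; it makes missing values easy to spot in the output rather than silently dropping them.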

In the previous screen-cast (Search and Replace Content in DOCX, PPTX using Regular Expressions), I walked through the API and explained some of the more interesting aspects of the behavior of the class. In the following screen-cast, I explain the algorithm. This will be interesting to students of Open XML and document formats in general, and to anyone who is porting this code to another platform such as Java, C++, or JavaScript. And finally, it will be interesting to me in a few years, when it will remind me of how the code works!

Cheers, Eric White