How to Retrieve the Text of a Bookmark from an OpenXML WordprocessingML Document

Developers sometimes use bookmarks to delineate text in an OpenXML WordprocessingML document and then, as part of a larger system, want to retrieve the text of a bookmark for further processing. Retrieving that text is complicated by the fact that the w:bookmarkStart and w:bookmarkEnd elements are often not at the same level of the XML hierarchy. The developer may therefore need to assemble text from the paragraph that contains the w:bookmarkStart element (the text after w:bookmarkStart and before the end of the paragraph), from all intervening paragraphs between the w:bookmarkStart and w:bookmarkEnd elements, and from the paragraph that contains the w:bookmarkEnd element (the text before w:bookmarkEnd). The result is a fairly messy and involved algorithm. However, there is an easier way to assemble the text of a bookmark accurately: you can ‘flatten’ the paragraphs, transforming the WordprocessingML into another form that is no longer valid WordprocessingML markup but is easier to extract the bookmark text from.
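To make the idea concrete, here is a minimal Python sketch. It is not the code from the screencast or the download, and it does not build a flattened document. Instead it walks the elements in document order, which treats the markup as one flat sequence and so sidesteps the hierarchy problem in a similar spirit. The function name bookmark_text and the SAMPLE markup are hypothetical, made up for illustration. The sketch assumes you have already read the main document part (word/document.xml) into a string, and it only collects w:t text, ignoring tabs, line breaks, and other run content.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML main namespace, in ElementTree's "{uri}" form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def bookmark_text(document_xml: str, bookmark_name: str) -> str:
    """Hypothetical helper: text between a named bookmark's start and end.

    Paragraph boundaries inside the bookmark become newlines. Returns an
    empty string if the bookmark is not found.
    """
    root = ET.fromstring(document_xml)
    bookmark_id = None   # w:id of the bookmark, set once its start is seen
    paragraphs = [[]]    # text fragments, grouped by paragraph

    # iter() yields elements in document order, so markers, paragraphs and
    # text elements arrive as one flat sequence regardless of nesting depth.
    for el in root.iter():
        if el.tag == W + "bookmarkStart" and el.get(W + "name") == bookmark_name:
            bookmark_id = el.get(W + "id")
        elif bookmark_id is None:
            continue  # still before the bookmark
        elif el.tag == W + "bookmarkEnd" and el.get(W + "id") == bookmark_id:
            break     # matching end marker: stop collecting
        elif el.tag == W + "p":
            paragraphs.append([])  # a new paragraph begins inside the bookmark
        elif el.tag == W + "t":
            paragraphs[-1].append(el.text or "")

    return "\n".join("".join(parts) for parts in paragraphs)


# Made-up sample: the bookmark starts mid-paragraph and ends two paragraphs later.
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
    '<w:p><w:r><w:t>Before </w:t></w:r><w:bookmarkStart w:id="0" w:name="Clause"/>'
    "<w:r><w:t>first part</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>middle</w:t></w:r></w:p>"
    '<w:p><w:r><w:t>last part</w:t></w:r><w:bookmarkEnd w:id="0"/>'
    "<w:r><w:t> after</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

if __name__ == "__main__":
    print(bookmark_text(SAMPLE, "Clause"))
```

For the sample above, the call returns three lines: "first part", "middle", and "last part". The text before the start marker and after the end marker is left out, even though both markers sit inside paragraphs that also contain other text.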

The following screencast walks through this algorithm.

Code is attached.

Download – Example Code