Introducing a new class for PowerTools for Open XML: TextReplacer

Recently I wrote some code that implemented search-and-replace for Open XML WordprocessingML documents.  I wrote that code for an Open XML developer who needed to implement that functionality using XML DOM, in a language other than C#.  Because XML DOM is standardized, translating the code to another language and another implementation of XML DOM is relatively straightforward.

I want to introduce search-and-replace functionality in a cmdlet in PowerTools for Open XML, but I have been moving PowerTools code away from XmlDocument, so I rewrote the search-and-replace code in LINQ to XML as a functional transform.  It was an interesting and fun project.  The video below introduces the TextReplacer class and compares it to the XmlDocument-based code that I presented previously.  It is an interesting comparison of imperative code (using XmlDocument) and functional code (using LINQ to XML).
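
To give a feel for what a functional transform in LINQ to XML looks like, here is a minimal sketch. It is not the TextReplacer code: the class name SimpleReplaceSketch and its Transform method are made up for illustration, and the sketch only handles search text that sits entirely inside a single w:t element, whereas text in a real document is often split across several runs. The point is only the shape of the code: the transform builds a new tree and never modifies the tree it was given.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical illustration; not part of PowerTools for Open XML.
public static class SimpleReplaceSketch
{
    // The main WordprocessingML namespace.
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Pure transform: returns a new tree and leaves the input untouched.
    // Assumes 'search' is non-empty (String.Replace rejects an empty search string).
    public static XNode Transform(XNode node, string search, string replace)
    {
        if (node is XElement element)
        {
            if (element.Name == W + "t")
            {
                // Rebuild the text element with the replaced value.
                return new XElement(element.Name,
                    element.Attributes(),
                    element.Value.Replace(search, replace));
            }

            // Rebuild every other element from its transformed children.
            return new XElement(element.Name,
                element.Attributes(),
                element.Nodes().Select(n => Transform(n, search, replace)));
        }

        // Other nodes (text, comments) are returned as they are; LINQ to XML
        // copies a node that already has a parent when it is added to a new tree.
        return node;
    }
}
```

Calling SimpleReplaceSketch.Transform(paragraph, "world", "there") on a w:p element returns a new paragraph with the replaced text, and the original paragraph is unchanged. An imperative XmlDocument version would instead walk the tree and modify the text nodes in place.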

You can download the TextReplacer class from this blog post (in an attachment at the bottom).

Introduces TextReplacer, which is LINQ to XML code that replaces text in WordprocessingML documents.

8 Comments »

  1. Iman said,

    August 5, 2011 @ 10:46 pm

    Hello Eric, we met about a year ago in Seattle (at Microsoft); I'm not sure whether you remember. I would really like to speak with you or email you. Is that possible?
    Thanks again
    -Iman

  2. Eric White said,

    August 6, 2011 @ 8:38 am

    Hi Iman, of course I remember. 🙂 You can email me at eric at ericwhite.com.

    -Eric

  3. Baz said,

    August 10, 2011 @ 1:07 am

    Would it be possible to adapt this approach to multi-line replacement text?

    For example, to take:

    Foo goes [INSERT FOO] here and [INSERT BAR] here.

    and end up with:

    Foo goes

    Foo with
    multiple lines

    here and

    Bar with
    multiple lines

    here.

    On the other extreme, could you replace a word/token with a table, or something similarly complex?

  4. Eric White said,

    August 12, 2011 @ 11:29 am

    Hi Baz,

    It would be possible to do a variation on this approach. The approach to take would be to invent a new ‘character’ that can be contained in a run and that represents a paragraph mark. This ‘character’ would carry around the paragraph properties. The transform would need to be done at the level of a block-level content container (see http://msdn.microsoft.com/en-us/library/ff686712.aspx). The first transform would produce markup that is not valid Open XML WordprocessingML markup, but is in a much better form for searching and replacing multiline text. Then there would be the search/replace transform. Then there would be one more transform back into valid WordprocessingML markup.

    This is on my list, but it may take a while before I get to it.

    -Eric

  5. Baz said,

    August 16, 2011 @ 1:03 am

    Eric,

    I sent you a reply via e-mail with more details. I hope you’ll take a look.

    On another note, your TextReplacer class doesn’t appear to handle empty replacements properly.

    Forgive me if WordPress butchers the formatting.

    your code:

    // The following code is locally impure, as this is the most expressive way to write it.
    XElement paragraphWithReplacedRuns = (XElement)CloneWithAnnotation(paragraphWithSplitRuns);
    for (int id = 1; id < matchId; ++id)
    {
        List<XElement> elementsToReplace = paragraphWithReplacedRuns
            .Elements()
            .Where(e => {
                // Type argument was lost in the blog formatting; MatchSemaphore is assumed.
                var sem = e.Annotation<MatchSemaphore>();
                if (sem == null)
                    return false;
                return sem.MatchId == id;
            })
            .ToList();
        elementsToReplace.First().AddBeforeSelf(
            new XElement(W.r,
                elementsToReplace.First().Elements(W.rPr),
                new XElement(W.t, replace)));
        elementsToReplace.Remove();
    }

    I modified the above with a String.IsNullOrEmpty check like this to fix the issue:


    // The following code is locally impure, as this is the most expressive way to write it.
    XElement paragraphWithReplacedRuns = (XElement)CloneWithAnnotation(paragraphWithSplitRuns);
    for (int id = 1; id < matchId; ++id)
    {
        List<XElement> elementsToReplace = paragraphWithReplacedRuns
            .Elements()
            .Where(e => {
                // Type argument was lost in the blog formatting; MatchSemaphore is assumed.
                var sem = e.Annotation<MatchSemaphore>();
                if (sem == null)
                    return false;
                return sem.MatchId == id;
            })
            .ToList();
        // Only insert a replacement run when there is replacement text.
        if (!String.IsNullOrEmpty(replace))
        {
            elementsToReplace.First().AddBeforeSelf(
                new XElement(W.r,
                    elementsToReplace.First().Elements(W.rPr),
                    new XElement(W.t, replace)));
        }
        elementsToReplace.Remove();
    }

    Cheers,
    -Baz

  6. Hong Tat said,

    October 17, 2012 @ 4:26 am

    Hi, thank you for your hard work; I enjoy reading your blog.
    Just for your information, I would like to point out a bug in the code. Replacing text with an empty string throws the error “Index was outside the bounds of the array.” at line 144 of TextReplacer.cs.
    I added a length check to that line:
    if (textValue.Length > 0 && (textValue[0] == ' ' || textValue[textValue.Length - 1] == ' '))

  7. Ed OBrien said,

    December 6, 2013 @ 7:49 pm

    Hong Tat: Thank you for that fix!

  8. elumalai said,

    November 24, 2014 @ 10:53 am

    Hi Eric,
    I'm new to Open XML. I want a TOC in Open XML, but the content in the TOC should show fewer than 50 characters.

    Please help ASAP

    Regards
    Elumalai Narayanan
