Getting Started with Open XML PowerTools Markup Simplifier

On OpenXmlDeveloper.org, in one of the forums, there is a thread about how to clean proofing-error clutter out of an Open XML WordprocessingML document.  In PowerTools, in the HtmlConverter project, there is a class called MarkupSimplifier, which can remove the proofing-error markup.  In addition, it can simplify WordprocessingML markup in a variety of ways, including removal of comments, content controls, and more.  The blog post "Enabling Better Transformations by Simplifying Open XML WordprocessingML Markup" describes MarkupSimplifier in more detail.

Here is a short screencast that shows the use of MarkupSimplifier.  It walks through the process of downloading and compiling a sample for MarkupSimplifier.  In the screencast, I use the Open XML Package Editor Power Tool for Visual Studio 2010.

Here is the listing of the small program that uses MarkupSimplifier:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using OpenXmlPowerTools;
using DocumentFormat.OpenXml.Packaging;

class Program
{
    static void Main(string[] args)
    {
        // Open the document for editing (the second argument is isEditable).
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open("Test.docx", true))
        {
            // Each flag turns one kind of simplification on or off.
            SimplifyMarkupSettings settings = new SimplifyMarkupSettings
            {
                RemoveComments = true,
                RemoveContentControls = true,
                RemoveEndAndFootNotes = true,
                RemoveFieldCodes = false,   // leave field codes in place
                RemoveLastRenderedPageBreak = true,
                RemovePermissions = true,
                RemoveProof = true,         // removes the proofing-error markup
                RemoveRsidInfo = true,
                RemoveSmartTags = true,
                RemoveSoftHyphens = true,
                ReplaceTabsWithSpaces = true,
            };
            // Apply the selected simplifications to the open document.
            MarkupSimplifier.SimplifyMarkup(doc, settings);
        }
    }
}
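
To give a rough idea of what removing proofing errors means at the markup level, here is a minimal sketch that uses only LINQ to XML. It is not the MarkupSimplifier implementation; the class name ProofErrSketch and the method StripProofErrors are hypothetical names made up for this illustration, and the sample paragraph is hand-written. It assumes that proofing errors show up as w:proofErr marker elements around the runs that Word flagged.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical illustration only; not part of PowerTools.
static class ProofErrSketch
{
    // The WordprocessingML main namespace.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns a copy of the element with every w:proofErr descendant removed.
    // The source element is left untouched.
    public static XElement StripProofErrors(XElement source)
    {
        XElement copy = new XElement(source);
        copy.Descendants(W + "proofErr").Remove();
        return copy;
    }

    static void Main()
    {
        // A hand-written paragraph with a misspelled word wrapped in markers.
        XElement paragraph = XElement.Parse(
            @"<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
                <w:proofErr w:type='spellStart'/>
                <w:r><w:t>teh</w:t></w:r>
                <w:proofErr w:type='spellEnd'/>
              </w:p>");

        XElement cleaned = StripProofErrors(paragraph);

        // Count the markers that remain after cleaning.
        Console.WriteLine(cleaned.Descendants(W + "proofErr").Count()); // prints 0
        Console.WriteLine(cleaned);
    }
}
```

The run and its text are kept; only the marker elements go away, which is why the document looks the same in Word while the markup becomes easier to process.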


17 Comments »

  1. Douglas Laudenschlager said,

    March 11, 2011 @ 10:17 pm

    Thanks, Eric! This is better than I hoped for. -Doug

  2. Eric White said,

    March 12, 2011 @ 12:01 am

    Hey Doug! Glad to see you here, and glad the code is helpful.

    -Eric

  3. Bhavesh said,

    September 19, 2012 @ 10:07 am

    Hi Eric, Do you have any pointers of Markup Simplifier for Power Point files?

    Thanks,
    Bhavesh

  4. Eric White said,

    September 21, 2012 @ 4:09 pm

    Hi Bhavesh, sorry, I don’t have anything to recommend in that area. It is a great idea, though.

  5. Otaku said,

    March 14, 2011 @ 1:28 am

    Eric – I hate to say this as it may send you down a path of doing absolutely nothing else for months on end (like what happened to me a while ago!), but we sure could use you over at Stackoverflow.com for some of the harder Office questions. I reference many of your writings there when applicable, like: http://stackoverflow.com/q/5278804/149573.

    I hang out in the usual suspect tags like openxml and vsto and vba. Come over and dig in 🙂

  6. Eric White said,

    March 15, 2011 @ 5:35 am

    Hi Otaku, I think that is a great idea! Due to other stuff (traveling, moving, new house), it will be 3-4 weeks before I can put any time in, but I’ll be there. As this post shows, I’ll also be helping out at OpenXMLDeveloper.org.

  7. infoware said,

    September 9, 2011 @ 5:49 pm

    I also have the same problem

    Hi,
    We are trying to read PowerPoint text using Open XML, but in some slides we face some issues:
    > In Open XML, the font size is not the same as in the PowerPoint slide.
    > In a slide we get each object using its x & y position, but in some slides it gives the wrong x & y position.
    > We insert slide text into MySQL, but if there is any special character then it will not get the proper character (I think it’s a problem with UTF-8).

    Thanks in advance

  8. mangesh jagtap said,

    September 3, 2012 @ 8:25 am

    i am facing one issue……
    in a word document i have placed content as e.g [test]
    but in xml it appears as [test]
    At run time i want to replace content of [test]…..
    due to above problem i am not able to replace the content…..

    Please advise

  9. mangesh jagtap said,

    September 3, 2012 @ 8:27 am

    // correction in above comment

    i am facing one issue……
    in a word document i have placed content as e.g [test]
    but in xml it appears as [ some tags test some tags]
    At run time i want to replace content of [test]…..
    due to above problem i am not able to replace the content…..

    Please advise

  10. Himanshu said,

    May 1, 2013 @ 4:00 pm

    Hi,
    When I try to use this code, it gives me error “Assembly generation failed — Referenced assembly ‘OpenXmlPowerTools’ does not have a strong name”

    I’ve downloaded the “PowerTools for OpenXML 2.2” from codeplex and added the reference to OpenXmlPowerTools.dll in the project.

    The example download on the codeplex that you have shown in video doesn’t exist now. Not sure whats the best way to use this DLL?

    Himanshu

  11. Eric White said,

    May 2, 2013 @ 1:05 am

    Hi, I’ll have to take a look at this. PowerTools 2.2 was put together with Visual Studio 2010 and PowerShell 2.0, so the project needs some tweaking to work with VS2012 and PowerShell 3.0.

    I haven’t examined building PowerTools for quite a while – I mostly use it in source code form. I am also in the process of putting together a new version of PowerTools. My goal is to make something that doesn’t require building anything. Just put the module in place and it works.

    Cheers, Eric

  12. Himanshu said,

    May 2, 2013 @ 11:41 am

    Hi Eric,
    Thanks for the reply. But I’m actually using it with Visual Studio 2010 in the source code.
    I’ve included the PowerTools 2.2 project that I downloaded from codeplex to my solution and then added the reference to OpenXmlPowerTools in my project so I can use MarkupSimplifier.
    I’ve got around the issue of “Referenced assembly ‘OpenXmlPowerTools’ does not have a strong name” by signing (Project Properties–> Signing tab–> Sign assembly). But now I’m getting another issue, when I try to debug the solution initially it loads fine but later gives an exception:
    System.IO.FileNotFoundException was caught
    Message=Could not load file or assembly ‘OpenXmlPowerTools, Version=2.2.0.0, Culture=neutral, PublicKeyToken=981466f262afc448’ or one of its dependencies. The system cannot find the file specified.

    I’m really stumped, I’ve tried adding reference to only the OpenXmlPowerTools DLL but that doesn’t seem to work either and gives same message.

    Not sure what might be causing it, please could you help.

    Thanks
    Himanshu

  13. Himanshu said,

    May 2, 2013 @ 11:57 am

    Also, I’m using Windows Server 2008 R2, and Sharepoint 2010 with Powershell 2.0.

  14. Matthew Osborn said,

    July 23, 2013 @ 10:14 pm

    Eric, Excellent work. Thank you for sharing! Do you have any ideas why the merging will not descend into Drawing Text Boxes?

  15. Eric White said,

    August 8, 2013 @ 1:26 pm

    Hi Matthew,

    I think it is a bug. I need to revisit that code, in any case. It is on the list of things to do.

    Cheers, Eric

  16. John said,

    March 19, 2015 @ 1:53 pm

    Hi Eric.
    I am currently using the logic above. The problem I am seeing is that it never changes the length of the InnerXml for the document. Isn’t that where I would see a lot of the changes and benefit? Here is my logic.
    Any help would be great.
    Thx,
    ~john
    using (WordprocessingDocument doc =
        WordprocessingDocument.Open(filename, true))
    {
        SimplifyMarkupSettings settings = new SimplifyMarkupSettings
        {
            RemoveComments = true,
            RemoveContentControls = true,
            RemoveEndAndFootNotes = true,
            RemoveFieldCodes = true,
            RemoveLastRenderedPageBreak = true,
            RemovePermissions = true,
            RemoveProof = true,
            RemoveRsidInfo = true,
            RemoveSmartTags = true,
            RemoveSoftHyphens = true,
            ReplaceTabsWithSpaces = true,
        };
        MarkupSimplifier.SimplifyMarkup(doc, settings);
    }

  17. Fun and games with SharePoint and Open XML said,

    May 8, 2015 @ 11:16 am

    […] also discovered the “Power Tools for Open XML” library, which can be used to simplify the formatting on a Word document a bit. I ran it on my template, then generated new C# code, and now I’m […]
