Importing HTML that contains Numbering using altChunk

It is possible to import HTML that contains bullets or numbering using altChunk. When Word 2007 or Word 2010 opens the document, it imports the numbered items and creates the appropriate WordprocessingML markup, along with the necessary numbering definitions, so that the resulting document looks as close as possible to the original HTML. The following example modifies an existing document by adding an altChunk element after the last paragraph of the document body. The imported HTML contains an ordered list.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        XNamespace w =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XNamespace r =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

        using (WordprocessingDocument myDoc =
            WordprocessingDocument.Open("Test3.docx", true))
        {
            string html =
@"<html>
<head/>
<body>
<h1>Html Heading</h1>
<ol>
<li>one.</li>
<li>two.</li>
<li>three.</li>
</ol>
</body>
</html>";
            // Relationship id that links the w:altChunk element to the imported part
            string altChunkId = "AltChunkId1";
            MainDocumentPart mainPart = myDoc.MainDocumentPart;

            // Store the HTML in its own part, related to the main document part
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                "application/xhtml+xml", altChunkId);
            using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
            using (StreamWriter stringStream = new StreamWriter(chunkStream))
                stringStream.Write(html);

            // <w:altChunk r:id="AltChunkId1"/> marks where Word imports the HTML
            XElement altChunk = new XElement(w + "altChunk",
                new XAttribute(r + "id", altChunkId)
            );

            // Insert the altChunk after the last paragraph in the body, then save
            XDocument mainDocumentXDoc = GetXDocument(myDoc);
            mainDocumentXDoc.Root
                .Element(w + "body")
                .Elements(w + "p")
                .Last()
                .AddAfterSelf(altChunk);
            SaveXDocument(myDoc, mainDocumentXDoc);
        }
    }

    private static void SaveXDocument(WordprocessingDocument myDoc,
        XDocument mainDocumentXDoc)
    {
        // Serialize the XDocument back into the part
        using (Stream str = myDoc.MainDocumentPart.GetStream(
            FileMode.Create, FileAccess.Write))
        using (XmlWriter xw = XmlWriter.Create(str))
            mainDocumentXDoc.Save(xw);
    }

    private static XDocument GetXDocument(WordprocessingDocument myDoc)
    {
        // Load the main document part into an XDocument
        XDocument mainDocumentXDoc;
        using (Stream str = myDoc.MainDocumentPart.GetStream())
        using (XmlReader xr = XmlReader.Create(str))
            mainDocumentXDoc = XDocument.Load(xr);
        return mainDocumentXDoc;
    }
}
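The listing above finds the insertion point with `.Elements(w + "p").Last()`. That call throws if the body contains no paragraphs, and if the document ends with a table the chunk lands before the table rather than at the end. The sketch below is one possible alternative, not the original author's code: a hypothetical `InsertAltChunk` helper that places the `w:altChunk` immediately before the body's `w:sectPr` (which must remain the last child of `w:body`), or appends it when there is no `w:sectPr`. It works only on the `XDocument`, so it could sit between the `GetXDocument` and `SaveXDocument` calls; here it runs against a minimal hand-written document.xml so it needs nothing beyond System.Xml.Linq. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
XNamespace r = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

// A minimal, hand-written main document part: one paragraph, then section properties.
XDocument doc = XDocument.Parse(
@"<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
  <w:body>
    <w:p/>
    <w:sectPr/>
  </w:body>
</w:document>");

InsertAltChunk(doc, "AltChunkId1");

// Print the local names of the body's children in document order.
XElement resultBody = doc.Root!.Element(w + "body")!;
Console.WriteLine(string.Join(", ", resultBody.Elements().Select(e => e.Name.LocalName)));
// prints: p, altChunk, sectPr

// Hypothetical helper (not part of the Open XML SDK): adds <w:altChunk r:id="..."/>
// as the last block-level element of the body.
void InsertAltChunk(XDocument mainDocumentXDoc, string altChunkId)
{
    XElement body = mainDocumentXDoc.Root?.Element(w + "body")
        ?? throw new InvalidOperationException("The document has no w:body element.");

    XElement altChunk = new XElement(w + "altChunk",
        new XAttribute(r + "id", altChunkId));

    // w:sectPr, when present, must stay the last child of w:body,
    // so the altChunk goes immediately before it; otherwise append it.
    XElement? sectPr = body.Elements(w + "sectPr").LastOrDefault();
    if (sectPr != null)
        sectPr.AddBeforeSelf(altChunk);
    else
        body.Add(altChunk);
}
```

Because `w:altChunk` is a block-level element, placing it directly before `w:sectPr` puts the imported list at the very end of the document whether the body ends with a paragraph or a table.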


3 Comments

  1. Navin Agarwal said,

    June 26, 2011 @ 12:08 pm

    Can I import XHTML that contains tables and images into PowerPoint using altChunk?
    The XHTML comes from an InfoPath form that contains an RTF field; it is stored in SQL, and I want to insert this XHTML data into PowerPoint.

  2. Graham said,

    August 22, 2011 @ 3:51 pm

    Nice article Eric. Considering how long this stuff has been out there, there is very little good information about this.
    Question for you: I want to take a web page (or part of it) and generate a document. The above code works great, but there is no mechanism for applying the CSS styles that are present when the original web page is viewed in a browser.
    Apart from adding them inline to my HTML string, or basing the Word document on a template that has equivalent styles, is there any way of achieving this? (It would be nice to simply point the altChunk to a CSS file, but I suspect that may not be possible!)
    Thanks.

  3. Rohit said,

    October 10, 2016 @ 7:47 am

    I am creating a one-slide PowerPoint file using Open XML in .NET C#. I have tagged the placeholder in the PPT that I need to update programmatically. I am able to find the placeholder and can update its value from the database.

    Now the problem is that I need to display some HTML that the user has entered using a WYSIWYG editor, in the same format the user entered it, e.g. as bullets.

    When I replace the placeholder with the HTML text, the HTML is pasted as is, including all the tags.

    Please advise.
