Eric White's Blog
Open XML, SharePoint, and Office
Generating Open XML WordprocessingML Documents
This is a blog post series on parameterized Open XML WordprocessingML document generation. While it is easy enough to write a purpose-built application that generates WordprocessingML documents, developers too often find themselves building new applications for similar but slightly different scenarios. With the right approach, however, it is possible to build a simple document generation system that makes it far easier to address a wide variety of scenarios. I believe that a flexible document generation system can be written in a few hundred lines of code. This page lists the posts that are part of the series. I’ll update this page with new posts as I write them.
When I started this series, I used a design in which the document template designer writes C# code in content controls. That approach is interesting, and it was fun to write. However, I have determined that writing XPath expressions in content controls is a superior approach. I am going to let those twelve posts stand on their own and start a new list of posts. I’ll maintain both lists on this page.
XPath-in-Content-Controls
| # | Post Title | Description |
| --- | --- | --- |
| 1 | Generating Open XML WordprocessingML Documents using XPath Expressions in Content Controls | In this post, I present the ideas around configuring the document template for mass document generation using XPath expressions in content controls. |
| 2 | Release of V2 of Doc Gen System: XPath in Content Controls | Release of V2 of a simple document generation system. In this example, you configure the document generation process by creating a template document that contains content controls. You then enter XPath expressions in the content controls to configure how the document generator pulls data from a source XML file. |
| 3 | Review of XPath Semantics of LINQ to XML | In this post, I discuss the semantics of the XPath extension methods. In addition, I provide a small example that demonstrates how the various XPath expressions in the template document are related to each other. |
| 4 | Change the Schema for Simple Free Doc Generation System | Short (3-minute) screen-cast that shows changing the schema for the XPath-in-Content-Controls approach to document generation. |
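To make the idea behind this list concrete, here is a minimal sketch, written in Python rather than the C# used in the posts, of how XPath expressions stored in a template can pull data from a source XML file. It is not the code from the series: the element names, the `TEMPLATE` dictionary that stands in for content controls, and the `generate` function are all assumptions made for illustration. One expression selects the nodes that each become a document, and the other expressions are evaluated relative to those nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical source data; the element names are invented for this example.
SOURCE_XML = """
<Customers>
  <Customer>
    <Name>Andrew</Name>
    <City>Seattle</City>
    <Orders>
      <Order><Product>Bike</Product><Quantity>2</Quantity></Order>
      <Order><Product>Helmet</Product><Quantity>1</Quantity></Order>
    </Orders>
  </Customer>
  <Customer>
    <Name>Celcin</Name>
    <City>Portland</City>
    <Orders>
      <Order><Product>Boat</Product><Quantity>1</Quantity></Order>
    </Orders>
  </Customer>
</Customers>
"""

# Stand-in for a template document: each entry plays the role of a content
# control whose text is an XPath expression. The keys are hypothetical.
TEMPLATE = {
    "documents": "./Customer",                          # one document per match
    "fields": {"Name": "./Name", "City": "./City"},     # relative to a document node
    "table_rows": "./Orders/Order",                     # relative to a document node
    "table_cells": ["./Product", "./Quantity"],         # relative to a row node
}


def generate(source_xml, template):
    """Evaluate the template's XPath expressions against the source XML.

    ElementTree supports only a subset of XPath, which is enough here.
    """
    root = ET.fromstring(source_xml)
    documents = []
    for doc_node in root.findall(template["documents"]):
        fields = {
            name: doc_node.findtext(xpath, default="")
            for name, xpath in template["fields"].items()
        }
        rows = [
            [row.findtext(cell, default="") for cell in template["table_cells"]]
            for row in doc_node.findall(template["table_rows"])
        ]
        documents.append({"fields": fields, "rows": rows})
    return documents


for doc in generate(SOURCE_XML, TEMPLATE):
    print(doc["fields"], doc["rows"])
```

Because every expression is relative to the node selected one level up, changing the schema of the source data means editing only the expressions, not the generator.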
C#-in-Content-Controls
| # | Post Title | Description |
| --- | --- | --- |
| 1 | Generating Open XML WordprocessingML Documents | Introduces this blog post series, outlines the goals of the series, and describes various approaches that I may take as I develop some document generation examples. |
| 2 | Using a WordprocessingML Document as a Template in the Document Generation Process | In this post, I examine the approaches for building a template document for the document generation process. In my approach to document generation, a template document is a DOCX document that contains content controls that will control the document generation process. |
| 3 | The Second Iteration of the Template Document | Based on feedback, this post shows an updated design for the template document. |
| 4 | More enhancements to the Template Document | This post discusses an enhancement to the document template that enables the template designer to add infrastructure code. In addition, it discusses how the document generation process will work. |
| 5 | Generating C# Code from an XML Tree using Virtual Extension Methods | Presents code that, given any arbitrary LINQ to XML tree, can generate code that will create that tree. The code to generate code is written as a recursive functional transform from XML to C#. |
| 6 | Simulating Virtual Extension Methods | Shows one approach for extending a class hierarchy by simulating virtual extension methods. |
| 7 | Refinement: Generating C# code from an XML Tree using Virtual Extension Methods | Makes the approach of generating code that will generate an arbitrary XML tree more robust. |
| 8 | Text Templates (T4) and the Code Generation Process | Explores T4 text templates, and considers how they can be used in the Open XML document generation process. |
| 9 | A Super-Simple Template System | Defines a template system that makes it easier to generate C# code. |
| 10 | Video of use of Document Generation Example | Screen-cast that shows the doc gen system in action. |
| 11 | Release of V1 of Simple DOCX Generation System | Release of the first version of this simple prototype doc gen system. |
| 12 | Changing the Schema for the Document Generation System | Contains a short screen-cast that shows how to adjust the data coming into the doc gen system, and to adjust the document template to use the new data. |
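Posts 5 and 7 describe a recursive transform from an XML tree to C# code that recreates that tree. The sketch below illustrates the general shape of such a transform. It is written in Python, uses a plain recursive function rather than the virtual extension methods of the posts, and is not the author's implementation. The helper names `cs_string` and `to_csharp` are hypothetical, and the sketch assumes an input without namespaces or mixed content.

```python
import xml.etree.ElementTree as ET


def cs_string(value):
    """Render a Python string as a C# string literal (basic escapes only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_csharp(element, indent=0):
    """Return C# source text that builds `element` with LINQ to XML constructors.

    Assumptions for this sketch: no namespaces, and text that follows a child
    element (ElementTree's `tail`) is ignored.
    """
    pad = "    " * indent
    inner_pad = "    " * (indent + 1)

    # Constructor arguments after the element name: attributes, text, children.
    args = [
        f"new XAttribute({cs_string(name)}, {cs_string(value)})"
        for name, value in element.attrib.items()
    ]
    text = (element.text or "").strip()
    if text:
        args.append(cs_string(text))
    # Recurse into children; strip the leading pad because it is re-added below.
    args.extend(to_csharp(child, indent + 1).lstrip() for child in element)

    body = "".join(f",\n{inner_pad}{arg}" for arg in args)
    return f"{pad}new XElement({cs_string(element.tag)}{body})"


tree = ET.fromstring('<Root id="1"><Child>text</Child><Empty/></Root>')
print(to_csharp(tree))
```

Each call handles one element and delegates its children to the same function, so the generated code nests in the same way the source tree does.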
Hi Eric,
Would you mind elaborating on your MSDN write-up entitled “Generating Documents from SharePoint with Open XML Content Controls” (http://msdn.microsoft.com/en-us/magazine/ee532473.aspx)?
If I want to run the code that you graciously made available on that page, what steps do I need to follow? For instance, do I need to create one or more document libraries in the site where I am going to deploy the SharePoint solution? Please advise.
Sincerely
Hi Starkey,
That code was written for SharePoint 2007, and I didn’t include any procedures for building / running it. However, before the article was published, I tested the code on the beta version of SharePoint 2010, and the code ran without modifications.
I will record a short screen-cast of building that example on SharePoint 2010, and post it shortly.
-Eric
Hi Eric,
I too would benefit from you revisiting this topic: http://msdn.microsoft.com/en-us/magazine/ee532473.aspx. A screen-cast would be great!
I am learning ASP.NET, C# and SharePoint Development – I’m not from a programming background – so it’s a bit of a steep learning curve!!
I think all of the work you do re: Open XML is awesome – I just wish I could understand it all…
Nicola
Hi Nicola,
I would love to get to that screen-cast. I will as soon as I am able.
-Eric
Hi Eric,
Thanks for the excellent work and sharing it with the community.
I hope someday to add to the discussion. My project area deals with low volume high complexity documents in contrast to this series which addresses higher volume lower complexity documents. Are you interested in comments that are somewhat off topic but that would introduce ideas or questions associated with these contrasting situations?
Sincerely, Mike
Hi Mike,
Yes! I’m absolutely interested in ideas/questions associated with low volume high complexity docs.
-Eric
Great.
Even though some of the blog posts are superseded by others, I’d still like to comment on some of the earlier ones. Will you be reviewing older posts, or just the more recent ones that reflect the current version of things?
- Mike
I receive notification and respond to any comment on any post, so yes, please post comments on the relevant posts.
Toward my goal of contributing to the discussion and as a point of reference, let me describe the business situation behind my categorization of Low Volume High Complexity (LVHC):
1) Document sets are created to support a project. A typical set can include 50-70 documents and contains simple information letters, moderately complex legal documents, interview and data collection forms, calculation forms, summarization forms, task tracking forms and client supplied forms. Some of the documents blend a letter style with a form like style of information.
Forms are sometimes dictated by clients and do not fall into computationally convenient patterns. Other computationally inconvenient situations arise when forms are tight or the real estate is at a premium.
2) The data embedded in the documents has a lot of interdependence and does not lend itself easily to support by a relational data model. Partial support of documents by a data source and manual completion of the rest is sometimes necessary. Over time, the data source support is enhanced to replace the manual completion part.
3) Each document set is used 1-20 times for a project depending on the number of units handled by the project.
4) Units are worked individually, although occasionally a document is generated for all units at once. Typically, documents are generated 1 at a time, occasionally 5-20, and rarely 100 or more.
5) Different documents are needed at different times and are created over a period of a year or more. Regeneration of a new version of a document is common.
6) Documents are often taken from an existing project to form the basis of a new project. Typically the templates are copied and modified for the new project. However, sometimes a generated document that was modified manually after generation to suit a particular situation is used. The process used is ad-hoc and depends on how long it takes to create a new document template or document set. Generated documents need to be capable of being turned back into a template.
7) The business involves consulting, is quite dynamic and needs to be responsive to the business situation. Often there is not time for IT involvement in the particulars of the document creation. Many documents end up being done manually or partially generated and completed manually. The manual component here refers to users entering data in a content control in place of the generation process supplying data.
9) Sometimes the same data needs to be formatted in different ways within the same document.
10) As an aside, some of the forms are Excel spreadsheets so it would be nice to have an approach that applies to both Word and Excel.
Hopefully, this will help pave the way into thinking about how LVHC could impact things. I anticipate adding comments along the way indicating where these LVHC needs are satisfied and where additional considerations are needed.
Hi Mike,
Thanks for a detailed list of requirements. This is very interesting. Here are a few thoughts about characteristics of a system that would meet those business requirements:
The process must be interactive. One good platform for this doc gen system would be SharePoint 2010. When the document generation process is kicked off, the system might put up web parts that query the user for generation-specific content that applies to all generated documents. The system then might put up web parts serially for each specific document. Content from these is then integrated into the documents.
The content is quite a bit more involved than my simple examples. For instance, as you mention, you may want to generate Excel spreadsheets also. You may want to automatically embed those spreadsheets in Word documents.
It may be useful to use SharePoint document sets as a way to manage a document project. I’m not certain of this, but it might be applicable. Another approach would be to use folders in document libraries to manage a project. Alternatively, each project might be in its own document library, or even a set of document libraries. Using SharePoint also would provide infrastructure so that a project can be maintained over a year or more.
It would be necessary to supply more intelligent content, such as automatically calculated content, content that is looked up in a dictionary, etc.
I think it would be possible to build a system that handles both low-volume high complexity situations, and high-volume low complexity projects.
-Eric
Eric,
Thanks for bringing me up to speed with Open XML and XPath. I have a question about something I noticed in your Transform method: the data in the XML node values loses its line breaks when transformed to Word. Any advice on how to prevent that?
Thanks,
Atanu
Hi Atanu,
Can you send an example to me that shows this issue? You can send directly to me @ eric at ericwhite.com. I’m not aware of any issue like this, and want to make sure I get it fixed! Thanks!
-Eric
Sent you an email with some sample code.
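The thread does not record how this was resolved, so the following is an assumption rather than a diagnosis. In WordprocessingML, a line break inside a run is expressed with a `w:br` element; newline characters placed inside `w:t` text are not rendered as breaks. A generator that copies a multi-line value into a single `w:t` element would therefore lose the breaks. A minimal Python sketch of the alternative, where `run_with_breaks` is a hypothetical helper and not part of the author's code:

```python
import xml.etree.ElementTree as ET

# The WordprocessingML main namespace and the built-in xml: namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("w", W_NS)


def run_with_breaks(text):
    """Build a w:r element in which each newline in `text` becomes a w:br."""
    run = ET.Element(f"{{{W_NS}}}r")
    for index, line in enumerate(text.splitlines()):
        if index > 0:
            # Separate consecutive lines with an explicit break element.
            ET.SubElement(run, f"{{{W_NS}}}br")
        t = ET.SubElement(run, f"{{{W_NS}}}t")
        # Keep leading and trailing spaces in the line.
        t.set(f"{{{XML_NS}}}space", "preserve")
        t.text = line
    return run


run = run_with_breaks("First line\nSecond line")
print(ET.tostring(run, encoding="unicode"))
```

The same pattern carries over to LINQ to XML: split the value on line breaks and emit alternating text and break elements inside the run.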
Hello Eric,
I have a bunch of questions. First, a question on your article “Generating Documents from SharePoint with Open XML Content Controls”: how would you add a table content control? I notice that in your Figure 2 there is a border outside the table.
Also I have this requirement for my current task:
- I have a set of documents to be approved in SharePoint. Once a document status is set to “Approved”, a list of approver’s name and date of approval should be inserted in the document.
- To insert this information to the document I have two approaches:
1. To programmatically add a page with all the content controls at the beginning of the document and insert all the needed information.
2. To create a document template whose first page contains all the needed content controls. This means users need to use this template to add content for approval. When the document is approved, I just need to add information to specific content controls. The drawback is that users could delete the content controls.
Can you recommend to me a book that I can use for Open XML programming?
Thanks a lot!
Hi Eric,
Great stuff here – looking forward to reading through it all. I do have a question for you.
I’ve been searching for an answer on this, but haven’t found one. You seem to have good experience in this area, so I thought you might be able to point me in the right direction. Basically, I want to be able to add custom attributes to WordML markup that I am creating from code. I want to allow users to edit the Word XML based docs in Word and then be able to reprocess them with XSLT later, leveraging those custom attributes for filtering logic (say, on a table row) and retaining the edits. If I just add the attributes, they are removed after the document is saved through the Word UI. Is there a way to add custom attributes to WordML elements that you know of? Thanks for any pointers. – Pete
PS I’ve also posted this question on MSDN.
Hey Eric, do you have something that will bring in the data from a SharePoint list and, along with it, use the Word template to save/display the data as HTML using Open XML?
I don’t, but check back with me early next year. It is something I am definitely thinking about.
Hi Eric,
We have a business requirement to generate Word documents based on the content of SharePoint lists.
We have a master template with a bunch of content placeholders. We have a lot of customers in different cities. Each city will have a row in the SharePoint list. The city-specific information is stored in the SharePoint lists…
When they click the generate button, the document should be generated by binding the content placeholders in the document to the columns in the SharePoint list.
Is there a way to do this without writing any programs? I came across this article, but I need it the other way around.
http://blogs.office.com/b/microsoft-word/archive/2007/01/18/xml-mapping-with-word-sharepoint.aspx
I just need to generate a document rather than modifying it and saving it.
I’m generating a Word document using Open XML. When multiple users click the generate button at the same time, it throws the error “File is used by another process”. How can I prevent it? Please help!
Hi Santh,
Which document is failing to open? The template document? Or the data document?
In my example, the code opens the template document as read-only, and the data document is loaded in a single operation, so I can’t see how either of those would cause the error that you are seeing.
Where are you generating documents? Are you generating documents for each user in their own separate directory?
-Eric
I mean that the document is not generated for concurrent users. When two users click the generate button at the same time, it throws the error I mentioned for one user and generates the document for the other. I’m generating the document from a base template, and bookmarks are replaced according to the user. I’ve also set a different file name for each user by taking the user name from the session. I hope you now get what I’m saying.
Hi, I’m generating the document from a base template. When the generate button is clicked, the base template is copied, the bookmarks are replaced by some text, and the final document is generated. The issue is that when two or more users click the generate button at the same time, either the document is not generated for either user and the page posts back, or an exception is thrown: “file is being used by another process”. How can I fix it?
Sorry, the question that I’m trying to get an answer to – when that exception is thrown for the second user, it is thrown when the second user is opening a specific file. What file is that?
Sorry, I just got what you’re trying to ask. I’m new to this, so I know only a little. The template document is failing to open. When both users click to generate a document at the same time, only one copy of the base template is made, for one of the users, and the final document is generated for that user. The other user ends up with an exception. Sometimes no exception is thrown for either user, no document is generated for either, and the page posts back. Even if it’s not possible to generate the document for both at the same time, I need a solution that at least shows one of the users “loading in progress” or something of that kind instead of an error. I hope I’ve made the issue clear. Thank you.
No prob, I remember when I was totally new to the .NET Framework, and so on. There is a lot to learn!
My original code does not lock the template document. It opens the document in read-only mode. The .NET Framework is just fine with two separate applications opening the same document at once in that mode. I suspect that somehow you are opening the template document with the argument isEditable set to true. There are a number of ways in which you could modify the code to open the template document, but in any case, you need to make sure that you are opening it with isEditable set to false.
-Eric
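The pattern in the reply above can be sketched independently of .NET: every request opens the shared template for reading only and writes its result to a path that no other request uses. The Python below is an illustration under assumptions, not the code from the series. The function name `copy_template_for_request` and the file-naming scheme are hypothetical, and the bookmark replacement step is left out.

```python
import uuid
from pathlib import Path


def copy_template_for_request(template_path, output_dir, user_name):
    """Create a private working copy of the template for one request.

    The template is opened for reading only, so concurrent requests do not
    block each other. The output name includes a random suffix so that two
    requests, even from the same user, never write to the same file.
    """
    template_bytes = Path(template_path).read_bytes()  # read-only access
    output_path = Path(output_dir) / f"{user_name}-{uuid.uuid4().hex}.docx"
    output_path.write_bytes(template_bytes)
    # Any further edits (for example, replacing bookmarks) would be applied
    # to output_path, never to the shared template.
    return output_path
```

In the .NET code discussed in this thread, the equivalent of the read-only step is opening the template with isEditable set to false, as described above.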
Basically, I am creating the Word XML document using WordML. The only issue I have is that I have to use Office automation to save the XML file with a .doc extension.
My questions are:
Can I use Open XML to save the Word XML file I have created as .doc?
Is there any other method besides Office automation to save Word XML as .doc?
Thanks in advance for your help