Screen-Cast Introduction to DocumentBuilder 2.0, and new DocumentBuilder Resource Center

July 7, 2011 at 7:29 am · Filed under Open XML, WordprocessingML

I’ve put together the first of three screen-casts that discuss DocumentBuilder 2.0 in depth. This first screen-cast shows how to download, build, and run DocumentBuilder. It also walks through one scenario of interrelated markup and shows how DocumentBuilder solves the issues around interrelated markup.

In addition, I’ve put together a DocumentBuilder Resource Center, which lists all the content on DocumentBuilder 2.0. I plan on putting together a number of blog posts and screen-casts about DocumentBuilder over the next two months, and that page is where I will aggregate links to all of the DocumentBuilder 2.0 content.

The following screen-cast is a bit long – 20 minutes – but it contains important information for developers who want to know how DocumentBuilder works.

Shows how to build and run DocumentBuilder, walks through one scenario of interrelated markup, and shows how DocumentBuilder handles that markup.


14 Comments »

Maria Rivera said,

July 14, 2011 @ 10:41 pm

Hello,

I’m new to OpenXML and C#… I usually code using VB. I’m working on a project where I need to build documents based on user input. DocumentBuilder is a lifesaver for me since I have an aggressive deadline. I will be capturing data from the user through a web app, and then building a document based on their responses. I have several source documents that I will use to piece the target document together. I would like to pass a list of the documents that I want to use to create the new document. Do you have an example I could use to do that?

The idea is to give administrator users access through the web app to upload the source files and to update a table with a list of rules. I will use those rules to create the list of source files on the fly and pass it to DocumentBuilder to create the new document for the end users. For example, if Service = x, then this source document needs to be added to the final document; if Service = y, then that source document is added to the source list, and so on.

Does that make sense?…

Any help will be greatly appreciated…

Thank you and Regards,

-Maria
Eric White said,

July 15, 2011 @ 7:05 am

Hi Maria,

It sounds like an interesting problem. I have a question or two about your description. Are your source documents always WordprocessingML documents, or will you be composing documents from other sources, such as HTML?

I am not sure what you mean by an example of passing a list of the documents you would use to create the new document, since that is the way DocumentBuilder always works. Is there something different about your scenario?
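As a rough illustration of the rule-driven selection Maria describes, the Python sketch below evaluates a small rules table against a user's answers and produces the ordered list of source file names that would then be handed to DocumentBuilder. Everything here is hypothetical: the `Rule` layout, the field names (`Service`, `Region`), the file names, and the convention that a rule with no field is always included. The actual call into DocumentBuilder (a .NET API) is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    """One row of a hypothetical admin-maintained rules table."""
    field: Optional[str]   # user-input field to test; None means "always include"
    value: Optional[str]   # value of that field that triggers the rule
    source_file: str       # source document to add when the rule matches
    order: int             # position of the source in the assembled document


def rule_matches(rule: Rule, answers: Dict[str, str]) -> bool:
    return rule.field is None or answers.get(rule.field) == rule.value


def select_sources(rules: List[Rule], answers: Dict[str, str]) -> List[str]:
    """Return the source files whose rules match the answers, in document order."""
    matched = [r for r in rules if rule_matches(r, answers)]
    return [r.source_file for r in sorted(matched, key=lambda r: r.order)]


# Hypothetical rules table, as an administrator might maintain it.
RULES = [
    Rule(None, None, "cover_page.docx", 1),
    Rule("Service", "x", "service_x_terms.docx", 2),
    Rule("Service", "y", "service_y_terms.docx", 2),
    Rule("Region", "EU", "eu_appendix.docx", 3),
]

if __name__ == "__main__":
    print(select_sources(RULES, {"Service": "x", "Region": "EU"}))
    # ['cover_page.docx', 'service_x_terms.docx', 'eu_appendix.docx']
```

Keeping the rules as data rather than code means administrators can change which documents are assembled without redeploying the web app; each selected file would then become one DocumentBuilder source.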

If you want to describe your scenario in more detail, feel free to send mail directly to me at eric at ericwhite.com – I’m always happy to give my opinion about best approaches for Open XML development.

-Eric
Dirk said,

July 19, 2011 @ 11:23 am

Hi Eric,
DocumentBuilder seems to be a hit.
Do I also need Windows PowerShell and PowerTools for Open XML installed? If so, which versions are required?
I’m running Visual Studio 8 and have Microsoft Office Open XML SDK 2.0 installed. Is that suitable for working with DocumentBuilder 2.0?
Thanks
Dirk
Eric White said,

July 19, 2011 @ 2:31 pm

Hi Dirk,

You don’t need Windows PowerShell or PowerTools for Open XML in order to use DocumentBuilder 2.0. You do need the Open XML SDK 2.0. I believe that DocumentBuilder will work fine with Visual Studio 2008, although you will need to put together the solution/project yourself, as the project that comes with DocumentBuilder is for Visual Studio 2010. I am pretty sure that I did not use any language or framework features of C# 4.0 or .NET Framework 4.0.

Glad DocumentBuilder is interesting to you. It has been a super-fun project. More to come…

-Eric
Jim said,

July 27, 2011 @ 2:03 pm

Not seeing the screencast content…
Eric White said,

July 27, 2011 @ 3:48 pm

Hi Jim, are you not able to see the video in the blog post? Do you see the videos if you go to the DocumentBuilder resource center?

-Eric
Dirk said,

October 21, 2011 @ 1:04 pm

Hi Eric,
I have learned that it is possible to build documents, add paragraphs, and fill content controls with content from other WordprocessingML files. Instead of taking a source document as a whole or grabbing certain paragraphs, is it also possible to extract content from fields or bookmarks and insert it somewhere (e.g. into content controls) in the target document? Thanks in advance.
Dirk
Eric White said,

October 30, 2011 @ 7:06 pm

Hi Dirk,

It is possible. First, you need to write code that can determine the start child element and the count, i.e. the information that defines a DocumentBuilder Source object; this becomes your source. You then use the technique introduced in this video:

Advanced use of DocumentBuilder 2.0

That video shows how to insert arbitrary content (a ‘labeled’ source) at arbitrary locations in the document being built.
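For the bookmark case Dirk asks about, finding the start child element and count could look roughly like the Python sketch below, which uses only the standard library's `xml.etree.ElementTree`. It assumes the bookmark's `w:bookmarkStart` and `w:bookmarkEnd` sit in (or directly among) the top-level children of `w:body`, and it returns whole paragraphs even when the bookmark covers only part of one. The helper name `bookmark_range` is hypothetical, and turning the resulting numbers into an actual DocumentBuilder `Source` happens on the .NET side and is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def bookmark_range(document_xml, name: str) -> Tuple[int, int]:
    """Return (start, count) of the top-level w:body children spanned by a bookmark.

    Hypothetical helper: whole paragraphs are returned even if the bookmark
    covers only part of the first or last one.
    """
    body = ET.fromstring(document_xml).find("w:body", NS)
    children = list(body)
    start, bookmark_id = None, None
    for i, child in enumerate(children):
        # iter() also checks the child itself, so body-level bookmarks are found too.
        for el in child.iter(f"{{{W}}}bookmarkStart"):
            if el.get(f"{{{W}}}name") == name:
                start, bookmark_id = i, el.get(f"{{{W}}}id")
                break
        if start is not None:
            break
    if start is None:
        raise KeyError(f"bookmark {name!r} not found")
    # The matching w:bookmarkEnd carries the same w:id as the w:bookmarkStart.
    for i in range(start, len(children)):
        for el in children[i].iter(f"{{{W}}}bookmarkEnd"):
            if el.get(f"{{{W}}}id") == bookmark_id:
                return start, i - start + 1
    raise ValueError(f"no w:bookmarkEnd found for bookmark {name!r}")


SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:r><w:t>Intro</w:t></w:r></w:p>
<w:p><w:bookmarkStart w:id="0" w:name="Terms"/><w:r><w:t>Term 1</w:t></w:r></w:p>
<w:p><w:r><w:t>Term 2</w:t></w:r><w:bookmarkEnd w:id="0"/></w:p>
<w:p><w:r><w:t>Outro</w:t></w:r></w:p>
</w:body></w:document>"""

if __name__ == "__main__":
    print(bookmark_range(SAMPLE, "Terms"))  # (1, 2)
```

For a real .docx file, `zipfile.ZipFile(path).read("word/document.xml")` returns the bytes of the main document part, which `ET.fromstring` accepts directly.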

-Eric
Dirk said,

October 31, 2011 @ 7:52 pm

That’s good news. Thank you Eric, I will take a closer look at it.
Dirk
arbi said,

November 4, 2011 @ 6:54 pm

I was wondering whether I can create a document with DocumentBuilder that contains styles, headers, footers, and images. I want to be able to set font size and color, modify images, text, and paragraphs, and also set margins and so forth in code. Is all of this possible with DocumentBuilder?
Eric LAURENT said,

July 9, 2012 @ 2:20 pm

Hi Eric,
Thank you very much for all your posts about Open XML, and particularly for these:

http://msdn.microsoft.com/fr-fr/library/ee922775.aspx#odc_Office14_ta_WorkingWithNumbering_MarkupLinkedToStyles

http://msdn.microsoft.com/library/ee361919(office.11).aspx

http://blogs.msdn.com/b/ericwhite/archive/2009/02/16/finding-paragraphs-by-style-name-or-content-in-an-open-xml-word-processing-document.aspx

And thank you for your screen casts.

I’ve got a question for you, because after my reading, I’m convinced that you are probably the best person to answer it.
In your opinion, what is the best way to do the following?

Currently, I’m doing an internship, and I have to develop a solution to merge different Word documents. My first solution works fine, but it just appends each document after the last one added. I need to enhance it so that it takes care of the numbering and the heading styles.

This is the user interface:
There are two tree views. The right tree view provides access to the source documents. The left tree view lets the user build a hierarchy by dragging and dropping documents from the right tree view. This hierarchy makes up the content of the final document. The particular thing is that the tree view has different levels, and each level corresponds to a specific style: the first level corresponds to Heading1, the second to Heading2, and so on.

Illustration:
Left tree view:
Doc1
\__Doc2
   \__Doc3
Doc4

Corresponding “list levels”:
Heading1
\__Heading2
   \__Heading3
Heading1

This means that for Doc1, the styles and the numbering of the paragraphs stay the same, provided its list level is Heading1.
For Doc2, the styles are incremented by one: Heading1 paragraphs become Heading2, Heading2 paragraphs become Heading3, and so on. The numbering continues from the last number of the Heading2 paragraphs in Doc1.
For Doc3, the styles are incremented twice: Heading1 paragraphs become Heading3, Heading2 paragraphs become Heading4, and so on. The numbering continues from the last number of the Heading3 paragraphs in the preceding documents.
For Doc4, the styles start at Heading1, and the numbering continues from the last number of the Heading1 paragraphs in Doc1.

I’m sorry if this is not clear; I’m not fluent in English, and it’s hard to explain.

Concrete examples with three simple documents:
Each document contains one paragraph with the Heading1 style and some text with the Normal style.
Example: “Doc1” contains
TITLE DOC1 (Heading1)
Text Doc1 (Normal)

First case:
Left tree view:
Doc1
Doc2
Doc3

Result expected after merging:

1. TITLE DOC1 (Heading1)
Text Doc1 (Normal)
2. TITLE DOC2 (Heading1)
Text Doc2 (Normal)
3. TITLE DOC3 (Heading1)
Text Doc3 (Normal)

Second case:
Doc1
\__Doc2
   \__Doc3

Result expected after merging:
1. TITLE DOC1 (Heading1)
Text Doc1 (Normal)
1.1. TITLE DOC2 (Heading2)
Text Doc2 (Normal)
1.1.1. TITLE DOC3 (Heading3)
Text Doc3 (Normal)

Third case:
Doc1
\__Doc2
Doc3

Result expected after merging:
1. TITLE DOC1 (Heading1)
Text Doc1 (Normal)
1.1. TITLE DOC2 (Heading2)
Text Doc2 (Normal)
2. TITLE DOC3 (Heading1)
Text Doc3 (Normal)
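To make the expected results concrete, here is a small Python sketch (not DocumentBuilder code, and not a way to write WordprocessingML) that applies the rules above to an in-memory model. Each document is a hypothetical `Node` holding (heading level, text) pairs, every node shifts its document's heading levels by its depth in the tree, and one counter per heading level produces the 1., 1.1., 1.1.1. numbers while continuing across documents.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical model of a document in the left tree view, plus its children."""
    paragraphs: List[Tuple[int, str]]  # (heading level, text); level 0 means Normal
    children: List["Node"] = field(default_factory=list)


def flatten(nodes: List[Node], depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield paragraphs in document order, shifting heading levels by tree depth."""
    for node in nodes:
        for level, text in node.paragraphs:
            yield (level + depth if level else 0), text
        yield from flatten(node.children, depth + 1)


def number_headings(nodes: List[Node]) -> List[str]:
    counters: List[int] = []  # counters[i] holds the current number at Heading(i+1)
    lines = []
    for level, text in flatten(nodes):
        if level == 0:
            lines.append(f"{text} (Normal)")
            continue
        del counters[level:]           # a new heading restarts all deeper levels
        while len(counters) < level:   # a skipped parent level is numbered 0
            counters.append(0)
        counters[level - 1] += 1
        number = ".".join(str(n) for n in counters) + "."
        lines.append(f"{number} {text} (Heading{level})")
    return lines


def doc(name: str, *children: Node) -> Node:
    """One of the example documents: a Heading1 title and one Normal paragraph."""
    return Node([(1, f"TITLE {name.upper()}"), (0, f"Text {name}")], list(children))


if __name__ == "__main__":
    # Third case: Doc1 with Doc2 nested under it, then Doc3 at the top level.
    for line in number_headings([doc("Doc1", doc("Doc2")), doc("Doc3")]):
        print(line)
```

Running it prints the six lines of the third case above. Because the counters are shared across all documents, numbering continues from earlier documents automatically, which is the behavior the cases describe.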

I hope this makes it clearer.

Based on your experience, do you think it is possible to build that kind of solution? (Considering only the merging with style and numbering management, not the tree view.)
If you think there is a way to do it, what would be the best approach?
What tools are required?
On a scale of 1 to 5, how difficult is it?

Thank you very much if you have read my comment this far, and thank you for your consideration.

Eric LAURENT
Eric LAURENT said,

July 9, 2012 @ 3:11 pm

Correction to my previous comment: the spaces in the trees were removed.

Illustration:
Left tree view:
Doc1
\__Doc2
   \__Doc3
Doc4

Correspondents “list level”:
Heading1
\__Heading2
   \__Heading3
Heading1

Second case:
Doc1
\__Doc2
   \__Doc3

Sorry.

Eric
Eric White said,

July 9, 2012 @ 5:11 pm

Hi Eric,

I understood what you meant – your description was clear.

You are attempting a non-trivial project. It is doable, but I would not attempt it in the fashion you are considering, that is, to merge styles, and adjust styles as necessary. That path will result in a bit of a mess.

Depending on your other requirements (e.g. how much formatting you want to preserve), I would approach this project as follows:
- First transform each document into a far simpler XML document that encapsulates the exact information that you need to capture in the merging process. You want an abstraction that will be closer to the concepts in your merging process. You may even want to transform into hierarchical XML.
- I think there is only one way to approach this, which is through recursive pure functional transforms. This will, however, require that you are up to speed on functional programming techniques, which is not a trivial learning project. Also see this blog post on Transforming WordprocessingML to Simpler XML for Easier Processing.
- Then, after transforming each Word document into a simpler form of XML, you can do the merging and more easily adjust the level of each node as appropriate for the location where the node will end up.
- Finally, you would write a transform back to valid WordprocessingML.
The advantage of this approach is that it breaks the project into more manageable chunks. Once you have defined your intermediate format, you can write the transform from WordprocessingML to the intermediate format and get that transform correct before proceeding. Then you can merge two documents in the intermediate format and check the results. Finally, you can code the transform back to WordprocessingML as a discrete task.
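The first and third steps of this approach (transforming WordprocessingML into a simpler hierarchical XML, then adjusting node levels) can be sketched in Python with `xml.etree.ElementTree`. This is only an illustration under strong assumptions, not the code from the blog post mentioned above: it reads just the paragraph style IDs `Heading1` through `Heading9` (English style IDs; localized Word installations may use different IDs) and plain run text, treats the body as a flat list of paragraphs, and discards all other formatting. The `<doc>`/`<section>`/`<para>` vocabulary and all function names are hypothetical. The grouping is written recursively and without mutating its input, in the spirit of the recursive functional transforms recommended above.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraphs(document_xml) -> Iterator[Tuple[int, str]]:
    """Yield (heading level, text) per w:p; level 0 means a non-heading paragraph."""
    for p in ET.fromstring(document_xml).iter(f"{{{W}}}p"):
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val", "") if style is not None else ""
        match = re.fullmatch(r"Heading([1-9])", style_id)  # assumes English style IDs
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        yield (int(match.group(1)) if match else 0), text


def para(text: str) -> ET.Element:
    element = ET.Element("para")
    element.text = text
    return element


def nest(items: List[Tuple[int, str]]) -> List[ET.Element]:
    """Recursively group a flat paragraph list into <section>/<para> elements."""
    heading_levels = [level for level, _ in items if level]
    if not heading_levels:
        return [para(text) for _, text in items]
    top = min(heading_levels)
    starts = [i for i, (level, _) in enumerate(items) if level == top]
    result = nest(items[:starts[0]])  # content before the first top-level heading
    for start, end in zip(starts, starts[1:] + [len(items)]):
        section = ET.Element("section", level=str(top), title=items[start][1])
        section.extend(nest(items[start + 1:end]))
        result.append(section)
    return result


def to_simple_xml(document_xml) -> ET.Element:
    doc = ET.Element("doc")
    doc.extend(nest(list(paragraphs(document_xml))))
    return doc


def shift_levels(element: ET.Element, by: int) -> ET.Element:
    """Return a copy of an intermediate tree with every section level raised by `by`."""
    copy = ET.Element(element.tag, dict(element.attrib))
    copy.text = element.text
    if "level" in copy.attrib:
        copy.set("level", str(int(copy.get("level")) + by))
    copy.extend([shift_levels(child, by) for child in element])
    return copy


SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Intro</w:t></w:r></w:p>
<w:p><w:r><w:t>a</w:t></w:r></w:p>
<w:p><w:pPr><w:pStyle w:val="Heading2"/></w:pPr><w:r><w:t>Detail</w:t></w:r></w:p>
<w:p><w:r><w:t>b</w:t></w:r></w:p>
<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Next</w:t></w:r></w:p>
<w:p><w:r><w:t>c</w:t></w:r></w:p>
</w:body></w:document>"""

if __name__ == "__main__":
    tree = to_simple_xml(SAMPLE)
    print(ET.tostring(tree, encoding="unicode"))
    # Nesting this document one level down, e.g. as a child in the tree view:
    print(ET.tostring(shift_levels(tree, 1), encoding="unicode"))
```

Because `shift_levels` returns a new tree instead of editing the original, the same intermediate document can be placed at different depths in the merged result. Writing the merged tree back to valid WordprocessingML remains a separate transform, as described above.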

It is a very interesting project you are embarking on. Don’t underestimate the difficulty of it.

-Eric
Eric LAURENT said,

July 10, 2012 @ 9:39 am

Hi Eric,
Thank you very much for your answer; I will study your approach.
I never doubted the complexity of the project; I only doubted its feasibility. I had read in “Working with Numbered Lists in Open XML WordprocessingML”: “Numbering in Open XML WordprocessingML is complex, and justifiably so. Understanding numbering is important when accurately extracting the text of a document. In addition, generating documents that use numbering markup is a powerful approach in document assembly solutions.”

Thank you again,
Sincerely,

Eric LAURENT


Eric White's Blog
