Eric White
Forum Replies Created
-
AuthorPosts
-
Hi Roger,
Unfortunately, I don’t have enough information from your description to understand where the problem is. PresentationML (and DrawingML) are pretty complex in the way that they inherit properties throughout the object graph, so I suspect you are affecting the object graph in some undesired way. But unfortunately, given current schedules, I don’t have time to debug the specific problem.
I suggest that you take the approach you mentioned, which is to always start with a presentation that contains the CC, and to not attempt to replace the data for a chart more than once. Once you generate a presentation/chart with the correct data, you can include it in another document using the DocumentBuilder module.
Cheers, Eric
March 24, 2016 at 12:19 am in reply to: Why new slides are not added to presentaion.xml sldIdList? #3257
Hi,
You are taking the right approach – after adding the slides, you must GetXDocument on the presentation part, add the slide references, and call PutXDocument.
I think you have to look for a bug somewhere – dump out the XDocument immediately before you call PutXDocument, and make sure that you are putting the content that you intend to put.
Another thought – make a super small example that simply gets, changes, and puts the presentation part, and make sure that you can do this operation in isolation. You need not create a valid presentation – just look at the results in the Open XML Package Editor Power Tool.
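A minimal sketch of that kind of isolation test, written in Python with only the standard library rather than with the GetXDocument / PutXDocument extension methods discussed here. The package it builds is deliberately incomplete, the relationship id `rId7` is hypothetical, and a real presentation would also need a matching entry in the part's relationships. The function names are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

P = "http://schemas.openxmlformats.org/presentationml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PART = "ppt/presentation.xml"
ET.register_namespace("p", P)
ET.register_namespace("r", R)


def make_package():
    """Build a tiny, deliberately incomplete package with only the presentation part."""
    xml = (
        f'<p:presentation xmlns:p="{P}" xmlns:r="{R}">'
        '<p:sldIdLst><p:sldId id="256" r:id="rId2"/></p:sldIdLst>'
        "</p:presentation>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(PART, xml)
    return buf.getvalue()


def add_slide_reference(package, rel_id):
    """Get the part, append a p:sldId for rel_id, and put the part back."""
    with zipfile.ZipFile(io.BytesIO(package)) as src:
        root = ET.fromstring(src.read(PART))  # get
        lst = root.find(f"{{{P}}}sldIdLst")
        next_id = max((int(e.get("id")) for e in lst), default=255) + 1
        ET.SubElement(  # change
            lst, f"{{{P}}}sldId", {"id": str(next_id), f"{{{R}}}id": rel_id}
        )
        new_xml = ET.tostring(root, encoding="unicode")
        print(new_xml)  # dump the content immediately before the put
        out = io.BytesIO()
        with zipfile.ZipFile(out, "w") as dst:  # put
            for name in src.namelist():
                dst.writestr(name, new_xml if name == PART else src.read(name))
    return out.getvalue()


def slide_ids(package):
    """Re-open the package and list (id, relationship id) for each slide reference."""
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        root = ET.fromstring(z.read(PART))
    return [(e.get("id"), e.get(f"{{{R}}}id")) for e in root.iter(f"{{{P}}}sldId")]


updated = add_slide_reference(make_package(), "rId7")
print(slide_ids(updated))
```

Re-reading the part from the written package, rather than trusting the in-memory tree, is the point of the exercise: it shows whether the put actually reached the package. Note that the element in the markup is named `p:sldIdLst`.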
Typically, when I see the case where parts are not properly updated, it is because of not properly using the ‘using’ statement to deal with disposable objects, but because you are using GetXDocument and PutXDocument, you are bypassing that whole issue.
There must be a bug somewhere, I think.
PresentationBuilder was not designed to work at the sub-slide level. Code from within PresentationBuilder could certainly be re-used, but I haven’t done any analysis on that problem.
In general, when I need to insert a new slide (or any specific slide) I create a presentation that contains a single blank slide (or any specific slide) and then use PresentationBuilder to pull that slide into the generated presentation.
Because PresentationBuilder is a stable platform that you can use to shred and combine presentations, and your use case is somewhat different, where you want to pull specific DrawingML objects from one slide to another, I recommend that instead of modifying PresentationBuilder, you create a new module (perhaps call it PresentationContentMover or some such). I don’t recommend attempting to combine this functionality with the functionality of PresentationBuilder.
With this new module, perhaps the way to approach it is that you pass two open presentations to it, find the source object, find the slide in the destination, specify the location in the slide, and then move the content. You could certainly reuse some of the code around moving related content in PresentationBuilder.
Before I released PresentationBuilder, I put it through extensive, comprehensive testing to make sure that a) it will not crash, b) it always produces a valid Open XML presentation, and c) it does the right thing. We should be hesitant to add new functionality to this module, and instead, create a new one.
Cheers, Eric
Hi,
What you are seeing is expected behavior. When processing content in slides, your code must take into account that text might be split into multiple runs.
The same situation exists in WordprocessingML – you must be prepared for a paragraph to contain multiple runs, and you do not know where splits will or will not come.
This, for instance, presents challenges when attempting to search and replace text. See the screen-casts at the following link, which explain how I address this issue when using a regular expression to search for text that might be split across runs.
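To make the problem concrete, here is a minimal sketch in Python (standard library only). It is one generic way to search across run boundaries, not necessarily the approach shown in the screen-casts: join the run text, search the joined string, then map each match back to the runs it touches. The function name `find_across_runs` and the sample paragraph are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"


def find_across_runs(paragraph, pattern):
    """Return (matched_text, [run indexes]) for each regex match in the paragraph."""
    spans = []  # (start, end) offsets of each run's text within the joined string
    parts = []
    pos = 0
    for run in paragraph.findall(f"{{{A}}}r"):
        text = "".join(t.text or "" for t in run.findall(f"{{{A}}}t"))
        spans.append((pos, pos + len(text)))
        parts.append(text)
        pos += len(text)
    joined = "".join(parts)
    results = []
    for m in re.finditer(pattern, joined):
        # A run is touched when its span overlaps the match span.
        touched = [i for i, (s, e) in enumerate(spans) if s < m.end() and e > m.start()]
        results.append((m.group(0), touched))
    return results


# "Hello world" split into three runs, as an editor might write it.
paragraph = ET.fromstring(
    f'<a:p xmlns:a="{A}">'
    "<a:r><a:t>Hel</a:t></a:r>"
    "<a:r><a:t>lo wor</a:t></a:r>"
    "<a:r><a:t>ld</a:t></a:r>"
    "</a:p>"
)
matches = find_across_runs(paragraph, r"world")
print(matches)
```

No single run contains "world", so a run-by-run search finds nothing, while the joined search reports a match spanning the second and third runs. Replacing the text, and deciding which run's formatting the replacement inherits, is the harder part and is not attempted here.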
Cheers, Eric
March 21, 2016 at 1:23 pm in reply to: How to create Header and footer with VB.net in Open XML Word document #3220
Have you watched the screen-casts in the following series:
In particular, you will be interested in #8 on Sections, Headers, and Footers.
I estimate that I have spent >500 hours dealing with this bug – first figuring out why I was seeing what I was seeing, then finding work-arounds to mitigate it, re-writing System.IO.Packaging, and so on. It is a nasty, costly bug. I didn’t have any deep desire to re-write System.IO.Packaging, but I hated this bug so badly that I did it anyway.
My specific problem is that I needed to test the Open-Xml-Sdk and Open-Xml-PowerTools on ~500,000 files. Done serially, this takes >50 hours. So I built a system that spreads testing out to many quad-core machines. But because I was running the same executable multiple times, I ran into this bug.
In the end, the only way I could mitigate this (and it was a must, in order to get my job done) was to put the Open XML processing into a separate process. If you need to run more than one process on a single machine, you need to tweak the version number so that the assembly has a unique strong name.
I used MSMQ to communicate back and forth to the separate EXE / process. In the end, I got it to be rock-solid stable. I could test 500,000 documents in two hours.
Given that you MUST make use of WindowsBase elsewhere, you could build your Open XML processing into a separate EXE that is built with the new System.IO.Packaging. You could write code to see if the process is running, and start it if not, and then use MSMQ to communicate to the process, or just pass input/output file names as arguments to the EXE. I hate to have to suggest this, but if you need rock solid stability, IMO, this is how you could get it.
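The simpler of the two options, passing input and output file names as arguments to a separate process, can be sketched as follows. This is Python with only the standard library, so it illustrates the process boundary and nothing about the .NET assemblies involved. The worker script is a hypothetical stand-in for the separate EXE, and it merely lists the parts of the package it is given.

```python
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path

# Hypothetical worker: stands in for the separate EXE that does the Open XML work.
WORKER_SOURCE = '''
import sys, zipfile
in_path, out_path = sys.argv[1], sys.argv[2]
with zipfile.ZipFile(in_path) as z:
    names = sorted(z.namelist())
with open(out_path, "w", encoding="utf-8") as f:
    f.write("\\n".join(names))
'''


def process_in_separate_process(in_path, out_path, worker_path):
    """Run the worker as its own process; the file names are the only interface."""
    result = subprocess.run(
        [sys.executable, str(worker_path), str(in_path), str(out_path)],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return result.returncode == 0


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        worker = tmp / "worker.py"
        worker.write_text(WORKER_SOURCE, encoding="utf-8")
        package = tmp / "in.docx"
        with zipfile.ZipFile(package, "w") as z:
            z.writestr("word/document.xml", "<doc/>")
            z.writestr("[Content_Types].xml", "<Types/>")
        report = tmp / "out.txt"
        ok = process_in_separate_process(package, report, worker)
        return ok, report.read_text(encoding="utf-8").splitlines()


ok, names = demo()
print(ok, names)
```

The caller only checks the exit code and reads the output file, so a hang or crash in the worker stays contained, and the timeout gives the caller a way to recover from a hang.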
Ok, no worries, I added issues to both Open-Xml-Sdk and Open-Xml-PowerTools on GitHub, so this will ensure that this gets addressed.
Please let me know if that hotfix fixes things for you. Were you seeing the symptoms of hanging, or of throwing of exceptions?
To use git from within PowerShell, you need to add the git bin directory to your path.
I like to install gitextensions – it installs all the things you need to use git. The installer is not the most intuitive – it puts up dialog boxes underneath other windows, and you have to find and respond to those dialog boxes in order to finish the installation.
https://sourceforge.net/projects/gitextensions/
Don’t underestimate the time necessary to learn git fully. Git solves a very hard problem – a large number of distributed developers working on the same project – and the complexities (while sometimes hidden) are needed in order to facilitate this scenario. You may find the following two videos of interest:
Cheers, Eric
I think you are looking for the Open XML Package Editor PowerTool for Visual Studio, not Open-Xml-PowerTools. This is what enables editing an Open XML document in Visual Studio. I believe that it works with VS 2015, but perhaps it doesn’t work with community edition. I haven’t tried.
https://visualstudiogallery.msdn.microsoft.com/450a00e3-5a7d-4776-be2c-8aa8cec2a75b
I recommend taking a quick look at the screen-cast series on Open XML, which contains several screen-casts on using tools.
Screen-cast Series: Introduction to Open XML
Your point about StackOverflow is noted. Just FYI, unless something changes, I anticipate primarily attending to the forums here at EricWhite.com.
March 17, 2016 at 12:07 am in reply to: Extract all charts and SmartArts from the Word Document. #2573
Question #1: I have never seen Word put a chart, smartArt, or image into a run with other content. The Open XML standard does not prohibit this, and Word will process a run just fine if it contains both an image and text. However, I have never seen Word write this markup. It probably is an OK assumption for your program.
Question #2: Are you anticipating processing documents that contain tracked revisions? One option is to use the RevisionAccepter module to first accept tracked revisions, and then process the document. In general, are you making use of content controls? You might find content controls in a document, such as to contain a TOC or a page number in a header/footer, but these probably are not the paragraphs that will contain the charts / smartArt / images that you are interested in. If you are not using content controls for introducing metadata into your document, then you can probably ignore them. You probably would want to check to see if there are any content controls in the document (other than expected CCs, such as for TOC) before processing it.
It is worthwhile to scan the w:p and w:r elements in the standard, and note all of the child elements of each.
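Rather than only assuming that a graphic never shares a run with text, a program can check. The following is a minimal sketch in Python (standard library only) that walks the w:r elements, classifies each graphic by the `uri` attribute of its a:graphicData element, and flags runs that also carry text. The sample markup is heavily simplified, the function name is made up, and runs nested inside a drawing (for example in a text box) are not treated specially.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
WP = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
CHART = "http://schemas.openxmlformats.org/drawingml/2006/chart"
DIAGRAM = "http://schemas.openxmlformats.org/drawingml/2006/diagram"
PICTURE = "http://schemas.openxmlformats.org/drawingml/2006/picture"
KINDS = {CHART: "chart", DIAGRAM: "smartArt", PICTURE: "image"}


def graphic_runs(body):
    """Return (kind, shares_run_with_text) for each graphic found inside a run."""
    found = []
    for run in body.iter(f"{{{W}}}r"):
        for data in run.iter(f"{{{A}}}graphicData"):
            kind = KINDS.get(data.get("uri"), "other")
            has_text = any((t.text or "").strip() for t in run.findall(f"{{{W}}}t"))
            found.append((kind, has_text))
    return found


def drawing(uri):
    """Simplified inline drawing markup for the sample document."""
    return (
        "<w:drawing><wp:inline><a:graphic>"
        f'<a:graphicData uri="{uri}"/>'
        "</a:graphic></wp:inline></w:drawing>"
    )


body = ET.fromstring(
    f'<w:body xmlns:w="{W}" xmlns:wp="{WP}" xmlns:a="{A}">'
    f"<w:p><w:r><w:t>Sales by region</w:t></w:r><w:r>{drawing(CHART)}</w:r></w:p>"
    f"<w:p><w:r><w:t>Logo: </w:t>{drawing(PICTURE)}</w:r></w:p>"
    "</w:body>"
)
print(graphic_runs(body))
```

The second paragraph is the case Word apparently never writes but the standard permits. A program could log or reject such runs instead of silently mishandling them.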
Hi Prince,
No, PresentationBuilder only helps with splitting apart and joining presentations.
At this point, if you want to convert each PPT slide to an image, the best way is to automate PowerPoint, and let it do the work.
There may be commercial products out there that can help with this, but I have no experience with them.
I have the idea that if we could generate very nice HTML/CSS for PPTX, then we could use the HtmlToWmlConverter to import that content with high fidelity into DOCX. However, I haven’t started on PmlToHtmlConverter, and it is not in my current plans.
Wish I had a better answer for you, but I don’t.
Cheers, Eric
March 16, 2016 at 12:36 pm in reply to: Extract all charts and SmartArts from the Word Document. #2564
Hi Prince,
You are correct – DocumentBuilder works at the granularity of a paragraph. It doesn’t have facilities to break out a run in a paragraph, and do something with it.
It is possible to write the code to directly do this, but it isn’t trivial.
Unfortunately, I don’t know of any samples or documentation that I can point you to. In general, you can take the approach:
- Take a copy of the document before you have deleted the chart or smartArt
- Take another copy, open in Word, modify the content by deleting one or the other, save
- Use the Open XML SDK Productivity Tool to compare the two, and make detailed notes on all the changes you need to make.
- Write your code to make the same changes. Validate your code by comparing with the second copy in the above procedure.
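As a rough first pass before the detailed comparison, it can help to see which parts differ at all between the two copies. The sketch below is Python with only the standard library. It is not a substitute for the Productivity Tool's markup-level diff, and the part names and contents in the sample are invented.

```python
import io
import zipfile


def package_parts(data):
    """Map each part name in the package to its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        return {name: z.read(name) for name in z.namelist()}


def diff_packages(before, after):
    """Report which parts were removed, added, or changed between two packages."""
    a, b = package_parts(before), package_parts(after)
    return {
        "removed": sorted(a.keys() - b.keys()),
        "added": sorted(b.keys() - a.keys()),
        "changed": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }


def make_package(parts):
    """Build an in-memory package from a {name: content} mapping (sample data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, content in parts.items():
            z.writestr(name, content)
    return buf.getvalue()


before = make_package({
    "word/document.xml": "<doc>text, chart reference</doc>",
    "word/_rels/document.xml.rels": "<rels>rId5 to chart1</rels>",
    "word/charts/chart1.xml": "<chart/>",
    "word/styles.xml": "<styles/>",
})
after = make_package({
    "word/document.xml": "<doc>text</doc>",
    "word/_rels/document.xml.rels": "<rels/>",
    "word/styles.xml": "<styles/>",
})
print(diff_packages(before, after))
```

In this invented sample, the deletion shows up in three places: the chart part is gone, and both the main document part and its relationships part have changed. Those are the kinds of changes your own code would then need to reproduce.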
It is certainly possible, but not trivial. You will need to write the code to manipulate the markup, parts, and relationships per my previous answer. It is probably about 30-50 lines of code, so not too bad.
Hi,
This is certainly possible.
You may want to use content controls to delineate the area where users will enter their answers. You can protect the remainder of the document from modification, so that users can only change the document inside the content controls. You can tag the content controls so that you can more easily extract the information. There are a number of code samples out there that show how to retrieve the contents of content controls, including by tag.
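A minimal sketch of retrieving content control contents by tag, in Python with only the standard library rather than the Open XML SDK. It reads the w:tag value from each w:sdt and joins the text found under its w:sdtContent. The tags `q1` and `q2` and the function name are made up, and with nested content controls the inner text would be counted for both the outer and the inner control.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def answers_by_tag(root):
    """Map each content control's tag value to the text inside the control."""
    result = {}
    for sdt in root.iter(f"{{{W}}}sdt"):
        tag = sdt.find(f"{{{W}}}sdtPr/{{{W}}}tag")
        content = sdt.find(f"{{{W}}}sdtContent")
        if tag is None or content is None:
            continue  # untagged controls carry no key to file the answer under
        text = "".join(t.text or "" for t in content.iter(f"{{{W}}}t"))
        result[tag.get(f"{{{W}}}val")] = text
    return result


doc = ET.fromstring(
    f'<w:body xmlns:w="{W}">'
    '<w:sdt><w:sdtPr><w:tag w:val="q1"/></w:sdtPr><w:sdtContent>'
    "<w:p><w:r><w:t>Forty</w:t></w:r><w:r><w:t>-two</w:t></w:r></w:p>"
    "</w:sdtContent></w:sdt>"
    '<w:sdt><w:sdtPr><w:tag w:val="q2"/></w:sdtPr><w:sdtContent>'
    "<w:p><w:r><w:t>Paris</w:t></w:r></w:p>"
    "</w:sdtContent></w:sdt>"
    "</w:body>"
)
answers = answers_by_tag(doc)
print(answers)
```

The first answer is split across two runs on purpose: joining all w:t text under the control sidesteps the run-splitting issue when only the plain text is needed.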
Cheers, Eric