Eric White
Forum Replies Created
-
AuthorPosts
-
Hi,
This is not adding a page break, but a section break. When you keep sections, as your code does, it keeps the section for the document.
You can use continuous sections for the document, which will keep the same formatting, but not include a page break. I am not sure of the details to change the section break for the entire document to be a ‘continuous’ section break, but it should be possible.
If not possible, it would be easy to go into the document after the fact, and alter the section breaks to be ‘continuous’ section breaks. It is a matter of 20 lines of code to do this.
To find out which changes you need to make, generate a document that includes the section breaks that force a page break, and save it. Copy it, and alter the copy so that each section break is a ‘continuous’ section break. Then use the Open XML SDK Productivity Tool to compare the two. This will identify the changes to the markup that you need to make.
Then you can write code to make the same changes to your result document, after DocumentBuilder is done doing its thing.
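As a rough illustration of that post-processing step, here is a minimal Python sketch using only the standard library. It assumes the change that the comparison turns up is a `w:type` element with `w:val="continuous"` inside each `w:sectPr` of the main document part; the function name is made up for this example, and this is not code from Open-Xml-PowerTools.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Return the fully qualified name of a WordprocessingML element or attribute."""
    return "{%s}%s" % (W_NS, tag)


# Children that the schema places before w:type inside w:sectPr.
_BEFORE_TYPE = {
    w("headerReference"),
    w("footerReference"),
    w("footnotePr"),
    w("endnotePr"),
}


def make_sections_continuous(document_xml):
    """Set every section in the given document.xml bytes to 'continuous'."""
    root = ET.fromstring(document_xml)
    for sect_pr in list(root.iter(w("sectPr"))):
        type_el = sect_pr.find(w("type"))
        if type_el is None:
            # No explicit type yet: insert one at the position the schema expects.
            index = 0
            for child in sect_pr:
                if child.tag in _BEFORE_TYPE:
                    index += 1
                else:
                    break
            type_el = ET.Element(w("type"))
            sect_pr.insert(index, type_el)
        type_el.set(w("val"), "continuous")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

To apply it, read `word/document.xml` out of the DOCX package, pass the bytes through the function, and write the result back. One caveat: ElementTree may rewrite namespace prefixes and drop unused namespace declarations, which real Word documents can depend on, so for production use a tool that preserves them (such as the Open XML SDK) is the safer choice.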
Look at screen-cast #13 in the following series:
-Eric
Also, take a look at this screen-cast:
There is the ListItemRetriever module in Open-Xml-PowerTools, which will return the list item (the actual numbering / bulleting) for any paragraph in a document. This module is used by the WmlToHtmlConverter module, which converts a document to HTML that is formatted with CSS.
What are you doing with the bulleted / numbered lists? What is your scenario?
Yes. Email me at ericwhitedev@gmail.com.
Best, Eric
Hi,
The main issue is that the copied range may contain ‘interrelated markup’, such as references to images, smart art, comments, and even footnotes / endnotes.
The best approach to copy and paste ranges is to use DocumentBuilder. Watch the screen-casts at the following link:
http://www.ericwhite.com/blog/blog/documentbuilder-developer-center/
Best, Eric
April 5, 2017 at 4:57 pm in reply to: c# Openxml – Unable to change the slide size to 16×9 ratio #4277
Whatever you can do in the PowerPoint application, you can also do using Open XML.
Watch screen-cast #13 in the following series:
You may also be interested in the other screen-casts in that series.
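For the slide-size question in this thread, the markup involved is the `p:sldSz` element in `ppt/presentation.xml`, whose `cx` and `cy` attributes are in EMUs (914400 per inch). The following is a minimal Python sketch, not taken from the screen-cast; the function name and the 7.5 inch default height are assumptions for illustration, and it does not rescale or reposition shapes that are already on the slides.

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
ET.register_namespace("p", P_NS)

EMU_PER_INCH = 914400


def set_slide_size_16x9(presentation_xml, height_inches=7.5):
    """Return presentation.xml bytes with p:sldSz set to a 16:9 size."""
    root = ET.fromstring(presentation_xml)
    sld_sz = root.find("{%s}sldSz" % P_NS)
    if sld_sz is None:
        raise ValueError("no p:sldSz element found")
    cy = round(height_inches * EMU_PER_INCH)  # slide height in EMUs
    cx = cy * 16 // 9                         # width that gives a 16:9 ratio
    sld_sz.set("cx", str(cx))
    sld_sz.set("cy", str(cy))
    # Drop any preset hint such as "screen4x3" so it cannot contradict cx/cy.
    sld_sz.attrib.pop("type", None)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

With the default height this gives 12192000 by 6858000 EMUs, that is, 13.333 by 7.5 inches. As with any ElementTree round trip, namespace prefixes and declarations may be rewritten, so treat this as a way to see the markup change rather than as a finished tool.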
Best, Eric
March 30, 2017 at 2:49 pm in reply to: How to read Form fields (ActiveX contol) values from word 2010 using OpenxmlSDK #4264
Hi Prasad,
I’m not sure how the ActiveX controls store the data, but an easy way to find out is to use the Open XML SDK Productivity Tool to compare before/after versions of the document after you change values.
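If the Productivity Tool is not at hand, a rough approximation of the same before/after comparison can be sketched in Python with the standard library. This is not the Productivity Tool's algorithm, just a part-by-part diff of two packages; the function names are made up for this example.

```python
import difflib
import xml.dom.minidom
import zipfile


def pretty(xml_bytes):
    """Pretty-print XML so that a line-based diff is readable."""
    return xml.dom.minidom.parseString(xml_bytes).toprettyxml(indent="  ").splitlines()


def diff_packages(before, after):
    """Compare two Open XML packages (paths or file-like objects) part by part.

    Returns a dict that maps each changed part name to a list of report lines.
    """
    report = {}
    with zipfile.ZipFile(before) as a, zipfile.ZipFile(after) as b:
        names_a, names_b = set(a.namelist()), set(b.namelist())
        for name in sorted(names_a | names_b):
            if name not in names_a:
                report[name] = ["part added"]
            elif name not in names_b:
                report[name] = ["part removed"]
            else:
                data_a, data_b = a.read(name), b.read(name)
                if data_a == data_b:
                    continue
                if name.endswith((".xml", ".rels")):
                    report[name] = list(difflib.unified_diff(
                        pretty(data_a), pretty(data_b),
                        "before/" + name, "after/" + name, lineterm=""))
                else:
                    # Binary parts are only flagged as changed, not diffed.
                    report[name] = ["binary part changed"]
    return report
```

Save the document, change one control value, save it under a new name, and call `diff_packages("before.docx", "after.docx")` (hypothetical file names). The keys of the result show which parts hold the data you are looking for.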
Best, Eric
Hi,
You can certainly embed custom XML files in an XLSX.
If you don’t want to convert your data to an XML format, you could convert your file to base64-encoded ASCII, and put that in a simple XML file. This would probably entail almost no work at all for you.
You can put the custom XML file in the root directory of the XLSX, or you can create your own subdirectory, and put the custom XML file there. It probably would be a little neater to put it into your own subdirectory.
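A minimal Python sketch of the base64 idea, using only the standard library. The element name `payload`, the part name in the usage note, and the function names are all made up for illustration. The sketch only adds the file to the package; it does not add a relationship or a content type entry, and whether Excel keeps the part when it re-saves the workbook is something you would need to check.

```python
import base64
import io
import xml.etree.ElementTree as ET
import zipfile


def wrap_as_xml(data, root_tag="payload"):
    """Wrap arbitrary bytes as base64 text inside a one-element XML file."""
    root = ET.Element(root_tag, {"encoding": "base64"})
    root.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def unwrap_from_xml(xml_bytes):
    """Recover the original bytes from a file produced by wrap_as_xml."""
    return base64.b64decode(ET.fromstring(xml_bytes).text or "")


def add_part(package, part_name, part_bytes):
    """Return a copy of the package (as bytes) with one extra entry added."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            dst.writestr(info, src.read(info.filename))
        dst.writestr(part_name, part_bytes)
    return out.getvalue()
```

For example, `add_part(xlsx_bytes, "myData/payload.xml", wrap_as_xml(raw_bytes))` puts the wrapped file into its own subdirectory, and `unwrap_from_xml` reverses the encoding when you read it back.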
When you save as PowerPoint XML Presentation, you are saving as “Flat OPC”.
Take a look at the following series of blog posts:
https://blogs.msdn.microsoft.com/ericwhite/2008/09/29/the-flat-opc-format/
One of those posts contains the precise code to convert a PPTX to flat OPC.
Hi Santosh,
This is a great question, and one that I don’t have an answer to off the top of my head.
I remember that when I wrote WmlToHtmlConverter, I spent a fair amount of time working on indentation for list items. I read the spec, of course, but there are also differences between strict and transitional. Further, there are interesting cases where there is an indentation setting as direct formatting on a paragraph, and then the paragraph refers to a style that also has an indentation setting, so the direct formatting takes precedence. Frankly, in implementing that module, I simply looked at what Word was doing, and emulated that.
I am curious – have you looked at what WmlToHtmlConverter produces for your document? Does it indent the relevant paragraph per your understanding of the spec, or does it produce the same indentation as Word?
Best, Eric
March 22, 2017 at 11:26 am in reply to: Headers and footers dissapear when WmlComparer.Compare #4237
Hello,
Yes, this is a limitation of WmlComparer. I implemented WmlComparer for a specific customer who had a specific need, and they did not require that WmlComparer deal with headers and footers.
If you think about it, it is not so simple. Of course, you have the situation where a header or footer contains differences, and WmlComparer would need to compare them and include revision marks in them. But it gets more complicated than that: what if one document has a section break and the other does not? Then this needs to get recorded properly. If one document contains the property to keep the same header/footer as the previous section, whereas the other document contains newly defined headers/footers, then this also needs to get properly recorded. There are other edge cases in addition to these.
The main goal of WmlComparer was to compare the actual content of the documents themselves, where the content was defined as not being the headers/footers. The estimate to properly handle headers/footers was significantly higher and the customer did not need it, so the feature was cut. As a result, WmlComparer does not include headers/footers in the new document.
I am just about to embark on a significant revision of WmlComparer. I am going to add the ability to handle nested tables and text boxes, so I will consider again whether there is some good option for sections/headers/footers. However, once again, the requirement is not there to handle headers/footers, so I probably cannot spend much time on it.
Best, Eric
Hi Thomas,
Yes, one would think that this would be easy.
However, PresentationML is significantly different from WordprocessingML, as their design goals are so different, so you can’t use PresentationML directly in a DOCX.
Further, I have done no work on converting PresentationML to a form that can be directly used in a DOCX. Someone else out there may have, but I know of no open source project to do so.
If I had your problem, the way I would proceed is to automate PowerPoint to save the presentation to another form. One of the options when saving in PowerPoint is to save each slide as a JPG or other graphic form. PowerPoint then creates a directory with a JPG for each slide in it. I would guess that there is an option when automating PowerPoint to do this, but I have never done so.
Once you get a number of images, one for each slide, it is then straightforward to insert those images into a DOCX.
Best, Eric
February 23, 2017 at 10:55 pm in reply to: Get comments from docx file and insert it to another #4195
Hi Mohamed,
At the moment, there is not an open source project (afaik) that does a good job at this.
It is a great idea, though.
Best, Eric
Hi,
As far as I know, there is no open source API to do this.
You could use the WmlToHtmlConverter module in Open-Xml-PowerTools to convert DOCX to HTML/CSS, and then use some API to print it. I have no experience with using any APIs to print HTML/CSS, but I am certain it is possible. This would have the restrictions of WmlToHtmlConverter, in that it does not display page headers/footers.
Best, Eric
Hi Garth,
Yes, you are absolutely correct, Word is being “smart” when opening the file. The issue is that the actual layout of the table is sometimes just a set of desired configurations, and Word will do whatever it wants to.
The genesis of this issue is that it is quite possible to set up a table with invalid column widths with regards to content, and Word then does whatever it wants with the table.
I have seen this, but it has never become important to me to fully understand the algorithm that Word uses. Further, I think it likely that different versions of Word behave differently, but I don’t know.
Best, Eric