Eric White
Forum Replies Created
-
AuthorPosts
-
Hi Bob,
I have done no work in this area. I don’t have anything in Open-Xml-PowerTools to help with this. It is a great idea, though.
Best, Eric
Hi,
Currently this is not a feature of the WmlToHtmlConverter. I have contemplated this change, but it hasn’t risen to the top of the list.
Generally, such changes get added when I find someone to ‘sponsor’ such a change. Are you interested in discussing this?
Best, Eric
Hi Kim –
Sure – my email is eric@ericwhite.com
You can also connect with me on LinkedIn, which is a convenient way to get in touch. Sometimes my spam filters block real people. I’ll watch for your mail.
Best, Eric
Hi Manu,
This is a big problem in Word. The w:lastRenderedPageBreak tags are not inserted in the right place. You have identified two places, but there are several others. This is a bug in Word, but I am not hopeful of this bug being addressed by Microsoft.
The only solution to this would be to write a layout engine, which would be a massive undertaking.
The gist of this is that you cannot rely on the position of the w:lastRenderedPageBreak.
Best regards, Eric
This is definitely something that I’ve discussed over the years, but it has never risen to the top of the heap. I don’t know of any open source tool to convert PPTX to HTML, although I would not be surprised if there are commercial ones.
It sure would be a fun project. And given the HtmlToWmlConverter module, it would enable snagging content from PowerPoint and injecting it into Word.
The node select for the table is relative to the node selected for the outer repeat section. This would mean that you would have to have XML that looked like this:
<Orders>
  <Order>
    <Orders>
      <Order>
Which I would guess is not what you have. For the select for the table, be aware of what the current node is, which will be based on the enclosing repeat.
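To illustrate the point about the current node, here is a minimal sketch in Python using only the standard library. It is not the DocumentAssembler API; the data, the element names, and the helper rows_per_order are made up for illustration. It only shows that a select inside a repeat is evaluated against each node chosen by the repeat, so repeating the outer path finds nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; element names are invented for this illustration.
data = ET.fromstring(
    "<Customer>"
    "<Orders>"
    "<Order><Id>1</Id><Items><Item>Pen</Item><Item>Ink</Item></Items></Order>"
    "<Order><Id>2</Id><Items><Item>Paper</Item></Items></Order>"
    "</Orders>"
    "</Customer>"
)

def rows_per_order(root, repeat_select, table_select):
    """Evaluate table_select relative to each node matched by repeat_select."""
    result = {}
    for order in root.findall(repeat_select):
        # The current node here is one <Order>, not the document root.
        rows = [item.text for item in order.findall(table_select)]
        result[order.findtext("Id")] = rows
    return result

# Select written relative to the current <Order>: finds the items.
relative = rows_per_order(data, "./Orders/Order", "./Items/Item")
# Select that repeats the outer path: looks for Order/Orders/Order, finds nothing.
repeated = rows_per_order(data, "./Orders/Order", "./Orders/Order")

print(relative)  # {'1': ['Pen', 'Ink'], '2': ['Paper']}
print(repeated)  # {'1': [], '2': []}
```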
Make sense?
The basic problem is that the Open XML markup itself does not properly track merging / unmerging of cells. Word itself will not track these changes. If you turn on tracked changes, and merge / unmerge a cell, you will get a modal dialog from Word indicating that changes will not be tracked.
It is possible to compare (using Word) two documents that have merged cells, and Word sometimes does an OK job of this, but other times it totally messes it up.
Given my schedule and budget for WmlComparer, I decided to not attempt to generate proper markup for merged cells (and am not even sure it is possible). Therefore comparing tables that contain merged cells is not a feature of the module.
It is possible to enhance this module such that if two tables have identical structure, i.e. the same number of cells in each row, the same merging, etc., then it could do deltas in the merged cells. This is not a trivial project, but it is not super difficult either. It is important that it be done in the proper structural way – to fit into the existing infrastructure for dealing with tables and whatnot. My estimate is that it should take a day or two. It is possible that I could look at this in January.
Best, Eric
The correct approach is to first apply DocumentBuilder to your source document, extracting the desired content. DocumentBuilder works at the block-level, not run-level, so you can only extract complete paragraphs and tables from the source document – this is a built-in limitation to DocumentBuilder.
You can do this in memory, of course.
Then insert your extracted content at the correct location in the template document, as appropriate.
I don’t have any examples that show directly how to do what you want to do, but this is all certainly doable – extract the content into a new WmlDocument, and then insert that content using the approach shown in the example that you are looking at.
I am not completely clear on what the issue is with your markup. Are you attempting to change the markup for existing items where the size is not what you desire?
In any case, you must not mix using XElement (LINQ to XML) with the strongly typed OM for the Open-Xml-Sdk.
The best way to research the markup that you need to change is to create two copies of a document, change one slightly, and then use the Open-Xml-Sdk Productivity Tool to compare the two and highlight the differences.
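If the Productivity Tool is not at hand, the same two-copies idea can be roughed out with a plain text diff of one part. The following is only a sketch in Python using the standard library, not a replacement for the tool. The helper names (make_package, pretty_part, diff_parts) are hypothetical, and the in-memory packages are stripped-down stand-ins that contain only word/document.xml, not complete documents.

```python
import difflib
import io
import zipfile
import xml.dom.minidom

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def make_package(parts):
    """Build a tiny in-memory zip for the demo (not a complete .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, text in parts.items():
            z.writestr(name, text)
    return buf.getvalue()

def pretty_part(docx_bytes, part="word/document.xml"):
    """Read one part from the package and pretty-print it as a list of lines."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z:
        raw = z.read(part)
    return xml.dom.minidom.parseString(raw).toprettyxml(indent="  ").splitlines()

def diff_parts(before, after, part="word/document.xml"):
    """Return only the added and removed lines between the two copies."""
    diff = difflib.unified_diff(
        pretty_part(before, part), pretty_part(after, part), lineterm="", n=0
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]

def body(text):
    return (
        f'<w:document xmlns:w="{W}"><w:body><w:p><w:r>'
        f"<w:t>{text}</w:t></w:r></w:p></w:body></w:document>"
    )

before = make_package({"word/document.xml": body("Hello")})
after = make_package({"word/document.xml": body("Hello world")})

changes = diff_parts(before, after)
for line in changes:
    print(line)
```

With real files, read each .docx with open(path, "rb").read() and pass the bytes to diff_parts.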
In the following screen-cast series, you can watch screen-cast #13:
Best, Eric
Currently, the productivity tool has not been open sourced. I don’t know what the plans are for it, but I am not working on it.
Hi,
I think that there is something else causing this problem, not MarkupSimplifier.
You are getting a failure in parsing the XML in the /word/styles.xml file, not in the main document part, which is what NormalizeXml operates on. It looks as though your styles.xml file may not have anything in it. That could be caused by any of a variety of things, but probably not by MarkupSimplifier. That is not to say that MarkupSimplifier doesn’t modify styles.xml (it might, I can’t recall), but it is not the first place I’d look for this bug. I’d look for what is writing to styles.xml, and see why the XML parser is failing on it.
You can also manually examine the styles.xml file using the Open XML Package Editor Add-In for Visual Studio. That may provide a clue as to why the parser is failing on reading the styles.xml part.
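A quick scripted check can complement the manual inspection. The sketch below, in Python with only the standard library, walks the XML parts of a package and reports any that are empty or fail to parse. The function names and the demo package are hypothetical; the demo package is a stripped-down stand-in with an intentionally empty styles.xml, not a complete document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def make_package(parts):
    """Build a tiny in-memory zip for the demo (not a complete .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, text in parts.items():
            z.writestr(name, text)
    return buf.getvalue()

def find_unparsable_parts(docx_bytes):
    """Map part name -> problem description for XML parts that do not parse."""
    problems = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z:
        for name in z.namelist():
            if not name.endswith((".xml", ".rels")):
                continue  # skip images and other binary parts
            data = z.read(name)
            if not data.strip():
                problems[name] = "part is empty"
                continue
            try:
                ET.fromstring(data)
            except ET.ParseError as exc:
                problems[name] = str(exc)
    return problems

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
package = make_package({
    "word/document.xml": f'<w:document xmlns:w="{W}"><w:body/></w:document>',
    "word/styles.xml": "",  # simulates a styles part with nothing in it
})

problems = find_unparsable_parts(package)
print(problems)  # {'word/styles.xml': 'part is empty'}
```

Running this before and after each processing step should show which step leaves styles.xml in a bad state.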
Best, Eric
Hi Garth,
Can you please upload somewhere the smallest possible HTML file that exhibits this issue? It would be helpful for you to be very explicit about what you expect to see.
I will add this HTML file to the test files, add an XUnit test for it, and research. It is probably easy to fix.
Thanks, Eric
Hi,
One constraint about DocumentBuilder is that it merges styles, and takes the first style in the list of source documents. If you want to keep both styles in both documents, then you will need to rename the style in one of them. You can write a small program to do this – it is not hard. You need to change the style name in the document.xml part, and also in the styles.xml part.
Be aware that if you have other styles based on the style you are renaming, then the reference from the derived style also needs to be updated. But most probably you are not facing this situation.
At the time I designed DocumentBuilder, I made this decision – it more or less meets people’s needs, but occasionally we will find an issue such as this. Renaming styles before merging should solve your issue.
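As a rough sketch of what such a small program might look like, here is a Python version that uses only the standard library and works on the XML text of a part. It assumes styles are referenced by style ID (w:styleId in styles.xml, w:val on w:pStyle, w:rStyle, w:tblStyle, w:basedOn, w:next and w:link). The function rename_style and the sample markup are hypothetical, and other parts that reference styles (numbering.xml, headers, footers) would need the same treatment.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # keep the w: prefix when serializing

STYLE = f"{{{W}}}style"
STYLE_ID = f"{{{W}}}styleId"
VAL = f"{{{W}}}val"
# Elements whose w:val attribute refers to a style ID.
REFERENCES = {
    f"{{{W}}}{name}"
    for name in ("pStyle", "rStyle", "tblStyle", "basedOn", "next", "link")
}

def rename_style(part_xml, old_id, new_id):
    """Rename a style ID and every reference to it within one part."""
    root = ET.fromstring(part_xml)
    for el in root.iter():
        if el.tag == STYLE and el.get(STYLE_ID) == old_id:
            el.set(STYLE_ID, new_id)
        elif el.tag in REFERENCES and el.get(VAL) == old_id:
            el.set(VAL, new_id)
    # The display name in w:name is left untouched in this sketch.
    return ET.tostring(root, encoding="unicode")

styles = (
    f'<w:styles xmlns:w="{W}">'
    '<w:style w:type="paragraph" w:styleId="Quote"><w:name w:val="Quote"/></w:style>'
    '<w:style w:type="paragraph" w:styleId="QuoteBold"><w:basedOn w:val="Quote"/></w:style>'
    "</w:styles>"
)
document = (
    f'<w:document xmlns:w="{W}"><w:body><w:p><w:pPr>'
    '<w:pStyle w:val="Quote"/></w:pPr></w:p></w:body></w:document>'
)

new_styles = rename_style(styles, "Quote", "Quote2")
new_document = rename_style(document, "Quote", "Quote2")
print(new_styles)
print(new_document)
```

The derived style QuoteBold keeps its own ID, but its w:basedOn reference now points at Quote2, which is the case described above.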
If you want to see what needs to be done to rename styles, use the approach shown in screen-cast #13 in the following series:
Cheers, Eric
Hi Pravi,
When you say:
>> I have to admit that If I try to validate the docx using productivity tool, there are at least 49 errors and again, struggling to tie it back to the actual document and its content..
Are you referring to the input document? One constraint about DocumentBuilder is that it is not prepared to process documents that have errors (with a few common exceptions that Word processes properly, and sometimes generates itself).
In general, spend some effort in making sure that your input documents are perfect – this is my advice.
Best, Eric
Hi,
I am uncertain from your description what exactly was going wrong, and what you had to do to fix it.
In general, I believe that it is possible to do everything that you need to do using the Open-Xml-Sdk-JavaScript. I know that we are deficient in example code in this area. I would love to create more example code, but earning a living sometimes prevents this. But one good thing – I have a big Open-Xml-Sdk-JavaScript project coming up in a couple of months, so it will be good to get back into it. I plan on moving it to GitHub at the same time.
In general, if your fix works, and if the resulting file validates per the Open-Xml-Sdk productivity tool, then I would say go for it.
Best, Eric