Eric White
Forum Replies Created
Hi Garth,
At the moment it is not possible.
Adding page numbers (and headers / footers) would require writing a layout engine, and AFAIK there is currently no open source layout engine for this. Without a layout engine, one can’t paginate, and without pagination there is no way to determine page numbers.
The document does contain w:lastRenderedPageBreak elements, which Word writes to record where page breaks fell the last time it rendered the document. Unfortunately, there are ‘bugs’ in Word that cause these to be placed in the wrong locations in a number of circumstances. I put ‘bugs’ in quotes because Word does not guarantee the correct placement of these elements, so I believe Microsoft doesn’t consider them to be bugs.
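If you want to look at these elements anyway, here is a minimal sketch in Python (standard library only, as a language-neutral illustration, not Open-Xml-PowerTools code) that counts w:lastRenderedPageBreak elements in the main document part. It assumes the part lives at word/document.xml, which is where Word normally puts it; strictly, the location comes from the package relationships. Given the placement problems described above, treat the count as a hint, not a page count.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def read_main_part(docx_path):
    """Return the raw bytes of the main document part.

    Assumes the conventional location word/document.xml; a fully general
    reader would resolve the part through _rels/.rels instead.
    """
    with zipfile.ZipFile(docx_path) as package:
        return package.read("word/document.xml")


def count_rendered_page_breaks(document_xml):
    """Count w:lastRenderedPageBreak elements (document_xml: str or bytes).

    Word does not guarantee where these are placed, so the result is only
    a hint about the last pagination, never a reliable page number.
    """
    root = ET.fromstring(document_xml)
    return sum(1 for _ in root.iter(W + "lastRenderedPageBreak"))


SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:r><w:t>Page one</w:t></w:r></w:p>
<w:p><w:r><w:lastRenderedPageBreak/><w:t>Page two</w:t></w:r></w:p>
</w:body></w:document>"""

print(count_rendered_page_breaks(SAMPLE))  # 1
```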
So currently, there is no good path to do this.
Best, Eric
Hi Manu,
The list items are never stored in the content. They are always calculated. A change in list numbering does not affect this. You can detect that the list numbering has changed, but if you actually want the deleted list items, it is pretty complicated, and not at your fingertips.
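Detecting that the numbering changed is the tractable part. As a minimal sketch (Python standard library, not part of Open-Xml-PowerTools; the function names are hypothetical), the code below looks for tracked paragraph-property changes (w:pPrChange) and compares the numbering properties (w:numPr) recorded before the change with the current ones. It works on the main document part or the styles part. It only detects the change; getting at the deleted list item itself would additionally require recalculating the numbering as it stood before the revision.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def numbering_of(ppr):
    """Return (numId, ilvl) from a w:pPr element, or None if it has no w:numPr."""
    if ppr is None:
        return None
    num_pr = ppr.find(W + "numPr")
    if num_pr is None:
        return None
    num_id = num_pr.find(W + "numId")
    ilvl = num_pr.find(W + "ilvl")
    return (
        num_id.get(W + "val") if num_id is not None else None,
        ilvl.get(W + "val") if ilvl is not None else None,
    )


def numbering_changes(part_xml):
    """Yield (author, old, new) for each tracked paragraph-property change
    (w:pPrChange) whose numbering differs from the current numbering."""
    root = ET.fromstring(part_xml)
    for ppr in root.iter(W + "pPr"):
        change = ppr.find(W + "pPrChange")
        if change is None:
            continue  # no tracked change on these paragraph properties
        old = numbering_of(change.find(W + "pPr"))  # properties before the change
        new = numbering_of(ppr)                     # properties after the change
        if old != new:
            yield change.get(W + "author"), old, new


SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="3"/></w:numPr>
<w:pPrChange w:id="1" w:author="Alice" w:date="2017-01-01T00:00:00Z"><w:pPr/></w:pPrChange>
</w:pPr><w:r><w:t>Now a list item</w:t></w:r></w:p>
</w:body></w:document>"""

print(list(numbering_changes(SAMPLE)))  # [('Alice', None, ('3', '0'))]
```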
I guess that I would need to hear more about the problem you are trying to solve. What is your user scenario?
Best, Eric
Hi Manu,
I am not quite clear about your question – need more info.
With regard to the deleted listnum, what exactly are you referring to? In Word itself, we see the list items, which are the displayed representation of any paragraph that has list numbering. In the Open XML markup, every paragraph that has list numbering has numbering properties (the w:numPr element, containing w:numId and w:ilvl). You can find these numbering properties either in the paragraph itself or in the styles part, in the definition of the paragraph’s style.
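To illustrate where those numbering properties live, here is a minimal sketch in Python (standard library; the function names are my own, hypothetical ones) that reports, for each numbered paragraph, its numId/ilvl pair and whether it came from the paragraph or from its style. It deliberately ignores style inheritance through w:basedOn, numId 0 (which removes numbering), and paragraphs that set numId but take ilvl from their style; a missing ilvl is simply treated as level 0.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def numbering_of(ppr):
    """Return (numId, ilvl) from a w:pPr element, or None if it has no w:numPr."""
    if ppr is None:
        return None
    num_pr = ppr.find(W + "numPr")
    if num_pr is None:
        return None
    num_id = num_pr.find(W + "numId")
    ilvl = num_pr.find(W + "ilvl")
    return (
        num_id.get(W + "val") if num_id is not None else None,
        ilvl.get(W + "val") if ilvl is not None else "0",  # simplification
    )


def style_numbering(styles_xml):
    """Map styleId -> (numId, ilvl) for styles whose w:pPr carries w:numPr."""
    result = {}
    for style in ET.fromstring(styles_xml).iter(W + "style"):
        num = numbering_of(style.find(W + "pPr"))
        if num is not None:
            result[style.get(W + "styleId")] = num
    return result


def paragraph_numbering(document_xml, styles_xml):
    """Yield (text, (numId, ilvl), source) for each numbered paragraph."""
    by_style = style_numbering(styles_xml)
    for p in ET.fromstring(document_xml).iter(W + "p"):
        ppr = p.find(W + "pPr")
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        direct = numbering_of(ppr)
        if direct is not None:
            yield text, direct, "paragraph"  # numbering set on the paragraph
            continue
        pstyle = ppr.find(W + "pStyle") if ppr is not None else None
        style_id = pstyle.get(W + "val") if pstyle is not None else None
        if style_id in by_style:
            yield text, by_style[style_id], "style"  # numbering from the style


STYLES = f"""<w:styles xmlns:w="{W_NS}">
<w:style w:type="paragraph" w:styleId="ListBullet"><w:pPr><w:numPr><w:numId w:val="2"/></w:numPr></w:pPr></w:style>
</w:styles>"""

DOC = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:pPr><w:numPr><w:ilvl w:val="1"/><w:numId w:val="1"/></w:numPr></w:pPr><w:r><w:t>A</w:t></w:r></w:p>
<w:p><w:pPr><w:pStyle w:val="ListBullet"/></w:pPr><w:r><w:t>B</w:t></w:r></w:p>
<w:p><w:r><w:t>C</w:t></w:r></w:p>
</w:body></w:document>"""

for text, (num_id, ilvl), source in paragraph_numbering(DOC, STYLES):
    print(f"{text}: numId={num_id} ilvl={ilvl} (from {source})")
# A: numId=1 ilvl=1 (from paragraph)
# B: numId=2 ilvl=0 (from style)
```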
The styles part contains tracked revisions, including changes to styles where numbering has changed.
With regard to calculating the list item for any paragraph (which is what the ListItemRetriever.cs module does), processing list items is pretty complicated. It took several tries before I found every numbering bug in ListItemRetriever.cs. However, ListItemRetriever.cs presumes that there are no tracked revisions, and I have not yet contemplated the problem of determining list items in a document that contains tracked revisions.
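To give a feel for the core idea (and for how much it leaves out), here is a deliberately naive sketch in Python. It is not how ListItemRetriever.cs works internally; it just keeps one counter per level for each numId and resets deeper levels when a shallower one advances. It ignores start values (w:start), w:lvlRestart, level overrides, numIds that share an abstract numbering definition, number formats and level text, and tracked revisions, and it assumes each list starts at level 0 without skipping levels. Those omissions are exactly where the complexity lives.

```python
from collections import defaultdict

MAX_LEVELS = 9  # Word list levels are ilvl 0 through 8


def naive_list_labels(paragraphs):
    """Compute decimal multi-level labels such as '1', '1.1', '2'.

    paragraphs: iterable of (numId, ilvl) tuples with ilvl as an int,
    or None for a paragraph without numbering.
    """
    counters = defaultdict(lambda: [0] * MAX_LEVELS)  # one counter row per numId
    labels = []
    for item in paragraphs:
        if item is None:
            labels.append(None)  # unnumbered paragraphs do not interrupt counting
            continue
        num_id, ilvl = item
        levels = counters[num_id]
        levels[ilvl] += 1
        for deeper in range(ilvl + 1, MAX_LEVELS):
            levels[deeper] = 0  # a new item at this level restarts deeper levels
        labels.append(".".join(str(n) for n in levels[: ilvl + 1]))
    return labels


print(naive_list_labels([("1", 0), ("1", 1), ("1", 1), None, ("1", 0), ("1", 1)]))
# ['1', '1.1', '1.2', None, '2', '2.1']
```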
Please give me more information with specifics, and I’ll be happy to help 🙂
Cheers, Eric
February 8, 2017 at 11:59 am in reply to: Unable to set multiline while editing Word ContentControl using SDTElement #4142
Hi,
Looking at your code, nothing looks incorrect to me, but I normally use LINQ to XML to access and manipulate content instead of the strongly-typed object model, so I may be missing something.
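Since the question is about multiline text, here is a sketch of the markup involved, written in Python with ElementTree rather than LINQ to XML (the shape of the XML is the same either way). As far as I know, Word does not treat a newline character inside w:t as a line break; a line break within a run is a w:br element (the Break class in the strongly-typed object model). For a plain-text content control, the w:text element in w:sdtPr also needs w:multiLine set. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"  # serialized as xml:space
ET.register_namespace("w", W_NS)


def multiline_run(text):
    """Build a w:r with one w:t per line and a w:br between consecutive lines."""
    run = ET.Element(W + "r")
    for i, line in enumerate(text.splitlines() or [""]):
        if i > 0:
            ET.SubElement(run, W + "br")  # the actual line break
        t = ET.SubElement(run, W + "t")
        t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
        t.text = line
    return run


def allow_multiline(sdt_pr):
    """Set w:multiLine on the w:text element of an existing plain-text control."""
    text_el = sdt_pr.find(W + "text")
    if text_el is None:
        raise ValueError("not a plain-text content control (no w:text in w:sdtPr)")
    text_el.set(W + "multiLine", "1")


print(ET.tostring(multiline_run("Line one\nLine two"), encoding="unicode"))
```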
In general, if you need to generate documents from data, I recommend using the DocumentAssembler module in Open-Xml-PowerTools.
http://www.ericwhite.com/blog/blog/documentassembler-developer-center/
Watch those videos – you can do document assembly without writing code.
If you go down that path and also continue to use the strongly-typed object model, I recommend closing and reopening the document any time you switch between LINQ to XML and the strongly-typed object model.
Best, Eric
Hi,
I’m afraid that I don’t have much to do these days with the NuGet package and its dependencies. I didn’t create those, and I haven’t used them. I’m sure they are great; I just don’t know much about them.
Sorry I don’t have a better answer for you, but thought I’d let you know…
Personally, I work directly from the github repos, and I’m afraid I am lazy and don’t try out the other various ways to use the Open-Xml-Sdk and Open-Xml-PowerTools. But I’m sure you can get it working 🙂
Cheers, Eric
I don’t have quite enough information to help with this. Can you post the markup for the entire paragraph, and indicate where you think it is wrong? Feel free to post a document online somewhere if you want me to examine the markup.
One issue I see with your description is that tabs are not ‘additive’. A tab advances the text to a specific tab stop position, not by a fixed amount.
Another issue is that Word has a fairly complex algorithm for positioning text. For example, if a word extends past a tab position, Word automatically ‘creates’ a tab based on specific settings in the Open XML document. This can cause problems when transforming Open XML (rendered by Word) to HTML (rendered by browsers), because the font metrics are ever so slightly different: text that fits before a tab stop as rendered in Word can extend beyond that tab stop as rendered in the browser, causing text to ‘shift over’.
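To make the ‘not additive’ point and the ‘shift over’ effect concrete, here is a simplified model in Python. It is my own illustration, not WmlToHtmlConverter’s code: positions are in twips (1440 per inch), only left-aligned custom stops are modeled, and default stops are assumed to fall every w:defaultTabStop twips (720, i.e. half an inch, is the usual value in settings.xml) to the right of the last custom stop, since Word clears default stops to the left of a custom one.

```python
TWIPS_PER_INCH = 1440


def next_tab_stop(x, custom_stops, default_interval=720):
    """Return the absolute position (twips) that a tab at position x advances to.

    Simplified: left-aligned custom stops only; once every custom stop is at
    or behind x, default stops every default_interval twips take over.
    """
    for stop in sorted(custom_stops):
        if stop > x:
            return stop
    return (x // default_interval + 1) * default_interval


def tab_width(x, custom_stops, default_interval=720):
    """Width of the gap a tab produces, e.g. the width of an HTML span."""
    return next_tab_stop(x, custom_stops, default_interval) - x


stops = [1 * TWIPS_PER_INCH, 2 * TWIPS_PER_INCH]  # custom stops at 1" and 2"

# Tabs jump to positions rather than adding fixed widths:
# two tabs starting from 0 land on 1440 and then 2880.
first = next_tab_stop(0, stops)
second = next_tab_stop(first, stops)
print(first, second)  # 1440 2880

# Text measured at 1430 twips in Word reaches the 1" stop, but the same text
# measured at 1450 twips with slightly different browser font metrics
# overshoots it, so the tab jumps to the 2" stop instead.
print(next_tab_stop(1430, stops), next_tab_stop(1450, stops))  # 1440 2880
```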
The WmlToHtmlConverter project was never intended to do a pixel-by-pixel rendering of the document. That isn’t possible without writing a layout engine that is compatible feature-for-feature with Word, which would include using a Word-compatible text renderer (which browsers are not). There is enough mismatch between the layout system of Word and the layout system of HTML that you just have to do the best you can and not worry too much about it.
The intent of WmlToHtmlConverter is to give a pretty good representation of the document, but it can’t be perfect.
With regard to your specific case, it is important to understand exactly what is going on. It is possible that the “extra” node is adjusting the ‘tabbing’ system (which calculates spans with a given width) in an unexpected way.
Best, Eric
Yes, you are right. That is a current limitation of the module. I have tentative plans to enhance it to support text boxes and nested tables; however, there is no schedule for this right now.
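In the meantime, whether a given document uses those constructs can be checked up front. A minimal sketch in Python, assuming only the standard WordprocessingML markup: text box content appears as w:txbxContent (in both the DrawingML and the VML fallback markup), and a nested table is a w:tbl inside a table cell (w:tc). The function name is hypothetical, and tables nested more indirectly, for example inside a content control within a cell, are not detected by this simplified check.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def uses_unsupported_constructs(document_xml):
    """Report whether a main document part contains text boxes or nested tables."""
    root = ET.fromstring(document_xml)
    # Text box content is w:txbxContent, whether in DrawingML or VML markup.
    has_text_boxes = root.find(".//" + W + "txbxContent") is not None
    # A nested table is a w:tbl that is a direct child of a table cell.
    has_nested_tables = any(
        cell.find(W + "tbl") is not None for cell in root.iter(W + "tc")
    )
    return {"text_boxes": has_text_boxes, "nested_tables": has_nested_tables}


SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:tbl><w:tr><w:tc>
  <w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>
  <w:p/>
</w:tc></w:tr></w:tbl>
</w:body></w:document>"""

print(uses_unsupported_constructs(SAMPLE))  # {'text_boxes': False, 'nested_tables': True}
```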
Best, Eric
Sorry, I’m still not clear. Are you saying that the behavior of Word is incorrect? Or the behavior of Open-Xml-PowerTools is incorrect? Which module?
Best, Eric
Hello Manu,
I am not certain exactly what you are referring to. Are you having issues with one of the modules in Open-Xml-PowerTools, such as WmlToHtmlConverter?
Hi Michaela,
It certainly is possible to enhance DocumentBuilder so that it handles external links. This modification fits into the existing structure of DocumentBuilder with no issues. It is probably about 4-6 hours of work, including adding xUnit tests. You are welcome to try making the changes yourself.
If you want me to make those changes, I am available on a consulting basis to do so.
Best, Eric
Hi Michaela,
I have confirmed that DocumentBuilder does not handle external links. There are security issues associated with them, as well as technical issues. Actually, I think that from a security perspective, DocumentBuilder should throw an exception when it encounters one of these links.
I would love to hear about your scenario, where external links are important when using DocumentBuilder. If you have time, would you write a paragraph or two on why they are important?
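For what it’s worth, external links are easy to spot at the package level: they are relationships with TargetMode="External" in a part’s .rels file (for the main document part, usually word/_rels/document.xml.rels). Here is a minimal sketch in Python of the ‘throw an exception’ behavior, as a pre-check you could run on each source before assembly. It is a hypothetical helper, not something DocumentBuilder does today.

```python
import xml.etree.ElementTree as ET

# Namespace of package relationship (.rels) parts.
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL = "{%s}" % PKG_REL_NS
HYPERLINK_TYPE = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                  "relationships/hyperlink")


class ExternalLinkError(Exception):
    """Raised when a source document references content outside its package."""


def external_relationships(rels_xml):
    """Return (Id, Type, Target) for each relationship with TargetMode="External"."""
    root = ET.fromstring(rels_xml)
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.iter(REL + "Relationship")
        if rel.get("TargetMode") == "External"
    ]


def reject_external_links(rels_xml):
    """Hypothetical pre-check: fail loudly rather than emit a broken link."""
    found = external_relationships(rels_xml)
    if found:
        targets = ", ".join(target for _, _, target in found)
        raise ExternalLinkError(f"external relationships found: {targets}")


SAMPLE_RELS = f"""<Relationships xmlns="{PKG_REL_NS}">
<Relationship Id="rId1" Target="styles.xml"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"/>
<Relationship Id="rId2" Target="https://example.com/" TargetMode="External"
  Type="{HYPERLINK_TYPE}"/>
</Relationships>"""

try:
    reject_external_links(SAMPLE_RELS)
except ExternalLinkError as err:
    print(err)  # external relationships found: https://example.com/
```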
Best, Eric
Hi Michaela,
If I remember correctly, I didn’t address links to external documents. Such links are problematic at best, and I could not think of a scenario for DocumentBuilder where such links are important. I confess that I didn’t even think about what to do with those links, so I’m not surprised the code is broken.
I’ll take a quick look, and see if there is an easy fix.
Best, Eric
Hi,
Can you please post a test document (on Dropbox or some such) and provide a link?
DocumentBuilder should propagate all images into the assembled document, and if it does not, that would be a bug.
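If you want to narrow things down before posting, one quick check on the assembled document is whether every image reference still resolves. DrawingML images in the body are referenced by the r:embed attribute (on a:blip, deep inside w:drawing), and each value must match a relationship Id in the part’s .rels file. A minimal sketch in Python, assuming you have extracted word/document.xml and word/_rels/document.xml.rels; the function name is hypothetical, VML images (which use r:id) are not covered, and it does not check that the target image parts themselves exist.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
EMBED = "{%s}embed" % R_NS  # the r:embed attribute


def dangling_image_refs(document_xml, rels_xml):
    """Return r:embed ids used in the part that have no matching relationship Id."""
    rel_ids = {
        rel.get("Id")
        for rel in ET.fromstring(rels_xml).iter("{%s}Relationship" % PKG_REL_NS)
    }
    used = {
        el.get(EMBED)
        for el in ET.fromstring(document_xml).iter()  # searches the whole tree
        if el.get(EMBED) is not None
    }
    return sorted(used - rel_ids)


# In a real document a:blip sits deep inside w:drawing; it is shortened here.
DOC = f"""<w:document xmlns:w="{W_NS}" xmlns:a="{A_NS}" xmlns:r="{R_NS}">
<w:body><w:p><w:r><a:blip r:embed="rId4"/><a:blip r:embed="rId5"/></w:r></w:p></w:body>
</w:document>"""
RELS = f"""<Relationships xmlns="{PKG_REL_NS}">
<Relationship Id="rId4" Target="media/image1.png"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"/>
</Relationships>"""

print(dangling_image_refs(DOC, RELS))  # ['rId5']
```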
Best, Eric
Hi Manu,
Whenever I deal with fields, and certainly nested fields, I use the FieldRetriever.cs module in Open-Xml-PowerTools. This module retrieves (in nested form) all fields in a DOCX. In addition, the data structures returned by FieldRetriever contain auxiliary information, such as object references to the various markup elements in the main document part. You can use those object references to query or alter the document.
The FieldRetriever module serves two purposes – it provides nicely packaged functionality to C# developers to understand the fields in a document, including nested fields. Further, it provides a reference implementation regarding the correct approach for retrieving the instrText and the representation of fields in a document.
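For readers curious about the approach, here is a much smaller sketch of the same idea in Python. It is my own illustration, not FieldRetriever’s API: complex fields are delimited by w:fldChar elements (begin, separate, end), the instruction text lives in w:instrText runs between begin and separate, and a stack turns begin/end pairs into a nested tree. It ignores simple fields (w:fldSimple) and does not record where a nested field sits inside its parent’s instruction.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


@dataclass
class Field:
    instr: str = ""                               # concatenated w:instrText
    children: list = field(default_factory=list)  # nested Field objects
    in_result: bool = False                       # True after fldChar "separate"


def retrieve_fields(document_xml):
    """Return the top-level complex fields, each with its nested fields."""
    top_level, stack = [], []
    for el in ET.fromstring(document_xml).iter():  # document order
        if el.tag == W + "fldChar":
            kind = el.get(W + "fldCharType")
            if kind == "begin":
                stack.append(Field())
            elif kind == "separate" and stack:
                stack[-1].in_result = True
            elif kind == "end" and stack:
                done = stack.pop()
                (stack[-1].children if stack else top_level).append(done)
        elif el.tag == W + "instrText" and stack and not stack[-1].in_result:
            # Instruction text belongs to the innermost field that is still open.
            stack[-1].instr += el.text or ""
    return top_level


SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body><w:p>
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r><w:instrText xml:space="preserve"> IF </w:instrText></w:r>
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r><w:instrText xml:space="preserve"> MERGEFIELD Gender </w:instrText></w:r>
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
<w:r><w:t>F</w:t></w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
<w:r><w:instrText xml:space="preserve"> = "F" "Ms." "Mr." </w:instrText></w:r>
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
<w:r><w:t>Ms.</w:t></w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p></w:body></w:document>"""

for f in retrieve_fields(SAMPLE):
    print(repr(f.instr.strip()), [c.instr.strip() for c in f.children])
```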
Please watch screen-casts #14 and #15 in the following series:
Cheers, Eric
Hi Manu,
The exact format of the text of field codes is defined in the Open XML standard. You must be prepared for those leading and trailing spaces.
If I remember correctly, the standard defines a grammar for the text of fields, so it should be possible to build a small parser for it.
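As an illustration of how small such a parser can be, here is a sketch in Python. It assumes a simplified grammar rather than the full one in the standard: a field type, then whitespace-separated arguments and quoted strings, with switches that start with a backslash and may take one following argument. It does not handle escaped quotes inside quoted strings, and it treats any token starting with a backslash as a switch, even if that token was quoted. Leading and trailing spaces need no special handling, because whitespace only ever separates tokens.

```python
def tokenize_field_code(instr):
    """Split field instruction text into tokens; a quoted string becomes one token."""
    tokens, i, n = [], 0, len(instr)
    while i < n:
        c = instr[i]
        if c.isspace():
            i += 1  # leading, trailing and repeated spaces are just separators
        elif c == '"':
            end = instr.find('"', i + 1)
            if end == -1:
                end = n  # unterminated quote: take the rest of the text
            tokens.append(instr[i + 1:end])
            i = end + 1
        else:
            j = i
            while j < n and not instr[j].isspace() and instr[j] != '"':
                j += 1
            tokens.append(instr[i:j])
            i = j
    return tokens


def parse_field_code(instr):
    """Return {'type', 'args', 'switches'} for a field instruction, or None if empty."""
    tokens = tokenize_field_code(instr)
    if not tokens:
        return None
    field_type, rest = tokens[0].upper(), tokens[1:]  # normalized for comparison
    args, switches = [], []
    i = 0
    while i < len(rest):
        tok = rest[i]
        if tok.startswith("\\"):
            # Simplification: a switch takes the next token as its argument
            # unless that token is itself another switch.
            arg = None
            if i + 1 < len(rest) and not rest[i + 1].startswith("\\"):
                arg = rest[i + 1]
                i += 1
            switches.append((tok, arg))
        else:
            args.append(tok)
        i += 1
    return {"type": field_type, "args": args, "switches": switches}


print(parse_field_code(r' MERGEFIELD  FirstName \* MERGEFORMAT '))
# {'type': 'MERGEFIELD', 'args': ['FirstName'], 'switches': [('\\*', 'MERGEFORMAT')]}
print(parse_field_code(r' DATE \@ "MMMM d, yyyy" '))
# {'type': 'DATE', 'args': [], 'switches': [('\\@', 'MMMM d, yyyy')]}
```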
-Eric