List Numbering on Merged Docs
Tagged: Merging List Numbering
This topic contains 23 replies, has 4 voices, and was last updated by Thierry 7 years, 12 months ago.
September 23, 2016 at 7:44 pm #3827
I found my issue, now to work out the solution.
At the bottom of the numbering.xml part I have found that the lists that are not resetting to 1 correctly are mapping to the same abstractNumId:
<w:num w:numId="10">
  <w:abstractNumId w:val="9"/>
</w:num>
<w:num w:numId="11">
  <w:abstractNumId w:val="9"/>
</w:num>
<w:num w:numId="12">
  <w:abstractNumId w:val="12"/>
</w:num>
<w:num w:numId="13">
  <w:abstractNumId w:val="12"/>
</w:num>
<w:num w:numId="14">
  <w:abstractNumId w:val="3"/>
</w:num>
<w:num w:numId="15">
  <w:abstractNumId w:val="3"/>
</w:num>
<w:num w:numId="16">
  <w:abstractNumId w:val="4"/>
</w:num>
<w:num w:numId="17">
  <w:abstractNumId w:val="4"/>
</w:num>
If I create new abstractNum nodes, and update all guids to new unique values and remap one of each of these pairs to the new version then it resets as expected.
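To spot the pattern described above without reading numbering.xml by hand, something like the following minimal sketch could help. It uses only System.Xml.Linq rather than the Open XML SDK, and the helper name FindSharedAbstractNums is hypothetical, not part of any library. It assumes the root element of the numbering part has already been loaded as an XElement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not a library API): maps each abstractNumId to the numIds
// that reference it, keeping only abstract definitions used by more than one w:num.
static Dictionary<int, List<int>> FindSharedAbstractNums(XElement numbering)
{
    // WordprocessingML main namespace (the "w:" prefix in numbering.xml).
    XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    return numbering.Elements(w + "num")
        .Where(num => num.Element(w + "abstractNumId") != null)
        .GroupBy(
            num => (int)num.Element(w + "abstractNumId").Attribute(w + "val"),
            num => (int)num.Attribute(w + "numId"))
        .Where(group => group.Count() > 1)
        .ToDictionary(group => group.Key, group => group.ToList());
}

// Usage with a trimmed-down copy of the fragment quoted above.
var sample = XElement.Parse(
    @"<w:numbering xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
        <w:num w:numId=""10""><w:abstractNumId w:val=""9""/></w:num>
        <w:num w:numId=""11""><w:abstractNumId w:val=""9""/></w:num>
        <w:num w:numId=""12""><w:abstractNumId w:val=""12""/></w:num>
      </w:numbering>");

foreach (var pair in FindSharedAbstractNums(sample))
    Console.WriteLine($"abstractNumId {pair.Key} is shared by numIds {string.Join(", ", pair.Value)}");
// prints: abstractNumId 9 is shared by numIds 10, 11
```

This only reports the sharing; whether a shared abstractNumId is actually the cause of a given numbering problem still has to be checked against the document.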
EDIT:
So it turns out that my document is correct until after I open it and say OK to the “Update fields” modal dialog that I enabled to refresh the ToC. After my program is finished with the document, the numbering part has unique numbering IDs and unique abstract numbering IDs that match the numbered lists in the document part, as I would expect.
When I say OK and then save and close the document to see what changed in the XML I can see that the numbering part has been changed to have duplicate abstract numbering IDs as shown above.
- This reply was modified 8 years, 1 month ago by thandley.
September 23, 2016 at 9:27 pm #3829
I figured it out!!
It was not related to the ToC update modal as I last suspected.
The numbered list pairs that were misbehaving each had matching <w:nsid w:val="X"/> nodes in their abstractNum definition nodes.
So I think I can just do a post process to make all nsid’s unique and I should be good.
EDIT:
I was able to solve this with the following code:
private static void FixNumberingPart(MainDocumentPart mainPart)
{
    List<AbstractNum> nums = mainPart.NumberingDefinitionsPart.Numbering.Descendants<AbstractNum>().ToList();
    foreach (AbstractNum num in nums)
    {
        bool isDuplicate = nums.Count(instance => instance.Nsid.Val.Value == num.Nsid.Val.Value) > 1;
        if (isDuplicate)
        {
            Console.WriteLine($"Found duplicate Nsid = {num.Nsid.Val.Value}");
            num.Nsid.Val = HexBinaryValue.FromString(GetRandomHexNumber(8));
            Console.WriteLine($"New Nsid = {num.Nsid.Val.Value}");
        }
    }
}

private static readonly Random Random = new Random();

private static string GetRandomHexNumber(int digits)
{
    byte[] buffer = new byte[digits / 2];
    Random.NextBytes(buffer);
    string result = string.Concat(buffer.Select(x => x.ToString("X2")).ToArray());
    if (digits % 2 == 0)
    {
        return result;
    }
    return result + Random.Next(16).ToString("X");
}
- This reply was modified 8 years, 1 month ago by thandley.
September 23, 2016 at 9:59 pm #3831
Arrgh… that all worked on my simple test documents. But when I tested it on real-world use cases, which are much more complex, it failed.
I suppose those Nsid values are referenced in other places.
At least I’m on the right track.
September 26, 2016 at 9:44 am #3835
Hi Trevor,
I did what you suggested and that stopped the error, but my numbering went back to continuing across pages instead of resetting to 0, which is the same result as if I were running no code beyond the standard BuildDocument. However, what you said made me add more debugging and inspect the OpenXML more closely to see whether I was updating some numbers more than once. I made some key discoveries:
- Abstract num ids appear to start at 0 but num ids start at 1.
- Even after allowing for starting at 0 in my algorithm, my input file had 2 abstract nums, but when the output contained three copies of it I got 3 abstract nums instead of the 6 I expected.
- Abstract nums have an NSID that is meant to be a unique hex identifier: https://msdn.microsoft.com/en-us/library/documentformat.openxml.wordprocessing.nsid%28v=office.14%29.aspx
That last point made me change to taking the hex NSIDs and adding to them, based on the highest abstract num so far, to get a unique NSID. That appears to work in my testing.
My code is now:
public bool MergeDocs(List<InterchangeableWordProcessingDocument> inputFiles, string outputFile, Guid busUnitID)
{
    if (inputFiles == null)
    {
        logger.LogDebug("No files to merge.");
        return true;
    }
    try
    {
        logger.LogDebug("Starting creating merged label doc " + outputFile + "\nCombining " + inputFiles.Count() + " document(s) into single doc.");
        List<OpenXmlPowerTools.Source> sources = new List<OpenXmlPowerTools.Source>();
        int highestListNumbering = 0;
        int highestAbstractListNumbering = 0;
        foreach (var inputFile in inputFiles)
        {
            //Sometimes merge puts start of next page onto end of previous one so prevent
            //Seems to cause extra blank page when there are labels so don't do on labels pages
            if (inputFile.DocType == DocType.Letter)
            {
                using (var wordDoc = inputFile.GetAsWordProcessingDocument())
                {
                    var para = wordDoc.MainDocumentPart.Document.Body.ChildElements.First<Paragraph>();
                    if (para.ParagraphProperties == null)
                    {
                        para.ParagraphProperties = new ParagraphProperties();
                    }
                    para.ParagraphProperties.PageBreakBefore = new PageBreakBefore();

                    //http://www.ericwhite.com/blog/forums/topic/list-numbering-on-merged-docs/
                    //Numberings should be unique to each page otherwise they continue from the previous
                    //Keep track of how many we have so we can add on to always have a unique number
                    var numIds = wordDoc.MainDocumentPart.Document.Body.Descendants<NumberingId>().ToList();
                    logger.LogDebug("Found " + numIds.Count + " num ids.");
                    foreach (var numId in numIds)
                    {
                        logger.LogDebug("Changing num id from " + numId.Val + " to " + (numId.Val + highestListNumbering));
                        numId.Val += highestListNumbering;
                    }

                    //Only read the styles part after checking that it exists
                    if (wordDoc.MainDocumentPart.StyleDefinitionsPart != null)
                    {
                        var styleNumIds = wordDoc.MainDocumentPart.StyleDefinitionsPart.RootElement.Descendants<NumberingId>().ToList();
                        logger.LogDebug("Found " + styleNumIds.Count + " style num ids.");
                        foreach (var styleNumId in styleNumIds)
                            styleNumId.Val += highestListNumbering;
                    }

                    if (wordDoc.MainDocumentPart.NumberingDefinitionsPart != null)
                    {
                        var numberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<NumberingInstance>().ToList();
                        logger.LogDebug("Found " + numberingNumIds.Count + " numbering num ids.");
                        foreach (var numberingNumId in numberingNumIds)
                        {
                            logger.LogDebug("Changing num id from " + numberingNumId.NumberID + " to " + (numberingNumId.NumberID + highestListNumbering));
                            numberingNumId.NumberID += highestListNumbering;
                            logger.LogDebug("Changing num abstract num id from " + numberingNumId.AbstractNumId.Val + " to " + (numberingNumId.AbstractNumId.Val + highestAbstractListNumbering));
                            numberingNumId.AbstractNumId.Val += highestAbstractListNumbering;
                        }
                        highestListNumbering = Math.Max(highestListNumbering, numberingNumIds.Max(ln => (ln.NumberID.HasValue ? ln.NumberID.Value : 0)));

                        var abstractNumberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<AbstractNum>().ToList();
                        logger.LogDebug("Found " + abstractNumberingNumIds.Count + " abstract num ids." + wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.XName.LocalName);
                        foreach (var abstractNumberingNumId in abstractNumberingNumIds)
                        {
                            logger.LogDebug("Changing abstract id from " + abstractNumberingNumId.AbstractNumberId + " to " + (abstractNumberingNumId.AbstractNumberId + highestAbstractListNumbering));
                            abstractNumberingNumId.AbstractNumberId += highestAbstractListNumbering;
                            //Shift the hex NSID by the same offset so each copy gets its own value
                            int nsid = Convert.ToInt32(abstractNumberingNumId.Nsid.Val, 16);
                            nsid += highestAbstractListNumbering;
                            abstractNumberingNumId.Nsid.Val = nsid.ToString("X");
                            logger.LogDebug("NSID is " + abstractNumberingNumId.Nsid.Val);
                        }

                        //Keep the max nums up to date
                        if (abstractNumberingNumIds.Count > 0)
                            highestAbstractListNumbering = Math.Max(highestAbstractListNumbering, abstractNumberingNumIds.Max(ln => (ln.AbstractNumberId.HasValue ? ln.AbstractNumberId.Value : -1) + 1));
                    }

                    if (numIds.Count > 0)
                        highestListNumbering = Math.Max(highestListNumbering, numIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
                    logger.LogDebug("Max num is now " + highestListNumbering);
                    logger.LogDebug("Max abstract num is now " + highestAbstractListNumbering);
                    wordDoc.MainDocumentPart.Document.Save();
                }
            }
            sources.Add(new OpenXmlPowerTools.Source(inputFile.GetAsWmlDocument(), true));
        }
        DocumentBuilder.BuildDocument(sources, outputFile);
        inputFiles.Clear();
        return true;
    }
    catch (SystemException ex)
    {
        logger.LogError("Error occurred while merging. ", ex);
        return false;
    }
}
which is pretty messy but appears to work and I am going to be very careful about changing it to make it cleaner!
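One detail of the NSID arithmetic above may be worth isolating. As far as I know, the schema treats w:nsid as a four-byte (eight hex digit) value, and nsid.ToString("X") drops leading zeros, so a value such as 0000ABCD would come back shorter than eight digits. The following is a minimal sketch of the same "add an offset to the hex NSID" idea that keeps the width fixed. OffsetNsid is a hypothetical helper name, not part of the Open XML SDK or PowerTools, and wrapping around on overflow is an assumption of this sketch rather than something the thread specifies.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not a library API): adds an offset to a hex NSID and
// returns the result as exactly 8 upper-case hex digits, wrapping on overflow.
static string OffsetNsid(string nsidHex, int offset)
{
    uint value = uint.Parse(nsidHex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    uint shifted = unchecked(value + (uint)offset);
    // "X8" pads with leading zeros, unlike plain "X".
    return shifted.ToString("X8", CultureInfo.InvariantCulture);
}

Console.WriteLine(OffsetNsid("0000ABCD", 2)); // prints 0000ABCF
Console.WriteLine(OffsetNsid("FFFFFFFF", 1)); // prints 00000000 (wrapped around)
```

Adding an offset only separates copies that start from the same NSID; it does not by itself rule out a collision with an unrelated NSID elsewhere in the merged output.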
September 26, 2016 at 10:34 am #3836
Sorry Trevor, I got caught out by the length of this thread and missed your latest messages, where you are talking about exactly the NSID problem! I notice you are updating it in a slightly different way, though. I would suggest adding the decimal value of the highest abstract num so far, so it’s guaranteed to be unique. What broke after you did the NSID fix, or what does it not cover? I tried a few scenarios and they appeared OK, but our QA will be testing this properly shortly. Thanks!
September 27, 2016 at 3:29 pm #3845
Glad your code is working, Alan!
It seems that updating my NSID values using my approach or yours produces the same effect. It’s hard to describe, but it looks like some lists are now interleaved. A list starts numbering 1, 2, 3, then it jumps to 1 again, then back to 4, 5, and then there’s a 2…
Some places have badly formatted lists that seem to have appeared out of nowhere… it’s all just messed up.
I think I’ll need to do more pre-merge cleaning of my input documents. The source documents are not under my control so they can come in many forms and may or may not be in good shape when passed to my program. It’s tough.
November 17, 2016 at 1:53 pm #3956
Hi Alan and Trevor,
I have read this thread carefully, and I also preprocess the documents before calling DocumentBuilder.BuildDocument, using Alan’s algorithm from the post of September 26, 2016 at 9:44 am.
Thank you all for the useful information, which has saved me a lot of research time.
The algorithm works quite well, I must say; unfortunately, in some cases with numbered and bulleted lists it does not work correctly.
Have you made any further progress on this issue?
I will also do some research on the algorithm myself.
Kind regards,
Thierry Knijff
Software Engineer
November 17, 2016 at 2:40 pm #3957
Hi Alan,
I added an extra check when determining the variable highestListNumbering in the algorithm:
highestListNumbering = Math.Max(highestListNumbering, numberingNumIds.Max(ln => (ln.NumberID.HasValue ? ln.NumberID.Value : 0)));
is changed to the following (to prevent an exception when numberingNumIds has no items):
if (numberingNumIds.Count > 0)
    highestListNumbering = Math.Max(highestListNumbering, numberingNumIds.Max(ln => (ln.NumberID.HasValue ? ln.NumberID.Value : 0)));
Kind regards,
Thierry
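The guard above is needed because Enumerable.Max throws an InvalidOperationException on an empty sequence of non-nullable values. An alternative way to express the same check is DefaultIfEmpty, shown in the minimal sketch below. HighestId is a hypothetical helper name, and plain int? values stand in for the SDK's NumberID values so that the snippet runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the thread): returns the highest ID seen so far,
// tolerating an empty list and missing (null) IDs.
static int HighestId(int current, IEnumerable<int?> ids)
{
    // DefaultIfEmpty(0) makes Max() return 0 instead of throwing on an empty sequence.
    int maxInDocument = ids.Select(id => id ?? 0).DefaultIfEmpty(0).Max();
    return Math.Max(current, maxInDocument);
}

Console.WriteLine(HighestId(3, new List<int?>()));              // prints 3
Console.WriteLine(HighestId(3, new List<int?> { 1, null, 7 })); // prints 7
```

Either form works; the explicit Count check keeps the original line unchanged, while DefaultIfEmpty folds the empty case into the expression.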
November 18, 2016 at 10:13 am #3960
Hi Alan,
When I adjusted the bullets and numbering in the content of the documents that went wrong, the problems with the bullets and numbering disappeared 🙂
So for now the algorithm is working OK, as far as I can see. But we will do more thorough testing.
Kind regards,
Thierry