List Numbering on Merged Docs

Open-Xml-PowerTools forum

This topic contains 23 replies, has 4 voices, and was last updated by Thierry 8 years, 1 month ago.

Viewing 9 posts - 16 through 24 (of 24 total)
  • #3827

    thandley
    Participant

    I found my issue, now to work out the solution.

    At the bottom of the numbering.xml part I have found that the lists that are not resetting to 1 correctly are mapping to the same abstractNumId:

    
    	<w:num w:numId="10">
    		<w:abstractNumId w:val="9"/>
    	</w:num>
    	<w:num w:numId="11">
    		<w:abstractNumId w:val="9"/>
    	</w:num>
    	<w:num w:numId="12">
    		<w:abstractNumId w:val="12"/>
    	</w:num>
    	<w:num w:numId="13">
    		<w:abstractNumId w:val="12"/>
    	</w:num>
    	<w:num w:numId="14">
    		<w:abstractNumId w:val="3"/>
    	</w:num>
    	<w:num w:numId="15">
    		<w:abstractNumId w:val="3"/>
    	</w:num>
    	<w:num w:numId="16">
    		<w:abstractNumId w:val="4"/>
    	</w:num>
    	<w:num w:numId="17">
    		<w:abstractNumId w:val="4"/>
    	</w:num>
    

    If I create new abstractNum nodes, update all GUIDs to new unique values, and remap one member of each of these pairs to the new version, then the list resets as expected.
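    The remapping described above could look roughly like the sketch below. It works on the raw numbering.xml with System.Xml.Linq rather than the Open XML SDK classes; the class and method names are hypothetical, and it assumes every w:num has a w:abstractNumId child pointing at an existing w:abstractNum. Each w:num after the first one that uses a given definition gets a copy of that definition with a fresh abstractNumId and, if the copy carries a w:nsid, an unused nsid.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: gives every w:num its own w:abstractNum so that lists
// which shared one definition stop continuing each other's numbering.
public static class NumberingSplitter
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns how many abstractNum copies were created.
    public static int SplitSharedAbstractNums(XElement numbering)
    {
        var abstractNums = numbering.Elements(W + "abstractNum").ToList();

        // Next free abstractNumId (one above the current maximum).
        int nextId = abstractNums
            .Select(a => (int)a.Attribute(W + "abstractNumId"))
            .DefaultIfEmpty(-1)
            .Max() + 1;

        // nsid values already in use, parsed from their hex form.
        var usedNsids = new HashSet<uint>(abstractNums
            .Select(a => (string)a.Element(W + "nsid")?.Attribute(W + "val"))
            .Where(v => v != null)
            .Select(v => uint.Parse(v, NumberStyles.HexNumber, CultureInfo.InvariantCulture)));

        var seenRefs = new HashSet<string>();
        int copies = 0;
        foreach (var num in numbering.Elements(W + "num").ToList())
        {
            XAttribute reference = num.Element(W + "abstractNumId").Attribute(W + "val");
            if (seenRefs.Add(reference.Value))
                continue; // the first w:num keeps the original definition

            XElement original = abstractNums.First(
                a => (string)a.Attribute(W + "abstractNumId") == reference.Value);
            var copy = new XElement(original);
            copy.SetAttributeValue(W + "abstractNumId", nextId);

            // Give the copy an unused nsid, if it carries one at all.
            XElement nsid = copy.Element(W + "nsid");
            if (nsid != null)
            {
                uint candidate = 1;
                while (!usedNsids.Add(candidate)) candidate++;
                nsid.SetAttributeValue(W + "val", candidate.ToString("X8", CultureInfo.InvariantCulture));
            }

            // All abstractNum elements must come before the num elements.
            abstractNums.Last().AddAfterSelf(copy);
            abstractNums.Add(copy);

            reference.Value = nextId.ToString(CultureInfo.InvariantCulture);
            nextId++;
            copies++;
        }
        return copies;
    }
}
```

    Picking the lowest unused nsid keeps the result deterministic; a random value, as in the code further down this thread, would work too as long as it is checked against the values already present.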

    EDIT:
    So it turns out that my document is correct until after I open it and say OK to the “Update fields” modal dialog that I enabled to refresh the ToC.

    After my program is finished with the document, the numbering part has unique numbering IDs and unique abstract numbering IDs that match to the document part numbered lists as I would expect.

    When I say OK and then save and close the document to see what changed in the XML I can see that the numbering part has been changed to have duplicate abstract numbering IDs as shown above.

    • This reply was modified 8 years, 3 months ago by  thandley.
    #3829

    thandley
    Participant

    I figured it out!!

    It was not related to the ToC update modal as I last suspected.

    The numbered list pairs that were misbehaving each had matching <w:nsid w:val="X"/> nodes in their abstractNum definitions.

    So I think I can just do a post-processing step to make all nsids unique and I should be good.

    EDIT:
    I was able to solve this with the following code

    
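            // Needs: using System; using System.Collections.Generic; using System.Linq;
            //        using DocumentFormat.OpenXml; using DocumentFormat.OpenXml.Packaging;
            //        using DocumentFormat.OpenXml.Wordprocessing;
            private static void FixNumberingPart(MainDocumentPart mainPart) {
                // Documents without lists have no numbering part, so there is nothing to fix
                if (mainPart.NumberingDefinitionsPart == null) {
                    return;
                }
                List<AbstractNum> nums = mainPart.NumberingDefinitionsPart.Numbering.Descendants<AbstractNum>().ToList();
                foreach (AbstractNum num in nums) {
                    // w:nsid is optional, so skip definitions that do not have one
                    if (num.Nsid?.Val == null) {
                        continue;
                    }
                    // The count is re-evaluated on each pass, so the last member of a
                    // duplicate group keeps its original value once the others have changed
                    bool isDuplicate = nums.Count(instance => string.Equals(
                        instance.Nsid?.Val?.Value, num.Nsid.Val.Value, StringComparison.OrdinalIgnoreCase)) > 1;
                    if (isDuplicate) {
                        Console.WriteLine($"Found duplicate Nsid = {num.Nsid.Val.Value}");
                        // An nsid is a 4-byte value, written as 8 hex digits
                        num.Nsid.Val = HexBinaryValue.FromString(GetRandomHexNumber(8));
                        Console.WriteLine($"New Nsid = {num.Nsid.Val.Value}");
                    }
                }
            }
            private static readonly Random Random = new Random();
            // Returns `digits` random hex characters (an odd count gets one extra nibble)
            private static string GetRandomHexNumber(int digits) {
                byte[] buffer = new byte[digits / 2];
                Random.NextBytes(buffer);
                string result = string.Concat(buffer.Select(x => x.ToString("X2")).ToArray());
                if (digits % 2 == 0) {
                    return result;
                }
                return result + Random.Next(16).ToString("X");
            }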
    
    • This reply was modified 8 years, 3 months ago by  thandley.
    #3831

    thandley
    Participant

    Arrgh… that all worked on my simple test documents. But when I tested it on real-world use cases that are much more complex, it failed.

    I suppose those Nsid values are referenced in other places.

    At least I’m on the right track.

    #3835

    AlanSMac
    Participant

    Hi Trevor,

    I did what you suggested and that stopped the error, but my numbering went back to continuing across pages instead of restarting at 1, and I got the same result as if I were not running any code beyond the standard BuildDocument. However, what you said made me add more debugging and inspect the OpenXML more closely, to see whether I was updating some numbers more than once. I made some key discoveries:

    That last point made me switch to taking the hex nsids and adding the highest abstract num ID seen so far to them, to get a unique nsid, which appears to work in my testing.

    My code is now:

    public bool MergeDocs(List<InterchangeableWordProcessingDocument> inputFiles, string outputFile, Guid busUnitID)
            {
                if (inputFiles == null)
                {
                    logger.LogDebug("No files to merge.");
                    return true;
                }
                try
                {
    
                    
                    logger.LogDebug("Starting creating merged label doc " + outputFile + "\nCombining " + inputFiles.Count() + " document(s) into single doc.");
    
                    List<OpenXmlPowerTools.Source> sources = new List<OpenXmlPowerTools.Source>();
                    int highestListNumbering = 0;
                    int highestAbstractListNumbering = 0;
                    foreach (var inputFile in inputFiles)
                    {
                        //Sometimes merge puts start of next page onto end of previous one so prevent
                        //Seems to cause extra blank page when there are labels so don't do on labels pages
                        if (inputFile.DocType == DocType.Letter)
                        {
                            using (var wordDoc = inputFile.GetAsWordProcessingDocument())
                            {
                                var para = wordDoc.MainDocumentPart.Document.Body.ChildElements.First<Paragraph>();
    
                                if (para.ParagraphProperties == null)
                                {
                                    para.ParagraphProperties = new ParagraphProperties();
                                }
    
                                para.ParagraphProperties.PageBreakBefore = new PageBreakBefore();
    
                                //http://www.ericwhite.com/blog/forums/topic/list-numbering-on-merged-docs/
                                //Numberings should be unique to each page otherwise they continue from the previous
                                //Keep track of how many we have so we can add on to always have a unique number
                                var numIds = wordDoc.MainDocumentPart.Document.Body.Descendants<NumberingId>().ToList();
    
                                logger.LogDebug("Found " + numIds.Count + " num ids.");
    
                                foreach (var numId in numIds)
                                {
                                    logger.LogDebug("Changing num id from " + numId.Val + " to " + (numId.Val + highestListNumbering));
                                    numId.Val += highestListNumbering;
                                }
    
                                if (wordDoc.MainDocumentPart.StyleDefinitionsPart != null)
                                {
                                    // Query the styles part only after confirming it exists
                                    var styleNumIds = wordDoc.MainDocumentPart.StyleDefinitionsPart.RootElement.Descendants<NumberingId>().ToList();

                                    logger.LogDebug("Found " + styleNumIds.Count + " style num ids.");
                                    foreach (var styleNumId in styleNumIds)
                                        styleNumId.Val += highestListNumbering;
                                }
    
                                if (wordDoc.MainDocumentPart.NumberingDefinitionsPart != null)
                                {
    
                                    var numberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<NumberingInstance>().ToList();
    
                                    logger.LogDebug("Found " + numberingNumIds.Count + " numbering num ids.");
                                    foreach (var numberingNumId in numberingNumIds)
                                    {
                                        logger.LogDebug("Changing num id from " + numberingNumId.NumberID + " to " + (numberingNumId.NumberID + highestListNumbering));
                                        numberingNumId.NumberID += highestListNumbering;
    
                                        logger.LogDebug("Changing num abstract num id from " + numberingNumId.AbstractNumId.Val + " to " + (numberingNumId.AbstractNumId.Val + highestAbstractListNumbering));
                                        numberingNumId.AbstractNumId.Val += highestAbstractListNumbering;
                                    }
    
                                    highestListNumbering = Math.Max(highestListNumbering, numberingNumIds.Max(ln => (ln.NumberID.HasValue ? ln.NumberID.Value : 0)));
    
                                    var abstractNumberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<AbstractNum>().ToList();
    
                                    logger.LogDebug("Found " + abstractNumberingNumIds.Count + " abstract num ids." + wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.XName.LocalName);
                                    foreach (var abstractNumberingNumId in abstractNumberingNumIds)
                                    {
                                        logger.LogDebug("Changing abstract id from " + abstractNumberingNumId.AbstractNumberId + " to " + (abstractNumberingNumId.AbstractNumberId + highestAbstractListNumbering));
                                        abstractNumberingNumId.AbstractNumberId += highestAbstractListNumbering;
    
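                                        // Shift the nsid by the running abstractNum offset so it is
                                        // less likely to collide with an nsid from an earlier document
                                        if (abstractNumberingNumId.Nsid?.Val != null)
                                        {
                                            int nsid = Convert.ToInt32(abstractNumberingNumId.Nsid.Val.Value, 16);
                                            nsid += highestAbstractListNumbering;
                                            // nsid is a 4-byte value, so write it back as 8 hex digits
                                            abstractNumberingNumId.Nsid.Val = nsid.ToString("X8");
                                            logger.LogDebug("NSID is " + abstractNumberingNumId.Nsid.Val);
                                        }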
    
                                    }
    
                                    //Keep the max nums up to date
                                    if (abstractNumberingNumIds.Count > 0)
                                        highestAbstractListNumbering = Math.Max(highestAbstractListNumbering, abstractNumberingNumIds.Max(ln => (ln.AbstractNumberId.HasValue ? ln.AbstractNumberId.Value : -1) + 1));
    
                                }
    
                                if (numIds.Count > 0)
                                    highestListNumbering = Math.Max(highestListNumbering, numIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
    
                                logger.LogDebug("Max num is now " + highestListNumbering);
                                logger.LogDebug("Max abstract num is now " + highestAbstractListNumbering);
    
                                wordDoc.MainDocumentPart.Document.Save();
                            }
                        }
                        sources.Add(new OpenXmlPowerTools.Source(inputFile.GetAsWmlDocument(), true));
    
                    }
    
                    DocumentBuilder.BuildDocument(sources, outputFile);
    
                    inputFiles.Clear();
                    return true;
    
                }
                catch (SystemException ex)
                {
                    logger.LogError("Error occurred while merging. ", ex);
    
                    return false;
                }
            }

    which is pretty messy but appears to work and I am going to be very careful about changing it to make it cleaner!

    #3836

    AlanSMac
    Participant

    Sorry Trevor, I got caught out by the length of this thread and missed your latest messages, where you are talking about exactly the NSID problem! I notice you are updating it in a slightly different way, though. I would suggest adding the decimal value of the highest abstract num so far, so it's guaranteed to be unique. What broke after you did the NSID fix, or what does it not cover? I tried a few scenarios and it appeared OK, but our QA will be testing this properly shortly. Thanks!
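    Adding the running abstract num count makes a clash less likely, but strictly speaking it does not guarantee uniqueness: if the first document contains nsid 00000010 and the second contains 0000000E while the offset is 2, both end up as 00000010. One alternative, sketched here as an assumption rather than anything tested in this thread, is to keep a single set of nsids already handed out across the whole merge and bump any repeat to the next free value. The NsidAllocator name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical allocator shared by all source documents of one merge.
public sealed class NsidAllocator
{
    private readonly HashSet<uint> _used = new HashSet<uint>();

    // Returns the nsid unchanged if no earlier definition used it, otherwise
    // the next free value; the result is always formatted as 8 hex digits.
    public string Reserve(string hexNsid)
    {
        uint value = uint.Parse(hexNsid, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        while (!_used.Add(value))
        {
            value = unchecked(value + 1); // wraps from FFFFFFFF to 00000000
        }
        return value.ToString("X8", CultureInfo.InvariantCulture);
    }
}
```

    In Alan's loop over abstractNumberingNumIds, the hex arithmetic could then become something like `abstractNumberingNumId.Nsid.Val = allocator.Reserve(abstractNumberingNumId.Nsid.Val.Value);`, with one allocator created before the `foreach (var inputFile in inputFiles)` loop. This only addresses nsid uniqueness; it is not known to fix the interleaved lists described in the next reply.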

    #3845

    thandley
    Participant

    Glad your code is working Alan!

    It seems that updating my NSID values using my approach or yours produces the same effect. It’s hard to describe, but it looks like some lists are now interleaved: a list starts numbering 1, 2, 3, then jumps to 1 again, then back to 4, 5, and then there’s a 2…

    Some places have badly formatted lists that seem to have appeared out of nowhere… it’s all just messed up.

    I think I’ll need to do more pre-merge cleaning of my input documents. The source documents are not under my control so they can come in many forms and may or may not be in good shape when passed to my program. It’s tough.
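    For that kind of pre-merge (or post-merge) checking, a small report of the two patterns found earlier in this thread, w:num entries sharing one abstractNumId and abstractNum definitions sharing one nsid, might help narrow down which source documents cause trouble. This is a hypothetical sketch over the raw numbering.xml using System.Xml.Linq. Word itself uses a shared abstractNumId together with w:lvlOverride/w:startOverride to restart a list, so a shared abstractNumId is a lead to inspect rather than proof of a fault.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical diagnostics for a numbering.xml root element (<w:numbering>).
public static class NumberingDiagnostics
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // abstractNumId -> numIds that point at it, only where more than one does.
    public static Dictionary<string, List<string>> SharedAbstractNums(XElement numbering) =>
        numbering.Elements(W + "num")
            .GroupBy(n => (string)n.Element(W + "abstractNumId")?.Attribute(W + "val"))
            .Where(g => g.Key != null && g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.Select(n => (string)n.Attribute(W + "numId")).ToList());

    // nsid (upper-cased) -> abstractNumIds carrying it, only where more than one does.
    public static Dictionary<string, List<string>> DuplicateNsids(XElement numbering) =>
        numbering.Elements(W + "abstractNum")
            .GroupBy(a => ((string)a.Element(W + "nsid")?.Attribute(W + "val"))?.ToUpperInvariant())
            .Where(g => g.Key != null && g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.Select(a => (string)a.Attribute(W + "abstractNumId")).ToList());
}
```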

    #3956

    Thierry
    Participant

    Hi Alan and Trevor,

    I have read this thread carefully, and I am also preprocessing the documents before calling DocumentBuilder.BuildDocument, using Alan's algorithm posted on September 26, 2016 at 9:44 am.

    Thank you all for the useful information, which has saved me a lot of research time.

    The algorithm works quite well, I must say; however, in some cases with numbered and bulleted lists it unfortunately does not work correctly.

    Have you made any further progress on this issue?
    I will also look into the algorithm myself.

    Kind regards,

    Thierry Knijff
    Software Engineer

    #3957

    Thierry
    Participant

    Hi Alan,

    I added an extra check when determining the variable highestListNumbering in the algorithm:

    highestListNumbering = Math.Max(highestListNumbering, numberingNumIds.Max(ln => (ln.NumberID.HasValue ? ln.NumberID.Value : 0)));

    is changed to the following, to prevent an exception when numberingNumIds has no items (Enumerable.Max throws an InvalidOperationException on an empty sequence):

    if (numberingNumIds.Count > 0)
        highestListNumbering = Math.Max(highestListNumbering, numberingNumIds.Max(ln => (ln.NumberID.HasValue ? ln.NumberID.Value : 0)));
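    The same guard can also be written without the explicit Count check by letting LINQ's DefaultIfEmpty supply a fallback, so Max never sees an empty sequence. A minimal sketch, with a hypothetical helper name and the ids modeled as int? to mirror NumberID.HasValue:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListNumberingMath
{
    // Hypothetical helper: returns the larger of `current` and the highest id.
    // Ids without a value count as 0, as in the original lambda; for an empty
    // sequence DefaultIfEmpty supplies `current`, so Max does not throw.
    public static int MaxOrCurrent(int current, IEnumerable<int?> ids) =>
        Math.Max(current, ids.Select(id => id ?? 0).DefaultIfEmpty(current).Max());
}
```

    In the merge code the call could look something like `highestListNumbering = ListNumberingMath.MaxOrCurrent(highestListNumbering, numberingNumIds.Select(ln => ln.NumberID.HasValue ? (int?)ln.NumberID.Value : null));`. The explicit `Count > 0` check above is equally correct; this form just keeps the empty case inside the expression.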

    Kind regards,

    Thierry

    #3960

    Thierry
    Participant

    Hi Alan,

    When I adjusted the bullets & numbering in the content of the documents that went wrong, the problems with the bullets & numbering disappeared 🙂

    So for now, as far as I can see, the algorithm is working OK, but we will test it more thoroughly.

    Kind regards,
    Thierry

