List Numbering on Merged Docs


This topic contains 23 replies, has 4 voices, and was last updated by  Thierry 7 years, 12 months ago.

Viewing 15 posts - 1 through 15 (of 24 total)
  • #3662

    AlanSMac
    Participant

    Hi Eric,

    I am replacing variables in different instances of the same template docx file and merging them all together at the end. It generally works very well so thank you for your excellent library. We have one issue though, which is that if the template designer has put a numbered list in the template then after merging the list numbers continue rather than restart on each page.

    So for 10 bullet points on page 1 for the first instance, the numbering is 1-10. On page 2, for the second instance of the same template, the numbering is 11-20, but I want 1-10 again. I have read that OOXML has a w:lvlRestart element, but I am unsure how I would automatically insert this to get the desired behaviour across merged documents, given that the template designer has free control to design the template as he/she pleases. Do you know how I could achieve this?

    I am on OOXML Power Tools 4.1.3. NuGet is showing me 4.2.0, so potentially I can upgrade if it’s required, but that would involve regression testing the whole thing.

    Thanks,

    Alan

    #3663

    Eric White
    Keymaster

    Hi Alan,

    I presume that you are using the DocumentBuilder module, correct? I ask this because it is possible to encounter a similar issue with DocumentAssembler…

    This is one of the more complex issues associated with using DocumentBuilder – something that I have wanted to address for years, but have never had the time.

    Here is the ideal way to fix this issue. It may seem complex, but really is not too bad. The gist of this approach is that if you want to have lists operate in isolation, then they need to have a unique w:numId and w:abstractNumId for each list.

    Each numbered list, regardless of whether it is numbering based on style, or numbering directly applied to the paragraph, has a w:numId associated with it. In the numbering part, this w:numId refers to a w:abstractNumId. Numbering is calculated based on the abstractNumId.

    So if you want your lists to count in isolation, then each list needs to have a unique numId and abstractNumId.
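To make the numId to abstractNumId relationship concrete, here is a minimal sketch (not part of Open-Xml-PowerTools) that looks up which abstractNumId a given numId points to. It assumes you already have the root w:numbering element of the numbering part as a System.Xml.Linq XElement; NumberingLookup and ResolveAbstractNumId are hypothetical names used only for illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NumberingLookup
{
    // WordprocessingML main namespace (the "w:" prefix).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the abstractNumId that the given numId refers to, or null if
    // the numbering part has no w:num element with that numId.
    public static int? ResolveAbstractNumId(XElement numbering, int numId)
    {
        return numbering
            .Elements(W + "num")
            .Where(n => (int?)n.Attribute(W + "numId") == numId)
            .Select(n => (int?)n.Element(W + "abstractNumId")?.Attribute(W + "val"))
            .FirstOrDefault();
    }
}
```

Two lists that resolve to the same abstractNumId share one definition, which is why both ids have to be made unique for lists to count in isolation.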

    Let’s say that you have Document1 and Document2 that each contain lists with the same numId and abstractNumId, probably because they originated from the same document. The pseudo code would be:

    • Find the maximum numId and abstractNumId values in each of the documents. MAXNUMID=maximum of numId. MAXABSTRACTNUMID=maximum of abstractNumId.
    • Leave Document1 as is.
    • Go through Document2, adding MAXNUMID to each definition and reference of a numId value. These values need to be changed in the main document part, in the styles part, and in the numbering part.
    • Go through Document2, adding MAXABSTRACTNUMID to each definition and reference of an abstractNumId value. These values need to be changed in the numbering part only.
    • At the end of this process, the w:numId values will be unique in each document, as will the w:abstractNumId values. The documents will then merge with DocumentBuilder in such a way that each numbered list will count in isolation.
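The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from DocumentBuilder: ListIdShifter and ShiftIds are hypothetical names, and the sketch works on the root XElement of a single part (main document, styles, or numbering), so it would need to be called once per part. It also assumes that a w:numId value of 0 should be left alone, since that value marks the removal of numbering rather than a reference to a list.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ListIdShifter
{
    // WordprocessingML main namespace (the "w:" prefix).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Adds the offsets to every numId / abstractNumId definition and
    // reference found under root (the root element of one part).
    public static void ShiftIds(XElement root, int numIdOffset, int abstractNumIdOffset)
    {
        foreach (var e in root.DescendantsAndSelf().ToList())
        {
            if (e.Name == W + "numId")              // reference in document or styles
                Shift(e.Attribute(W + "val"), numIdOffset, skipZero: true);
            else if (e.Name == W + "num")           // definition in numbering part
                Shift(e.Attribute(W + "numId"), numIdOffset, skipZero: false);
            else if (e.Name == W + "abstractNumId") // reference inside w:num
                Shift(e.Attribute(W + "val"), abstractNumIdOffset, skipZero: false);
            else if (e.Name == W + "abstractNum")   // definition in numbering part
                Shift(e.Attribute(W + "abstractNumId"), abstractNumIdOffset, skipZero: false);
        }
    }

    static void Shift(XAttribute attr, int offset, bool skipZero)
    {
        if (attr == null) return;
        int value = (int)attr;
        if (skipZero && value == 0) return; // assumption: 0 means "no numbering"
        attr.Value = (value + offset).ToString();
    }
}
```

With Open-Xml-PowerTools, the root element of each part could be obtained via the GetXDocument() extension method and written back with PutXDocument(). The important point is that definitions and references are shifted by the same offset, and each exactly once.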

    Ideally, there would be an option in DocumentBuilder that would enable you to specify that the lists in the document should be processed in isolation, or that the lists should be merged with lists in other source documents. I’ve added this item to my list of possible enhancements for Open-Xml-PowerTools.

    Cheers, Eric

    #3695

    AlanSMac
    Participant

    Thanks very much Eric.

    I got this working without too much effort in the end. You were spot on. Much appreciated.

    #3696

    AlanSMac
    Participant

    I spoke too soon. It broke the formatting on the lists, such as alignment and indentation. I guess that is stored elsewhere in the document and I need to update the ids there to match the new ones.

    #3699

    Eric White
    Keymaster

    Hi Alan,

    Make sure to update the w:numId values in the styles.xml part.

    -Eric

    #3701

    AlanSMac
    Participant

    Hi Eric,

    can you please give me pointers as to how to change that programmatically? I know there is a styles.xml and also a numbering.xml when I unzip a docx, but I don’t know how to reach them in terms of working with the SDK or Power Tools.

    I have the following code for finding the highest list numbers and making sure each merged doc has unique numbering ids. This is the code that successfully resets the numbering per document, but the formatting is lost beyond the first doc.

    int highestListNumbering = 0;
    
    foreach (var inputFile in inputFiles)
    {
        using (var wordDoc = inputFile.GetAsWordProcessingDocument())
        {
            var listNumberings = wordDoc.MainDocumentPart.Document.Body.Descendants<NumberingId>().ToList();
    
            foreach (var listNumbering in listNumberings)
                listNumbering.Val += highestListNumbering;
    
            logger.LogDebug(listNumberings.Count + " NumberingIds found");
    
            var xdoc = wordDoc.MainDocumentPart.GetXDocument();
    
            var nums = xdoc.Descendants(W.numId).ToList();
            logger.LogDebug(nums.Count + " numberings found");
    
            wordDoc.MainDocumentPart.Document.Save();
        }
    }

    The final part was my attempt to see whether using the XDocument would show me more numbering ids, but I get the same number as I am already affecting. How do I get a handle on the style information for the numberings? I omitted the equivalent abstract numbering code for brevity, and also because my sample docs have not had any abstract numberings so far.

    Thanks

    • This reply was modified 8 years, 2 months ago by  AlanSMac. Reason: Didn't include enough code
    • This reply was modified 8 years, 2 months ago by  AlanSMac. Reason: Code blocks not showing correctly
    #3707

    AlanSMac
    Participant

    I’m still struggling with this formatting issue. I have been trying lots of different ways of finding NumberingId elements or W.numId from an XDocument and still can’t get the formatting issue to fix itself. I think it will be a very small amount of code but I just can’t find the magic settings.

    Will the code I have, which uses wordDoc.MainDocumentPart.Document.Body.Descendants<NumberingId>(), find these elements across all parts of the document, e.g. styles, numbering, etc.? I started trying to go through all the parts of the document but I get null reference exceptions, i.e.

    var listNumberings = wordDoc.Parts
                                 .Where(part => part != null)
                                 .SelectMany(part => part.OpenXmlPart.RootElement.Descendants<NumberingId>()).ToList();
    
    foreach (var listNumbering in listNumberings)
        listNumbering.Val += highestListNumbering;

    Thanks

    • This reply was modified 8 years, 2 months ago by  AlanSMac.
    #3723

    AlanSMac
    Participant

    Hi Eric,

    I went back through what you said and tried to make sure I was doing everything, but still no luck with the indentation of the list. It is hard left on page 2 onwards when it should be indented like a normal numbered list. To make it easier to see, I am pasting my current version of the whole merge method:

    
            /// <summary>
            /// Merge input files into one output file
            /// </summary>
            /// <param name="inputFiles"></param>
            /// <param name="outputFilePath"></param>
            /// <returns></returns>
            public bool Merge(List<InterchangeableWordProcessingDocument> inputFiles, string outputFilePath)
            {
                if (inputFiles == null)
                {
                    logger.LogDebug("No files to merge.");
                    return true;
                }
                try
                {
    
                    List<OpenXmlPowerTools.Source> sources = new List<OpenXmlPowerTools.Source>();
                    int highestListNumbering = 0;
                    int highestAbstractListNumbering = 0;
                    foreach (var inputFile in inputFiles)
                    {
                        //Sometimes merge puts start of next page onto end of previous one so prevent
                        //Seems to cause extra blank page when there are labels so don't do on labels pages
                        if (inputFile.DocType == DocType.Letter)
                        {
                            using (var wordDoc = inputFile.GetAsWordProcessingDocument())
                            {
                                var para = wordDoc.MainDocumentPart.Document.Body.ChildElements.First<Paragraph>();
    
                                if (para.ParagraphProperties == null)
                                {
                                    para.ParagraphProperties = new ParagraphProperties();
                                }
    
                                para.ParagraphProperties.PageBreakBefore = new PageBreakBefore();
    
                                //http://www.ericwhite.com/blog/forums/topic/list-numbering-on-merged-docs/
                                //Numberings should be unique to each page otherwise they continue from the previous
                                //Keep track of how many we have so we can add on to always have a unique number
                                var numIds = wordDoc.MainDocumentPart.Document.Body.Descendants<NumberingId>().ToList();
    
                                logger.LogDebug("Found " + numIds.Count + " num ids.");
    
                                foreach (var numId in numIds)
                                    numId.Val += highestListNumbering;
    
                                var styleNumIds = wordDoc.MainDocumentPart.StyleDefinitionsPart.RootElement.Descendants<NumberingId>().ToList();
    
                                logger.LogDebug("Found " + styleNumIds.Count + " num ids.");
                                foreach (var styleNumId in styleNumIds)
                                    styleNumId.Val += highestListNumbering;
    
                                var numeringNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<NumberingId>().ToList();
    
                                logger.LogDebug("Found " + numeringNumIds.Count + " num ids.");
                                foreach (var numeringNumId in numeringNumIds)
                                    numeringNumId.Val += highestListNumbering;
    
                                var abstractNumberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<AbstractNumId>().ToList();
    
                                logger.LogDebug("Found " + abstractNumberingNumIds.Count + " abstract num ids.");
                                foreach (var abstractNumberingNumId in abstractNumberingNumIds)
                                    abstractNumberingNumId.Val += highestAbstractListNumbering;
    
                                //Keep the max nums up to date
                                if (numIds.Count > 0)
                                    highestListNumbering = Math.Max(highestListNumbering, numIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
    
                                if (abstractNumberingNumIds.Count > 0)
                                    highestAbstractListNumbering = Math.Max(highestAbstractListNumbering, abstractNumberingNumIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
    
                                
                                wordDoc.MainDocumentPart.Document.Save();
                            }
                        }
                        sources.Add(new OpenXmlPowerTools.Source(inputFile.GetAsWmlDocument(), true));
    
                    }
                    DocumentBuilder.BuildDocument(sources, outputFilePath);
                    return true;
    
                }
                catch (SystemException ex)
                {
                    logger.LogError("Error occurred while generating bereavement letters. ", ex);
    
                    return false;
                }
                finally
                {
                    foreach (var inputFile in inputFiles)
                    {
                        inputFile.Dispose();
                    }
                }
            }
    • This reply was modified 8 years, 2 months ago by  AlanSMac.
    #3735

    AlanSMac
    Participant

    OK, so by inspecting the raw numbering XML and comparing it to my logging statements, I found that I was never finding the num id elements in the numbering part. I have changed the code to select NumberingInstance elements instead, but now I get "Sequence contains no elements" errors in DocumentBuilder.BuildDocument. The new code is:

    public bool Merge(List<InterchangeableWordProcessingDocument> inputFiles, string outputFilePath)
            {
                if (inputFiles == null)
                {
                    logger.LogDebug("No files to merge.");
                    return true;
                }
                try
                {
    
                    List<OpenXmlPowerTools.Source> sources = new List<OpenXmlPowerTools.Source>();
                    int highestListNumbering = 0;
                    int highestAbstractListNumbering = 0;
                    foreach (var inputFile in inputFiles)
                    {
                        //Sometimes merge puts start of next page onto end of previous one so prevent
                        //Seems to cause extra blank page when there are labels so don't do on labels pages
                        if (inputFile.DocType == DocType.Letter)
                        {
                            using (var wordDoc = inputFile.GetAsWordProcessingDocument())
                            {
                                var para = wordDoc.MainDocumentPart.Document.Body.ChildElements.First<Paragraph>();
    
                                if (para.ParagraphProperties == null)
                                {
                                    para.ParagraphProperties = new ParagraphProperties();
                                }
    
                                para.ParagraphProperties.PageBreakBefore = new PageBreakBefore();
    
                                //http://www.ericwhite.com/blog/forums/topic/list-numbering-on-merged-docs/
                                //Numberings should be unique to each page otherwise they continue from the previous
                                //Keep track of how many we have so we can add on to always have a unique number
                                var numIds = wordDoc.MainDocumentPart.Document.Body.Descendants<NumberingId>().ToList();
    
                                logger.LogDebug("Found " + numIds.Count + " num ids.");
    
                                foreach (var numId in numIds)
                                    numId.Val += highestListNumbering;
    
                                if (wordDoc.MainDocumentPart.StyleDefinitionsPart != null)
                                {
                                    //Only dereference the styles part after checking that it exists
                                    var styleNumIds = wordDoc.MainDocumentPart.StyleDefinitionsPart.RootElement.Descendants<NumberingId>().ToList();
    
                                    logger.LogDebug("Found " + styleNumIds.Count + " style num ids.");
                                    foreach (var styleNumId in styleNumIds)
                                        styleNumId.Val += highestListNumbering;
                                }
    
                                if (wordDoc.MainDocumentPart.NumberingDefinitionsPart != null)
                                {
    
                                    var numberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<NumberingInstance>().ToList();
    
                                    logger.LogDebug("Found " + numberingNumIds.Count + " numbering num ids.");
                                    foreach (var numberingNumId in numberingNumIds)
                                    {
                                        numberingNumId.NumberID += highestListNumbering;
                                        numberingNumId.AbstractNumId.Val += highestAbstractListNumbering;
                                    }
                                
                                    var abstractNumberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<AbstractNumId>().ToList();
    
                                    logger.LogDebug("Found " + abstractNumberingNumIds.Count + " abstract num ids." + wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.XName.LocalName);
                                    foreach (var abstractNumberingNumId in abstractNumberingNumIds)
                                        abstractNumberingNumId.Val += highestAbstractListNumbering;
    
                                    //Keep the max nums up to date
                                    if (abstractNumberingNumIds.Count > 0)
                                        highestAbstractListNumbering = Math.Max(highestAbstractListNumbering, abstractNumberingNumIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
    
                                }
    
                                
                                if (numIds.Count > 0)
                                    highestListNumbering = Math.Max(highestListNumbering, numIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
    
                                
                                
                                wordDoc.MainDocumentPart.Document.Save();
                            }
                        }
                        sources.Add(new OpenXmlPowerTools.Source(inputFile.GetAsWmlDocument(), true));
    
                    }
                    DocumentBuilder.BuildDocument(sources, outputFilePath);
                    return true;
    
                }
                catch (SystemException ex)
                {
                    logger.LogError("Error occurred while generating bereavement letters. ", ex);
    
                    return false;
                }
                finally
                {
                    foreach (var inputFile in inputFiles)
                    {
                        inputFile.Dispose();
                    }
                }
            }
    

    Error is:

    System.InvalidOperationException: Sequence contains no elements
    at System.Linq.Enumerable.First[TSource](IEnumerable`1 source)
    at OpenXmlPowerTools.DocumentBuilder.CopyNumbering(WordprocessingDocument sourceDocument, WordprocessingDocument newDocument, IEnumerable`1 newContent, List`1 images)
    at OpenXmlPowerTools.DocumentBuilder.AppendDocument(WordprocessingDocument sourceDocument, WordprocessingDocument newDocument, List`1 newContent, Boolean keepSection, String insertId, List`1 images)
    at OpenXmlPowerTools.DocumentBuilder.BuildDocument(List`1 sources, WordprocessingDocument output)
    at OpenXmlPowerTools.DocumentBuilder.BuildDocument(List`1 sources, String fileName)
    at BereavementMailing.TemplateEngine.Merge(List`1 inputFiles, String outputFilePath) in C:\caw\Underdog\Apps\Services\BereavementMailingEngine\BM_RequestProcessor\TemplateEngine.cs:line 508

    Feels like I am missing another place or have the wrong search criteria still.

    #3812

    thandley
    Participant

    Alan,

    I have encountered a very similar issue. Maybe we can work through this together?

    I’m going to try implementing a solution similar to what you have produced so far and will let you know if I can solve the styles issue.

    It’s always fun when a google search only turns up questions without full solutions 🙂

    Time to take a deep dive into OpenXML I guess!

    -Trevor

    #3814

    AlanSMac
    Participant

    Hi Trevor,

    that would be great. I had to park this as I tried a lot of things and couldn’t get it exactly right. I tried renaming my sample docs as zip, unzipping them and inspecting the numbering, style and document body XML parts to try and work out where I might be missing an update to a numbering id or something, but just couldn’t get it.

    Have you got anywhere?

    Thanks

    Alan

    #3817

    thandley
    Participant

    I’m running into an issue in DocumentBuilder.CopyNumbering

    
    // Copy abstract numbering element, if necessary (use matching NSID)
    string abstractNumId = element
                           .Elements(W.abstractNumId)
                           .First()
                           .Attribute(W.val)
                           .Value;
    XElement abstractElement = oldNumbering
                               .Descendants()
                               .Elements(W.abstractNum)
                               .Where(p => ((string)p.Attribute(W.abstractNumId)) == abstractNumId)
                               .First();
    

    An exception is thrown when setting XElement abstractElement, because there are no elements in the sequence that match the abstractNumId.

    Is that what you’re seeing, Alan?
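One way to confirm this diagnosis before calling DocumentBuilder might be to check the numbering part for w:num elements whose abstractNumId has no matching w:abstractNum definition. The following is a minimal sketch under that assumption, not code from Open-Xml-PowerTools: NumberingCheck and FindDanglingNumIds are hypothetical names, and the input is assumed to be the root w:numbering element as a System.Xml.Linq XElement.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NumberingCheck
{
    // WordprocessingML main namespace (the "w:" prefix).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the numId of every w:num whose w:abstractNumId reference has
    // no matching w:abstractNum definition in the same numbering part.
    public static List<int> FindDanglingNumIds(XElement numbering)
    {
        // Ids of all w:abstractNum definitions present in the part.
        var defined = new HashSet<int>(
            numbering.Elements(W + "abstractNum")
                     .Select(a => (int?)a.Attribute(W + "abstractNumId"))
                     .Where(id => id.HasValue)
                     .Select(id => id.Value));

        var dangling = new List<int>();
        foreach (var num in numbering.Elements(W + "num"))
        {
            int? numId = (int?)num.Attribute(W + "numId");
            int? target = (int?)num.Element(W + "abstractNumId")?.Attribute(W + "val");
            if (numId.HasValue && (!target.HasValue || !defined.Contains(target.Value)))
                dangling.Add(numId.Value);
        }
        return dangling;
    }
}
```

A non-empty result would suggest that references were renumbered without the corresponding definitions (or the other way round), which is the situation in which the First() call quoted above has nothing to return.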

    #3818

    thandley
    Participant

    I managed to get past that error, and the documents merge successfully but it’s still not what I am expecting.

    In your last posted code you would need to change this section:

    
    var abstractNumberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<AbstractNumId>().ToList();
    
    logger.LogDebug("Found " + abstractNumberingNumIds.Count + " abstract num ids." + wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.XName.LocalName);
    foreach (var abstractNumberingNumId in abstractNumberingNumIds)
        abstractNumberingNumId.Val += highestAbstractListNumbering;
    
    //Keep the max nums up to date
    if (abstractNumberingNumIds.Count > 0)
        highestAbstractListNumbering = Math.Max(highestAbstractListNumbering, abstractNumberingNumIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
    

    to this:

    
    var abstractNums = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<AbstractNum>().ToList();
    
    logger.LogDebug("Found " + abstractNums.Count + " abstract nums." + wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.XName.LocalName);
    foreach (var abstractNum in abstractNums)
        abstractNum.AbstractNumberId += highestAbstractListNumbering;
    
    //Keep the max nums up to date
    if (abstractNums.Count > 0)
        highestAbstractListNumbering = Math.Max(highestAbstractListNumbering, abstractNums.Select(a => a.AbstractNumberId).Max(n => n.HasValue ? n.Value : 0));
    

    My code is a bit different, so I only modified your section by eye here; apologies if I missed something in the syntax changes. The idea is that you were previously updating the values of Descendants<AbstractNumId>() twice by accident: the previous foreach that grabs Descendants<NumberingInstance>() changes the same AbstractNumId instances that you were changing here, so each pass was updating them twice!

    The formatting of my document remains correct, but there are still some numbered lists that aren’t restarting at 1 for me. Maybe I’m missing a step in here somewhere still.

    EDIT:
    It looks like the numbered lists from different source documents are correctly restarting at 1 as expected, but some of my documents have multiple consecutive numbered lists that continue ordering when I expect them to restart at 1.

    • This reply was modified 8 years, 1 month ago by  thandley.
    #3820

    AlanSMac
    Participant

    Hi Trevor

    Sorry I’ve been working on other stuff but I will take a look tomorrow. I’m in the UK so it’s night here.

    My numbers were resetting but the formatting was incorrect. If we’re lucky you’ve fixed my issue and I can work out your one.

    Will post when I get the chance.
    I really appreciate you taking a look. Thanks

    #3821

    thandley
    Participant

    I’ll attempt to clarify my issue.

    Document 1 has two numbered lists consecutively as such:
    1. <content>
    2. <content>
    3. <content>
    — page break, new section —
    1. <content>
    2. <content>
    3. <content>

    Document 2 has the same situation.

    I merge Doc1 and Doc2 such that all four numbered lists are consecutive in the final document, so we want:
    1. <content>
    2. <content>
    3. <content>
    — page break, new section —
    1. <content>
    2. <content>
    3. <content>
    — page break, new section —
    1. <content>
    2. <content>
    3. <content>
    — page break, new section —
    1. <content>
    2. <content>
    3. <content>

    But instead we get
    1. <content>
    2. <content>
    3. <content>
    — page break, new section —
    4. <content>
    5. <content>
    6. <content>
    — page break, new section —
    1. <content>
    2. <content>
    3. <content>
    — page break, new section —
    4. <content>
    5. <content>
    6. <content>

    EDIT:
    Before including the changes from this thread I was seeing the four lists all numbered in one sequence from 1 to 12.

    • This reply was modified 8 years, 1 month ago by  thandley.