AlanSMac
Forum Replies Created
Sorry Trevor, I got caught out by the length of this thread and missed your latest messages, where you are talking about exactly the NSID problem! I notice you are updating it in a slightly different way though. I would suggest adding the decimal value of the highest abstract num so far, so it’s guaranteed to be unique. What broke after you did the NSID fix, or what does it not cover? I tried a few scenarios and they appeared OK, but our QA will be testing this properly shortly. Thanks!
Hi Trevor,
I did what you suggested and that stopped the error, but my numberings went back to continuing across the page instead of resetting to 0, and I got the same result as if I was not running any code beyond the standard BuildDocument. However, what you said made me add more debugging and inspect the OpenXML more closely to see if I was updating other numbers more than once. I made some key discoveries:
- Abstract num ids appear to start at 0 but num ids start at 1.
- Even after allowing for starting at 0 in my algorithm, my input file had 2 abstract nums, but when I merged three copies the output had 3 abstract nums instead of the 6 I expected.
- Abstract nums have an NSID that is meant to be a hex unique identifier https://msdn.microsoft.com/en-us/library/documentformat.openxml.wordprocessing.nsid%28v=office.14%29.aspx
That last point made me change to taking the hex NSIDs and adding to them, based on the highest abstract num so far, to get a unique NSID, which appears to work in my testing.
My code is now:
public bool MergeDocs(List<InterchangeableWordProcessingDocument> inputFiles, string outputFile, Guid busUnitID)
{
    if (inputFiles == null)
    {
        logger.LogDebug("No files to merge.");
        return true;
    }
    try
    {
        logger.LogDebug("Starting creating merged label doc " + outputFile + "\nCombining " + inputFiles.Count() + " document(s) into single doc.");
        List<OpenXmlPowerTools.Source> sources = new List<OpenXmlPowerTools.Source>();
        int highestListNumbering = 0;
        int highestAbstractListNumbering = 0;
        foreach (var inputFile in inputFiles)
        {
            //Sometimes merge puts start of next page onto end of previous one so prevent
            //Seems to cause extra blank page when there are labels so don't do on labels pages
            if (inputFile.DocType == DocType.Letter)
            {
                using (var wordDoc = inputFile.GetAsWordProcessingDocument())
                {
                    var para = wordDoc.MainDocumentPart.Document.Body.ChildElements.First<Paragraph>();
                    if (para.ParagraphProperties == null)
                    {
                        para.ParagraphProperties = new ParagraphProperties();
                    }
                    para.ParagraphProperties.PageBreakBefore = new PageBreakBefore();

                    //http://www.ericwhite.com/blog/forums/topic/list-numbering-on-merged-docs/
                    //Numberings should be unique to each page otherwise they continue from the previous
                    //Keep track of how many we have so we can add on to always have a unique number
                    var numIds = wordDoc.MainDocumentPart.Document.Body.Descendants<NumberingId>().ToList();
                    logger.LogDebug("Found " + numIds.Count + " num ids.");
                    foreach (var numId in numIds)
                    {
                        logger.LogDebug("Changing num id from " + numId.Val + " to " + (numId.Val + highestListNumbering));
                        numId.Val += highestListNumbering;
                    }

                    var styleNumIds = wordDoc.MainDocumentPart.StyleDefinitionsPart.RootElement.Descendants<NumberingId>().ToList();
                    if (wordDoc.MainDocumentPart.StyleDefinitionsPart != null)
                    {
                        logger.LogDebug("Found " + styleNumIds.Count + " style num ids.");
                        foreach (var styleNumId in styleNumIds)
                            styleNumId.Val += highestListNumbering;
                    }

                    if (wordDoc.MainDocumentPart.NumberingDefinitionsPart != null)
                    {
                        var numberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<NumberingInstance>().ToList();
                        logger.LogDebug("Found " + numberingNumIds.Count + " numbering num ids.");
                        foreach (var numberingNumId in numberingNumIds)
                        {
                            logger.LogDebug("Changing num id from " + numberingNumId.NumberID + " to " + (numberingNumId.NumberID + highestListNumbering));
                            numberingNumId.NumberID += highestListNumbering;
                            logger.LogDebug("Changing num abstract num id from " + numberingNumId.AbstractNumId.Val + " to " + (numberingNumId.AbstractNumId.Val + highestAbstractListNumbering));
                            numberingNumId.AbstractNumId.Val += highestAbstractListNumbering;
                        }
                        highestListNumbering = Math.Max(highestListNumbering, numberingNumIds.Max(ln => (ln.NumberID.HasValue ? ln.NumberID.Value : 0)));

                        var abstractNumberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<AbstractNum>().ToList();
                        logger.LogDebug("Found " + abstractNumberingNumIds.Count + " abstract num ids." + wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.XName.LocalName);
                        foreach (var abstractNumberingNumId in abstractNumberingNumIds)
                        {
                            logger.LogDebug("Changing abstract id from " + abstractNumberingNumId.AbstractNumberId + " to " + (abstractNumberingNumId.AbstractNumberId + highestAbstractListNumbering));
                            abstractNumberingNumId.AbstractNumberId += highestAbstractListNumbering;
                            int nsid = Convert.ToInt32(abstractNumberingNumId.Nsid.Val, 16);
                            nsid += highestAbstractListNumbering;
                            abstractNumberingNumId.Nsid.Val = nsid.ToString("X");
                            logger.LogDebug("NSID is " + abstractNumberingNumId.Nsid.Val);
                        }

                        //Keep the max nums up to date
                        if (abstractNumberingNumIds.Count > 0)
                            highestAbstractListNumbering = Math.Max(highestAbstractListNumbering, abstractNumberingNumIds.Max(ln => (ln.AbstractNumberId.HasValue ? ln.AbstractNumberId.Value : -1) + 1));
                    }
                    if (numIds.Count > 0)
                        highestListNumbering = Math.Max(highestListNumbering, numIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
                    logger.LogDebug("Max num is now " + highestListNumbering);
                    logger.LogDebug("Max abstract num is now " + highestAbstractListNumbering);
                    wordDoc.MainDocumentPart.Document.Save();
                }
            }
            sources.Add(new OpenXmlPowerTools.Source(inputFile.GetAsWmlDocument(), true));
        }
        DocumentBuilder.BuildDocument(sources, outputFile);
        inputFiles.Clear();
        return true;
    }
    catch (SystemException ex)
    {
        logger.LogError("Error occured while mergining. ", ex);
        return false;
    }
}
which is pretty messy but appears to work and I am going to be very careful about changing it to make it cleaner!
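A side note on the NSID arithmetic in the code above: w:nsid is a four-byte hex value, so it is normally written as eight hex digits. Convert.ToInt32(..., 16) followed by ToString("X") reads values of 80000000 and above as negative numbers and drops leading zeros on the way back out, which may or may not matter to whatever reads the file. Below is a minimal sketch of the same offset idea using an unsigned value and fixed-width output. The NsidOffset class and its Shift method are hypothetical names made up for this illustration, not part of the Open XML SDK or Power Tools.

```csharp
using System;
using System.Globalization;

public static class NsidOffset
{
    // Hypothetical helper: shifts a hex NSID string by a non-negative offset
    // and returns it as eight upper-case hex digits.
    public static string Shift(string nsidHex, int offset)
    {
        // Parse as unsigned so values of 80000000 and above stay positive.
        uint nsid = Convert.ToUInt32(nsidHex, 16);

        // Wrap around on overflow instead of throwing.
        uint shifted = unchecked(nsid + (uint)offset);

        // "X8" pads with leading zeros to eight hex digits.
        return shifted.ToString("X8", CultureInfo.InvariantCulture);
    }
}
```

In the merge loop this would be called as something like abstractNum.Nsid.Val = NsidOffset.Shift(abstractNum.Nsid.Val, highestAbstractListNumbering). Adding a small offset does not by itself guarantee uniqueness if two input NSIDs are already close together, so treat it as a starting point.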
Hi Trevor
Sorry I’ve been working on other stuff but I will take a look tomorrow. I’m in the UK so it’s night here.
My numbers were resetting but the formatting was incorrect. If we’re lucky you’ve fixed my issue and I can work out your one.
Will post when I get the chance.
I really appreciate you taking a look. Thanks
Hi Trevor,
that would be great. I had to park this as I tried a lot of things and couldn’t get it exactly right. I tried renaming my sample docs as zip, unzipping them and inspecting the numbering, style and document body XML parts to try and work out where I might be missing an update to a numbering id or something, but I just couldn’t get it.
Have you got anywhere?
Thanks
Alan
OK, so from inspecting the raw numbering XML and comparing it to my logging statements I found I was never finding the num id elements in the numbering section, so I have changed to select NumberingInstance elements, but now I get Sequence Contains No Elements errors in DocumentBuilder.BuildDocument. New code is
public bool Merge(List<InterchangeableWordProcessingDocument> inputFiles, string outputFilePath)
{
    if (inputFiles == null)
    {
        logger.LogDebug("No files to merge.");
        return true;
    }
    try
    {
        List<OpenXmlPowerTools.Source> sources = new List<OpenXmlPowerTools.Source>();
        int highestListNumbering = 0;
        int highestAbstractListNumbering = 0;
        foreach (var inputFile in inputFiles)
        {
            //Sometimes merge puts start of next page onto end of previous one so prevent
            //Seems to cause extra blank page when there are labels so don't do on labels pages
            if (inputFile.DocType == DocType.Letter)
            {
                using (var wordDoc = inputFile.GetAsWordProcessingDocument())
                {
                    var para = wordDoc.MainDocumentPart.Document.Body.ChildElements.First<Paragraph>();
                    if (para.ParagraphProperties == null)
                    {
                        para.ParagraphProperties = new ParagraphProperties();
                    }
                    para.ParagraphProperties.PageBreakBefore = new PageBreakBefore();

                    //http://www.ericwhite.com/blog/forums/topic/list-numbering-on-merged-docs/
                    //Numberings should be unique to each page otherwise they continue from the previous
                    //Keep track of how many we have so we can add on to always have a unique number
                    var numIds = wordDoc.MainDocumentPart.Document.Body.Descendants<NumberingId>().ToList();
                    logger.LogDebug("Found " + numIds.Count + " num ids.");
                    foreach (var numId in numIds)
                        numId.Val += highestListNumbering;

                    var styleNumIds = wordDoc.MainDocumentPart.StyleDefinitionsPart.RootElement.Descendants<NumberingId>().ToList();
                    if (wordDoc.MainDocumentPart.StyleDefinitionsPart != null)
                    {
                        logger.LogDebug("Found " + styleNumIds.Count + " stlye num ids.");
                        foreach (var styleNumId in styleNumIds)
                            styleNumId.Val += highestListNumbering;
                    }

                    if (wordDoc.MainDocumentPart.NumberingDefinitionsPart != null)
                    {
                        var numberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<NumberingInstance>().ToList();
                        logger.LogDebug("Found " + numberingNumIds.Count + " numbering num ids.");
                        foreach (var numberingNumId in numberingNumIds)
                        {
                            numberingNumId.NumberID += highestListNumbering;
                            numberingNumId.AbstractNumId.Val += highestAbstractListNumbering;
                        }

                        var abstractNumberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<AbstractNumId>().ToList();
                        logger.LogDebug("Found " + abstractNumberingNumIds.Count + " abstract num ids." + wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.XName.LocalName);
                        foreach (var abstractNumberingNumId in abstractNumberingNumIds)
                            abstractNumberingNumId.Val += highestAbstractListNumbering;

                        //Keep the max nums up to date
                        if (abstractNumberingNumIds.Count > 0)
                            highestAbstractListNumbering = Math.Max(highestAbstractListNumbering, abstractNumberingNumIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
                    }
                    if (numIds.Count > 0)
                        highestListNumbering = Math.Max(highestListNumbering, numIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
                    wordDoc.MainDocumentPart.Document.Save();
                }
            }
            sources.Add(new OpenXmlPowerTools.Source(inputFile.GetAsWmlDocument(), true));
        }
        DocumentBuilder.BuildDocument(sources, outputFilePath);
        return true;
    }
    catch (SystemException ex)
    {
        logger.LogError("Error occured while generating bereavement letters. ", ex);
        return false;
    }
    finally
    {
        foreach (var inputFile in inputFiles)
        {
            inputFile.Dispose();
        }
    }
}
Error is:
System.InvalidOperationException: Sequence contains no elements
at System.Linq.Enumerable.First[TSource](IEnumerable`1 source)
at OpenXmlPowerTools.DocumentBuilder.CopyNumbering(WordprocessingDocument sourceDocument, WordprocessingDocument newDocument, IEnumerable`1 newContent, List`1 images)
at OpenXmlPowerTools.DocumentBuilder.AppendDocument(WordprocessingDocument sourceDocument, WordprocessingDocument newDocument, List`1 newContent, Boolean keepSection, String insertId, List`1 images)
at OpenXmlPowerTools.DocumentBuilder.BuildDocument(List`1 sources, WordprocessingDocument output)
at OpenXmlPowerTools.DocumentBuilder.BuildDocument(List`1 sources, String fileName)
at BereavementMailing.TemplateEngine.Merge(List`1 inputFiles, String outputFilePath) in C:\caw\Underdog\Apps\Services\BereavementMailingEngine\BM_RequestProcessor\TemplateEngine.cs:line 508
Feels like I am missing another place or have the wrong search criteria still.
Hi Eric,
I went back through what you said and tried to make sure I was doing everything but still no luck with the indentation of the list. It is hard left on page 2 onwards when it should be indented like a normal numbered list. To make it easier to see I am pasting my current version of the whole merge method
/// <summary>
/// Merge input files into one output file
/// </summary>
/// <param name="inputFiles"></param>
/// <param name="outputFilePath"></param>
/// <returns></returns>
public bool Merge(List<InterchangeableWordProcessingDocument> inputFiles, string outputFilePath)
{
    if (inputFiles == null)
    {
        logger.LogDebug("No files to merge.");
        return true;
    }
    try
    {
        List<OpenXmlPowerTools.Source> sources = new List<OpenXmlPowerTools.Source>();
        int highestListNumbering = 0;
        int highestAbstractListNumbering = 0;
        foreach (var inputFile in inputFiles)
        {
            //Sometimes merge puts start of next page onto end of previous one so prevent
            //Seems to cause extra blank page when there are labels so don't do on labels pages
            if (inputFile.DocType == DocType.Letter)
            {
                using (var wordDoc = inputFile.GetAsWordProcessingDocument())
                {
                    var para = wordDoc.MainDocumentPart.Document.Body.ChildElements.First<Paragraph>();
                    if (para.ParagraphProperties == null)
                    {
                        para.ParagraphProperties = new ParagraphProperties();
                    }
                    para.ParagraphProperties.PageBreakBefore = new PageBreakBefore();

                    //http://www.ericwhite.com/blog/forums/topic/list-numbering-on-merged-docs/
                    //Numberings should be unique to each page otherwise they continue from the previous
                    //Keep track of how many we have so we can add on to always have a unique number
                    var numIds = wordDoc.MainDocumentPart.Document.Body.Descendants<NumberingId>().ToList();
                    logger.LogDebug("Found " + numIds.Count + " num ids.");
                    foreach (var numId in numIds)
                        numId.Val += highestListNumbering;

                    var styleNumIds = wordDoc.MainDocumentPart.StyleDefinitionsPart.RootElement.Descendants<NumberingId>().ToList();
                    logger.LogDebug("Found " + styleNumIds.Count + " num ids.");
                    foreach (var styleNumId in styleNumIds)
                        styleNumId.Val += highestListNumbering;

                    var numeringNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<NumberingId>().ToList();
                    logger.LogDebug("Found " + numeringNumIds.Count + " num ids.");
                    foreach (var numeringNumId in numeringNumIds)
                        numeringNumId.Val += highestListNumbering;

                    var abstractNumberingNumIds = wordDoc.MainDocumentPart.NumberingDefinitionsPart.RootElement.Descendants<AbstractNumId>().ToList();
                    logger.LogDebug("Found " + abstractNumberingNumIds.Count + " abstract num ids.");
                    foreach (var abstractNumberingNumId in abstractNumberingNumIds)
                        abstractNumberingNumId.Val += highestAbstractListNumbering;

                    //Keep the max nums up to date
                    if (numIds.Count > 0)
                        highestListNumbering = Math.Max(highestListNumbering, numIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
                    if (abstractNumberingNumIds.Count > 0)
                        highestAbstractListNumbering = Math.Max(highestAbstractListNumbering, abstractNumberingNumIds.Max(ln => (ln.Val.HasValue ? ln.Val.Value : 0)));
                    wordDoc.MainDocumentPart.Document.Save();
                }
            }
            sources.Add(new OpenXmlPowerTools.Source(inputFile.GetAsWmlDocument(), true));
        }
        DocumentBuilder.BuildDocument(sources, outputFilePath);
        return true;
    }
    catch (SystemException ex)
    {
        logger.LogError("Error occured while generating bereavement letters. ", ex);
        return false;
    }
    finally
    {
        foreach (var inputFile in inputFiles)
        {
            inputFile.Dispose();
        }
    }
}
- This reply was modified 8 years, 2 months ago by AlanSMac.
I’m still struggling with this formatting issue. I have been trying lots of different ways of finding NumberingId elements, or W.numId from an XDocument, and still can’t get the formatting issue to fix itself. I think it will be a very small amount of code but I just can’t find the magic settings.
The code I have that uses wordDoc.MainDocumentPart.Document.Body.Descendants<NumberingId>(), will this find these elements across all sections of the document e.g. stylings, numberings etc? I started trying to go through all the parts of the document but get null reference issues i.e.
var listNumberings = wordDoc.Parts
    .Where(part => part != null)
    .SelectMany(part => part.OpenXmlPart.RootElement.Descendants<NumberingId>()).ToList();
foreach (var listNumbering in listNumberings)
    listNumbering.Val += highestListNumbering;
Thanks
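A likely cause of the null reference in the snippet above is that some parts (for example core properties or images) have no typed root element, so RootElement comes back null for them. Also, wordDoc.Parts only lists the parts related to the package itself, while styles and numbering hang off the main document part. Here is a minimal sketch, under those assumptions, that looks in the main document part and the parts directly related to it, skipping any part without a root element. NumberingIdFinder and FindAll are hypothetical names for this illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class NumberingIdFinder
{
    // Hypothetical helper: collects w:numId elements from the main document part
    // and from the parts directly related to it (styles, numbering, headers, ...).
    public static List<NumberingId> FindAll(WordprocessingDocument wordDoc)
    {
        MainDocumentPart main = wordDoc.MainDocumentPart;
        if (main == null)
            return new List<NumberingId>();

        // Start with the main part, then add its directly related parts.
        var parts = new List<OpenXmlPart> { main };
        parts.AddRange(main.Parts.Select(pair => pair.OpenXmlPart));

        return parts
            .Select(part => part.RootElement)
            // Parts without a typed DOM (images and the like) have no root element.
            .Where(root => root != null)
            .SelectMany(root => root.Descendants<NumberingId>())
            .ToList();
    }
}
```

Note that this only finds w:numId elements. The numId attribute on each w:num in numbering.xml is not a NumberingId element, so it would still need to be updated separately.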
Hi Eric,
thanks for your help. I got this working for the revision/change tracking code. Strangely, I had to leave some other code using TextReplacer instead of OpenXmlRegex because I couldn’t get the two object trees (OM vs using LINQ to XML) to sync in memory, but only for that case. I have to keep the former object model because it seems easier when inserting new elements. I have to do that to append breaks after newline characters, otherwise the new lines don’t work, and I do a little paragraph object manipulation elsewhere for a superfluous blank page issue. Perhaps it would have been easier than I thought to do via LINQ to XML, but I like the properties and methods on the OM classes.
Thanks!
Hi Eric,
can you please give me pointers as to how to change that programmatically? I know there is a styles.xml and also a numbering.xml when I unzip a docx, but I don’t know how to get at them in terms of working with the SDK or Power Tools.
I have the following code for finding the highest list numbers and making sure each merged doc has unique numbering ids. This is the code that successfully resets the numbering per document, but the formatting is lost beyond the first doc.
int highestListNumbering = 0;
foreach (var inputFile in inputFiles)
{
    using (var wordDoc = inputFile.GetAsWordProcessingDocument())
    {
        var listNumberings = wordDoc.MainDocumentPart.Document.Body.Descendants<NumberingId>().ToList();
        foreach (var listNumbering in listNumberings)
            listNumbering.Val += highestListNumbering;
        logger.LogDebug(listNumberings.Count + " NumberingIds found");

        var xdoc = wordDoc.MainDocumentPart.GetXDocument();
        var nums = xdoc.Descendants(W.numId).ToList();
        logger.LogDebug(nums.Count + " numberings found");
        wordDoc.MainDocumentPart.Document.Save();
    }
}
The final part was my attempt to see whether using the XDocument would show me more numbering ids, but I get the same number as I am already affecting. How do I get a handle to the style information for the numberings? I omitted the equivalent abstract numbering code for brevity, and also my sample docs never have abstract numberings so far.
Thanks
I spoke too soon. It broke the formatting on the lists, like alignment and indentation. I guess that is stored in the document elsewhere and I need to update the ids elsewhere to match the new ones.
Thanks very much Eric.
I got this working without too much effort in the end. You were spot on. Much appreciated.
Hi Eric,
thanks for your reply. I am in the middle of trying to convert my code over to use OpenXmlRegex. I followed the link and also watched one of your YouTube videos about it, and I can’t get the replace to work despite the fact I think I am calling it correctly. I can find matches with the regex but not replace the value. I am in the UK, so I am at home now and will try again tomorrow when in the office. I think it might be because all my existing code was based on WordprocessingDocument and now I have had to call doc.MainDocumentPart.GetXDocument(); and manipulate that. Maybe it’s not persisting back and I have to save to the same stream?
public void ReplaceFirst(WordprocessingDocument doc, params KeyValuePair<string, string>[] kvps)
{
    var xdoc = doc.MainDocumentPart.GetXDocument();
    foreach (var kvp in kvps)
    {
        //OOXML library does not like null or empty
        string value = (kvp.Value == null || kvp.Value == string.Empty) ? " " : kvp.Value;
        logger.LogDebug("Applying value: [" + kvp.Key + "] " + value);
        //var content = doc.MainDocumentPart.Document.Body.Descendants<Text>();
        var content = xdoc.Descendants(W.p);
        logger.LogDebug("Found " + content.Count() + " text elements to search");
        //var regex = new Regex(VariablePrefix + kvp.Key + VariableSuffix);
        var regex = new Regex("contact");
        logger.LogDebug(OpenXmlRegex.Match(content, regex) + " matches");
        bool isFirstReplacement = true;
        OpenXmlRegex.Replace(content, regex, value, (xElement, match) =>
        {
            if (isFirstReplacement)
            {
                isFirstReplacement = false;
                logger.LogDebug("Replaced match");
                return true;
            }
            logger.LogDebug("Did not replace match");
            return false;
        }
        );
        //TextReplacer.SearchAndReplace(doc, VariablePrefix + kvp.Key + VariableSuffix, value, false);
    }
}
Later on the WordprocessingDocument is saved via wordDoc.MainDocumentPart.Document.Save()
I am wondering if I need to do something to see changes in a WordProcessingDocument caused by OpenXmlRegex against an XDocument.
Thanks
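On the persistence question above: as far as I understand Open-Xml-PowerTools, GetXDocument() returns an XDocument cached on the part, and changes to it are only written back to the part when PutXDocument() is called. If the SDK object model (MainDocumentPart.Document) was already loaded and is saved afterwards, it may well overwrite those changes with its own older tree. A minimal sketch under those assumptions follows. FirstMatchReplacer is a hypothetical name for this illustration, and "first" here means the first match the callback is offered, which is not necessarily the first one in reading order.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public static class FirstMatchReplacer
{
    // Hypothetical helper: replaces only the first match offered to the callback,
    // then writes the modified XDocument back to the main document part.
    public static int ReplaceFirst(WordprocessingDocument doc, string pattern, string value)
    {
        MainDocumentPart part = doc.MainDocumentPart;
        var xdoc = part.GetXDocument();

        // Materialise the paragraph list before the tree is modified.
        var paragraphs = xdoc.Descendants(W.p).ToList();
        var regex = new Regex(pattern);

        bool replaced = false;
        int result = OpenXmlRegex.Replace(paragraphs, regex, value, (element, match) =>
        {
            if (replaced)
                return false;
            replaced = true;
            return true;
        });

        // Persist the LINQ to XML tree back into the part.
        part.PutXDocument();
        return result;
    }
}
```

With this approach it is probably safest not to touch doc.MainDocumentPart.Document on the same open document, and to reopen the document if the SDK object model is needed afterwards.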
Thanks Eric.
Bizarrely I replied to this the day after you posted and it never showed. I tried to post again immediately and the server said it detected a duplicate post and it never ever showed up!
Just wanted to say your response was really useful and much appreciated. I ended up creating a wrapper for conveniently being able to interchange between the formats. The only thing is the caller is responsible for disposing the Word doc etc. to get the bytes to update, like you mentioned:
public class InterchangeableWordProcessingDocument : IDisposable
{
    public MemoryStream memoryStream { get; private set; }

    public InterchangeableWordProcessingDocument(string path)
    {
        var bytes = File.ReadAllBytes(path);
        CreateMemoryStream(bytes);
    }

    private MemoryStream CreateMemoryStream(byte[] bytes)
    {
        //Do not use byte array constructor as this is not resizable i.e. does not handle change.
        memoryStream = new MemoryStream();
        memoryStream.Write(bytes, 0, bytes.Length);
        return memoryStream;
    }

    public WordprocessingDocument GetAsWordProcessingDocument()
    {
        return WordprocessingDocument.Open(memoryStream, true);
    }

    public WmlDocument GetAsWmlDocument()
    {
        return new WmlDocument("dummy", memoryStream.ToArray());
    }

    public void Dispose()
    {
        memoryStream.Dispose();
    }
}
While I’m posting: for my particular task it would have been great if the TextReplacer class exposed the method that does all the hard work on an individual element, rather than the only public method being one that replaces all instances in a whole document. Sorry, I forget the method name, but it looked like it would just be a case of changing the access modifier. I ended up having to put in code to manually handle replacing values like <<myvariable>> because the << and >> would sometimes be broken into 2 or 3 elements, but I didn’t want to replace all instances in my particular scenario (different parts of the document were for different people and had different values for variables based on the person). I got the impression TextReplacer had had a lot of work and pain to handle these types of things.
Thanks again for your speedy and useful response.