SimplifyMarkup – Saved file corrupted
Tagged: Error, SimplifyMarkup
This topic contains 3 replies, has 2 voices, and was last updated by FRCMNS0 7 years, 11 months ago.
November 28, 2016 at 1:02 pm #3980
Hello,
I am running into a problem when running the MarkupSimplifier.SimplifyMarkup method and saving the document. The code used is this one:

    using (var docMaster = WordprocessingDocument.Open("PORTA_copy.docx", true))
    {
        SimplifyMarkupSettings settings = new SimplifyMarkupSettings
        {
            NormalizeXml = true, // Merges Run's in a paragraph with similar formatting
            // Additional settings if required
            RemoveBookmarks = true,
            RemoveComments = true,
            RemoveGoBackBookmark = true,
            RemoveWebHidden = true,
            RemoveContentControls = true,
            RemoveEndAndFootNotes = true,
            //RemoveFieldCodes = true,
            RemoveLastRenderedPageBreak = true,
            RemovePermissions = true,
            RemoveProof = true,
            RemoveRsidInfo = true,
            RemoveSmartTags = true,
            RemoveSoftHyphens = true,
        };
        MarkupSimplifier.SimplifyMarkup(docMaster, settings);
        docMaster.Save();
    }
The PORTA_copy.docx file (created in Word 2016) contains only one word, “PORTA”, which is segmented like this in the internal document.xml:
    <w:r w:rsidRPr="00F74B85">
      <w:rPr>
        <w:color w:val="FF0000"/>
      </w:rPr>
      <w:t>PO</w:t>
      <w:t>R</w:t>
      <w:t>TA</w:t>
    </w:r>
My intention is to group the word together using SimplifyMarkup.
After the above code runs, the new document.xml section of the word looks like this:
<w:r><w:rPr><w:color w:val="FF0000" /></w:rPr><w:t>PORTA</w:t></w:r>
OK, that’s what I wanted. However, when I try to open the docx in Word (again testing with the 2016 version), it shows this error:
The XML data is invalid according to the schema Location: Part: /word/styles.xml, Line: 0, Column: 0
It shows an option to repair, but it’s clear that something is wrong.
What is the problem with this code?
Here is a sample project with the test docx and a minimal console application.
November 28, 2016 at 4:45 pm #3981
As an addendum, the extra SimplifyMarkupSettings options (everything besides NormalizeXml) don’t cause errors. The problem occurs only when NormalizeXml is set.
November 28, 2016 at 5:37 pm #3982
Hi,
I think that there is something else causing this problem, not MarkupSimplifier.
You are getting a failure in parsing the XML in the /word/styles.xml part, not in the main document part, which is what NormalizeXml operates on. It looks as though your styles.xml file may be empty, which could have any of a variety of causes, but probably not MarkupSimplifier. That is not to say that MarkupSimplifier doesn’t modify styles.xml; it might, I can’t recall. But it is not the first place I’d look for this bug. I’d look for what is writing to styles.xml and see why the XML parser is failing on it.
You can also manually examine the styles.xml file using the Open XML Package Editor Add-In for Visual Studio. That may provide a clue as to why the parser is failing on reading the styles.xml part.
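If the add-in is not at hand, a rough sketch like the one below can do a first check, since a .docx is a ZIP package. It only tests whether word/styles.xml exists, is non-empty, and is well-formed XML; it does not validate against the Open XML schema, which is what Word is complaining about. The class and method names (StylesPartCheck, CheckPart) are made up for this illustration, and on .NET Framework it assumes a reference to System.IO.Compression.FileSystem.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of Open-Xml-PowerTools or the Open XML SDK.
static class StylesPartCheck
{
    // Returns null if the part exists, is non-empty and parses as XML;
    // otherwise returns a short description of the problem.
    public static string CheckPart(string docxPath, string partName = "word/styles.xml")
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry(partName);
            if (entry == null) return "part not found: " + partName;
            if (entry.Length == 0) return "part is empty: " + partName;
            try
            {
                using (Stream s = entry.Open())
                {
                    // Throws XmlException if the part is not well-formed.
                    XDocument.Load(s);
                }
                return null;
            }
            catch (XmlException ex)
            {
                return "XML error in " + partName + ": " + ex.Message;
            }
        }
    }

    static void Main(string[] args)
    {
        // File name taken from the post above; pass another path as the first argument.
        string path = args.Length > 0 ? args[0] : "PORTA_copy.docx";
        string problem = CheckPart(path);
        Console.WriteLine(problem ?? "word/styles.xml is well-formed XML");
    }
}
```

If this reports a well-formed part, the file is not empty or truncated, and the next step would be comparing its content with the original styles.xml.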
Best, Eric
November 28, 2016 at 6:10 pm #3983
In this case, I am sure that MarkupSimplifier is modifying the styles.xml file. Here is the entire sample code used:
    using DocumentFormat.OpenXml.Packaging;
    using OpenXmlPowerTools;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;

    namespace DocxTest
    {
        class Program
        {
            static void Main(string[] args)
            {
                try
                {
                    File.Copy("PORTA.docx", "PORTA_copy.docx");
                    using (var docMaster = WordprocessingDocument.Open("PORTA_copy.docx", true))
                    {
                        SimplifyMarkupSettings settings = new SimplifyMarkupSettings
                        {
                            NormalizeXml = true, // Merges Run's in a paragraph with similar formatting
                            // Additional settings if required
                            RemoveBookmarks = true,
                            RemoveComments = true,
                            RemoveGoBackBookmark = true,
                            RemoveWebHidden = true,
                            RemoveContentControls = true,
                            RemoveEndAndFootNotes = true,
                            //RemoveFieldCodes = true,
                            RemoveLastRenderedPageBreak = true,
                            RemovePermissions = true,
                            RemoveProof = true,
                            RemoveRsidInfo = true,
                            RemoveSmartTags = true,
                            RemoveSoftHyphens = true,
                        };
                        MarkupSimplifier.SimplifyMarkup(docMaster, settings);
                        docMaster.Save();
                    }
                    Console.WriteLine("Done.");
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Error: {0}", ex.ToString());
                }
                Console.ReadLine();
            }
        }
    }
There’s nothing else being done to the document.
Most of the differences are an extra space before the end of a tag or reordered attributes. The major change is right at the beginning of the file, mostly additional namespace declarations.
Here is a WinMerge report with the differences highlighted:
https://drive.google.com/file/d/0B0ZNalzpb4uFRjdndWFidTduME0/view?usp=sharing
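Whitespace inside a tag and attribute order do not change the XML content by themselves, so a text diff overstates the changes. As a rough sketch, the two versions of styles.xml could be compared after sorting attributes and dropping namespace declarations. The names here (XmlCompare, SameContent) and the sample XML are invented for the illustration. Ignoring namespace declarations is an assumption that hides one relevant case: attribute values such as mc:Ignorable refer to prefixes, so those declarations would need a separate look.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
static class XmlCompare
{
    // Rebuilds an element with attributes in a fixed order and without
    // xmlns declarations, so that only names, values and content remain.
    static XElement Normalize(XElement e)
    {
        return new XElement(e.Name,
            e.Attributes()
             .Where(a => !a.IsNamespaceDeclaration)
             .OrderBy(a => a.Name.NamespaceName, StringComparer.Ordinal)
             .ThenBy(a => a.Name.LocalName, StringComparer.Ordinal)
             .Select(a => new XAttribute(a.Name, a.Value)),
            e.Nodes().Select(n =>
            {
                XElement child = n as XElement;
                // Non-element nodes (text, comments) are cloned as-is when added.
                return child != null ? (object)Normalize(child) : n;
            }));
    }

    // True if both documents have the same elements, attributes and text,
    // regardless of attribute order, tag whitespace and xmlns declarations.
    public static bool SameContent(XDocument a, XDocument b)
    {
        return XNode.DeepEquals(Normalize(a.Root), Normalize(b.Root));
    }

    static void Main()
    {
        // Made-up fragments that mimic the kinds of differences described above.
        XDocument before = XDocument.Parse(
            "<w:styles xmlns:w='urn:example:w'>" +
            "<w:style w:type='paragraph' w:styleId='Normal'/></w:styles>");
        XDocument after = XDocument.Parse(
            "<w:styles xmlns:w='urn:example:w' xmlns:x='urn:example:extra'>" +
            "<w:style w:styleId='Normal' w:type='paragraph' /></w:styles>");
        Console.WriteLine(SameContent(before, after));
    }
}
```

If SameContent returns false for the real files, the difference is more than formatting and worth tracking down element by element.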