Custom Formatting of XML using LINQ to XML
On StackOverflow, there is a question (posted by Otaku, an online friend of mine for some time) about how to serialize multiple XML elements on the same line. It is a very interesting question. After going down a couple of dead ends, I realized that it is pretty easy to iterate through an XML tree and do all of the writing to an XmlWriter explicitly, bypassing LINQ to XML’s own logic for serializing through an XmlWriter. This lets us control the indentation of the XML almost any way we like, while still letting the XmlWriter class do all of the serializing of the XML itself. Some folks at StackOverflow suggested post-processing the XML, but I know from hard experience that it is very difficult to post-process XML and really get it right, including handling CDATA sections and other edge cases. By letting the XmlWriter class do all of the output of XML, while injecting just a bit of white space in the right places, we can be confident of the validity of the XML.
His question: he has XML that looks like this:
<Canvas>
  <Grid>
    <TextBlock>
      <Run Text="r"/>
      <Run Text="u"/>
      <Run Text="n"/>
    </TextBlock>
    <TextBlock>
      <Run Text="far a"/>
      <Run Text="way"/>
      <Run Text=" from me"/>
    </TextBlock>
  </Grid>
  <Grid>
    <TextBlock>
      <Run Text="I"/>
      <Run Text=" "/>
      <Run Text="want"/>
      <LineBreak/>
    </TextBlock>
    <TextBlock>
      <LineBreak/>
      <Run Text="...thi"/>
      <Run Text="s to"/>
      <LineBreak/>
      <Run Text=" work"/>
    </TextBlock>
  </Grid>
</Canvas>
He wants to format it so that it looks like this:
<Canvas>
  <Grid>
    <TextBlock>
      <Run Text="r"/><Run Text="u"/><Run Text="n"/>
    </TextBlock>
    <TextBlock>
      <Run Text="far a"/><Run Text="way"/><Run Text=" from me"/>
    </TextBlock>
  </Grid>
  <Grid>
    <TextBlock>
      <Run Text="I"/><Run Text=" "/><Run Text="want"/>
      <LineBreak/>
    </TextBlock>
    <TextBlock>
      <LineBreak/>
      <Run Text="...thi"/><Run Text="s to"/>
      <LineBreak/>
      <Run Text=" work"/>
    </TextBlock>
  </Grid>
</Canvas>
He wants to do this because of some fairly obscure semantics of XAML for Silverlight 3. Read his question on StackOverflow for more detail.
I posted code on StackOverflow that shows how to do this specialized serialization using VB.NET. In fact, I wrote the code in C# first, and after getting it all working, translated it to VB.NET. This post presents the C# code.
The key to solving this problem is to write a recursive function that iterates through the XML tree, writing the various elements and attributes to specially created XmlWriter objects. There is an ‘outer’ XmlWriter object that writes indented XML, and an ‘inner’ XmlWriter object that writes non-indented XML.
The recursive function initially uses the ‘outer’ XmlWriter, writing indented XML, until it sees the TextBlock element (an element that triggers a desired change in the indenting behavior). When it encounters the TextBlock element, it creates the ‘inner’ XmlWriter object, writing the child elements of the TextBlock element to it. It also writes custom white space to the ‘inner’ XmlWriter.
When the ‘inner’ XmlWriter object is finished with writing the TextBlock element, the text that the ‘inner’ writer wrote is written to the ‘outer’ XmlWriter using the WriteRaw method.
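To make the role of WriteRaw concrete, here is a minimal sketch (not part of the code I posted) that contrasts WriteRaw with WriteString. The class name WriteRawDemo and the method Write are made up for illustration. The sketch assumes the inner text is already well-formed, because WriteRaw does not check or escape what it is given.

```csharp
using System;
using System.Text;
using System.Xml;

public static class WriteRawDemo
{
    // Hypothetical helper: writes a TextBlock element whose content is injected
    // either raw (markup preserved) or as ordinary text (markup escaped).
    public static string Write(string innerXml, bool raw)
    {
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var sb = new StringBuilder();
        using (XmlWriter outer = XmlWriter.Create(sb, settings))
        {
            outer.WriteStartElement("TextBlock");
            if (raw)
                outer.WriteRaw(innerXml);     // copied through verbatim, markup intact
            else
                outer.WriteString(innerXml);  // treated as text, so '<' is escaped
            outer.WriteEndElement();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string innerXml = "<Run Text=\"r\" /><Run Text=\"u\" />";
        Console.WriteLine(Write(innerXml, true));
        Console.WriteLine(Write(innerXml, false));
    }
}
```

With WriteRaw the Run elements arrive in the output as elements. With WriteString they would be escaped and become text content, which is not what we want here.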
As I mentioned, the advantage of this approach is that there is no post-processing of the XML. It is extremely difficult to post-process XML and be certain that you have properly handled all cases, including arbitrary text in CDATA sections. All of the XML is written using only the XmlWriter class, which ensures that the output is always well-formed. The only exception is the specially crafted white space that is written using the WriteRaw method, which achieves the desired indenting behavior.
One key point is that the ‘inner’ XmlWriter object’s conformance level is set to ConformanceLevel.Fragment, because the ‘inner’ XmlWriter needs to write XML that does not have a root element.
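As a small illustration of the fragment setting, here is a sketch (again, not from the posted solution) that writes two sibling elements with no root element. The names FragmentDemo and WriteSiblings are invented for this example. My understanding is that a writer explicitly set to ConformanceLevel.Document would reject the second top-level element, which is why the fragment level is used.

```csharp
using System;
using System.Text;
using System.Xml;

public static class FragmentDemo
{
    // Hypothetical helper: writes two sibling Run elements with no root element
    // and no white space between them.
    public static string WriteSiblings()
    {
        var settings = new XmlWriterSettings
        {
            Indent = false,                              // no automatic white space
            OmitXmlDeclaration = true,
            ConformanceLevel = ConformanceLevel.Fragment // allow multiple top-level elements
        };
        var sb = new StringBuilder();
        using (XmlWriter inner = XmlWriter.Create(sb, settings))
        {
            inner.WriteStartElement("Run");
            inner.WriteAttributeString("Text", "r");
            inner.WriteEndElement();

            inner.WriteStartElement("Run");
            inner.WriteAttributeString("Text", "u");
            inner.WriteEndElement();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteSiblings());
    }
}
```

The resulting string holds both Run elements back to back, ready to be handed to the ‘outer’ writer.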
To achieve the desired formatting of Run elements (i.e. adjacent Run elements have no insignificant white space between them), the code uses the GroupAdjacent extension method. Some time ago, I wrote a blog post on the GroupAdjacent extension method.
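To show what grouping adjacent items means, here is a simplified stand-in for GroupAdjacent. It is a sketch, not the extension method itself: GroupAdjacentSketch and Describe are hypothetical names, and it returns key/list pairs rather than IGrouping objects. It groups a list of element names the same way the real code groups the child nodes of a TextBlock.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupAdjacentDemo
{
    // Simplified stand-in for GroupAdjacent: yields one (key, items) pair for each
    // run of consecutive items that share a key.
    public static IEnumerable<KeyValuePair<TKey, List<T>>> GroupAdjacentSketch<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        List<T> current = null;
        TKey currentKey = default(TKey);
        foreach (T item in source)
        {
            TKey key = keySelector(item);
            if (current != null && !comparer.Equals(key, currentKey))
            {
                // The key changed, so the current run is complete.
                yield return new KeyValuePair<TKey, List<T>>(currentKey, current);
                current = null;
            }
            if (current == null)
            {
                current = new List<T>();
                currentKey = key;
            }
            current.Add(item);
        }
        if (current != null)
            yield return new KeyValuePair<TKey, List<T>>(currentKey, current);
    }

    // Formats each group as "key:item,item,..." so the grouping is easy to see.
    public static List<string> Describe(IEnumerable<string> names)
    {
        return GroupAdjacentSketch(names, n => n == "Run")
            .Select(g => g.Key + ":" + string.Join(",", g.Value))
            .ToList();
    }

    public static void Main()
    {
        // Same child order as the last TextBlock in the sample XML.
        var names = new[] { "LineBreak", "Run", "Run", "LineBreak", "Run" };
        foreach (string line in Describe(names))
            Console.WriteLine(line);
    }
}
```

The two adjacent Run entries land in one group, while the Run after the second LineBreak gets a group of its own. That is exactly the distinction the formatting code needs: one output line per group of adjacent Run elements.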
Here is the C# code to do the specialized formatting:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
public static class Extensions
{
    // Groups consecutive items that share a key, preserving source order.
    public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        TKey last = default(TKey);
        bool haveLast = false;
        List<TSource> list = new List<TSource>();
        foreach (TSource s in source)
        {
            TKey k = keySelector(s);
            if (haveLast)
            {
                // Use the default comparer so that a null key does not throw.
                if (!EqualityComparer<TKey>.Default.Equals(k, last))
                {
                    yield return new GroupOfAdjacent<TSource, TKey>(list, last);
                    list = new List<TSource>();
                    list.Add(s);
                    last = k;
                }
                else
                {
                    list.Add(s);
                    last = k;
                }
            }
            else
            {
                list.Add(s);
                last = k;
                haveLast = true;
            }
        }
        if (haveLast)
            yield return new GroupOfAdjacent<TSource, TKey>(list, last);
    }
}
public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
    public TKey Key { get; set; }
    private List<TSource> GroupList { get; set; }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
    }

    System.Collections.Generic.IEnumerator<TSource>
        System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
    {
        foreach (var s in GroupList)
            yield return s;
    }

    public GroupOfAdjacent(List<TSource> source, TKey key)
    {
        GroupList = source;
        Key = key;
    }
}
class Program
{
    static void WriteStartElement(XmlWriter writer, XElement e)
    {
        XNamespace ns = e.Name.Namespace;
        writer.WriteStartElement(e.GetPrefixOfNamespace(ns),
            e.Name.LocalName, ns.NamespaceName);
        foreach (var a in e.Attributes())
        {
            ns = a.Name.Namespace;
            string localName = a.Name.LocalName;
            string namespaceName = ns.NamespaceName;
            writer.WriteAttributeString(
                e.GetPrefixOfNamespace(ns),
                localName,
                namespaceName.Length == 0 && localName == "xmlns" ?
                    XNamespace.Xmlns.NamespaceName :
                    namespaceName,
                a.Value);
        }
    }

    public static void WriteElement(XmlWriter writer, XElement e)
    {
        if (e.Name == "TextBlock")
        {
            WriteStartElement(writer, e);
            writer.WriteRaw(Environment.NewLine);

            // Create an XML writer that outputs no insignificant white space so that we can
            // write to it and explicitly control white space.
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Indent = false;
            settings.OmitXmlDeclaration = true;
            settings.ConformanceLevel = ConformanceLevel.Fragment;
            StringBuilder sb = new StringBuilder();
            using (XmlWriter newXmlWriter = XmlWriter.Create(sb, settings))
            {
                // Group adjacent runs so that they can be output with no whitespace between them
                var groupedRuns = e.Nodes().GroupAdjacent(n =>
                {
                    XElement element = n as XElement;
                    if (element != null && element.Name == "Run")
                        return true;
                    return false;
                });
                foreach (var g in groupedRuns)
                {
                    if (g.Key == true)
                    {
                        // Write white space so that the line of Run elements is properly indented.
                        newXmlWriter.WriteRaw("".PadRight((e.Ancestors().Count() + 1) * 2));
                        foreach (var run in g)
                            run.WriteTo(newXmlWriter);
                        newXmlWriter.WriteRaw(Environment.NewLine);
                    }
                    else
                    {
                        foreach (var g2 in g)
                        {
                            // Write some white space so that each child element is properly indented.
                            newXmlWriter.WriteRaw("".PadRight((e.Ancestors().Count() + 1) * 2));
                            g2.WriteTo(newXmlWriter);
                            newXmlWriter.WriteRaw(Environment.NewLine);
                        }
                    }
                }
            }
            writer.WriteRaw(sb.ToString());
            writer.WriteRaw("".PadRight(e.Ancestors().Count() * 2));
            writer.WriteEndElement();
        }
        else
        {
            WriteStartElement(writer, e);
            foreach (var n in e.Nodes())
            {
                XElement element = n as XElement;
                if (element != null)
                {
                    WriteElement(writer, element);
                    continue;
                }
                n.WriteTo(writer);
            }
            writer.WriteEndElement();
        }
    }

    static string ToStringWithCustomWhiteSpace(XElement element)
    {
        // Create XmlWriter that indents.
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.OmitXmlDeclaration = true;
        StringBuilder sb = new StringBuilder();
        using (XmlWriter xmlWriter = XmlWriter.Create(sb, settings))
            WriteElement(xmlWriter, element);
        return sb.ToString();
    }

    static void Main(string[] args)
    {
        XElement root = XElement.Parse(
            @"<Canvas a='1'>
                <Grid>
                    <TextBlock>
                        <Run Text='r'/>
                        <Run Text='u'/>
                        <Run Text='n'/>
                    </TextBlock>
                    <TextBlock>
                        <Run Text='far a'/>
                        <Run Text='way'/>
                        <Run Text=' from me'/>
                    </TextBlock>
                </Grid>
                <Grid>
                    <TextBlock>
                        <Run Text='I'/>
                        <Run Text=' '/>
                        <Run Text='want'/>
                        <LineBreak/>
                    </TextBlock>
                    <TextBlock>
                        <LineBreak/>
                        <Run Text='...thi'/>
                        <Run Text='s to'/>
                        <LineBreak/>
                        <Run Text=' work'/>
                    </TextBlock>
                </Grid>
            </Canvas>");
        Console.WriteLine(ToStringWithCustomWhiteSpace(root));
    }
}
And for completeness, here is the VB code:
Imports System.Text
Imports System.Xml
Public Class GroupOfAdjacent(Of TElement, TKey)
    Implements IEnumerable(Of TElement)

    Private _key As TKey
    Private _groupList As List(Of TElement)

    Public Property GroupList() As List(Of TElement)
        Get
            Return _groupList
        End Get
        Set(ByVal value As List(Of TElement))
            _groupList = value
        End Set
    End Property

    Public ReadOnly Property Key() As TKey
        Get
            Return _key
        End Get
    End Property

    Public Function GetEnumerator() As System.Collections.Generic.IEnumerator(Of TElement) _
        Implements System.Collections.Generic.IEnumerable(Of TElement).GetEnumerator
        Return _groupList.GetEnumerator
    End Function

    Public Function GetEnumerator1() As System.Collections.IEnumerator _
        Implements System.Collections.IEnumerable.GetEnumerator
        Return _groupList.GetEnumerator
    End Function

    Public Sub New(ByVal key As TKey)
        _key = key
        _groupList = New List(Of TElement)
    End Sub
End Class
Module Module1
    <System.Runtime.CompilerServices.Extension()> _
    Public Function GroupAdjacent(Of TElement, TKey)(ByVal source As IEnumerable(Of TElement), _
        ByVal keySelector As Func(Of TElement, TKey)) As List(Of GroupOfAdjacent(Of TElement, TKey))
        Dim lastKey As TKey = Nothing
        Dim currentGroup As GroupOfAdjacent(Of TElement, TKey) = Nothing
        Dim allGroups As List(Of GroupOfAdjacent(Of TElement, TKey)) = New List(Of GroupOfAdjacent(Of TElement, TKey))()
        For Each item In source
            Dim thisKey As TKey = keySelector(item)
            If lastKey IsNot Nothing And Not thisKey.Equals(lastKey) Then
                allGroups.Add(currentGroup)
            End If
            If Not thisKey.Equals(lastKey) Then
                currentGroup = New GroupOfAdjacent(Of TElement, TKey)(keySelector(item))
            End If
            currentGroup.GroupList.Add(item)
            lastKey = thisKey
        Next
        If lastKey IsNot Nothing Then
            allGroups.Add(currentGroup)
        End If
        Return allGroups
    End Function

    Public Sub WriteStartElement(ByVal writer As XmlWriter, ByVal e As XElement)
        Dim ns As XNamespace = e.Name.Namespace
        writer.WriteStartElement(e.GetPrefixOfNamespace(ns), _
            e.Name.LocalName, ns.NamespaceName)
        For Each a In e.Attributes
            ns = a.Name.Namespace
            Dim localName As String = a.Name.LocalName
            Dim namespaceName As String = ns.NamespaceName
            writer.WriteAttributeString( _
                e.GetPrefixOfNamespace(ns), _
                localName, _
                IIf(namespaceName.Length = 0 And localName = "xmlns", _
                    XNamespace.Xmlns.NamespaceName, namespaceName),
                a.Value)
        Next
    End Sub

    Public Sub WriteElement(ByVal writer As XmlWriter, ByVal e As XElement)
        If (e.Name = "TextBlock") Then
            WriteStartElement(writer, e)
            writer.WriteRaw(Environment.NewLine)

            ' Create an XML writer that outputs no insignificant white space so that we can
            ' write to it and explicitly control white space.
            Dim settings As XmlWriterSettings = New XmlWriterSettings()
            settings.Indent = False
            settings.OmitXmlDeclaration = True
            settings.ConformanceLevel = ConformanceLevel.Fragment
            Dim sb As StringBuilder = New StringBuilder()
            Using newXmlWriter As XmlWriter = XmlWriter.Create(sb, settings)
                ' Group adjacent runs so that they can be output with no whitespace between them
                Dim groupedRuns = e.Nodes().GroupAdjacent( _
                    Function(n) As Boolean?
                        If TypeOf n Is XElement Then
                            Dim element As XElement = n
                            If element.Name = "Run" Then
                                Return True
                            End If
                            Return False
                        End If
                        Return False
                    End Function)
                For Each g In groupedRuns
                    If g.Key = True Then
                        ' Write white space so that the line of Run elements is properly indented.
                        newXmlWriter.WriteRaw("".PadRight((e.Ancestors().Count() + 1) * 2))
                        For Each run In g
                            run.WriteTo(newXmlWriter)
                        Next
                        newXmlWriter.WriteRaw(Environment.NewLine)
                    Else
                        For Each g2 In g
                            ' Write some white space so that each child element is properly indented.
                            newXmlWriter.WriteRaw("".PadRight((e.Ancestors().Count() + 1) * 2))
                            g2.WriteTo(newXmlWriter)
                            newXmlWriter.WriteRaw(Environment.NewLine)
                        Next
                    End If
                Next
            End Using
            writer.WriteRaw(sb.ToString())
            writer.WriteRaw("".PadRight(e.Ancestors().Count() * 2))
            writer.WriteEndElement()
        Else
            WriteStartElement(writer, e)
            For Each n In e.Nodes
                If TypeOf n Is XElement Then
                    Dim element = n
                    WriteElement(writer, element)
                    Continue For
                End If
                n.WriteTo(writer)
            Next
            writer.WriteEndElement()
        End If
    End Sub

    Function ToStringWithCustomWhiteSpace(ByVal element As XElement) As String
        ' Create XmlWriter that indents.
        Dim settings As XmlWriterSettings = New XmlWriterSettings()
        settings.Indent = True
        settings.OmitXmlDeclaration = True
        Dim sb As StringBuilder = New StringBuilder()
        Using xmlWriter As XmlWriter = xmlWriter.Create(sb, settings)
            WriteElement(xmlWriter, element)
        End Using
        Return sb.ToString()
    End Function

    Sub Main()
        Dim myXML As XElement = _
            <Canvas>
                <Grid>
                    <TextBlock>
                        <Run Text='r'/>
                        <Run Text='u'/>
                        <Run Text='n'/>
                    </TextBlock>
                    <TextBlock>
                        <Run Text='far a'/>
                        <Run Text='way'/>
                        <Run Text=' from me'/>
                    </TextBlock>
                </Grid>
                <Grid>
                    <TextBlock>
                        <Run Text='I'/>
                        <Run Text=' '/>
                        <Run Text='want'/>
                        <LineBreak/>
                    </TextBlock>
                    <TextBlock>
                        <LineBreak/>
                        <Run Text='...thi'/>
                        <Run Text='s to'/>
                        <LineBreak/>
                        <Run Text=' work'/>
                    </TextBlock>
                </Grid>
            </Canvas>
        Console.Write(ToStringWithCustomWhiteSpace(myXML))
        Console.ReadLine()
    End Sub
End Module