LINQ to XML for JavaScript – Gaining Performance through Atomization
One of the interesting characteristics of LINQ to XML as implemented in the .NET Framework is that it uses atomized XNamespace and XName objects, which gives LINQ to XML very good performance in many common scenarios. The Ltxml.js library also atomizes XNamespace and XName objects, for exactly the same reason. In this post, I’m going to explain what I mean by this, and how it works in the JavaScript implementation (which is similar in design to the .NET LINQ to XML library, but different in mechanics).
As an aside, I am starting an Open XML / JavaScript Resource Page here on OpenXMLDeveloper.org. I will be listing all JavaScript-related blog posts and screen-casts on this page. I think that developers will soon find that using JavaScript to implement Open XML functionality has a large number of benefits. I expect that the number of blog posts and screen-casts will rapidly multiply on this page.
What is XNamespace and XName Atomization?
First I am going to explain what atomization is, and why we like it.
The key point about atomization is that if two variables are initialized with the same namespace and local name, they are in fact initialized with the exact same object. This works as follows: when you initialize an XName object with a given namespace and local name, Ltxml first looks in a name cache to determine whether that namespace and local name already exist in the cache. If they do, Ltxml returns the object in the cache. If they do not, Ltxml initializes a new object, puts it into the cache, and then returns that object.
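The look-up-or-create steps described above can be sketched as a small factory function. This is a minimal sketch under stated assumptions, not the Ltxml.js code: the names getName and nameCache are hypothetical, the name objects are plain frozen objects, and a Map stands in for the cache.

```javascript
// Hypothetical sketch of atomization through a cache; not the Ltxml.js API.
// Cache keys are expanded names of the form "{namespace}localName".
const nameCache = new Map();

function getName(namespaceUri, localName) {
  const key = "{" + namespaceUri + "}" + localName;
  // Step 1: look for the namespace/local-name combination in the cache.
  let name = nameCache.get(key);
  if (name === undefined) {
    // Step 2: not cached yet, so create the object and remember it.
    name = Object.freeze({ namespaceName: namespaceUri, localName: localName });
    nameCache.set(key, name);
  }
  // Step 3: return the (possibly cached) object.
  return name;
}

const a = getName("http://www.ericwhite.com", "root");
const b = getName("http://www.ericwhite.com", "root");
const c = getName("http://www.ericwhite.com", "child");
console.log(a === b); // true: both calls return the same cached object
console.log(a === c); // false: different local name, different object
console.log(nameCache.size); // 2
```

Freezing the cached objects is a design choice in this sketch: because every caller shares the same instance, a mutation made by one caller would be visible to all of them.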
The reason we do this is that when we want to compare two XName objects to see whether they have the same namespace and local name, all we need to do is check whether they are the same object. In JavaScript, we can use the === (strict equality) operator, which for objects simply compares identity.
var ns1 = new XNamespace("http://www.ericwhite.com");
var n1 = new XName(ns1 + "root");
var n2 = new XName(ns1 + "root");
var b = n1 === n2; // b is set to true, and this === operator executes extremely quickly
Where this really comes in handy is in the axis methods, such as the elements method or the descendants method. If we look at the Ltxml.XContainer.prototype.elements method in ltxml.js, we see the comparison between the passed-in XName object and the XName object of each node in the list of child nodes. The key point is that this comparison is fast, so the elements axis method performs well.
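To illustrate why atomization helps an axis method, here is a hypothetical, stripped-down element type whose elements method filters its children by name. The Element class and the xname helper are illustrative assumptions, not Ltxml.js API; the point is only that the name test is a single identity comparison per child.

```javascript
// Hypothetical sketch, not the Ltxml.js source: a minimal element type
// whose elements() axis filters child nodes by atomized name.
const nameCache = new Map();
function xname(namespaceUri, localName) {
  const key = "{" + namespaceUri + "}" + localName;
  if (!nameCache.has(key)) {
    nameCache.set(key, { namespaceName: namespaceUri, localName: localName });
  }
  return nameCache.get(key);
}

class Element {
  constructor(elementName, ...children) {
    this.name = elementName; // an atomized name object
    this.nodes = children;   // child elements and text (plain strings)
  }

  // Returns child elements, optionally only those with the given name.
  // Because names are atomized, the name test is one === per child
  // rather than separate namespace and local-name string comparisons.
  elements(filterName) {
    return this.nodes.filter(
      (node) =>
        node instanceof Element &&
        (filterName === undefined || node.name === filterName)
    );
  }
}

const W = "http://www.ericwhite.com";
const root = new Element(
  xname(W, "root"),
  new Element(xname(W, "child")),
  "some text",
  new Element(xname(W, "anotherElement")),
  new Element(xname(W, "child"))
);

console.log(root.elements(xname(W, "child")).length); // 2
console.log(root.elements().length); // 3
```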
About the Mechanics
In the .NET implementation of LINQ to XML, the constructors for the XNamespace and XName classes are not public, so you can’t call them directly. This is necessary because a C# constructor has no way to return an existing (atomized) object; it always creates a new one. Instead, LINQ to XML in .NET provides an implicit conversion from string to XNamespace (and from string to XName), so that you can initialize an XNamespace object by assigning a string to it. The atomization happens in that implicit conversion code: if the namespace already exists in the cache, the conversion returns the cached object.
However, JavaScript has no equivalent of C#’s user-defined implicit conversions. What JavaScript does have is a rule about constructor functions: if a constructor function explicitly returns an object, the new expression evaluates to that object instead of the newly created one. If you look at the Ltxml.XName function, you can see that it first looks for the namespace/name combination in the Ltxml.nameCache property, and if it finds it, the constructor returns the object in the cache. Otherwise the constructor creates a new object, adds it to the cache, and then returns it.
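A constructor that hands back a cached instance might look like the following sketch. MyXName and MyXName.cache are hypothetical names chosen for illustration, and the real Ltxml.XName function may differ in detail; the sketch only demonstrates the language rule that lets a constructor return an existing object.

```javascript
// Hypothetical sketch of a constructor that returns atomized objects;
// MyXName and MyXName.cache are illustrative names, not Ltxml.js API.
function MyXName(namespaceName, localName) {
  const key = "{" + namespaceName + "}" + localName;
  const cached = MyXName.cache[key];
  if (cached !== undefined) {
    // Returning an object from a constructor makes `new` evaluate to it;
    // the freshly allocated `this` is simply discarded.
    return cached;
  }
  this.namespaceName = namespaceName;
  this.localName = localName;
  MyXName.cache[key] = this;
  // No explicit return: `new` yields `this`, the newly cached object.
}

// Ordinary object used as a hash table keyed by expanded name. Keys always
// start with "{", so they cannot collide with inherited names like "toString".
MyXName.cache = {};

const n1 = new MyXName("http://www.ericwhite.com", "root");
const n2 = new MyXName("http://www.ericwhite.com", "root");
console.log(n1 === n2); // true
console.log(n1 instanceof MyXName); // true
```

Note that `n1 instanceof MyXName` still holds, because the cached object was itself created by an earlier `new MyXName(...)` call.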
The nameCache is an ordinary JavaScript object that we use as a hash table. Its property names are the expanded names, in which the namespace is enclosed in curly braces and followed by the local name. Those property names look like this:
{http://www.ericwhite.com}root
{http://www.ericwhite.com}child
{http://www.ericwhite.com}anotherElement
The values of those properties are the atomized name objects.
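Building on the key format above, the sketch below keeps two plain-object hash tables, one for namespaces and one for names, and parses an expanded-name string into its parts. It is an assumption-laden illustration, not the Ltxml.js implementation: getNamespace, getName, and namespaceCache are hypothetical, and the tables are created with Object.create(null) so that no inherited property (such as toString) can be mistaken for a cache entry.

```javascript
// Hypothetical sketch, not the Ltxml.js source. Object.create(null) gives
// plain objects with no inherited properties, so any string is a safe key.
const namespaceCache = Object.create(null);
const nameCache = Object.create(null);

// Atomizes namespaces the same way names are atomized.
function getNamespace(uri) {
  let ns = namespaceCache[uri];
  if (ns === undefined) {
    ns = { namespaceName: uri };
    namespaceCache[uri] = ns;
  }
  return ns;
}

// Accepts an expanded name "{uri}local" or a bare local name "local".
function getName(nameString) {
  let uri = "";
  let local = nameString;
  if (nameString.charAt(0) === "{") {
    const close = nameString.indexOf("}");
    if (close < 0) throw new Error("Malformed expanded name: " + nameString);
    uri = nameString.substring(1, close);
    local = nameString.substring(close + 1);
  }
  // Canonical key, so "root" and "{}root" map to the same object.
  const key = "{" + uri + "}" + local;
  let name = nameCache[key];
  if (name === undefined) {
    name = { namespace: getNamespace(uri), localName: local };
    nameCache[key] = name;
  }
  return name;
}

const root = getName("{http://www.ericwhite.com}root");
const child = getName("{http://www.ericwhite.com}child");
console.log(root === getName("{http://www.ericwhite.com}root")); // true
console.log(root.namespace === child.namespace); // true: namespace is atomized too
console.log(Object.keys(nameCache).length); // 2
```

Normalizing every input to the canonical "{namespace}localName" key means that "root" and "{}root" resolve to the same object, which is what identity comparison requires.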
Since you have the source code for Ltxml.js, you can go look at the code, and see how this works.