Announcing a Complete Re-Write of ListItemRetriever.cs

The ListItemRetriever.cs module is a crucial part of the HtmlConverter.cs module.  Without a completely accurate and robust ListItemRetriever.cs module, it isn’t possible to write high-fidelity transforms of WordprocessingML (DOCX) to HTML/CSS.

About three weeks ago, I had the good fortune to receive 4400 documents from an Open XML developer, each of which uses numbering in interesting ways.  I wrote a small program that uses the ListItemRetriever.cs module to retrieve every list item and compare the retrieved value with the list item as rendered by Word.  I wrote the ListItemRetriever.cs module several years ago, and for the most part, it was pretty good.  However, I found several places where it fell down.  I proceeded to patch it, and then discovered that I needed to restructure it, in effect to rewrite it.  After the re-write, it was better, but I wasn’t satisfied, so I re-wrote it again.  And then again!  And now I am happy with the result.  The new list item retriever accurately retrieves the list items for all 4400 documents, as well as tens of thousands of other sample documents that I have accumulated over the last three years.  The code is smaller, more accurate, and faster than the old version, so the considerable mental pain that I’ve gone through over the last three weeks was worth it.  It means that we have a good ListItemRetriever module for the future, one that is a core part of the foundation for the new high-fidelity HtmlConverter.

One thing that I didn’t change was the programming interface to GetListItemText.  A few weeks ago, I asked for volunteers to write implementations of GetListItemText for the many languages and cultures that Word supports.  We’re making progress, but we need more volunteers to cover more languages.  In the linked blog post, you can see the implementations that we have so far, and the implementations that folks have volunteered for, so please pick the language of your choice, and help the new high-fidelity HtmlConverter support your native language / culture.

One more change in this new release of PowerTools for OpenXML: it supports RTL languages in a better way.  The support for RTL isn’t complete yet, but it is better.

I have a ways to go before I am going to be happy with HtmlConverter.  I am pretty ambitious with what I want it to do.  This new release, 2.7.00, is a big step in the right direction.  You can find the source code at https://github.com/OfficeDev/Open-Xml-PowerTools.

One last note: I am going to do a comprehensive screen-cast that details everything that I’ve learned about Open XML numbering, and how numbering is implemented in ListItemRetriever.cs.  I should have it published in a week or two.