Yes, you're right: there isn't an easy way to get the document position.
In a recent project, WmlComparer (a module that compares two DOCX files and produces a new document containing the precise differences between them, with certain restrictions), I transform the DOCX into a new form: an array of the precise content of the document. Each character and each image in the document occupies a single element of the array, and the array is assembled so that a valid Open XML document can be reconstructed from it. This approach resolves the problems caused by the nested nature of Open XML. You may be interested in watching this screen-cast:
Introducing WmlComparer, a Module in Open-Xml-PowerTools
It’s a somewhat long screen-cast, but it can illuminate the proper approach to dealing with this issue.
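To make the array idea concrete, here is a minimal Python sketch. It is not WmlComparer's implementation; the names Atom, flatten and rebuild are hypothetical, and it assumes a simplified body holding only top-level paragraphs, runs, w:t text and w:drawing images. Each character or image becomes one atom that remembers which paragraph and run it came from plus that run's properties, and the paragraph mark is an atom too so empty paragraphs survive. Rebuilding regroups consecutive atoms that share a paragraph and run.

```python
import copy
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def q(tag: str) -> str:
    """Qualify a WordprocessingML tag name, e.g. q("p") -> "{...main}p"."""
    return f"{{{W}}}{tag}"


@dataclass
class Atom:
    # Hypothetical atom shape for this sketch only.
    kind: str                  # "text", "image", or "para_end" (the paragraph mark)
    char: str                  # the single character; "" for images and paragraph marks
    para_id: int               # index of the w:p this atom came from
    run_id: int                # document-wide index of its w:r (-1 for paragraph marks)
    props: Optional[ET.Element] = None    # w:rPr for text/image atoms, w:pPr for para_end
    drawing: Optional[ET.Element] = None  # the w:drawing element of an image atom


def flatten(body: ET.Element) -> List[Atom]:
    """Turn top-level paragraphs into a flat list: one atom per character or image."""
    atoms: List[Atom] = []
    run_id = 0
    for para_id, p in enumerate(body.findall(q("p"))):
        for r in p.findall(q("r")):
            rpr = r.find(q("rPr"))
            for child in r:
                if child.tag == q("t"):
                    for ch in child.text or "":
                        atoms.append(Atom("text", ch, para_id, run_id, rpr))
                elif child.tag == q("drawing"):
                    atoms.append(Atom("image", "", para_id, run_id, rpr, child))
                # Other run children (w:tab, w:br, ...) are ignored in this sketch.
            run_id += 1
        # The paragraph mark is an atom as well, so empty paragraphs survive.
        atoms.append(Atom("para_end", "", para_id, -1, p.find(q("pPr"))))
    return atoms


def rebuild(atoms: List[Atom]) -> ET.Element:
    """Regroup consecutive atoms into w:p / w:r / w:t elements."""
    body = ET.Element(q("body"))
    p = r = t = None
    cur_para = cur_run = None
    for a in atoms:
        if a.para_id != cur_para:
            p = ET.SubElement(body, q("p"))
            cur_para, cur_run = a.para_id, None
        if a.kind == "para_end":
            if a.props is not None:
                p.insert(0, copy.deepcopy(a.props))  # w:pPr goes first in w:p
            continue
        if a.run_id != cur_run:
            r = ET.SubElement(p, q("r"))
            cur_run, t = a.run_id, None
            if a.props is not None:
                r.append(copy.deepcopy(a.props))  # w:rPr goes first in w:r
        if a.kind == "image":
            r.append(copy.deepcopy(a.drawing))
            t = None
        else:
            if t is None:
                t = ET.SubElement(r, q("t"))
                t.set(XML_SPACE, "preserve")
                t.text = ""
            t.text += a.char
    return body


SAMPLE = (
    f'<w:body xmlns:w="{W}">'
    '<w:p><w:r><w:t xml:space="preserve">Hello </w:t></w:r>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r></w:p>'
    '<w:p/>'
    '<w:p><w:r><w:t>a + a</w:t></w:r></w:p>'
    '</w:body>'
)

if __name__ == "__main__":
    atoms = flatten(ET.fromstring(SAMPLE))
    text = "".join(a.char for a in atoms)
    print(text.count("o"))  # 2, counted across two differently formatted runs
    pos = next(i for i, a in enumerate(atoms) if a.char == "w")
    print(pos, atoms[pos].props is not None)  # 6 True: index of "w", which is bold
    print(ET.tostring(rebuild(atoms), encoding="unicode"))
```

Because every character occupies one slot, a document position is simply an index into the list, and counting characters no longer cares where run boundaries fall. Tables, fields, tabs, breaks and other constructs are deliberately out of scope for this sketch.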
I have in mind a generalization of that approach so that developers can do the kinds of operations you want to do, i.e., count specific characters, easily insert comments at any specific point, and so on. Writing WmlComparer really helped me formalize my thoughts about this issue.
Cheers, Eric