Adding/Getting Comments based on character position

Tagged: comments

This topic contains 3 replies, has 2 voices, and was last updated by Eric White 8 years, 7 months ago.

September 15, 2016 at 2:04 am #3789

iunknown
Participant

I’m tasked with highlighting and adding new comments to an OpenXML document using its plain-text version (we’re not supporting comments on non-text elements).

I can pull the plain-text version of the DOCX and enumerate the comments, but I’m at a loss as to how to get the start and stop positions of each comment.

I have a feeling I’m missing something simple. Do I look at the comment’s parent and then apply some offset to get its position?

Or do I have to start at the top of the document, pull out the plain text as I go, and ‘remember the start and stop’ whenever I run across a comment, instead of trying to use the comments subsection?

Eric, can you point me in the right direction?

thanks,

Gene

September 15, 2016 at 11:59 am #3790

Eric White
Keymaster

Hi Gene,

I’m not fully clear on your question.

Comments have markup in the main document part that marks the start and end of the commented range (w:commentRangeStart, w:commentRangeEnd). These elements sit at the specific location in the document that the comment applies to. The actual text of the comment is in the comments part; you find it by following the relationship from the main document part to the comments part and matching the w:id on the range elements to the w:id of a w:comment element there.
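To see how the two parts connect, here is a minimal Python sketch (standard library only) that reads the w:id of each w:commentRangeStart from the main document part and looks up the w:comment with the same w:id in the comments part. The XML strings are hand-written stand-ins for word/document.xml and word/comments.xml, and the helper names are hypothetical; in a real package you would read those parts out of the DOCX (a ZIP file) or through the Open XML SDK.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def q(tag):
    """Qualify a WordprocessingML tag or attribute name for ElementTree."""
    return f"{{{W}}}{tag}"

def comment_bodies(comments_xml):
    """Map each w:comment's w:id to the plain text of its body (comments part)."""
    root = ET.fromstring(comments_xml)
    return {c.get(q("id")): "".join(t.text or "" for t in c.iter(q("t")))
            for c in root.iter(q("comment"))}

def anchored_ids(document_xml):
    """List the w:id of each w:commentRangeStart in document order (main part)."""
    root = ET.fromstring(document_xml)
    return [el.get(q("id")) for el in root.iter(q("commentRangeStart"))]

# Hand-written sample markup standing in for the two real parts.
DOCUMENT_XML = f"""<w:document xmlns:w="{W}"><w:body><w:p>
  <w:r><w:t xml:space="preserve">Please </w:t></w:r>
  <w:commentRangeStart w:id="0"/>
  <w:r><w:t>review</w:t></w:r>
  <w:commentRangeEnd w:id="0"/>
  <w:r><w:commentReference w:id="0"/></w:r>
</w:p></w:body></w:document>"""

COMMENTS_XML = f"""<w:comments xmlns:w="{W}">
  <w:comment w:id="0" w:author="Reviewer"><w:p><w:r><w:t>Check this</w:t></w:r></w:p></w:comment>
</w:comments>"""

bodies = comment_bodies(COMMENTS_XML)
for cid in anchored_ids(DOCUMENT_XML):
    print(cid, bodies.get(cid))
```

The w:id value is the only link between the range markers and the comment text, which is why the range elements themselves carry no text and no position.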

You will be interested in the following screen-cast:

How to Research Open XML Markup

The key point of that screen-cast: create a Word document (without a comment), make a copy of it, insert a comment in the copy, and then use the Open XML SDK Productivity Tool to compare the two. This will teach you about comment markup.
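The same before/after idea can be roughly approximated in code. This Python sketch (standard library only) opens two DOCX packages as ZIP files, lists the parts that exist only in the commented copy, and counts which element names occur more often in its word/document.xml. The function names are hypothetical and the tiny in-memory packages are stand-ins for real files; a real Word document also changes [Content_Types].xml and the relationship parts, which a tag count like this does not explain, so treat it as a quick hint rather than a replacement for the Productivity Tool's diff.

```python
import io
import zipfile
from collections import Counter
import xml.etree.ElementTree as ET

def part_names(docx):
    """Names of all parts (ZIP entries) in a DOCX; accepts a path or file object."""
    with zipfile.ZipFile(docx) as z:
        return set(z.namelist())

def tag_counts(docx, part="word/document.xml"):
    """Count element local names in one part, dropping the {namespace} prefix."""
    with zipfile.ZipFile(docx) as z:
        root = ET.fromstring(z.read(part))
    return Counter(el.tag.split("}")[-1] for el in root.iter())

def compare(original, modified):
    """Return (parts only in `modified`, element counts that grew in `modified`)."""
    added_parts = sorted(part_names(modified) - part_names(original))
    grown = tag_counts(modified) - tag_counts(original)  # keeps only positive counts
    return added_parts, dict(grown)

# Demo only: two minimal in-memory packages standing in for real .docx files.
def _fake_docx(document_xml, extra_parts=()):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", document_xml)
        for name, data in extra_parts:
            z.writestr(name, data)
    return buf

NS_DECL = 'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
before = _fake_docx(
    f'<w:document {NS_DECL}><w:body><w:p><w:r><w:t>Hi</w:t></w:r></w:p></w:body></w:document>')
after = _fake_docx(
    f'<w:document {NS_DECL}><w:body><w:p><w:commentRangeStart w:id="0"/>'
    '<w:r><w:t>Hi</w:t></w:r><w:commentRangeEnd w:id="0"/>'
    '<w:r><w:commentReference w:id="0"/></w:r></w:p></w:body></w:document>',
    [("word/comments.xml", f"<w:comments {NS_DECL}/>")],
)
print(compare(before, after))
```

To compare actual documents, pass their file paths to compare() instead of the BytesIO stand-ins.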

Cheers, Eric

September 15, 2016 at 1:48 pm #3794

iunknown
Participant

Thank you, Eric,

What I’m trying to do is pull the plain text and the comments out of the document, which I can do.
The problem I’m having is knowing where a comment starts and stops in the plain text version.

I did find CommentRangeStart and CommentRangeEnd, but as far as I could tell they don’t expose a document position.

>>These elements are situated at the specific location in the document.
Based on this statement, I think the approach I have to take is to record the current position of every comment start and stop while extracting the plain text…

But that doesn’t work because of the nested nature of OpenXML.

ugh.
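For simple documents, the record-positions-while-extracting idea can be sketched with a depth-first walk of the main document part: ElementTree's iter() yields every element, however deeply nested, in the order it appears in the XML, so a running character count is always current when a w:commentRangeStart or w:commentRangeEnd is reached. This minimal Python sketch assumes the document.xml markup is already available as a string and counts only w:t text plus one newline per paragraph break; tabs, breaks, fields, text boxes and other content are not handled, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def q(tag):
    """Qualify a WordprocessingML name for ElementTree."""
    return f"{{{W}}}{tag}"

def text_with_comment_offsets(document_xml):
    """Return (plain_text, {comment_id: (start, end)}) for a main document part.

    Offsets index into plain_text; end is exclusive, as in Python slicing.
    """
    root = ET.fromstring(document_xml)
    pieces, pos = [], 0
    starts, ranges = {}, {}
    seen_paragraph = False
    for el in root.iter():  # depth-first: every element in document order
        if el.tag == q("p"):
            if seen_paragraph:  # separate paragraphs with a newline
                pieces.append("\n")
                pos += 1
            seen_paragraph = True
        elif el.tag == q("t"):  # visible text only; w:delText is not matched
            text = el.text or ""
            pieces.append(text)
            pos += len(text)
        elif el.tag == q("commentRangeStart"):
            starts[el.get(q("id"))] = pos
        elif el.tag == q("commentRangeEnd"):
            cid = el.get(q("id"))
            ranges[cid] = (starts.get(cid, pos), pos)
    return "".join(pieces), ranges

# Hand-written sample: the commented word sits in a differently formatted run.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p><w:r><w:t xml:space="preserve">Hello </w:t></w:r>
    <w:commentRangeStart w:id="0"/>
    <w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>
    <w:commentRangeEnd w:id="0"/></w:p>
  <w:p><w:r><w:t>Bye</w:t></w:r></w:p>
</w:body></w:document>"""

text, ranges = text_with_comment_offsets(SAMPLE)
start, end = ranges["0"]
print(repr(text), ranges, repr(text[start:end]))
```

The offsets are only as good as the plain-text rules they are counted against, so whatever extraction produces the plain text has to apply exactly the same rules.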

September 16, 2016 at 1:04 pm #3798

Eric White
Keymaster

Yes, you are right, there is not an easy way to get the document position.

In a recent project, WmlComparer (a module that compares two DOCX files and, with certain restrictions, produces a new document containing the precise differences between them), I transform the DOCX into a new form: an array of the precise content of the document. Each character and image in the document occupies a single element of the array. The array is put together in such a way that a valid Open XML document can be reconstructed from it. This approach resolves the problems associated with the nested nature of Open XML. You may be interested in watching this screen-cast:

Introducing WmlComparer, a Module in Open-Xml-PowerTools

It’s a rather long screen-cast, but it can illuminate the proper approach to dealing with this issue.

I have in mind a generalization of that approach so that developers can do the types of operations you want to do, e.g. count specific characters, easily insert comments at any specific point, and so on. Writing WmlComparer really helped me to formalize my thoughts about this issue.
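To make the array idea concrete, here is a much-simplified Python sketch; it is not WmlComparer’s code, whose atoms carry far more (for example, what is needed to rebuild the original markup). It flattens a single paragraph into one atom per character, each recording which comment ranges cover it, so counting characters or marking a new comment becomes plain list indexing. The paragraph is then rebuilt with w:commentRangeStart/w:commentRangeEnd wherever coverage changes, each end followed by a w:commentReference run. Assumptions: one paragraph at a time, run formatting (w:rPr) and non-text content are discarded, and the matching w:comment entry in the comments part must still be added separately. The names Atom, atomize, add_comment and rebuild are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)

def q(tag):
    """Qualify a WordprocessingML name for ElementTree."""
    return f"{{{W}}}{tag}"

@dataclass
class Atom:
    char: str            # one character of text ("" only for the sentinel)
    comments: frozenset  # ids of the comment ranges covering this character

def atomize(paragraph):
    """Flatten one w:p into atoms; nesting depth no longer matters afterwards."""
    atoms, open_ids = [], set()
    for el in paragraph.iter():  # document order
        if el.tag == q("t"):
            atoms += [Atom(ch, frozenset(open_ids)) for ch in (el.text or "")]
        elif el.tag == q("commentRangeStart"):
            open_ids.add(el.get(q("id")))
        elif el.tag == q("commentRangeEnd"):
            open_ids.discard(el.get(q("id")))
    return atoms

def add_comment(atoms, start, end, cid):
    """Cover atoms[start:end] (plain-text offsets) with comment id `cid`."""
    for atom in atoms[start:end]:
        atom.comments = atom.comments | {cid}

def rebuild(atoms):
    """Serialize atoms back into a w:p, inserting range markers where coverage changes."""
    p = ET.Element(q("p"))
    current, text_el = frozenset(), None
    for atom in atoms + [Atom("", frozenset())]:  # sentinel closes open ranges
        for cid in sorted(current - atom.comments):
            ET.SubElement(p, q("commentRangeEnd"), {q("id"): cid})
            ref_run = ET.SubElement(p, q("r"))
            ET.SubElement(ref_run, q("commentReference"), {q("id"): cid})
            text_el = None
        for cid in sorted(atom.comments - current):
            ET.SubElement(p, q("commentRangeStart"), {q("id"): cid})
            text_el = None
        if atom.char:
            if text_el is None:  # start a new run after any marker
                run = ET.SubElement(p, q("r"))
                text_el = ET.SubElement(run, q("t"), {XML_SPACE: "preserve"})
                text_el.text = ""
            text_el.text += atom.char
        current = atom.comments
    return p

para = ET.fromstring(f'<w:p xmlns:w="{W}"><w:r><w:t>Hello world</w:t></w:r></w:p>')
atoms = atomize(para)
print(sum(a.char == "o" for a in atoms))  # counting characters is a list scan
add_comment(atoms, 6, 11, "7")            # hypothetical new comment id "7"
print(ET.tostring(rebuild(atoms), encoding="unicode"))
```

Because positions are just list indices, "insert a comment at characters 6 to 11" needs no knowledge of how the original runs were split; the cost is that everything the atoms do not record (here, formatting) is lost on the way back to markup.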

Cheers, Eric