Get Highlighted Text from .docx
This topic contains 3 replies, has 2 voices, and was last updated by Eric White 8 years, 6 months ago.
June 7, 2016 at 7:23 pm #3442
I have been trying to get all highlighted text from a .docx, but my code fails to find any. It is as follows:
Dim htext As IEnumerable(Of Highlight) = wordDocument.MainDocumentPart.Document.Descendants(Of Highlight)().Where(Function(h) h.Val = "Yellow").ToList()
This returns a collection of items, but the InnerText of each one is "". Therefore my next statement returns nothing.
For Each e In htext
    Dim docHighText As New ParagraphText()
    Dim highText As String = ""
    highText = e.InnerText
    docHighText.FieldText = highText
    If e.InnerText <> "" Then
        paratext.Add(docHighText)
    End If
Next
Can you help me out?
June 13, 2016 at 3:53 pm #3468
It is quite a bit more complicated than the approach you are taking. You are selecting the descendant w:highlight elements, but this is not where the text is stored. The text is in the w:t element inside a w:r element; that same w:r element contains the w:rPr element (the run properties), which in turn contains the w:highlight element.
<w:p>
  <w:r>
    <w:rPr>
      <w:highlight w:val="yellow"/>
    </w:rPr>
    <w:t>Test</w:t>
  </w:r>
</w:p>
You have to first select the runs that have the w:rPr elements that contain the w:highlight element with your desired value. Then after selecting those runs, you have to select the child w:t elements (and there may be multiple) that contain the actual text. To complicate matters further, that highlight element may be in the run properties in a style, so you would have to look at the style part, find the style, and see if the w:highlight element is in the run props for a style. Also, that character style may itself derive from another character style, where the w:highlight element is defined.
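The first part of that description (find the runs whose own w:rPr holds the wanted w:highlight, then collect their w:t children) can be sketched roughly as follows. This is only an illustration, not the Open XML SDK's typed API: it works on the raw markup with LINQ to XML, and the module name, function name, and sample paragraph are made up for the example. It looks at direct formatting only and ignores styles entirely.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module DirectHighlightSketch
    ' Main WordprocessingML namespace (the "w:" prefix in the markup).
    Private ReadOnly W As XNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Hypothetical helper: returns the text of every run whose own w:rPr
    ' contains w:highlight with the requested w:val. Styles are NOT consulted.
    Function HighlightedRunText(root As XElement, color As String) As List(Of String)
        Dim result As New List(Of String)()
        For Each r As XElement In root.DescendantsAndSelf(W + "r")
            Dim rPr As XElement = r.Element(W + "rPr")
            If rPr Is Nothing Then Continue For        ' run has no direct run properties
            Dim hl As XElement = rPr.Element(W + "highlight")
            If hl Is Nothing Then Continue For
            Dim valAttr As XAttribute = hl.Attribute(W + "val")
            If valAttr Is Nothing OrElse Not String.Equals(valAttr.Value, color, StringComparison.Ordinal) Then
                Continue For
            End If
            ' A run may contain several w:t children; join them in document order.
            Dim runText As String = String.Concat(r.Elements(W + "t").Select(Function(t) t.Value))
            If runText.Length > 0 Then result.Add(runText)
        Next
        Return result
    End Function

    Sub Main()
        ' Sample paragraph: one highlighted run and one run without run properties.
        Dim p As XElement = XElement.Parse(
            "<w:p xmlns:w=""" & W.NamespaceName & """>" &
            "<w:r><w:rPr><w:highlight w:val=""yellow""/></w:rPr><w:t>Test</w:t></w:r>" &
            "<w:r><w:t> plain</w:t></w:r>" &
            "</w:p>")
        For Each s As String In HighlightedRunText(p, "yellow")
            Console.WriteLine(s)   ' prints "Test"
        Next
    End Sub
End Module
```

For a real document, the root element could presumably be obtained by loading the main document part's stream (MainDocumentPart.GetStream()) with XDocument.Load. Note that the value in the markup is the lowercase "yellow".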
Document formats are complicated, and for good reason: the structure of the documents themselves is complicated.
I recommend that you watch the Introduction to Open XML screen-cast series. After you have watched those screen-casts, then watch the Introduction to WordprocessingML screen-cast series.
June 13, 2016 at 5:10 pm #3471
Ok,
So I got an answer on Stack Overflow with this code.
There is still an issue: it finds my run properties as Nothing. When I open my document in the Open XML SDK Productivity Tool, I see this.
<w:p w:rsidRPr="00AA7ABD" w:rsidR="00710260" w:rsidP="00710260" w:rsidRDefault="006B1119">
  <w:pPr>
    <w:spacing w:line="240" w:lineRule="auto" />
    <w:ind w:firstLine="720" />
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
  </w:pPr>
  <w:proofErr w:type="gramStart" />
  <w:r w:rsidRPr="00AA7ABD">
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
    <w:t>Zz5</w:t>
  </w:r>
  <w:r w:rsidRPr="00AA7ABD" w:rsidR="00710260">
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
    <w:t>TT-1.</w:t>
  </w:r>
  <w:proofErr w:type="gramEnd" />
  <w:r w:rsidRPr="00AA7ABD" w:rsidR="00710260">
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
    <w:t xml:space="preserve"> This is </w:t>
  </w:r>
  <w:proofErr w:type="spellStart" />
  <w:r w:rsidRPr="00AA7ABD" w:rsidR="00CC1B4F">
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
    <w:t>ttttt</w:t>
  </w:r>
  <w:r w:rsidRPr="00AA7ABD" w:rsidR="00710260">
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
    <w:t>my</w:t>
  </w:r>
  <w:proofErr w:type="spellEnd" />
  <w:r w:rsidRPr="00AA7ABD" w:rsidR="00710260">
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
    <w:t xml:space="preserve"> test paragraph test paragraph </w:t>
  </w:r>
  <w:proofErr w:type="gramStart" />
  <w:r w:rsidRPr="00AA7ABD" w:rsidR="00710260">
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
    <w:t>This</w:t>
  </w:r>
  <w:proofErr w:type="gramEnd" />
  <w:r w:rsidRPr="00AA7ABD" w:rsidR="00710260">
    <w:rPr>
      <w:highlight w:val="yellow" />
    </w:rPr>
    <w:t xml:space="preserve"> is my test paragraph test paragraph.</w:t>
  </w:r>
</w:p>
I included the snippet of code below.
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Wordprocessing

' Collects the InnerText of every run whose direct run properties contain a yellow highlight.
Private Function GetListOfHighlightedString(ByVal Docx As WordprocessingDocument) As List(Of String)
    Dim lstOfHighlightedString As List(Of String) = New List(Of String)()
    Try
        For Each EachRun In Docx.MainDocumentPart.Document.Body.Descendants(Of Run)()
            ' Runs without a w:rPr child have RunProperties = Nothing and are skipped.
            If EachRun.RunProperties IsNot Nothing Then
                For Each EachPrpChild In EachRun.RunProperties.ChildElements
                    If TypeOf EachPrpChild Is Highlight Then
                        Dim highlightVal As Highlight = TryCast(EachPrpChild, Highlight)
                        If highlightVal.Val.Equals(HighlightColorValues.Yellow) Then
                            lstOfHighlightedString.Add(EachRun.InnerText)
                        End If
                    End If
                Next EachPrpChild
            End If
        Next EachRun
    Catch e1 As Exception
        Throw
    End Try
    Return lstOfHighlightedString
End Function
June 15, 2016 at 4:53 am #3487
Yes, sometimes there are no run properties for a run, in which case it uses the run properties from the style, and then from the global defaults. This is valid Open XML, and your code should be prepared to handle this.
-Eric
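As a rough illustration of that fallback, the sketch below resolves a run's highlight by checking the run's own w:rPr first, then its character style (following w:basedOn), then the paragraph style, then w:docDefaults. It is a simplification under stated assumptions: it uses LINQ to XML on raw markup rather than the SDK's typed classes, every helper name is hypothetical, and it leaves out table styles, the default paragraph style, and other parts of the full style hierarchy.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module EffectiveHighlightSketch
    Private ReadOnly W As XNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Hypothetical helper: w:val of the named child element, or Nothing if absent.
    Private Function ChildVal(parent As XElement, child As String) As String
        If parent Is Nothing Then Return Nothing
        Dim e As XElement = parent.Element(W + child)
        If e Is Nothing Then Return Nothing
        Dim a As XAttribute = e.Attribute(W + "val")
        If a Is Nothing Then Return Nothing
        Return a.Value
    End Function

    ' Walks a style and its w:basedOn ancestors until one defines w:highlight.
    Private Function HighlightFromStyle(styles As XElement, styleId As String) As String
        Dim seen As New HashSet(Of String)()          ' guards against basedOn cycles
        Dim current As String = styleId
        While current IsNot Nothing AndAlso seen.Add(current)
            Dim wanted As String = current
            Dim style As XElement = styles.Elements(W + "style").FirstOrDefault(
                Function(s) s.Attribute(W + "styleId") IsNot Nothing AndAlso
                            s.Attribute(W + "styleId").Value = wanted)
            If style Is Nothing Then Return Nothing
            Dim found As String = ChildVal(style.Element(W + "rPr"), "highlight")
            If found IsNot Nothing Then Return found
            current = ChildVal(style, "basedOn")
        End While
        Return Nothing
    End Function

    ' Simplified lookup order: direct formatting, character style,
    ' paragraph style, document defaults.
    Function EffectiveHighlight(run As XElement, styles As XElement) As String
        Dim rPr As XElement = run.Element(W + "rPr")   ' may be Nothing
        Dim result As String = ChildVal(rPr, "highlight")
        If result Is Nothing Then result = HighlightFromStyle(styles, ChildVal(rPr, "rStyle"))
        If result Is Nothing Then
            Dim para As XElement = run.Ancestors(W + "p").FirstOrDefault()
            Dim pPr As XElement = If(para Is Nothing, Nothing, para.Element(W + "pPr"))
            result = HighlightFromStyle(styles, ChildVal(pPr, "pStyle"))
        End If
        If result Is Nothing Then
            Dim defaults As XElement = styles.Element(W + "docDefaults")
            Dim rPrDefault As XElement = If(defaults Is Nothing, Nothing, defaults.Element(W + "rPrDefault"))
            Dim defaultRPr As XElement = If(rPrDefault Is Nothing, Nothing, rPrDefault.Element(W + "rPr"))
            result = ChildVal(defaultRPr, "highlight")
        End If
        Return result
    End Function

    Sub Main()
        ' Made-up styles: "Derived" inherits its highlight from "Base".
        Dim styles As XElement = XElement.Parse(
            "<w:styles xmlns:w=""" & W.NamespaceName & """>" &
            "<w:style w:type=""character"" w:styleId=""Base""><w:rPr><w:highlight w:val=""yellow""/></w:rPr></w:style>" &
            "<w:style w:type=""character"" w:styleId=""Derived""><w:basedOn w:val=""Base""/></w:style>" &
            "</w:styles>")
        Dim p As XElement = XElement.Parse(
            "<w:p xmlns:w=""" & W.NamespaceName & """>" &
            "<w:r><w:rPr><w:rStyle w:val=""Derived""/></w:rPr><w:t>styled</w:t></w:r>" &
            "<w:r><w:t>plain</w:t></w:r>" &
            "</w:p>")
        For Each r As XElement In p.Elements(W + "r")
            Dim h As String = EffectiveHighlight(r, styles)
            Console.WriteLine(r.Value & ": " & If(h, "(none)"))
        Next
        ' Prints "styled: yellow" then "plain: (none)".
    End Sub
End Module
```

In a real document the w:styles element would come from the style definitions part (MainDocumentPart.StyleDefinitionsPart in the SDK). A resolved value of "none" means a highlight was explicitly switched off, so callers comparing against "yellow" treat it as not highlighted.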