XPath To First Occurrence Of Element With Text Length >= 200 Characters
How do I get the first element whose inner text (plain text, discarding other children) is 200 or more characters long? I'm trying to create an HTML parser like Embed.ly.
Solution 1:
Use:
(//*[not(self::script or self::style)]/text()[string-length() >= 200])[1]
Note: If the document is an XHTML document (meaning all elements are in the XHTML namespace), the expression above should be written as:
(//*[not(self::x:script or self::x:style)]/text()[string-length() >= 200])[1]
where the prefix "x:" must be bound to the XHTML namespace, "http://www.w3.org/1999/xhtml" (or, as many XPath APIs put it, the namespace must be "registered" with this prefix).
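For readers without an XPath engine at hand, the selection made by the expression above can be approximated in plain Python. This is only a sketch and not part of the original answer: it assumes the input is well-formed XML/XHTML, uses the standard library's xml.etree.ElementTree, and the helper names (local_name, text_nodes, first_long_text) are made up for illustration. It uses the question's threshold of 200 or more characters and compares local names, so it works with or without the XHTML namespace.

```python
import xml.etree.ElementTree as ET

SKIPPED = {"script", "style"}


def local_name(tag):
    # "{http://www.w3.org/1999/xhtml}p" -> "p"; tags without a namespace pass through.
    return tag.rsplit("}", 1)[-1]


def text_nodes(element):
    """Yield (parent, text) for each text node under element, in document order."""
    if element.text:
        yield element, element.text
    for child in element:
        yield from text_nodes(child)
        # ElementTree stores text that follows a child in that child's .tail,
        # but its parent in XPath terms is the enclosing element.
        if child.tail:
            yield element, child.tail


def first_long_text(root, min_length=200):
    """Return (parent_element, text) for the first long text node, or None."""
    for parent, text in text_nodes(root):
        if local_name(parent.tag) not in SKIPPED and len(text) >= min_length:
            return parent, text
    return None


if __name__ == "__main__":
    doc = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        "<p>short</p><div>" + "a" * 200 + "</div>"
        "</body></html>"
    )
    match = first_long_text(ET.fromstring(doc))
    if match is not None:
        parent, text = match
        print(local_name(parent.tag), len(text))
```

The function returns the parent element together with the text, since the question asks for the element while the XPath expression itself selects the text node.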
Solution 2:
I meant something like this:
root.SelectNodes("html/body//*[name() != 'script' and name() != 'style']/text()[string-length() >= 200]")
Seems to work pretty well.
Solution 3:
HTML is not XML. You should not use XML parsers to parse HTML, period. They are two entirely different things, and your parser will choke the first time it sees HTML that is not well-formed XML.
You should find an open-source HTML parser instead of rolling your own.
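To illustrate the point, here is a rough sketch of the same search built on a tolerant HTML tokenizer, Python's standard-library html.parser, which does not require well-formed XML. The class and function names (LongTextFinder, first_long_text_html) are hypothetical, and the open-tag stack is a simplification rather than the HTML5 tree-construction algorithm, so a dedicated open-source HTML parser is probably still the safer choice for real pages.

```python
from html.parser import HTMLParser


class LongTextFinder(HTMLParser):
    # Elements that never have content, so they are not pushed on the stack.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}
    SKIPPED = {"script", "style"}

    def __init__(self, min_length=200):
        super().__init__(convert_charrefs=True)
        self.min_length = min_length
        self.open_tags = []   # names of currently open elements
        self.buffer = []      # chunks of the text run being collected
        self.result = None    # (tag_name, text) of the first match

    def _flush(self):
        # A text run ends at any tag or comment; test it against the threshold.
        text = "".join(self.buffer)
        self.buffer.clear()
        parent = self.open_tags[-1] if self.open_tags else None
        if (self.result is None and parent is not None
                and parent not in self.SKIPPED
                and len(text) >= self.min_length):
            self.result = (parent, text)

    def handle_starttag(self, tag, attrs):
        self._flush()
        if tag not in self.VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        self._flush()
        if tag in self.open_tags:
            # Pop up to and including the match; tolerates unclosed children.
            while self.open_tags.pop() != tag:
                pass

    def handle_comment(self, data):
        self._flush()

    def handle_data(self, data):
        self.buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def first_long_text_html(html, min_length=200):
    """Return (tag_name, text) for the first long text run, or None."""
    finder = LongTextFinder(min_length)
    finder.feed(html)
    finder.close()
    return finder.result
```

Calling first_long_text_html(page_source) on markup with unclosed tags or unquoted attributes should still return a result where an XML parser would raise an error.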