PHP DOMDocument / XPath: Get HTML Text and Surrounding Tags
I am looking for the following functionality. Given this HTML page:
Hello, world!
I want to get an array that onl...
Solution 1:
You can iterate over the parentNodes of the DOMText nodes:
$dom = new DOMDocument;
$dom->loadHTML($html); // $html holds the page markup
$xpath = new DOMXPath($dom);
$textNodes = array();
foreach ($xpath->query('/html/body//text()') as $i => $textNode) {
    $textNodes[$i] = array(
        'text' => $textNode->nodeValue,
        'parents' => array()
    );
    // Walk up the tree; stop at the document node, which has no parent.
    for (
        $currentNode = $textNode->parentNode;
        $currentNode->parentNode;
        $currentNode = $currentNode->parentNode
    ) {
        $textNodes[$i]['parents'][] = $currentNode->nodeName;
    }
}
print_r($textNodes);
Note that loadHTML will add implied elements, e.g. it will wrap a fragment in html and body elements, which you will have to take into account when using XPath. Also note that any whitespace used for formatting is considered a DOMText node, so you will likely get more nodes than you expect. If you only want to query for non-empty DOMText nodes, use
/html/body//text()[normalize-space(.) != ""]
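As a minimal sketch of how that filter fits into the loop above: the markup in $html is a hypothetical stand-in (the question's original HTML is not reproduced here), and the variable names are only illustrative.

```php
<?php
// Hypothetical sample markup, indented to produce whitespace-only text nodes.
$html = "<div>\n  <p>Hello, <b>world</b>!</p>\n</div>";

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Same idea as Solution 1, but the predicate skips whitespace-only text nodes.
$textNodes = array();
foreach ($xpath->query('/html/body//text()[normalize-space(.) != ""]') as $textNode) {
    $parents = array();
    // Collect ancestor element names, from the nearest parent up to html.
    for ($node = $textNode->parentNode; $node->parentNode; $node = $node->parentNode) {
        $parents[] = $node->nodeName;
    }
    $textNodes[] = array('text' => $textNode->nodeValue, 'parents' => $parents);
}
print_r($textNodes);
```

With this sample, only the three text nodes "Hello, ", "world" and "!" remain, and the parents list for "world" runs from b up through the implied body and html elements.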
Solution 2:
In your sample code, $res = $xpath->query("//body//*/text()") is a DOMNodeList of DOMText nodes. For each DOMText, you can access the containing element via the parentNode property.
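A short sketch of that approach, assuming a hypothetical $html sample in place of the question's markup; the $pairs array is just an illustrative way to collect the results.

```php
<?php
// Hypothetical sample markup for illustration only.
$html = '<div><p>Hello, <b>world</b>!</p></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Query from the question: text nodes of every element below body.
$res = $xpath->query('//body//*/text()');

$pairs = array();
foreach ($res as $textNode) {
    // parentNode of a DOMText is the element that directly contains it.
    $pairs[] = array($textNode->parentNode->nodeName, $textNode->nodeValue);
}
print_r($pairs);
```

This only gives the immediate surrounding tag for each text node; to get the full chain of surrounding tags, keep following parentNode as in Solution 1.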