PHP - XPath: Parsing a page
Solution:
My knowledge of PHP is somewhat limited, but you can try the following:
<?php
$html = <<<'HTML'
<div class="area">Area One</div>
<div class="key">AAA</div>
<div class="value">BBB</div>
<div class="key">CCC</div>
<div class="value">DDD</div>
<div class="key">EEE</div>
<div class="value">FFF</div>
<div class="area">Area Two</div>
<div class="key">GGG</div>
<div class="value">HHH</div>
<div class="key">III</div>
<div class="value">JJJ</div>
<div class="key">KKK</div>
<div class="value">LLL</div>
HTML;
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Count the area headers by matching on their text.
$nbarea = count($xpath->query('//*[contains(text(),"Area")]'));
$area = [];

for ($i = 1; $i <= $nbarea; $i++) {
    // $j is the position of the next div following the i-th area header.
    $j = 1;
    // Each area is assumed to hold exactly 3 key/value pairs.
    for ($b = 1; $b <= 3; $b++) {
        $element1 = $xpath->query('//*[contains(text(),"Area")]['.$i.']/following::div['.$j.']');
        $j++;
        $element2 = $xpath->query('//*[contains(text(),"Area")]['.$i.']/following::div['.$j.']');
        $j++;
        $h1 = $element1->item(0)->nodeValue; // key
        $h2 = $element2->item(0)->nodeValue; // value
        $area[$i - 1][$h1] = $h2;
    }
}
print_r($area);
?>
Output:
Array
(
    [0] => Array
        (
            [AAA] => BBB
            [CCC] => DDD
            [EEE] => FFF
        )

    [1] => Array
        (
            [GGG] => HHH
            [III] => JJJ
            [KKK] => LLL
        )

)
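Side note: I've assumed that every area always contains the same number of key/value pairs (3). If an area has more or fewer pairs, the fixed inner loop will read into the next area or stop too early.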
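If the number of pairs can differ between areas, one possible variation is to walk the divs once in document order and group them by their class attribute instead of counting positions. This is only a sketch: the function name parseAreas is made up for illustration, and it assumes the class names area, key and value are exactly those used in the sample markup above.

```php
<?php
// Hypothetical helper (not part of the original answer): groups each
// key/value pair under the most recent "area" div, however many pairs follow.
function parseAreas(string $html): array
{
    $document = new DOMDocument();
    $document->loadHTML($html);
    $xpath = new DOMXPath($document);

    $areas = [];
    $current = -1;      // index of the area currently being filled
    $pendingKey = null; // last key seen that still waits for its value

    // A single location path returns the matching divs in document order.
    $divs = $xpath->query('//div[@class="area" or @class="key" or @class="value"]');
    foreach ($divs as $div) {
        $class = $div->getAttribute('class');
        $text = trim($div->nodeValue);
        if ($class === 'area') {
            // Start a new, empty group for this area.
            $areas[++$current] = [];
            $pendingKey = null;
        } elseif ($class === 'key') {
            $pendingKey = $text;
        } elseif ($pendingKey !== null && $current >= 0) {
            // A value div: pair it with the key seen just before it.
            $areas[$current][$pendingKey] = $text;
            $pendingKey = null;
        }
    }
    return $areas;
}

// Usage: the two areas deliberately hold a different number of pairs.
$sample = '<div class="area">Area One</div>'
    . '<div class="key">AAA</div><div class="value">BBB</div>'
    . '<div class="area">Area Two</div>'
    . '<div class="key">GGG</div><div class="value">HHH</div>'
    . '<div class="key">III</div><div class="value">JJJ</div>';
print_r(parseAreas($sample));
```

With this approach the hard-coded 3 disappears, at the cost of relying on the class attributes rather than on the "Area" text.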