php - XPath : Parsing a page


Solution:

My PHP knowledge is somewhat limited, but you can try:

<?php
$html = <<<'HTML'
<div class="area">Area One</div>
<div class="key">AAA</div>
<div class="value">BBB</div>
<div class="key">CCC</div>
<div class="value">DDD</div>
<div class="key">EEE</div>
<div class="value">FFF</div>
<div class="area">Area Two</div>
<div class="key">GGG</div>
<div class="value">HHH</div>
<div class="key">III</div>
<div class="value">JJJ</div>
<div class="key">KKK</div>
<div class="value">LLL</div>
HTML;
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

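// Number of area headings (elements whose text contains "Area").
$nbarea = $xpath->query('//*[contains(text(),"Area")]')->length;

$area = []; // result: one associative array per area
$i = 1;     // 1-based index of the current area
$j = 1;     // 1-based offset of the next div following that area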

for ($a = 1; $a <= $nbarea; $a++) {

    // Each area is assumed to hold exactly 3 key/value pairs.
    for ($b = 1; $b <= 3; $b++) {
        // The j-th div after the area is the key, the next one is its value.
        $element1 = $xpath->query('//*[contains(text(),"Area")]['.$i.']/following::div['.$j.']');
        $j++;
        $element2 = $xpath->query('//*[contains(text(),"Area")]['.$i.']/following::div['.$j.']');

        $h1 = $element1->item(0)->nodeValue;
        $h2 = $element2->item(0)->nodeValue;

        $area[$i-1][$h1] = $h2;
        $j++;
    }

    // Move on to the next area and restart the offset.
    $i++;
    $j = 1;
}

print_r($area);

?>

Output:

Array
(
    [0] => Array
        (
            [AAA] => BBB
            [CCC] => DDD
            [EEE] => FFF
        )

    [1] => Array
        (
            [GGG] => HHH
            [III] => JJJ
            [KKK] => LLL
        )

)

Side note: I've assumed each area always has the same number of key/value pairs (3).
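
If the number of pairs can differ from one area to the next, a possible variation is to walk the divs once in document order and group them by their class attribute instead of counting positions. This is only a sketch: it assumes the class names are exactly "area", "key" and "value" as in the sample, and the shortened markup below is made up for illustration.

```php
<?php
// Hypothetical sample markup: the two areas deliberately hold
// a different number of key/value pairs.
$html = <<<'HTML'
<div class="area">Area One</div>
<div class="key">AAA</div>
<div class="value">BBB</div>
<div class="area">Area Two</div>
<div class="key">GGG</div>
<div class="value">HHH</div>
<div class="key">III</div>
<div class="value">JJJ</div>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$area  = [];   // result: one associative array per area
$index = -1;   // index of the area currently being filled
$key   = null; // last key seen that is still waiting for its value

// Visit every area/key/value div once, in document order.
$divs = $xpath->query('//div[@class="area" or @class="key" or @class="value"]');
foreach ($divs as $div) {
    $class = $div->getAttribute('class');
    $text  = trim($div->nodeValue);

    if ($class === 'area') {
        // A new area starts: open a fresh bucket.
        $area[++$index] = [];
        $key = null;
    } elseif ($class === 'key') {
        $key = $text;
    } elseif ($key !== null && $index >= 0) {
        // A value div: pair it with the pending key.
        $area[$index][$key] = $text;
        $key = null;
    }
}

print_r($area);
```

With this markup the first area ends up with one pair and the second with two, so the fixed count of 3 is no longer needed. A value that appears before any key or area is simply skipped.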
