php - How to extract an array from a script
Below is some sample HTML that contains a script, and inside the script there is an array that I want to extract, preferably as a PHP array, although getting it as a string would also be fine. I don't understand how to target that part of the script. The part I want to extract is:
(1234567, 111, 'Red Pants', 'https://example.com/red', '', 1, '','','0','0','0', 'buy')
It seems that the node value is the entire script, and I am unsure how to isolate just this part.
PHP
<pre><?php
$content = file_get_contents("test.html");
$dom = new DOMDocument();
@$dom->loadHTML($content);
$dom->preserveWhiteSpace = false;
foreach($dom->getElementsByTagName('script') as $scripts) {
var_dump($scripts);
}
HTML
<!doctype html>
<html>
<head>
<title>Example Domain</title>
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
</head>
<script>
function init() {
YAHOO.util.Event.on(['adding_to_cart','adding_to_trolley_sub'], 'click', addProdToTrolley);
YAHOO.util.Event.on('add-prod-to-watchlist', 'click', addProdToWatchList);
YAHOO.util.Event.on(['buy_now','buy_now_sub'], 'click', function() { SHOP.Purchase.buy(1234567, 111, 'Red Pants', 'https://example.com/red', '', 1, '','','0','0','0', 'buy');});
YAHOO.util.Event.on(['frm_buttons','frm_buttons_sub'], 'submit', function(e) { YAHOO.util.Event.stopEvent(e); });
};
var id = 1234567;
function addProdToTrolley(e) {
YAHOO.util.Event.stopEvent(e);
document.trolley_quantity.trolley_action.value = 'Add to cart';
document.trolley_quantity.tid.value = '';
document.trolley_quantity.prod_id.value = id;
disableButtons();
document.trolley_quantity.submit();
};
</script>
Answer
Solution:
You can use a regex to extract the script contents from the HTML without the XML DOM:
preg_match_all('/<script\b[^>]*>(.*?)<\/script>/is', $html, $matches);
Then use substr to take the text that starts with SHOP.Purchase.buy( and ends at ;}); or use a regex directly to extract the JS data:
preg_match_all('/SHOP\.Purchase\.buy(\(.*?\));\}\);/is', $html, $matches);
Now $matches[1][0] holds the data extracted from the script:
(1234567, 111, 'Red Pants', 'https://example.com/red', '', 1, '','','0','0','0', 'buy')
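Since the question asks for a PHP array rather than a string, the captured text can be split into its individual arguments. The following is only a minimal sketch: parseJsArgs is a hypothetical helper name, $captured stands in for $matches[1][0], and it assumes the arguments are either plain integers or single-quoted strings with no escaped quotes inside them.

```php
<?php
// Hypothetical helper: turns a JS argument list such as "(1, 'a', '')"
// into a PHP array. Assumes only integers and single-quoted strings
// without escaped quotes; anything else is ignored.
function parseJsArgs(string $captured): array
{
    $args = [];
    // Alternative 1 captures the text between single quotes (group 1),
    // alternative 2 captures a bare integer (group 2).
    preg_match_all("/'([^']*)'|(-?\d+)/", $captured, $sets, PREG_SET_ORDER);
    foreach ($sets as $set) {
        if (isset($set[2])) {
            $args[] = (int) $set[2]; // bare number
        } else {
            $args[] = $set[1];       // quoted string, possibly empty
        }
    }
    return $args;
}

// Stand-in for $matches[1][0] from the regex above.
$captured = "(1234567, 111, 'Red Pants', 'https://example.com/red', '', 1, '','','0','0','0', 'buy')";
$args = parseJsArgs($captured);
print_r($args);
```

Quoted values such as '0' stay strings, while the unquoted 1234567, 111 and 1 become integers. If the real data can contain escaped quotes or other JS literals, this pattern would need to be extended.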
If you only want the first match, you can use preg_match instead of preg_match_all; the captured data is then in $matches[1].
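If you would rather keep the DOMDocument loop from the question, the same idea can be applied to each script element's text. This is a rough sketch rather than a definitive implementation: the inline $content is a shortened stand-in for test.html, $found is an illustrative variable name, and the pattern here captures only the text between the parentheses.

```php
<?php
// Shortened stand-in for the contents of test.html.
$content = <<<'HTML'
<!doctype html>
<html><head><title>Example Domain</title></head>
<script>
function init() {
YAHOO.util.Event.on(['buy_now','buy_now_sub'], 'click', function() { SHOP.Purchase.buy(1234567, 111, 'Red Pants', 'https://example.com/red', '', 1, '','','0','0','0', 'buy');});
};
</script>
</html>
HTML;

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);

$found = null;
foreach ($dom->getElementsByTagName('script') as $script) {
    // textContent is the JavaScript source of this script element.
    if (preg_match('/SHOP\.Purchase\.buy\((.*?)\);/s', $script->textContent, $m)) {
        $found = $m[1]; // argument list without the surrounding parentheses
        break;          // stop at the first script that contains the call
    }
}

echo $found, "\n";
```

The DOM step narrows the search to script elements, and the regex does the actual isolation, because the parser treats the whole script as a single text node.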