Scraping websites with PHP


I'm trying to scrape information directly from the Maersk website. For example, I'm trying to scrape the information from this URL: https://www.maersk.com/tracking/221242675. I have a lot of tracking numbers to update in a database every day, so I decided to automate it a little.

But with the following code, the site says it needs JS to work. I have already tried with curl, etc., but nothing works. Does anyone know another way?

I tried the following code:


<?php
// Get the HTML returned from the following URL
$html = file_get_contents('https://www.maersk.com/tracking/#tracking/221242675');
echo $html;

$ETAupdate = new DOMDocument();
libxml_use_internal_errors(TRUE); // disable libxml errors

if (!empty($html)) { // if any HTML is actually returned
    $ETAupdate->loadHTML($html);
    libxml_clear_errors(); // remove errors for yucky HTML
    $ETA_xpath = new DOMXPath($ETAupdate);

    // get all the <strong> elements
    $ETA_row = $ETA_xpath->query('//strong');

    if ($ETA_row->length > 0) {
        foreach ($ETA_row as $row) {
            echo $row->nodeValue . "<br/>";
        }
    }
}
?>

Answer

Solution:

You need to scrape the data directly from their API requests rather than trying to scrape the page URL (unless you're using something like Puppeteer, but I really don't recommend that for this simple task).

I took a look at the site and the API endpoint is:

https://api.maersk.com/track/221242675?operator=MAEU

This returns a JSON-formatted response which you can parse to extract the details. That is also much easier than parsing the HTML. Example response below.

{
    "tpdoc_num": "221242675",
    "isContainerSearch": false,
    "origin": {
        "terminal": "YanTian Intl. Container Terminal",
        "geo_site": "1PVA2R05ZGGHQ",
        "city": "Yantian",
        "state": "Guangdong",
        "country": "China",
        "country_code": "CN",
        "geoid_city": "0L3DBFFJ3KZ9A",
        "site_type": "TERMINAL"
    },
    "destination": {
        "terminal": "DCT Gdansk sa",
        "geo_site": "02RB4MMG6P32M",
        "city": "Gdansk",
        "state": "",
        "country": "Poland",
        "country_code": "PL",
        "geoid_city": "3RIGHAIZMGKN3",
        "site_type": "TERMINAL"
    },
    "containers": [ ... ]
}
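As a minimal sketch of how that response could be consumed from PHP, the snippet below builds the URL from the pattern shown above, fetches it, and pulls out a few of the fields in the sample. The function names (buildTrackingUrl, parseTrackingResponse, fetchTracking) are hypothetical helpers, and the code assumes the endpoint still answers a plain GET with no extra headers or API key, which may not hold since it is not a documented public API.

```php
<?php
// Hypothetical helper: builds the API URL using the pattern from the answer.
function buildTrackingUrl(string $trackingNumber, string $operator = 'MAEU'): string
{
    return 'https://api.maersk.com/track/' . rawurlencode($trackingNumber)
        . '?operator=' . rawurlencode($operator);
}

// Hypothetical helper: decodes the JSON body and keeps a few fields
// that appear in the sample response (missing keys become null).
function parseTrackingResponse(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new RuntimeException('Response is not a JSON object');
    }

    return [
        'tracking_number'     => $data['tpdoc_num'] ?? null,
        'origin_city'         => $data['origin']['city'] ?? null,
        'origin_country'      => $data['origin']['country'] ?? null,
        'destination_city'    => $data['destination']['city'] ?? null,
        'destination_country' => $data['destination']['country'] ?? null,
    ];
}

// Hypothetical helper: fetches and parses one tracking number.
// Assumption: a plain GET with a 10-second timeout is accepted by the server.
function fetchTracking(string $trackingNumber): array
{
    $context = stream_context_create(['http' => ['timeout' => 10]]);
    $json = file_get_contents(buildTrackingUrl($trackingNumber), false, $context);
    if ($json === false) {
        throw new RuntimeException("Request failed for $trackingNumber");
    }

    return parseTrackingResponse($json);
}

// Usage (performs a network request, so it is left commented out):
// foreach (['221242675'] as $number) {
//     print_r(fetchTracking($number));
// }
```

Keeping the parsing separate from the HTTP call means the field extraction can be checked against a saved response, and the daily database update only has to loop over the tracking numbers and call fetchTracking for each one.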
