PHP refresh page after outputting some HTML (breaking along operation into chunks)

Solution:

You might not consider this a pure PHP solution because there is an onload attribute on the <body> tag. Then again, purity is not all it's cracked up to be. I offer it as a technique you could use to give the user a running pseudo "progress bar": the script is only interrupted when you want it to be, and you can pass back to it any restart parameters you need. In this demo the only "restart parameter" being passed back is a successive integer that drives the progress bar, but you get the idea:

<?php
if (!isset($_REQUEST['progress'])) {
    // initially no parameters specified, so use starting value of 0:
    $progress = 0;
}
else {
    $progress = (int)$_REQUEST['progress'];
}
// increment progress:
$progress++;
// draw progress bar:
$progress_bar = str_repeat('+', $progress);

// simulate doing some work:
sleep(5);

// simulate being done or not:
$done = $progress == 5;
?>
<html>
    <head>
        <title>Test</title>
    </head>
<?php if (!$done) { ?>
    <body onload="document.f.submit();">
<?php } else { ?>
    <body>
<?php } ?>
    Progress: <?= $progress_bar ?><br>
<?php if (!$done) { ?>
        <form name="f" method="post">
            <input type="hidden" name="progress" value="<?= $progress ?>">
            <!-- add any other hidden variables you need to resume where you left off -->
        </form>
<?php } else { ?>
        Done!
<?php } ?>
    </body>
</html>

Answer

Solution:

For this, I would use an import.lock file. Here is my suggested workflow:

  1. When the user starts processing some data, create an import.lock file. In it, record how many records have been processed already AND, most importantly, the index of the last processed batch. The batching strategy is up to you; you could implement it with temporary .json files, for example joining the 3 big files and splitting them into small chunks.
  2. I assume you have already implemented reading and chunking the files, so, for example, you can loop over the chunks and insert some data into MySQL on every iteration.
  3. As mentioned above, on every iteration you update import.lock with the number of the chunk just processed. When the loop ends, you delete the import.lock file (or clear it).
  4. On every iteration you refresh the page with a simple PHP refresh; after the refresh you read import.lock, take the index of the last imported batch, and resume from the next one.

To implement progress, count all the generated chunks (chunk1.json, chunk2.json, ...). Say you have 100 chunks and have already processed 15: import.lock holds the value 15, which means 15% done. After the refresh you take chunk 16 (chunk16.json).

After every chunk is done, delete it from the file system.
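The steps above can be sketched as a single self-refreshing script. This is only an illustration of the lock-file idea: the file layout (chunk_*.json), the 100-chunk total, and the per-chunk processing are hypothetical placeholders for your own implementation.

```php
<?php
// Minimal sketch of the import.lock workflow described above.
// Hypothetical layout: chunks were pre-generated as chunk_1.json ... chunk_100.json.

$lockFile    = __DIR__ . '/import.lock';
$totalChunks = 100; // in practice: count the generated chunk_*.json files

// Read the index of the last processed chunk (0 when starting fresh).
$lastChunk    = is_file($lockFile) ? (int) file_get_contents($lockFile) : 0;
$currentChunk = $lastChunk + 1;

// ... read chunk_{$currentChunk}.json here and insert its records into MySQL ...

if ($currentChunk < $totalChunks) {
    // Record progress, then refresh so the next request resumes at the next chunk.
    file_put_contents($lockFile, (string) $currentChunk, LOCK_EX);
    $percentDone = (int) round($currentChunk / $totalChunks * 100);
    // header('refresh: 2'); // uncomment when running under a web server
} else {
    // All chunks imported: remove the lock file so a new import can start.
    unlink($lockFile);
}
```

Because the progress lives in import.lock rather than the session, the import also survives the session expiring or the user reopening the page in a different browser.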

Implementation example

Here is my implementation example, tested and working. Link to the repository: https://github.com/RomkaLTU/php-progress-case You still need to implement the JSON file generation. In this example I used the session to track progress, as user Mawg says reinstate Monica suggested.

You may also need to adjust header("refresh: 2") and the chunk size.

<?php
session_start();

$dataDir = __DIR__ . DIRECTORY_SEPARATOR . 'data';
$chunkFiles = array_values(array_diff(scandir($dataDir), ['..', '.']));
$chunksCount = count($chunkFiles);
$currentChunk = $_SESSION['currentChunk'] ?? 1;

// need to identify the current index of the file being processed;
// follow a naming convention here or do something smarter
$currentChunkIndex = array_search('chunk_' . $currentChunk . '.json', $chunkFiles);

if (isset($chunkFiles[$currentChunkIndex])) {
    $chunkFilePath = $dataDir . DIRECTORY_SEPARATOR . $chunkFiles[$currentChunkIndex];
    $dataJson = file_get_contents($chunkFilePath);
    $data = json_decode($dataJson, true);

    foreach ($data as $item) {
        // @TODO do something with a $item
    }

    unlink($chunkFilePath);

    $_SESSION['currentChunk'] = $currentChunk + 1;

    // $chunksCount was taken before the unlink, so more than 1 means chunks remain:
    if ($chunksCount > 1) {
        header("refresh: 2");
    } else {
        $_SESSION['currentChunk'] = 1;
    }
}
?>

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <?php echo $chunksCount * 3 . ' records left...'; // assumes 3 records per chunk ?>
</body>
</html>

Answer

Solution:

Ideally, a script that "could potentially take quite some time" should not be executed on the fly via HTTP. The best practice is to queue the job: the HTTP request only adds the job to the queue and does not run it directly, so it is a fast and predictable operation that will not time out. The job is then run independently of the HTTP server, executed by a queue manager with a larger (or no) timeout on resources. You can periodically poll via HTTP from the client side (with an HTTP refresh or JavaScript) to display some progress status to the user.

A minimalist but perfectly viable queue manager could be a cron job launching a simple PHP CLI script that reads and writes some persistent, locked data about the jobs (a file, a database, ...). For more complex use cases (prioritizing or distributing load), it may be useful to use dedicated queue manager software.

If cron is not an option, you can also run a daemonized PHP CLI script as a 100% pure PHP queue manager.

The main concept here is to decouple the running of unpredictable or long jobs from the serving of HTML pages over stateless HTTP in a predictable and fast way.
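As a rough illustration of the cron-based variant (the file name jobs.txt, the job string, and the one-job-per-run policy are all assumptions, not a fixed API): the HTTP handler only appends a line to a queue file, and a CLI worker launched by cron claims one job under an exclusive lock.

```php
<?php
// Hypothetical sketch: the HTTP handler enqueues, a cron-launched CLI worker dequeues.
$queueFile = __DIR__ . '/jobs.txt';

// Enqueue side -- all the HTTP request does (fast, cannot time out):
file_put_contents($queueFile, "import:batch_42\n", FILE_APPEND | LOCK_EX);

// Worker side -- cron runs this script, e.g.: * * * * * php worker.php
$fh = fopen($queueFile, 'c+');
if (flock($fh, LOCK_EX)) { // serialize concurrent workers via an advisory lock
    $jobs = file($queueFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $job  = $jobs ? array_shift($jobs) : null; // claim the oldest job
    file_put_contents($queueFile, $jobs ? implode("\n", $jobs) . "\n" : '');
    flock($fh, LOCK_UN);
}
fclose($fh);

if (!empty($job)) {
    // ... run the long import here, under CLI limits instead of HTTP ones ...
    echo "processing $job\n";
}
```

The job's own progress can then be written to a status file or database row that the page polls and displays to the user.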
