php - SQL Update taking a long time
I am trying to update many rows (100 000+) in my database but it's taking a while (over 10 mins and still not finished). I'm wondering if this is intended behavior or is there something wrong in my code. To prevent the database from hanging while performing the update I've been told to update one row at a time, not sure if this is how it should be implemented.
I am updating images in my song table to be null if those songs were played in my playlist table
private function updateBlogSongs ($blog_id) {
$db = Yii::app()->db;
$affectedRows = 0;
$sql = "SELECT *
FROM `firstdatabase`.song s
INNER JOIN `seconddatabase`.playlist p ON s.name LIKE p.song_name";
$dataReader = $db->createCommand($sql)->query(); // Rows from the song table that were played in the given blog
$row = $dataReader->read();
while ($row != false) {
$sql = "UPDATE `firstdatabase`.song s
SET s.image = NULL
WHERE s.song_id = " . $row['song_id'];
$affectedRows += $db->createCommand($sql)->execute();
$row = $dataReader->read();
}
return $affectedRows;
}
Edit: after reading The Dog's comment I made some changes: With 500 000 rows in the song table it takes about 10 minutes if I increase my batchSize to 10000 (was taking 8 hours with the code above). At 250 at the batch size it's taking about 50 minutes. I chose 250 because the query takes about 1 second to run, and it's taking 10+ seconds to run at 10000 batch size (constraint is 1 second). I would like to make it faster but not sure what else to change
$batchSize = 250;
$lastSongID = 0;
$rowIndex = 0;
$affectedRows = 0;
$sql = "SELECT max(song_id) FROM `firstdatabase`.song";
$lastSongID = intval($db->createCommand($sql)->query()->read()['max(song_id)']);
echo($lastSongID . ' songs in table.' . PHP_EOL);
echo('Updating songs...' . PHP_EOL);
while($rowIndex <= $lastSongID) {
$startTime = microtime(true);
$sql = "UPDATE `firstdatabase`.song
SET image = NULL
WHERE song_id in (
SELECT song_id
FROM (
SELECT song_id, name
FROM `firstdatabase`.song
WHERE song_id > " . $rowIndex . "
LIMIT " . $batchSize . "
) s
INNER JOIN (
SELECT DISTINCT song_name
FROM `seconddatabase`.playlist
) p ON s.name LIKE p.song_name
ORDER BY s.song_id ASC
)";
$affectedRows += $db->createCommand($sql)->execute();
$rowIndex += $batchSize;
$endTime = microtime(true);
$elapsedTime = round($endTime - $startTime, 2);
}
Answer
Solution:
This is really more a question for the SQL world instead of the PHP world but here's my recommendations:
Don't do this one row at a time in a while loop. Make a more complex update statement that can do it all in one database hit. Database commands are the slowest part of your php code, you want to limit the number of calls you do to the database.
When you are confident that you can get the operation done in one sql command, or even if you don't think it is possible then pull your code into a stored procedure in the database. Having complex sql queries as stored procedures can help a lot with maintaining your code.
Make sure you have indexes on your tables. You need to make sure your queries hit those indexes for best performance.
Here's an option for the single query:
update `firstdatabase`.song
set image = null
where song_id in (
select s.song_id
from `firstdatabase`.song s
INNER JOIN `seconddatabase`.playlist p
ON s.name LIKE p.song_name"
);
Obviously we don't have access to your database so you'll need to make changes where necessary but hopefully it can get you on the right track.
EDIT: Try replacing your second code set with the following:
$lastSongID = 0;
$rowIndex = 0;
$affectedRows = 0;
$sql = "SELECT max(song_id) FROM `firstdatabase`.song";
$lastSongID = intval($db->createCommand($sql)->query()->read()['max(song_id)']);
echo($lastSongID . ' songs in table.' . PHP_EOL);
echo('Updating songs...' . PHP_EOL);
$startTime = microtime(true);
$sql = "
update `firstdatabase`.song
set image = null
where song_id in (
select s.song_id
from `firstdatabase`.song s
INNER JOIN `seconddatabase`.playlist p
ON s.name LIKE p.song_name"
)";
$affectedRows += $db->createCommand($sql)->execute();
$endTime = microtime(true);
$elapsedTime = round($endTime - $startTime, 2);
If it works, then let me know the time it takes to run, if it doesn't work, is it an issue with the SQL (again I can't see the tables so I'm guessing).
Source