Trouble using LOAD DATA LOCAL INFILE to store CSV data into MySQL

UPDATE:

As per the answer suggesting the use of Table::insert, I updated the query.

// At this point the data is stored in an array
// I turn it into a collection to make use of the chunk function
$records = collect($records); 
// I split the array into chunks
// Inserting everything at once gave memory issues
$chunks = $records->chunk(500); 
// Then I iterate the chunks and insert them into the database
foreach ($chunks as $chunk) {
    \DB::table('table')->insert($chunk->toArray()); 
}

The issue is that, while this is faster than the previous method, it still takes several minutes to insert everything into my database, so it isn't an actual solution.

I'm not sure whether this is the wrong approach, or whether I'm missing something / doing something wrong here.
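
One variation that may be worth trying (a sketch, assuming the same $chunks collection as above): wrap all the chunked inserts in a single transaction, so InnoDB commits once at the end instead of once per statement.

use Illuminate\Support\Facades\DB;

// A single COMMIT for the whole import is usually much cheaper
// than one implicit COMMIT per insert statement
DB::transaction(function () use ($chunks) {
    foreach ($chunks as $chunk) {
        DB::table('table')->insert($chunk->toArray());
    }
});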


Description:

I have an Angular form where the user can upload a .txt file containing CSV data. This is how it works:

HTML:

  <input type="file" class="form-control w-auto" (change)="getFile($event)" accept=".txt" />

Typescript:

  getFile(e) {
    this.file = e.target.files[0];
    this.file_description = e.target.files[0].name;
  }

  submit() {
    let reader = new FileReader();

    reader.onload = () => {
      let result = reader.result;
      this.ImportService.import(result).subscribe(
        (response) => {
          // Do success stuff
        },
        (_error) => {
          // Do error stuff
        }
      );
    };

    if (this.file) {
      reader.readAsText(this.file, "UTF-16LE");
    }
  }

This sends the data to my Laravel instance, where I want to store it into MySQL. I've been able to do this by processing the received data into an array and inserting each row one by one, like this:

PHP:

// Split the request body into lines, then each line into fields
$array = array();
$csv = str_getcsv($request->file, "\n");
foreach ($csv as $row) {
    $array[] = str_getcsv($row, ";");
}
// Drop the header row
array_splice($array, 0, 1);

foreach ($array as $row) {
    // firstOrNew() issues a SELECT for every record...
    $query = Table::firstOrNew(['col2' => $row[1], 'col3' => $row[2]]);
    $query->col1 = $row[0];
    $query->col2 = $row[1];
    $query->col3 = $row[2];
    $query->col4 = $row[3];
    $query->col5 = $row[4];
    // [...]
    $query->col72 = $row[71];
    $query->col73 = $row[72];
    // ...and save() issues an INSERT or UPDATE for every record
    $query->save();
}

The Problem:

The data being sent here contains around 100,000 records. This method is way too slow, often leading to timeouts or 5-10+ minutes of waiting.


Attempted Solution:

I've been trying to use LOAD DATA LOCAL INFILE, but I can't get it to work.

Here's the code:

$results = DB::connection()->getPdo()->exec(
    "LOAD DATA LOCAL INFILE '" . $request->file . "' IGNORE INTO TABLE `table`
    FIELDS TERMINATED BY ';'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES (
        `col1`,
        `col2`,
        `col3`
        -- etc
    )"
);

In short, there are three issues with this method:

  • If the CSV data contains any apostrophes, I get a syntax error.
  • The local_infile option on my MySQL instance is disabled by default and reverts to disabled on restart.
  • With the above fixed, I get the following error: General error: 7890 Can't find file 'col1;col2;col3;etc'.

I've also tried LOAD DATA INFILE (without LOCAL), but then I get several 'access denied' errors when the query tries to read the file.
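
For what it's worth, the "Can't find file" error in the last bullet suggests that $request->file contains the CSV data itself rather than a path, so MySQL is treating the header row as a filename. A minimal sketch of a workaround, assuming local_infile is enabled on the server (local_infile=1 under [mysqld] in my.cnf, so it survives restarts) and on the client via the 'options' array in config/database.php: write the content to a temporary file and interpolate only the quoted path into the SQL.

// Assumption: config/database.php, under the mysql connection, contains
//     'options' => [PDO::MYSQL_ATTR_LOCAL_INFILE => true],

$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, $request->file);

$pdo = DB::connection()->getPdo();
// Quoting only the path also sidesteps the apostrophe problem, since the
// CSV data itself is never interpolated into the SQL string
$results = $pdo->exec(
    "LOAD DATA LOCAL INFILE " . $pdo->quote($path) . " IGNORE INTO TABLE `table`
    FIELDS TERMINATED BY ';'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES
    (`col1`, `col2`, `col3` /* ... list all 73 columns ... */)"
);

unlink($path);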


If you need any further information, please let me know.

Answer

Solution:

Tasks like this are probably better suited to a queue/job, so that the user isn't waiting for the import to complete in real time.
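
For instance (a rough sketch; the job class name and storage path here are hypothetical): persist the upload to disk in the controller, dispatch a queued job with the path, and return immediately.

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

// Hypothetical job: it receives a path to the stored upload rather than
// the raw contents, keeping the queue payload small
class ImportCsv implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public string $path) {}

    public function handle(): void
    {
        // Parse the file at $this->path and bulk-insert here
    }
}

// In the controller:
$path = storage_path('app/imports/' . uniqid() . '.csv');
file_put_contents($path, $request->file);
ImportCsv::dispatch($path);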

Either way, I would recommend refactoring the code. firstOrNew() runs a query for every single record you process (a SELECT, followed by an INSERT or UPDATE on save()), so with ~100,000 records that is hundreds of thousands of round-trips and a major strain on the database.

insert() can write many records in a single query, and update() can apply a change to many rows at once, so use PHP to preprocess the records and lean on those bulk operations.

Use one database query to get all the existing records and then build two arrays to call insert() and update().

I think you can build a where query like this from the pairs of values you want to search on, OR-ing each pair against the rest:

Table::where(function ($query) use ($pairs) {
    foreach ($pairs as $pair) {
        // Group each pair so its two conditions are AND'd together,
        // then OR the pairs against one another
        $query->orWhere(function ($q) use ($pair) {
            $q->where('col2', $pair['col2'])
              ->where('col3', $pair['col3']);
        });
    }
})->get();

That query gives you the existing records. You can then loop through all parsed rows in PHP to split them into $new_records and $existing_records, and insert every new row in a single query:

Table::insert($new_records);

Note that a bare update() applies one set of values to every matched row, so it cannot write thousands of different rows in one call; the existing rows need either one keyed update per row or an upsert.
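
On Laravel 8+, upsert() collapses the whole split into bulk insert-or-update statements. A sketch, assuming a unique index covering col2 and col3 (MySQL needs one for the ON DUPLICATE KEY path) and rows keyed by column name:

foreach (collect($records)->chunk(500) as $chunk) {
    Table::upsert(
        $chunk->toArray(),        // rows as ['col1' => ..., 'col2' => ..., ...]
        ['col2', 'col3'],         // columns that identify an existing record
        ['col1', 'col4', 'col5']  // columns to refresh on a match (list the rest too)
    );
}

Unlike the positional $row[0] indexing used earlier, each row here must be an associative array keyed by column name.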
