Process big DB table with chunk() method

Let’s imagine the situation: you have a big database table (10,000 rows or more) and you need to run an update on one column. But you cannot run just a plain SQL query – there is some PHP logic behind it. So a foreach loop could potentially take forever, or bump into the default 30-second PHP script limit. Luckily, Laravel has a neat solution for it.

Let’s look at a simple example of a problem I’m talking about:
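The code snippet is missing from this copy of the post, but the naive approach presumably looked something like this (a reconstruction – the User model, field name, and helper function are hypothetical examples):

```php
<?php
// Naive approach: load the WHOLE table into memory at once.
// "some_field" and some_complex_calculation() are hypothetical
// stand-ins for the PHP logic that can't be done in plain SQL.
$users = App\User::all();

foreach ($users as $user) {
    $user->some_field = some_complex_calculation($user);
    $user->save();
}
```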

Do you see the problem? It’s fine for a DB table with 100 or 500 users. But what about 10,000? 100,000? A million?
It’s not only about speed, but also about memory – you are loading the whole database table into a variable, so you might run out of memory.

So here’s the thing: there is a method called chunk() which splits all the data into separate selects, like pagination.
Let’s look at an example:
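Again, the original snippet is missing here; a minimal sketch of the same job rewritten with chunk() (same hypothetical field and helper as above):

```php
<?php
// chunk(100, ...) selects 100 rows at a time and passes each
// batch to the callback, so only 100 models live in memory at once.
App\User::chunk(100, function ($users) {
    foreach ($users as $user) {
        $user->some_field = some_complex_calculation($user);
        $user->save();
    }
});
```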

What it actually does is run a loop: select 100 entries, do the updates with them, then select another 100 entries, do another update, and so on. This means that at no point is a huge amount of data taken from the database – you are working with a chunk of entries, not the whole table.


Now, be careful with filtering initial results – don’t run anything like this:
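The example code is missing from this copy; based on the explanation that follows, it was an anti-pattern along these lines (a reconstruction):

```php
<?php
// DON'T do this: the where() filter and the column being updated
// are the same, which breaks chunk()'s LIMIT/OFFSET pagination.
App\User::where('approved', 0)->chunk(100, function ($users) {
    foreach ($users as $user) {
        $user->update(['approved' => 1]);
    }
});
```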

Technically, it would work and wouldn’t throw any errors, but here’s the problem: you’re filtering for unapproved users and then approving them, so by the time Laravel runs the next database query to get another “page”, the data has already changed, and you will skip a page of data. That means you will process only half of the entries.
So don’t update the same fields that the initial filter depends on.
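If your Laravel version has it (it was added after this post's era, so treat this as an assumption about your setup), chunkById() avoids this pitfall: it paginates by primary key (WHERE id > $lastId ORDER BY id) instead of LIMIT/OFFSET, so rows dropping out of the filter don’t shift the pages. A sketch:

```php
<?php
// chunkById() tracks the last seen primary key instead of an offset,
// so updating the filtered column no longer causes skipped pages.
App\User::where('approved', 0)->chunkById(100, function ($users) {
    foreach ($users as $user) {
        $user->update(['approved' => 1]);
    }
});
```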

Hope that helps when dealing with big or huge amounts of data.




9 thoughts on “Process big DB table with chunk() method”

  1. Hey Povilas, nice tip with chunks and updates. I also faced a problem when I had to delete some entries after PHP processing / validating them. (Link can be found by clicking my name :))

    Keep up writing these tips!

  2. So, is there a way we can run WHERE statements in conjunction with chunk? Or are we restricted to using if / continue statements inside the loop?

  3. Can anybody explain a bit how the update case would only process half of the entries? Because, as far as I understand, if Laravel selects 100 records (based on the where condition, and we are updating the same column as in the where clause), it will again select the next 100 entries based on the same where clause. Please explain.

    1. Basically the chunk method has a built-in $page counter initialized to 1 and a do-while loop. It accepts 2 parameters, $count and $callback [see: Illuminate\Database\Query\Builder namespace]. So each time through the do-while loop it generates a SQL select query and passes the results to the callback function. The generated select query looks like this:

      SELECT * FROM tbl WHERE $constraints LIMIT $offset, $row_count;
      To cover all bases: SELECT * FROM tbl WHERE $constraints LIMIT 120, 100; // will retrieve 100 rows, from row 121 to 220

      //How $offset and $row_count is determined in Laravel
      $offset = ($page -1) * $count; //$count parameter passed in.
      $row_count = $count;

      So the SQL queries generated for the first 3 iterations of …chunk(100, $callback) from the example above, assuming the User model uses a ‘users’ table, will be:
      SELECT * FROM users WHERE approved = 0 LIMIT 0,100 # Retrieve 100 rows from row 1- 100
      SELECT * FROM users WHERE approved = 0 LIMIT 100,100 # Retrieve 100 rows from row 101- 200
      SELECT * FROM users WHERE approved = 0 LIMIT 200,100 # Retrieve 100 rows from row 201- 300

      So if your where $constraints was (approved = 0) and you change it to (approved = 1) in your callback function, then on the second iteration the first 100 records that still meet the criteria approved = 0 are not selected – records 101–200 are taken instead. On the next iteration the first 200 records meeting approved = 0 are skipped, and records 201–300 are taken. So, supposing there are 1000 records in total, at the sixth iteration the select statement will be:
      SELECT * FROM users WHERE approved = 0 LIMIT 500,100 # Retrieves 0 rows, since only 500 records with approved = 0 remain by then. Hence why the author said “…you will process only half of the entries.”

      1. This is a brilliant answer to a question I have been pondering in the background for over a year! Thank you Charles Chamusi.

  4. Thank you! I’d seen the Laravel documentation (4.2) on chunk() and didn’t think anything of this function because the details were brief and didn’t have much context. After one of my tables grew to over 1.5 million records I started to Google and came across this article, and my application went from crashing at > 2GB of memory to finishing in < 5 minutes of execution.
