How to safely delete records in massive tables on AWS using Laravel
While reviewing the contents of Flare's database, we encountered a table with 1 billion records. Technically, that isn't a problem: Flare runs on Vapor and uses an Aurora database, so it can handle that scale. But of course, there's a cost to storing that many records.
We dove in and concluded that we could safely delete about 900 million records. All of them were created before a specific date.
Attempt 1: running a single delete query
A first naive approach would be to run a simple delete query.
DELETE FROM table_with_too_many_records
WHERE created_at < '2020-01-01 00:00:00';
Performing a delete query like this on a small table will probably work. But when your table contains a billion items, it will likely cause problems.
In most cases, a database will try to perform queries atomically, so no other queries see intermediary results. When a delete query with a where clause starts, it locks the rows it scans; for a query that matches most of the table, that effectively locks the whole table. All other queries run against the table will have to wait.
Because of the massive size of the table, the delete query would probably run for hours. The lock would be in place all that time, essentially halting the entire app.
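You can watch this happen on a MySQL-compatible database by inspecting the running queries and open transactions while such a delete is in flight. Here's a minimal sketch using standard MySQL commands; the exact columns in the output vary by MySQL and Aurora version.

-- List all running queries; a long-running DELETE shows up here,
-- with other queries stuck in a waiting state behind it.
SHOW FULL PROCESSLIST;

-- Show open InnoDB transactions, including how many rows they hold locked.
SELECT trx_id, trx_started, trx_rows_locked
FROM information_schema.innodb_trx;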
So, just running a single query to clean up the table isn't a good idea.
Attempt 2: using a limit
To make the query complete faster, you can add a limit clause. Because the query completes faster, the lock is released much sooner as well, and other queries won't be halted as long.
DELETE FROM table_with_too_many_records
WHERE created_at < '2020-01-01 00:00:00'
LIMIT 1000;
Here's an artisan command that performs that query.
namespace App\Console\Commands;

use App\Models\MyModel;
use Illuminate\Console\Command;

class CleanTableCommand extends Command
{
    protected $signature = 'clean-table';

    protected $description = 'Clean up the table';

    public function handle()
    {
        $this->info('Cleaning table...');

        MyModel::query()
            ->where('created_at', '<', now()->subMonth())
            ->limit(1000)
            ->delete();

        $this->info('All done!');
    }
}
In the code above, only the first 1,000 matching records are deleted, while our table contains many more records that should be deleted.
A delete query in Laravel returns the number of deleted records. We can use that to determine whether it's worthwhile to rerun the deletion query.
public function handle()
{
    $this->info('Cleaning table...');

    do {
        $numberOfRecordsDeleted = MyModel::query()
            ->where('created_at', '<', now()->subMonth())
            ->limit(1000)
            ->delete();

        $this->comment("Deleted {$numberOfRecordsDeleted} records");
    } while ($numberOfRecordsDeleted > 0);

    $this->info('All done!');
}
This strategy works quite well on traditional systems where there's no limit on how long a process can run. If you use a conventional server instead of AWS Lambda and want to use the cleanup strategy above, take a look at our model cleanup package; a rough sketch of how it's used follows.
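As an illustration, here's roughly how a model opts into cleanup with that package. This sketch is based on the package's GetsCleanedUp interface; the exact API may differ between package versions, so check the package README before copying this.

use Illuminate\Database\Eloquent\Builder;
use Illuminate\Database\Eloquent\Model;
use Spatie\ModelCleanup\GetsCleanedUp;

class MyModel extends Model implements GetsCleanedUp
{
    // Tell the package which records are old enough to delete.
    public function cleanUp(Builder $query): Builder
    {
        return $query->where('created_at', '<', now()->subMonth());
    }
}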
The problem with this strategy on AWS Lambda is that there is a hard execution time limit of 15 minutes. When cleaning up a huge table, the process would likely take multiple hours, so doing the cleanup in a single artisan command isn't possible.
Attempt 3: using jobs
To overcome that execution time limit of 15 minutes, the deletion logic can be moved to a job. The job can dispatch itself to delete more records.
namespace App\Jobs;

use App\Models\MyModel;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class CleanTableJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function handle()
    {
        $numberOfRecordsDeleted = MyModel::query()
            ->where('created_at', '<', now()->subMonth())
            ->limit(1000)
            ->delete();

        if ($numberOfRecordsDeleted > 0) {
            // Dispatch a fresh instance of this job to delete the next batch.
            self::dispatch();
        }
    }
}
This strategy works great: each query executes fast, and the deletion process can keep running for as long as needed.
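To kick the process off, the job only needs to be dispatched once; after that, it keeps re-dispatching itself until there's nothing left to delete. For example, from a tinker session or anywhere else in the app:

use App\Jobs\CleanTableJob;

// Dispatch the first job; it will chain itself until the table is clean.
CleanTableJob::dispatch();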
Scheduling the cleanup process
In Laravel's console kernel, you can schedule jobs in addition to artisan commands.
// in app/Console/Kernel.php

protected function schedule(Schedule $schedule)
{
    // other scheduled tasks

    $schedule->job(\App\Jobs\CleanTableJob::class)->hourly();
}
If CleanTableJob implements ShouldQueue (and in our case it does), the scheduler will dispatch the job on the queue, so it won't stall the next entries in the schedule.
Preventing overlaps
There is one more caveat. In the example above, the CleanTableJob is scheduled to be dispatched hourly. The job will re-dispatch itself if more records need to be deleted.
Now imagine that the CleanTableJob keeps re-dispatching itself for more than an hour. While a CleanTableJob is still running, the scheduler will dispatch another CleanTableJob. That will cause multiple CleanTableJob jobs to run at the same time.
In our specific situation, where we are cleaning up a table with hundreds of millions of records, it will likely take hours or even days to clean up the table. If left unchecked, a great many CleanTableJob jobs would end up running simultaneously. Ideally, we want only one CleanTableJob running at any given time.
We can solve this problem by using an atomic lock. Laravel supports this out of the box. Here's what using the lock could look like.
// in `CleanTableJob`

public function handle()
{
    $lock = Cache::store('redis')->lock(self::class.'_lock', 10 * 60);

    if (! $lock->get()) {
        // Couldn't acquire the lock; another job is probably running.
        return;
    }

    $numberOfRecordsDeleted = MyModel::query()
        ->where('created_at', '<', now()->subMonth())
        ->limit(1000)
        ->delete();

    // Release the lock, so the next job can acquire it.
    $lock->release();

    if ($numberOfRecordsDeleted > 0) {
        self::dispatch();
    }
}
This code looks good, but it can be improved. That locking code makes it hard to see the essence of the job, which is deleting rows.
The locking code can be moved to a job middleware.
namespace App\Jobs\Middleware;

use Illuminate\Support\Facades\Cache;

class AtomicJobMiddleware
{
    /**
     * Process the queued job.
     *
     * @param object $job the job instance being processed
     * @param callable $next
     */
    public function handle($job, $next)
    {
        /** @var \Illuminate\Cache\RedisLock $lock */
        $lock = Cache::store('redis')->lock(get_class($job).'_lock', 10 * 60);

        if (! $lock->get()) {
            // Another instance of this job class holds the lock,
            // so delete this queued copy and bail out.
            $job->delete();

            return;
        }

        $next($job);

        $lock->release();
    }
}
Here's the rewritten CleanTableJob that makes use of the AtomicJobMiddleware.
// in `CleanTableJob`

public function handle()
{
    $numberOfRecordsDeleted = MyModel::query()
        ->where('created_at', '<', now()->subMonth())
        ->limit(1000)
        ->delete();

    if ($numberOfRecordsDeleted > 0) {
        self::dispatch();
    }
}

public function middleware()
{
    return [new \App\Jobs\Middleware\AtomicJobMiddleware()];
}
And with all this in place, we now have a pretty reliable way of cleaning up big tables.
In closing
Cleaning up big tables isn't that difficult, but there are quite a few caveats to take care of. I hope this post helps you clean up your tables.
If you're using an Aurora cluster on AWS, be aware that deleting a lot of records won't make your bill smaller: the space used by the deleted records will still be allocated. Here's how to reclaim the allocated space.
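If your cluster runs the MySQL-compatible edition of Aurora, rebuilding the table is one way to do that. Treat this as a sketch rather than a recommendation: OPTIMIZE TABLE rebuilds the table, which costs time and I/O on a table this size, so plan it carefully.

-- Rebuild the table so InnoDB can return the space freed by deleted rows.
-- On InnoDB, this statement is executed as ALTER TABLE ... FORCE.
OPTIMIZE TABLE table_with_too_many_records;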
If you're not on AWS and want to clean up big tables in a safe way, take a look at our model cleanup package.
We are currently optimizing Flare some more and will soon release a first batch of new features. Stay tuned for that!