Migrating from MySQL to Postgres using Laravel's query builder
Migrating from MySQL to Postgres is not an easy feat in general; there are numerous packages & scripts to accomplish this, like Pgloader and Nmig.
For us, the best way to do this was using Laravel's query builder, since it is agnostic to the database layer and allows us to transform any data as needed in a language we know (PHP) and a framework we know and love.
Why?
First off, a small note on why we moved from MySQL to Postgres, since that's probably on the mind of everyone reading this.
The main reason is that we were having trouble getting full-text search to work properly in MySQL for our use case. Our tests and subsequent migration to Postgres made our full-text search more reliable and faster.
We could use a different service or technology for our searching, but in the spirit of boring technology, Postgres does its job perfectly.
Looking toward the future, Postgres also offers many exciting opportunities thanks to its extensibility, for example, extensions like Timescale.
Preparing our database config
Our source database is MySQL. Since this is our default, you could keep the database configuration as-is for the MySQL connection; however, we prefer to create a new connection called source.
'source' => [
    'driver' => 'mysql',
    'url' => env('SOURCE_DATABASE_URL'),
    'host' => env('SOURCE_MYSQL_DB_HOST', '127.0.0.1'),
    'port' => env('SOURCE_MYSQL_DB_PORT', '3306'),
    'database' => env('SOURCE_MYSQL_DB_DATABASE', 'forge'),
    'username' => env('SOURCE_MYSQL_DB_USERNAME', 'forge'),
    'password' => env('SOURCE_MYSQL_DB_PASSWORD', ''),
    ...
],
Make sure to configure the necessary environment keys starting with SOURCE_.
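For reference, the corresponding entries in your .env file could look like this (all values below are placeholders):

SOURCE_MYSQL_DB_HOST=127.0.0.1
SOURCE_MYSQL_DB_PORT=3306
SOURCE_MYSQL_DB_DATABASE=app
SOURCE_MYSQL_DB_USERNAME=app
SOURCE_MYSQL_DB_PASSWORD=secret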
The next thing to do is to configure the destination connection in the config/database.php file to connect to the target database using the pgsql driver:
'destination' => [
    'driver' => 'pgsql',
    'url' => env('DESTINATION_DATABASE_URL'),
    'host' => env('DESTINATION_DB_HOST', '127.0.0.1'),
    'port' => env('DESTINATION_DB_PORT', '5432'),
    'database' => env('DESTINATION_DB_DATABASE', 'forge'),
    'username' => env('DESTINATION_DB_USERNAME', 'forge'),
    'password' => env('DESTINATION_DB_PASSWORD', ''),
    ...
],
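Before moving any data, it can be worth verifying that both connections resolve. A quick sanity check from php artisan tinker, using the connection names defined above:

DB::connection('source')->select('select 1');
DB::connection('destination')->select('select 1');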
Creating the tables in Postgres
Laravel makes this step extremely straightforward: run the migrations against the target database, and all tables will be created with the correct field types.
php artisan migrate --database=destination
Creating a command to backfill the data
Since we're migrating tables with a lot of data, we need a way to sync the majority of the data before the migration. This way, we can minimize the downtime needed to execute a last-minute sync and switch over the database connection.
Laravel's query builder makes retrieving the data from the source MySQL database and inserting it into the destination Postgres database straightforward, as the query builder generates the correct SQL grammar for each database driver.
Below you can find a simplified version of the command we used; it retrieves the necessary data in a performant way and inserts it into the target database.
The command accepts two parameters: the table it needs to migrate and a --from option that lets you skip rows up to a certain ID. This makes the command easily restartable without needing to loop over thousands of rows that have already been processed.
use Illuminate\Console\Command;
use Illuminate\Support\Arr;
use Illuminate\Support\Facades\DB;

class MigrateDatabaseCommand extends Command
{
    protected $signature = 'app:migrate-database {table} {--from=}';

    public function handle()
    {
        $table = $this->argument('table');

        $this->getOutput()->info("Importing table {$table}");

        $total = DB::connection('source')
            ->table($table)
            ->when($this->option('from'), fn ($query) => $query->where('id', '>', $this->option('from')))
            ->count();

        $this->getOutput()->progressStart($total);

        DB::connection('source')
            ->table($table)
            ->when($this->option('from'), fn ($query) => $query->where('id', '>', $this->option('from')))
            ->select('id')
            ->chunkById(1_000, function ($sourceRows) use ($table) {
                $sourceIds = Arr::pluck($sourceRows, 'id');

                // Check which of these IDs already exist in the destination.
                $destinationIds = DB::connection('destination')
                    ->table($table)
                    ->whereIn('id', $sourceIds)
                    ->pluck('id')
                    ->all();

                $missingIds = array_diff($sourceIds, $destinationIds);

                if (count($missingIds)) {
                    // insert() expects plain arrays, while get() returns stdClass objects.
                    $missing = DB::connection('source')
                        ->table($table)
                        ->whereIn('id', $missingIds)
                        ->get()
                        ->map(fn ($row) => (array) $row)
                        ->all();

                    DB::connection('destination')->table($table)->insert($missing);
                }

                $this->getOutput()->progressAdvance(count($sourceIds));
            });

        $this->getOutput()->progressFinish();
    }
}
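If the command gets interrupted, you can restart it from the last processed ID with the --from option; the users table and ID below are just illustrative examples:

php artisan app:migrate-database users --from=250000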
This command loops over every ID inside the source database and checks which ones are already present in the destination database; if any IDs are missing, those rows are inserted into the destination database.
These extra checks ensure the command can be re-run at any time, and only missing rows will be fetched and inserted into the new database.
A couple of "gotchas"
Rows that are updated or deleted between syncs
We have a unique situation where most tables could be wiped and re-imported into the destination database in seconds. These tables were simply synced in full at the time of migration.
The other tables, with millions of rows, are "append-only": their rows are never updated and are cleaned up after a set time period. The example above therefore has no checks for rows that need updating or for rows that exist in the destination but no longer in the source, for example, when a user deletes some rows.
If your project needs this kind of functionality, be sure to build it into your command.
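As a rough starting point, a deletion pass could mirror the chunked approach of the backfill command. This is only a sketch, assuming the same integer id primary keys and run inside the same handle() method:

DB::connection('destination')
    ->table($table)
    ->select('id')
    ->chunkById(1_000, function ($destinationRows) use ($table) {
        $destinationIds = Arr::pluck($destinationRows, 'id');

        // IDs that still exist at the source.
        $stillExisting = DB::connection('source')
            ->table($table)
            ->whereIn('id', $destinationIds)
            ->pluck('id')
            ->all();

        // Anything missing at the source was deleted there, so delete it here too.
        $deletedIds = array_diff($destinationIds, $stillExisting);

        if (count($deletedIds)) {
            DB::connection('destination')->table($table)->whereIn('id', $deletedIds)->delete();
        }
    });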
Column lengths
Make sure your column lengths in Postgres are defined large enough for your existing (and future) data. MySQL (in non-strict mode) does not complain and silently truncates your data; Postgres is stricter in this regard and throws an exception when you insert data that is too long for a column.
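If you run into this, widen the column in a migration before backfilling. A hypothetical example for a title column on a posts table (note that on older Laravel versions, ->change() requires the doctrine/dbal package):

use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

Schema::table('posts', function (Blueprint $table) {
    // 500 is an example length; pick one that fits your existing and future data.
    $table->string('title', 500)->change();
});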
Update any custom queries
If you're not using the Laravel query builder for some queries, that usually means you're relying on functionality specific to the database driver. Make sure to update these queries!
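For example, a raw MySQL full-text clause won't run on Postgres and needs rewriting; the posts table and title column below are hypothetical:

// MySQL-specific full-text search (requires a FULLTEXT index):
DB::table('posts')->whereRaw('MATCH(title) AGAINST(? IN BOOLEAN MODE)', [$term])->get();

// A Postgres equivalent using to_tsvector/plainto_tsquery:
DB::table('posts')->whereRaw("to_tsvector('english', title) @@ plainto_tsquery('english', ?)", [$term])->get();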