Fetching data to render complex graphs in Flare
When tracking errors within your application, a visual indication of when errors are happening is essential to the debugging experience we're trying to provide.
Flare already had graphs of errors happening per project. Here's how that looks:
Now, we've also added graphs per error. So you can easily see when a particular error started occurring, when it was spiking, and more. Here's how such a graph looks.
These graphs count the number of occurrences of each error in a project. Sometimes we have five occurrences for an error, but there are situations where we have 40k occurrences. Our database already does a terrific job of crunching all this data. But this still can take a while.
We wanted to keep Flare fast and, more importantly, have a single approach to how we did these calculations and smartly cached them.
Requirements
We have four points in our application where we show graphs:
- On the projects view, we show a graph of occurrences per project per day
- On the project errors view, we show a graph of all occurrences for the project per minute, scoped by the current search query
- Also on the project errors view, for each error, we show a graph of occurrences per day
- On an error-specific page, the graph we mentioned above is also shown
Let's see what we need:
- We should be able to get graphs for a single item or for multiple items
- Graphs are per day, sometimes per minute
- A graph can be changed by external parameters (like the search query)
- We want everything blazingly fast ⚡️
Caching
The best way to solve this is by caching graphs and using the cached values on the following requests. The only problem: Flare is a real-time product. You can have a graph of zero errors for the whole month, and suddenly due to some bug within your code, that graph could have hundreds of occurrences.
We don't want to wait for the next day to show you that these occurrences came in. You want to see these graphs in real time.
That's why we're using a combination of caching and live data. For all the graphs, we cache the data of the retention period you've got in your subscription except the current day. For the current day, we'll do live data crunching to calculate how many occurrences have passed by.
This caching scheme works because new occurrences only come in today, and today is covered by the live query.
A complete cache bust should happen when an error is deleted, resolved, or unresolved, because these actions are destructive to our historical data. We'll need to regenerate the cached graphs on the next request.
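For illustration, such a cache bust can be as simple as forgetting the relevant cache key when one of these actions happens. The key format below is made up; the real key comes from the cacheKey() method you'll see in the implementation:

// Illustrative sketch: forget the cached graph for an error whenever it is
// deleted, resolved, or unresolved, so the next request regenerates it.
cache()->forget("error-occurrence-stats-per-day:{$error->id}");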
The implementation
We've constructed an abstract base class, MultiOccurrenceStats, from which three classes extend:
- ErrorOccurrenceStatsPerDay for the error view and the project errors view
- ProjectErrorOccurrenceStatsPerMinute for the project errors view
- ProjectErrorOccurrenceStatsPerDay for the projects view
The base class has a fetch method, allowing us to fetch data for multiple subjects (errors or projects). We return an array for each subject, keyed by UNIX timestamp, with the number of occurrences around that timestamp as the value:
/**
 * @param int[] $subjectIds
 * @return array<int, array<int, int>>
 */
public function fetch(
    array $subjectIds,
    CarbonImmutable $start,
    CarbonImmutable $end,
): array {
    // ...
}
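For example, fetching stats for two errors could return something like this (made-up numbers; the timestamps are in milliseconds, matching the query you'll see further down):

// Illustrative only: subject (error) id => [timestamp => number of occurrences]
$stats = [
    17 => [
        1660514400000 => 12,
        1660600800000 => 3,
    ],
    18 => [
        1660600800000 => 40000,
    ],
];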
Each subclass should also implement a few methods:
// Provides the cache key for storing statistics for a subject
abstract protected function cacheKey(int $subjectId): string;

// How long the graph is cached
abstract protected function cacheTtl(): int;

// A query to fetch the statistics
abstract protected function performQuery(): Builder;

// The subject id column in the query above. Keep reading to see why we need this
abstract protected function querySubjectIdColumn(): string;
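To make this concrete, a simplified subclass could look something like this. This is an illustrative sketch only (the cache key format and TTL are assumptions); the query is the base query shown further down in this post:

class ErrorOccurrenceStatsPerDay extends MultiOccurrenceStats
{
    protected function cacheKey(int $subjectId): string
    {
        // Hypothetical key format: one cache entry per error.
        return "error-occurrence-stats-per-day:{$subjectId}";
    }

    protected function cacheTtl(): int
    {
        // Hypothetical TTL: keep the cached graph around for a day.
        return 60 * 60 * 24;
    }

    protected function performQuery(): Builder
    {
        // The base query for these stats, as shown later in this post.
        return DB::table('error_occurrences')
            ->selectSub('error_occurrences.error_id', 'subject_id')
            ->selectRaw('CAST(UNIX_TIMESTAMP(DATE_FORMAT(received_at, "%Y-%m-%d 00:00:00")) * 1000 AS UNSIGNED) AS timestamp, count(*) as count')
            ->join('errors', 'error_occurrences.error_id', 'errors.id')
            ->where('errors.status', Status::Open);
    }

    protected function querySubjectIdColumn(): string
    {
        return 'error_occurrences.error_id';
    }
}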
Now we can call the fetch method:
- for a single subject, like a project:
fetch([$project->id], $project->retentionStart(), $project->retentionEnd());
- for multiple subjects, like the errors in a project:
fetch($subjectIds, $start, $end);
How is it implemented? First, we map our subject ids to the keys within our cache:
$cacheKeys = array_map(
    fn (int $subjectId) => $this->cacheKey($subjectId),
    $subjectIds
);
We split the retention period into a cacheable and non-cacheable period:
[$cachablePeriod, $nonCacheablePeriod] = $this->splitPeriodInCacheableAndNonCacheable($retentionPeriod);

// Calls:
function splitPeriodInCacheableAndNonCacheable(StatsPeriod $retentionPeriod): array
{
    return [
        new StatsPeriod(
            $retentionPeriod->start->startOfDay(),
            CarbonImmutable::yesterday()->endOfDay(),
        ),
        new StatsPeriod(
            CarbonImmutable::now()->startOfDay(),
            $retentionPeriod->end->endOfDay(),
        ),
    ];
}
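The StatsPeriod class itself isn't shown in this post; a minimal version of such a value object could look something like this (a simplified sketch, not necessarily the real implementation):

class StatsPeriod
{
    public function __construct(
        public CarbonImmutable $start,
        public CarbonImmutable $end,
    ) {
    }

    // Used below to check whether a cached period still matches the
    // period we want to have cached.
    public function equals(self $other): bool
    {
        return $this->start->equalTo($other->start)
            && $this->end->equalTo($other->end);
    }
}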
Now we want to check what's been cached and what isn't. We store a CachedStats object within our cache:
class CachedStats
{
    /**
     * @param array<int, int> $stats
     */
    public function __construct(
        public StatsPeriod $period,
        public array $stats,
    ) {
    }
}
This object indicates whether cached graphs exist for a specific subject and for which period these were cached. Then, we will split our subjects into two categories: subjects for which a correct cached period exists and subjects without cached graphs.
Notice the cache()->many() method. It queries all cache keys at once, which is a bit faster. Keys that do not exist will return null.
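For example (with made-up cache keys), if only the first subject has been cached:

cache()->many(['error-occurrence-stats-per-day:1', 'error-occurrence-stats-per-day:2']);
// => ['error-occurrence-stats-per-day:1' => CachedStats {...}, 'error-occurrence-stats-per-day:2' => null]

Here's how we use that result to split the subjects: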
/** @var \Illuminate\Support\Collection<int, null> $nonCachedSubjects */
$nonCachedSubjects = collect();

/** @var \Illuminate\Support\Collection<int, \App\Domain\Project\Stats\CachedStats> $cachedSubjects */
$cachedSubjects = collect();

foreach (array_values(cache()->many($cacheKeys)) as $i => $cached) {
    $subjectId = $subjectIds[$i];

    if (
        $cached === null
        || ! $cached instanceof CachedStats
        || ! $cached->period->equals($cachablePeriod)
    ) {
        $nonCachedSubjects[$subjectId] = null;
    } else {
        $cachedSubjects[$subjectId] = $cached;
    }
}
From this point, the individual classes that extend this base class become important. Fetching the data for graphs differs depending on what's exactly needed. From here on, we'll implement the stats for errors on the project's errors page.
This is the base query required for these stats:
DB::table('error_occurrences')
    ->selectSub('error_occurrences.error_id', 'subject_id')
    ->selectRaw('CAST(UNIX_TIMESTAMP(DATE_FORMAT(received_at, "%Y-%m-%d 00:00:00")) * 1000 AS UNSIGNED) AS timestamp, count(*) as count')
    ->join('errors', 'error_occurrences.error_id', 'errors.id')
    ->where('errors.status', Status::Open);
This query is missing some parts: we're not grouping anything or scoping the query to our subjects. Within the fetch method, another part is added:
$query = $this->performQuery()
    ->groupBy('timestamp', 'subject_id')
    ->where(fn (Builder $builder) => $builder
        ->when($nonCachedSubjects->isNotEmpty(), fn (Builder $builder) => $builder
            ->where(fn (Builder $builder) => $builder
                ->whereIn($this->querySubjectIdColumn(), $nonCachedSubjects->keys())
                ->whereBetween('received_at', [$retentionPeriod->start, $retentionPeriod->end])
            )
        )
        ->when($cachedSubjects->isNotEmpty(), fn (Builder $builder) => $builder
            ->orWhere(fn (Builder $builder) => $builder
                ->whereIn($this->querySubjectIdColumn(), $cachedSubjects->keys())
                ->whereBetween('received_at', [$nonCacheablePeriod->start, $nonCacheablePeriod->end])
            )
        )
    )
    ->orderBy('timestamp');
These extra statements will:
- Group counted occurrences by their timestamp
- Select the whole retention period for the non-cached subjects
- Select only a single day, today, for the cached subjects
- Order the results by their timestamp
Notice the $this->querySubjectIdColumn() call we discussed earlier. It selects occurrences based on the subject and will, in this case, be error_occurrences.error_id.
This massive query is now executed, and we group the results in a Laravel Collection by subject:
$stats = $query->get()
    ->groupBy('subject_id')
    ->map(fn (Collection $subjectStats) => $subjectStats->mapWithKeys(
        fn (object $row) => [$row->timestamp => $row->count]
    )->all());
At this point, our flow splits into two based on whether a subject has cached values.
The non-cached subjects
We will first cache the freshly calculated counts so they can be reused in the next request. We'll start by filtering all the stats we've queried, keeping only the stats from before today that have not yet been cached:
$cacheableStats = $stats
    ->filter(fn (array $stats, int $subjectId) => $nonCachedSubjects->has($subjectId))
    ->map(
        fn (array $stats) => $this->filterStatsForPeriod($cachablePeriod, $stats)
    );

// Calls:
/**
 * @return array<int, int>
 */
private function filterStatsForPeriod(
    StatsPeriod $period,
    array $stats
): array {
    $min = $period->start->getPreciseTimestamp(3);
    $max = $period->end->getPreciseTimestamp(3);

    $filtered = [];

    foreach ($stats as $timestamp => $count) {
        if ($min <= $timestamp && $timestamp < $max) {
            $filtered[$timestamp] = $count;
        }
    }

    return $filtered;
}
Then we're going to cache these stats if we have any:
$cacheValues = $cacheableStats->mapWithKeys(fn (array $stats, int $subjectId) => [
    $this->cacheKey($subjectId) => new CachedStats($cachablePeriod, $stats),
]);

if ($cacheValues->isNotEmpty()) {
    cache()->putMany($cacheValues->all(), $this->cacheTtl());
}
In the end, we put all the stats we've queried within an array per subject:
$nonCachedStats = $nonCachedSubjects->map(
    fn (null $null, int $subjectId) => $stats->get($subjectId, [])
)->all();
The cached subjects
For the cached subjects, we're going to take the cached stats and append the freshly added stats:
$cachedStats = $cachedSubjects->map(function (CachedStats $cachedStats, int $subjectId) use ($stats) {
    if (! $stats->has($subjectId)) {
        return $cachedStats->stats;
    }

    return $stats->get($subjectId) + $cachedStats->stats;
})->all();
That + within the code might look weird. Why not use array_merge()? The problem with array_merge() is that it re-indexes numeric keys, and our keys are timestamps: crucial information we want to keep.
The + operation between arrays won't lose these keys but is sometimes considered dangerous because when a key overlaps, the left-hand side value is used. That's why we've put the most recent data on the left-hand side of this expression. Should something go wrong, we'll always have the most up-to-date data.
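Here's a quick illustration of the difference (with made-up values):

$fresh  = [1660600800000 => 5];
$cached = [1660514400000 => 7, 1660600800000 => 2];

// array_merge() re-indexes the integer keys, so the timestamps are lost:
array_merge($fresh, $cached); // [0 => 5, 1 => 7, 2 => 2]

// The + operator keeps the keys; on overlap, the left-hand (fresh) value wins:
$fresh + $cached; // [1660600800000 => 5, 1660514400000 => 7]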
The last line of the fetch method looks like this:
return $nonCachedStats + $cachedStats;
We've successfully and efficiently calculated our stats!
Future improvements
We'll use this version for the coming weeks. Some improvements can be made, though:
- Make the caching interval larger for minute stats: we could cache until an hour ago instead of until the start of the day, thus reducing the query
- Instead of requiring the exact cache period (start of retention period until yesterday), we could accept any cached period and then add the missing data in between when we're querying the most recent data
- Make more use of lazy collections, since we're working with a lot of data
- When using arrays, pass them by reference, which should eliminate some memory copies
- Probably other stuff we can't think of right now 😀
In the end
The Flare redesign is coming together. We hope to start inviting beta testers soon. Want to join this group? Send a mail to [email protected].
Now is the perfect time to try out Flare. You can register here for a free ten-day trial; no credit card is required.