# Finding clients hit by Let's Encrypt mass-revocation

Yesterday, we sent out notifications to all clients whose certificates we [continuously monitor](/features/continuous-certificate-monitoring) and that are affected by the [Let's Encrypt mass revocation of SSL certificates](https://community.letsencrypt.org/t/revoking-certain-certificates-on-march-4/114864). In this post, we'll share the details of how we found those certificates.

Now, the _morning after_, we're well rested and in good shape to do a proper write-up on the matter.

## Getting a list of all domains to check

As part of our [uptime monitoring](/features/website-uptime-monitoring), users can add a site to Oh Dear with specific URL parameters. So getting a list of the domains we needed to verify wasn't as simple as:

```sql
SELECT domain FROM sites;
```

Instead, we used Laravel's lazy collections to quickly filter all teams with active subscriptions and extract the relevant domain name.

```php
Team::cursor()
    ->filter(fn (Team $team) => $team->hasActiveSubscriptionOrIsOnGenericTrial())
    ->flatMap(fn (Team $team) => $team->sites)
    ->map(fn (Site $site) => $site->domain())
    // `echo` is a statement and can't be used in an arrow function; `print` is an expression
    ->each(fn (string $output) => print $output . PHP_EOL);
```

This produced a newline-separated list of domains that we needed to check.

Let's save those in `domains.txt`, since we're moving to some CLI tricks now.

## Retrieving the serial for each certificate

Now we find the active _Serial Number_ for each of those certificates. It involves connecting to each site over SSL/TLS, getting the certificate and saving the _Serial Number_. 

The original idea came from [a Hacker News comment](https://news.ycombinator.com/item?id=22475943). We modified it to get better error handling and more control over the output.

```bash
# Create a directory to hold all serial numbers
mkdir -p serials

# Loop all domains, connect and fetch the serial
for i in $(cat domains.txt); do
  echo "Connecting to $i ... "

  (
    openssl s_client -connect "$i:443" -servername "$i" -showcerts < /dev/null 2> /dev/null |
    openssl x509 -text -noout |
    grep -A 1 "Serial Number" |
    tr -d : |
    tail -n 1
  ) | tee "serials/$i";

done
```

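The `openssl s_client` command connects to the domain (using Server Name Indication (SNI) via the `-servername` option) and prints the certificates the server presents. `openssl x509` then parses the first certificate in that output, which is the site's own, and the rest of the pipeline extracts its serial.

Now `serials/` holds one file per domain, each containing that domain's certificate serial.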
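As an aside, `openssl x509` can also print the serial directly with its `-serial` option, which avoids grepping through the text dump. Below is a minimal sketch of that variant, assuming Bash. The helper names `normalize_serial` and `fetch_serial` are made up for illustration and are not what we ran that night. The lowercase conversion is there because `-serial` prints uppercase hex, while the serials shown further down are lowercase.

```bash
#!/usr/bin/env bash

# Turn "serial=0FA3..." (the output format of `openssl x509 -noout -serial`)
# into bare lowercase hex, one serial per line.
normalize_serial() {
  sed -e 's/^serial=//' | tr 'A-F' 'a-f'
}

# Hypothetical helper: connect to a host and print the serial of the
# first certificate it presents (the site's own certificate).
fetch_serial() {
  local host=$1
  openssl s_client -connect "$host:443" -servername "$host" < /dev/null 2> /dev/null |
    openssl x509 -noout -serial 2> /dev/null |
    normalize_serial
}
```

With those two functions defined, the body of the loop above could shrink to `fetch_serial "$i" | tee "serials/$i"`.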

## Combining all serials

We'll make a single list with all the serials we need to check. This way, we can optimize our `grep` commands for later.

```bash
$ cat serials/* | tr -d " " | sort | uniq > serial-numbers.txt
```

The file `serial-numbers.txt` is now a gigantic list of serial numbers.

## Finding the serials in the 1.2GB text file

Let's Encrypt has [released a text file with all affected certificates](https://letsencrypt.org/caaproblem/). This file includes the Serial Number (which we now have) together with all domains/SANs on the certificate.

Our first attempt was to simply `grep` our way through the file for each serial found. But `grep` is single-threaded, so we could only utilize a single CPU core for searching through a pretty big file.

This was taking too long, so we quickly adapted our method and started to search through the log in parallel.

Lucky for us, last week we had started preparing a new set of servers [for our crawlers](/features/broken-page-and-mixed-content-detection) that check for broken links. Those servers were still idling as they aren't in production yet. This was the perfect time to use that spare capacity.

First, we split the big file of serials (called `serial-numbers.txt`) into many pieces of equal size.

```bash
$ split -l 1000 serial-numbers.txt
```

This gives us many files, each holding up to 1000 serial numbers (the last one may be shorter). The file naming is predictable:

```
$ ls -l x*
-rw-rw-r-- 1 immutable immutable 2672 Mar  3 20:27 xaa
-rw-rw-r-- 1 immutable immutable 2948 Mar  3 20:27 xab
-rw-rw-r-- 1 immutable immutable 2948 Mar  3 20:27 xac
-rw-rw-r-- 1 immutable immutable 2960 Mar  3 20:27 xad
```
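If the working directory might hold other files starting with `x`, `split` also accepts an output prefix as a second operand. A small sketch of that, assuming Bash; the `chunk-` prefix and the `split_serials` function name are only illustrative:

```bash
#!/usr/bin/env bash

# Split a list of serials into files of at most 1000 lines each,
# named chunk-aa, chunk-ab, ... inside the given output directory.
split_serials() {
  local input=$1 outdir=$2
  mkdir -p "$outdir"
  split -l 1000 "$input" "$outdir/chunk-"
}
```

Calling `split_serials serial-numbers.txt chunks` would then leave the pieces in `chunks/`, safely away from anything else in the directory.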

In order to utilize all our cores, we used each file as the pattern input to `grep` and sent the job to the background for processing.

```bash
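for file in x*; do
	# Join the serials with "|" (dropping the trailing one) and search for any of them
	\grep -P "$(tr "\n" "|" < "$file" | sed -e 's/|$//')" ../ssl-cert/caa-rechecking-incident-affected-serials.txt >> results.txt &
done
wait # block until every background grep has finished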
```

That rather ugly-looking `tr` & `sed` pipeline in there transforms the input file from a newline-separated list of serials into a `|`-separated list. In a `grep` pattern, `|` acts as an "or": a line matches if it contains any one of the serials.

In the long form, it turns our input of this:

```bash
$ cat xaa
01009ba...
0111839...
011539e...
0135d43...
```

... into this:

```
01009ba...|0111839...|011539e...|0135d43...
```

Because we sent each `grep` command to the background in our `for`-loop, using the `&` at the end of the command, we now have many `grep` processes running in parallel.

What followed was, to me as a sysadmin, a thing of beauty.
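In hindsight, the pattern-building step can be skipped altogether, because `grep` can read its patterns straight from a file with `-f` and treat them as fixed strings with `-F`. The sketch below shows that variant, assuming Bash. The `search_chunks` function is hypothetical and not what we actually ran. It writes each chunk's matches to its own temporary file, so the background jobs never write to the same file at once.

```bash
#!/usr/bin/env bash

# Search one large file for the serials in each chunk file, one grep per chunk.
#   -F  treat every pattern as a fixed string (serials are plain hex)
#   -f  read the patterns from a file, one per line
search_chunks() {
  local haystack=$1
  shift
  local tmpdir chunk
  tmpdir=$(mktemp -d)
  for chunk in "$@"; do
    grep -F -f "$chunk" "$haystack" > "$tmpdir/$(basename "$chunk")" &
  done
  wait # block until every background grep has finished
  for chunk in "$@"; do
    cat "$tmpdir/$(basename "$chunk")" # emit results in chunk order
  done
  rm -rf "$tmpdir"
}
```

Used like `search_chunks ../ssl-cert/caa-rechecking-incident-affected-serials.txt x* > results.txt`, it should produce the same matching lines as the loop above.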

![Server Utilization of the Lets Encrypt checks](/uploads/blogs/letsencrypt-revocation-check/server-usage.png)

The crunching continued for a while, and we now had a list of affected serials stored in `results.txt`.

At this point, things were getting a bit late, so we resorted to even _weirder Bashness_ to match these serials back to the domain names.

## Matching the serials back to domains

We loop over each affected serial and match it back to the domain:

```bash
for line in $(cat results.txt); do
	\grep "$line" serials/*;
done |
	awk '{print $1}' |
	sed 's/\// /' |
	awk '{print $2}' |
	sed 's/:/ /' |
	awk '{print $1}'
```

Looking at it now, awake, it could've been much cleaner. But, it got the job done! We now have a list of domain names of clients we need to notify.
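For the curious, here is roughly what a cleaner version could look like. It is only a sketch, assuming Bash and a hypothetical file `affected-serials.txt` that holds one bare affected serial per line with no blank lines (our `results.txt` was not in that shape). The `affected_domains` function name is made up as well. Since every file in `serials/` is named after its domain, asking `grep` for the matching file names with `-l` gives us the domains directly.

```bash
#!/usr/bin/env bash

# Print the name of every file in a serials directory whose content
# contains one of the affected serials.
#   -l  list the names of matching files instead of the matching lines
#   -F  treat every pattern as a fixed string
#   -f  read the patterns from a file, one per line
affected_domains() {
  local serial_dir=$1 affected=$2
  grep -l -F -f "$affected" "$serial_dir"/* |
    sed -e 's|.*/||' | # strip the directory part, leaving the domain
    sort -u
}
```

Running `affected_domains serials affected-serials.txt` would then print one affected domain per line.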

## Sending the mails to clients

To inform our clients, we went back to PHP. This allows us to send the notification e-mails in our own style/branding.

```php
Mail::to($users)->send(new LetsencryptRevokedMail($domain));
```

It uses the power of [Laravel Mailables](https://laravel.com/docs/master/mail) to make this really easy.

## A rush job because time was against us

We didn't do as clean a job as we'd normally do. There were no tests, no clean integrations, and most of it was _hacked_ together on very short notice.

But, there wasn't much choice. The list of affected certificates was released yesterday, and the revocation was to take place within 48 hours. It was up to us to notify our users _asap_. After all, those affected still needed to renew their certificates!

We're happy we got to see the list of affected domains beforehand though. The normal procedure is that the revoked certificates end up in Certificate Revocation Lists (CRLs), but at that point the revocation _has already happened_.

This allowed us to be proactive and inform clients ahead of time!
