How Oh Dear identified a certificate problem at a large CDN provider

As part of our service, we perform SSL certificate monitoring. We do this slightly differently from other providers, which is why we were able to detect a problem with the SSL certificates of a large commercial CDN provider.

In this post, we'll do a technical deep-dive into how we found this problem!

Why we found it in the first place

While we're a traditional website monitoring service in some aspects, we do a couple of things differently from our competition (and I'm not just talking about our crawler that looks for broken links & mixed content).

Our SSL monitoring is fairly intense. We fetch and validate the SSL/TLS X.509 certificate of each site every 5 minutes to report on changes and problems.

With 288 checks a day (one every 5 minutes) for the SSL certificate alone, we have a lot of opportunities to catch errors related to your site's certificates.

This allows us to quickly identify situations where a certificate has changed (use case: did you renew all the SANs on the certificate?) or has been replaced incorrectly.
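To illustrate the renewal use case, here is a minimal Python sketch that compares the SAN lists of two certificate snapshots. The function name `diff_sans` and the example domains are hypothetical and not taken from Oh Dear's codebase; the sketch only shows the kind of comparison such a check could make.

```python
def diff_sans(previous, current):
    """Compare two lists of SAN DNS names, ignoring case.

    Returns the names that disappeared and the names that were added.
    """
    old = {name.lower() for name in previous}
    new = {name.lower() for name in current}
    return {
        "removed": sorted(old - new),
        "added": sorted(new - old),
    }


# Hypothetical example: the renewed certificate lost one SAN.
changes = diff_sans(
    ["example.tld", "www.example.tld", "shop.example.tld"],
    ["example.tld", "www.example.tld"],
)
print(changes)  # {'removed': ['shop.example.tld'], 'added': []}
```

A non-empty `removed` list after a renewal is the situation worth alerting on, since that hostname would no longer be covered.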

What happened here?

In this case, we received reports from several of our clients that our certificate change reporting "seemed wrong". We were sending alerts to clients that their certificates had changed, but they could never quite verify this.

When we connect to a host to do our in-depth certificate analysis, we fetch as much information as we can. It's similar to how OpenSSL's client would do it:

$ openssl s_client \
	-connect clientsite.tld:443 \
	-servername clientsite.tld \
	2>/dev/null \
	| grep 'CN'

 0 s:/CN=clientsite.tld
   i:/C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3
 1 s:/C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3
   i:/O=Digital Signature Trust Co./CN=DST Root CA X3
issuer=/C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3

This is what you'd expect: we connect to the host clientsite.tld and use SNI (Server Name Indication) to make clear which site's certificate we want to retrieve.

The above example is a functioning one: it correctly returned the certificate of the site, and we could verify everything was intact.

However, every once in a while, this happened:

$ openssl s_client \
	-connect clientsite.tld:443 \
	-servername clientsite.tld \
	2>/dev/null \
	| grep 'CN'

 0 s:/OU=Domain Control Validated/CN=*
   i:/C=US/ST=State/L=Location/, Inc./OU= Daddy Secure Certificate Authority - G2
 1 s:/OU=Domain Control Validated/CN=*
   i:/C=US/ST=State/L=Location/, Inc./OU= Daddy Secure Certificate Authority - G2
 2 s:/C=US/ST=State/L=Location/, Inc./OU= Daddy Secure Certificate Authority - G2
   i:/C=US/ST=State/L=Location/, Inc./CN=Go Daddy Root Certificate Authority - G2
 3 s:/C=US/ST=State/L=Location/, Inc./CN=Go Daddy Root Certificate Authority - G2
   i:/C=US/ST=State/L=Location/, Inc./CN=Go Daddy Root Certificate Authority - G2
subject=/OU=Domain Control Validated/CN=*

We requested the certificate for the domain clientsite.tld but got back a certificate with a different subject (CN=*), one that doesn't belong to our client's site.

Clearly this shouldn't happen!

As a result, our alerts got sent to the client, because their site was being served by an incorrect certificate.
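The mismatch above comes down to a name check: does any name on the returned certificate cover the hostname we asked for? The Python sketch below is a simplified illustration of that check, not Oh Dear's implementation. The helper names are hypothetical, and the wildcard handling is deliberately minimal (a `*` is only accepted as the entire leftmost label and matches exactly one label). A real client should rely on its TLS library's hostname verification.

```python
def name_matches(pattern, hostname):
    """Return True if a certificate name (possibly a wildcard) covers hostname."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard stands for exactly one label, so the label counts must agree.
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # Compare everything to the right of the wildcard label.
        return len(p_labels) > 1 and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def certificate_covers(hostname, names):
    """Return True if any name on the certificate covers hostname."""
    return any(name_matches(name, hostname) for name in names)


# The healthy response: the certificate names the requested site.
print(certificate_covers("clientsite.tld", ["clientsite.tld"]))  # True
# A wildcard certificate for some unrelated (hypothetical) domain does not.
print(certificate_covers("clientsite.tld", ["*.unrelated.tld"]))  # False
```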

Changes made at Oh Dear to help verify this

When we received the first reports, it wasn't easy to troubleshoot this. You see, we connect to the hostname clientsite.tld, but that might be served by several IP addresses using DNS Round Robin.

$  dig +short +noshort clientsite.tld
clientsite.tld.		66	IN	A
clientsite.tld.		66	IN	A

In fact, that's how most CDNs operate: you point your domain to a CNAME, and they present multiple anycasted IPs, each capable of serving your site.
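As a rough Python counterpart to the lookup above, the sketch below lists every distinct address the local resolver returns for a hostname, so that each one can be checked separately. The function names are hypothetical, and the addresses in the comment come from a documentation range, not from any real CDN.

```python
import socket


def unique_addresses(addrinfo):
    """Return the distinct IP addresses from getaddrinfo() results, in order."""
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in addrinfo:
        ip = sockaddr[0]  # sockaddr is (ip, port) for IPv4, (ip, port, ...) for IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def resolve_all(hostname, port=443):
    """Resolve the A and AAAA records the local resolver returns for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return unique_addresses(infos)


# Usage (needs network access):
#   resolve_all("clientsite.tld")  ->  e.g. ["192.0.2.10", "192.0.2.11"]
```

Note that a single resolver answer may only contain a subset of the addresses a CDN uses, so repeated lookups can return different lists.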

Our problem was that we didn't present the IP that we used for this certificate check back to you, the user. So how could you know which IP served the wrong certificate?

As part of our debugging, we made this available to all users in the detailed view of their certificate report:

[Screenshot: Certificate retrieved from what IP?]

To do this, we contributed to the upstream SSL package to expose the remote address IP, since this wasn't available out of the box.

/* ... */
// Passing true asks for the remote (peer) end of the socket, i.e. the server's address.
$response['remoteAddress'] = stream_socket_get_name($client, true);
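For readers who don't work in PHP, the same idea can be sketched in Python with the standard `ssl` module: connect to one specific IP, send the site's hostname via SNI, and record both the peer address and a fingerprint of whatever certificate comes back. This is an illustrative sketch, not the code of the package mentioned above. The function names are hypothetical; only the `remoteAddress` key mirrors the PHP snippet. Verification is switched off on purpose, because the goal is to inspect the served certificate even when it is the wrong one.

```python
import hashlib
import socket
import ssl


def fingerprint(der_bytes):
    """Return the SHA-256 fingerprint (hex) of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def fetch_certificate_from_ip(ip, hostname, port=443, timeout=5.0):
    """Connect to a specific IP, request hostname via SNI, and report what was served."""
    context = ssl.create_default_context()
    # Inspection only: accept any certificate so a wrong one can still be examined.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        # server_hostname sets the SNI value, like `-servername` in openssl s_client.
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            remote_address = tls.getpeername()[0]
            der = tls.getpeercert(binary_form=True)
    return {"remoteAddress": remote_address, "sha256": fingerprint(der)}
```

Calling `fetch_certificate_from_ip` once per resolved IP and comparing the `sha256` values shows whether every address serves the same certificate.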

But this wasn't enough to easily pinpoint the issue.

Working together with the CDN

You see, many of the IPs of a CDN are "anycast" IPs: you connect to an IP address, but there might be multiple servers worldwide responding to that very same IP.

The short version is: the one closest to you (in BGP terms) will provide the answer. But from the IP address alone, the CDN provider has no way to uniquely identify which of their thousands of servers actually responded.

We worked together with them to provide all technical details of our checks and sufficient example/test cases to, at first, convince them of this problem. Then, our examples helped them to troubleshoot and resolve the problem on their end.

I wanted to let you know that we found two edge servers that were mis-behaving and causing the issue you were seeing. We did a full sweep of the thousands of servers we manage and confirmed no others were in this bad state.

-- Director of Customer Support

We're thrilled to see the widespread implications of our paranoid certificate monitoring and that, as a result, we helped thousands of sites worldwide, even those not using Oh Dear!
