PKI
Public Key Infrastructure
Asymmetric Cryptography
Asymmetric cryptography, also known as public key cryptography, is the system made up of two keys- one private and one public. A message can be encrypted using a public key and then only decrypted using the private key. This is in contrast to symmetric encryption, where the same key is used to encrypt and decrypt a message.
The idea behind asymmetric encryption is that you keep your private key to yourself (private!) but go ahead and hand around your public key to just about anyone. If someone wants to send you a message but ensure that no one else can read it, they simply encrypt it using this readily available public key- since you're the only one with the private key, no one else can decrypt the message. This enforces 'confidentiality' of the sent message.
Digital signatures
So far, we've talked about asymmetric crypto as a way of sending a secret message to someone- we encrypt with their public key, then they are able to decrypt this message using their private key (and since their private key is secret, no one else is able to read our message). The public key is used to encrypt, then the private is used to decrypt.
Asymmetric crypto can be used to authenticate and validate messages as well: that is, ensure a message comes from the person you think it is coming from, and that the message hasn't been changed. If we know we have a copy of A's public key, they can send out a message encrypted using their private key. If the public key is able to decrypt this message, we can trust that the message was sent by A, as only A could have encrypted the message using their private key. In order to prevent tampering, A could also calculate the hashed value of the message they're sending and then encrypt this value using their private key. When we go to read the message, we can check that A's public key can decrypt the included hash value (thus checking that the message was actually sent by A), and then recalculate the hash of the message, checking that it matches this included, decrypted value. If the message has been changed since A sent it, the hash values won't match- no one but A could replace this hashed value as we know it was encrypted with A's private key if we're able to decrypt it with A's public key.
Take note of the difference: if we encrypt a message using someone's public key, we can be sure only they can read it, as only they have their private key and thus the ability to decrypt our message (confidentiality). However, if we encrypt some value using our private key, anyone can decrypt it using our public key, but they will know the value came from us if this decryption works as we're the only ones who could have encrypted with our private key (authentication). If we use our private key to encrypt some message's hash and then send this value along with the message, we can make sure the message isn't tampered with (integrity): anyone reading the message can decrypt the related value with our public key (thus seeing that the message was sent by us) and then validate the message's current hash matches this sent hash value (thus checking that the message is in the same state as when we sent it). Note that anyone can read the message when we've encrypted it using our private key- there is no confidentiality.
Also note that the ideas of digital signatures and sending of confidential messages are coupled to two very key concepts: first, A has a secret private key that no one else has access to, and we know for certain that we are working with a copy of the public key associated with A's private key. If this private key is leaked or becomes accessible to someone else, we no longer know if a message was sent by A or that our messages can only be read by A. Second, if we have a public key that we think is tied to A's private key but is actually tied to B's private key, we could end up trusting information sent by B thinking it comes from A, or sending secret information to B rather than A.
Certificate Authorities
Certificate authorities (or CAs) are trusted parties that help us address that second issue- how do we make sure the public key we're working with when trying to communicate confidentially with A is actually A's public key, not some other individual's key who is now intercepting all our communication with A?
What CAs do is issue something called a digital certificate, which is basically a public key and a bunch of ownership information which is then digitally signed by the CA. This enables you to use someone's public key with a certain degree of certainty that that key actually belongs to the person you thing it does. This naturally leads us to an excellent follow-up question: who are the CAs, and why should we trust them?
Your browser (and potentially your computer) are set up with a default list of trusted CAs (root CAs). When requesting a webpage via TLS, the server hosting this page will provide the browser with its certificate, which is signed by some CA (intermediate CA). The browser then checks that there is a 'chain of trust' between one of its trusted root CAs and the intermediate CA which signed the server's certificate. That is, even if the browser does not directly trust the CA who issued the site's certificate, one of the CAs it trusts might trust this CA, or one of the CAs it trusts might trust another CA, which in turn trusts the issuing CA. In practice, the acceptable limit to the length of this trust chain is generally 3 levels (with the 4th level being the site's certificate).
In terms of certificates and digital signatures, root CAs all have 'self-signed' certificates. That is, their certificates are digitally signed by themselves- i.e. the same public key contained within their certificate will be used in the digital signing of that certificate. These root CAs in turn are the ones to digitally sign trusted intermediate CA certificates. Finally, these intermediate CAs are the ones to digitally sign leaf site certificates. Therefore, since we have our trusted root CAs' public keys, we are able to validate their trusted intermediate CA certificates. Similarly, we can use the intermediate CA public keys to validate the certificates of the sites we're actually trying to visit.
You can see this chain of trust in practice by visiting a site over HTTPS and clicking the little padlock icon, and viewing certificate information. You'll be able to visualize the signing from the root CA down to the leaf certificate. Demo: click little lock icon
Why do we bother with this chain of trust, rather than having the root CAs simply sign leaf certificates directly? One reason is to lessen damage if a CA were to be compromised. If a CA is compromised (perhaps their private key is leaked), then none of the certificates they have signed can be trusted anymore.
CSR
When a website wants to be issued a trusted certificate, they can first generate a new public and private key pair locally. This way, we don't need to worry about ever transporting or sharing the private key. Next, this public key is bundled with various metadata such as what organization this site belongs to, and which domain the certificate will be covering- this is known as a Certificate Signing Request (CSR). CSRs will be digitally signed as well to ensure integrity when sent to the CA. Important: Common name- FQDN. "This must match exactly what you type in your web browser or you will receive a name mismatch error." Selecting a CA?
Next, the CA must validate that this certificate request is valid- generally, that the requester actually owns or controls the domain they've requested a certificate for. There are three types of validations that could be performed- domain validation (DV), organization validation (OV), and extended validation (EV). DV is by far the most common- in this case, the CA must ensure that the certificate requester has "effective control of the domain" that the certificate has been generated for (the provided Common Name/FQDN entered when generating the CSR). For OV and EV, the CA must additionally verify the requester's "real-world identity", and will perform validations such as ensuring your company is officially registered within a specific locale and checking information via a phone call to a registered phone number associated with your organization.
In order to prove control of a domain, CAs might have processes that involve sending you some sort of "challenge" code that will need to be accessible via a specific location at the requested domain, or added to the DNS record associated with the domain. The ACME process (Automatic Certificate Management Environment) works to automate this challenge, and is used by the CA LetsEncrypt. The RFC is a great source of information on the challenge process.
One challenge example is an HTTP challenge, where the requester proves it controls a domain name by showing it can create resources on the server listening for requests under that domain. The validator asks the requester to add a file containing a specific string on some specific path- for ACME, this path is ./well-known/acme-challenge/<string value>, with the page result also displaying this string value. The requester must also sign a value using their private key and send this value back to the validator, proving the validator has the correct public key for the requester's private. If the validator is able to access the HTTP resource on the correct domain and decrypt the the sent value using the public key from the CSR, then DV has been completed.
The CA will then create a digitally signed certificate for the requester, which includes the CA's identifying information as well as the requester's public key. The requester can then load this certificate into their web server, so that it is accessible for inbound TLS connection requests.
SSL/TLS handshake + trusted CAs + cert validation
When you visit a website over HTTPS, your browser receives the site's certificate as part of connection initialization. Using the CA information included in the certificate, your browser will try to validate that the CA who signed this certificate is trusted by one of the CAs it trusts.
Types of SSL Certificates
CRL
Whenever we talk about an authentication system, it's important to think about the idea of revocation- that is, once we've granted someone trust or privileges, how can we take that trust away if something goes wrong.
What HTTPS isn't
Example: apple.com vs app1e.com. Sometimes HTTPS could backfire- users see the little padlock and assume a site is safe or secure or legit. Just because app1e.com has a CA-signed certificate, that doesn't mean the site doesn't have nefarious intentions. HTTPS just means your information is encrypted, and the public key you've used to set up that encryption actually belongs to a given site. That site could still be doing totally malicious or illegitimate things with the data you're sending it.