How to boost email throughput
At the core of our business at MailerLite are, as our name says, emails — in particular email sending. We send over 30 million emails per day, and a lot of our servers are MTAs doing their job.
To maximize the throughput of email servers you can do a few things: optimise the resources and optimise the MTA. But there is one more thing you can optimise that is obvious yet often overlooked: DNS. I won't go into the whole flow of sending and receiving a message, only the DNS part. When you want to send an email, the MTA has to do at least two DNS queries: one to get the MX record of the domain you are sending to, and one to get the A record of the mail server named in that MX record. There can be even more, such as PTR lookups. You can imagine how costly this is when you are sending 30 million emails per day. To get a clearer picture, that is about 21,000 emails per minute. If you multiply that by the number of DNS queries each email requires, you come to crazy numbers.
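To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes exactly two lookups per email (one MX, one A), which is the minimum mentioned above; the real figure depends on the MTA and on any extra lookups such as PTR.

```python
EMAILS_PER_DAY = 30_000_000
MINUTES_PER_DAY = 24 * 60
# Assumption: one MX lookup plus one A lookup per email (the stated minimum).
QUERIES_PER_EMAIL = 2

emails_per_minute = EMAILS_PER_DAY / MINUTES_PER_DAY
queries_per_minute = emails_per_minute * QUERIES_PER_EMAIL
queries_per_day = EMAILS_PER_DAY * QUERIES_PER_EMAIL

print(f"{emails_per_minute:,.0f} emails per minute")
print(f"{queries_per_minute:,.0f} DNS queries per minute (lower bound)")
print(f"{queries_per_day:,} DNS queries per day (lower bound)")
```

Even this lower bound works out to tens of millions of lookups per day, so every millisecond saved per query adds up quickly.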
To mitigate that overhead of running those DNS queries to an external DNS server, we came up with the idea of using a local recursive, validating, and caching DNS server. That is where Unbound comes into the picture.
The recursive part of Unbound takes a request from the client and walks the DNS hierarchy, from the root servers down, until it finds the answer.
We will assume that the question in this example is “What is the A record of mailerlite.com”:
- Client sends a query “what is the A record of mailerlite.com” to the server configured in resolv.conf
- DNS server (Unbound) checks if it has that record in its cache
- If the server wasn’t asked before for that record it won’t be found in the cache so it has to look it up on the root servers
- Unbound asks the root servers for the record
- The root server replies with a referral to the TLD servers for .com
- Unbound sends another query, this time to the .com TLD servers
- This time, the TLD server replies with a referral to the authoritative name servers for mailerlite.com at CloudFlare (we host all our domains at CF)
- Once more, Unbound sends a query for the A record of mailerlite.com, this time to CF
- CF replies with the A record of mailerlite.com which contains the IP address for it
- Unbound answers with the IP address of mailerlite.com to the client
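The referral chain in the steps above can be sketched with a toy, in-memory model. This is only an illustration of the idea, not how Unbound is implemented: the server labels (`root`, `tld-com`, `auth-mailerlite`) and the IP address (taken from the documentation range 192.0.2.0/24) are made up, and no network traffic is involved.

```python
# Toy model of the DNS hierarchy. Every name and address here is hypothetical.
ZONES = {
    "root": {"referrals": {"com.": "tld-com"}},
    "tld-com": {"referrals": {"mailerlite.com.": "auth-mailerlite"}},
    "auth-mailerlite": {"records": {("mailerlite.com.", "A"): "192.0.2.10"}},
}


def ask(server, name, rtype):
    """Ask one server; return ("answer", value) or ("referral", next_server)."""
    zone = ZONES[server]
    answer = zone.get("records", {}).get((name, rtype))
    if answer is not None:
        return "answer", answer
    for suffix, next_server in zone.get("referrals", {}).items():
        # Match on whole labels only, so "notcom." does not match "com."
        if name == suffix or name.endswith("." + suffix):
            return "referral", next_server
    raise LookupError(f"{server} knows nothing about {name}")


def resolve(name, rtype="A"):
    """Start at the root and follow referrals until a server answers."""
    server, path = "root", []
    while True:
        path.append(server)
        kind, value = ask(server, name, rtype)
        if kind == "answer":
            return value, path
        server = value  # follow the referral to the next server


ip, path = resolve("mailerlite.com.")
print(ip, "via", " -> ".join(path))
```

Each hop in `path` stands for one network round trip in real life, which is why an uncached lookup is so much slower than a cached one.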
Sounds like a longer process than it should be, right? Imagine doing this each time we want to send an email. Yikes, it would kill our performance.
That is where the caching part of Unbound comes into play. If the query has been asked before, Unbound responds from its cache; if not, it resolves the query recursively. This saves a lot of time, as cached results are returned within microseconds, while recursive DNS queries can take hundreds of milliseconds or more.
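The effect of caching can be illustrated with a small sketch. It is a simplified stand-in, not Unbound's actual cache: `CachingResolver` and `slow_upstream` are hypothetical names, the 50 ms delay is invented to mimic a recursive lookup, and a single fixed TTL is assumed where a real resolver honours the TTL of each record.

```python
import time


class CachingResolver:
    """Wrap a slow lookup function with a TTL cache and hit/miss counters."""

    def __init__(self, upstream, ttl_seconds=300.0, clock=time.monotonic):
        self.upstream = upstream        # callable: (name, rtype) -> answer
        self.ttl_seconds = ttl_seconds  # assumption: one fixed TTL for all records
        self.clock = clock
        self.cache = {}                 # (name, rtype) -> (answer, expires_at)
        self.hits = 0
        self.misses = 0

    def lookup(self, name, rtype="A"):
        key = (name, rtype)
        now = self.clock()
        entry = self.cache.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1              # fresh entry: answer from memory
            return entry[0]
        self.misses += 1                # missing or expired: ask upstream
        answer = self.upstream(name, rtype)
        self.cache[key] = (answer, now + self.ttl_seconds)
        return answer


def slow_upstream(name, rtype):
    """Stand-in for a full recursive lookup; delay and address are made up."""
    time.sleep(0.05)
    return "192.0.2.10"


resolver = CachingResolver(slow_upstream)
for attempt in range(3):
    start = time.perf_counter()
    resolver.lookup("mailerlite.com.")
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"lookup {attempt + 1}: {elapsed_ms:.3f} ms")
print(resolver.hits, "hits,", resolver.misses, "misses")
```

Only the first lookup pays the upstream delay; the next two are served from the dictionary.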
The following is a speed comparison between queries answered from the local cache and queries sent to publicly available DNS servers:
DNS Server | avg response | min response | max response |
---|---|---|---|
Unbound - local | 160.12µs | 106.05µs | 252.481µs |
8.8.8.8 | 25.78ms | 8.05ms | 62.24ms |
8.8.4.4 | 54.64ms | 8.03ms | 355.17ms |
208.67.222.222 | 32.63ms | 10ms | 37.17ms |
156.154.71.1 | 27.20ms | 10ms | 30.11ms |
216.146.35.35 | 112.74ms | 10ms | 146.77ms |
Grafana graphs:
The graph shows the cache hit/miss ratio. For the last 24 hours, we had 432 million hits and only 9.28 million misses. That means that only about 2.1% of the queries had to be resolved against external DNS servers, and that the other 97.9% or so were answered from the cache in a matter of microseconds.
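As a quick check, the ratios can be recomputed from the two counters. The sketch below uses the rounded figures quoted above, so the result is approximate; the variable names are just for illustration.

```python
hits = 432_000_000
misses = 9_280_000
total = hits + misses  # every query is either a hit or a miss

miss_ratio = misses / total
hit_ratio = hits / total

print(f"miss ratio: {miss_ratio:.2%}")
print(f"hit ratio:  {hit_ratio:.2%}")
```

Note that the denominator is the total number of queries, not the number of hits.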
Unbound is a fantastic piece of software, and as you have seen, it gives our DNS lookups quite a boost and increases our email throughput.
2020-10-18 13:15