How to boost email throughput

Posted on Oct 18, 2020

At the core of our business at MailerLite are, as our name says, emails, and in particular email sending. We send over 30 million emails per day, and a lot of our servers are mail transfer agents (MTAs) doing exactly that.

To maximise the throughput of email servers you can do a few things: optimise the resources, optimise the MTA. But there is one more thing you can optimise that is obvious yet often overlooked: DNS. I won’t go into the whole flow of sending and receiving a message, only the DNS part. To send an email, the MTA has to make at least two DNS queries: one to get the MX record of the recipient's domain, and one to get the A record of the mail server that the MX record points to. There can be more, such as PTR lookups. You can imagine how costly this is when you are sending 30 million emails per day. To get a clearer picture, that is roughly 21,000 emails per minute. Multiply that by the number of DNS queries each email needs and you arrive at crazy numbers.
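As a rough back-of-the-envelope check of those numbers, here is a small sketch. The value of two queries per email is an assumption (one MX plus one A lookup, the minimum mentioned above); real traffic will vary with retries, PTR lookups and batching per domain.

```python
EMAILS_PER_DAY = 30_000_000   # figure quoted in this post
QUERIES_PER_EMAIL = 2         # assumption: one MX + one A lookup, the minimum
MINUTES_PER_DAY = 24 * 60


def dns_load(emails_per_day, queries_per_email):
    """Estimate the DNS query rate generated by a given email volume."""
    emails_per_minute = emails_per_day / MINUTES_PER_DAY
    queries_per_minute = emails_per_minute * queries_per_email
    return {
        "emails_per_minute": emails_per_minute,
        "queries_per_minute": queries_per_minute,
        "queries_per_second": queries_per_minute / 60,
    }


load = dns_load(EMAILS_PER_DAY, QUERIES_PER_EMAIL)
for name, value in load.items():
    print(f"{name}: {value:,.0f}")
```

Even with the minimum of two lookups per message, this works out to several hundred DNS queries every second, around the clock.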

To mitigate that overhead of running those DNS queries to an external DNS server, we came up with the idea of using a local recursive, validating, and caching DNS server. That is where Unbound comes into the picture.

The recursive part of Unbound takes a request from a client and walks the DNS hierarchy on the client's behalf until it has the answer to the question.

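We will assume that the question in this example is “What is the A record of mailerlite.com”. Roughly, the resolver asks a root server, which refers it to the servers for .com; it then asks a .com server, which refers it to the name servers for mailerlite.com; and finally it asks one of those authoritative servers, which returns the A record.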

Sounds like a longer process than it should be, right? Imagine doing this each time we want to send an email. Yikes, it would kill our performance.
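To make the chain of referrals concrete, here is a toy model of that walk from the root down to the authoritative server. It does no real networking: the server names and the address are made up for illustration (192.0.2.0/24 is reserved for documentation), and a real resolver such as Unbound does far more (validation, retries, timeouts).

```python
# Toy model of a recursive resolver following referrals.
# All server names and addresses below are hypothetical.
ZONES = {
    "root-server": {"referrals": {"com.": "com-tld-server"}},
    "com-tld-server": {"referrals": {"mailerlite.com.": "mailerlite-auth-server"}},
    "mailerlite-auth-server": {"answers": {("mailerlite.com.", "A"): "192.0.2.10"}},
}


def resolve(name, rtype, server="root-server"):
    """Follow referrals from the root down; return (answer, servers_asked)."""
    asked = []
    while True:
        asked.append(server)
        zone = ZONES[server]
        answer = zone.get("answers", {}).get((name, rtype))
        if answer is not None:
            return answer, asked
        # Pick the referral whose zone is the queried name or a parent of it.
        next_server = next(
            (
                ns
                for suffix, ns in zone.get("referrals", {}).items()
                if name == suffix or name.endswith("." + suffix)
            ),
            None,
        )
        if next_server is None:
            raise LookupError(f"no answer or referral for {name} {rtype}")
        server = next_server


answer, asked = resolve("mailerlite.com.", "A")
print(answer, "after asking", len(asked), "servers:", asked)
```

Each entry in the list of servers asked stands for a network round trip, which is why an uncached lookup is so much slower than a cached one.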

That is where the caching part of Unbound comes in. If the query has been asked before and the record is still cached, Unbound responds from its cache; if not, it resolves the query recursively and caches the result. This saves a lot of time: cached results come back in the order of a hundred microseconds, while a recursive DNS query can take hundreds of milliseconds or more.
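The caching idea itself fits in a few lines. The sketch below is not how Unbound is implemented; it is a minimal TTL cache wrapped around a hypothetical `fake_upstream` function that stands in for the slow recursive lookup and returns a made-up address and TTL.

```python
import time


class CachingResolver:
    """Answer from cache while an entry is fresh, otherwise ask upstream."""

    def __init__(self, upstream, clock=time.monotonic):
        self._upstream = upstream  # callable: (name, rtype) -> (value, ttl_seconds)
        self._clock = clock
        self._cache = {}           # (name, rtype) -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def resolve(self, name, rtype):
        key = (name, rtype)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Cache miss or expired entry: do the slow lookup and remember it.
        self.misses += 1
        value, ttl = self._upstream(name, rtype)
        self._cache[key] = (value, now + ttl)
        return value


def fake_upstream(name, rtype):
    # Hypothetical stand-in for a recursive lookup; address and TTL are made up.
    return "192.0.2.10", 300


resolver = CachingResolver(fake_upstream)
for _ in range(5):
    resolver.resolve("mailerlite.com.", "A")
print(resolver.hits, resolver.misses)  # prints "4 1"
```

Only the first of the five lookups reaches the upstream; the other four are served from memory, which is the effect a bulk sender benefits from when many messages go to the same handful of domains.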

The following table compares response times when a query is answered from the local Unbound cache and when it is sent to publicly available DNS servers:

DNS server        Avg response   Min response   Max response
Unbound - local   160.12 µs      106.05 µs      252.481 µs
8.8.8.8           25.78 ms       8.05 ms        62.24 ms
8.8.4.4           54.64 ms       8.03 ms        355.17 ms
208.67.222.222    32.63 ms       10 ms          37.17 ms
156.154.71.1      27.20 ms       10 ms          30.11 ms
216.146.35.35     112.74 ms      10 ms          146.77 ms
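To put the table in perspective, the snippet below converts the average response times to seconds and expresses each public server as a multiple of the local Unbound average. The inputs are copied from the table; nothing here is measured by the code itself.

```python
# Average response times from the table above, in seconds.
AVG_RESPONSE_SECONDS = {
    "Unbound - local": 160.12e-6,
    "8.8.8.8": 25.78e-3,
    "8.8.4.4": 54.64e-3,
    "208.67.222.222": 32.63e-3,
    "156.154.71.1": 27.20e-3,
    "216.146.35.35": 112.74e-3,
}

local = AVG_RESPONSE_SECONDS["Unbound - local"]

# How many times slower each public server is than the local cache, on average.
slowdown = {
    server: avg / local
    for server, avg in AVG_RESPONSE_SECONDS.items()
    if server != "Unbound - local"
}

for server, factor in sorted(slowdown.items(), key=lambda item: item[1]):
    print(f"{server:<16} {factor:6.0f}x slower on average")
```

Even the fastest public resolver in the table comes out at well over a hundred times slower than a local cache hit.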

Grafana graphs:

[Figure: Cache hit/miss ratio]

This one shows the cache hit/miss ratio. Over the last 24 hours we had 432 million hits and only 9.28 million misses. That means only about 2.1% of the queries had to go out to external DNS servers, while the remaining 97.9% or so were answered from the cache in a matter of microseconds.
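The ratio is simply hits divided by total queries. Here is a small sketch using the rounded counts quoted above; because the inputs are rounded, the result may differ slightly from what the dashboard reports.

```python
def hit_ratio(hits, misses):
    """Share of queries answered from cache."""
    return hits / (hits + misses)


hits, misses = 432_000_000, 9_280_000  # rounded 24-hour counts from this post
ratio = hit_ratio(hits, misses)

print(f"hit ratio:  {ratio:.2%}")
print(f"miss ratio: {1 - ratio:.2%}")
```

With these inputs roughly 2.1% of the queries miss the cache, so only about one query in fifty pays the cost of a full external lookup.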

[Figure: Response time distribution]

[Figure: DNS query types]

Unbound is a fantastic piece of software. As you have seen, it speeds up DNS resolution considerably, which in turn increases our email throughput.
