A story about slow DNS & (a)sync dns.lookup
We were investigating a performance issue that initially looked like simple downstream slowness (i.e. a service you call becomes slower and slows you down in turn).
Following the trail of a request (using New Relic and our logs), we saw that:
- Service A calls Service B, and the average latency of the response grew to a steady 500 ms per call
- Service B reports an average latency of 150 ms per call across all the calls it receives
- Service B calls MySQL and reports an average query execution time of 145 ms
- On MySQL itself, the average query execution time is estimated at 15 ms
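To make the gaps explicit, the numbers above can be subtracted hop by hop. This is only a small arithmetic sketch; the variable and function names are made up for the illustration.

```javascript
// Average latencies from the trace above, in milliseconds.
const observed = {
  aCallingB: 500,      // measured by Service A around its call to Service B
  bHandling: 150,      // measured by Service B for the calls it receives
  bCallingMySql: 145,  // measured by Service B around its MySQL query
  mySqlExecuting: 15,  // estimated on the MySQL server itself
};

// Time on a hop that the callee does not account for:
// network, DNS, connection setup, queueing on either side.
function unexplainedMs(callerSide, calleeSide) {
  return callerSide - calleeSide;
}

const gapAtoB = unexplainedMs(observed.aCallingB, observed.bHandling);
const gapBtoMySql = unexplainedMs(observed.bCallingMySql, observed.mySqlExecuting);

console.log(`A -> B unexplained: ${gapAtoB} ms`);         // 350 ms
console.log(`B -> MySQL unexplained: ${gapBtoMySql} ms`); // 130 ms
```

Both hops lose far more time in transit than the callee spends doing the work.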
A few additional facts:
- Service-to-service calls use a DNS name
- Calls to MySQL use an IP address
- One of the traces for the service calls shows that the majority of the time was spent in dns.lookup
- The entire system became slow at certain times
Looking at the above, it was almost clear to me that DNS was to blame, since:
- DNS lookups are not cached by Node.js, and we don't have a local cache on the system
- I couldn't find any other delay between the services
- It's well known that if DNS is not behaving --> the system slowly dies ...
- I looked at a network tool and saw plenty of DNS delays and timeouts :)
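As a rough illustration of the first point, a process-local cache in front of dns.lookup could look like the sketch below. The helper name createCachedLookup, the 30-second TTL, and the injectable lookup function and clock are all assumptions made for this example and are not part of Node.js. A real cache would also need to think about record TTLs and failover.

```javascript
const dns = require('dns');

// Hypothetical helper: wraps a promise-returning lookup function with an
// in-memory cache. `ttlMs` is an arbitrary illustration value.
function createCachedLookup(lookupFn = dns.promises.lookup, ttlMs = 30000, now = Date.now) {
  const cache = new Map(); // hostname -> { expiresAt, promise }

  return function cachedLookup(hostname) {
    const hit = cache.get(hostname);
    if (hit && hit.expiresAt > now()) {
      return hit.promise; // served from memory, no new lookup is issued
    }

    const promise = lookupFn(hostname);
    cache.set(hostname, { expiresAt: now() + ttlMs, promise });

    // Do not keep failures around: drop the entry so the next call retries.
    promise.catch(() => {
      const current = cache.get(hostname);
      if (current && current.promise === promise) {
        cache.delete(hostname);
      }
    });

    return promise;
  };
}

// Possible usage (performs a real lookup, so it is left commented out):
// const cachedLookup = createCachedLookup();
// cachedLookup('example.com').then(({ address }) => console.log(address));
```

Caching the promise rather than the result means that concurrent callers asking for the same name share a single in-flight lookup.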
I read a few articles, and one of them reminded me that dns.lookup is async from the caller's point of view, but in essence it's a blocking call that gets triggered on a worker thread in Node.js (the thread pool).
Additionally, the pool size defaults to 4! This means that if we have 4 slow DNS queries running, we actually block the I/O pool and our entire Node.js service is in danger of severe latency.
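The effect can be reproduced without touching DNS at all. The sketch below uses crypto.pbkdf2 as a stand-in for a slow dns.lookup, since both run on the same thread pool, and then times a trivial pool task submitted behind the slow ones. The function names and the iteration count are assumptions chosen for the demo.

```javascript
const crypto = require('crypto');

// Occupies one pool thread for a while, standing in for a slow dns.lookup.
// The iteration count is arbitrary; it only needs to take noticeable time.
function occupyPoolThread() {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2('secret', 'salt', 200000, 64, 'sha512', (err) => {
      if (err) return reject(err);
      resolve();
    });
  });
}

// Resolves with the milliseconds between submitting a trivial pool task
// and its callback running.
function timeTrivialPoolTask() {
  const start = process.hrtime.bigint();
  return new Promise((resolve, reject) => {
    crypto.randomBytes(1, (err) => {
      if (err) return reject(err);
      resolve(Number(process.hrtime.bigint() - start) / 1e6);
    });
  });
}

async function demo(slowTasks) {
  const idleMs = await timeTrivialPoolTask();     // pool is free
  const busy = Array.from({ length: slowTasks }, occupyPoolThread);
  const queuedMs = await timeTrivialPoolTask();   // submitted behind the slow tasks
  await Promise.all(busy);
  return { idleMs, queuedMs };
}

// Possible usage:
// demo(4).then((result) => console.log(result));
```

With the default pool size, calling demo(4) should typically report a queuedMs far larger than idleMs, because the trivial task has to wait for a free thread. The pool size can be changed with the UV_THREADPOOL_SIZE environment variable, set before the process starts.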
<Party-Pooper> Then a friend came to me and reminded me that we don't use DNS when calling the DB <Party-Pooper />
All the above 'facts' lead me to believe that we are dealing with a slow network problem (potentially some device in the middle is misbehaving).
The next step is to work with a network expert and find the culprit.
* The default of 4 looks good as a starting point, but I am sure that we can increase our system performance if we monitor and tweak it
* Does anyone know how to monitor the thread queue length?
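Assuming the queue length itself is not directly available from JavaScript, one rough proxy is to periodically submit a trivial thread-pool task and measure how long it takes to come back. The sketch below does that with crypto.randomBytes. The helper names, the interval, the threshold, and the report callback are all illustrative assumptions, and the measured delay is only an indirect signal, not the queue length.

```javascript
const crypto = require('crypto');

// Hypothetical helper: measures how long a trivial thread-pool task takes
// from submission until its callback runs, in milliseconds.
function samplePoolDelayMs() {
  const start = process.hrtime.bigint();
  return new Promise((resolve, reject) => {
    crypto.randomBytes(1, (err) => {
      if (err) return reject(err);
      resolve(Number(process.hrtime.bigint() - start) / 1e6);
    });
  });
}

// Samples every `intervalMs` and reports any sample above `thresholdMs`.
// Returns a function that stops the monitor.
function startPoolDelayMonitor({ intervalMs = 1000, thresholdMs = 50, report = console.warn } = {}) {
  const timer = setInterval(async () => {
    try {
      const delayMs = await samplePoolDelayMs();
      if (delayMs > thresholdMs) {
        report(`thread pool delay ${delayMs.toFixed(1)} ms`);
      }
    } catch (err) {
      report(`thread pool probe failed: ${err.message}`);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return () => clearInterval(timer);
}

// Possible usage:
// const stop = startPoolDelayMonitor({ thresholdMs: 20 });
// ...later: stop();
```

A delay that climbs while the event loop itself stays responsive would point at the pool being saturated, which is the situation described above.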
Reviewed by Ran Davidovitz on 4:24 PM