Understanding New Relic Physical Memory Usage

Having spent years as a Unix sysadmin, I've come to rely on two tools for a quick overview of memory usage on a system: free and top. I've used them time and time again for troubleshooting, but only recently did I realize there are differing schools of thought on how to assess a system's memory usage. I found this useful and wanted to share it. Here's what I learned.

I recently installed New Relic on one of our Logstash indexing servers and let it run for a while to collect data. Looking at the Physical Memory readout, I noticed a huge discrepancy between the used memory New Relic displayed and the used memory reported by free and top. This was really bugging me, so I started down a troubleshooting path that ended with me contacting New Relic support.

This is what New Relic’s dashboard was showing me:

[Screenshot: New Relic Physical Memory dashboard]

New Relic reports 2870MB of memory used, or 37% of total memory. Now if you look at the output of top and free, there's a big discrepancy.

Here's the top output, which shows over 7G of memory used.

[Screenshot: top output]

And here is the free output, also reporting over 7G of memory used.

[Screenshot: free output]

If you break down the memory leaderboard for this system in New Relic, you'll see that the processes all add up to the 2870MB shown in the graph above.

[Screenshot: New Relic per-process memory breakdown]

So what in the world is going on? It turns out that top, free, and New Relic's nrsysmond daemon calculate total memory usage very differently. The major difference is that top and free count memory buffers and cache as used memory, while nrsysmond does not.

Let's take a look at the formula New Relic uses to calculate memory usage on a system. The nrsysmond daemon calculates total memory usage from the data in /proc/meminfo, and a snapshot of that file on this server looks like this:

[Screenshot: /proc/meminfo snapshot]
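
/proc/meminfo is a plain-text file with one "Key: value kB" entry per line. As a rough illustration of reading it, here is a minimal Python sketch that parses that format into a dictionary of values. This is not nrsysmond's actual code, and parse_meminfo is a hypothetical helper. The sample text contains only the four fields used in the calculation below, with this server's values.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        # Most values carry a "kB" suffix (1024-byte units); a few, such as
        # HugePages_Total, are plain counts. Only the number is kept.
        fields[key.strip()] = int(parts[0])
    return fields


# Excerpt built from this server's values (all other fields omitted).
SAMPLE = """\
MemTotal:        7900768 kB
MemFree:          146072 kB
Buffers:          112300 kB
Cached:          4706812 kB
"""

meminfo = parse_meminfo(SAMPLE)
print(meminfo["MemTotal"], meminfo["Cached"])  # 7900768 4706812
```

On a live Linux host, the same function could be applied to the output of open("/proc/meminfo").read().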

Let's use these numbers to calculate memory usage the way New Relic does. I've highlighted the values above that are used in this formula:

System/Memory/Used/Bytes = Memory Used - Kern
System Memory Used (%) = (Memory Used - Kern) / MemTotal × 100

This can be broken down into:

Memory Used = MemTotal - MemFree
Kern = Buffers + Cached + [SReclaimable]**

**Note: SReclaimable is included in this formula only if you have the following setting in your nrsysmond.cfg file:

ignore_reclaimable = true
We have this option set to false, so we won't include SReclaimable below. This is a somewhat confusing setting, but this excerpt from the New Relic documentation should clarify it:

When set to the default of false, the agent treats reclaimable memory as in-use. If set to true, the agent reports reclaimable memory as free.

It's worth mentioning that the default for this setting is true on nrsysmond version 2.1.1 and later.
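
To make the formula and the ignore_reclaimable behavior concrete, here is a hedged Python sketch of the calculation as described above. The function names are hypothetical, and the ignore_reclaimable parameter only mirrors the nrsysmond.cfg setting. This is not New Relic's implementation. The post does not quote SReclaimable, so the value used in the example is a made-up placeholder.

```python
def new_relic_used_kb(meminfo, ignore_reclaimable=False):
    """Used memory in kB, per the formula above: (MemTotal - MemFree) - Kern."""
    memory_used = meminfo["MemTotal"] - meminfo["MemFree"]
    kern = meminfo["Buffers"] + meminfo["Cached"]
    if ignore_reclaimable:
        # Mirrors ignore_reclaimable = true: reclaimable slab is treated as free.
        kern += meminfo.get("SReclaimable", 0)
    return memory_used - kern


def new_relic_used_percent(meminfo, ignore_reclaimable=False):
    """Used memory as a percentage of MemTotal."""
    used = new_relic_used_kb(meminfo, ignore_reclaimable)
    return 100.0 * used / meminfo["MemTotal"]


# Values from this server; SReclaimable is hypothetical (not given in the post).
meminfo = {
    "MemTotal": 7900768,
    "MemFree": 146072,
    "Buffers": 112300,
    "Cached": 4706812,
    "SReclaimable": 200000,  # hypothetical placeholder value
}

print(f"{new_relic_used_percent(meminfo):.2f}")  # 37.16 (ignore_reclaimable = false)
print(new_relic_used_kb(meminfo, ignore_reclaimable=True))  # 2735584
```

With the setting left at false, SReclaimable is ignored and the result matches the dashboard's 37%. Turning it on can only lower the reported usage, because more memory is counted as free.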

Plugging our /proc/meminfo values (reported in kB) into New Relic's formula gives:

Memory Used = 7,900,768 - 146,072 = 7,754,696 kB
Kern = 112,300 + 4,706,812 = 4,819,112 kB
System Memory Used = 7,754,696 - 4,819,112 = 2,935,584 kB (about 2,867 MB)
System Memory Used (%) = 2,935,584 / 7,900,768 × 100 = 37.16%
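
The same numbers also reconcile the two screenshots once you account for units. /proc/meminfo's "kB" means 1024-byte units. The short Python check below converts both definitions of "used": the one top and free applied on this server, and the one nrsysmond uses. The small gap between roughly 2867 MiB and the dashboard's 2870MB is presumably because the dashboard and the meminfo snapshot were captured at slightly different moments. That is an assumption, not something confirmed in the data.

```python
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024

# Values from this server's /proc/meminfo, in kB (1024-byte units).
mem_total, mem_free, buffers, cached = 7900768, 146072, 112300, 4706812

# "Used" as top and free reported it here: buffers and cache count as used.
free_style_used = mem_total - mem_free
# "Used" as nrsysmond computes it: buffers and cache are subtracted out.
new_relic_used = free_style_used - (buffers + cached)

print(f"top/free used:  {free_style_used / KIB_PER_GIB:.2f} GiB")  # 7.40 GiB
print(f"New Relic used: {new_relic_used / KIB_PER_MIB:.0f} MiB")   # 2867 MiB
```

The first figure lines up with the "over 7G" in the top and free output, and the second with New Relic's 2870MB.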

This matches the memory reading on our New Relic dashboard and clears up the mystery. Whether or not you agree with this way of calculating total memory usage on a server, understanding how different tools arrive at their numbers can offer a new perspective on monitoring memory on your servers.