Out of Memory regularly on 2.0.1 and 2.1.1

haakonl · February 6, 2014, 9:19am

Hi,

We have a cluster where the nodes fail regularly because of out-of-memory crashes. In the console, we find this message in the logs:

Control connection to memcached on 'ns_1@10.45.11.215' disconnected: {{badmatch,{error,closed}},
 [{mc_client_binary,cmd_binary_vocal_recv,5},
  {mc_client_binary,select_bucket,2},
  {ns_memcached,ensure_bucket,2},
  {ns_memcached,handle_info,2},
  {gen_server,handle_msg,5},
  {ns_memcached,init,1},
  {gen_server,init_it,6},
  {proc_lib,init_p_do_apply,3}]}

And in the syslog on the server we find this:

Out of memory: Kill process 3264 (memcached) score 542 or sacrifice child
Killed process 3264 (memcached) total-vm:9536608kB, anon-rss:4115460kB, file-rss:2076kB
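
The kB counters in that kernel line are easier to compare with a node's RAM once converted to GiB. The following is only a minimal sketch: `parseOomLine` and `kbToGib` are hypothetical helper names, and it assumes the kernel's "kB" means 1024 bytes.

```javascript
// Hypothetical helper: pulls the kB counters out of an OOM-killer "Killed process" line.
function parseOomLine(line) {
  var fields = {};
  var re = /([a-z-]+):(\d+)kB/g;
  var m;
  while ((m = re.exec(line)) !== null) {
    fields[m[1]] = Number(m[2]);
  }
  return fields;
}

// Assumption: kB here means KiB (1024 bytes), so 1 GiB = 1024 * 1024 kB.
function kbToGib(kb) {
  return kb / (1024 * 1024);
}

var line = "Killed process 3264 (memcached) total-vm:9536608kB, anon-rss:4115460kB, file-rss:2076kB";
var fields = parseOomLine(line);
console.log("total-vm (GiB):", kbToGib(fields["total-vm"]).toFixed(2));
console.log("anon-rss (GiB):", kbToGib(fields["anon-rss"]).toFixed(2));
```

Under that assumption, memcached held roughly 3.9 GiB of resident memory when it was killed.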

We’ve had this issue on both 2.0.1 and 2.1.1.

However, on one of our other clusters, also running 2.0.1, it never happens! The only difference between the 2 clusters is that the one that never fails does not have any views. The one that fails has 3 views that are updated constantly. Could this be the source of the memory leak?

Please advise.

avengedsixfold · February 6, 2014, 11:31am

Could you provide more details: cluster size, utilized RAM, and how big your view data is? Do you use stale=false with your views? How often does it crash? How many replicas do you have? Cheers!

haakonl · February 6, 2014, 11:51am

Hi,

thanks for getting back to me.

The cluster is 3 nodes with 17.5 GB RAM in total. We’re using less than 3 GB at the moment. There is 1 replica.

I’m not sure what you mean by “how big is your view data”. The view gets all objects so that the application can loop through them and destroy the objects that meet a set of criteria. At the moment the number of objects returned by the view is around 500 000, but it’s growing rapidly, since around 50 000 new objects are created every day and only around 25 000 are deleted every day.
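
To put that growth in numbers, here is a rough back-of-the-envelope sketch. It assumes the daily rates quoted above stay constant, which is an assumption and not something measured; `projectObjects` is a hypothetical helper name.

```javascript
// Rough linear projection; assumes constant daily create/delete rates (an assumption).
function projectObjects(current, createdPerDay, deletedPerDay, days) {
  return current + (createdPerDay - deletedPerDay) * days;
}

var current = 500000;      // objects returned by the view today (approximate)
var createdPerDay = 50000; // approximate
var deletedPerDay = 25000; // approximate

console.log("net growth per day:", createdPerDay - deletedPerDay);
console.log("after 30 days:", projectObjects(current, createdPerDay, deletedPerDay, 30));
```

At these rates the number of objects the nightly query has to return would double in about 20 days.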

We run a cron job every night to keep the view warm:
View.new(:game, :by_updated_at).query(stale: 'update_after', include_docs: true).count

Then a little later another cron job runs a query to get all the objects from the view:
doc = design_doc.send(@view_name, {include_docs: true, stale: false}.merge(params))

How often it crashes depends on the traffic, but it is at least 2-3 times per week. It looks like the memcached process grows constantly until it runs out of memory and is killed. It then takes up to 1 minute before the process is back up and running, and in the meantime we get thousands of emails with Couchbase errors from our error notification plugin (Airbrake.io). The errors are a mix of Couchbase::Error::TemporaryFail and Couchbase::Error::Timeout.

Thanks.

haakonl · February 6, 2014, 11:53am

I found the view size in the Couchbase console statistics: it is 5.3 MB.

avengedsixfold · February 6, 2014, 12:17pm

Only 5.3 MB? Wow, that’s low. Could you provide the view you use? How many view queries per second are you averaging? We have much larger emitted view sizes and have never fallen over on a larger cluster. How many cores do your nodes have?

haakonl · February 6, 2014, 12:22pm

Hi,

so, are we even sure this is related to the usage of views? Anyway…

There’s only 1 query per 24 hours, so that works out to about 0.0000116 queries per second (1/86 400).

Each node has 4 vCPUs and 14 ECUs (Amazon EC2).

Here is the view we’re using:

function(doc, meta) {
var Base64 = {
  // private property
  _keyStr : "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=",

  // public method for decoding
  decode : function (input) {
      var output = "";
      var chr1, chr2, chr3;
      var enc1, enc2, enc3, enc4;
      var i = 0;

      input = input.replace(/[^A-Za-z0-9\+\/\=]/g, "");

      while (i < input.length) {
          enc1 = this._keyStr.indexOf(input.charAt(i++));
          enc2 = this._keyStr.indexOf(input.charAt(i++));
          enc3 = this._keyStr.indexOf(input.charAt(i++));
          enc4 = this._keyStr.indexOf(input.charAt(i++));

          chr1 = (enc1 << 2) | (enc2 >> 4);
          chr2 = ((enc2 & 15) << 4) | (enc3 >> 2);
          chr3 = ((enc3 & 3) << 6) | enc4;

          output = output + String.fromCharCode(chr1);

          if (enc3 != 64) {
              output = output + String.fromCharCode(chr2);
          }
          if (enc4 != 64) {
              output = output + String.fromCharCode(chr3);
          }
      }
      return output;
  },

};

(function(globalScope) {
globalScope.msgpack = {
unpack: msgpackunpack, // msgpack.unpack(data:BinaryString/ByteArray):Mix
// [1][String to mix] msgpack.unpack("…") -> {}
// [2][ByteArray to mix] msgpack.unpack([…]) -> {}
};

var _bin2num    = {}, // BinaryStringToNumber   { "\00": 0, ... "\ff": 255 }
    _num2bin    = {}, // NumberToBinaryString   { 0: "\00", ... 255: "\ff" }
    _num2b64    = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
                   "abcdefghijklmnopqrstuvwxyz0123456789+/").split(""),
    _buf        = [], // decode buffer
    _idx        = 0,  // decode buffer[index]
    _error      = 0,  // msgpack.pack() error code. 1 = CYCLIC_REFERENCE_ERROR
    _isArray    = Array.isArray || (function(mix) {
                    return Object.prototype.toString.call(mix) === "[object Array]";
                  }),
    _toString   = String.fromCharCode, // CharCode/ByteArray to String
    _MAX_DEPTH  = 512;

// msgpack.unpack
function msgpackunpack(data) {
    _buf = typeof data === "string" ? toByteArray(data) : data;
    _idx = -1;
    return decode(); // mix or undefined
}

// inner - decoder
function decode() { // @return Mix:
    var size, i, iz, c, num = 0,
        sign, exp, frac, ary, hash,
        buf = _buf, type = buf[++_idx];

    if (type >= 0xe0) {             // Negative FixNum (111x xxxx) (-32 ~ -1)
        return type - 0x100;
    }
    if (type < 0xc0) {
        if (type < 0x80) {          // Positive FixNum (0xxx xxxx) (0 ~ 127)
            return type;
        }
        if (type < 0x90) {          // FixMap (1000 xxxx)
            num  = type - 0x80;
            type = 0x80;
        } else if (type < 0xa0) {   // FixArray (1001 xxxx)
            num  = type - 0x90;
            type = 0x90;
        } else { // if (type < 0xc0) {   // FixRaw (101x xxxx)
            num  = type - 0xa0;
            type = 0xa0;
        }
    }
    switch (type) {
    case 0xc0:  return null;
    case 0xc2:  return false;
    case 0xc3:  return true;
    case 0xca:  // float
                num = buf[++_idx] * 0x1000000 + (buf[++_idx] << 16) +
                                                (buf[++_idx] <<  8) + buf[++_idx];
                sign =  num & 0x80000000;    //  1bit
                exp  = (num >> 23) & 0xff;   //  8bits
                frac =  num & 0x7fffff;      // 23bits
                if (!num || num === 0x80000000) { // 0.0 or -0.0
                    return 0;
                }
                if (exp === 0xff) { // NaN or Infinity
                    return frac ? NaN : Infinity;
                }
                return (sign ? -1 : 1) *
                            (frac | 0x800000) * Math.pow(2, exp - 127 - 23); // 127: bias
    case 0xcb:  // double
                num = buf[++_idx] * 0x1000000 + (buf[++_idx] << 16) +
                                                (buf[++_idx] <<  8) + buf[++_idx];
                sign =  num & 0x80000000;    //  1bit
                exp  = (num >> 20) & 0x7ff;  // 11bits
                frac =  num & 0xfffff;       // 52bits - 32bits (high word)
                if (!num || num === 0x80000000) { // 0.0 or -0.0
                    _idx += 4;
                    return 0;
                }
                if (exp === 0x7ff) { // NaN or Infinity
                    _idx += 4;
                    return frac ? NaN : Infinity;
                }
                num = buf[++_idx] * 0x1000000 + (buf[++_idx] << 16) +
                                                (buf[++_idx] <<  8) + buf[++_idx];
                return (sign ? -1 : 1) *
                            ((frac | 0x100000) * Math.pow(2, exp - 1023 - 20) // 1023: bias
                             + num * Math.pow(2, exp - 1023 - 52));
    // 0xcf: uint64, 0xce: uint32, 0xcd: uint16
    case 0xcf:  num =  buf[++_idx] * 0x1000000 + (buf[++_idx] << 16) +
                                                 (buf[++_idx] <<  8) + buf[++_idx];
                return num * 0x100000000 +
                       buf[++_idx] * 0x1000000 + (buf[++_idx] << 16) +
                                                 (buf[++_idx] <<  8) + buf[++_idx];
    case 0xce:  num += buf[++_idx] * 0x1000000 + (buf[++_idx] << 16);
    case 0xcd:  num += buf[++_idx] << 8;
    case 0xcc:  return num + buf[++_idx];
    // 0xd3: int64, 0xd2: int32, 0xd1: int16, 0xd0: int8
    case 0xd3:  num = buf[++_idx];
                if (num & 0x80) { // sign -> avoid overflow
                    return ((num         ^ 0xff) * 0x100000000000000 +
                            (buf[++_idx] ^ 0xff) *   0x1000000000000 +
                            (buf[++_idx] ^ 0xff) *     0x10000000000 +
                            (buf[++_idx] ^ 0xff) *       0x100000000 +
                            (buf[++_idx] ^ 0xff) *         0x1000000 +
                            (buf[++_idx] ^ 0xff) *           0x10000 +
                            (buf[++_idx] ^ 0xff) *             0x100 +
                            (buf[++_idx] ^ 0xff) + 1) * -1;
                }
                return num         * 0x100000000000000 +
                       buf[++_idx] *   0x1000000000000 +
                       buf[++_idx] *     0x10000000000 +
                       buf[++_idx] *       0x100000000 +
                       buf[++_idx] *         0x1000000 +
                       buf[++_idx] *           0x10000 +
                       buf[++_idx] *             0x100 +
                       buf[++_idx];
    case 0xd2:  num  =  buf[++_idx] * 0x1000000 + (buf[++_idx] << 16) +
                       (buf[++_idx] << 8) + buf[++_idx];
                return num < 0x80000000 ? num : num - 0x100000000; // 0x80000000 * 2
    case 0xd1:  num  = (buf[++_idx] << 8) + buf[++_idx];
                return num < 0x8000 ? num : num - 0x10000; // 0x8000 * 2
    case 0xd0:  num  =  buf[++_idx];
                return num < 0x80 ? num : num - 0x100; // 0x80 * 2
    // 0xdb: raw32, 0xda: raw16, 0xa0: raw ( string )
    case 0xdb:  num +=  buf[++_idx] * 0x1000000 + (buf[++_idx] << 16);
    case 0xda:  num += (buf[++_idx] << 8)       +  buf[++_idx];
    case 0xa0:  // utf8.decode
                for (ary = [], i = _idx, iz = i + num; i < iz; ) {
                    c = buf[++i]; // lead byte
                    ary.push(c < 0x80 ? c : // ASCII(0x00 ~ 0x7f)
                             c < 0xe0 ? ((c & 0x1f) <<  6 | (buf[++i] & 0x3f)) :
                                        ((c & 0x0f) << 12 | (buf[++i] & 0x3f) << 6
                                                          | (buf[++i] & 0x3f)));
                }
                _idx = i;
                return ary.length < 10240 ? _toString.apply(null, ary)
                                          : byteArrayToByteString(ary);
    // 0xdf: map32, 0xde: map16, 0x80: map
    case 0xdf:  num +=  buf[++_idx] * 0x1000000 + (buf[++_idx] << 16);
    case 0xde:  num += (buf[++_idx] << 8)       +  buf[++_idx];
    case 0x80:  hash = {};
                while (num--) {
                    // make key/value pair
                    size = buf[++_idx] - 0xa0;

                    for (ary = [], i = _idx, iz = i + size; i < iz; ) {
                        c = buf[++i]; // lead byte
                        ary.push(c < 0x80 ? c : // ASCII(0x00 ~ 0x7f)
                                 c < 0xe0 ? ((c & 0x1f) <<  6 | (buf[++i] & 0x3f)) :
                                            ((c & 0x0f) << 12 | (buf[++i] & 0x3f) << 6
                                                              | (buf[++i] & 0x3f)));
                    }
                    _idx = i;
                    hash[_toString.apply(null, ary)] = decode();
                }
                return hash;
    // 0xdd: array32, 0xdc: array16, 0x90: array
    case 0xdd:  num +=  buf[++_idx] * 0x1000000 + (buf[++_idx] << 16);
    case 0xdc:  num += (buf[++_idx] << 8)       +  buf[++_idx];
    case 0x90:  ary = [];
                while (num--) {
                    ary.push(decode());
                }
                return ary;
    }
    return;
}

// inner - byteArray To ByteString
function byteArrayToByteString(byteArray) {
    try {
        return _toString.apply(this, byteArray); // toString
    } catch(err) {
        ; // avoid "Maximum call stack size exceeded"
    }
    var rv = [], i = 0, iz = byteArray.length, num2bin = _num2bin;

    for (; i < iz; ++i) {
        rv[i] = num2bin[byteArray[i]];
    }
    return rv.join("");
}

// inner - BinaryString To ByteArray
function toByteArray(data) {
    var rv = [], bin2num = _bin2num, remain,
        ary = data.split(""),
        i = -1, iz;

    iz = ary.length;
    remain = iz % 8;

    while (remain--) {
        ++i;
        rv[i] = bin2num[ary[i]];
    }
    remain = iz >> 3;
    while (remain--) {
        rv.push(bin2num[ary[++i]], bin2num[ary[++i]],
                bin2num[ary[++i]], bin2num[ary[++i]],
                bin2num[ary[++i]], bin2num[ary[++i]],
                bin2num[ary[++i]], bin2num[ary[++i]]);
    }
    return rv;
}

// --- init ---
(function() {
    var i = 0, v;

    for (; i < 0x100; ++i) {
        v = _toString(i);
        _bin2num[v] = i; // "\00" -> 0x00
        _num2bin[i] = v; //     0 -> "\00"
    }
    for (i = 0x80; i < 0x100; ++i) { // [Webkit][Gecko]
        _bin2num[_toString(0xf700 + i)] = i; // "\f780" -> 0x80
    }
})();

})(this);
var decoded = Base64.decode(doc);
var unpacked = this.msgpack.unpack(decoded);
if (unpacked && typeof(unpacked) === 'object') {
  if (unpacked.hasOwnProperty('creator')) { // is a game
    emit(unpacked.updated_at, null);
  }
}
}

avengedsixfold · February 6, 2014, 2:13pm

I’m not sure the views cause it, but since they are the main difference at the moment, I thought it would be good to look. That is by far the most complex map function I have ever seen in Couchbase. I will have to look at it a bit. If you have a support license, I’d recommend filing a ticket.

haakonl · February 6, 2014, 2:14pm

Haha. Well, just remove the MessagePack bits, and it’s suddenly super simple.
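
As a rough illustration of that point, the sketch below shows the map logic with the decoding factored out. `decodeDoc` is a hypothetical stand-in for the Base64 and MessagePack decoding in the view above (here it simply assumes the document is already a plain object), and `emit` is stubbed so the snippet can run outside Couchbase, where the view engine normally provides it. The sample documents are made up.

```javascript
// Hypothetical stand-in for Base64.decode + msgpack.unpack from the view above.
// Assumption: the document is already a plain object.
function decodeDoc(doc) {
  return doc;
}

// Stub that collects emitted rows; inside Couchbase, emit() is provided by the view engine.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map logic that remains once the decoding code is set aside.
function map(doc, meta) {
  var unpacked = decodeDoc(doc);
  if (unpacked && typeof unpacked === "object") {
    if (unpacked.hasOwnProperty("creator")) { // is a game
      emit(unpacked.updated_at, null);
    }
  }
}

// Made-up sample documents: only the first has a "creator" property.
map({ creator: "user_1", updated_at: 1391688000 }, { id: "game_1" });
map({ updated_at: 1391688001 }, { id: "not_a_game" });
console.log(rows);
```

Only the document with a `creator` property produces a row, keyed by its `updated_at` value.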
