Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Duplicate
Affects Version/s: 2.0
Fix Version/s: 2.0.1
Component/s: couchbase-bucket
Security Level: Public
Labels:
Environment: Linux
Description
With 5 buckets on the same server node, even without any client workload, memcached uses about 50% of CPU, which means it keeps 2 CPUs busy on a 4-CPU box.
Inspecting the process with gdb shows that stats.numRemainingBgJobs is not updated correctly.
Here is a snapshot of the perf output:
19.88% memcached ep.so [.] VBucketMap::getBucket(unsigned short) const
12.31% memcached ep.so [.] BgFetcher::run(SingleThreadedRCPtr<Task>&)
11.81% memcached libpthread-2.12.so [.] pthread_mutex_lock
9.44% memcached ep.so [.] SpinLock::acquire()
6.60% memcached libpthread-2.12.so [.] pthread_mutex_unlock
6.39% memcached ep.so [.] VBucket::getBGFetchItems(std::tr1::unordered_map<unsigned long, std::list<VBucketBGFetchItem*, std::allocator<VBucketBGFetchItem*> >, std::tr1::hash<u
4.59% memcached ep.so [.] Mutex::release()
2.15% memcached ep.so [.] SpinLock::release()
1.80% memcached ep.so [.] Dispatcher::moveReadyTasks(timeval const&)
1.73% memcached ep.so [.] Mutex::acquire()
1.49% memcached ep.so [.] SpinLock::~SpinLock()
There are two problems. The first is the use of items2fetch.size(). When an element is deleted from items2fetch, the size changes, so the logic in clearItems() is wrong:
#1 0x00007f9e35eb6e0b in BgFetcher::clearItems (this=0x7f9e2003ae50, vbId=35) at src/bgfetcher.cc:71
71 vb_bgfetch_queue_t::iterator itr = items2fetch.begin();
(gdb) p items2fetch.size()
$1 = 1
(gdb) c
Continuing.
Breakpoint 3, BgFetcher::clearItems (this=0x7f9e2003ae50, vbId=35) at src/bgfetcher.cc:85
85 delete *dItr;
(gdb) n
86 assert(items2fetch_size == items2fetch.size());
(gdb) p items2fetch.size()
$2 = 1
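For illustration, here is a minimal sketch of one way to keep a count consistent while freeing queued items: count the items as they are deleted instead of relying on the container's size(). This is not the ep-engine code. FetchItem, FetchQueue and clearAll are hypothetical stand-ins, and the real vb_bgfetch_queue_t layout is only assumed to be a map from an id to a list of item pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical stand-ins for illustration; not the real ep-engine types.
struct FetchItem { int payload; };
using FetchQueue = std::unordered_map<std::uint64_t, std::list<FetchItem*>>;

// Frees every queued item and empties the map.
// Returns the number of items freed, counted while iterating, so the caller
// does not depend on size(), which counts map keys rather than queued items.
inline std::size_t clearAll(FetchQueue &q) {
    std::size_t freed = 0;
    for (auto &entry : q) {
        for (FetchItem *item : entry.second) {
            delete item;   // the list still holds the (now dangling) pointer
            ++freed;
        }
        entry.second.clear();  // drop the dangling pointers
    }
    q.clear();                 // only now does the map's size change
    assert(q.empty());
    return freed;
}
```

Note that deleting the pointed-to item does not change the size of the list or the map; only erase() or clear() does. A caller could subtract the returned count from its own job counter.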
The second problem is the step that updates numRemainingBgJobs. There is a race: the bgfetcher can run after an item has been put on the bgfetcher queue but before numRemainingBgJobs has been updated. The value can then reach -1, which is the maximum value on the Linux box I tested. On the Linux version I am testing, the later ++ operation brings the value back, but this is not guaranteed across all OSes.
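The ordering problem can be sketched as follows. This is an illustration under assumptions, not the actual fix: JobCounter is a hypothetical stand-in for stats.numRemainingBgJobs (that the real counter is unsigned is inferred from "-1 which is max value"), and the function names are invented here. The first function replays the bad interleaving in a single thread; the other two show the counter being incremented before the item becomes visible.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <limits>

// Hypothetical stand-in for stats.numRemainingBgJobs; the real type is an assumption.
using JobCounter = std::atomic<std::size_t>;

// Replays the bad interleaving deterministically: the fetcher completes the
// job (decrement) before the producer has counted it (increment).
// Returns the value observed in between.
inline std::size_t decrementBeforeIncrement(JobCounter &c) {
    c.fetch_sub(1);                 // fetcher ran first: 0 wraps around
    std::size_t seen = c.load();    // transiently the maximum value
    c.fetch_add(1);                 // producer's late increment
    return seen;
}

// Count the job first, then publish it, so a consumer can never see an
// item that has not been counted yet. The deque is not thread-safe; real
// code would hold the queue lock around these operations.
inline void enqueueCounted(JobCounter &c, std::deque<int> &queue, int item) {
    c.fetch_add(1);
    queue.push_back(item);
}

// Complete one job if any is queued; the counter is only decremented for
// an item that was actually taken off the queue.
inline bool completeOne(JobCounter &c, std::deque<int> &queue) {
    if (queue.empty()) return false;
    queue.pop_front();
    c.fetch_sub(1);
    return true;
}
```

With the increment ordered before the enqueue, the counter may briefly overstate the pending work but never drops below zero, so anything that polls it for "jobs remaining" does not see a huge bogus value.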