Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.0-beta
- Fix Version/s: 2.0-beta-2
- Component/s: couchbase-bucket
- Security Level: Public
- Labels:
- Environment: CentOS 6.2 64-bit on EC2, build 2.0.0-1808
Description
Create a 6-node cluster with Couchbase Server 2.0.0-1808 installed. Consistent view is disabled.
Each node has 14 GB RAM and 2 EBS volumes, one for /data and another for /view.
Create 2 buckets and load 9 million items into each bucket.
Create a doc for each bucket.
When adding 2 nodes to the cluster, rebalance failed because one node went down (segfault bug MB-6638).
Stop all loads and restart Couchbase Server on the down node.
During rebalance, got a write commit failed error. The memcached logs show the following errors:
memcached.log.3.txt:Tue Oct 9 00:18:58.901338 UTC 3: Warning: couchstore_open_db failed, name=/data/saslbucket/122.couch.1 option=0 rev=1 retried=2 error=no such file [none]
memcached.log.3.txt:Tue Oct 9 00:18:58.901368 UTC 3: Warning: failed to open database, vbucketId = 122 fileRev = 1 numDocs = 68
memcached.log.3.txt:Tue Oct 9 00:18:58.901378 UTC 3: Warning: commit failed, cannot save CouchDB docs for vbucket = 122 rev = 1
memcached.log.3.txt:Tue Oct 9 00:18:58.904685 UTC 3: Warning: couchstore_open_db failed, name=/data/saslbucket/55.couch.1 option=0 rev=1 retried=2 error=no such file [none]
memcached.log.3.txt:Tue Oct 9 00:18:58.904700 UTC 3: Warning: failed to open database, vbucketId = 55 fileRev = 1 numDocs = 12
memcached.log.3.txt:Tue Oct 9 00:18:58.904708 UTC 3: Warning: commit failed, cannot save CouchDB docs for vbucket = 55 rev = 1
memcached.log.3.txt:Tue Oct 9 00:23:24.492253 UTC 3: Warning: couchstore_open_db failed, name=/data/saslbucket/344.couch.1 option=0 rev=1 retried=2 error=no such file [none]
memcached.log.3.txt:Tue Oct 9 00:23:24.492286 UTC 3: Warning: failed to open database, vbucketId = 344 fileRev = 1 numDocs = 67
memcached.log.3.txt:Tue Oct 9 00:23:24.492295 UTC 3: Warning: commit failed, cannot save CouchDB docs for vbucket = 344 rev = 1
memcached.log.3.txt:Tue Oct 9 00:23:40.153322 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/499.couch.1
memcached.log.3.txt:Tue Oct 9 00:23:40.153668 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/499.couch.1
* The database files exist on the node:
[root@ip-10-248-109-239 logs]# ls /data/saslbucket/ | grep 499
499.couch.1
[root@ip-10-248-109-239 logs]# ls /data/saslbucket/ | grep 122
122.couch.1
Will add collect info later
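The manual check above (grepping the data directory for each file named in a warning) can be scripted. The following is a minimal sketch, assuming the two warning shapes quoted in this report; the function names `missing_file_paths` and `check_paths` are hypothetical helpers and not part of any Couchbase tooling. Note that it only reports whether a file is present at the time the script runs, which may differ from the state at the time of the warning.

```python
import os
import re

# Patterns for the two warning shapes quoted in this report (assumed stable).
OPEN_FAILED = re.compile(r"couchstore_open_db failed, name=(\S+)")
DELETE_FAILED = re.compile(r"cannot locate database file (\S+)")


def missing_file_paths(log_lines):
    """Return the unique database paths named in the warnings, in first-seen order."""
    paths = []
    for line in log_lines:
        match = OPEN_FAILED.search(line) or DELETE_FAILED.search(line)
        if match and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths


def check_paths(paths, exists=os.path.exists):
    """Map each path to whether it is present on disk right now."""
    return {path: exists(path) for path in paths}


if __name__ == "__main__":
    # Two lines copied from the log excerpt in this report.
    sample = [
        "memcached.log.3.txt:Tue Oct 9 00:18:58.901338 UTC 3: Warning: "
        "couchstore_open_db failed, name=/data/saslbucket/122.couch.1 "
        "option=0 rev=1 retried=2 error=no such file [none]",
        "memcached.log.3.txt:Tue Oct 9 00:23:40.153322 UTC 3: Warning: "
        "failed to delete data, cannot locate database file "
        "/data/saslbucket/499.couch.1",
    ]
    for path, present in check_paths(missing_file_paths(sample)).items():
        print(path, "exists" if present else "missing")
```

Run on the affected node against the full memcached log, this would list every file the engine reported as missing alongside its current on-disk status.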
Activity
Karan Kumar
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Summary | [system test] write commit failed | [system test] Disk write commit failure during rebalance |
Karan Kumar
made changes -
| Assignee | Karan Kumar [ karan ] | Chiyoung Seo [ chiyoung ] |
Chiyoung Seo
made changes -
| Assignee | Chiyoung Seo [ chiyoung ] | Jin Lim [ jin ] |
Chiyoung Seo
made changes -
| Fix Version/s | 2.0-beta-2 [ 10385 ] | |
| Affects Version/s | 2.0-beta [ 10113 ] | |
| Sprint Status | Current Sprint |
Jin Lim
made changes -
| Assignee | Jin Lim [ jin ] | Thuan Nguyen [ thuan ] |
Thuan Nguyen
made changes -
| Attachment | memcahced-node-109-80-build-1808-disk-write-commit-failed-20121008.tgz [ 15323 ] | |
| Attachment | memcahced-node-109-239-build-1808-disk-write-commit-failed-20121008.tgz [ 15324 ] |
Chiyoung Seo
made changes -
| Assignee | Thuan Nguyen [ thuan ] | Jin Lim [ jin ] |
Jin Lim
made changes -
| Assignee | Jin Lim [ jin ] | Thuan Nguyen [ thuan ] |
Thuan Nguyen
made changes -
| Summary | [system test] Disk write commit failure during rebalance | [system test] failed to delete data, cannot locate database file |
Thuan Nguyen
made changes -
| Assignee | Thuan Nguyen [ thuan ] | Jin Lim [ jin ] |
Jin Lim
made changes -
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Resolution | Fixed [ 1 ] |
Chiyoung Seo
made changes -
| Sprint Status | Current Sprint |
Farshid Ghods
made changes -
| Status | Resolved [ 5 ] | Closed [ 6 ] |
Let me know if this does not fall into ep_engine project. I will re-assign it accordingly.