[MB-7337] [system test] node shown as pending for a long time after index path change Created: 03/Dec/12  Updated: 05/Dec/12  Resolved: 05/Dec/12

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.0
Fix Version/s: 2.0.1
Security Level: Public

Type: Bug Priority: Critical
Reporter: Thuan Nguyen Assignee: Ketaki Gangal
Resolution: Fixed Votes: 0
Labels: system-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows 2008 R2 64bit

Attachments: File 6337.tar    

 Description   
Online upgrade of a 5-node cluster from 1.8.1
The cluster has one default bucket with 20 million items. Data path is set to c:/data
10.3.2.11
10.3.2.12
10.3.2.16
10.3.2.10
10.3.2.75

to 2.0.0-1971
10.3.2.11 (data path and index path are set to the default paths when installing 2.0.0-1971)
10.3.2.16
10.3.2.75
10.3.2.76
10.3.2.77

Change the index path on node 11 to a new path (c:/index); Couchbase Server on node 11 restarts.

curl -i -v --data "index_path=c:/index" "http://Administrator:password@10.3.2.11:8091/nodes/self/controller/settings"
* About to connect() to 10.3.2.11 port 8091 (#0)
* Trying 10.3.2.11... Connection refused
* couldn't connect to host
* Closing connection #0
curl: (7) couldn't connect to host

I tried again with a cygwin-style path:

 curl -i -v --data "index_path=/cygdrive/c/index" "http://Administrator:password@10.3.2.11:8091/nodes/self/controller/settings"
* About to connect() to 10.3.2.11 port 8091 (#0)
* Trying 10.3.2.11... connected
* Connected to 10.3.2.11 (10.3.2.11) port 8091 (#0)
* Server auth using Basic with user 'Administrator'
> POST /nodes/self/controller/settings HTTP/1.1
> Authorization: Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==
> User-Agent: curl/7.21.3 (x86_64-pc-linux-gnu) libcurl/7.21.3 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
> Host: 10.3.2.11:8091
> Accept: */*
> Content-Length: 28
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 400 Bad Request
HTTP/1.1 400 Bad Request
< Server: Couchbase Server 2.0.0-1971-rel-enterprise
Server: Couchbase Server 2.0.0-1971-rel-enterprise
< Pragma: no-cache
Pragma: no-cache
< Date: Mon, 03 Dec 2012 21:14:06 GMT
Date: Mon, 03 Dec 2012 21:14:06 GMT
< Content-Type: application/json
Content-Type: application/json
< Content-Length: 47
Content-Length: 47
< Cache-Control: no-cache
Cache-Control: no-cache

<
* Connection #0 to host 10.3.2.11 left intact
* Closing connection #0
["An absolute path is required for index_path"]
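
One possible reading of the 400 response above: under Windows rules, a path that starts with a slash but carries no drive letter is rooted but not fully absolute. The sketch below uses Python's pathlib purely as a stand-in to illustrate that distinction. It is not the check ns_server performs, and `is_absolute_for` and its "windows"/"posix" labels are hypothetical names chosen for this example.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_absolute_for(platform: str, path: str) -> bool:
    """Return True if `path` is absolute under the given platform's path rules.

    `platform` is "windows" or "posix" (labels invented for this sketch).
    """
    flavour = PureWindowsPath if platform == "windows" else PurePosixPath
    return flavour(path).is_absolute()


# The three index paths that appear in this ticket.
for platform, path in [
    ("windows", "c:/index"),
    ("windows", "/cygdrive/c/index"),
    ("posix", "/data"),
]:
    print(platform, path, is_absolute_for(platform, path))
```

Under these rules c:/index and /data count as absolute, while /cygdrive/c/index does not, which would be consistent with the error message if the server applies a similar test.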

The log page shows Couchbase Server restarting on node 11:

Couchbase Server has started on web port 8091 on node 'ns_1@10.3.2.11'. menelaus_sup001 ns_1@10.3.2.11 13:12:38 - Mon Dec 3, 2012
Shutting down bucket "default" on 'ns_1@10.3.2.11' for server shutdown ns_memcached002 ns_1@10.3.2.11 13:09:28 - Mon Dec 3, 2012
Setting database directory path to c:/Program Files/Couchbase/Server/var/lib/couchbase/data and index directory path to c:/index ns_storage_conf000 ns_1@10.3.2.11 13:09:28 - Mon Dec 3, 2012
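
When comparing several reproductions, it can help to pull the data and index paths out of the "Setting database directory path" entries automatically. A minimal sketch, assuming the log line format shown above; `parse_path_change` is a hypothetical helper and the pattern is inferred only from the lines in this ticket.

```python
import re

# Pattern inferred from the ns_storage_conf log lines quoted in this ticket.
# Assumption: the index path contains no spaces (the data path may).
PATH_CHANGE = re.compile(
    r"Setting database directory path to (?P<data>.+?)"
    r" and index directory path to (?P<index>\S+)"
)


def parse_path_change(line):
    """Return (data_path, index_path) for a path-change log line, else None."""
    match = PATH_CHANGE.search(line)
    if match is None:
        return None
    return match.group("data"), match.group("index")
```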

Trying to connect to memcached on node 11 hangs:

thuan@ubu-1604:/opt/couchbase/bin$ ./cbstats 10.3.2.11:11210 raw warmup
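
A check like the one above can be made to fail fast instead of hanging by probing the port with a timeout first. The sketch below is a plain TCP connect probe, not a replacement for cbstats: it only shows whether something accepts connections on the port, not whether memcached answers stats requests. `can_connect` and the 5-second default are assumptions made for this example.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Example (not executed here): can_connect("10.3.2.11", 11210)
```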




 Comments   
Comment by Thuan Nguyen [ 03/Dec/12 ]
Reproduced on Ubuntu 11.04 64-bit with Couchbase Server 2.0.0-1971.
Install Couchbase Server 2.0.0-1971 on node 10.3.2.4 and set the data and index paths to the defaults.
Create the default bucket.
Change the index path from the default to /data using the curl command:

thuan@ubu-1604:/opt/couchbase/bin$ curl -i -v --data "index_path=/data" "http://Administrator:password@10.3.2.4:8091/nodes/self/controller/settings"
* About to connect() to 10.3.2.4 port 8091 (#0)
* Trying 10.3.2.4... connected
* Connected to 10.3.2.4 (10.3.2.4) port 8091 (#0)
* Server auth using Basic with user 'Administrator'
> POST /nodes/self/controller/settings HTTP/1.1
> Authorization: Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==
> User-Agent: curl/7.21.3 (x86_64-pc-linux-gnu) libcurl/7.21.3 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
> Host: 10.3.2.4:8091
> Accept: */*
> Content-Length: 16
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Server: Couchbase Server 2.0.0-1971-rel-enterprise
Server: Couchbase Server 2.0.0-1971-rel-enterprise
< Pragma: no-cache
Pragma: no-cache
< Date: Mon, 03 Dec 2012 23:50:40 GMT
Date: Mon, 03 Dec 2012 23:50:40 GMT
< Content-Length: 0
Content-Length: 0
< Cache-Control: no-cache
Cache-Control: no-cache

<
* Connection #0 to host 10.3.2.4 left intact
* Closing connection #0

Couchbase Server shuts down, as shown in the log below.

Couchbase Server has started on web port 8091 on node 'ns_1@127.0.0.1'. menelaus_sup001 ns_1@127.0.0.1 15:50:39 - Mon Dec 3, 2012
I'm the only node, so I'm the master. mb_master000 ns_1@127.0.0.1 15:50:39 - Mon Dec 3, 2012
Shutting down bucket "default" on 'ns_1@127.0.0.1' for server shutdown ns_memcached002 ns_1@127.0.0.1 15:50:30 - Mon Dec 3, 2012
Setting database directory path to /opt/couchbase/var/lib/couchbase/data and index directory path to /data ns_storage_conf000 ns_1@127.0.0.1 15:50:30 - Mon Dec 3, 2012
Comment by Farshid Ghods (Inactive) [ 03/Dec/12 ]
This is not a blocker bug because it does not destroy any data.

We can add to the documentation that resetting the index path restarts Couchbase Server.
Comment by Aliaksey Artamonau [ 03/Dec/12 ]
It's not really because of the index path change. The problem is that we introduced a regression that kills the memcached port (not memcached itself) only after a 60-second wait. In some scenarios this could cause data loss. For instance, if someone shuts Couchbase Server down and then reboots the machine, a memcached process may still be alive at the moment of reboot, writing to the databases.
Comment by Farshid Ghods (Inactive) [ 04/Dec/12 ]
Aliaksey,

Can you confirm the expected behavior (after your fix):
1- Should Couchbase Server itself restart?
2- Should memcached restart?
3- Does this restart mccouch?
4- Are the current index files wiped out or kept as they are?
5- What happens to the ddoc definitions? Do they get copied over from the original to the new location?
6- Does this API change the index path for all nodes in the cluster, or is it per node?
Comment by Aliaksey Artamonau [ 04/Dec/12 ]
1-3. Yes, to apply path changes ns_server restarts itself entirely including memcached and mccouch.
4. Current index files are kept intact.
5. Design document definitions are stored in the master database, which is stored together with the other databases (i.e. in the database directory).
6. The API is per node.
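Since the API is per node, applying the same index path across a cluster means one request per node. The sketch below only builds the requests and does not send them, because each one would restart the node it targets; the order and pacing are left to the caller. The endpoint, port, and form field are taken from the curl commands in this ticket, while `build_index_path_request` and `build_requests_for_nodes` are hypothetical helper names.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_index_path_request(host, index_path, user, password, port=8091):
    """Build (but do not send) the POST issued by the curl commands in this ticket."""
    # Keep "/" and ":" unescaped so the body matches what curl --data sends.
    body = urlencode({"index_path": index_path}, safe="/:").encode("ascii")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = Request(
        f"http://{host}:{port}/nodes/self/controller/settings",
        data=body,
        method="POST",
    )
    request.add_header("Authorization", f"Basic {token}")
    return request


def build_requests_for_nodes(hosts, index_path, user, password):
    """Return one request per node, since the setting is applied per node."""
    return [build_index_path_request(h, index_path, user, password) for h in hosts]
```

A caller would pass each request to `urllib.request.urlopen` one node at a time and wait for that node to come back before moving on.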
Comment by Steve Yen [ 04/Dec/12 ]
http://review.couchbase.org/#/c/23020/
Comment by Steve Yen [ 04/Dec/12 ]
moved to 2.0.1 per bug-scrub.
Comment by Andrei Baranouski [ 05/Dec/12 ]
build 1974, centos 5.7

Observation when changing the index path:
Couchbase restarts and the bucket was deleted.

Couchbase Server has started on web port 8091 on node 'ns_1@127.0.0.1'. menelaus_sup001 ns_1@127.0.0.1 16:11:24 - Wed Dec 5, 2012
I'm the only node, so I'm the master. mb_master000 ns_1@127.0.0.1 16:11:24 - Wed Dec 5, 2012
Shutting down bucket "default" on 'ns_1@127.0.0.1' for deletion ns_memcached002 ns_1@127.0.0.1 16:11:16 - Wed Dec 5, 2012
Setting database directory path to /opt/couchbase/var/lib/couchbase/data and index directory path to /tmp ns_storage_conf000 ns_1@127.0.0.1 16:11:16 - Wed Dec 5, 2012
Bucket "default" loaded on node 'ns_1@127.0.0.1' in 0 seconds. ns_memcached001 ns_1@127.0.0.1 16:10:01 - Wed Dec 5, 2012
Comment by Aliaksey Artamonau [ 05/Dec/12 ]
The bucket should not be deleted when only the index path is changed. I cannot reproduce it on my system. Could you please attach logs?
Comment by Farshid Ghods (Inactive) [ 05/Dec/12 ]
Ketaki,

Please reproduce and update the logs, or pass the cluster to Aliaksey A.
Comment by Ketaki Gangal [ 05/Dec/12 ]
Hi Aliaksey,

I can repro this every time in my tests.

- Create a 3-node cluster with 2 buckets.
- Load 10k items.
- Create 1 view.
- Change the index path: curl -i -v --data "index_path=/data" "http://Administrator:password@10.1.3.176:8091/nodes/self/controller/settings"

The index path change above was made on the master node.
 
- After the index path change, there is no data or bucket on the cluster.
- ls -a on the nodes shows an empty @indexes file and an empty data dir.

[root@grape-003 couchbase]# cd data/
[root@grape-003 data]# ls
@indexes isasl.pw ns_log _replicator.couch.1 _users.couch.1
Comment by Ketaki Gangal [ 05/Dec/12 ]
Adding logs here.
Comment by Ketaki Gangal [ 05/Dec/12 ]
Opened another bug to track the behaviour: http://www.couchbase.com/issues/browse/MB-7368

Not seeing the above in current testing.
Comment by Aliaksey Artamonau [ 05/Dec/12 ]
We found that the issue was that we didn't wait for memcached termination correctly. ns_server would then start memcached again while the previous instance was still shutting down. Probably because it's Windows, no eaddrinuse errors were reported; ns_server was just unable to connect to memcached. When the old memcached instance finally died, the node returned to a good state.
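The general pattern behind this kind of fix is to wait for the old process to exit before starting its replacement, escalating to a forced kill if it outlives a grace period. The sketch below illustrates that pattern with Python's subprocess module. It is not the ns_server change (see the review link above for that); `stop_and_wait`, `restart`, and the 30-second default are assumptions made for this example.

```python
import subprocess


def stop_and_wait(proc, grace_seconds=30.0):
    """Ask `proc` to exit, wait for it, and force-kill it if the grace period expires."""
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # Reap the process so it is really gone before returning.
    return proc.returncode


def restart(proc, argv, grace_seconds=30.0):
    """Start a replacement only after the old process has fully exited."""
    stop_and_wait(proc, grace_seconds)
    return subprocess.Popen(argv)
```

Starting the replacement only after `wait` returns avoids the overlap described above, where the new instance comes up while the old one still holds its resources.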
Comment by Aliaksey Artamonau [ 05/Dec/12 ]
fix merged