[MB-7341] [2.0-hotfix candidate] Offline upgrade hangs at cbtransfer converting sqlite file to couchstorefile Created: 03/Dec/12  Updated: 28/Jan/13  Resolved: 18/Dec/12

Status: Resolved
Project: Couchbase Server
Component/s: installer
Affects Version/s: 2.0
Fix Version/s: 2.0.1
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Chisheng Hong (Inactive) Assignee: Bin Cui
Resolution: Fixed Votes: 0
Labels: 2.0-hotfix, 2.0-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Linux version 2.6.18-308.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-52))

Issue Links:
Duplicate

 Description   
The cluster has 4 nodes (http://172.23.96.12:8091) and 1 default bucket, with 40 GB of RAM allocated to the bucket on each node. The bucket holds 200M items (920 GB), and the resident ratio is as low as 8.3%.
Attempted an offline upgrade from 1.8.1 to 2.0 build 1971:

[root@thor05 ~]# rpm -U couchbase-server-enterprise_x86_64_2.0.0-1971-rel.rpm
Stopping couchbase-server ...
Stopping couchbase-server
Upgrading couchbase-server ...
  /opt/couchbase/bin/install/cbupgrade -c /opt/couchbase/var/lib/couchbase/config -a yes
Automatic mode: running without interactive questions or confirmations.
Upgrading your Couchbase Server to 2.0.0-1971-rel.
The upgrade process might take awhile.
Analysing...
Previous config.dat file is /opt/couchbase/var/lib/couchbase/config/config.dat
Target node: ns_1@172.23.96.14

Couchbase should not be running.
  Please use: /etc/init.d/couchbase-server stop

Database dir: /data2

Buckets to upgrade: default

Checking disk space available for buckets in directory:
  /data2
  Free disk bucket space wanted: 494982651904.0
  Free disk bucket space available: 570048225280
  Free disk space factor: 2.0
  Ok.

Analysis complete.

Copying /opt/couchbase/var/lib/couchbase/config/config.dat
    cp /opt/couchbase/var/lib/couchbase/config/config.dat /opt/couchbase/bin/install/../../var/lib/couchbase/config/config.dat
Copying /opt/couchbase/var/lib/couchbase/ip
    cp /opt/couchbase/var/lib/couchbase/ip /opt/couchbase/bin/install/../../var/lib/couchbase/ip
Ensuring bucket data directories.
Ensuring bucket data directory: /data2/default
    mkdir -p /data2/default
Ensuring dbdir owner/group: /data2
    chown -R couchbase:couchbase /data2
Ensuring dbdir owner/group: /opt/couchbase/var/lib/couchbase/data
    chown -R couchbase:couchbase /opt/couchbase/var/lib/couchbase/data
Upgrading buckets.
Upgrading bucket: default
    /opt/couchbase/bin/install/../cbtransfer /data2/default-data/default couchstore-files:///data2 -b default --source-vbucket-state=active --destination-vbucket-state=active
cbdbupgrade pid: 14416
  [ ] 0.0% (0/49997137 msgs)



The offline upgrade hangs at this point.
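
The disk space figures in the cbupgrade output above are consistent with a rule of the form "wanted = factor x existing data size". The sketch below illustrates that reading only. It is an assumption, not cbupgrade's actual code, and the function and parameter names are hypothetical.

```python
def disk_space_check(data_bytes, available_bytes, factor=2.0):
    """Return (wanted_bytes, ok) under an assumed 'factor x data size' rule."""
    wanted_bytes = data_bytes * factor
    return wanted_bytes, available_bytes >= wanted_bytes


# Values from the log: "wanted" 494982651904.0 with factor 2.0 implies
# 247491325952 bytes of existing bucket data on this node (an inference).
wanted, ok = disk_space_check(247491325952, 570048225280)
```

Under this reading the check passes on this node, so disk space is not what stalls the upgrade.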

 Comments   
Comment by Steve Yen [ 04/Dec/12 ]
Moved to 2.0.1 per bug scrub.

Also, recommend that users do an online upgrade instead, especially with huge data sizes and highly fragmented sqlite files.
Comment by Bin Cui [ 04/Dec/12 ]
http://review.couchbase.org/#/c/23073/

Further observation:
1. cbtransfer was not stuck, just slow to retrieve items from the sqlite files. cbtransfer retrieves items vbucket by vbucket, in reverse vbucket id order, and vbuckets 512 to 937 are all replica vbuckets, so for that whole range it retrieves no active items. Executing SQL statements against fragmented sqlite files is also quite time consuming.
2. We can optimize the sqlite access to skip a whole kv table based on vbucket id and vbucket state. If we want to retrieve active items, we can jump straight to the first table that contains active items.
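
A minimal sketch of the idea in point 2, assuming a simplified schema: a vbucket_states(vbid, state) table and one kv_<vbid> table per vbucket with columns k and v. The schema, table names, and function names here are hypothetical and chosen for illustration; this is not the change under review at the link above.

```python
import sqlite3


def vbuckets_in_state(conn, wanted_state="active"):
    """Return ids of vbuckets in the wanted state, highest id first."""
    rows = conn.execute(
        "SELECT vbid FROM vbucket_states WHERE state = ? ORDER BY vbid DESC",
        (wanted_state,),
    )
    return [vbid for (vbid,) in rows]


def iter_items(conn, wanted_state="active"):
    """Yield (vbid, key, value), querying only kv tables of matching vbuckets."""
    for vbid in vbuckets_in_state(conn, wanted_state):
        # %d only accepts integers, so the table name cannot carry injected SQL.
        for key, value in conn.execute("SELECT k, v FROM kv_%d" % vbid):
            yield vbid, key, value


def build_demo_db():
    """Create a tiny in-memory database using the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vbucket_states (vbid INTEGER PRIMARY KEY, state TEXT)")
    for vbid, state in [(0, "active"), (1, "replica"), (2, "active"), (3, "replica")]:
        conn.execute("INSERT INTO vbucket_states VALUES (?, ?)", (vbid, state))
        conn.execute("CREATE TABLE kv_%d (k TEXT, v TEXT)" % vbid)
        conn.execute("INSERT INTO kv_%d VALUES (?, ?)" % vbid, ("key%d" % vbid, state))
    return conn
```

With this shape, replica vbuckets cost one lookup in the small state table and their kv tables are never queried.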

Comment by kzeller [ 05/Dec/12 ]
Added to RN as: Performing an upgrade from Couchbase 1.8.1 to Couchbase 2.0.0
        GA build freezes while transferring data from SQLite files
        to Couchstore files using <command>cbtransfer</command>.
Comment by Steve Yen [ 05/Dec/12 ]
Hmmm, looks like I put the wrong commit message on this change, but it went in anyway...

  It's just a warning message that cbupgrade could take a while, requested by Farshid: http://review.couchbase.org/23112
Comment by Bin Cui [ 05/Dec/12 ]
On one setup, upgrading 50M items takes:

[root@thor08 data]# time /opt/couchbase/bin/install/../cbtransfer /data/thor05/default-data/default couchstore-files:///data/test
da default --source-vbucket-state=active --destination-vbucket-state=active
  [####### ] 33.6% (16797698/49997137 msgs))
  [####################] 100.0% (49997137/49997137 msgs)
bucket: default, msgs transferred...
       : total        | last         | per sec
 batch : 294023       | 294023       | 9.9
 byte  : 117679410715 | 117679410715 | 3971062.9
 msg   : 49997137     | 49997137     | 1687.1
done

real 493m54.341s
user 177m43.942s
sys 30m50.346s
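
As a sanity check on the figures above, the per-second rates can be recomputed from the totals and the wall-clock ("real") time. This is only arithmetic on the numbers in this comment; the helper names are hypothetical.

```python
def elapsed_seconds(minutes, seconds):
    """Convert a 'NNNmSS.SSSs' time reading into seconds."""
    return minutes * 60 + seconds


def rate_per_second(count, seconds):
    """Average rate over the whole run."""
    return count / seconds


real = elapsed_seconds(493, 54.341)      # real 493m54.341s
msgs = 49997137                          # msg total
total_bytes = 117679410715               # byte total

msg_rate = rate_per_second(msgs, real)   # close to the reported 1687.1 msgs/sec
byte_rate = rate_per_second(total_bytes, real)
avg_item_bytes = total_bytes / msgs      # about 2.3 KB per item
```

The recomputed rates agree with the reported ones, so the run averaged roughly 1,700 items per second over more than 8 hours.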
Comment by Phil Labee [ 21/Dec/12 ]
Attached hotfix zipfile: couchbase-server-2.0.0-HOTFIX-MB-7341.zip
Generated at Mon Jul 28 11:27:18 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.