Best way to move Couchbase Community Server v2.2.0 on Windows 2008 R2 to v5.x on Linux (CentOS 7)

Hail, experts!

I have a 4-node cluster of Couchbase 2.2.0 Community Edition (build-837) running on Windows 2008 R2.

Due to high CPU usage for half of the nodes, I would like to migrate to linux and see if I can bring down CPU utilization (or at least balance it more evenly across nodes in the cluster).

After combing through the docs, I suspect that the correct upgrade/migration path from Windows --> Linux and from 2.2.0 --> 5.1 is to do an online swap-rebalance upgrade twice, as follows:

  1. 2.2.0 (Windows) --> 4.6 (Windows)
     • Do I need to upgrade to 2.5.x first?
  2. 4.6 (Windows) --> 5.1 (Linux)
     • Can I switch from Windows to Linux in this step?
     • What extra planning or preparation is required?

However, the docs are unclear about the details and questions posed above:

The upgrade matrix for the current version of couchbase server community edition (5.1) suggests that you can upgrade from 2.5.x to any 4.x version using an online swap-rebalance upgrade or a standard online upgrade, followed by an upgrade to 5.x, but that for this last step “Services require more planning during an upgrade.” So far I have not found any indication of what that additional planning could be.

But I couldn’t find a windows version of 2.5.x to install. A little digging found this page suggesting that 2.2.0 is the latest 2.x version of the community edition, and 2.5.x is the latest 2.x version of the enterprise edition, so perhaps the docs above were written for enterprise edition, and perhaps an upgrade from 2.2.0 to any 4.x version is possible.

However, further reading about 2.2.0 community edition reveals that

It is not possible either to mix operating systems within the same cluster, or configure XDCR between clusters on different platforms. You should use same operating system on all machines within a cluster and on the same operating systems on multiple clusters if you perform XDCR between the clusters

I have not found that same restriction in any of the more recent versions of the manual (I looked at 2.5.x, 3.0.1, 3.1, 4.6, and v 5.1 (the current version)).

So again, in summary, my questions are:

  1. Do I need to upgrade from 2.2.0 to 2.5.x on Windows first, somehow?
  2. Can I run a mixed-platform cluster in version 4 and in version 5?
  3. What are the proper “extra planning” things to consider when upgrading from 4.6 (Windows) to 5.1 (Linux)?

You don’t /need/ to, but it’s generally better to update to the latest minor release before upgrading, as that’s sometimes necessary to be able to online-upgrade the major version.

However, one problem you’re going to encounter is that the replication protocol changed between v2 and v3, from TAP to DCP, and TAP support was removed in 5.0 (IIRC). As such, you’ll have to upgrade to something between v3 and v4 before you can online-upgrade to 5.0.

I generally wouldn’t suggest running a mixed (Windows / Linux) cluster. AFAIK it’s an untested configuration, and certainly not a supported EE deployment.

I think your best bet would be to set up a separate 5.x Linux cluster and then XDCR the data across from the existing 2.2 Windows cluster. While this has the downside of not being 100% transparent to your application (you’ll have to flip the app over to talk to the new cluster), I think if you can accept that requirement it’ll be much more straightforward than performing an online upgrade across 3 major versions.
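For what it's worth, the XDCR setup can be scripted against the REST API on the source cluster instead of being clicked through in the admin console. Below is a minimal sketch in Python, assuming the `/pools/default/remoteClusters` and `/controller/createReplication` endpoints on port 8091 accept the form fields shown for the versions involved (worth checking against the docs for your exact builds). The hostnames, bucket name, cluster reference name, and credentials are placeholders, and the helper names are made up for illustration. The code only builds the requests; actually sending them is left as a comment.

```python
import base64
import urllib.parse
import urllib.request


def _post(base_url, path, fields, user, password):
    """Build (but do not send) a form-encoded POST with HTTP basic auth."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    req = urllib.request.Request(base_url + path, data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req


def remote_cluster_request(source_url, user, password,
                           ref_name, target_host, target_user, target_password):
    """Request that registers the Linux cluster as a remote cluster reference."""
    fields = {
        "name": ref_name,            # placeholder label for the reference
        "hostname": target_host,     # e.g. "linux-node1:8091" (placeholder)
        "username": target_user,
        "password": target_password,
    }
    return _post(source_url, "/pools/default/remoteClusters", fields, user, password)


def replication_request(source_url, user, password, bucket, ref_name):
    """Request that starts continuous replication of one bucket."""
    fields = {
        "fromBucket": bucket,
        "toCluster": ref_name,
        "toBucket": bucket,          # assumes the bucket already exists on the target
        "replicationType": "continuous",
    }
    return _post(source_url, "/controller/createReplication", fields, user, password)


# To actually send a request (not done here):
#   urllib.request.urlopen(req, timeout=10)
```

The target bucket has to be created on the Linux cluster before the replication is started, and a trial run against a test bucket is a cheap way to confirm the two versions will talk to each other.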

Hi Dave,

Thanks for taking time to respond. Your proposed solution (a separate 5.x Linux cluster, populated via XDCR from the existing 2.2 Windows cluster) is music to my ears, as it seems like a much easier course than the one I had charted!

Since my customer’s experience hangs in the balance (and not because I doubt you!), I’d like to take a minute to just have you confirm directly that the warning in the 2.2.0 manual against XDCR from 2.2.0 windows to linux can safely be ignored if I use linux 5.1. It was pretty strong language:

I’m crossing my fingers that you didn’t overlook this when you made your recommendation – hopefully you know something about 5.1 that wasn’t knowable when the 2.2.0 manual was written, that makes your proposal safe!

Thanks again, and thanks in advance for taking time to think through my goals and evaluate my plans and help make my journey better and my customer’s experience a good one!

You should obviously test the proposed migration before deploying it in production, but I believe XDCR’ing from a Windows cluster to Linux should work.

Thanks, Dave!

Using XDCR to sync the linux cluster to the windows cluster gives me an idea.

I’m using internal DNS, and have CNAME’s in my code configuration to refer to the nodes of the cluster. So once I’ve got the windows cluster replicated to the linux cluster, my thought was to just switch the CNAMEs to point to the nodes of the linux cluster instead of the windows cluster.

If I do that, do I need to restart the app to get the SDK to look up the new IP addresses, or is there a way to tell the SDK to flush its cache and issue a new DNS lookup of the nodes in the cluster?
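Whether a running client notices the change depends on the SDK and on any resolver caching in the runtime, so it seems worth confirming what the application host itself resolves before and after the switch. Here is a small sketch using only the Python standard library. The hostname and addresses are placeholders, and `resolve` and `cutover_complete` are hypothetical helper names, not part of any Couchbase SDK.

```python
import socket


def resolve(hostname, port=8091, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses the OS resolver gives for hostname."""
    infos = lookup(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def cutover_complete(cname, new_cluster_ips, lookup=socket.getaddrinfo):
    """True when the CNAME resolves only to addresses that belong to the new cluster."""
    current = set(resolve(cname, lookup=lookup))
    return bool(current) and current <= set(new_cluster_ips)


# Example (placeholder names and addresses):
#   cutover_complete("cb1.example.internal", ["10.0.0.5", "10.0.0.6"])
```

This only reports what the operating system resolver returns on that machine; a process that has already cached the old addresses may keep using them until it is restarted.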

Thanks to all those who provided input on this post.

I discovered that I could NOT join a cluster running v2.2 with an instance of server v4.5 nor v5.1.

However, XDCR from v2.2 Windows to v4.5.1 Linux worked great. After that, all I had to do was the DNS change and voilà!

To upgrade to v5.1, I took each node of the cluster down one at a time and upgraded from v4.5 to v5.1 without removing the prior version or its config. This put v5.1 in “v4.5 compatibility mode”, and the v1.4 Java client (and 1.x .NET client) were then able to connect to v5.1, since it was running as though it were v4.5.

I decided that there was marginal value in running v5.1 as though it were v4.5, and since my approach to upgrading to v5.1 required downtime to keep it in compatibility mode, I decided not to do the final upgrade.

So in the end, the steps to success were:

  1. Update the code to use a DNS CNAME in the connect string to the v2.2 Windows server.
  2. Set up v2.2 (Windows) ----- XDCR ----> v4.5 (Linux).
  3. Change the DNS CNAME from the v2.2 (Windows) server to the v4.5 (Linux) server.
  4. Restart the app.
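One check that could sit between steps 2 and 3 is confirming that the replica has caught up before the CNAME is switched. The sketch below compares bucket item counts, assuming the bucket details returned by `GET /pools/default/buckets/<bucket>` include `basicStats.itemCount` on both versions. The function names, URLs, and credentials are placeholders for illustration, and matching counts are only a rough signal, since writes and expirations in flight can make the numbers drift.

```python
import base64
import json
import urllib.request


def fetch_bucket_info(base_url, bucket, user, password):
    """GET the bucket details document from a cluster's REST API (port 8091)."""
    req = urllib.request.Request(f"{base_url}/pools/default/buckets/{bucket}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def item_count(bucket_info):
    """Extract the item count from a parsed bucket details document."""
    return int(bucket_info["basicStats"]["itemCount"])


def in_sync(source_info, target_info, tolerance=0):
    """True when the two item counts differ by at most `tolerance` items."""
    return abs(item_count(source_info) - item_count(target_info)) <= tolerance


# Example (placeholder hosts and credentials):
#   src = fetch_bucket_info("http://win-node1:8091", "beer", "Administrator", "secret")
#   dst = fetch_bucket_info("http://linux-node1:8091", "beer", "Administrator", "secret")
#   print(in_sync(src, dst, tolerance=10))
```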

Thanks again to all who helped!