[MB-12750] XDCR ns_server Integration -- ns_server side Created: 21/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: feature-backlog
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Yu Sui Assignee: Yu Sui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: 32h
Time Spent: Not Specified
Original Estimate: 32h

Epic Link: XDCR next release

 Description   
New XDCR needs to be integrated with ns_server. This work item tracks the work that needs to be done on the ns_server side. The design doc is as follows:
https://drive.google.com/open?id=1ZbAtUWOZgBW1MXBDwQVHkuJDdKIotgFNjS7H6kZ_Nrs&authuser=0

The design doc for the work on the XDCR side is as follows:
https://drive.google.com/open?id=1TCmoBQQWiwn8qIvo2NDm-ySJfmo5JT9Zp7kvAghQz3c&authuser=0

The work items on the XDCR side are:
https://www.couchbase.com/issues/browse/MB-12586
https://www.couchbase.com/issues/browse/MB-12720
https://www.couchbase.com/issues/browse/MB-12022




[MB-12748] Add support to get rev id information via sdk Created: 21/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: clients
Affects Version/s: 3.0.1, 3.0, 3.0.2
Fix Version/s: sherlock
Security Level: Public

Type: Improvement Priority: Major
Reporter: Parag Agarwal Assignee: Matt Ingenthron
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Getting rev id information via the SDK is currently not supported.

Add support to get rev id information via the SDK.

This would be of great use to the test team, since we will be integrating the Python SDK into testrunner. I am sure Support can use it for debugging issues as well; for example, the rev id was getting reset to 1 in the Beats issue.




[MB-12747] Support multi_get and multi_set APIs in ForestDB Created: 21/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: feature-backlog
Fix Version/s: feature-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Chiyoung Seo Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
multi_get and multi_set APIs would be quite useful for applications that want to perform I/O operations in a batched manner. To make these APIs more efficient, we will consider using an asynchronous I/O library (e.g., libaio) to provide better performance and disk utilization on SSDs or on flash memory on mobile devices.




[MB-12746] Optimize fdb_iterator_seek API to avoid scanning a doc at a time Created: 21/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: techdebt-backlog
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Chiyoung Seo Assignee: Chiyoung Seo
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The fdb_iterator_seek API currently scans one doc at a time to seek to a given key, which causes significant overhead for a large iteration range. To avoid this, we need to improve the seek API so that it leverages the main index's search routines.




[MB-12745] memcached segfaults with jemalloc on a typical gnu/linux distribution Created: 21/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: memcached
Affects Version/s: sherlock
Fix Version/s: sherlock
Security Level: Public

Type: Bug Priority: Major
Reporter: Aliaksey Artamonau Assignee: Dave Rigby
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Checked on ArchLinux and on Debian GNU/Linux.

The system jemalloc library doesn't define je_-prefixed symbols:

# objdump -T /usr/lib64/libjemalloc.so.1 | grep je_
<empty output>

Instead it uses defines like this:

# define je_malloc malloc

So malloc ends up calling itself in a loop:

(gdb) bt
#0 0x000000000040965d in malloc (size=<error reading variable: Cannot access memory at address 0x7fff9fb41ff8>)
    at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:77
#1 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#2 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#3 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#4 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#5 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#6 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#7 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#8 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#9 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#10 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#11 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#12 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#13 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78
#14 0x000000000040966d in malloc (size=176) at /home/aa/dev/membase/repo30/memcached/daemon/alloc_hooks.c:78


 Comments   
Comment by Dave Rigby [ 21/Nov/14 ]
So I deliberately made our use of jemalloc use prefixed symbols. I can't really think of a straightforward way to support both at runtime - the expectation is that we will use our own build of jemalloc and not a system one - particularly as we currently require a change which isn't yet in a released version of jemalloc (https://github.com/jemalloc/jemalloc/commit/e3a16fce5eb0c62a49e751f156d040c9f77fbc23).

The simplest fix for this is to use our build of jemalloc from cbdeps (http://hub.internal.couchbase.com/confluence/display/CR/Third-party+Dependencies). It isn't yet enabled by default (I need to speak to Ceej about it), but you can enable it by adding EXTRA_CMAKE_OPTIONS=-DCB_DOWNLOAD_DEPS=1 to your make arguments.

Alternatively you can revert to using TCMalloc by passing EXTRA_CMAKE_OPTIONS=-DCOUCHBASE_MEMORY_ALLOCATOR=tcmalloc.
Comment by Dave Rigby [ 21/Nov/14 ]
Thinking about this a little more, I can probably make us fail to compile if a non-prefixed jemalloc is used, which is probably preferable to a non-working build. I'll add something next week.




[MB-12744] n1ql stats: when insert requests are run, 'updates.Count' is increased and 'inserts.Count' stays 0 Created: 21/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-alpha
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Colm Mchugh
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: 2h
Time Spent: Not Specified
Original Estimate: 2h

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Also, updates are counted as 'inserts.Count'.
The insert query is:
cbq> insert into b0 key 'k2' values 1;
{
    "request_id": "cb26a9dd-8e57-49aa-9635-8ff3b7f988e4",
    "signature": null,
    "results": [
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "2.778ms",
        "executionTime": "2.301ms",
        "resultCount": 0,
        "resultSize": 0
    }
}

stats are:
{
"updates.Count": 2,
"inserts.Count": 0,
"requests.Count": 6,
"memstats": {"Alloc":10661184,"TotalAlloc":133655136,"Sys":302942264,"Lookups":51062,"Mallocs":1901836,"Frees":1882520,"HeapAlloc":10661184,"HeapSys":27918336,"HeapIdle":15605760,"HeapInuse":12312576,"HeapReleased":9347072,"HeapObjects":19316,"StackInuse":1236992,"StackSys":1441792,"MSpanInuse":101712,"MSpanSys":524288,"MCacheInuse":2024,"MCacheSys":131072,"BuckHashSys":1439992,"NextGC":19441520,"LastGC":1416595708190486000,"PauseTotalNs":298433000,"PauseNs":[413000,3844000,7253000,6790000,4462000,6711000,6772000,5758000,4866000,6773000,6657000,7297000,7175000,7085000,7267000,7269000,7158000,7225000,7336000,7242000,10748000,15459000,19209000,18442000,19851000,20166000,13960000,5744000,7038000,5456000,9660000,7844000,10689000,8814000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"NumGC":34,"EnableGC":true,"DebugGC":false,"BySize":[{"Size":0,"Mallocs":0,"Frees":0},{"Size":8,"Mallocs":598628,"Frees":597434},{"Size":16,"Mallocs":397896,"Frees":396805},{"Size":32,"Mallocs":477517,"Frees":464386},{"Size":48,"Mallocs":19317,"Frees":18439},{"Size":64,"Mallocs":44319,"Frees":43603},{"Size":80,"Mallocs":4490,"Frees":4172},{"Size":96,"Mallocs":78852,"Frees":78660},{"Size":112,"Mallocs":19284,"Frees":19114},{"Size":128,"Mallocs":11706,"Frees":11629},{"Size":144,"Mallocs":4621,"Frees":4578},{"Size":160,"Mallocs":381,"Frees":208},{"Size":176,"Mallocs":96430,"Frees":96368},{"Size":192,"Mallocs":62814,"Frees":62553},{"Size":208,"Mallocs":10297,"Frees":10214},{"Size":224,"Mallocs":256,"Frees":232},{"Size":240,"Mallocs":9,"Frees":7},{"Size":256,"Mallocs":20738,"Frees":20402},{"Size":288,"Mallocs":28881,"Frees":28818},{"Size":320,"Mallocs":896,"Frees":887},{"Size":352,"Mallocs":423,"Frees":392},{"Size":384,"Mallocs":2923,"Frees":2901},{"Size":448,"Mallocs":8765,"Frees":8725},{"Size":512,"Mallocs":4471,"Frees":4431},{"Size":576,"Mallocs":4161,"Frees":4151},{"Size":640,"Mallocs":15,"Frees":14},{"Size":704,"Mallocs":378,"Frees":341},{"Size":768,"Mallocs":5,"Frees":5},{"Size":832,"Mallocs":12,"Frees":10},{"Size":1024,"Mallocs":408,"Frees":362},{"Size":1152,"Mallocs":26,"Frees":17},{"Size":1280,"Mallocs":3,"Frees":2},{"Size":1408,"Mallocs":182,"Frees":178},{"Size":1536,"Mallocs":343,"Frees":341},{"Size":1664,"Mallocs":17,"Frees":11},{"Size":2048,"Mallocs":144,"Frees":72},{"Size":2304,"Mallocs":167,"Frees":161},{"Size":2560,"Mallocs":7,"Frees":6},{"Size":3072,"Mallocs":0,"Frees":0},{"Size":3328,"Mallocs":7,"Frees":4},{"Size":4096,"Mallocs":737,"Frees":678},{"Size":4352,"Mallocs":5,"Frees":3},{"Size":4608,"Mallocs":2,"Frees":2},{"Size":5120,"Mallocs":155,"Frees":153},{"Size":6144,"Mallocs":38,"Frees":38},{"Size":6656,"Mallocs":17,"Frees":15},{"Size":6912,"Mallocs":0,"Frees":0},{"Size":8192,"Mallocs":274,"Frees":272},{"Size":8704,"Mallocs":2,"Frees":1},{"Size":10240,"Mallocs":5,"Frees":0},{"Size":10496,"Mallocs":0,"Frees":0},{"Size":12288,"Mallocs":157,"Frees":157},{"Size":14080,"Mallocs":4,"Frees":4},{"Size":16384,"Mallocs":93,"Frees":92},{"Size":17664,"Mallocs":249,"Frees":194},{"Size":20480,"Mallocs":4,"Frees":4},{"Size":21248,"Mallocs":0,"Frees":0},{"Size":24576,"Mallocs":15,"Frees":13},{"Size":24832,"Mallocs":0,"Frees":0
},{"Size":28672,"Mallocs":165,"Frees":153},{"Size":32768,"Mallocs":62,"Frees":55}]},
"selects.Count": 4,
"active_requests.Count": 0,
"service_time.Count": 1229260000,
"result_count.Count": 4063,
"warnings.Count": 0,
"deletes.Count": 0,
"queued_requests.Count": 0,
"mutations.Count": 0,
"time": "2014-11-21T22:49:11.156126+04:00",
"cmdline": ["./cbq-engine","-datastore=http://172.27.33.17:8091"],
"request_time.Count": 1232249000,
"errors.Count": 1,
"result_size.Count": 1736823
}




[MB-12743] unable to start cbq-engine with timeout option Created: 21/Nov/14  Updated: 21/Nov/14  Resolved: 21/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Gerald Sangudi
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
[root@kiwi-r116 cbq-engine]# ./cbq-engine -datastore=http://172.27.33.17:8091 -timeout=1
invalid value "1" for flag -timeout: time: missing unit in duration 1
Usage of ./cbq-engine:
  -acctstore="gometrics:": Accounting store address (http://URL or stub:)
  -certfile="": HTTPS certificate file
  -configstore="stub:": Configuration store address (http://URL or stub:)
  -datastore="": Datastore address (http://URL or dir:PATH or mock:)
  -debug=false: Debug mode
  -http=":8093": HTTP service address
  -https=":18093": HTTPS service address
  -keyfile="": HTTPS private key file
  -logger="": Logger implementation
  -metrics=true: Whether to provide metrics
  -mutation-limit=0: Maximum LIMIT for data modification statements; use zero or negative value to disable
  -namespace="default": Default namespace
  -order-limit=0: Maximum LIMIT for ORDER BY clauses; use zero or negative value to disable
  -readonly=false: Read-only mode
  -request-cap=262144: Maximum number of queued requests
  -signature=true: Whether to provide signature
  -threads=256: Thread count
  -timeout=0: Server execution timeout; use zero or negative value to disable
[root@kiwi-r116 cbq-engine]#


 Comments   
Comment by Gerald Sangudi [ 21/Nov/14 ]
Added examples to -timeout usage message.

Use -timeout=3s or -timeout=650ms, etc.
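
As a rough illustration of why the bare value is rejected: the error message suggests -timeout is parsed as a Go time.Duration, which requires an explicit unit. The sketch below assumes a flag.Duration declaration and is not taken from the cbq-engine source:

package main

import (
	"flag"
	"fmt"
	"time"
)

func main() {
	// Assumed to mirror how cbq-engine declares -timeout.
	timeout := flag.Duration("timeout", 0,
		"Server execution timeout; use zero or negative value to disable")
	flag.Parse()

	// A bare number has no unit, so Go's duration parser rejects it.
	if _, err := time.ParseDuration("1"); err != nil {
		fmt.Println(err) // time: missing unit in duration ...
	}

	// Values with an explicit unit parse fine, e.g. "3s" or "650ms".
	d, _ := time.ParseDuration("650ms")
	fmt.Println(d, *timeout)
}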




[MB-12742] queries with both ORDER BY and LIMIT are always empty Created: 21/Nov/14  Updated: 22/Nov/14  Resolved: 21/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Gerald Sangudi
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: 4h
Time Spent: Not Specified
Original Estimate: 4h

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The 'name' attribute is present in all items.
A query without ORDER BY returns results:
cbq> select name from default limit 3;
{
    "request_id": "0e68ff8f-14a9-494e-bd12-60ce9bd9ad27",
    "signature": {
        "name": "json"
    },
    "results": [
        {
            "name": "employee-13"
        },
        {
            "name": "employee-22"
        },
        {
            "name": "employee-10"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "318.763ms",
        "executionTime": "318.633ms",
        "resultCount": 3,
        "resultSize": 135
    }
}

cbq>


But when I add ORDER BY:
cbq> select name from default order by name limit 3;
{
    "request_id": "82d27f15-953e-4e4b-814b-e042178ac26d",
    "signature": {
        "name": "json"
    },
    "results": [
        {},
        {},
        {}
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "838.88ms",
        "executionTime": "748.823ms",
        "resultCount": 3,
        "resultSize": 6
    }
}

cbq>


A query with only ORDER BY works fine:
 select name from default order by name;
"request_id": "e8e12cae-00a8-401e-8a24-6acbf2d75bcc",
    "signature": {
        "name": "json"
    },
    "results": [

<some_results>

 {
            "name": "employee-9"
        },
        {
            "name": "employee-9"
        },
        {
            "name": "employee-9"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "1.087013s",
        "executionTime": "1.086893s",
        "resultCount": 4061,
        "resultSize": 180202
    }
}



 Comments   
Comment by Gerald Sangudi [ 21/Nov/14 ]
Please post:

explain select name from default order by name limit 3;

Thanks.
Comment by Iryna Mironava [ 21/Nov/14 ]
cbq> explain select name from default order by name limit 3;
{
    "request_id": "33e86bd7-7651-4b7b-9928-8d5520353dcb",
    "signature": "json",
    "results": [
        {
            "#operator": "Sequence",
            "~children": [
                {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "PrimaryScan",
                            "index": "#primary"
                        },
                        {
                            "#operator": "Parallel",
                            "~child": {
                                "#operator": "Sequence",
                                "~children": [
                                    {
                                        "#operator": "Fetch",
                                        "keyspace": "default",
                                        "namespace": "default"
                                    },
                                    {
                                        "#operator": "InitialProject",
                                        "result_terms": [
                                            {
                                                "expr": "(`default`.`name`)"
                                            }
                                        ]
                                    }
                                ]
                            }
                        }
                    ]
                },
                {
                    "#operator": "Order",
                    "sort_terms": [
                        {
                            "expr": "(`default`.`name`)"
                        }
                    ]
                },
                {
                    "Expr": "3",
                    "Type": "limit"
                },
                {
                    "#operator": "Parallel",
                    "~child": {
                        "#operator": "FinalProject"
                    }
                }
            ]
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "3.336ms",
        "executionTime": "3.179ms",
        "resultCount": 1,
        "resultSize": 1918
    }
}

cbq>
Comment by Gerald Sangudi [ 21/Nov/14 ]
The problem is that MISSING sorts as the smallest value, so documents with name=MISSING sort first and appear as empty objects.

Try ORDER BY name DESC.
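
For instance, against the same bucket as above, the suggested workaround would look like this (illustrative, not taken from the ticket):

cbq> select name from default order by name desc limit 3;

With descending order, documents whose name is MISSING sort last instead of first, so the limited result is no longer dominated by empty objects.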




[MB-12741] Docs NRU boolean incorrect Created: 21/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Ian McCloy Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
http://docs.couchbase.com/admin/admin/CLI/CBepctl/cbepctl-thresholds.html

"The server determines that items are not frequently used based on a not-recently-used (NRU) boolean. "

Boolean would imply that NRU is either true or false.
This is incorrect: NRU is a score that can be 0, 1, 2, etc.; it is not a boolean value.




[MB-12740] Improve autofailover for replica counts over 1 Created: 21/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.1, 3.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
If all buckets in a cluster have more than 1 replica, we should be able to automatically sustain the loss of more than one node. I agree we still don't want to do anything if multiple nodes fail at the same time, but if one node fails and is automatically failed over, a second (or third) node failure should also be automatically failed over if there are enough replicas.

We likely also want to add a cluster setting to limit the maximum number of nodes that can be automatically failed over (extending the autofailover quota concept we currently have).




[MB-12739] Improve Auto-failover for RZA Created: 21/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.1, 3.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
If an entire zone fails, Couchbase should be able to automatically failover the entire group.

This should have similar split-brain precautions as our individual node autofailover in the sense that we only support autofailover of an entire group if there are 3 or more groups configured and so long as only one group appears down at a time.

 Comments   
Comment by Perry Krug [ 21/Nov/14 ]
The improvement of MB-12740 should also extend to multiple zones failing in succession when more than 1 replica is configured.




[MB-12738] Checkpoints are always purged if there are no cursors in them Created: 20/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: sherlock
Fix Version/s: sherlock
Security Level: Public

Type: Task Priority: Major
Reporter: Mike Wiederhold Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
We are aggressively removing checkpoints from memory. In this case I loaded 100k items into Couchbase, waited, and observed that there was only one item in each checkpoint manager. We should keep checkpoints in memory if we have space.

Mikes-MacBook-Pro:ep-engine mikewied$ management/cbstats 10.5.2.34:12000 checkpoint | grep num_checkpoint_items | cut -c 44- | awk '{s+=$1} END {print s}'
1024




[MB-12736] TRACE log level should log EXPLAIN for every request Created: 20/Nov/14  Updated: 20/Nov/14  Due: 08/Dec/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-alpha
Fix Version/s: cbq-alpha
Security Level: Public

Type: Improvement Priority: Major
Reporter: Gerald Sangudi Assignee: Colm Mchugh
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: 4h
Time Spent: Not Specified
Original Estimate: 4h


 Description   
Useful for the support team.




[MB-12735] How to change the Admin password. Created: 20/Nov/14  Updated: 20/Nov/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Patrick Varley Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: http://docs.couchbase.com/admin/admin/UI/ui-settings-account-mgmt.html


 Description   
A customer commented that they could not find the documentation about changing the admin password.

We do have the page on cbreset_password but you kind of have to know that it is there:
http://docs.couchbase.com/admin/admin/CLI/cbreset_password_tool.html

Maybe we should have a more general page under security about changing the Admin password:
http://docs.couchbase.com/admin/admin/Concepts/security-intro.html

It would be good to get your input. I feel we do not really have a polished way of changing the password; maybe we have to wait for MB-12734 before we can improve the documentation.




[MB-12734] To be able to change the Administrator user password in the UI Created: 20/Nov/14  Updated: 20/Nov/14

Status: Open
Project: Couchbase Server
Component/s: UI
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Patrick Varley Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
It would be good if the Administrator password could be changed in the UI.




[MB-12733] Design and implement go_cbq library Created: 20/Nov/14  Updated: 20/Nov/14  Due: 08/Dec/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-alpha
Fix Version/s: cbq-alpha
Security Level: Public

Type: Epic Priority: Critical
Reporter: Gerald Sangudi Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: 40h
Time Spent: Not Specified
Original Estimate: 40h

Epic Name: go_cbq
Epic Status: To Do

 Description   
Client/go_cbq

This package provides a client library that will be used by the command-line shell to encapsulate cluster-awareness and other connectivity concerns.

The library will implement the standard golang database APIs at database/sql and database/sql/driver.

The library will connect using the Query REST API and the Query Clustering API.
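
A minimal sketch of how a caller such as the command-line shell might use the library through the standard database/sql interface; the driver name, import path, and endpoint below are assumptions for illustration, not the actual go_cbq package layout:

package main

import (
	"database/sql"
	"fmt"
	"log"

	// Hypothetical import path; go_cbq would register itself as a database/sql driver.
	_ "github.com/couchbaselabs/go_cbq/driver"
)

func main() {
	// "n1ql" as the driver name and the query endpoint URL are assumptions.
	db, err := sql.Open("n1ql", "http://localhost:8093")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query("SELECT name FROM default LIMIT 3")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			log.Fatal(err)
		}
		fmt.Println(name)
	}
}

Cluster-awareness (via the Query Clustering API) would live behind the driver, so callers only deal with the database/sql types shown here.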




[MB-12732] Doc: Timing stats does not have any information Created: 20/Nov/14  Updated: 20/Nov/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Patrick Varley Assignee: Ruth Harris
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Maybe it is worth copying the table from the "Getting server timings" page into the "Timing stats" page so it is like the other "[name] stats" pages.

"Getting server timings":
http://docs.couchbase.com/admin/admin/CLI/CBstats/cbstats-gettingservertiming.html

"Timing stats":
http://docs.couchbase.com/admin/admin/CLI/CBstats/cbstats-timing.html




[MB-12731] MB-12655 should be documented in the known issues for 3.0.1 Created: 20/Nov/14  Updated: 21/Nov/14  Resolved: 21/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0.1
Fix Version/s: 3.0.2
Security Level: Public

Type: Task Priority: Major
Reporter: Patrick Varley Assignee: marija jovanovic
Resolution: Fixed Votes: 0
Labels: customer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-12655 config replication of certain per-nod... Resolved

 Description   
Not sure if this is the correct process for getting something into the release notes. MB-12655 has the details about the issue and the workaround.

 Comments   
Comment by marija jovanovic [ 20/Nov/14 ]
Patrick,
A release notes item for the docs can have the same JIRA ticket as the one resolved by Dev; it just has to be additionally assigned as a "release notes item".
It seems that JIRA here is not set up for the "release notes item" option. It would need two variations: fixed and known issue. When the issue is added to the release notes, a writer would turn on an additional flag saying it was "documented".
That was the process I was following in my previous job.
The way you did it is also fine, since it makes the writer an assignee for resolving/documenting the issue. However, since it has a different number, it's not exactly easy to connect it to the initial ticket.

Thanks,
Marija
Comment by marija jovanovic [ 21/Nov/14 ]
I have looked at MB-12655, which is marked as Fixed.
For doc purposes, it seems that we have to add something like this to the Release Notes:


"Replication of certain per-node keys is broken after a node is first upgraded offline from 2.x to 3.0 and then added back to the formerly 2.x cluster that is now upgraded to 3.0.

Workaround: When performing an online upgrade from Couchbase Server 2.x to 3.x, always fully delete the 2.x package (including the config files) before installing the 3.x package."

Please confirm.
Comment by Patrick Varley [ 21/Nov/14 ]
That sounds good! thank you
Comment by marija jovanovic [ 21/Nov/14 ]
Added the explanation for the upgrade and workaround.
Closing the bug since it was verified.




[MB-12730] [Windows 2012]Rebalance failed with node_down error during system tests Created: 20/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Sangharsh Agarwal Assignee: Sangharsh Agarwal
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.2-1531-rel

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
[2014-11-17 05:33:04,107: ERROR/Worker-6] app.systest_manager.runPhase[None]: Running Phase: failover_one_and_rebalance_out (failover_one_and_rebalance_out_at_source)

[2014-11-17 05:35:15,086: ERROR/Worker-6] {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
[2014-11-17 05:35:15,174: ERROR/Worker-6] {u'node': u'ns_1@172.23.107.74', u'code': 4, u'text': u"Node 'ns_1@172.23.107.74' saw that node 'ns_1@172.23.107.68' came up. Tags: []", u'shortText': u'node up', u'serverTime': u'2014-11-17T05:34:50.686Z', u'module': u'ns_node_disco', u'tstamp': 1416231290686, u'type': u'info'}
[2014-11-17 05:35:15,174: ERROR/Worker-6] {u'node': u'ns_1@172.23.107.72', u'code': 4, u'text': u"Node 'ns_1@172.23.107.72' saw that node 'ns_1@172.23.107.68' came up. Tags: []", u'shortText': u'node up', u'serverTime': u'2014-11-17T05:34:50.677Z', u'module': u'ns_node_disco', u'tstamp': 1416231290677, u'type': u'info'}
[2014-11-17 05:35:15,175: ERROR/Worker-6] {u'node': u'ns_1@172.23.107.73', u'code': 4, u'text': u"Node 'ns_1@172.23.107.73' saw that node 'ns_1@172.23.107.68' came up. Tags: []", u'shortText': u'node up', u'serverTime': u'2014-11-17T05:34:50.674Z', u'module': u'ns_node_disco', u'tstamp': 1416231290674, u'type': u'info'}
[2014-11-17 05:35:15,175: ERROR/Worker-6] {u'node': u'ns_1@172.23.107.71', u'code': 4, u'text': u"Node 'ns_1@172.23.107.71' saw that node 'ns_1@172.23.107.68' came up. Tags: []", u'shortText': u'node up', u'serverTime': u'2014-11-17T05:34:50.592Z', u'module': u'ns_node_disco', u'tstamp': 1416231290592, u'type': u'info'}
[2014-11-17 05:35:15,175: ERROR/Worker-6] {u'node': u'ns_1@172.23.107.70', u'code': 4, u'text': u"Node 'ns_1@172.23.107.70' saw that node 'ns_1@172.23.107.68' came up. Tags: []", u'shortText': u'node up', u'serverTime': u'2014-11-17T05:34:50.535Z', u'module': u'ns_node_disco', u'tstamp': 1416231290535, u'type': u'info'}
[2014-11-17 05:35:15,175: ERROR/Worker-6] {u'node': u'ns_1@172.23.107.69', u'code': 4, u'text': u"Node 'ns_1@172.23.107.69' saw that node 'ns_1@172.23.107.68' came up. Tags: []", u'shortText': u'node up', u'serverTime': u'2014-11-17T05:34:49.720Z', u'module': u'ns_node_disco', u'tstamp': 1416231289720, u'type': u'info'}
[2014-11-17 05:35:15,175: ERROR/Worker-6] {u'node': u'ns_1@172.23.107.67', u'code': 4, u'text': u"Node 'ns_1@172.23.107.67' saw that node 'ns_1@172.23.107.68' came up. Tags: []", u'shortText': u'node up', u'serverTime': u'2014-11-17T05:34:49.685Z', u'module': u'ns_node_disco', u'tstamp': 1416231289685, u'type': u'info'}
[2014-11-17 05:35:15,175: ERROR/Worker-6] {u'node': u'ns_1@172.23.107.72', u'code': 5, u'text': u"Node 'ns_1@172.23.107.72' saw that node 'ns_1@172.23.107.68' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-11-17T05:34:30.194Z', u'module': u'ns_node_disco', u'tstamp': 1416231270194, u'type': u'warning'}
[2014-11-17 05:35:15,175: ERROR/Worker-6] {u'node': u'ns_1@172.23.107.73', u'code': 5, u'text': u"Node 'ns_1@172.23.107.73' saw that node 'ns_1@172.23.107.68' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-11-17T05:34:30.175Z', u'module': u'ns_node_disco', u'tstamp': 1416231270175, u'type': u'warning'}

 Comments   
Comment by Sangharsh Agarwal [ 20/Nov/14 ]
Cluster is live http://172.23.107.67:8091/index.html#sec=log.


Alk, I am not sure if it's a test bug or a product bug, since the same test code works fine on Linux and this issue always occurs only on Windows. Can you please help debug from the ns_server point of view? I am looking at it from the test code perspective too.
Comment by Aleksey Kondratenko [ 20/Nov/14 ]
A live cluster is nice, but logs are something I'll need first anyway.
Comment by Aleksey Kondratenko [ 20/Nov/14 ]
Collected logs via uploading facility:

https://s3.amazonaws.com/cb-customers/alk/12730/collectinfo-2014-11-20T200329-ns_1%40172.23.107.67.zip
https://s3.amazonaws.com/cb-customers/alk/12730/collectinfo-2014-11-20T200329-ns_1%40172.23.107.68.zip
https://s3.amazonaws.com/cb-customers/alk/12730/collectinfo-2014-11-20T200329-ns_1%40172.23.107.69.zip
https://s3.amazonaws.com/cb-customers/alk/12730/collectinfo-2014-11-20T200329-ns_1%40172.23.107.70.zip
https://s3.amazonaws.com/cb-customers/alk/12730/collectinfo-2014-11-20T200329-ns_1%40172.23.107.71.zip
https://s3.amazonaws.com/cb-customers/alk/12730/collectinfo-2014-11-20T200329-ns_1%40172.23.107.72.zip
https://s3.amazonaws.com/cb-customers/alk/12730/collectinfo-2014-11-20T200329-ns_1%40172.23.107.73.zip
https://s3.amazonaws.com/cb-customers/alk/12730/collectinfo-2014-11-20T200329-ns_1%40172.23.107.74.zip
Comment by Aliaksey Artamonau [ 20/Nov/14 ]
It seems to me that node .68 was simply restarted. There are no babysitter logs from that time to confirm it because logs were collected much later. But everything else points to that. If you hit this again it's very important that you collect logs right away.
Comment by Meenakshi Goel [ 21/Nov/14 ]
Seeing similar Rebalance failures on Win2012 with build 3.0.2-1542-rel. http://qa.hq.northscale.net/job/win-2012-view-create-P0/17/consoleFull
Please find logs below:
https://s3.amazonaws.com/bugdb/jira/MB-12730/1a293c5a/10.5.3.14-11202014-229-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12730/dfb9f010/10.5.3.15-11202014-232-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12730/2eff3dc3/10.5.3.16-11202014-233-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12730/2506e1ff/10.3.5.153-11202014-235-diag.zip
Comment by Sangharsh Agarwal [ 21/Nov/14 ]
Latest logs of system test:

http://qa.sc.couchbase.com/view/System%20tests/job/System-test_KV+XDCR-Windows-01-source%28windows%29/20/console

uploading logs..
Comment by Meenakshi Goel [ 21/Nov/14 ]
Please find system tests logs below:
https://s3.amazonaws.com/bugdb/jira/MB-12730/b2519932/172.23.107.67-11212014-251-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12730/5e93c685/172.23.107.68-11212014-258-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12730/fea99c7a/172.23.107.69-11212014-311-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12730/cec7a7a0/172.23.107.70-11212014-321-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12730/76dc0013/172.23.107.71-11212014-330-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12730/9f033a18/172.23.107.72-11212014-340-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12730/64e144d7/172.23.107.73-11212014-350-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12730/84b70d87/172.23.107.74-11212014-40-diag.zip
Comment by Aliaksey Artamonau [ 21/Nov/14 ]
In the last set of logs there definitely was a restart. So please fix your tests. The logs from 12:02AM PST look different though. There's no restart in there and the problem seems to be caused by some network issues. Please assign back to me if you can reproduce after fixing the tests. And don't forget to upload new logs.

Thanks.




[MB-12729] Replica index is not triggered even when partition sequence > replicaUpdateMinChanges set Created: 20/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Meenakshi Goel Assignee: Nimish Gupta
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.2-1542-rel

Triage: Triaged
Operating System: Centos 64-bit
Is this a Regression?: Yes

 Description   
Jenkins Ref Link:
http://qa.sc.couchbase.com/job/centos_x64-02-01-viewquery2-P0/153/consoleFull

Test To Reproduce:
python testrunner.py -i myfile.ini -t view.viewquerytests.ViewQueryTests.test_employee_dataset_min_changes_check -p max-dupe-result-count=10,get-cbcollect-info=True,num-tries=60,attempt-num=60,get-delays=True

Steps to Reproduce:
1. Create views with the updateMinChanges and replicaUpdateMinChanges options
2. Load less data than the configured number of changes
3. Check that indexing is not started
4. Load more data than the configured number of changes
5. Check that indexing is triggered

Uploading Logs


 Comments   
Comment by Meenakshi Goel [ 20/Nov/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-12729/8586d8eb/172.23.106.61-11192014-2327-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12729/d9aca249/172.23.106.63-11192014-2328-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12729/439baa5e/172.23.106.62-11192014-2329-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12729/d42ed0d3/172.23.106.64-11192014-2329-diag.zip
Comment by Nimish Gupta [ 21/Nov/14 ]
This issue is not reproducible on my machine. I also tried on an AWS machine, and got different errors on my dev setup and the AWS machine. Currently debugging on Meenakshi's setup using a toy build.




[MB-12728] Autofailover fails when node is unreachable due to firewall restrictions Created: 19/Nov/14  Updated: 20/Nov/14  Resolved: 20/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Parag Agarwal Assignee: Parag Agarwal
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: [servers]
1:10.1.2.99
2:10.1.2.100
3:10.1.2.101
4:10.1.2.102
5:10.1.2.103

3.0.2-1542


Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-12728/10.1.2.100-11192014-202-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12728/10.1.2.101-11192014-204-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12728/10.1.2.102-11192014-205-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12728/10.1.2.103-11192014-207-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12728/10.1.2.99-11192014-201-diag.zip
Is this a Regression?: Yes

 Description   

1. Create a 5-node cluster
2. Create a default bucket with 1000000 items
3. Create a firewall restriction on one node
4. Wait for the node to be auto-failed over

The node does not auto-failover after a 30-second wait; this is also true for a 60-second wait.

Test Case::

./testrunner -i centos_x64--01_01--autofailover_upr.ini -t autofailovertests.AutoFailoverTests.test_30s_timeout_firewall,keys-count=1000000,skip_cleanup=True,GROUP=P0


 Comments   
Comment by Parag Agarwal [ 19/Nov/14 ]
Sorry! The test case does not work in skip-cleanup mode - it is coded differently from other tests. You may find cleanup at the end.
Comment by Parag Agarwal [ 20/Nov/14 ]
Tested in another env, it passed.




[MB-12727] Best practices need to point to Vormetric site Created: 19/Nov/14  Updated: 19/Nov/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0.1, 3.0
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Don Pinto Assignee: marija jovanovic
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
In http://docs.couchbase.com/admin/admin/Concepts/security-outside-server.html, "the best practices include protecting access to different data locations such as" needs to be rephrased as "the best practices include encrypting certain data locations using transparent data encryption technologies like Vormetric -"

The word "Vormetric" should be linked to http://www.vormetric.com/data-security-solutions/use-cases/big-data-security




[MB-12726] cbbackupwrapper: hides the "mode" functionality present in cbbackup. Created: 19/Nov/14  Updated: 19/Nov/14  Resolved: 19/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Ashvinder Singh Assignee: Bin Cui
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: All OSes

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Found in build: 3.0.2-1548

cbbackupwrapper does not offer the "mode" option for the incremental and differential backups available in the 'cbbackup' tool. cbbackupwrapper prints the following usage text:
Usage: cbbackupwrapper CLUSTER BACKUPDIR OPTIONS

Options:
  -h, --help show this help message and exit
  -b BUCKET_SOURCE, --bucket-source=BUCKET_SOURCE
                        Specify the bucket to backup. Defaults to all buckets
  --single-node use a single server node from the source only
  -u USERNAME, --username=USERNAME
                        REST username for source cluster or server node.
                        Default is Administrator
  -p PASSWORD, --password=PASSWORD
                        REST password for source cluster or server node.
                        Defaults to PASSWORD
  -v, --verbose Enable verbose messaging
  --path=PATH Specify the path to cbbackup. Defaults to current
                        directory
  --port=PORT Specify the bucket port. Defaults to 11210
  -n NUMBER, --number=NUMBER
                        Specify the number of vbuckets per process. Defaults
                        to 100
  -x EXTRA, --extra=EXTRA
                        Provide extra, uncommon config parameters;
                        comma-separated key=val(,key=val)* pairs

 Comments   
Comment by Bin Cui [ 19/Nov/14 ]
http://review.couchbase.org/#/c/43433/




[MB-12725] cbrestorewrapper throws error and exits without restoring Created: 19/Nov/14  Updated: 19/Nov/14  Resolved: 19/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Ashvinder Singh Assignee: Bin Cui
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: All OSes

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Found on build: 3.0.2-1548

Steps to reproduce:
- On a multi-node cluster:
- Create data, for example: /opt/couchbase/bin/cbworkloadgen -i 1000 --prefix=x1
- Take a backup using "cbbackupwrapper": ./cbbackupwrapper http://172.23.106.71:8091 -u Administrator -p password /tmp/b
- Delete the buckets
- Try restoring the data using "cbrestorewrapper"; cbrestorewrapper gives the following error:


>./cbrestorewrapper /tmp/b http://172.23.106.71:8091 -u Administrator -p password
>Error reading source backup vBuckets for bucket default






 Comments   
Comment by Bin Cui [ 19/Nov/14 ]
http://review.couchbase.org/#/c/43431/




[MB-12724] Couple of changes on cbbackup doc page Created: 19/Nov/14  Updated: 19/Nov/14  Resolved: 19/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Brian Williams Assignee: Ruth Harris
Resolution: Fixed Votes: 0
Labels: cli
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Should "yool" be Tool ?

"For a list of standard and special-use options, see cbtransfer yool.”

Also can you make this reference to cbtransfer a hyperlink to the page about cbtransfer?

The page in question is

http://docs.couchbase.com/admin/admin/CLI/cbbackup_tool.html

 Comments   
Comment by Ruth Harris [ 19/Nov/14 ]
Fixed yool -> tool
There's already an xref to cbtransfer. It's under Related Links.

In-line xrefs are not used as a general practice.

Note: Publishing to the Couchbase website occurs within 24 hours.


Thanks, Ruth




[MB-12723] Add -keep-alive-length to query server Created: 19/Nov/14  Updated: 19/Nov/14  Due: 05/Dec/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-alpha
Security Level: Public

Type: Improvement Priority: Major
Reporter: Gerald Sangudi Assignee: Colm Mchugh
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: 16h
Time Spent: Not Specified
Original Estimate: 16h


 Description   
To support HTTP keep-alive, we need to set the Content-Length on the server response for small result sizes.

For small result sizes, we need to buffer the entire response body, calculate its size in bytes, and set the Content-Length header before writing the response body.

We need a server parameter -keep-alive-length that defines the maximum size of a small result. After the result body / buffer exceeds that size, we skip setting the Content-Length. Instead, we flush the buffer to the response body, and continue writing the results to the response body as they come in.

The default value of -keep-alive-length should balance buffer memory usage vs. keep-alive / TCP connection reuse; perhaps 16k. Because these buffers are fixed length, they should be managed and reused via sync.Pool.
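
A minimal Go sketch of the buffering scheme described above; the constant, function names, and the result channel are assumptions for illustration, not the cbq-engine implementation:

package main

import (
	"bytes"
	"net/http"
	"strconv"
	"sync"
)

// Hypothetical default for -keep-alive-length (16k, as suggested above).
const keepAliveLength = 16 * 1024

// Fixed-length buffers are reused across requests via sync.Pool, as suggested above.
var bufPool = sync.Pool{
	New: func() interface{} { return bytes.NewBuffer(make([]byte, 0, keepAliveLength)) },
}

// writeResults buffers the response while it is small enough to set Content-Length
// (allowing HTTP keep-alive), and falls back to streaming once the buffer exceeds
// the keep-alive length.
func writeResults(w http.ResponseWriter, results <-chan []byte) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	buffering := true
	for chunk := range results {
		if buffering {
			buf.Write(chunk)
			if buf.Len() > keepAliveLength {
				// Result too large: skip Content-Length, flush the buffer, then stream.
				buffering = false
				w.Write(buf.Bytes())
			}
			continue
		}
		w.Write(chunk)
	}
	if buffering {
		// Small result: the whole body is in memory, so Content-Length can be set.
		w.Header().Set("Content-Length", strconv.Itoa(buf.Len()))
		w.Write(buf.Bytes())
	}
}

func main() {
	http.HandleFunc("/query", func(w http.ResponseWriter, r *http.Request) {
		results := make(chan []byte, 2)
		results <- []byte(`{"results": [`)
		results <- []byte(`]}`)
		close(results)
		writeResults(w, results)
	})
	http.ListenAndServe(":8093", nil)
}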




[MB-12722] Go-XDCR: "xdcrReplicationType" param unknown to xdcr rest clients Created: 19/Nov/14  Updated: 19/Nov/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: sherlock
Fix Version/s: sherlock
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Xiaomei Zhang
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Epic Link: XDCR next release
Is this a Regression?: No

 Description   
***Based on a conversation with Alk: ns_server is not planning to parse the new XDCR's REST API requests.***

The XDCR REST API uses the "type" param as follows to specify capi/xmem. There is also a "replicationType" param that takes only one value, "continuous". This is an obsolete and, unfortunately, mandatory param, so we must be prepared to receive it, but it can be safely dropped.

To keep the new XDCR's REST API consistent with the old XDCR's, please use the 'type' param to specify capi or xmem.
Please also add value checks. Currently xdcrReplicationType accepts any value.

Arunas-MacBook-Pro:sherlock apiravi$ curl -X POST http://localhost:12160/controller/createReplication -d fromBucket=default -d uuid=localhost:9001 -d toBucket=target -d xdcrReplicationType=abcd
id=xdcr_127.0.0.1%3A9000_default_localhost%3A9001_target

Reference: old xdcr api-
curl -v -X POST http://localhost:9000/controller/createReplication -d fromBucket=default -d toCluster=remote_ref -d toBucket=default -d replicationType=continuous -d type=capi

* If type is not specified, the default (xmem) is used.






[MB-12721] Go-XDCR: support "toCluster" in createReplication rest api, uuid is only optional Created: 19/Nov/14  Updated: 19/Nov/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: sherlock
Fix Version/s: sherlock
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Xiaomei Zhang
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Epic Link: XDCR next release
Is this a Regression?: No

 Description   
Currently, clients creating a replication through REST use the following:
curl -v -X POST -u Administrator:welcome http://localhost:9000/controller/createReplication -d fromBucket=default -d toCluster=remote_cluster_ref -d toBucket=default -d replicationType=continuous -d type=capi

Please note that 'toCluster' accepts a remote cluster reference. Alk told me that ns_server would not parse new XDCR REST calls but just pass them on to the XDCR server. So it looks like the new XDCR code needs to obtain the uuid based on the remote cluster reference.

Currently (in old XDCR) uuid has no significance in the createReplication request. Even if it is present/missing/incorrect, toCluster is what is needed to create the replication.
If uuid were accepted, it would need to look like uuid=26fdc9ccfc55db4a9bc27fb75ef568d7, not the IP:port that we accept now. This argument can be safely dropped.




[MB-12720] XDCR@next release - Move Remote Cluster Service into XDCR Created: 19/Nov/14  Updated: 19/Nov/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: feature-backlog
Fix Version/s: None
Security Level: Public

Type: Task Priority: Major
Reporter: Yu Sui Assignee: Yu Sui
Resolution: Unresolved Votes: 0
Labels: sprint4_xdcr
Remaining Estimate: 32h
Time Spent: Not Specified
Original Estimate: 32h

Epic Link: XDCR next release

 Description   
Need to move the remote cluster service, which sits in ns_server now, into xdcr so as to reduce dependency on ns_server.




[MB-12718] debian package does not contain third party licenses text file Created: 19/Nov/14  Updated: 19/Nov/14  Resolved: 19/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: build
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Cihan Biyikoglu Assignee: Chris Hillery
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: wget http://packages.couchbase.com/releases/3.0.1/couchbase-server-enterprise_3.0.1-debian7_amd64.deb

/opt/couchbase/*
does not contain the third-party licenses text file; only license.txt is included.
Please include this in the future.

Triage: Untriaged
Is this a Regression?: Unknown




[MB-12717] Suggest alternatives to _all_doc Created: 19/Nov/14  Updated: 20/Nov/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Patrick Varley Assignee: marija jovanovic
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
It might be worth explaining what a user can use instead of _all_doc in 3.0.0. This has already come up in support and on IRC.

 Comments   
Comment by Amy Kurtzman [ 19/Nov/14 ]
It would definitely be worth explaining that. If you let us know what the alternatives to _all_docs are, we'd be happy to add it to the information. Perhaps you already have a statement you send out when customers inquire about it?
Comment by Amy Kurtzman [ 19/Nov/14 ]
This is under Miscellaneous on this page: http://docs.couchbase.com/admin/admin/Misc/deprecated.html
Comment by Patrick Varley [ 20/Nov/14 ]
Let me look into this, as I'm not too sure myself.




[MB-12716] _all_docs is mentioned twice under Miscellaneous on the deprecated page Created: 19/Nov/14  Updated: 21/Nov/14  Resolved: 21/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0.1
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Minor
Reporter: Patrick Varley Assignee: marija jovanovic
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: http://docs.couchbase.com/admin/admin/Misc/deprecated.html

Triage: Untriaged
Is this a Regression?: No

 Description   
On http://docs.couchbase.com/admin/admin/Misc/deprecated.html, under "Miscellaneous", _all_docs is mentioned twice: once in the bullet points and again at the end.


 Comments   
Comment by marija jovanovic [ 21/Nov/14 ]
The second mention of _all_docs has been removed.
It may not be available on the website until next week.




[MB-12715] fdb_iteration fails if multiple kv stores in file Created: 19/Nov/14  Updated: 19/Nov/14  Resolved: 19/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: bug-backlog
Fix Version/s: techdebt-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sundar Sridharan Assignee: Sundar Sridharan
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Triaged
Is this a Regression?: Unknown

 Description   
Open the main file.
Create the default kvstore.
Create another kvstore in the same file.
Insert items into both kvstores.
Iterate over the items in either kvstore; the items returned are not correct, or iteration fails.

 Comments   
Comment by Sundar Sridharan [ 19/Nov/14 ]
duplicate of MB-12465




[MB-12714] left join miss items Created: 19/Nov/14  Updated: 22/Nov/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Iryna Mironava
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: 4h
Time Spent: Not Specified
Original Estimate: 4h

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
my items are:
{
  "name": "employee-1",
  "join_day": 1,
  "tasks_ids": [
    "test_task-1",
    "test_task-2"
  ],
  "mutated": 0,
  "join_mo": 1,
  "join_yr": 2010,
  "_id": "query-test-Engineer-2010-1-1-0",
  "job_title": "Engineer"
}

{"project": "MB",
"task_name": "name1"} with key test_task-1

Expected result: 2 items; I see only 1.

SELECT employee.name, employee.tasks_ids, new_task.project, new_task.task_name FROM b0 as employee LEFT JOIN b1 as new_task ON KEYS employee.tasks_ids;
{
    "request_id": "448b1a88-7901-4130-b042-dd1d999f661a",
    "signature": {
        "name": "json",
        "project": "json",
        "task_name": "json",
        "tasks_ids": "json"
    },
    "results": [
        {
            "name": "employee-1",
            "project": "MB",
            "task_name": "name1",
            "tasks_ids": [
                "test_task-1",
                "test_task-2"
            ]
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "36.477ms",
        "executionTime": "36.101ms",
        "resultCount": 1,
        "resultSize": 210
    }
}

cbq>



 Comments   
Comment by Gerald Sangudi [ 22/Nov/14 ]
Hi Iryna,

The behavior is correct for this data set. In order to get 2 results with a LEFT JOIN, your b0 bucket should have 2 documents. The second document in b0 can have task_ids that are not present in b1.




[MB-12713] json non-doc ints are not validated correctly in where clause Created: 19/Nov/14  Updated: 19/Nov/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: cbq-DP4
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I have one item with the int value 20:
cbq> select * from b1;
{
    "request_id": "02baa08c-93c4-4fa5-856d-36e93086135e",
    "signature": {
        "*": "*"
    },
    "results": [
        {
            "b1": 20
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "30.451ms",
        "executionTime": "29.732ms",
        "resultCount": 1,
        "resultSize": 32
    }
}

Trying to use a WHERE expression returns an empty result:

cbq> select v from b1 where v>10;
{
    "request_id": "54d432f4-748b-4850-be0a-6fa2eed8d68e",
    "signature": {
        "v": "json"
    },
    "results": [
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "7.899ms",
        "executionTime": "7.358ms",
        "resultCount": 0,
        "resultSize": 0
    }
}






[MB-12712] File pointer not correctly set Created: 19/Nov/14  Updated: 19/Nov/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0
Fix Version/s: sherlock
Security Level: Public

Type: Bug Priority: Major
Reporter: Volker Mische Assignee: Volker Mische
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
The ID b-tree of the spatial views is updated through the native C updater (the views are not). This changes the file, hence the information about the end of the file needs to be updated.




[MB-12711] CMake picks up wrong nif header when using "download deps" Created: 19/Nov/14  Updated: 21/Nov/14  Resolved: 21/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: build
Affects Version/s: sherlock
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Trond Norbye Assignee: Volker Mische
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
If you're invoking make like:

trond@ok:1096> gmake EXTRA_CMAKE_OPTIONS="-DCB_DOWNLOAD_DEPS=1"

you'll see:

-- Erlang runtime and compiler found in /Users/trond/compile/couchbase/sherlock/install/bin/erl and /Users/trond/compile/couchbase/sherlock/install/bin/erlc
-- Escript interpreter found in /Users/trond/compile/couchbase/sherlock/install/bin/escript
-- Erlang nif header in /usr/local/lib/erlang/usr/include

As you can see, it picks up the correct binaries, but uses the nif header I installed from Homebrew instead of the one shipped with the Erlang actually being used.
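
A minimal sketch of one possible direction (not the actual CMake change; it assumes the found erl binary is exposed in a variable such as ERL_EXECUTABLE): ask the selected Erlang runtime for its own include directory instead of searching system prefixes, so a Homebrew or system Erlang cannot shadow it.

    # Hypothetical sketch: derive the nif header location from the Erlang
    # that was actually found, rather than from a search over system paths.
    execute_process(
      COMMAND ${ERL_EXECUTABLE} -noshell -eval
              "io:format(\"~s\", [filename:join(code:root_dir(), \"usr/include\")]), halt()."
      OUTPUT_VARIABLE ERLANG_NIF_INCLUDE_DIR
      OUTPUT_STRIP_TRAILING_WHITESPACE)
    message(STATUS "Erlang nif header in ${ERLANG_NIF_INCLUDE_DIR}")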

 Comments   
Comment by Volker Mische [ 20/Nov/14 ]
Please provide me with some information on how to get the Erlang dependency (on Linux), so that I can try to fix it properly.
Comment by Trond Norbye [ 21/Nov/14 ]
We don't use the deps for linux yet... you'd probably have the same problem if you have two different erlang versions installed...
Comment by Volker Mische [ 21/Nov/14 ]
I guess I could do a similar thing with the deps on Linux so that it tries to pick them up. Can you give me some pointers? It would be so much easier if I could reproduce this issue.
Comment by Trond Norbye [ 21/Nov/14 ]
The deps basically extracts a tar archive of an installation into the "install" directory (so that you have install/bin/erl etc).
Comment by Volker Mische [ 21/Nov/14 ]
Trond, I think you're the one to close it.




[MB-12710] Panic when count(*) in order by (invalid memory address or nil ptr dereference) Created: 18/Nov/14  Updated: 20/Nov/14  Resolved: 19/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Major
Reporter: Isha Kandaswamy Assignee: Gerald Sangudi
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 4h
Time Spent: Not Specified
Original Estimate: 4h

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
 SELECT profile_details.prefs.ui_theme, count(*) AS theme_usage FROM default:user_profile GROUP BY profile_details.prefs.ui_theme ORDER BY count(*) ;
{
    "request_id": "64a1e431-a59a-4e08-a629-7573cadf89d2",
    "signature": {
        "theme_usage": "number",
        "ui_theme": "json"
    },
    "results": [_time="2014-11-18T20:36:09-08:00" _level="ERROR" _msg="" panic=runtime error: invalid memory address or nil pointer dereference stack="goroutine 628 [running]:\ngithub.com/couchbaselabs/query/execution.(*Context).Recover(0xc208b5a750)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/context.go:194 +0xd4\nruntime.panic(0x790d80, 0xcefe44)\n\t/usr/local/go/src/pkg/runtime/panic.c:248 +0x18d\ngithub.com/couchbaselabs/query/execution.(*Order).Less(0xc2080339a0, 0x1, 0x0, 0xc208991400)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/order.go:125 +0x65f\ngithub.com/couchbaselabs/query/sort.insertionSort(0xeb37e0, 0xc2080339a0, 0x0, 0x6)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/sort/sort.go:24 +0x8d\ngithub.com/couchbaselabs/query/sort.quickSort(0xeb37e0, 0xc2080339a0, 0x0, 0x6, 0x6, 0x0)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/sort/sort.go:208 +0x429\ngithub.com/couchbaselabs/query/sort.Sort(0xeb37e0, 0xc2080339a0)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/sort/sort.go:229 +0xf8\ngithub.com/couchbaselabs/query/execution.(*Order).afterItems(0xc2080339a0, 0xc208b5a750)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/order.go:72 +0x107\ngithub.com/couchbaselabs/query/execution.func·002()\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:158 +0x34d\nsync.(*Once).Do(0xc2080339f0, 0xc208bd4ab0)\n\t/usr/local/go/src/pkg/sync/once.go:40 +0x9f\ngithub.com/couchbaselabs/query/execution.(*base).runConsumer(0xc2080339a0, 0xeb3108, 0xc2080339a0, 0xc208b5a750, 0x0, 0x0)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:159 +0x160\ngithub.com/couchbaselabs/query/execution.(*Order).RunOnce(0xc2080339a0, 0xc208b5a750, 0x0, 0x0)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/order.go:54 +0xa2\ncreated by github.com/couchbaselabs/query/execution.func·002\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:135 +0x29d\n"
goroutine 628 [running]:
github.com/couchbaselabs/query/execution.(*Context).Recover(0xc208b5a750)
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/context.go:194 +0xd4
runtime.panic(0x790d80, 0xcefe44)
/usr/local/go/src/pkg/runtime/panic.c:248 +0x18d
github.com/couchbaselabs/query/execution.(*Order).Less(0xc2080339a0, 0x1, 0x0, 0xc208991400)
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/order.go:125 +0x65f
github.com/couchbaselabs/query/sort.insertionSort(0xeb37e0, 0xc2080339a0, 0x0, 0x6)
/Users/isha/query2/src/github.com/couchbaselabs/query/sort/sort.go:24 +0x8d
github.com/couchbaselabs/query/sort.quickSort(0xeb37e0, 0xc2080339a0, 0x0, 0x6, 0x6, 0x0)
/Users/isha/query2/src/github.com/couchbaselabs/query/sort/sort.go:208 +0x429
github.com/couchbaselabs/query/sort.Sort(0xeb37e0, 0xc2080339a0)
/Users/isha/query2/src/github.com/couchbaselabs/query/sort/sort.go:229 +0xf8
github.com/couchbaselabs/query/execution.(*Order).afterItems(0xc2080339a0, 0xc208b5a750)
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/order.go:72 +0x107
github.com/couchbaselabs/query/execution.func·002()
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:158 +0x34d
sync.(*Once).Do(0xc2080339f0, 0xc208bd4ab0)
/usr/local/go/src/pkg/sync/once.go:40 +0x9f
github.com/couchbaselabs/query/execution.(*base).runConsumer(0xc2080339a0, 0xeb3108, 0xc2080339a0, 0xc208b5a750, 0x0, 0x0)
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:159 +0x160
github.com/couchbaselabs/query/execution.(*Order).RunOnce(0xc2080339a0, 0xc208b5a750, 0x0, 0x0)
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/order.go:54 +0xa2
created by github.com/couchbaselabs/query/execution.func·002
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:135 +0x29d

    ]
    "errors": [
        {
            "caller": "context:203",
            "cause": "runtime error: invalid memory address or nil pointer dereference",
            "code": 5000,
            "key": "Internal Error",
            "message": "Panic: runtime error: invalid memory address or nil pointer dereference"
        }
    ],
    "status": "stopped",
    "metrics": {
        "elapsedTime": "5.700674ms",
        "executionTime": "5.634316ms",
        "resultCount": 0,
        "resultSize": 0,
        "errorCount": 1
    }
}


 Comments   
Comment by Gerald Sangudi [ 19/Nov/14 ]
Fixed.
Comment by Iryna Mironava [ 20/Nov/14 ]
verified




[MB-12709] cbbackupwrapper and cbrestorewrapper throw error "argparse module not found" on CentOS-6.4 Created: 18/Nov/14  Updated: 19/Nov/14  Resolved: 19/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Ashvinder Singh Assignee: Bin Cui
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Centos 6

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Found on build 3.0.2-1542
Running python: 2.6.6

cbbackupwrapper and cbrestorewrapper should be compatible with Python version 2.4 and later.
By default, CentOS-6 comes with Python 2.6.

cbbackupwrapper and cbrestorewrapper throw the following error:

>>>>>>>>>>>>>>

[root@centos-64-x64 bin]# ./cbrestorewrapper
Traceback (most recent call last):
  File "/opt/couchbase/lib/python/cbrestorewrapper", line 4, in <module>
    import argparse
ImportError: No module named argparse

<<<<<<<<<<<
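
A minimal sketch of the usual compatibility guard for older Pythons (this is not necessarily what the linked fix does; the vendored-module path is an assumption based on where the wrappers already live):

    # Hypothetical sketch: tolerate Python 2.4/2.6 hosts that lack argparse/json.
    try:
        import json
    except ImportError:
        import simplejson as json      # simplejson is already bundled under lib/python
    try:
        import argparse
    except ImportError:
        import sys
        sys.path.insert(0, "/opt/couchbase/lib/python")  # assumes a vendored copy of argparse
        import argparse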




 Comments   
Comment by Bin Cui [ 19/Nov/14 ]
http://review.couchbase.org/#/c/43413/
Comment by Bin Cui [ 19/Nov/14 ]
The json module doesn't exist either.
Comment by Bin Cui [ 19/Nov/14 ]
http://review.couchbase.org/#/c/43415/




[MB-12708] Go-XDCR: Http server crashes on starting (gometa 'Repository' structure has changed) Created: 18/Nov/14  Updated: 18/Nov/14

Status: Open
Project: Couchbase Server
Component/s: cross-datacenter-replication
Affects Version/s: sherlock
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aruna Piravi Assignee: Yu Sui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Epic Link: XDCR next release
Is this a Regression?: No

 Description   
Pulled the latest code (forestdb, gometa, goforestdb, goxdcr).

Please refer to commit https://github.com/couchbase/gometa/commit/5df9fc0d5b602151a707c9332aaa3ded3eb9f05a in gometa, which changed the "Repository" struct to accommodate forestdb API changes.

As a result, the http server crashes at line 103 of metadata_service.go, in ActiveReplicationSpecs().

Arunas-MacBook-Pro:bin apiravi$ ./xdcr localhost:9000
starting gometa service. this will take a couple seconds
started gometa service.
MetadataService17:09:38.052419 [INFO] Metdata service started with host=127.0.0.1:5003
PipelineManager17:09:38.052559 [INFO] Pipeline Manager is constucted
ReplicationManager17:09:38.052564 [INFO] Replication manager init - starting existing replications
fatal error: unexpected signal during runtime execution
[signal 0xb code=0x1 addr=0x118 pc=0x7fff8a9cee3d]

runtime stack:
runtime: unexpected return pc for runtime.sigpanic called from 0x7fff8a9cee3d
runtime.throw(0x46c5a76)
/usr/local/go/src/pkg/runtime/panic.c:520 +0x69
runtime: unexpected return pc for runtime.sigpanic called from 0x7fff8a9cee3d
runtime.sigpanic()
/usr/local/go/src/pkg/runtime/os_darwin.c:439 +0x3d

goroutine 16 [syscall]:
runtime.cgocall(0x4001740, 0x49de8c8)
/usr/local/go/src/pkg/runtime/cgocall.c:143 +0xe5 fp=0x49de8b0 sp=0x49de868
github.com/couchbaselabs/goforestdb._Cfunc_fdb_iterator_init(0x4f00010, 0xc20803c128, 0xc208001160, 0x6, 0xc208001168, 0x6, 0x4b00000, 0xc20801290c)
github.com/couchbaselabs/goforestdb/_obj/_cgo_defun.c:215 +0x36 fp=0x49de8c8 sp=0x49de8b0
github.com/couchbaselabs/goforestdb.(*Database).IteratorInit(0xc20803c120, 0xc208001160, 0x6, 0x8, 0xc208001168, 0x6, 0x8, 0x4001600, 0xc208001150, 0x0, ...)
/Users/apiravi/sherlock/godeps/src/github.com/couchbaselabs/goforestdb/iterator.go:99 +0x117 fp=0x49de950 sp=0x49de8c8
github.com/couchbase/gometa/repository.(*Repository).NewIterator(0xc208001150, 0x447abf0, 0x6, 0x447ac10, 0x6, 0xc20803f9c0, 0x0, 0x0)
/Users/apiravi/sherlock/godeps/src/github.com/couchbase/gometa/repository/repo.go:156 +0x28c fp=0x49dea80 sp=0x49de950
github.com/couchbase/goxdcr/services.(*MetadataSvc).ActiveReplicationSpecs(0xc20800f3c0, 0x450e390, 0x0, 0x0)
/Users/apiravi/sherlock/goproj/src/github.com/couchbase/goxdcr/services/metadata_service.go:103 +0x9f fp=0x49deb48 sp=0x49dea80
github.com/couchbase/goxdcr/replication_manager.(*replicationManager).startReplications(0x46e6000)
/Users/apiravi/sherlock/goproj/src/github.com/couchbase/goxdcr/replication_manager/replication_manager.go:295 +0x77 fp=0x49dec08 sp=0x49deb48
github.com/couchbase/goxdcr/replication_manager.(*replicationManager).init(0x46e6000, 0x4b12dd8, 0xc20800f3c0, 0x4b12e20, 0x46e4db0, 0x4b12e70, 0x46e4db0, 0x4b12ec0, 0x46e4db0)
/Users/apiravi/sherlock/goproj/src/github.com/couchbase/goxdcr/replication_manager/replication_manager.go:64 +0x183 fp=0x49dec98 sp=0x49dec08
github.com/couchbase/goxdcr/replication_manager.func·001()
/Users/apiravi/sherlock/goproj/src/github.com/couchbase/goxdcr/replication_manager/replication_manager.go:48 +0x70 fp=0x49dece8 sp=0x49dec98
sync.(*Once).Do(0x46e6040, 0x49ded18)
/usr/local/go/src/pkg/sync/once.go:40 +0x9f fp=0x49ded00 sp=0x49dece8
github.com/couchbase/goxdcr/replication_manager.Initialize(0x4b12dd8, 0xc20800f3c0, 0x4b12e20, 0x46e4db0, 0x4b12e70, 0x46e4db0, 0x4b12ec0, 0x46e4db0)
/Users/apiravi/sherlock/goproj/src/github.com/couchbase/goxdcr/replication_manager/replication_manager.go:49 +0x6d fp=0x49ded48 sp=0x49ded00
main.main()
/Users/apiravi/sherlock/goproj/src/github.com/couchbase/goxdcr/main/main.go:76 +0x645 fp=0x49def50 sp=0x49ded48
runtime.main()
/usr/local/go/src/pkg/runtime/proc.c:247 +0x11a fp=0x49defa8 sp=0x49def50
runtime.goexit()
/usr/local/go/src/pkg/runtime/proc.c:1445 fp=0x49defb0 sp=0x49defa8
created by _rt0_go
/usr/local/go/src/pkg/runtime/asm_amd64.s:97 +0x120


 Comments   
Comment by Aruna Piravi [ 18/Nov/14 ]
Working around this by commenting out the contents of ActiveReplicationSpecs().




[MB-12707] Very difficult to determine why Couchbase Server fails to start up Created: 18/Nov/14  Updated: 18/Nov/14

Status: Open
Project: Couchbase Server
Component/s: installer, ns_server
Affects Version/s: 3.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Major
Reporter: Perry Krug Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
This has been an ongoing issue for many versions now but I hope we can make some meaningful progress in the near-term.

When there are low-level problems with Couchbase starting up, it's very difficult to diagnose them even when the root cause is something very simple.

The experience from the end user is:
-Installing the software says it works
-Starting the service says it works
-One or more beam.smp processes are even running
-Yet 8091 is not responding to any requests

Initial investigation usually shows the babysitter process running and ns_server continuously restarting, but there's no indication to the user:
-That there is a problem
-Which log to look for
-What log messages to look at

I know there can be many underlying causes, but it's the experience for the end-user that I'm hoping we can improve upon here.

For a specific example, I've uploaded this set of logs after trying to install and run one of our latest builds on a CentOS 6.5 system: http://s3.amazonaws.com/customers.couchbase.com/perry/output.zip

 Comments   
Comment by Bin Cui [ 18/Nov/14 ]
After the installer launches the couchbase-server module, the startup logic belongs to ns_server.




[MB-12706] add operation on a temporary item fails Created: 18/Nov/14  Updated: 20/Nov/14  Resolved: 20/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 2.5.1
Fix Version/s: 2.5.1, 3.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Sriram Ganesan Assignee: Sriram Ganesan
Resolution: Fixed Votes: 0
Labels: MP2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-12274 2.5.1 Maintenance Patch-2 Release Open
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: No

 Description   
In an XDCR environment, when an add->delete->add sequence of operations is performed, the second add operation could fail if the operation was performed on a temporary item. Temporary items are created for items not present in memory, so that the background fetcher can update them with the metadata.

 Comments   
Comment by Chiyoung Seo [ 18/Nov/14 ]
http://review.couchbase.org/#/c/43383/

The fix was merged into ep-engine 2.5.1.1 branch.




[MB-12705] cbbackup should be able to recover when previous invocation was interrupted Created: 18/Nov/14  Updated: 19/Nov/14  Resolved: 18/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0.1
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Brian Williams Assignee: Bin Cui
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
If cbbackup -m full is interrupted ( say with a Control-C ) and then is invoked again with -m accu, it fails almost immediately with a Python stack trace. The error seems to happen because it tries to read back a malformed JSON file that it was writing to during the previous invocation.

Steps to reproduce:

1. /opt/couchbase/bin/cbbackup -u Administrator -p password -m full -b bucketname -v http://localhost:8091 ./backupfolder

2. Press Control-C when you see the progress meter

 [#### ] 19.0% (19000/estimated 100006 msgs)

3. Invoke cbbackup again but this time with the accu mode:

/opt/couchbase/bin/cbbackup -u Administrator -p password -m accu -b bucketname -v http://localhost:8091 ./backupfolder

4. Observe the python stack trace

Exception in thread w0:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/opt/couchbase/lib/python/pump.py", line 278, in run_worker
    curx)
  File "/opt/couchbase/lib/python/pump_bfd2.py", line 20, in check_spec
    getattr(opts, "mode", "diff"))
  File "/opt/couchbase/lib/python/pump_bfd.py", line 266, in find_seqno
    json_data = json.load(json_file)
  File "/opt/couchbase/lib/python/simplejson/__init__.py", line 267, in load
    parse_constant=parse_constant, **kw)
  File "/opt/couchbase/lib/python/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/opt/couchbase/lib/python/simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/couchbase/lib/python/simplejson/decoder.py", line 351, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "/opt/couchbase/lib/python/simplejson/scanner.py", line 36, in _scan_once
    return parse_object((string, idx + 1), encoding, strict, _scan_once, object_hook)
  File "/opt/couchbase/lib/python/simplejson/decoder.py", line 187, in JSONObject
    raise ValueError(errmsg("Expecting object", s, end))
ValueError: Expecting object: line 1 column 8192 (char 8192)

In this particular case, I found that pump_bfd.py was trying to read a file called failover.json which was under

backupfolder/2014-11-18T190153Z/2014-11-18T190153Z-full/bucket-bucketname/node-10.4.2.121%3A8091/

and failover.json was not a well-formed JSON object

The last few bytes of failover.json were as follows:

, "437": [[44613763991742, 0]],






 Comments   
Comment by Bin Cui [ 18/Nov/14 ]
cbbackup can recover from an interruption under certain scenarios, but NOT all scenarios.

1. If the interruption happens during data transfer, cbbackup can pick up and continue from the last batch that was persisted.
2. If the interruption happens during batch persistence, some of the meta files can be corrupted. That's the case described in this bug. Without these meta files, cbbackup cannot find the last sequence number up to which documents were successfully persisted. To make things worse, it is possible that we cannot correctly restore the data even if the tool proceeds and backs up as instructed, because the metadata files are corrupted.
3. Data security and integrity take higher priority than backup throughput. In this case, the best practice is to launch another full backup.
Comment by Brian Williams [ 19/Nov/14 ]
Thanks Bin.

Is it possible to intercept Control-C and exit cleanly?
Comment by Brian Williams [ 19/Nov/14 ]
Bin,
Also, is it possible for cbbackup to expect and handle the exception for malformed meta files?
It could exit with a message saying that the meta files are corrupt and that the user needs to restart the backup for the reasons you mentioned ( data security and integrity )
thanks
Brian
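
To make the failure mode Brian describes concrete, here is a sketch of the kind of guard that could wrap the json.load call in pump_bfd.py (the function name and message are illustrative only, not the actual tool code):

    # Hypothetical sketch: turn a corrupt meta file into a clear, actionable
    # error instead of a raw traceback.
    import sys
    try:
        import json
    except ImportError:
        import simplejson as json   # bundled fallback on older Pythons

    def load_meta(path):
        f = open(path)
        try:
            try:
                return json.load(f)
            except ValueError:
                sys.exit("error: %s is malformed (likely left behind by an "
                         "interrupted backup); please start a new full backup" % path)
        finally:
            f.close()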




[MB-12704] Need more details on view consistency in 3.0 Created: 18/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: sherlock
Security Level: Public

Type: Task Priority: Major
Reporter: Sriram Melkote Assignee: Harsha Havanur
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-12090 add stale=false semantic changes to d... Open

 Description   
SE requests that based on questions asked by customers, we should add a section to:

(a) Explain stale=false operation in 3.0 vs 2.5 (and explain lack of persistTo)

(b) Explain stale=ok with usage examples (including tuning update interval)

Dev would need to supply a draft of these to Docs to start this task.
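
As a concrete starting point for (b), this is the kind of usage example the section could show (the bucket, design doc, and view names below are placeholders):

    # stale=false: index outstanding mutations before returning results
    curl 'http://localhost:8092/default/_design/dev_ddoc/_view/by_type?stale=false'
    # stale=ok: return whatever is already indexed, do not trigger an index update
    curl 'http://localhost:8092/default/_design/dev_ddoc/_view/by_type?stale=ok'
    # stale=update_after: return the current index, then trigger an update
    curl 'http://localhost:8092/default/_design/dev_ddoc/_view/by_type?stale=update_after'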

 Comments   
Comment by Perry Krug [ 18/Nov/14 ]
Thanks Siri.

The sort of documentation that we really would like to see is the "how to do X" kind, aimed at what the end user is trying to achieve. I think it would also be helpful to provide examples of how to achieve end-to-end write consistency, along with a user-facing explanation of how and why it works that way.

Write serialization is an important concept as well, so I think it's worth covering at a high level, but as a separate section for discussion.
Comment by Amy Kurtzman [ 18/Nov/14 ]
There is some information about stale=false views in the docs at http://docs.couchbase.com/admin/admin/Views/views-index-updates.html.
Comment by Amy Kurtzman [ 18/Nov/14 ]
Is Sherlock the correct affected version?
Is Sherlock the correct fix version?

It looks to me like it really applies to the new 3.0 feature changes.
Comment by Amy Kurtzman [ 18/Nov/14 ]
Not sure if this is a duplicate, but it's definitely a related issue.
Comment by Mel Boulos [ 18/Nov/14 ]
It would be nice to include query behavior for views during failure scenarios. Here is a customer question I've been trying to get information on.

I wanted to run a failure scenario on how views would perform if 1 of the 4 nodes were down. This is 2.5, will this behavior change in 3.0?

Please confirm my understanding of how views will behave during a node failure. If you have 4 nodes and 1 is down, then when you query a view, the stale parameter will behave as follows:
— if Stale == False, the query will return with error
— if Stale == True, the query will recognize there is a downed node and query the replica index

What I’m not sure about is how Stale == Update After will behave if a node is down.
- Since the node is down, will it continuously spawn an indexer to update the index?
- If so, how often, and will there be thousands of indexer threads being spawned?
- When the node fails over, will it know that the 1k indexer threads were spawned during the failover and initiate only one indexer thread?
Comment by Benjamin Bryant [ 21/Nov/14 ]
We have also found that the views engine with DCP is running so fast that it even picks up writes that are being written after a stale=false query is submitted. This has caused some confusion. It is a different type of "inconsistency" from what we used to see before. In 2.5 without persistence we would miss the true "present" state of the dataset, but now in 3.0 we are so fast that the "present" can look a little fuzzy.




[MB-12703] Use consistent definition of CAS acronym Created: 18/Nov/14  Updated: 18/Nov/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0.1
Fix Version/s: 3.0.2
Security Level: Public

Type: Task Priority: Major
Reporter: Amy Kurtzman Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: client
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
CAS acronym definition is inconsistent throughout the documentation. When the acronym is expanded, use the definition compare and swap.

(The incorrect acronym definition is check and set, which is often used in the documentation.)




[MB-12702] Ambiguity with aliases in the select clause Created: 18/Nov/14  Updated: 21/Nov/14  Resolved: 21/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Improvement Priority: Major
Reporter: Isha Kandaswamy Assignee: Gerald Sangudi
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 4h
Time Spent: Not Specified
Original Estimate: 4h


 Description   
The following query used to throw a semantic error since there was already a column name title and an alias title.

SELECT title, details.title AS title FROM default:catalog ORDER BY title

It now incorrectly produces output instead of the error.

For ambiguity in the FROM clause, the query returns an error.

Note: Return a warning in the case of a.*, b.* having the same result.




[MB-12701] cbcollect, use numerical ports for ss / socket statistics command Created: 18/Nov/14  Updated: 18/Nov/14

Status: Open
Project: Couchbase Server
Component/s: None
Affects Version/s: 3.0.1, 3.0
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Trivial
Reporter: Ian McCloy Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates to
relates to MB-11671 couchbase.log needs alternitives to n... Resolved

 Description   
Follow up from MB-11671 / http://review.couchbase.org/#/c/39267/

A better way to call ss is to use -n, which prevents DNS lookups and also avoids trying to resolve port numbers to service names:
 -n, --numeric
              Do not try to resolve service names..

LinuxTask("Network socket statistics", "ss -a")
change to
LinuxTask("Network socket statistics", "ss -an")

I'll add a pull request.




[MB-12699] [3.0.2] View query failed with error "lexical error: invalid char in json text" Created: 18/Nov/14  Updated: 21/Nov/14  Resolved: 18/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Meenakshi Goel Assignee: Meenakshi Goel
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.2-1520-rel

Issue Links:
Duplicate
duplicates MB-12697 QueryViewException: Error occured que... Open
Triage: Triaged
Operating System: Centos 64-bit
Is this a Regression?: Yes

 Description   
Jenkins Ref Link:
http://qa.sc.couchbase.com/job/centos_x64--65_03--view_dgm_tests-P1/10/consoleFull

Test to Reproduce:
view.createdeleteview.CreateDeleteViewTests.pending_removal_with_ddoc_ops,ddoc_ops=update,test_with_view=True,num_ddocs=3,num_views_per_ddoc=3,items=200000,nodes_out=1,active_resident_threshold=30,dgm_run=True,eviction_policy=fullEviction

Steps to Reproduce:
1. Setup 4-node cluster
2. Create default bucket
3. Load documents 200000
4. Load data to reach threshold 30
5. Create Views
6. Start ddoc operations parallel with node failover
7. Run View Queries

2014-11-17 23:06:26 | INFO | MainProcess | test_thread | [createdeleteview._verify_ddoc_data_all_buckets] DDoc Data Validation Started on bucket default. Expected Data Items 1650000
2014-11-17 23:06:26 | INFO | MainProcess | Cluster_Thread | [rest_client._query] index query url: http://172.23.107.20:8092/default/_design/dev_ddoc2/_view/views0?stale=false&connection_timeout=60000&full_
set=true
2014-11-17 23:10:00 | INFO | MainProcess | Cluster_Thread | [rest_client._query] index query url: http://172.23.107.20:8092/default/_design/dev_ddoc2/_view/views0?stale=false&connection_timeout=60000&full_
set=true
2014-11-17 23:15:27 | INFO | MainProcess | Cluster_Thread | [task.check] Server: 172.23.107.20, Design Doc: dev_ddoc2, View: views0, (1650000 rows) expected, (1111049 rows) returned
ERROR
[('/usr/lib64/python2.6/threading.py', 504, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib64/python2.6/threading.py', 532, '__bootstrap_inner', 'self.run()'), ('lib/tasks/taskmanager.py', 31, 'run'
, 'task.step(self)'), ('lib/tasks/task.py', 58, 'step', 'self.check(task_manager)'), ('lib/tasks/task.py', 1757, 'check', 'self.set_exception(e)'), ('lib/tasks/future.py', 264, 'set_exception', 'print trac
eback.extract_stack()')]
Mon Nov 17 23:15:27 2014
[('/usr/lib64/python2.6/threading.py', 504, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib64/python2.6/threading.py', 532, '__bootstrap_inner', 'self.run()'), ('testrunner.py', 262, 'run', '**self.
_Thread__kwargs)'), ('/usr/lib64/python2.6/unittest.py', 752, 'run', 'test(result)'), ('/usr/lib64/python2.6/unittest.py', 299, '__call__', 'return self.run(*args, **kwds)'), ('/usr/lib64/python2.6/unittes
t.py', 278, 'run', 'testMethod()'), ('pytests/view/createdeleteview.py', 627, 'pending_removal_with_ddoc_ops', 'self._verify_ddoc_data_all_buckets()'), ('pytests/view/createdeleteview.py', 275, '_verify_dd
oc_data_all_buckets', 'result = self.cluster.query_view(self.master, ddoc_name, view.name, query, num_items, bucket)'), ('lib/couchbase/cluster.py', 479, 'query_view', 'return _task.result(timeout)'), ('li
b/tasks/future.py', 160, 'result', 'return self.__get_result()'), ('lib/tasks/future.py', 111, '__get_result', 'print traceback.extract_stack()')]
2014-11-17 23:16:59 | WARNING | MainProcess | test_thread | [basetestcase.tearDown] CLEANUP WAS SKIPPED

======================================================================
ERROR: pending_removal_with_ddoc_ops (view.createdeleteview.CreateDeleteViewTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pytests/view/createdeleteview.py", line 627, in pending_removal_with_ddoc_ops
    self._verify_ddoc_data_all_buckets()
  File "pytests/view/createdeleteview.py", line 275, in _verify_ddoc_data_all_buckets
    result = self.cluster.query_view(self.master, ddoc_name, view.name, query, num_items, bucket)
  File "lib/couchbase/cluster.py", line 479, in query_view
    return _task.result(timeout)
  File "lib/tasks/future.py", line 160, in result
    return self.__get_result()
  File "lib/tasks/future.py", line 112, in __get_result
    raise self._exception
QueryViewException: Error occured querying view views0: {u'reason': u'lexical error: invalid char in json text.\n', u'from': u'http://172.23.107.22:8092/_view_merge/?stale=false'}

Live Cluster:
172.23.107.20
172.23.107.21
172.23.107.22
172.23.107.23

Uploading Logs.

 Comments   
Comment by Meenakshi Goel [ 18/Nov/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-12699/8586d8eb/172.23.107.20-11172014-2317-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12699/d9aca249/172.23.107.22-11172014-2318-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12699/439baa5e/172.23.107.21-11172014-2320-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12699/d42ed0d3/172.23.107.23-11172014-2321-diag.zip
Comment by Sriram Melkote [ 18/Nov/14 ]
Duplicate, tracking (potentially test) issue on MB-12697
Comment by Parag Agarwal [ 20/Nov/14 ]
 5 tests failing

http://qa.hq.northscale.net/job/centos_x64--1000_00--Failover-P0/68/console
Comment by Harsha Havanur [ 21/Nov/14 ]
This looks like a test issue. It fails even without Async Nif changes at commit 7f6f0e28fd8186b902a454f9c203b48d4dd19d16.

I have observed that this test fails at least once in every 3 runs. To reach the point of failure quickly, use
view.createdeleteview.CreateDeleteViewTests.pending_removal_with_ddoc_ops,ddoc_ops=update,test_with_view=True,num_ddocs=3,num_views_per_ddoc=3,items=20000,nodes_out=1,active_resident_threshold=100,dgm_run=True,eviction_policy=fullEviction
Comment by Harsha Havanur [ 21/Nov/14 ]
Serialized parsing of JSON request chunks per query and assigned each chunk to the same execution thread, in the order of arrival.
Review in progress at http://review.couchbase.org/#/c/43494/

The test below was used to reproduce the error and verify the change. The change gets rid of the lexical error, but the test nevertheless still fails:
view.viewquerytests.ViewQueryTests.test_query_node_warmup,docs-per-day=500,GROUP=P0,retries=250
Comment by Harsha Havanur [ 21/Nov/14 ]
Toy build at
http://latestbuilds.hq.couchbase.com/couchbase-server-community_cent64-3.0.0-toy-hhs-x86_64_3.0.0-735-toy.rpm




[MB-12698] Disk queue (ep_queue_size) taking longer time to flush Created: 18/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Sangharsh Agarwal Assignee: Sangharsh Agarwal
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.2-1514-rel

Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: [Source]
10.3.121.217 : https://s3.amazonaws.com/bugdb/jira/MB-12698/1605ebac/10.3.121.217-11172014-1522-couch.tar.gz
10.3.121.217 : https://s3.amazonaws.com/bugdb/jira/MB-12698/a7cd7865/10.3.121.217-diag.txt.gz
10.3.121.217 : https://s3.amazonaws.com/bugdb/jira/MB-12698/fa83b303/10.3.121.217-11172014-1453-diag.zip
10.3.2.203 : https://s3.amazonaws.com/bugdb/jira/MB-12698/879ed3b1/10.3.2.203-diag.txt.gz
10.3.4.218 : https://s3.amazonaws.com/bugdb/jira/MB-12698/a02a66cb/10.3.4.218-11172014-1522-couch.tar.gz
10.3.4.218 : https://s3.amazonaws.com/bugdb/jira/MB-12698/dbb27529/10.3.4.218-11172014-154-diag.zip
10.3.4.218 : https://s3.amazonaws.com/bugdb/jira/MB-12698/faea2cd5/10.3.4.218-diag.txt.gz

[Destination]
10.3.121.219 : https://s3.amazonaws.com/bugdb/jira/MB-12698/a1ee2317/10.3.121.219-diag.txt.gz
10.3.121.220 : https://s3.amazonaws.com/bugdb/jira/MB-12698/286fba27/10.3.121.220-11172014-1517-diag.zip
10.3.121.220 : https://s3.amazonaws.com/bugdb/jira/MB-12698/d644f2e2/10.3.121.220-diag.txt.gz
10.3.121.220 : https://s3.amazonaws.com/bugdb/jira/MB-12698/ec676976/10.3.121.220-11172014-1523-couch.tar.gz
10.3.2.202 : https://s3.amazonaws.com/bugdb/jira/MB-12698/441085a7/10.3.2.202-11172014-1513-diag.zip
10.3.2.202 : https://s3.amazonaws.com/bugdb/jira/MB-12698/7c7dcb07/10.3.2.202-diag.txt.gz
10.3.2.202 : https://s3.amazonaws.com/bugdb/jira/MB-12698/b82f0e89/10.3.2.202-11172014-1523-couch.tar.gz
Is this a Regression?: Yes

 Description   
http://qa.hq.northscale.net/view/3.0%20By%20COMPONENT/job/centos_x64--31_03--rebalanceXDCR_SSL-P1/90/consoleFull

[Test]
./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,get-coredumps=True,demand_encryption=1,GROUP=ALL -t xdcr.rebalanceXDCR.Rebalance.async_rebalance_out,items=100000,rdirection=bidirection,ctopology=chain,expires=60,doc-ops=update-delete,doc-ops-dest=update-delete,rebalance=source-destination,num_rebalance=1,GROUP=P1

[Test Logs]
[2014-11-17 14:51:10,246] - [data_helper:295] INFO - creating direct client 10.3.4.218:11210 default
[2014-11-17 14:51:10,692] - [task:459] WARNING - Not Ready: ep_queue_size 6074 == 0 expected on '10.3.4.218:8091', default bucket
[2014-11-17 14:51:15,999] - [task:459] WARNING - Not Ready: ep_queue_size 87 == 0 expected on '10.3.121.217:8091', default bucket
[2014-11-17 14:51:16,009] - [task:459] WARNING - Not Ready: ep_queue_size 4 == 0 expected on '10.3.4.218:8091', default bucket
[2014-11-17 14:51:21,034] - [task:459] WARNING - Not Ready: ep_queue_size 28 == 0 expected on '10.3.121.217:8091', default bucket
[2014-11-17 14:51:21,043] - [task:459] WARNING - Not Ready: ep_queue_size 6 == 0 expected on '10.3.4.218:8091', default bucket
[2014-11-17 14:51:26,078] - [task:459] WARNING - Not Ready: ep_queue_size 2 == 0 expected on '10.3.121.217:8091', default bucket
[2014-11-17 14:51:26,168] - [task:459] WARNING - Not Ready: ep_queue_size 14 == 0 expected on '10.3.4.218:8091',


[Test Steps]
1. Setup Bi-directional CAPI mode XDCR.
     Source Nodes: 10.3.121.217, 10.3.2.203, 10.3.4.218
     Destination Nodes: 10.3.121.219, 10.3.121.220, 10.3.2.202
   Bucket: default
2. Load 1M items on Source and Destination Cluster.
3. Re-balance out one node (10.3.2.203) at Source Cluster.
4. Re-balance out one node (10.3.121.219) at Destination Cluster.
5. Wait for rebalance to finish.
6. Perform 30K Updates and deletes at both the clusters for distinct items.
7. Wait for ep_queue_size to 0 for 3 minutes. Failed here on Source cluster.


I checked the stats logs on 10.3.121.217; it seems ep_queue_size becomes 0 later on.



 Comments   
Comment by Sangharsh Agarwal [ 20/Nov/14 ]
Blocker.
Comment by Chiyoung Seo [ 20/Nov/14 ]
Sangharsh,

I need to understand how much more time the flusher task spends compared with the previous release (3.0.0 or 3.0.1).

Did you run the same test against the previous release in the same nodes?

From the engine stats, I didn't see anything suspicious in the flusher task.
Comment by Sangharsh Agarwal [ 20/Nov/14 ]
>I need to understand how much longer the flusher task spends more time compared with which previous release (3.0.0 or 3.0.1).

Yes, we run each test with each release, and it passed with 3.0.1-1365-rel.
Comment by Sangharsh Agarwal [ 20/Nov/14 ]
I can see we have fixed the issue http://www.couchbase.com/issues/browse/MB-12576.
Additionally, this issue is occurring frequently on recent builds; can you please let me know what other information you need from my end?
Comment by Chiyoung Seo [ 21/Nov/14 ]
If draining the flusher queue takes a longer time, then this is a performance issue, not a functional bug. I need to understand how much longer it takes. For example, if the previous tests on the same nodes take 100 sec to complete, but the latest build takes more than 180 sec ... I will then work with the performance team to analyze this issue in detail.
Comment by Anil Kumar [ 21/Nov/14 ]
Thomas - Can you please take a look and confirm whether we are observing similar increase in flusher time in our perf runs.
Comment by Thomas Anderson [ 21/Nov/14 ]
Like Chiyoung, I have to make assumptions (reluctantly) about what error is being reported. I believe the test scenario is 2 clusters, each with 3 nodes of 4 cores and 15G/12G memory, provisioned on AWS. XDCR connects the two clusters, the load is 30K mutations, and the test fails because it exceeds the allocated time to flush the queue(s) driving XDCR replication.
At question is whether the imposed 'time limit' is appropriate or not, and whether the collection of activities is appropriate for the cluster size.
1) Are there views?
2) The performance test does not attempt XDCR or rebalance tests on such a small cluster pair. We do not have minimum/maximum limits, but the practical advice in the sizing documents would suggest the configuration is significantly undersized for the point load applied.
If I understand your follow-up explanation, if left to run (i.e., not forced to time out) it will eventually complete and drain the queue correctly.
I would suggest that the 'functional' test either increase resources or decrease the load, and/or adjust the timeout. I will use this reported issue as a test point for minimums.

In answer to Anil's question: yes, we see DCP queue buildup in our tests. This is a special case of the Perry AWS beer-sample threshold testing; simply put, we are not optimized for 4-node clusters with all streams active. The drain rate has declined with 3.x releases, as noted in perf tests, but this is considered intentional with the DCP paradigm introduced in 3.x.

Suggestion: either increase the time limit whose expiry causes the reported error, increase to 6 or 8 cores, or reduce the load.
Comment by Thomas Anderson [ 21/Nov/14 ]
Returning this to allow you to close it based on your choice going forward.




[MB-12697] QueryViewException: Error occured querying view default0: {u'reason': u'lexical error: invalid char in json text.\n'} Created: 18/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Sangharsh Agarwal Assignee: Harsha Havanur
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Build 3.0.2-1531-rel

Issue Links:
Duplicate
is duplicated by MB-12699 [3.0.2] View query failed with error ... Resolved
Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: [Source]
10.3.121.217 : https://s3.amazonaws.com/bugdb/jira/MB-12697/5da3a915/10.3.121.217-11172014-2258-diag.zip
10.3.121.217 : https://s3.amazonaws.com/bugdb/jira/MB-12697/c41e129b/10.3.121.217-diag.txt.gz

10.3.2.215 : https://s3.amazonaws.com/bugdb/jira/MB-12697/cbb89c5b/10.3.2.215-diag.txt.gz
10.3.2.215 : https://s3.amazonaws.com/bugdb/jira/MB-12697/d5ffe94e/10.3.2.215-11172014-2315-diag.zip
10.3.4.218 : https://s3.amazonaws.com/bugdb/jira/MB-12697/23beb4e0/10.3.4.218-11172014-236-diag.zip
10.3.4.218 : https://s3.amazonaws.com/bugdb/jira/MB-12697/ca475f3a/10.3.4.218-diag.txt.gz

[Destination]
10.3.121.219 : https://s3.amazonaws.com/bugdb/jira/MB-12697/1681275b/10.3.121.219-diag.txt.gz
10.3.121.219 : https://s3.amazonaws.com/bugdb/jira/MB-12697/4077be99/10.3.121.219-11172014-2332-diag.zip
10.3.121.220 : https://s3.amazonaws.com/bugdb/jira/MB-12697/5622a3a8/10.3.121.220-diag.txt.gz
10.3.121.220 : https://s3.amazonaws.com/bugdb/jira/MB-12697/8de93d7a/10.3.121.220-11172014-2326-diag.zip
10.3.2.202 : https://s3.amazonaws.com/bugdb/jira/MB-12697/26564f4e/10.3.2.202-diag.txt.gz
10.3.2.202 : https://s3.amazonaws.com/bugdb/jira/MB-12697/d416d1d0/10.3.2.202-11172014-2320-diag.zip
Is this a Regression?: Unknown

 Description   
http://qa.hq.northscale.net/view/3.0%20By%20COMPONENT/job/centos_x64--31_03--rebalanceXDCR_SSL-P1/90/consoleFull

[Test Error]
======================================================================
ERROR: swap_rebalance_replication_with_view_queries_and_ops (xdcr.rebalanceXDCR.Rebalance)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pytests/xdcr/rebalanceXDCR.py", line 281, in swap_rebalance_replication_with_view_queries_and_ops
    [task.result(self._poll_timeout) for task in tasks]
  File "lib/tasks/future.py", line 160, in result
    return self.__get_result()
  File "lib/tasks/future.py", line 112, in __get_result
    raise self._exception
QueryViewException: Error occured querying view default0: {u'reason': u'lexical error: invalid char in json text.\n', u'from': u'http://10.3.2.215:8092/_view_merge/?stale=false'}

----------------------------------------------------------------------
Ran 1 test in 3775.957s


[Test Steps]
1. Setup Uni-directional CAPI XDCR + SSL.
     Source Nodes: 10.3.121.217, 10.3.2.203, 10.3.4.218
     Destination Nodes: 10.3.121.219, 10.3.121.220, 10.3.2.202
    Bucket: default

2. Load 1M Binary items (Not JSON) on Source Nodes.
3. Create 5 Views (default0- default4) on Source and Destination cluster.
4. Perform 30% updates and Deletes on Source cluster.
5. Perform Swap re-balance on Source cluster. [remove_node:10.3.2.203] -> [add_node:10.3.2.215]
6. Perform view queries (?full_set=true&stale=false) on each view on source and destination until the swap rebalance completes.
7. Verify item counts on Source and Destination are 70000 items.
8. Perform view queries again to check that 70000 rows are returned by each view. -> Failed with an error from the newly added node.

======================================================================
ERROR: swap_rebalance_replication_with_view_queries_and_ops (xdcr.rebalanceXDCR.Rebalance)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pytests/xdcr/rebalanceXDCR.py", line 281, in swap_rebalance_replication_with_view_queries_and_ops
    [task.result(self._poll_timeout) for task in tasks]
  File "lib/tasks/future.py", line 160, in result
    return self.__get_result()
  File "lib/tasks/future.py", line 112, in __get_result
    raise self._exception
QueryViewException: Error occured querying view default0: {u'reason': u'lexical error: invalid char in json text.\n', u'from': u'http://10.3.2.215:8092/_view_merge/?stale=false'}

----------------------------------------------------------------------
Ran 1 test in 3775.957s

FAILED (errors=1)
downloading 10.3.121.217




 Comments   
Comment by Sangharsh Agarwal [ 18/Nov/14 ]
Cluster is not live.
Comment by Volker Mische [ 18/Nov/14 ]
This is not a bug in Couchbase but in the test.

You mention that this test loads non-JSON data, yet the view function (checking the ddocs.log from node 10.3.2.215) looks like:

    function (doc) {
        emit(doc._id, doc);
    }

It tries to access doc._id, but as it's binary data it can't.

The actual error should be a different one, but this is just one obvious thing to get fixed first.
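
For reference, a map function written against the (doc, meta) signature can skip non-JSON documents explicitly. A sketch only, not the actual fix, and whether the test should do this is for the test owner to decide:

    function (doc, meta) {
      // meta.type is "json" for JSON documents and "base64" for binary ones
      if (meta.type === "json") {
        emit(meta.id, null);
      }
    }
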
Comment by Volker Mische [ 18/Nov/14 ]
As it's a bug in the test, I'm assigning it back to Sangharsh.
Comment by Sangharsh Agarwal [ 19/Nov/14 ]
The same test passes with the same set of data over non-SSL XDCR in the same build, while it fails with SSL + XDCR.
Comment by Sangharsh Agarwal [ 19/Nov/14 ]
It is a very old test. No recent changes.
Comment by Sangharsh Agarwal [ 19/Nov/14 ]
There are some recent changes in the view parsing: http://review.couchbase.org/#/c/42821/. Can you please confirm whether this issue is related to that change?
Comment by Sangharsh Agarwal [ 19/Nov/14 ]
Sri, Harsha, can you please help debug this issue? Is there any change in the back end for which we need to change the tests?
Comment by Meenakshi Goel [ 19/Nov/14 ]
- Please note that the failing view tests were passing with earlier build 3.0.2-1510-rel: http://qa.sc.couchbase.com/job/centos_x64--65_03--view_dgm_tests-P1/6/consoleFull.
- The same failure is observed in some other tests too.
Comment by Sriram Melkote [ 20/Nov/14 ]
Upgrading to blocker due to MB-12699 priority
Comment by Harsha Havanur [ 21/Nov/14 ]
Serialized parsing of JSON request chunks per query and assigned each chunk to the same execution thread, in the order of arrival.
Review in progress at http://review.couchbase.org/#/c/43494/
Comment by Harsha Havanur [ 21/Nov/14 ]
Toy build at
http://latestbuilds.hq.couchbase.com/couchbase-server-community_cent64-3.0.0-toy-hhs-x86_64_3.0.0-735-toy.rpm




[MB-12696] Couchbase Version: 3.0.0 Enterprise Edition (build-1209) Cluster State ID: 03B-020-218 Node Going Down after evry second day Created: 18/Nov/14  Updated: 21/Nov/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 3.0
Fix Version/s: bug-backlog
Security Level: Public

Type: Bug Priority: Critical
Reporter: Ashwini Ahire Assignee: Aleksey Kondratenko
Resolution: Unresolved Votes: 0
Labels: Down, Node
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Ram - 60 GB , Core 8 Core on each node .
CLuster with 3 Nodes. Harddisk - ssd 1TB

Operating System: Centos 64-bit
Is this a Regression?: Yes

 Description   
A Couchbase 3.0 node goes down every weekend.
Version: 3.0.0 Enterprise Edition (build-1209)
Cluster State ID: 03B-020-217

Please see the logs below.
Please let me know how to avoid this failover.

Event Module Code Server Node Time
Remote cluster reference "Virginia_to_OregonS" updated. New name is "VirginiaM_to_OregonS". menelaus_web_remote_clusters000 ns_1ec2-####104.compute-1.amazonaws.com 12:46:38 - Mon Nov 17, 2014
Client-side error-report for user undefined on node 'ns_1@ec2-####108 -.compute-1.amazonaws.com':
User-Agent:Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0
Got unhandled error:
Script error.
At:
http://ph.couchbase.net/v2?callback=jQuery162012552191850461902_1416204362614&launchID=8eba0b18a4e965daf1c3a0baecec994c-1416208180553-3638&version=3.0.0-1209-rel-enterprise&_=1416208180556:0:0
Backtrace:
<generated>
generateStacktrace@http://ec2-####108 -.compute-1.amazonaws.com:8091/js/bugsnag.js:411:7
bugsnag@http://ec2-####108 -.compute-1.amazonaws.com:8091/js/bugsnag.js:555:13

    menelaus_web102 ns_1@ec2-####108 -.compute-1.amazonaws.com 12:45:56 - Mon Nov 17, 2014
Replication from bucket "apro" to bucket "apro" on cluster "Virginia_to_OregonS" created. menelaus_web_xdc_replications000 ns_1@ec2-####108 -.compute-1.amazonaws.com 12:38:49 - Mon Nov 17, 2014
Replication from bucket "apro" to bucket "apro" on cluster "Virginia_to_OregonS" removed. xdc_rdoc_replication_srv000 ns_1@ec2-####108 -.compute-1.amazonaws.com 12:38:40 - Mon Nov 17, 2014
Rebalance completed successfully.
    ns_orchestrator001 ns_1@ec2-####107.compute-1.amazonaws.com 11:53:17 - Mon Nov 17, 2014
Bucket "ifa" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@ec2-####107.compute-1.amazonaws.com 11:53:04 - Mon Nov 17, 2014
Started rebalancing bucket ifa ns_rebalancer000 ns_1@ec2-####107.compute-1.amazonaws.com 11:53:02 - Mon Nov 17, 2014
Could not automatically fail over node ('ns_1@ec2-####108 -.compute-1.amazonaws.com'). Rebalance is running. auto_failover001 ns_1@ec2-####107.compute-1.amazonaws.com 11:49:58 - Mon Nov 17, 2014
Bucket "apro" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@ec2-####107.compute-1.amazonaws.com 11:48:02 - Mon Nov 17, 2014
Started rebalancing bucket apro ns_rebalancer000 ns_1@ec2-####107.compute-1.amazonaws.com 11:47:59 - Mon Nov 17, 2014
Bucket "apro" loaded on node 'ns_1@ec2-####108 -.compute-1.amazonaws.com' in 366 seconds. ns_memcached000 ns_1@ec2-####108 -.compute-1.amazonaws.com 11:47:58 - Mon Nov 17, 2014
Bucket "ifa" loaded on node 'ns_1@ec2-####108 -.compute-1.amazonaws.com' in 96 seconds. ns_memcached000 ns_1@ec2-####108 -.compute-1.amazonaws.com 11:43:29 - Mon Nov 17, 2014
Starting rebalance, KeepNodes = ['ns_1ec2-####104.compute-1.amazonaws.com',
'ns_1@ec2-####107.compute-1.amazonaws.com',
'ns_1@ec2-####108 -.compute-1.amazonaws.com'], EjectNodes = [], Failed over and being ejected nodes = [], Delta recovery nodes = ['ns_1@ec2-####108 -.compute-1.amazonaws.com'], Delta recovery buckets = all ns_orchestrator004 ns_1@ec2-####107.compute-1.amazonaws.com 11:41:52 - Mon Nov 17, 2014
Control connection to memcached on 'ns_1@ec2-####108 -.compute-1.amazonaws.com' disconnected: {badmatch,
{error,
closed}} ns_memcached000 ns_1@ec2-####108 -.compute-1.amazonaws.com 21:19:54 - Sun Nov 16, 2014
Node ('ns_1@ec2-####108 -.compute-1.amazonaws.com') was automatically failovered.
[stale,
{last_heard,{1416,152978,82869}},
{stale_slow_status,{1416,152863,60088}},
{now,{1416,152968,80503}},
{active_buckets,["apro","ifa"]},
{ready_buckets,["ifa"]},
{status_latency,5743},
{outgoing_replications_safeness_level,[{"apro",green},{"ifa",green}]},
{incoming_replications_conf_hashes,
[{"apro",
[{'ns_1ec2-####104.compute-1.amazonaws.com',126796989},
{'ns_1@ec2-####107.compute-1.amazonaws.com',41498822}]},
{"ifa",
[{'ns_1ec2-####104.compute-1.amazonaws.com',126796989},
{'ns_1@ec2-####107.compute-1.amazonaws.com',41498822}]}]},
{local_tasks,
[[{type,xdcr},
{id,<<"949dcce68db4b6d1add4c033ec4e32a9/apro/apro">>},
{errors,
[<<"2014-11-16 19:35:03 [Vb Rep] Error replicating vbucket 201. Please see logs for details.">>]},
{changes_left,220},
{docs_checked,51951817},
{docs_written,51951817},
{active_vbreps,4},
{max_vbreps,4},
{waiting_vbreps,210},
{time_working,1040792.401734},
{time_committing,0.0},
{time_working_rate,0.9101340661254117},
{num_checkpoints,53490},
{num_failedckpts,1},
{wakeups_rate,11.007892659036528},
{worker_batches_rate,20.514709046386255},
{rate_replication,22.015785318073057},
{bandwidth_usage,880.6314127229223},
{rate_doc_checks,22.015785318073057},
{rate_doc_opt_repd,22.015785318073057},
{meta_latency_aggr,0.0},
{meta_latency_wt,0.0},
{docs_latency_aggr,1271.0828664152195},
{docs_latency_wt,20.514709046386255}],
[{type,xdcr},
{id,<<"fc72b1b0e571e9c57671d6621cac6058/apro/apro">>},
{errors,[]},
{changes_left,278},
{docs_checked,51217335},
{docs_written,51217335},
{active_vbreps,4},
{max_vbreps,4},
{waiting_vbreps,269},
{time_working,1124595.930738},
{time_committing,0.0},
{time_working_rate,1.019751359238166},
{num_checkpoints,54571},
{num_failedckpts,3},
{wakeups_rate,6.50472893793788},
{worker_batches_rate,16.51200422707308},
{rate_replication,23.01673316501096},
{bandwidth_usage,936.6809670630547},
{rate_doc_checks,23.01673316501096},
{rate_doc_opt_repd,23.01673316501096},
{meta_latency_aggr,0.0},
{meta_latency_wt,0.0},
{docs_latency_aggr,1500.9621995190503},
{docs_latency_wt,16.51200422707308}],
[{type,xdcr},
{id,<<"16b1afb33dbcbde3d075e2ff634d9cc0/apro/apro">>},
{errors,
[<<"2014-11-16 19:21:55 [Vb Rep] Error replicating vbucket 258. Please see logs for details.">>,
<<"2014-11-16 19:22:41 [Vb Rep] Error replicating vbucket 219. Please see logs for details.">>,
<<"2014-11-16 19:23:04 [Vb Rep] Error replicating vbucket 315. Please see logs for details.">>,
<<"2014-11-16 20:06:40 [Vb Rep] Error replicating vbucket 643. Please see logs for details.">>,
<<"2014-11-16 20:38:20 [Vb Rep] Error replicating vbucket 651. Please see logs for details.">>]},
{changes_left,0},
{docs_checked,56060297},
{docs_written,56060297},
{active_vbreps,0},
{max_vbreps,4},
{waiting_vbreps,0},
{time_working,140073.119377},
{time_committing,0.0},
{time_working_rate,0.04649055712180432},
{num_checkpoints,103504},
{num_failedckpts,237},
{wakeups_rate,21.524796565643623},
{worker_batches_rate,22.52594989427821},
{rate_replication,22.52594989427821},
{bandwidth_usage,913.0518357147434},
{rate_doc_checks,22.52594989427821},
{rate_doc_opt_repd,22.52594989427821},
{meta_latency_aggr,0.0},
{meta_latency_wt,0.0},
{docs_latency_aggr,13.732319632216313},
{docs_latency_wt,22.52594989427821}],
[{type,xdcr},
{id,<<"b734095ad63ea9832f9da1b1ef3449ac/apro/apro">>},
{errors,
[<<"2014-11-16 19:36:22 [Vb Rep] Error replicating vbucket 260. Please see logs for details.">>,
<<"2014-11-16 19:36:38 [Vb Rep] Error replicating vbucket 299. Please see logs for details.">>,
<<"2014-11-16 19:36:43 [Vb Rep] Error replicating vbucket 205. Please see logs for details.">>,
<<"2014-11-16 19:36:48 [Vb Rep] Error replicating vbucket 227. Please see logs for details.">>,
<<"2014-11-16 20:26:19 [Vb Rep] Error replicating vbucket 175. Please see logs for details.">>,
<<"2014-11-16 20:26:25 [Vb Rep] Error replicating vbucket 221. Please see logs for details.">>,
<<"2014-11-16 21:16:40 [Vb Rep] Error replicating vbucket 293. Please see logs for details.">>,
<<"2014-11-16 21:16:40 [Vb Rep] Error replicating vbucket 251. Please see logs for details.">>,
<<"2014-11-16 21:17:06 [Vb Rep] Error replicating vbucket 270. Please see logs for details.">>]},
{changes_left,270},
{docs_checked,50418639},
{docs_written,50418639},
{active_vbreps,4},
{max_vbreps,4},
{waiting_vbreps,261},
{time_working,1860159.788732},
{time_committing,0.0},
{time_working_rate,1.008940755729142},
{num_checkpoints,103426},
{num_failedckpts,87},
{wakeups_rate,6.50782891818858},
{worker_batches_rate,16.01927118323343},
{rate_replication,23.027702325898055},
{bandwidth_usage,933.1225464233472},
{rate_doc_checks,23.027702325898055},
{rate_doc_opt_repd,23.027702325898055},
{meta_latency_aggr,0.0},
{meta_latency_wt,0.0},
{docs_latency_aggr,1367.9901922012182},
{docs_latency_wt,16.01927118323343}],
[{type,xdcr},
{id,<<"e213600feb7ec1dfa0537173ad7f2e02/apro/apro">>},
{errors,
[<<"2014-11-16 20:16:39 [Vb Rep] Error replicating vbucket 647. Please see logs for details.">>,
<<"2014-11-16 20:17:31 [Vb Rep] Error replicating vbucket 619. Please see logs for details.">>]},
{changes_left,854},
{docs_checked,33371659},
{docs_written,33371659},
{active_vbreps,4},
{max_vbreps,4},
{waiting_vbreps,318},
{time_working,2421539.8537169998},
{time_committing,0.0},
{time_working_rate,1.7382361098734072},
{num_checkpoints,102421},
{num_failedckpts,85},
{wakeups_rate,3.0038659755104824},
{worker_batches_rate,7.009020609524459},
{rate_replication,30.539304084356573},
{bandwidth_usage,1261.6237097144026},
{rate_doc_checks,30.539304084356573},
{rate_doc_opt_repd,30.539304084356573},
{meta_latency_aggr,0.0},
{meta_latency_wt,0.0},
{docs_latency_aggr,1997.2249284829577},
{docs_latency_wt,7.009020609524459}]]},
{memory,
[{total,752400928},
{processes,375623512},
{processes_used,371957960},
{system,376777416},
{atom,594537},
{atom_used,591741},
{binary,94783616},
{code,15355960},
{ets,175831736}]},
{system_memory_data,
[{system_total_memory,64552329216},
{free_swap,0},
{total_swap,0},
{cached_memory,27011342336},
{buffered_memory,4885585920},
{free_memory,12694065152},
{total_memory,64552329216}]},
{node_storage_conf,
[{db_path,"/data/couchbase"},{index_path,"/data/couchbase"}]},
{statistics,
[{wall_clock,{552959103,4997}},
{context_switches,{8592101014,0}},
{garbage_collection,{2034857586,5985868018204,0}},
{io,{{input,270347194989},{output,799175854069}}},
{reductions,{833510054494,7038093}},
{run_queue,0},
{runtime,{553128340,5090}},
{run_queues,{0,0,0,0,0,0,0,0}}]},
{system_stats,
[{cpu_utilization_rate,2.5316455696202533},
{swap_total,0},
{swap_used,0},
{mem_total,64552329216},
{mem_free,44590993408}]},
{interesting_stats,
[{cmd_get,0.0},
{couch_docs_actual_disk_size,21729991305},
{couch_docs_data_size,11673379153},
{couch_views_actual_disk_size,0},
{couch_views_data_size,0},
{curr_items,30268090},
{curr_items_tot,60625521},
{ep_bg_fetched,0.0},
{get_hits,0.0},
{mem_used,11032659776},
{ops,116.0},
{vb_replica_curr_items,30357431}]},
{per_bucket_interesting_stats,
[{"ifa",
[{cmd_get,0.0},
{couch_docs_actual_disk_size,611617800},
{couch_docs_data_size,349385716},
{couch_views_actual_disk_size,0},
{couch_views_data_size,0},
{curr_items,1020349},
{curr_items_tot,2039753},
{ep_bg_fetched,0.0},
{get_hits,0.0},
{mem_used,307268040},
{ops,0.0},
{vb_replica_curr_items,1019404}]},
{"apro",
[{cmd_get,0.0},
{couch_docs_actual_disk_size,21118373505},
{couch_docs_data_size,11323993437},
{couch_views_actual_disk_size,0},
{couch_views_data_size,0},
{curr_items,29247741},
{curr_items_tot,58585768},
{ep_bg_fetched,0.0},
{get_hits,0.0},
{mem_used,10725391736},
{ops,116.0},
{vb_replica_curr_items,29338027}]}]},
{processes_stats,
[{<<"proc/(main)beam.smp/cpu_utilization">>,0},
{<<"proc/(main)beam.smp/major_faults">>,0},
{<<"proc/(main)beam.smp/major_faults_raw">>,0},
{<<"proc/(main)beam.smp/mem_resident">>,943411200},
{<<"proc/(main)beam.smp/mem_share">>,6901760},
{<<"proc/(main)beam.smp/mem_size">>,2951794688},
{<<"proc/(main)beam.smp/minor_faults">>,0},
{<<"proc/(main)beam.smp/minor_faults_raw">>,456714435},
{<<"proc/(main)beam.smp/page_faults">>,0},
{<<"proc/(main)beam.smp/page_faults_raw">>,456714435},
{<<"proc/beam.smp/cpu_utilization">>,0},
{<<"proc/beam.smp/major_faults">>,0},
{<<"proc/beam.smp/major_faults_raw">>,0},
{<<"proc/beam.smp/mem_resident">>,108077056},
{<<"proc/beam.smp/mem_share">>,2973696},
{<<"proc/beam.smp/mem_size">>,1113272320},
{<<"proc/beam.smp/minor_faults">>,0},
{<<"proc/beam.smp/minor_faults_raw">>,6583},
{<<"proc/beam.smp/page_faults">>,0},
{<<"proc/beam.smp/page_faults_raw">>,6583},
{<<"proc/memcached/cpu_utilization">>,0},
{<<"proc/memcached/major_faults">>,0},
{<<"proc/memcached/major_faults_raw">>,0},
{<<"proc/memcached/mem_resident">>,17016668160},
{<<"proc/memcached/mem_share">>,6885376},
{<<"proc/memcached/mem_size">>,17812746240},
{<<"proc/memcached/minor_faults">>,0},
{<<"proc/memcached/minor_faults_raw">>,4385001},
{<<"proc/memcached/page_faults">>,0},
{<<"proc/memcached/page_faults_raw">>,4385001}]},
{cluster_compatibility_version,196608},
{version,
[{lhttpc,"1.3.0"},
{os_mon,"2.2.14"},
{public_key,"0.21"},
{asn1,"2.0.4"},
{couch,"2.1.1r-432-gc2af28d"},
{kernel,"2.16.4"},
{syntax_tools,"1.6.13"},
{xmerl,"1.3.6"},
{ale,"3.0.0-1209-rel-enterprise"},
{couch_set_view,"2.1.1r-432-gc2af28d"},
{compiler,"4.9.4"},
{inets,"5.9.8"},
{mapreduce,"1.0.0"},
{couch_index_merger,"2.1.1r-432-gc2af28d"},
{ns_server,"3.0.0-1209-rel-enterprise"},
{oauth,"7d85d3ef"},
{crypto,"3.2"},
{ssl,"5.3.3"},
{sasl,"2.3.4"},
{couch_view_parser,"1.0.0"},
{mochiweb,"2.4.2"},
{stdlib,"1.19.4"}]},
{supported_compat_version,[3,0]},
{advertised_version,[3,0,0]},
{system_arch,"x86_64-unknown-linux-gnu"},
{wall_clock,552959},
{memory_data,{64552329216,51966836736,{<13661.389.0>,147853368}}},
{disk_data,
[{"/",10309828,38},
{"/dev/shm",31519692,0},
{"/mnt",154817516,1},
{"/data",1056894132,3}]},
{meminfo,
<<"MemTotal: 63039384 kB\nMemFree: 12396548 kB\nBuffers: 4771080 kB\nCached: 26378264 kB\nSwapCached: 0 kB\nActive: 31481704 kB\nInactive: 17446048 kB\nActive(anon): 17750620 kB\nInactive(anon): 2732 kB\nActive(file): 13731084 kB\nInactive(file): 17443316 kB\nUnevictable: 0 kB\nMlocked: 0 kB\nSwapTotal: 0 kB\nSwapFree: 0 kB\nDirty: 13312 kB\nWriteback: 0 kB\nAnonPages: 17753376 kB\nMapped: 14516 kB\nShmem: 148 kB\nSlab: 1297976 kB\nSReclaimable: 1219296 kB\nSUnreclaim: 78680 kB\nKernelStack: 2464 kB\nPageTables: 39308 kB\nNFS_Unstable: 0 kB\nBounce: 0 kB\nWritebackTmp: 0 kB\nCommitLimit: 31519692 kB\nCommitted_AS: 19222984 kB\nVmallocTotal: 34359738367 kB\nVmallocUsed: 114220 kB\nVmallocChunk: 34359618888 kB\nHardwareCorrupted: 0 kB\nAnonHugePages: 17432576 kB\nHugePages_Total: 0\nHugePages_Free: 0\nHugePages_Rsvd: 0\nHugePages_Surp: 0\nHugepagesize: 2048 kB\nDirectMap4k: 6144 kB\nDirectMap2M: 63993856 kB\n">>}] auto_failover001 ns_1@ec2-####107.compute-1.amazonaws.com 21:19:53 - Sun Nov 16, 2014
Failed over 'ns_1@ec2-####108 -.compute-1.amazonaws.com': ok ns_rebalancer000 ns_1@ec2-####107.compute-1.amazonaws.com 21:19:53 - Sun Nov 16, 2014
Skipped vbucket activations and replication topology changes because not all remaining node were found to have healthy bucket "ifa": ['ns_1@ec2-####107.compute-1.amazonaws.com'] ns_rebalancer000 ns_1@ec2-####107.compute-1.amazonaws.com 21:19:53 - Sun Nov 16, 2014
Shutting down bucket "ifa" on 'ns_1@ec2-####108 -.compute-1.amazonaws.com' for deletion ns_memcached000 ns_1@ec2-####108 -.compute-1.amazonaws.com 21:19:49 - Sun Nov 16, 2014
Starting failing over 'ns_1@ec2-####108 -.compute-1.amazonaws.com' ns_rebalancer000 ns_1@ec2-####107.compute-1.amazonaws.com 21:19:48 - Sun Nov 16, 2014
Bucket "apro" loaded on node 'ns_1@ec2-####108 -.compute-1.amazonaws.com' in 0 seconds. ns_memcached000 ns_1@ec2-####108 -.compute-1.amazonaws.com 21:19:44 - Sun Nov 16, 2014
Control connection to memcached on 'ns_1@ec2-####108 -.compute-1.amazonaws.com' disconnected: {{badmatch,
{error,
timeout}},
[{mc_client_binary,
cmd_vocal_recv,
5,
[{file,
"src/mc_client_binary.erl"},
{line,
151}]},
{mc_client_binary,
select_bucket,
2,
[{file,
"src/mc_client_binary.erl"},
{line,
346}]},
{ns_memcached,
ensure_bucket,
2,
[{file,
"src/ns_memcached.erl"},
{line,
1269}]},
{ns_memcached,
handle_info,
2,
[{file,
"src/ns_memcached.erl"},
{line,
744}]},
{gen_server,
handle_msg,
5,
[{file,
"gen_server.erl"},
{line,
604}]},
{ns_memcached,
init,
1,
[{file,
"src/ns_memcached.erl"},
{line,
171}]},
{gen_server,
init_it,
6,
[{file,
"gen_server.erl"},
{line,
304}]},
{proc_lib,
init_p_do_apply,
3,
[{file,
"proc_lib.erl"},
{line,
239}]}]} ns_memcached000 ns_1@ec2-####108 -.compute-1.amazonaws.com 21:19:44 - Sun Nov 16, 2014

 Comments   
Comment by Aleksey Kondratenko [ 18/Nov/14 ]
I cannot help without logs
Comment by Anil Kumar [ 18/Nov/14 ]
Ashwini - Thanks for reporting the issue. Please run the cbcollect_info tool (http://docs.couchbase.com/admin/admin/CLI/cbcollect_info_tool.html) to gather the logs and attach them to the ticket so we can investigate the issue. Thanks
Comment by Anil Kumar [ 21/Nov/14 ]
Ashwini - We would need the complete log file to investigate. Can you please collect the logs and attach them to the ticket? If you're concerned about sensitive data, you can contact support@couchbase.com so we can provide you a secure storage path for uploading the file.




[MB-12695] certain xdcr logging exposes remote cluster passwords ("Checkpointing related POST to XXX failed") Created: 17/Nov/14  Updated: 18/Nov/14  Resolved: 18/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.1, 3.0.1, 3.0, 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Aleksey Kondratenko Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: security
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: No

 Description   
SUBJ


 Comments   
Comment by Aleksey Kondratenko [ 17/Nov/14 ]
http://review.couchbase.org/43342 is pending review




[MB-12694] [Windows] Document workaround to increase ops/sec in single bucket case for append heavy workloads. Created: 17/Nov/14  Updated: 17/Nov/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0.2
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Venu Uppalapati Assignee: Anil Kumar
Resolution: Unresolved Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Yes

 Description   
Ref: https://docs.google.com/document/d/19l-iDEfM1EetLCoYCjkvNCloVAUfpmIX3PCBOpCx9DU/edit?usp=sharing

It has been observed that there is a drop in ops/sec with an append-heavy workload on Windows. The workaround is to tune the number of front-end memcached threads down to 4 to match the number in 2.5.1. This issue appears to be caused by the scheduling mechanism on Windows. On Linux, with the default number of front-end threads, ops/sec for the append-heavy use case has in fact improved.

Linux: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=ares_251-1083_fec_load&snapshot=ares_302-1518_3af_load





[MB-12693] Exception thrown when running cbrestore if wrong restore path provided. Created: 17/Nov/14  Updated: 18/Nov/14

Status: Open
Project: Couchbase Server
Component/s: tools
Affects Version/s: 3.0.1
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Bin Cui Assignee: Bin Cui
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Testing on GA of 3.0 (build 1209)

I took a few incremental backups and am trying to restore.

Under one backup directory I have 1 full and 2 diff directories, all created by repeatedly running this command after adding some data each time:
/opt/couchbase/bin/cbbackup -u Administrator -p password http://localhost:8091 -b test -m diff -v ./

It's not clear from the documentation what I should specify to cbrestore in order to restore the entire dataset.

If I supply the top-level directory as the source, it gives me an error:
[root@cb1 backup]# /opt/couchbase/bin/cbrestore -u Administrator -p password ./2014-10-28T061810Z -b test -B test2 http://localhost:8091 -v
2014-10-28 06:23:50,873: mt cbrestore...
2014-10-28 06:23:50,873: mt source : ./2014-10-28T061810Z
2014-10-28 06:23:50,873: mt sink : http://localhost:8091
2014-10-28 06:23:50,873: mt opts : {'username': '<xxx>', 'verbose': 1, 'dry_run': False, 'extra': {'max_retry': 10.0, 'rehash': 0.0, 'data_only': 0.0, 'uncompress': 0.0, 'nmv_retry': 1.0, 'conflict_resolve': 1.0, 'cbb_max_mb': 100000.0, 'mcd_compatible': 1.0, 'try_xwm': 1.0, 'backoff_cap': 10.0, 'batch_max_bytes': 400000.0, 'report_full': 2000.0, 'seqno': 0.0, 'batch_max_size': 1000.0, 'report': 5.0, 'design_doc_only': 0.0, 'recv_min_bytes': 4096.0}, 'from_date': None, 'bucket_destination': 'test2', 'add': False, 'vbucket_list': None, 'threads': 4, 'to_date': None, 'key': None, 'password': '<xxx>', 'id': None, 'bucket_source': 'test'}
Traceback (most recent call last):
  File "/opt/couchbase/lib/python/cbrestore", line 12, in <module>
    pump_transfer.exit_handler(pump_transfer.Restore().main(sys.argv))
  File "/opt/couchbase/lib/python/pump_transfer.py", line 94, in main
    rv = pumpStation.run()
  File "/opt/couchbase/lib/python/pump.py", line 108, in run
    rv, source_map, sink_map = self.check_endpoints()
  File "/opt/couchbase/lib/python/pump.py", line 156, in check_endpoints
    rv, source_map = self.source_class.check(self.opts, self.source_spec)
  File "/opt/couchbase/lib/python/pump_bfd.py", line 318, in check
    bucket_dirs = glob.glob(latest_dir + "/bucket-*")
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

If I specify either the 'full' directory or one of the 'diff' directories, it only restores that data.

Is the problem that all of my backups are under a single top-level directory? In addition to fixing the error above, can we also update the documentation to provide examples of the "standard" way to backup incrementally and restore the whole data set?
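For reference, below is a minimal illustrative sketch (not the actual pump_bfd.py code) of the kind of defensive check that would avoid the TypeError in the traceback above: before globbing for bucket directories, verify that a timestamped backup directory was actually found under the supplied source path. The helper logic here is a hypothetical stand-in for whatever populates latest_dir around pump_bfd.py:318.

import glob
import os

def find_bucket_dirs(backup_root):
    # Hypothetical stand-in for locating the newest timestamped
    # backup directory; returns None when there isn't one.
    timestamp_dirs = sorted(d for d in glob.glob(os.path.join(backup_root, "*"))
                            if os.path.isdir(d))
    latest_dir = timestamp_dirs[-1] if timestamp_dirs else None
    if latest_dir is None:
        # Fail with a readable error instead of the TypeError
        # "unsupported operand type(s) for +: 'NoneType' and 'str'".
        return "error: no backup directories found under " + backup_root, None
    return 0, glob.glob(latest_dir + "/bucket-*")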




[MB-12692] EP engine side of changes needed for supporting XDCR LWW in Sherlock Created: 17/Nov/14  Updated: 19/Nov/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: sherlock
Fix Version/s: sherlock
Security Level: Public

Type: Task Priority: Major
Reporter: Xiaomei Zhang Assignee: Xiaomei Zhang
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
The requirement doc is at https://docs.google.com/document/d/1tvyWq7PcQsCwGJROnHxxTSopObggNfIqZez-8jHnBtE/edit#heading=h.n1x6umi5p7wv

 Comments   
Comment by Chiyoung Seo [ 19/Nov/14 ]
Xiaomei,

Per our discussion, the ep-engine team needs more clarifications regarding how HLC should be generated in various edge cases.

Please update this ticket when the requirement doc is ready.
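For context only, below is a simplified sketch of how a hybrid logical clock (HLC) timestamp is commonly generated for a local mutation. It is illustrative, covers only the local-update case, and is not taken from the requirement doc or the ep-engine design.

import time

class HybridLogicalClock(object):
    # Illustrative only. An HLC combines wall-clock time with a logical
    # counter so timestamps stay monotonic even if the wall clock stalls
    # or moves backwards.
    def __init__(self):
        self.last_physical = 0   # highest physical time seen so far (ms)
        self.logical = 0         # tie-breaking counter

    def next_timestamp(self):
        physical = int(time.time() * 1000)
        if physical > self.last_physical:
            self.last_physical = physical
            self.logical = 0
        else:
            # Wall clock did not advance: bump the logical counter.
            self.logical += 1
        return (self.last_physical, self.logical)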




[MB-12691] Assertion failure in _hbtrie_find after deleting/re-creating KV store Created: 17/Nov/14  Updated: 17/Nov/14  Resolved: 17/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: forestdb
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Jens Alfke Assignee: Jung-Sang Ahn
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
My unit tests are crashing with an assertion failure in _hbtrie_find. I haven't come up with a reduced test case, but roughly what's happening is that I create a named KV store, write to it, close it, delete it, recreate it, then try to read from it with fdb_get … which crashes. It looks to me as though there's a bug in _hbtrie_find where it leaves data uninitialized.

(This is with the latest master branch commit of forestdb, 1e2dbfd.)

Assertion failed: (btree->ksize == trie->chunksize && btree->vsize == trie->valuelen), function _hbtrie_find, file /Couchbase/CouchbaseLite/vendor/CBForest/vendor/forestdb/src/hbtrie.cc, line 961.
(lldb) bt
* thread #1: tid = 0x9e956, 0x00007fff887ef282 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fff887ef282 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff859e34c3 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff89094b73 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fff8905cc59 libsystem_c.dylib`__assert_rtn + 321
  * frame #4: 0x0000000104f58e77 CBForest Tests`_hbtrie_find(trie=0x000000010027fd30, key=0x00007fff5fbfd8d0, keylen=24, valuebuf=0x00007fff5fbfda38, btreelist=0x0000000000000000, flag='\0') + 727 at hbtrie.cc:961
    frame #5: 0x0000000104f595d5 CBForest Tests`hbtrie_find(trie=0x000000010027fd30, rawkey=0x00007fff5fbfd940, rawkeylen=9, valuebuf=0x00007fff5fbfda38) + 149 at hbtrie.cc:1103
    frame #6: 0x0000000104f723a8 CBForest Tests`fdb_get(handle=0x0000000100418b70, doc=0x00007fff5fbfdc38) + 1128 at forestdb.cc:1870
...

Looking at the code, I think there's a bug in _hbtrie_find whenever its 'btreelist' parameter is NULL, as it is in this case:
* On line 946, _hbtrie_find points the local var 'btree' at the address of an uninitialized local struct 'btree_static'.
* On line 954, it calls btree_init_from_bid().
* btree_init_from_bid() initializes most of the fields of *btree, _except_ ksize, vsize and aux.
* Back in _hbtrie_find, line 961 tests the ksize and vsize fields of *btree, which are still uninitialized garbage, and aborts because they're not correct.

 Comments   
Comment by Jens Alfke [ 17/Nov/14 ]
Actually, I think the bug in _hbtrie_find exists whether or not the btreelist parameter is NULL. In the non-NULL case, btree points to a newly-allocated heap block (lines 942-944) so its fields are still uninitialized.

So any call to _hbtrie_find that gets past line 951 is likely to crash.
Comment by Jens Alfke [ 17/Nov/14 ]
Looking further, I'm finding trouble earlier on during the call to fdb_kvs_remove() — it looks like it ends up corrupting the in-memory trie structure.

* Call to fdb_kvs_remove() ends up in _hbtrie_remove().
* This calls _hbtrie_find, with a non-NULL btreelist [hbtrie.cc:1146]
* _hbtrie_find() allocates a new *uninitialized* btreelist_item and adds it to the btreelist [hbtrie.cc:941-944]
* _hbtrie_find() returns HBTRIE_RESULT_FAIL [hbtrie.cc:951]
* Back in _hbtrie_remove(), it jumps down to the call to _hbtrie_btree_cascaded_update() [hbtrie.cc:1191]
* _hbtrie_btree_cascaded_update() replaces trie->root_bid with the garbage value from the uninitialized list item [hbtrie.cc:888]

Looks like _hbtrie_find should be emptying the btreelist before returning an error; or else maybe it shouldn't even add an item to the list before the test of trie->root_bid on line 949. I'm not sure. This code is REALLY messy and it's very hard to figure out what it should be doing.
Comment by Jens Alfke [ 17/Nov/14 ]
Oops, sorry, btree_init_from_bid() *does* initialize ksize and vsize; it just does so using a macro (_get_kvsize) that doesn't look as though it sets those variables.

I think the actual bug is the one I described in the comment just above this one — the preceding call to fdb_kvs_remove corrupts the trie, which is what's causing ksize and vsize to be initialized incorrectly later on.
Comment by Jung-Sang Ahn [ 17/Nov/14 ]
Thanks so much Jens for reporting this bug and your analysis; it is very helpful for me.

It seems that the _hbtrie_btree_cascaded_update() call at hbtrie.cc:1191 should be moved inside the if-clause (between lines 1188 and 1189).

And regarding the initialization of the 'btree' variable in _hbtrie_find(), both btree->ksize and btree->vsize are initialized by calling _get_kvsize() at [btree.cc:346], and btree->aux is initialized just after the btree_init_from_bid() at [hbtrie.cc:960]; so the 'btree' variable is correctly initialized. The reason ksize and vsize contain garbage values is that 'trie->root_bid' is corrupted, so garbage values are read from an incorrect location in the file and propagated from the 'root' variable to the 'btree' variable.

I will upload a new patch addressing this issue soon.

However, I'm wondering why _hbtrie_find() called by _hbtrie_remove() returned failure; it should have succeeded. Did you check that the KV store was correctly removed?

Thanks.
Comment by Jung-Sang Ahn [ 17/Nov/14 ]
Oh, you already wrote a new comment while I was writing the comment above.
Comment by Jung-Sang Ahn [ 17/Nov/14 ]
Hmm, I think it is reasonable that _hbtrie_find() returns failure if you try to remove a KV store that has no documents in it.
Comment by Jung-Sang Ahn [ 17/Nov/14 ]
http://review.couchbase.org/#/c/43336/
Comment by Chiyoung Seo [ 17/Nov/14 ]
The fix was merged. Please reopen this ticket if you still see the same issue.
Comment by Jens Alfke [ 17/Nov/14 ]
That fixed it! Thanks!




[MB-12690] [windows 2012] rebalance hangs when adding couchbase server 3.0.2-1529 to a 2.5.0 cluster Created: 17/Nov/14  Updated: 21/Nov/14  Resolved: 20/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: couchbase-bucket
Affects Version/s: 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Test Blocker
Reporter: Thuan Nguyen Assignee: Abhinav Dangeti
Resolution: Fixed Votes: 0
Labels: windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Windows server 2012 R2 64 bit

Attachments: Zip Archive 172.23.105.212-11172014-125-diag.zip     Zip Archive 172.23.105.213-11172014-126-diag.zip     Zip Archive 172.23.106.251-11172014-128-diag.zip     Zip Archive 172.23.106.251-11192014-1326-diag.zip     Zip Archive 172.23.107.48-11172014-1210-diag.zip     Zip Archive 172.23.107.48-11192014-1329-diag.zip     PNG File ss 2014-11-17 at 11.40.36 AM.png    
Triage: Untriaged
Operating System: Windows 64-bit
Is this a Regression?: Unknown

 Description   
Install cb server 2.5.0 on 2 nodes
172.23.105.212
172.23.105.213
Create default bucket. Load 1000 keys
Install cb server 3.0.2-1529 on 2 nodes
172.23.106.251
172.23.107.48
Add nodes 251 and 48 to 2.5.0 cluster
Rebalance. Rebalance hangs, and the error

Write Commit Failure. Disk write failed for item in Bucket "default" on node 172.23.107.48

pops up sometimes.

A live cluster is available to debug.
I will add cbcollect_info output soon.

 Comments   
Comment by Aleksey Kondratenko [ 17/Nov/14 ]
Write commit failures are purely storage related.

Death of mccouch makes it possible to pass such bugs to the ep-engine team without looking at logs (i.e., because unlike in the past, the "erlang bits" no longer touch data files and cannot corrupt them)
Comment by Thuan Nguyen [ 18/Nov/14 ]
Tried this test on another set of VMs and hit the same issue. Rebalance hangs due to "Write commit failure" errors when it writes to the 3.0.2 nodes
Comment by Thuan Nguyen [ 18/Nov/14 ]
New set vms :
2.5.0
172.23.107.90
172.23.107.91

3.0.2-1529
172.23.107.92
172.23.107.93
Comment by Thuan Nguyen [ 18/Nov/14 ]
Update on the 1st cluster: rebalance did complete, but it took more than 5 hours with only 10k items and 512 vbuckets.
There are a lot of "Write commit failure" errors in the log.
Comment by Thuan Nguyen [ 18/Nov/14 ]
Update on the second cluster: rebalance completed in 50 minutes.
There are a lot of write commit errors in the logs, and
30 replica keys out of 10K are missing at the end of the upgrade.
Comment by Thuan Nguyen [ 18/Nov/14 ]
Tested on a 3rd cluster with Windows 2008 R2 64-bit.
The test passed without any error.
Rebalance took only 1.5 minutes.
Comment by Abhinav Dangeti [ 19/Nov/14 ]
Hey Tony, can you run the test with this toy build: http://factory.couchbase.com/job/package_cs_toy_win/ws/couchbase_server--enterprise-windows-amd64-3.0.2-27.exe,
and once you do see the issue, can you get the logs from the 3.0.2 nodes?
Comment by Thuan Nguyen [ 19/Nov/14 ]
Hit this bug with toy build http://factory.couchbase.com/job/package_cs_toy_win/ws/couchbase_server--enterprise-windows-amd64-3.0.2-27.exe
I will add cbcollectinfo soon.
Comment by Thuan Nguyen [ 19/Nov/14 ]
Here are the logs from 3.0.2-27 toy build
Comment by Abhinav Dangeti [ 19/Nov/14 ]
Thanks Tony. I have a potential fix and am getting a toy build set up for it now. Once the build is ready I will share the details with you; please run the test again then and share your results.
Comment by Thuan Nguyen [ 19/Nov/14 ]
I will retest when new toy build is ready
Comment by Abhinav Dangeti [ 20/Nov/14 ]
Here's the toy build: http://factory.couchbase.com/job/package_cs_toy_win/ws/couchbase_server--enterprise-windows-amd64-3.0.2-28.exe
Comment by Thuan Nguyen [ 20/Nov/14 ]
Running test on new toy build http://factory.couchbase.com/job/package_cs_toy_win/ws/couchbase_server--enterprise-windows-amd64-3.0.2-28.exe
Comment by Thuan Nguyen [ 20/Nov/14 ]
Tested adding 3.0.2 nodes (toy build 3.0.2-28.exe) to the 2.5.0 cluster (Windows 2012 R2).
Rebalance finished without any "Write to disk failure" errors.
So this bug is fixed with this toy build.
Comment by Abhinav Dangeti [ 20/Nov/14 ]
This is the fix (in the toy build): http://review.couchbase.org/#/c/43430

What we observed in the logs is: GetLastError on Windows 2012 returned SUCCESS even when the file wasn't found for an openFile request (in which case we would usually expect FILE_NOT_FOUND). The fix is in couchstore, where we handle the case where GetLastError returns SUCCESS in such a scenario.
Comment by Thuan Nguyen [ 21/Nov/14 ]
Verified on build 3.0.2-1556.
I could not reproduce this bug




[MB-12689] Panic during fetch from file datastore Created: 17/Nov/14  Updated: 18/Nov/14  Due: 18/Nov/14  Resolved: 18/Nov/14

Status: Resolved
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Critical
Reporter: Gerald Sangudi Assignee: Colm Mchugh
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 4h
Time Spent: Not Specified
Original Estimate: 4h

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Hi Colm,

Your fix to FetchOne(), which handles missing keys by resizing the return slice, may have introduced an array index panic. The return slice is no longer parallel to the input keys. Please take a look. Panic at file.go, line 294.

From Isha:

cbq> select * from game;
{
    "request_id": "78c1dd52-9343-42d7-8d37-b3d985a45ca7",
    "signature": {
        "*": "*"
    },
    "results": [
    ]
    "errors": [
        {
            "caller": "context:203",
            "cause": "runtime error: index out of range",
            "code": 5000,
            "key": "Internal Error",
            "message": "Panic: runtime error: index out of range"
        }
    ],
    "status": "stopped",
    "metrics": {
        "elapsedTime": "1.245762ms",
        "executionTime": "1.171602ms",
        "resultCount": 0,
        "resultSize": 0,
        "errorCount": 1
    }
}

_time="2014-11-17T09:40:09-08:00" _level="ERROR" _msg="" panic=runtime error: index out of range stack="goroutine 1618 [running]:\ngithub.com/couchbaselabs/query/execution.(*Context).Recover(0xc2091c81b0)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/context.go:194 +0xd4\nruntime.panic(0x786980, 0xcdc81c)\n\t/usr/local/go/src/pkg/runtime/panic.c:248 +0x18d\ngithub.com/couchbaselabs/query/datastore/file.(*keyspace).Fetch(0xc208085bc0, 0xc20895f000, 0x17d, 0x17d, 0x0, 0x0, 0x0, 0x0, 0x0)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/datastore/file/file.go:294 +0x684\ngithub.com/couchbaselabs/query/execution.(*Fetch).flushBatch(0xc20956bd00, 0xc2091c81b0, 0x0)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/fetch.go:86 +0x5c7\ngithub.com/couchbaselabs/query/execution.(*Fetch).afterItems(0xc20956bd00, 0xc2091c81b0)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/fetch.go:52 +0x31\ngithub.com/couchbaselabs/query/execution.func·002()\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:158 +0x34d\nsync.(*Once).Do(0xc20956bd50, 0xc209039bf0)\n\t/usr/local/go/src/pkg/sync/once.go:40 +0x9f\ngithub.com/couchbaselabs/query/execution.(*base).runConsumer(0xc20956bd00, 0xe9a640, 0xc20956bd00, 0xc2091c81b0, 0x0, 0x0)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:159 +0x160\ngithub.com/couchbaselabs/query/execution.(*Fetch).RunOnce(0xc20956bd00, 0xc2091c81b0, 0x0, 0x0)\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/fetch.go:44 +0xa2\ncreated by github.com/couchbaselabs/query/execution.func·002\n\t/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:135 +0x29d\n"
goroutine 1618 [running]:
github.com/couchbaselabs/query/execution.(*Context).Recover(0xc2091c81b0)
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/context.go:194 +0xd4
runtime.panic(0x786980, 0xcdc81c)
/usr/local/go/src/pkg/runtime/panic.c:248 +0x18d
github.com/couchbaselabs/query/datastore/file.(*keyspace).Fetch(0xc208085bc0, 0xc20895f000, 0x17d, 0x17d, 0x0, 0x0, 0x0, 0x0, 0x0)
/Users/isha/query2/src/github.com/couchbaselabs/query/datastore/file/file.go:294 +0x684
github.com/couchbaselabs/query/execution.(*Fetch).flushBatch(0xc20956bd00, 0xc2091c81b0, 0x0)
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/fetch.go:86 +0x5c7
github.com/couchbaselabs/query/execution.(*Fetch).afterItems(0xc20956bd00, 0xc2091c81b0)
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/fetch.go:52 +0x31
github.com/couchbaselabs/query/execution.func·002()
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:158 +0x34d
sync.(*Once).Do(0xc20956bd50, 0xc209039bf0)
/usr/local/go/src/pkg/sync/once.go:40 +0x9f
github.com/couchbaselabs/query/execution.(*base).runConsumer(0xc20956bd00, 0xe9a640, 0xc20956bd00, 0xc2091c81b0, 0x0, 0x0)
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:159 +0x160
github.com/couchbaselabs/query/execution.(*Fetch).RunOnce(0xc20956bd00, 0xc2091c81b0, 0x0, 0x0)
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/fetch.go:44 +0xa2
created by github.com/couchbaselabs/query/execution.func·002
/Users/isha/query2/src/github.com/couchbaselabs/query/execution/base.go:135 +0x29d


 Comments   
Comment by Colm Mchugh [ 18/Nov/14 ]
Commit 9f67995ed1d7ae0358fe0ef1a477d65f2d8336e6

If necessary, filter out nils after all key/value pairs have been fetched.




[MB-12688] Update the XDCR createReplication REST API Created: 17/Nov/14  Updated: 19/Nov/14  Resolved: 19/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.5.0, 2.5.1
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Major
Reporter: Anil Kumar Assignee: Ruth Harris
Resolution: Fixed Votes: 0
Labels: rest
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Triaged
Is this a Regression?: Unknown

 Description   
http://docs.couchbase.com/couchbase-manual-2.5/cb-rest-api/#creating-xdcr-replications

curl -v -X POST -u admin:password1 http://10.4.2.4:8091/controller/createReplication
-d uuid=9eee38236f3bf28406920213d93981a3
-d fromBucket=beer-sample
-d toCluster=remote1
-d toBucket=remote_beer
-d replicationType=continuous

Needs to be updated to include the Type

curl -v -X POST -u admin:password1
http://10.4.2.4:8091/controller/createReplication
-d uuid=9eee38236f3bf28406920213d93981a3
-d fromBucket=beer-sample
-d toCluster=remote1
-d toBucket=remote_beer
-d replicationType=continuous
-d type=capi [version1=capi, version2=xmem]
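For convenience, here is a minimal sketch of the same call made with Python's requests library; the host, credentials, and parameter values are simply the placeholders from the curl example above, and type may be capi or xmem.

import requests

# Same createReplication POST as the curl example above; values are
# placeholders taken from that example, not real cluster settings.
resp = requests.post(
    "http://10.4.2.4:8091/controller/createReplication",
    auth=("admin", "password1"),
    data={
        "uuid": "9eee38236f3bf28406920213d93981a3",
        "fromBucket": "beer-sample",
        "toCluster": "remote1",
        "toBucket": "remote_beer",
        "replicationType": "continuous",
        "type": "xmem",  # or "capi"; shown as version2/version1 in the web console
    },
)
print(resp.status_code, resp.text)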

 Comments   
Comment by marija jovanovic [ 17/Nov/14 ]
re-assigning to Ruth since she is documenting REST
Comment by Ruth Harris [ 19/Nov/14 ]
Fixed in 2.5 and 3.0
Added syntax: -d type=[capi | xmem] and example: -d type=capi
Note: The <codeph>type</codeph> values, capi and xmem, are represented by version1 and version2 in the web console.
        <codeph>xmem</codeph> is the default.

See 3.0: http://docs.couchbase.com/admin/admin/REST/rest-xdcr-create-replication.html
See 2.5: http://docs.couchbase.com/couchbase-manual-2.5/cb-rest-api/#creating-xdcr-replications

Note: Publishing to the Couchbase website occurs within 24 hours.




[MB-12687] keys clause shows {} for non-existing keys Created: 17/Nov/14  Updated: 20/Nov/14  Resolved: 19/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Manik Taneja
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 2h
Time Spent: Not Specified
Original Estimate: 2h

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
'wrong' key doesn't exist
select task_name FROM default USE KEYS ['test_task-0', 'test_task-1', 'test_task-2', 'test_task-3', 'test_task-4', 'wrong']
{
    "request_id": "9e8b1bec-0a8a-4740-b46a-e82ddd5bd374",
    "signature": {
        "task_name": "json"
    },
    "results": [
        {
            "task_name": "test_task-0"
        },
        {
            "task_name": "test_task-3"
        },
        {
            "task_name": "test_task-4"
        },
        {
            "task_name": "test_task-2"
        },
        {
            "task_name": "test_task-1"
        },
        {}
    ],
    "state": "success",
    "metrics": {
        "elapsedTime": "4.742ms",
        "executionTime": "4.311ms",
        "resultCount": 6,
        "resultSize": 252
    }
}

I think this relates to this bug: http://www.couchbase.com/issues/browse/MB-12686

 Comments   
Comment by Gerald Sangudi [ 17/Nov/14 ]
Manik,

Fetch() should skip non-existent keys in the Couchbase datastore.

Thanks.
Comment by Manik Taneja [ 19/Nov/14 ]
commit 7b82b57c769c60a6c83fbaa54718d331f0a61ff3
Author: manik <manik@couchbase.com>
Date: Wed Nov 19 17:37:08 2014 +0530

    MB-MB-12687 MB-12304 Return the actual number of keys found
Comment by Iryna Mironava [ 20/Nov/14 ]
verified




[MB-12686] left joins show extra items Created: 17/Nov/14  Updated: 19/Nov/14  Resolved: 19/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Gerald Sangudi
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
I have 4 items:
{
            "name": "employee-1",
            "tasks_ids": [
                "test_task-1",
                "test_task-2"
            ]
        },
{
            "name": "employee-2",
            "tasks_ids": [
                "test_task-2",
                "test_task-3"
            ]
        }
(id is test_task-1){
            "project": "CB",
            "task_name": "test_task-1",
       },
(id is test_task-2){
            "project": "MB",
            "task_name": "test_task-2",
        }

run query
SELECT employee.name, employee.tasks_ids, new_task.project, new_task.task_name FROM b0 as employee LEFT JOIN b0 as new_task ON KEYS employee.tasks_ids
I would expect to see 6 items in result

but now there are 8 (I see duplicated items; it seems like the item + last-key item is shown twice):
"results": [
        {},
        {},
        {
            "name": "employee-1",
            "project": "CB",
            "task_name": "test_task-1",
            "tasks_ids": [
                "test_task-1",
                "test_task-2"
            ]
        },
        {
            "name": "employee-1",
            "project": "MB",
            "task_name": "test_task-2",
            "tasks_ids": [
                "test_task-1",
                "test_task-2"
            ]
        },
        {
            "name": "employee-1",
            "project": "MB",
            "task_name": "test_task-2",
            "tasks_ids": [
                "test_task-1",
                "test_task-2"
            ]
        },
        {
            "name": "employee-2",
            "project": "MB",
            "task_name": "test_task-2",
            "tasks_ids": [
                "test_task-2",
                "test_task-3"
            ]
        },
        {
            "name": "employee-2",
            "tasks_ids": [
                "test_task-2",
                "test_task-3"
            ]
        },
        {
            "name": "employee-2",
            "tasks_ids": [
                "test_task-2",
                "test_task-3"
            ]
        }
    ]

 Comments   
Comment by Gerald Sangudi [ 17/Nov/14 ]
Iryna,

This is the correct behavior for LEFT JOIN. Every left-hand object is included at least once, even if it doesn't contain task_ids. If you use 2 separate buckets, you will get 6 results. But if you use 1 bucket, every item in b0 will appear at least once as an employee.
Comment by Iryna Mironava [ 17/Nov/14 ]
Hi Gerald,
still need an explanation
{},
{},
{
            "name": "employee-1",
            "project": "CB",
            "task_name": "test_task-1",
            "tasks_ids": [
                "test_task-1",
                "test_task-2"
            ]
        },
        {
            "name": "employee-1",
            "project": "MB",
            "task_name": "test_task-2",
            "tasks_ids": [
                "test_task-1",
                "test_task-2"
            ]
        },
        {
            "name": "employee-1",
            "project": "MB",
            "task_name": "test_task-2",
            "tasks_ids": [
                "test_task-1",
                "test_task-2"
            ]
        },
I have an item A { "name": "employee-1", "tasks_ids": ["test_task-1", "test_task-2" ]}
and items B (key test_task-2): {"project": "MB", "task_name": "test_task-2"} and C (key test_task-1): {"project": "CB", "task_name": "test_task-1"},
so I see results B+nothing, which is {}, C+nothing, A+B, and A+C (4 items, which I expect to see),
but I also see a duplicate of A+B, and I'm not sure why.
Comment by Gerald Sangudi [ 17/Nov/14 ]
Try using 2 buckets.
Comment by Iryna Mironava [ 18/Nov/14 ]
cbq> SELECT employee.name, employee.tasks_ids, new_task.project, new_task.project, new_task.task_name FROM b0 as employee LEFT JOIN b1 as new_task ON KEYS employee.tasks_ids;
{
    "request_id": "4de0e684-0066-4479-bf4d-f4c22e48af97",
    "signature": {
        "name": "json",
        "project": "json",
        "task_name": "json",
        "tasks_ids": "json"
    },
    "results": [
        {
            "name": "employee-1",
            "project": "CB",
            "task_name": "test_task-1",
            "tasks_ids": [
                "test_task-1",
                "test_task-2"
            ]
        },
        {
            "name": "employee-1",
            "tasks_ids": [
                "test_task-1",
                "test_task-2"
            ]
        },
        {
            "name": "employee-1",
            "tasks_ids": [
                "test_task-1",
                "test_task-2"
            ]
        }
    ],
    "state": "success",
    "metrics": {
        "elapsedTime": "21.675ms",
        "executionTime": "21.251ms",
        "resultCount": 3,
        "resultSize": 510
    }
}

cbq>
Tried with 2 buckets with 1 item each; I get 3 results, one duplicated. I expect only 2 to be displayed.
Comment by Gerald Sangudi [ 18/Nov/14 ]
Iryna, that is the meaning of outer join, no? Inner join will give you 2 results.
Comment by Iryna Mironava [ 18/Nov/14 ]
I have 1 item with t1 and t2 as keys to join: {'name': 'some_name', 'tasks_ids': ['t1','t2']}
The other bucket has only one of those items, key t1: {'project' : 'MB'}
So if I do an inner join I would expect 1 item, because the current bucket has one item, and this item has 2 join keys, one of which does not exist: {'name': 'some_name', 'tasks_ids': ['t1','t2'], 'project' : 'MB'}
If I do a left outer join, one joined object is produced for each left-hand source object. I expect to see 2, one of them with missing fields: {'name': 'some_name', 'tasks_ids': ['t1','t2'], 'project' : 'MB'}, {'name': 'some_name', 'tasks_ids': ['t1','t2']}
Am I understanding it right?
Comment by Iryna Mironava [ 19/Nov/14 ]
opened another bug http://www.couchbase.com/issues/browse/MB-12714




[MB-12685] joins run with error invalid memory address or nil pointer dereference Created: 17/Nov/14  Updated: 19/Nov/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4, sherlock
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Iryna Mironava
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
[test]
./testrunner -i /tmp/tuqvm.ini doc-per-day=6,standard_buckets=1,skip_build_tuq=True,cbq_version=sherlock,get-cbcollect-info=False -t tuqquery.tuq_join.JoinTests.test_simple_join_keys,type_join=LEFT,GROUP=SIMPLE;P0


query: SELECT employee.name, employee.tasks_ids, new_project FROM default as employee LEFT JOIN default.project as new_project ON KEYS employee.tasks_ids

Some of the tasks_ids keys are not in the bucket.

Some results are shown, but I also see an error:
{
    "request_id": "5c676634-646f-4e43-8d75-c6a8fe86bc4d",
    "signature": {
        "name": "json",
        "new_project": "json",
        "tasks_ids": "json"
    },
    "results": [
        {
            "name": "employee-18",
            "new_project": "MB",
            "tasks_ids": [
                "test_task-18",
                "test_task-19"
            ]
        },
        {
            "name": "employee-18",
            "new_project": "MB",
            "tasks_ids": [
                "test_task-18",
                "test_task-19"
            ]
        },
...<some_more_items>


"errors": [
        {
            "caller": "context:203",
            "cause": "runtime error: invalid memory address or nil pointer dereference",
            "code": 5000,
            "key": "Internal Error",
            "message": "Panic: runtime error: invalid memory address or nil pointer dereference"
        }
    ],
    "state": "stopped",
    "metrics": {
        "elapsedTime": "319.122271ms",
        "executionTime": "319.006612ms",
        "resultCount": 50,
        "resultSize": 9126,
        "errorCount": 1
    }

 Comments   
Comment by Gerald Sangudi [ 17/Nov/14 ]
Hi Iryna,

Please retest this with the latest checkins.

Thanks.
Comment by Gerald Sangudi [ 19/Nov/14 ]
Please retest with latest code.

Thanks.




[MB-12684] cbq-engine crashes if try to query bucket with dropped primary index Created: 17/Nov/14  Updated: 20/Nov/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4, sherlock
Fix Version/s: cbq-DP4
Security Level: Public

Type: Bug Priority: Critical
Reporter: Iryna Mironava Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: 4h
Time Spent: Not Specified
Original Estimate: 4h

Attachments: File drop_index    
Triage: Untriaged
Is this a Regression?: Unknown

 Description   
1) created a primary index
2) dropped the primary index; the drop query succeeded
3) ran any select --> cbq-engine crashes

log is attached

 Comments   
Comment by Gerald Sangudi [ 19/Nov/14 ]
Hi Iryna,

Could you please retest this on the latest code and attach the new stack trace?

Thanks.
Comment by Iryna Mironava [ 20/Nov/14 ]
_time="2014-11-20T20:27:33+04:00" _level="WARN" _msg="Error loading indexes for keyspace default, Error Error loading 2i indexes - cause: Post http://localhost:9101/list: dial tcp 127.0.0.1:9101: connection refused"
_time="2014-11-20T20:27:45+04:00" _level="INFO" _msg="Number of indexes 1"
_time="2014-11-20T20:27:45+04:00" _level="ERROR" _msg="" panic=runtime error: invalid memory address or nil pointer dereference stack="goroutine 318 [running]:\ngithub.com/couchbaselabs/query/execution.(*Context).Recover(0xc20076e7e0)\n\t/root/tuq/gocode/src/github.com/couchbaselabs/query/execution/context.go:194 +0xc9\ngithub.com/couchbaselabs/query/execution.(*PrimaryScan).scanEntries(0xc20075ef00, 0xc20076e7e0, 0xc200259e40)\n\t/root/tuq/gocode/src/github.com/couchbaselabs/query/execution/scan_primary.go:84 +0xa4\ncreated by github.com/couchbaselabs/query/execution.(*PrimaryScan).scanPrimary\n\t/root/tuq/gocode/src/github.com/couchbaselabs/query/execution/scan_primary.go:57 +0x18a\n"
goroutine 318 [running]:
github.com/couchbaselabs/query/execution.(*Context).Recover(0xc20076e7e0)
/root/tuq/gocode/src/github.com/couchbaselabs/query/execution/context.go:194 +0xc9
github.com/couchbaselabs/query/execution.(*PrimaryScan).scanEntries(0xc20075ef00, 0xc20076e7e0, 0xc200259e40)
/root/tuq/gocode/src/github.com/couchbaselabs/query/execution/scan_primary.go:84 +0xa4
created by github.com/couchbaselabs/query/execution.(*PrimaryScan).scanPrimary
/root/tuq/gocode/src/github.com/couchbaselabs/query/execution/scan_primary.go:57 +0x18a
_time="2014-11-20T20:28:01+04:00" _level="INFO" _msg="Refreshing pool default"




[MB-12683] Swap space should not be under "Couchbase in the cloud" Created: 17/Nov/14  Updated: 17/Nov/14  Resolved: 17/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0
Fix Version/s: 3.0.2
Security Level: Public

Type: Improvement Priority: Major
Reporter: Patrick Varley Assignee: marija jovanovic
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: http://docs.couchbase.com/admin/admin/Concepts/bp-cloudDeployment.html


 Description   
The swap space section should be under deployment considerations as it applies to all Linux distributions.

 Comments   
Comment by marija jovanovic [ 17/Nov/14 ]
Moved the section Swap Space from "CB in the cloud" to "Deployment considerations" and fixed the links.
Comment by marija jovanovic [ 17/Nov/14 ]
Closed the issue since the reorganization was done as requested.




[MB-12682] Admonition: Python version 2.6 or greater required for command line utilities Created: 17/Nov/14  Updated: 17/Nov/14

Status: Open
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Improvement Priority: Major
Reporter: Brian Shumate Assignee: Amy Kurtzman
Resolution: Unresolved Votes: 0
Labels: sdk
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
It should be noted in the command line utilities documentation that they require Python version 2.6 or greater to be installed on the system they are being executed on in order to function properly.

Customers and others often encounter inexplicable errors when executing the tools using older Python versions.

Perhaps a note in http://docs.couchbase.com/admin/admin/CLI/cli-overview.html would suffice for this.
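To make the requirement concrete, here is a minimal sketch of the kind of version guard a tool could emit; it is illustrative only and assumes the documented requirement of Python 2.6 or newer.

import sys

# Illustrative guard only; not taken from the actual tools.
if sys.version_info < (2, 6):
    sys.stderr.write("Error: this tool requires Python 2.6 or newer; "
                     "found %s\n" % sys.version.split()[0])
    sys.exit(1)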





[MB-12681] Define datastore CAS API Created: 17/Nov/14  Updated: 17/Nov/14  Due: 19/Nov/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Task Priority: Major
Reporter: Gerald Sangudi Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-12680] Define datastore auth API Created: 17/Nov/14  Updated: 17/Nov/14  Due: 19/Nov/14

Status: Open
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Task Priority: Major
Reporter: Gerald Sangudi Assignee: Gerald Sangudi
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified





[MB-12679] Admin guide, typo if you node / if your node Created: 17/Nov/14  Updated: 19/Nov/14  Resolved: 19/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: documentation
Affects Version/s: 2.0, 2.5.0, 3.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Trivial
Reporter: Ian McCloy Assignee: Amy Kurtzman
Resolution: Fixed Votes: 0
Labels: admin
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
http://docs.couchbase.com/admin/admin/Tasks/tasks-manage-replication.html
http://docs.couchbase.com/couchbase-manual-2.1/#specifying-backoff-for-replication
http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#specifying-backoff-for-replication

"If you node experiences a heavy load ..."

Please change to "If your node experiences a heavy load ..."


 Comments   
Comment by Ian McCloy [ 18/Nov/14 ]
https://github.com/couchbaselabs/docs-ng/pull/171
Comment by Amy Kurtzman [ 18/Nov/14 ]
Pull request returned because it was against the wrong branch. It needs to be submitted against stage branch, not master branch.
Comment by Amy Kurtzman [ 18/Nov/14 ]
Fixed in the 3.0 version of the doc.
Comment by Ian McCloy [ 19/Nov/14 ]
resubmitted pull request on stage branch
Comment by Amy Kurtzman [ 19/Nov/14 ]
Pull request accepted and merged for publication.




[MB-12678] Fix output of explain Created: 17/Nov/14  Updated: 18/Nov/14

Status: In Progress
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4
Fix Version/s: cbq-DP4
Security Level: Public

Type: Task Priority: Major
Reporter: Manik Taneja Assignee: Manik Taneja
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: 16h
Time Spent: Not Specified
Original Estimate: 16h


 Description   
Explain output should resemble the following:

==explain==

--sample query--

select b1.f1, sum(b2.f2) as fa2, avg(b2.f3) as fa3
from b1 join b2 as ba2 on keys b1.b2_id
where b1.f0 > 5
group by b1.f1
order by b1.f1

--logical plan--

sequence
scan
parallel
fetch
join
filter
initialGroup
intermediateGroup
finalGroup
initialProjection
order
finalProjection

--results of EXPLAIN--

results: [
{
"operator" : "Sequence",
"children" : [
{
"operator" : "PrimaryScan",
"index" : "#primary"
},
{
"operator" : "Parallel",
"child" : {
"operator" : "Sequence",
"children" : [
{
"operator" : "Fetch",
"keyspace" : "b1"
},
{
"operator" : "Join",
"join_type" : "inner",
"keyspace" : "b2",
"join_keys" : "b1.b2_id"
},
{
"operator" : "Filter",
"condition" : "(b1.f0 > 5)"
},
{
"operator" : "InitialGroup",
"group_keys" : [ "b1.f1" ],
"aggregates" : [ "sum(b2.f2)", "avg(b2.f3)" ]
},
{
"operator" : "IntermediateGroup",
"group_keys" : [ "b1.f1" ],
"aggregates" : [ "sum(b2.f2)", "avg(b2.f3)" ]
}
]
},
{
"operator" : "FinalGroup",
"group_keys" : [ "b1.f1" ],
"aggregates" : [ "sum(b2.f2)", "avg(b2.f3)" ]
},
{
"operator" : "InitialProjection",
"distinct" : false,
"raw" : false,
"result_terms" : [
{
"expr" : "b1.f1",
"star" : false,
"as" : ""
},
{
"expr" : "sum(b2.f2)",
"star" : false,
"as" : "fa2"
},
{
"expr" : "avg(b2.f3)",
"star" : false,
"as" : "fa3"
}
]
},
{
"operator" : "Order",
"sort_terms" : [
{
"expr" : "b1.f1",
"desc" : false
}
]
},
{
"operator" : "FinalProjection"
}
]
}
]




[MB-12677] cbcollect_info should collect /etc/hosts, /etc/resolv.conf and /etc/nsswitch.conf Created: 17/Nov/14  Updated: 17/Nov/14

Status: Open
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 2.5.1, 3.0.1
Fix Version/s: sherlock
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Patrick Varley Assignee: Patrick Varley
Resolution: Unresolved Votes: 0
Labels: supportability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
There have been a few cases relating to issues with hostname lookup.
It would be good if cbcollect_info collected the following files:

/etc/hosts
/etc/resolv.conf
/etc/nsswitch.conf
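
As an illustration only (not the actual cbcollect_info implementation), a minimal sketch of copying these name-resolution files into a collection directory; output_dir is a hypothetical destination path.

import os
import shutil

def collect_resolver_files(output_dir):
    # Copy the requested name-resolution files into output_dir,
    # skipping any that do not exist on the host.
    for path in ("/etc/hosts", "/etc/resolv.conf", "/etc/nsswitch.conf"):
        if os.path.isfile(path):
            shutil.copy(path, os.path.join(output_dir, os.path.basename(path)))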

 Comments   
Comment by Aleksey Kondratenko [ 17/Nov/14 ]
you probably want us to collect /etc/nsswitch.conf and not /etc/nsserver.conf right ?
Comment by Patrick Varley [ 17/Nov/14 ]
Yes, I got ns_server on the brain, I have updated the title and description.
Comment by Aleksey Kondratenko [ 17/Nov/14 ]
BTW, a patch for that should not be hard to produce, should it? :)

I'll even be able to accept it for 3.0.2.
Comment by Patrick Varley [ 17/Nov/14 ]
I will throw the patch together, just need to see if there is a way to get the hosts file on Windows (we have all the other details already for Windows).
Comment by Patrick Varley [ 17/Nov/14 ]
Patch submitted: http://review.couchbase.org/#/c/43330/
Comment by Patrick Varley [ 17/Nov/14 ]
I have verified it on Linux and OS X.
I will verify Windows tomorrow.




[MB-12676] Views timeout after failover Created: 16/Nov/14  Updated: 17/Nov/14

Status: Open
Project: Couchbase Server
Component/s: ns_server, view-engine
Affects Version/s: sherlock
Fix Version/s: sherlock
Security Level: Public

Type: Bug Priority: Blocker
Reporter: Parag Agarwal Assignee: Sriram Melkote
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.5.0-256, centos 6x

1:10.3.5.115
2:10.3.5.116
3:10.3.5.117
4:10.3.5.118
5:10.6.2.185
6:10.6.2.186
7:10.5.3.5


Triage: Untriaged
Operating System: Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://s3.amazonaws.com/bugdb/jira/MB-12676/10.3.5.115-11162014-1512-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12676/10.3.5.116-11162014-1514-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12676/10.3.5.117-11162014-1515-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12676/10.3.5.118-11162014-1516-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12676/10.5.3.5-11162014-1520-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12676/10.6.2.185-11162014-1517-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12676/10.6.2.186-11162014-1519-diag.zip
Is this a Regression?: Yes

 Description   
Test Case::

./testrunner -i centos_x64--01_01--failover_upr.ini -t failover.failovertests.FailoverTests.test_failover_stop_server,replicas=1,graceful=False,num_failed_nodes=1,numViews=5,withViewsOps=True,createIndexesDuringFailover=True,items=100000,active_resident_threshold=70,dgm_run=True,failoverMaster=True,skip_cleanup=True,GROUP=P0

1. Create 7 node cluster
2. Create default bucket with 100K items
3. Create 5 views
4. Stop 1 node and hard failover
5. Run queries to create indexes in parallel to step 4

Step 5 fails with timeout. Expected results not returned.

2014-11-16 14:56:35 | INFO | MainProcess | Cluster_Thread | [task.check] Server: 10.3.5.116, Design Doc: dev_ddoc1, View: default_view2, (100000 rows) expected, (83994 rows) returned
ERROR
[('/usr/lib/python2.7/threading.py', 524, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib/python2.7/threading.py', 551, '__bootstrap_inner', 'self.run()'), ([('/usr/lib/python2.7/threading.py', 524, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib/python2.7/threading.py', 551, '__bootstrap_inner', 'self.run()'), ('./testrunner.py', 262, 'run', '**self._Thread__kwargs)'), ('/usr/lib/python2.7/unittest/runner.py', 151, 'run', 'test(result)'), ('/usr/lib/python2.7/unittest/case.py', 391, '__call__', 'return self.run(*args, **kwds)'), ('/usr/lib/python2.7/unittest/case.py', 327, 'run', 'testMethod()'), ('pytests/failover/failovertests.py', 25, 'test_failover_stop_server', "self.common_test_body('stop_server')"), ('pytests/failover/failovertests.py', 100, 'common_test_body', 'self.run_failover_operations_with_ops(self.chosen, failover_reason)'), ('pytests/failover/failovertests.py', 408, 'run_failover_operations_with_ops', 'self.query_and_monitor_view_tasks(nodes)'), ('pytests/failover/failovertests.py', 538, 'query_and_monitor_view_tasks', 'self.verify_query_task()'), ('pytests/failover/failovertests.py', 562, 'verify_query_task', 'self.perform_verify_queries(num_views, prefix, ddoc_name, query, bucket=bucket, wait_time=timeout, expected_rows=expected_rows)'), ('pytests/basetestcase.py', 778, 'perform_verify_queries', 'task.result(wait_time)'), ('lib/tasks/future.py', 160, 'result', 'return self.__get_result()'), ('lib/tasks/future.py', 111, '__get_result', 'print traceback.extract_stack()')]
Error occured querying view default_view2: {u'reason': u'lexical error: invalid char in json text.\n', u'from': u'http://10.6.2.185:8092/_view_merge/?stale=false&#39;}

Seen timeouts with graceful failover as well. Following tests seen failing

test_failover_firewall,replicas=1,graceful=False,num_failed_nodes=1,items=100000,active_resident_threshold=70,dgm_run=True,doc_ops=update,withMutationOps=true,withQueries=True,numViews=5,withViewsOps=True,GROUP=P0
test_failover_normal,replicas=1,graceful=False,num_failed_nodes=1,items=100000,active_resident_threshold=70,dgm_run=True,withQueries=True,numViews=5,withViewsOps=True,GROUP=P0
test_failover_stop_server,replicas=1,graceful=False,num_failed_nodes=1,numViews=5,withViewsOps=True,createIndexesDuringFailover=True,items=100000,active_resident_threshold=70,dgm_run=True,failoverMaster=True,GROUP=P0
test_failover_stop_server,replicas=1,graceful=False,num_failed_nodes=1,numViews=5,withViewsOps=True,createIndexesDuringFailover=True,items=100000,active_resident_threshold=70,dgm_run=True,failoverMaster=True,GROUP=P0
test_failover_stop_server,replicas=1,graceful=False,num_failed_nodes=1,items=100000,active_resident_threshold=70,dgm_run=True,withQueries=True,numViews=5,withViewsOps=True,max_verify=10000,GROUP=P0
test_failover_then_add_back,replicas=1,num_failed_nodes=1,items=100000,numViews=5,withViewsOps=True,createIndexesDuringFailover=True,sasl_buckets=1,upr_check=False,recoveryType=full,graceful=True,GROUP=P0;GRACEFUL
test_failover_then_add_back,replicas=1,num_failed_nodes=1,items=100000,numViews=5,withViewsOps=True,createIndexesDuringFailover=True,sasl_buckets=1,upr_check=False,recoveryType=delta,graceful=True,GROUP=P0;GRACEFUL
test_failover_then_add_back,replicas=1,num_failed_nodes=1,items=100000,numViews=5,compact=True,withViewsOps=True,createIndexesDuringFailover=True,sasl_buckets=1,upr_check=False,recoveryType=delta,graceful=True,GROUP=P1;GRACEFUL

 Comments   
Comment by Aleksey Kondratenko [ 17/Nov/14 ]
I don't see anything suspicious in the logs except this error:

[couchdb:error,2014-11-16T13:15:21.399,couchdb_ns_1@127.0.0.1:<0.21006.0>:couch_log:error:44]Set view `default`, main (prod) group `_design/dev_ddoc1`, DCP process <0.21014.0> died with unexpected reason:
{{case_clause,{{error,vbucket_stream_not_found},{bufsocket,#Port<0.17642>,<<>>}}},
 [{couch_dcp_client,init,1,
   [{file,"/home/buildbot/jenkins/workspace/sherlock-testing/couchdb/src/couch_dcp/src/couch_dcp_client.erl"},{line,305}]},
  {couch_dcp_client,restart_worker,1,
   [{file,"/home/buildbot/jenkins/workspace/sherlock-testing/couchdb/src/couch_dcp/src/couch_dcp_client.erl"},{line,1433}]},
  {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,604}]},
  {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}


Whether it is the cause of the timeouts or not I cannot say, but I am passing this to the view engine because that is the most likely place where something gets stuck.




[MB-12675] add support for btoa() function in view processing Created: 16/Nov/14  Updated: 18/Nov/14

Status: Open
Project: Couchbase Server
Component/s: view-engine
Affects Version/s: 3.0.1
Fix Version/s: sherlock
Security Level: Public

Type: Improvement Priority: Minor
Reporter: Matt Ingenthron Assignee: Nimish Gupta
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
When trying to do some data analysis on ASCII data, I found that Couchbase doesn't have the btoa() function that is common in browsers. I can work around it, but it would be nice if this were included.

See details on how this works here:
http://stackoverflow.com/questions/2820249/base64-encoding-and-decoding-in-client-side-javascript
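
Until btoa() is available in the view engine, one workaround is to inline a small base64 encoder in the map function itself. The sketch below is illustrative only: it assumes ASCII input, and the document shape (doc.type, doc.body) is hypothetical.

function (doc, meta) {
  // Minimal base64 encoder for ASCII strings; a stand-in for btoa().
  var B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  function b64encode(s) {
    var out = "";
    for (var i = 0; i < s.length; i += 3) {
      var c1 = s.charCodeAt(i), c2 = s.charCodeAt(i + 1), c3 = s.charCodeAt(i + 2);
      out += B64.charAt(c1 >> 2);
      out += B64.charAt(((c1 & 3) << 4) | (isNaN(c2) ? 0 : c2 >> 4));
      out += isNaN(c2) ? "=" : B64.charAt(((c2 & 15) << 2) | (isNaN(c3) ? 0 : c3 >> 6));
      out += isNaN(c3) ? "=" : B64.charAt(c3 & 63);
    }
    return out;
  }
  // Hypothetical document shape, for illustration only.
  if (doc.type === "message" && doc.body) {
    emit(meta.id, b64encode(doc.body));
  }
}

With this in place the emitted value is the base64 form of the field, which is the same output btoa() would produce in a browser for ASCII input.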




[MB-12674] password change for bucket doesn't take effect Created: 16/Nov/14  Updated: 16/Nov/14  Resolved: 16/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: memcached
Affects Version/s: 3.0.1
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Matt Ingenthron Assignee: Trond Norbye
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Yes

 Description   
Create a bucket without a password. Then, through the UI, edit the bucket to add a password.

Expected behavior: bucket requires password

Observed behavior: bucket still only works with an empty password

I believe this worked fine in 2.x.

Filing against memcached even though I know it's really bucket engine; I don't see a component for that.

 Comments   
Comment by Matt Ingenthron [ 16/Nov/14 ]
My fault. Error in how I had the client configured.




[MB-12673] [system-tests]items count mismatch uni-directional XDCR Created: 16/Nov/14  Updated: 18/Nov/14

Status: Open
Project: Couchbase Server
Component/s: couchbase-bucket, ns_server
Affects Version/s: 3.0.2
Fix Version/s: 3.0.2
Security Level: Public

Type: Bug Priority: Critical
Reporter: Andrei Baranouski Assignee: Andrei Baranouski
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 3.0.2-1520

Attachments: PNG File uni.png    
Triage: Untriaged
Operating System: Centos 64-bit
Is this a Regression?: Unknown

 Description   
Bidirectional replication for 4 buckets between source 172.23.105.156 and destination 172.23.105.160:

AbRegNums
MsgsCalls
RevAB
UserInfo

The data load ran for more than 2 days. During this time a large number of steps with different scenarios were performed.
More detailed steps can be found here:
https://github.com/couchbaselabs/couchbase-qe-docs/blob/master/system-tests/viber/build_3.0.2-1520/report.txt

The problem is that I cannot say at which stage the data discrepancy occurred, because I only check that the data matches once the data load is stopped (the last step).

Result:
source:
AbRegNums 1607045
MsgsCalls 33301
RevAB 35716338
UserInfo 292190

destination:
AbRegNums 1607045
MsgsCalls 33300
RevAB 35716351
UserInfo 292190

diff <(curl http://172.23.105.156:8092/MsgsCalls/_design/docs/_view/docs?inclusive_end=true&stale=false&connection_timeout=60000&skip=0) <(curl http://172.23.105.160:8092/MsgsCalls/_design/docs/_view/docs?inclusive_end=true&stale=update_after&connection_timeout=60000&skip=0)
(interleaved curl progress meter output omitted; both responses were ~2664k)
1c1
< {"total_rows":33301,"rows":[
---
> {"total_rows":33300,"rows":[
33244d33243
< {"id":"MSG_owmiixgxuqiptrwjjhzgorkfsvrxcgwsrlmrtxkp_myiunhwwynfqjobdtwffjwoic","key":null,"value":null},

so, "MSG_owmiixgxuqiptrwjjhzgorkfsvrxcgwsrlmrtxkp_myiunhwwynfqjobdtwffjwoic" exists on src, doesn't on dest

Just in case, I will leave the cluster alive for a few days for investigation.
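
For reference, a sketch of the same row-count comparison with curl's progress meter suppressed (it assumes the same hosts, bucket and design doc as above, and stale=false on both sides):

SRC=http://172.23.105.156:8092
DST=http://172.23.105.160:8092
VIEW='MsgsCalls/_design/docs/_view/docs?inclusive_end=true&stale=false&connection_timeout=60000&skip=0'
# -s silences the progress meter that garbles the diff output above
curl -s "$SRC/$VIEW" | python -c 'import sys, json; print json.load(sys.stdin)["total_rows"]'
curl -s "$DST/$VIEW" | python -c 'import sys, json; print json.load(sys.stdin)["total_rows"]'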


 Comments   
Comment by Andrei Baranouski [ 16/Nov/14 ]
https://s3.amazonaws.com/bugdb/jira/MB-12673/fc3ae2d4/172.23.105.156-11162014-214-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12673/fc3ae2d4/172.23.105.157-11162014-235-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12673/fc3ae2d4/172.23.105.158-11162014-224-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12673/fc3ae2d4/172.23.105.160-11162014-37-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12673/fc3ae2d4/172.23.105.206-11162014-33-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12673/fc3ae2d4/172.23.105.207-11162014-310-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12673/fc3ae2d4/172.23.105.22-11162014-254-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-12673/fc3ae2d4/172.23.105.159-11162014-245-diag.zip

Comment by Mike Wiederhold [ 17/Nov/14 ]
The expiry pager is running on the destination cluster. This needs to be turned off.

Mikes-MacBook-Pro:ep-engine mikewied$ management/cbstats 172.23.105.207:11210 -b MsgsCalls all | grep exp
 ep_exp_pager_stime: 3600
 ep_expired_access: 0
 ep_expired_pager: 19655
 ep_item_flush_expired: 0
 ep_num_expiry_pager_runs: 53
 vb_active_expired: 19547
 vb_pending_expired: 0
 vb_replica_expired: 108
Comment by Andrei Baranouski [ 18/Nov/14 ]
Sorry Mike, it's not clear to me

I didn't run any expiry pagers in the tests, so why do I need to turn something off? I used the default settings for the clusters.
When you say "The expiry pager is running on the destination cluster", do you mean that once it has completed the items should match? That does not happen.

[root@centos-64-x64 bin]# ./cbstats 172.23.105.207:11210 -b MsgsCalls all | grep exp
 ep_exp_pager_stime: 3600
 ep_expired_access: 0
 ep_expired_pager: 19655
 ep_item_flush_expired: 0
 ep_num_expiry_pager_runs: 105
 vb_active_expired: 19547
 vb_pending_expired: 0
 vb_replica_expired: 108
[root@centos-64-x64 bin]# ./cbstats 172.23.105.156:11210 -b MsgsCalls all | grep exp
 ep_exp_pager_stime: 3600
 ep_expired_access: 0
 ep_expired_pager: 8
 ep_item_flush_expired: 0
 ep_num_expiry_pager_runs: 135
 vb_active_expired: 8
 vb_pending_expired: 0
 vb_replica_expired: 0
Comment by Andrei Baranouski [ 18/Nov/14 ]
diff <(curl http://172.23.105.156:8092/MsgsCalls/_design/docs/_view/docs?inclusive_end=true&stale=false&connection_timeout=60000&skip=0) <(curl http://172.23.105.160:8092/MsgsCalls/_design/docs/_view/docs?inclusive_end=true&stale=update_after&connection_timeout=60000&skip=0)
(interleaved curl progress meter output omitted; both responses were ~2664k)
1c1
< {"total_rows":33301,"rows":[
---
> {"total_rows":33300,"rows":[
33244d33243
< {"id":"MSG_owmiixgxuqiptrwjjhzgorkfsvrxcgwsrlmrtxkp_myiunhwwynfqjobdtwffjwoic","key":null,"value":null},

key "MSG_owmiixgxuqiptrwjjhzgorkfsvrxcgwsrlmrtxkp_myiunhwwynfqjobdtwffjwoic" doesn't exist on dest, exists on source
Comment by Mike Wiederhold [ 18/Nov/14 ]
Andrei,

The stat ep_num_expiry_pager_runs shows that the expiry pager is running. You should not have it running on the destination cluster, otherwise items may be deleted by expiry. An expiry causes the rev sequence number to be increased and can result in items not being replicated to the destination. This is a known issue, so you need to re-run the test and make sure that the expiry pager is not running. You can turn off the expiry pager by running the command below on each node.

cbepctl host:port set flush_param exp_pager_stime 0
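
For example, a minimal sketch that disables the expiry pager on each destination node and confirms the new setting; the node list below is illustrative, not necessarily the full destination cluster:

# Illustrative node list; substitute the actual destination nodes.
for node in 172.23.105.160 172.23.105.206 172.23.105.207; do
    ./cbepctl $node:11210 -b MsgsCalls set flush_param exp_pager_stime 0
    ./cbstats $node:11210 -b MsgsCalls all | grep ep_exp_pager_stime   # should now report 0
done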
Comment by Andrei Baranouski [ 18/Nov/14 ]
Thanks Mike,

Could you point me to the ticket for the "known issue"?
So the command should be run on all nodes of the destination cluster only?

BTW, how do we proceed to test bi-directional XDCR replication? I believe there may also be a problem there.
Comment by Mike Wiederhold [ 18/Nov/14 ]
Yes, for unidirectional replication you need to disable the expiry pager on the destination nodes; you can leave it on in the source cluster. Also, I don't know of a ticket that specifically relates to this issue, but I discussed it with support and it is known. If I can find something I'll post it here.

The problem is that if the destination cluster has any traffic (in this case an expiry counts as traffic), then the rev sequence number will be increased. This can cause the destination node to win conflict resolution, and as a result an item from the source would not end up reaching the destination node. At some point this issue would work itself out, but only after the item expired on both sides.

For bi-directional replication this wouldn't be an issue, because the destination will replicate the item back to the source. In the case of this ticket the destination rev id is 74 and the source is 73, so when the destination replicates this item back it will win conflict resolution.
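
To make the conflict-resolution argument concrete, here is a rough sketch (not the actual ep-engine code) of rev-id based resolution, simplified to the revision seqno with a CAS tie-break:

# Illustrative only: simplified rev-id based conflict resolution.
def incoming_wins(remote, local):
    """remote and local are (rev_seqno, cas) metadata pairs."""
    if remote[0] != local[0]:
        return remote[0] > local[0]   # higher revision count wins
    return remote[1] > local[1]       # tie-break on CAS; further tie-breakers omitted

# Source pushes rev 73, but the expiry bumped the destination copy to rev 74:
print incoming_wins((73, 1), (74, 2))   # False -> the source mutation loses
# With bi-directional XDCR the destination pushes rev 74 back and wins on the source:
print incoming_wins((74, 2), (73, 1))   # True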
Comment by Andrei Baranouski [ 18/Nov/14 ]
Thanks for the update!




[MB-12672] subqueries show 'Duplicate subquery alias' error if main query is for same bucket Created: 16/Nov/14  Updated: 17/Nov/14  Resolved: 16/Nov/14

Status: Closed
Project: Couchbase Server
Component/s: query
Affects Version/s: cbq-DP4, sherlock
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Iryna Mironava Assignee: Iryna Mironava
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Triage: Untriaged
Is this a Regression?: Unknown

 Description   
Example queries:
select task_name, (select count(task_name) cn from default use keys ['test_task-0', 'test_task-1']) as names from default

select name, join_day from default where join_day = (select AVG(join_day) as average from default use keys ['query-test-Engineer-2010-1-1-0', 'query-test-Engineer-2010-1-2-0'])[0].average;

error
 "errors": [
        {
            "caller": "server:135",
            "cause": "Duplicate subquery alias default.",
            "code": 5000,
            "key": "Internal Error",
            "message": ""
        }
    ],

 Comments   
Comment by Gerald Sangudi [ 16/Nov/14 ]
Hi Iryna,

This is by design: when the same keyspace is used in both the outer query and the subquery without an alias, both occurrences default to the alias "default", which produces the duplicate-alias error. Give each occurrence its own alias. Try:

Example queries:
select task_name, (select count(task_name) cn from default d use keys ['test_task-0', 'test_task-1']) as names from default

select name, join_day from default d1 where join_day = (select AVG(join_day) as average from default d2 use keys ['query-test-Engineer-2010-1-1-0', 'query-test-Engineer-2010-1-2-0'])[0].average;





Generated at Sat Nov 22 20:19:41 CST 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.