[CBES-11] Unable to parse decoded base64 data as JSON Created: 21/Jan/13  Updated: 29/Jan/13

Status: Open
Project: Couchbase Server Elastic Search Integration
Component/s: None
Affects Version/s: None
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Tommie McAfee Assignee: J Chris Anderson
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: elasticsearch adapter beta snapshot

2 couchbase nodes
2 elasticsearch nodes

Attachments: GZip Archive elasticsearch.log.gz     GZip Archive elasticsearch_simple.log.gz     GZip Archive xdcr.tar.gz    

 Description   
Applied a 20k op/sec load:
run workload --get 50 --create 12 --update 30 --delete 8 --ops 20000

sample 512byte document template:

{
  "padding": "456GDVRHLDIZA6LB1MS8F90WZRB1CQQ2DKWCOYOTHBYPXAF0NY2ER6GO6OHFO8UI0EHSP7KCKIERSPKRDOUF1MNHPS9XPLW08494SN0XANL2HAGSH2DVI1SBPWBX0LJR4K34IFQHDD74MK5VPYXEEW325L25TC87E44JUXY9RIL6DNF2W1A6JE7N72R58ROG9PB6N6IS5BS6MVIZB8TD1PTNQHHOG0VLH64FO0JZW1FEKU3MY2C4KF1SD12X709BWSRQCZKPKYZ9HSYQKUXCFH6RB2ELU1EM06VE9LU27URVLWO43LL4TF1YUBYJZ84V5DCMGX089YQZHC30CLPDF8MT1423NHUW80F2SJHEA8J5WJ4HJL9EQKD077ET712PYUXGFOR98QZDFPQ2C8FRQC44Z4SIMIBCD03EB6Z6EGYSI10VJBIQCOH024Q1LYEWTABAG74ZQQ22KJW78UPN2IBE4N57B",
  "city": "mateo",
  "list": [
    1,
    2,
    3
  ],
  "hello": "world",
  "map": {
    "sample": "map",
    "complex": 1
  }
}



and observing the following errors in logs:

[2013-01-21 16:05:56,005][ERROR][transport.couchbase.capi ] [Omega Red] Unable to parse decoded base64 data as JSON, indexing stub for id: D963FBCB91C6037390BB6404
[2013-01-21 16:05:56,006][ERROR][transport.couchbase.capi ] [Omega Red] Body was: �)+k^K^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ Parse error was: org.codehaus.jackson.JsonParseException: Unexpected character ('ð' (code 240)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: [B@42ff1c7f; line: 1, column: 2]
[2013-01-21 16:05:55,787][TRACE][com.couchbase.capi.servlet.CAPIServlet] Got bulk docs request for default/988;00000000000000000000000000000000
[2013-01-21 16:05:55,786][DEBUG][transport.couchbase.capi ] [Omega Red] Bulk doc entry is [{meta={id=DB55C88BCF4D5242B5EF15E7, rev=1-00448c25e6d518f20000000000000000, expiration=0, flags=0}, base64=eyJwYWRkaW5nIjogIlQ2VDY3RFNUWE9XMU9aMzlPRThaTlRPVzhSUkhaRTM3NEtIWU9LREVQRkQ4VDhTQ1hDTEk2WEc3Q1pCN0xNNTg3TjBCRDI0Rkw5NUJZSzBOTkNGSE5YTERMQjRXVzVVRkxHRDYwVjlMUDVSTEk3VkhKSU9BRFA2U0NGSldNRzQ0MDZaRTVMQjYzRExUM0hFSEZXSTVOSDdOODFWSDIzWFBLU1JNMkRKUE9DM1gzWUxZTzIwODZONkJQMFI1VDMyWlhLMTJESUsxNzc0SjMyVzBRUUlKVE9VUUNNWUpRN1g5SVI0S1hMWk9NM1hJWkVYQjEzNU42WlBPNzkxNkxYQ0U3SjJKVlJCTzBGNkk2TURXOVBLWlVIQ0xKOTZPUzBWTU00MkhBTlpHMVlOME1OMlcyTjdYWTFWNjRBR1ZaT0w4SDdWMVFPVVFGTzVWTkgyWTNOR0o0RkZSRDMxQzFKSDNXVkNJS0QzRUVOQ0FSVlVMQTZWTkNLTVpEQVo0TDBDS003TEFINFNJMk5YMVVBM080UkJMQkVMOEoyTzM5TjFVNjVDVUE2RkNWRExHQk9BTThCVjVKMzIwVU85RkRIVUtaMjRQNk02QzdBSjA2TDkxWjhaSFpUVVA0QlVOTiIsICJjaXR5IjogIm1hdGVvIiwgImxpc3QiOiBbMSwgMiwgM10sICJoZWxsbyI6ICJ3b3JsZCIsICJtYXAiOiB7InNhbXBsZSI6ICJtYXAiLCAiY29tcGxleCI6IDF9fQ==}, {meta={id=A49129193F3E5B5A733D9C1A, rev=4-00448c278e0946e20000000000000000, expiration=0, flags=0}, base64=eyJwYWRkaW5nIjogIk5QUlFGOVdNTDBFQkVCUU0zS0NVNVo3VU9ZMkRKVllDWEsyRlFCWTBPREhMUzRPTjJZRk5VSzczNkc4QlRLRFM1T0pQNkZHRjEzQ0FZRFE3MVpLSDJEVzlWTElQUTJWTFFHS1lPTFlQQTZaQUlOTThOUzUxSzdYVFBIT1o2RkxKUThJOFpQV003NE1MSTFYOTkxVENRMTZZUlBZNFlKRTZTN0NZS1czSDZJRzY0SzNVMEw4MjIwSk42R1Y4TjVUTVhDN0lORDBZU1I4SlM2Uk82UkpXRFRaUUNJWUhMRlY4V1U2VjNaU0NZN0lWS0xZVU9WWFFHS0ZXOUU4WVA2Q1RRSDJSTUJVNjI4M0ROMkZSVUJIR1FVV1c2R1NCVExVRzg4MDJKOEszSVcwTEpZREJIQjZYME5VNExKMTRETDNMREQ4SjUwRTVLTEdEOFZMQlk0VkJPOVU2TVhZM085UjFOS1ZMWkdGVFc3MjUxUlU4VEkzSkNCRkI4WkU1NE8yWFkyV0pMV0cxMlFRTEg3WjFQUUhTUDhSTzlYT1hVRFBQU01ZQTYxSUFYQ05ESUVGT1IzNzhaWVVYNlIwU01XSTdIVjAxMVozQ0w5SldDUVZXVEpTS0VMTFVUWFQ2NyIsICJjaXR5IjogIm1hdGVvIiwgImxpc3QiOiBbMSwgMiwgM10sICJoZWxsbyI6ICJ3b3JsZCIsICJtYXAiOiB7InNhbXBsZSI6ICJtYXAiLCAiY29tcGxleCI6IDF9fQ==},

 Comments   
Comment by Tommie McAfee [ 21/Jan/13 ]
full logs available at:

scp root@10.3.2.4:/tmp/elasticsearch-log-snippit.log.gz

Comment by Marty Schoch [ 21/Jan/13 ]
I've looked at the logs and in the bulk-docs request before this error was logged, we see the following:

{meta={id=D963FBCB91C6037390BB6404, rev=5-002e64fdb7fa7f3b0000000000000000, att_reason=non-JSON mode, expiration=0, flags=0, deleted=true}, b
ase64=8CkrawsAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=}

This is odd for several reasons:

1. att_reason=non-JSON mode implies that couchbase thinks this is non-JSON, but doing a get request on the server shows it as valid JSON
2. deleted=true but the document exists in Couchbase server now, Tommie says the keys are random so its not impossible it was recreated, but unlikely

Partially related... the plug-in was designed to detect non-JSON mode but it contained a bug, and was instead looking for "attr_reason" which is the wrong key. This explains the error message being logged, but not the rest of the behavior.
Comment by Tommie McAfee [ 22/Jan/13 ]
I've reproduced this with following functional test that creates then deletes docs at revision 1, so there are no intermediate mutations like in system testing:

python26 testrunner -i es-3.ini -t pytests.xdcr.esXDCR.ESTests.load_with_async_ops -p items=10000,rdirection=unidirection,doc-ops=delete,end_replication_flag=1


In the attached logs, non-JSON mode is there at time of delete. And since we are at rev-1 seems like it should be there at time of creation.

[2013-01-22 08:51:19,658][TRACE][com.couchbase.capi.servlet.CAPIServlet] parsed value is {new_edits=false, docs=[{meta={id=es_xdcr_docs-6486, rev=1-0044c3018c499ba20000000000000000, expiration=0, flags=0}, base64=eyJvcmRlcmluZyI6IDE0ODYsICJtdXRhdGVkIjogMCwgIl9pZCI6ICJlc194ZGNyX2RvY3MtNjQ4NiIsICJzaXRlX25hbWUiOiAieWFob28ifQ==}, {meta={id=es_xdcr_docs-8443, rev=2-0044c3026268e2930000000000000000, att_reason=non-JSON mode,

full es logs from test attached, gathering couch xdcr as well
Comment by Junyi Xie (Inactive) [ 24/Jan/13 ]
It seems to me the JSON parser called by ep_engine or couchstore incorrectly mark the doc as non-JSON and then propagate the flag to CouchDB and XDCR.

Talked to Chris briefly, actually neither XCDR and CouchDB would parse the doc content to determine it is a JSON or not. CouchDB (couch_doc module) just takes the flag from upper level (ep_engine or couchstore?) to determine the doc is a JSON or not.


Chris, please advise and put your comment here so we could know whom should look at this issue. Thank you!
Comment by Marty Schoch [ 28/Jan/13 ]
i overheard a bit of the conversation and it was mentioned that it was using some heuristic, not full parsing.

that may be fine, but we implemented an optimization which will do the wrong thing if this heuristic is ever wrong. so, if we suspect the heuristic may be wrong, then i should remove the optimization and try to parse it anyway.
Comment by Marty Schoch [ 28/Jan/13 ]
Had a conversation with Chris and Aaron about this. There are a couple of possibilities here:

1. It's possible that deletes always come through as non-JSON mode because the body is empty?
2. There were some legacy validation issues around docs containing things like "_id". At least one of the docs above contains this.

Aaron has asked for a sample document that exhibits the behavior, will wait until tomorrow 1/29 when Tommie will be in the office.

Also, Aaron has confirmed that the optimization we're using should be safe as the heuristic should only fail the other direction.
Comment by Tommie McAfee [ 29/Jan/13 ]

Just ran a simpler case by creating a sample doc from UI and then deleting it.

Document body:

{
  "click": "to edit",
  "new in 2.0": "there are no reserved field names"
}

In log stream below, looks like doc is created with rev=1-xxx and at time of delete it's rev=2-xxx. Not sure if a new revision of the document is first sent prior to delete or if this is expected. Additional logs attached(elasticsearch_simple.log.gz).


[2013-01-29 10:46:27,718][TRACE][com.couchbase.capi.servlet.CAPIServlet] parsed value is {new_edits=false, docs=[{meta={id=newdoc, rev=1-0046efc8ac0d3a230000000000000000, expiration=0, flags=0}, base64=eyJjbGljayI6InRvIGVkaXQiLCJuZXcgaW4gMi4wIjoidGhlcmUgYXJlIG5vIHJlc2VydmVkIGZpZWxkIG5hbWVzIn0=}]}
[2013-01-29 10:46:27,718][DEBUG][transport.couchbase.capi ] [Maestro] Bulk doc entry is [{meta={id=newdoc, rev=1-0046efc8ac0d3a230000000000000000, expiration=0, flags=0}, base64=eyJjbGljayI6InRvIGVkaXQiLCJuZXcgaW4gMi4wIjoidGhlcmUgYXJlIG5vIHJlc2VydmVkIGZpZWxkIG5hbWVzIn0=}]
[2013-01-29 10:47:32,743][TRACE][com.couchbase.capi.servlet.CAPIServlet] Got HEAD request for default/262;00000000000000000000000000000000
[2013-01-29 10:47:32,747][TRACE][com.couchbase.capi.servlet.CAPIServlet] Got HEAD request for default/master;00000000000000000000000000000000
[2013-01-29 10:47:32,752][TRACE][com.couchbase.capi.servlet.CAPIServlet] Got revs diff request for default/262;00000000000000000000000000000000
[2013-01-29 10:47:32,752][TRACE][com.couchbase.capi.servlet.CAPIServlet] revs diff request body was {"newdoc":"2-0046efc8ac0d3a240000000000000000"}
[2013-01-29 10:47:32,752][TRACE][com.couchbase.capi.servlet.CAPIServlet] revs diff parsed value is {newdoc=2-0046efc8ac0d3a240000000000000000}
[2013-01-29 10:47:32,752][TRACE][transport.couchbase.capi ] [Maestro] _revs_diff request for default/262;00000000000000000000000000000000 : {newdoc=2-0046efc8ac0d3a240000000000000000}
[2013-01-29 10:47:32,753][TRACE][transport.couchbase.capi ] [Maestro] _revs_diff response for default/262;00000000000000000000000000000000 is: {newdoc={missing=2-0046efc8ac0d3a240000000000000000}}
[2013-01-29 10:47:32,755][TRACE][transport.couchbase.capi ] [Maestro] _revs_diff response AFTER conflict resolution {newdoc={missing=2-0046efc8ac0d3a240000000000000000}}
[2013-01-29 10:47:32,801][TRACE][com.couchbase.capi.servlet.CAPIServlet] Got bulk docs request for default/262;00000000000000000000000000000000
[2013-01-29 10:47:32,802][TRACE][com.couchbase.capi.servlet.CAPIServlet] parsed value is {new_edits=false, docs=[{meta={id=newdoc, rev=2-0046efc8ac0d3a240000000000000000, att_reason=non-JSON mode, expiration=0, flags=0, deleted=true}, base64=8CkrawsAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=}]}
[2013-01-29 10:47:32,802][DEBUG][transport.couchbase.capi ] [Maestro] Bulk doc entry is [{meta={id=newdoc, rev=2-0046efc8ac0d3a240000000000000000, att_reason=non-JSON mode, expiration=0, flags=0, deleted=true}, base64=8CkrawsAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=}]

Generated at Sun Jul 13 03:23:51 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.