[PCBC-78] PHP value compression incompatible with Java/.NET Created: 26/Jun/12  Updated: 10/Aug/12  Resolved: 10/Aug/12

Status: Closed
Project: Couchbase PHP client library
Component/s: library
Affects Version/s: None
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Major
Reporter: Tim Smith (Inactive) Assignee: Matt Ingenthron
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
A customer has reported a bug in Couchbase's PHP client code. The code referred to is actually a compiled C module that PHP talks to. This bug is related to uncompressed JSON content originating from Java/.NET/ETC clients.

Background: When attempting to test PHP fetching of JAVA-originating JSON content from Couchbase, he received errors from the C code that initiates decompression of Couchbase payloads.

After introducing a hack to simply ignore decompression and after fixing a string length bug, everything worked fine.

Here's where the problem lies and what he did. (Note: All those fprintf() calls are his, and the whitespace formatting problems are due to e-mail loss):

Line 680 of "couchbase.c"

static int php_couchbase_zval_from_payload(zval *value, char *payload, size_t payload_len, unsigned int flags, int serializer TSRMLS_DC) /* {{{ */ {
int compressor;
zend_bool payload_emalloc = 0;
#ifdef HAVE_COMPRESSION
char *buffer = NULL;
#endif

fprintf(stdout, "C Code: Entering php_couchbase_zval_from_payload: line 687\n");
fprintf(stdout, "C Code: Raw Payload: %s: line 688\n", payload);

if (payload == NULL && payload_len > 0) {
php_error_docref(NULL TSRMLS_CC, E_WARNING,
"could not handle non-existing value of length %zu", payload_len);
return 0;
} else if (payload == NULL) {
if ((flags & 127) == IS_BOOL) {
ZVAL_FALSE(value);
} else {
ZVAL_EMPTY_STRING(value);
}
return 1;
}

if ((compressor = COUCHBASE_GET_COMPRESSION(flags))) { #ifdef HAVE_COMPRESSION
size_t len, length;
zend_bool decompress_status = 0;
/* This is copied from pecl-memcached */
////memcpy(&len, payload, sizeof(size_t));
len = strlen(payload);
compressor = 123;

fprintf(stdout, "C Code: Attempting to allocate memory bytes: %i: line 711\n", len+1);
buffer = emalloc(len + 1);

////payload_len -= sizeof(size_t);
////payload += sizeof(size_t);
length = len;

switch (compressor) {
case 123:
////Hack job
decompress_status = 1;
length = len;
memcpy(buffer, payload, len + 1);
fprintf(stdout, "\n\nHack fix on line 758:\n\n%s\n\n", buffer);

break;
case COUCHBASE_COMPRESSION_FASTLZ:
#ifdef HAVE_COMPRESSION_FASTLZ
decompress_status = ((length = fastlz_decompress(payload, payload_len, buffer, len)) > 0);
#else
php_error_docref(NULL TSRMLS_CC, E_WARNING, "could not decompress value, no fastlz lib support");
return 0;
#endif
break;
case COUCHBASE_COMPRESSION_ZLIB:
#ifdef HAVE_COMPRESSION_ZLIB
decompress_status = (uncompress((Bytef *)buffer, &length, (Bytef *)payload, payload_len) == Z_OK);
/* Fall back to 'old style decompression' */
if (!decompress_status) {
unsigned int factor = 1, maxfactor = 16;
int status;

do {
length = (unsigned long)payload_len * (1 << factor++);
buffer = erealloc(buffer, length + 1);
memset(buffer, 0, length + 1);
status = uncompress((Bytef *)buffer, (uLongf *)&length, (const Bytef *)payload, payload_len);
} while ((status == Z_BUF_ERROR) && (factor < maxfactor));

if (status == Z_OK) {
decompress_status = 1;
}
}



I believe the ideal mode would be that all Couchbase-supplied clients will default to using compatible methods for storing and retrieving values. Same compression thresholds, and same compression algorithms, by default. That way all of our products will work with each other.

All Couchbase client libraries should have documentation saying how to be compatible with old memcached clients from that language platform that may use a different algorithm. For example, the PHP client should describe what options to set in order to be able to read values from the pecl-memcached extension. That way, upgrading to the Couchbase library will allow readiing data that was saved from an older version. This should be under an "Compatibility with [the non-Couchbase library]" section, and referenced from the "Ugrade" pages as well.


 Comments   
Comment by Matt Ingenthron [ 26/Jun/12 ]
We would like to iterate all of the Couchbase Client libraries forward to use, where possible, common flags to understand data types. There are legacies we will be living with for some time though, so it won't be an overnight transition. Each client library picked flags for itself, without coordination. We'll standardize flags in a future release, but then we'll still have to be able to switch between the two.

The PHP client is supposed to be compatible with pecl-memcached, including compression, out of the box other than where noted in documentation. If there are parts we're missing, we need to file documentation bugs.
Comment by Matt Ingenthron [ 26/Jun/12 ]
See also PCBC-77
Comment by Matt Ingenthron [ 10/Aug/12 ]
Added as a good approach for now. We need to come up with a better approach and that's covered in PCBC-97
Comment by Matt Ingenthron [ 10/Aug/12 ]
Correction, that's covered in PCBC-96. :)
Generated at Thu Aug 21 05:25:28 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.