.NET Client - Change default key transformer
Hi All, I am trying to store some documents in a Couchbase server using the .NET client and have hit a problem configuring the client to use a different key transformer. I've looked around, and most of the posts and information are about configuring the Memcached client. I guess that is correct; however, when I try making any Memcached settings they don't take hold.
Any help is greatly appreciated.
Config (so far I've tried setting the enyim.com keyTransformer here, with no luck):
<couchbase>
<servers bucket="LIW_Advertisements" bucketPassword="LIW_Advertisements">
<add uri="http://localhost:8091/pools/default" />
</servers>
</couchbase>
<enyim.com>
<memcached>
<keyTransformer type="Enyim.Caching.Memcached.TigerHashKeyTransformer, Enyim.Caching" />
</memcached>
</enyim.com>
My code for storing a document looks like this:
client.Store(StoreMode.Set, doc.Keyword, doc);
The 'doc' class has the following definition:
public class KeywordDocument
{
public int KeywordId { get; set; }
public string Keyword { get; set; }
public List<long> Creatives { get; set; }
}
Thanks John,
For now I simply convert to a base64 string, and when selecting I do the same to find a document. I'll give this a try and see how it goes.
What I am looking at is keyword searches for documents. A separate 'keyword' schema, apart from the document schema, would record less data, as I could record each keyword once along with a list of document IDs. The document itself then need not store the keywords, which would save a huge amount of disk/memory space.
I think this would be better than using Views, as I don't like the idea of storing the same keywords thousands of times over. Unless I have a conceptual misunderstanding, that is what would happen, right?
Thanks again, Aaron.
Hey Aaron,
If I follow your goals correctly, you basically want to have tagged documents like in the following examples:
{
"text" : "The basis of all views are the map and reduce functions that select, format, and if necessary summarise the information. There's a lot of power in those very basic components, and that's what makes Views so useful and, for people coming from the SQL and RDBMS, more difficult to understand and capture their power.",
"keywords" : ["views", "map", "reduce"]
}
{
"text" : "In addition to managing your ASP.NET session state with Couchbase Server, you can now use Couchbase Server as the backing store for your application's output cache. The latest commits to the Couchbase.AspNet project on Couchbase Labs includes the CouchbaseOutputCacheProvider.",
"keywords" : ["asp.net", "cache"]
}
And you want to be able to query those documents by their keywords or tags?
If that's the case, you could create a view to emit an index on the keywords:
function (doc, meta) {
  if (doc.keywords) {
    for (var kwIdx in doc.keywords) {
      emit(doc.keywords[kwIdx], null);
    }
  }
}
The code above will create an index on the keyword. When queried, the view will return the ID of the document which you could then use to look up the original document by ID (using the key/value API).
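To see roughly what that index would contain, here is a minimal sketch that runs the same map function over two sample documents outside of Couchbase. The `emit` stub, the `runMap` helper and the document IDs `doc1` and `doc2` are assumptions made for illustration; on the server, `emit` is supplied for you and rows are ordered by the view's own collation rules, which the plain string sort below only approximates.

```javascript
// The map function from the view above.
function map(doc, meta) {
  if (doc.keywords) {
    for (var kwIdx in doc.keywords) {
      emit(doc.keywords[kwIdx], null);
    }
  }
}

// Hypothetical stand-in for the server: records each emitted row
// together with the ID of the document being mapped.
var rows = [];
var currentId = null;
function emit(key, value) {
  rows.push({ key: key, value: value, id: currentId });
}

// Hypothetical helper: maps every document, then sorts rows by key and ID.
function runMap(docs) {
  rows = [];
  Object.keys(docs).forEach(function (id) {
    currentId = id;
    map(docs[id], { id: id });
  });
  rows.sort(function (a, b) {
    if (a.key !== b.key) return a.key < b.key ? -1 : 1;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
  return rows;
}

// Sample documents (IDs and text are made up for this sketch).
var docs = {
  doc1: { text: "About views", keywords: ["views", "map", "reduce"] },
  doc2: { text: "About caching", keywords: ["asp.net", "cache"] }
};

runMap(docs).forEach(function (row) {
  console.log(row.key + " -> " + row.id);
});
```

Each row pairs one keyword with one document ID, so a keyword shared by many documents appears once per document in the index.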
This would, as you point out, require that you store the keywords across multiple documents and in the index. The converse though is that you'd be storing document IDs along with keywords, so instead of storing a keyword 10 times, you might have three document IDs stored 3 or 4 times each. Maybe that's a much smaller storage footprint though?
Also, wondering about the Base64 encoding. Are your documents binary objects?
Hi John,
First point, correct, I have documents I want to find by matching keyword(s).
Second point: yes. With tens of thousands of documents (expanding to hundreds of thousands in future), between 30 and 150 keywords per document, and a rather homogeneous keyword data set, I think creating two different buckets would be more efficient. The keyword document key (an int32 or int64) would be much smaller than the variable-length keyword string stored many times over per document. The additional storage would of course be an array of document IDs (again int32 or int64), each referring to a document by its ID.
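The layout described here, one keyword document per distinct keyword holding a list of document IDs, can be sketched with plain in-memory objects. This is only an illustration: `buildKeywordIndex`, the sequential `keywordId` numbering and the sample IDs are assumptions, and nothing below calls a Couchbase API. The field names loosely mirror the `KeywordDocument` class from the first post.

```javascript
// Builds one "keyword document" per distinct keyword, each listing
// the IDs of the documents that carry that keyword.
function buildKeywordIndex(documents) {
  var index = {};   // keyword -> { keywordId, keyword, creatives }
  var nextId = 1;   // hypothetical sequential keyword ID
  documents.forEach(function (doc) {
    doc.keywords.forEach(function (kw) {
      if (!Object.prototype.hasOwnProperty.call(index, kw)) {
        index[kw] = { keywordId: nextId++, keyword: kw, creatives: [] };
      }
      // Store each document ID at most once per keyword.
      if (index[kw].creatives.indexOf(doc.id) === -1) {
        index[kw].creatives.push(doc.id);
      }
    });
  });
  return index;
}

// Sample documents (IDs and keywords are made up for this sketch).
var documents = [
  { id: 1001, keywords: ["views", "map", "reduce"] },
  { id: 1002, keywords: ["views", "cache"] }
];

var keywordIndex = buildKeywordIndex(documents);
console.log(JSON.stringify(keywordIndex["views"]));
```

The trade-off is that every document write also has to update the affected keyword documents, whereas a view keeps its index up to date from the documents themselves.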
Am I correct then in assuming the concept is one-bucket, one-document type/schema? In which case I would need two buckets? For scalability, that makes sense, where we can scale one bucket based on its metrics alone and not the others.
The Base64 encoding was applied to the keyword itself, to permit using the keyword as the key to the document. Is that a good/bad/dumb idea? Which also brings us full circle to my original question.
Thanks again,
- Aaron.
Hey Aaron,
There is no imposed schema with Couchbase Server, so you could store all of your documents (regardless of schema) in one bucket. Your point about scaling resources differently is worth considering, though. Keep in mind that the client connects to a single bucket, so operations would be going against two separate connection pools (which shouldn't be a problem). Also, views are bucket-specific, so if you were to consider using a view at some point, you'd need to have the data together in a single bucket.
Base64 should be unnecessary, since you can use any UTF-8 string as a key. If you're using integers for keys, you could simply ToString() them and save a few bytes here and there.
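To put a rough number on the Base64 overhead, the sketch below compares the byte length of a keyword used directly as a key with the length of its Base64-encoded form. It assumes Node.js (for `Buffer`); `keySizes` is a hypothetical helper and the keyword is a made-up sample.

```javascript
// Returns the size of a keyword as a raw UTF-8 key and as a Base64-encoded key.
function keySizes(keyword) {
  var raw = Buffer.from(keyword, "utf8");
  return {
    rawBytes: raw.length,
    base64Bytes: raw.toString("base64").length
  };
}

var sizes = keySizes("asp.net");
console.log(sizes); // { rawBytes: 7, base64Bytes: 12 }
```

Base64 emits four characters for every three input bytes, so encoded keys come out roughly a third longer than the raw keyword.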
-- John
You can set the keyTransformer as a child element of the couchbase element:
<couchbase>
<servers bucket="default">
<add uri="http://localhost:8091/pools/default" />
</servers>
<keyTransformer type="Enyim.Caching.Memcached.TigerHashKeyTransformer, Enyim.Caching" />
</couchbase>
I'm going to be working soon on updating the appendix on configuring the client, and will port over the Enyim details.
http://www.couchbase.com/docs/couchbase-sdk-net-1.1/couchbase-sdk-net-co....