GSOC 2013 Project Ideas

Version 6 by anil
on Mar 27, 2013 19:31.

compared with
Version 7 by anil
on Mar 27, 2013 19:34.

1. *Couchbase Memory Allocator*
*Description*: Couchbase currently uses the system allocator to allocate contiguous memory chunks, and [tcmalloc|] on certain supported platforms. This works well for allocating many variable-sized objects but not very well for large objects. Internally, we don't need contiguous memory allocations, so this task is about replacing the use of the system allocator with a "block allocator" and enhancing tcmalloc to work in such scenarios.

*Difficulty* : Hard
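
As a rough illustration of the block-allocator idea described above, the following Python sketch stores one large value across fixed-size blocks that need not be adjacent in memory. The {{BlockPool}} class, its method names, and the block sizes are hypothetical and chosen only for illustration; they are not part of Couchbase or tcmalloc, and a real implementation would live in the server's native code.

```python
class BlockPool:
    """Toy pool of fixed-size blocks (hypothetical, for illustration only)."""

    def __init__(self, block_size, block_count):
        self.block_size = block_size
        # Each block is allocated separately, so blocks are not contiguous.
        self.blocks = [bytearray(block_size) for _ in range(block_count)]
        self.free = list(range(block_count))  # indices of unused blocks

    def store(self, data):
        """Copy data into free blocks and return a (block indices, length) handle."""
        needed = -(-len(data) // self.block_size)  # ceiling division
        if needed > len(self.free):
            raise MemoryError("not enough free blocks")
        used = [self.free.pop() for _ in range(needed)]
        for i, idx in enumerate(used):
            piece = data[i * self.block_size:(i + 1) * self.block_size]
            self.blocks[idx][:len(piece)] = piece
        return used, len(data)

    def load(self, handle):
        """Reassemble the value from its blocks, trimming padding in the last block."""
        used, length = handle
        joined = b"".join(bytes(self.blocks[idx]) for idx in used)
        return joined[:length]

    def release(self, handle):
        """Return the blocks of a stored value to the free list."""
        used, _ = handle
        self.free.extend(used)
```

A large object only needs enough free blocks anywhere in the pool, rather than one contiguous region of the full size, which is the property this project is after.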

2. *Add data structure support in Couchbase*
*Description*: Applications use different data structures for different functions. For example, queues can be implemented using lists and state can be stored using hash signatures. In this project, you'll extend Couchbase to support advanced data structures such as Lists, Sets, Sorted Sets and Hashes. These data structures need to be in-memory and persisted on disk.
*Expected results*: After this project is completed, applications should be able to store different data structures in Couchbase.
*Difficulty* : Medium

3. *In-Memory compression technique for Couchbase*
*Description*: Couchbase currently stores keys, metadata and potentially document contents in memory. This project involves coming up with an efficient technique for compressing objects stored in memory.
*Expected results*: After this project is completed, in-memory data should be compressed efficiently.
*Difficulty* : Medium
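
To make the goal more concrete, here is a minimal Python sketch of a key-value store that keeps values compressed in memory and decompresses them on read. The project does not name a compression algorithm; {{zlib}} from the Python standard library is used here purely as a stand-in, and the {{CompressedStore}} class is hypothetical.

```python
import zlib


class CompressedStore:
    """Toy in-memory store that keeps values compressed (hypothetical)."""

    def __init__(self, level=6):
        self.level = level  # zlib compression level, 0 (none) to 9 (best)
        self._items = {}

    def set(self, key, value):
        # Values are bytes; they are compressed before being kept in memory.
        self._items[key] = zlib.compress(value, self.level)

    def get(self, key):
        # Decompress on every read; this trades CPU time for memory.
        return zlib.decompress(self._items[key])

    def stored_bytes(self):
        """Total size of the compressed values currently held."""
        return sum(len(v) for v in self._items.values())
```

Comparing {{stored_bytes()}} with the total size of the original values gives a simple measure of how much memory a given technique saves for a given workload.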

4. *Integrate Google Breakpad*
*Description*: Couchbase relies on core dumps being available on the system in order to send crash information back to engineering. Some users prefer not to enable core dumps in their deployments because, for instance, dumping the core of a binary with a 64GB memory footprint takes a fair amount of time and disk space. In these circumstances, it would be better to gather some simple diagnostics rather than learn only that "the program crashed".

*Difficulty* : Easy

5. *CBFS Blob Chunking*
*Description*: [CBFS|] (a.k.a. Couchbase Large Object Store) is built on top of Couchbase Server.
File content in CBFS is represented as a single sequence of bits. Large files require large blobs to be moved around during replication. Small changes to large blobs require full duplication of the common parts. Simple block-based chunking will make it a lot easier to move bits around and make appends (for example) cheaper.
*Difficulty* : Medium
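
The block-based chunking described above can be sketched as follows. This is only an illustration under stated assumptions: the fixed chunk size, the use of SHA-256 digests as chunk identifiers, the plain dictionary standing in for the chunk store, and all function names are hypothetical and do not reflect how CBFS works today.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny for illustration; real chunks would be far larger


def split_chunks(data, size=CHUNK_SIZE):
    """Split a blob into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_blob(data, chunk_store, size=CHUNK_SIZE):
    """Store each chunk under its digest and return the ordered list of digests."""
    manifest = []
    for chunk in split_chunks(data, size):
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # chunks already present are reused
        manifest.append(digest)
    return manifest


def load_blob(manifest, chunk_store):
    """Rebuild the blob by concatenating its chunks in manifest order."""
    return b"".join(chunk_store[d] for d in manifest)
```

With this layout, appending to a file whose length is a multiple of the chunk size only adds the new chunks to the store, and replication only needs to move chunks the other side does not already have.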

6. *Couchbase Drupal Adapter*
*Description*: Write an adapter to run the [Drupal|] platform on Couchbase.
*Expected results*: After this project is completed, Couchbase should be the primary backend platform for Drupal.
*Difficulty*: Medium

7. *Mobile Application on Couchbase*
*Description*: Over the summer, you'll get a chance to team up with a member of the Couchbase mobile team to come up with a multidata driven mobile application built using [Couchbase Lite|].
*Expected results*: An application that uses Couchbase Lite for storage and sync. Potential applications include (but are not limited to):
*Difficulty*: Medium

8. *Couchbase Worker Queue*
*Description*: An important part of a distributed application is a means of asynchronously processing work.  Many users have attempted to create their own work queues, but often in problematic ways.  By providing this as a service, users can build reliable, scalable applications significantly faster.

*Difficulty* : Medium

9. *Big Data Testing for Couchbase*
*Description*: Testing large volumes of data for correctness is a challenging task. Here are a few options to consider: (1) sampling smaller datasets and testing their validity, (2) building indexes on existing smaller datasets and verifying the data after the tests are run, (3) tracing a smaller dataset injected from the start to the end of the test and monitoring it, (4) compressing or hashing the data and comparing hash values at the end of large-scale testing, and (5) identifying similar patterns in the data and identifying outliers. The challenge here is obtaining deterministic and predictable measures from this large volume of data with random sampling.

*Difficulty* : Medium
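
Option (4) above, hashing the data and comparing hash values at the end of a test, could look roughly like the Python sketch below. It computes an order-independent fingerprint of a set of key-value records so that the data written during a test can be compared with the data read back afterwards. The function names and the choice of SHA-256 with modular addition are assumptions made for this example, not an existing Couchbase testing tool.

```python
import hashlib

MODULUS = 1 << 256  # keeps the running sum the same width as a SHA-256 digest


def record_digest(key, value):
    """Hash one record; the key length prefix keeps key/value boundaries unambiguous."""
    h = hashlib.sha256()
    h.update(len(key).to_bytes(4, "big"))
    h.update(key)
    h.update(value)
    return int.from_bytes(h.digest(), "big")


def dataset_fingerprint(records):
    """Combine per-record digests by modular addition, so record order does not matter."""
    total = 0
    for key, value in records:
        total = (total + record_digest(key, value)) % MODULUS
    return total
```

Because the fingerprint is built one record at a time, it can be computed in a single streaming pass, and two fingerprints that differ show that at least one record was lost, duplicated, or changed.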

10. *DataMapper Adapter for Couchbase's Ruby Programming Language*
*Description*: DataMapper is a good ORM framework. It currently supports plenty of storage types but does not have an adapter for Couchbase.