Getting dump of all the membase keys and values

Fri, 06/03/2011 - 16:38
dsoti

For one of my projects I need to get a dump of millions of key-value pairs stored in a running Membase server. I read that TAP is the way to go, and I found a Java client as well, but I saw some posts where people mentioned that jtap outputs duplicate records. Can anyone share their experience with this? I also tried tap.py, and it hangs. If I kill it partway through with CTRL+C, I do see some output, but it is not in a readable format. Is there more documentation on TAP besides the Couchbase link? Once I get the keys and values, I have to uncompress the values and process them, so I need to be able to recognize the boundaries of keys, values, and records reliably in order to consume the data programmatically.

Sat, 06/04/2011 - 15:11
mikew

1) The duplicate records in jtap are not a bug. In Membase 1.6.5, when a tap stream is created the server sends all key-value pairs in memory and also all key-value pairs on disk. Since an item can be both on disk and in memory, some items will be sent twice. If your Membase server can keep all of its keys cached in memory, then you will see everything sent twice. The philosophy behind tap dumps is that every key is sent at least once. This tap strategy has changed in Membase 1.7, but it is still possible to get duplicates, though much less frequently.
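
Because a tap dump only promises that each key arrives at least once, the consumer can collapse duplicates by key as it reads the stream. The following is a minimal sketch in Python, assuming your tap client hands you an iterable of (key, value) pairs. The `dedupe_by_key` name, the pair format, and the "later copy wins" policy are illustrative assumptions, not part of jtap or spymemcached.

```python
def dedupe_by_key(pairs):
    """Collapse an at-least-once stream of (key, value) pairs to one entry per key.

    Assumption: when a key is seen more than once, the later copy replaces
    the earlier one. All keys are held in memory, so this only suits dumps
    that fit in RAM.
    """
    latest = {}
    for key, value in pairs:
        latest[key] = value
    return latest


if __name__ == "__main__":
    # Simulated stream in which key "a" is delivered twice.
    stream = [("a", b"1"), ("b", b"2"), ("a", b"1")]
    print(dedupe_by_key(stream))
```

If the dump is too large to hold in memory, writing it to a file and deduplicating there is likely the more practical route.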

2) If you want to get a list of all keys without duplicates, you can use jtap and have it write the results to a file. Then run this command on the file:

sort myfile.txt | uniq

This will remove all of the duplicates.
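
Note that `sort | uniq` only removes lines that are byte-for-byte identical, and it assumes one record per line. If you control how the dump is written, a line-oriented format with encoded values keeps record boundaries unambiguous. The sketch below assumes a hypothetical format of `key<TAB>base64(value)` per line, keys that contain no tab or newline, and zlib-compressed values. None of this is something jtap is known to produce by default, so adjust it to whatever your writer and application actually use.

```python
import base64
import zlib


def encode_record(key, raw_value):
    """Format one record as 'key<TAB>base64(value)' (hypothetical dump format)."""
    return key + "\t" + base64.b64encode(raw_value).decode("ascii")


def decode_record(line):
    """Split one dump line into (key, decompressed value).

    Assumes the key contains no tab or newline and the value is zlib-compressed.
    """
    key, encoded = line.rstrip("\n").split("\t", 1)
    return key, zlib.decompress(base64.b64decode(encoded))


def read_unique(lines):
    """Yield each distinct line once, like `sort | uniq` but in first-seen order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


if __name__ == "__main__":
    stored = zlib.compress(b"hello world")
    dump = [encode_record("user:1", stored) + "\n"] * 2  # simulated duplicate
    for line in read_unique(dump):
        print(decode_record(line))
```

Base64 is used here only so that binary values cannot contain the tab or newline characters that separate fields and records.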

3) There is also a pre-release of spymemcached 2.7 on the spy pre-release wiki page that has a tap interface. This version allows you to connect a tap stream to all servers at once and can also track topology changes in your cluster. It will replace the jtap project; both provide similar functionality, so you can use either.

4) I'm not as familiar with tap.py, but you are much better off using jtap or spymemcached 2.7. If you really want to use the Python version, though, reply to this post saying you want to know how tap.py works and I will take a look at it for you.
