Membase keeps crashing - How to get logs?

15 replies [Last post]
Tue, 06/14/2011 - 12:46
delphimaster
Offline
Joined: 06/14/2011
Groups: None

Hi,

We are having a problem where Membase Server 1.7 (Red Hat, 64-bit) stops responding under high load (network load ~10Mb/sec) and ends up in a totally unrecoverable state. Only uninstalling and reinstalling helps.
Running 'service membase-server restart' doesn't help.

Also, it looks like the log locations changed in 1.7.

Does anybody know how to get the logs?

Thanks,
Yaroslav

Tue, 06/14/2011 - 12:53
mikew
Offline
Joined: 03/14/2011
Groups:

You can do this one of two ways.

1) Log in to the web UI. Click on Logs in the menu on the left side of the page. Then click "Generate Diagnostic Report" at the top of the screen.

2) Go to http://host:8091/diag
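If you would rather script option 2, here is a minimal Python sketch. It only assumes the URL given above; the function names, the output path and the timeout are made up for illustration, and authentication is not handled, so a secured node may reject the request.

```python
import urllib.request


def diag_url(host: str, port: int = 8091) -> str:
    """Build the diagnostics URL mentioned in the post above."""
    return f"http://{host}:{port}/diag"


def save_diag(host: str, out_path: str, port: int = 8091, timeout: float = 60.0) -> int:
    """Download the diagnostic report and write it to out_path.

    Returns the number of bytes written. No credentials are sent;
    that the endpoint answers without them is an assumption.
    """
    with urllib.request.urlopen(diag_url(host, port), timeout=timeout) as resp:
        data = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(data)
    return len(data)
```

Calling save_diag("host", "diag.txt") on a machine that can reach the node should leave the report in diag.txt.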

Tue, 06/14/2011 - 13:03
alkondratenko
Offline
Joined: 12/01/2010
Groups: None

And if that doesn't work (maybe REST is not operational), you can always use /opt/membase/bin/mbbrowse_logs. It outputs the logs to standard output, so feel free to redirect and compress them before sending.

We're very interested in investigating your case, especially if you find any mentions of mnesia crashes in the logs.
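The redirect-and-compress step could be scripted roughly like this in Python. Only the mbbrowse_logs path comes from the post above; the function name and the output file name are hypothetical, and the sketch assumes the tool needs no arguments.

```python
import gzip
import shutil
import subprocess

MBBROWSE_LOGS = "/opt/membase/bin/mbbrowse_logs"  # path quoted in the post above


def dump_logs_gz(out_path, command=(MBBROWSE_LOGS,)):
    """Run the log browser and stream its standard output into a gzip file.

    Returns the exit code of the command.
    """
    proc = subprocess.Popen(list(command), stdout=subprocess.PIPE)
    with gzip.open(out_path, "wb") as gz:
        # Copy in chunks so a large log never has to fit in memory.
        shutil.copyfileobj(proc.stdout, gz)
    proc.stdout.close()
    return proc.wait()
```

For example, dump_logs_gz("membase_logs.gz") run on the failing node would produce a compressed file that is easier to attach or upload.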

Tue, 06/14/2011 - 13:22
delphimaster
Offline
Joined: 06/14/2011
Groups: None

Thanks for the help. I got the ERROR REPORT.
Do you know where the problem could be?

INFO REPORT <5865.3989.0> 2011-06-14 14:53:17
===============================================================================

ns_1@127.0.0.1:<5865.3989.0>:ns_doctor:86: Current node statuses:
[{'ns_1@127.0.0.1',
[{last_heard,{1308,81195,893808}},
{active_buckets,
["2024533825","526073312","2065562229","450510688","961970259",
"1312345239","372087324","1289532462","689367529","1270610252",
"611875953","925308374","150849161","274420529","293367125",
"2040537661","892908939","78977041","347778411","1278521626",
"807741344","1433124005","178921995","1870259058","2065552831",
"629384034","795896643","1705438677","1992160224","1520582243",
"2065533328","1433748454","1382230166","800866463","1579860062",
"1931124676","36686171","1738060323","1426238935","11511542",
"166701701","218675437","1564600726","1271712315","668440688",
"1294169211","611973538","1427788785","1225587272","1551454742",
"942932103","2124042833","1402309635","1885026157","1261449675",
"863932940","487637568","1798100439","1884585055","1107762095",
"1338314146","973241774","599711141","1205997512","1722341718",
"670784308","2065470006","315940234","2004215042","961052648",
"766990485","669985598","1945034342","876407633","183468321",
"542393271","411340964","1507667825","1610116869","1703377065",
"1282566341","538975779","1055286788","204557393","1420013172",
"1427846795","1281898772","1056109004","1198134579","918644692",
"938247851","303638623","2065470008","1397515540","300598727",
"286431701","223933190","2008531099","1813031363","test",
"2121832081","2065348094","314839826","2065359214","996086360",
"919165340","1809897274","1427720965","1108828915","1586152628",
"2065338598","1437800007","293192818","1240174047","701827162",
"959268259","1984424065","1713732086","1980383565","1197171018",
"2138968960","586795174","1904430191","1014193498","2140093275",
"1804167506","1647165803","390529037","2113147203","948940135",
"653368563","70292529","2140694356","1384633945","321228108",
"199570422","1161438366","2079752159","793316311","1638472500",
"930217798"]},
{ready_buckets,
["2024533825","526073312","2065562229","450510688","961970259",
"1312345239","372087324","1289532462","689367529","1270610252",
"611875953","925308374","150849161","274420529","293367125",
"2040537661","892908939","78977041","347778411","1278521626",
"807741344","1433124005","178921995","1870259058","2065552831",
"629384034","795896643","1705438677","1992160224","1520582243",
"2065533328","1433748454","1382230166","800866463","1579860062",
"1931124676","36686171","1738060323","1426238935","11511542",
"166701701","218675437","1564600726","1271712315","668440688",
"1294169211","611973538","1427788785","1225587272","1551454742",
"942932103","2124042833","1402309635","1885026157","1261449675",
"863932940","487637568","1798100439","1884585055","1107762095",
"1338314146","973241774","599711141","1205997512","1722341718",
"670784308","2065470006","315940234","2004215042","961052648",
"766990485","669985598","1945034342","876407633","183468321",
"542393271","411340964","1507667825","1610116869","1703377065",
"1282566341","538975779","1055286788","204557393","1420013172",
"1427846795","1281898772","1056109004","1198134579","918644692",
"938247851","303638623","2065470008","1397515540","300598727",
"286431701","223933190","2008531099","1813031363","test",
"2121832081","2065348094","314839826","2065359214","996086360",
"919165340","1809897274","1427720965","1108828915","1586152628",
"2065338598","1437800007","293192818","1240174047","701827162",
"959268259","1984424065","1713732086","1980383565","1197171018",
"2138968960","586795174","1904430191","1014193498","2140093275",
"1804167506","1647165803","390529037","2113147203","948940135",
"653368563","70292529","2140694356","1384633945","321228108",
"199570422","1161438366","2079752159","793316311","1638472500",
"930217798"]},
{replication,
[{"2065470006",1.0},
{"689367529",1.0},
{"1433124005",1.0},
{"938247851",1.0},
{"183468321",1.0},
{"1426238935",1.0},
{"1722341718",1.0},
{"1804167506",1.0},
{"2008531099",1.0},
{"1437800007",1.0},
{"411340964",1.0},
{"1108828915",1.0},
{"1271712315",1.0},
{"293192818",1.0},
{"1884585055",1.0},
{"793316311",1.0},
{"538975779",1.0},
{"300598727",1.0},
{"2079752159",1.0},
{"166701701",1.0},
{"487637568",1.0},
{"1384633945",1.0},
{"390529037",1.0},
{"653368563",1.0},
{"1382230166",1.0},
{"178921995",1.0},
{"2065533328",1.0},
{"1610116869",1.0},
{"1507667825",1.0},
{"2121832081",1.0},
{"1198134579",1.0},
{"925308374",1.0},
{"863932940",1.0},
{"919165340",1.0},
{"1703377065",1.0},
{"1240174047",1.0},
{"347778411",1.0},
{"1312345239",1.0},
{"1270610252",1.0},
{"372087324",1.0},
{"2124042833",1.0},
{"1278521626",1.0},
{"1225587272",1.0},
{"78977041",1.0},
{"942932103",1.0},
{"800866463",1.0},
{"996086360",1.0},
{"1579860062",1.0},
{"1564600726",1.0},
{"303638623",1.0},
{"599711141",1.0},
{"1980383565",1.0},
{"2024533825",1.0},
{"807741344",1.0},
{"2065348094",1.0},
{"961052648",1.0},
{"1205997512",1.0},
{"2065470008",1.0},
{"959268259",1.0},
{"1870259058",1.0},
{"204557393",1.0},
{"293367125",1.0},
{"1931124676",1.0},
{"450510688",1.0},
{"314839826",1.0},
{"668440688",1.0},
{"150849161",1.0},
{"315940234",1.0},
{"1282566341",1.0},
{"286431701",1.0},
{"1294169211",1.0},
{"199570422",1.0},
{"2065552831",1.0},
{"218675437",1.0},
{"223933190",1.0},
{"321228108",1.0},
{"1056109004",1.0},
{"2113147203",1.0},
{"961970259",1.0},
{"948940135",1.0},
{"1289532462",1.0},
{"670784308",1.0},
{"1647165803",1.0},
{"1809897274",1.0},
{"1904430191",1.0},
{"1520582243",1.0},
{"2138968960",1.0},
{"1055286788",1.0},
{"1705438677",1.0},
{"1638472500",1.0},
{"1586152628",1.0},
{"1992160224",1.0},
{"2065359214",1.0},
{"274420529",1.0},
{"2140093275",1.0},
{"1427720965",1.0},
{"542393271",1.0},
{"1885026157",1.0},
{"930217798",1.0},
{"1402309635",1.0},
{"1713732086",1.0},
{"629384034",1.0},
{"11511542",1.0},
{"36686171",1.0},
{"2065562229",1.0},
{"1945034342",1.0},
{"1107762095",1.0},
{"1397515540",1.0},
{"892908939",1.0},
{"1420013172",1.0},
{"701827162",1.0},
{"766990485",1.0},
{"1813031363",1.0},
{"1738060323",1.0},
{"1014193498",1.0},
{"2140694356",1.0},
{"586795174",1.0},
{"1427846795",1.0},
{"1984424065",1.0},
{"1338314146",1.0},
{"1798100439",1.0},
{"2040537661",1.0},
{"2004215042",1.0},
{"918644692",1.0},
{"876407633",1.0},
{"2065338598",1.0},
{"1197171018",1.0},
{"669985598",1.0},
{"1427788785",1.0},
{"1281898772",1.0},
{"973241774",1.0},
{"611875953",1.0},
{"526073312",1.0},
{"1551454742",1.0},
{"795896643",1.0},
{"1261449675",1.0},
{"1161438366",1.0},
{"611973538",1.0},
{"70292529",1.0},
{"1433748454",1.0},
{"test",1.0}]},
{memory,
[{total,4535272224},
{processes,4396935040},
{processes_used,4396457472},
{system,138337184},
{atom,950561},
{atom_used,917759},
{binary,10837208},
{code,7644730},
{ets,100805832}]},
{system_stats,
[{cpu_utilization_rate,97.72893772893772},
{swap_total,2146787328},
{swap_used,0}]},
{interesting_stats,
[{curr_items,0},{curr_items_tot,0},{vb_replica_curr_items,0}]},
{cluster_compatibility_version,1},
{version,
[{os_mon,"2.2.5"},
{mnesia,"4.4.17"},
{kernel,"2.14.3"},
{sasl,"2.1.9.3"},
{ns_server,"1.7.0"},
{stdlib,"1.17.3"}]},
{system_arch,"x86_64-unknown-linux-gnu"},
{wall_clock,61},
{memory_data,{16892911616,9545502720,{<5865.4004.0>,39954920}}},
{disk_data,
[{"/",136124392,4},{"/boot",101086,19},{"/dev/shm",8248492,0}]},
{meminfo,
<<"MemTotal: 16496984 kB\nMemFree: 7569012 kB\nBuffers: 4840008 kB\nCached: 1238144 kB\nSwapCached: 0 kB\nActive: 3367848 kB\nInactive: 5263676 kB\nHighTotal: 0 kB\nHighFree: 0 kB\nLowTotal: 16496984 kB\nLowFree: 7569012 kB\nSwapTotal: 2096472 kB\nSwapFree: 2096472 kB\nDirty: 42616 kB\nWriteback: 0 kB\nAnonPages: 2553460 kB\nMapped: 54596 kB\nSlab: 249188 kB\nPageTables: 15280 kB\nNFS_Unstable: 0 kB\nBounce: 0 kB\nCommitLimit: 10344964 kB\nCommitted_AS: 5653140 kB\nVmallocTotal: 34359738367 kB\nVmallocUsed: 264000 kB\nVmallocChunk: 34359474147 kB\nHugePages_Total: 0\nHugePages_Free: 0\nHugePages_Rsvd: 0\nHugepagesize: 2048 kB\n">>},
{system_memory_data,
[{system_total_memory,16892911616},
{free_swap,2146787328},
{total_swap,2146787328},
{cached_memory,1263378432},
{buffered_memory,4956164096},
{free_memory,7923769344},
{total_memory,16892911616}]},
{statistics,
[{wall_clock,{53321,1}},
{context_switches,{2223914,0}},
{garbage_collection,{135331,1086465720,0}},
{io,{{input,126544095},{output,86911502}}},
{reductions,{287781019,55981197}},
{run_queue,6},
{runtime,{378050,70490}}]}]}]

ERROR REPORT <5865.66.0> 2011-06-14 14:53:17
===============================================================================

ns_1@127.0.0.1:<5865.66.0>:mb_mnesia:176: Mnesia detected overload during dump_log because of write_threshold

ERROR REPORT <5865.72.0> 2011-06-14 14:53:17
===============================================================================

Mnesia('ns_1@127.0.0.1'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}

ERROR REPORT <5865.4650.0> 2011-06-14 14:53:20
===============================================================================

ns_1@127.0.0.1:<5865.4650.0>:stats_collector:121: Dropped 1 ticks

Tue, 06/14/2011 - 13:41
delphimaster
Offline
Joined: 06/14/2011
Groups: None

Hi,

This is how it started:

CRASH REPORT <5865.31444.86> 2011-06-13 20:19:56
===============================================================================
Crashing process
initial_call {ns_janitor,cleanup,['Argument__1']}
pid <5865.31444.86>
registered_name []
error_info
{error,badarg,
[{erlang,hd,[[]]},
{mb_map,balance,3},
{ns_janitor,cleanup,1},
{proc_lib,init_p_do_apply,3}]}
ancestors
[<5865.187.0>,mb_master_sup,mb_master,ns_server_sup,
ns_server_cluster_sup,<5865.51.0>]
messages []
links [<5865.187.0>]
dictionary []
trap_exit false
status running
heap_size 17711
stack_size 24
reductions 1359

INFO REPORT <5865.187.0> 2011-06-13 20:19:56
===============================================================================

ns_1@127.0.0.1:<5865.187.0>:ns_orchestrator:178: Janitor run exited for bucket "611973538" with reason badarg

Please help if anybody knows what causes this.

Thanks,
Yaroslav

Tue, 06/14/2011 - 13:41
mikew
Offline
Joined: 03/14/2011
Groups:

I have filed a bug for this issue (MB-3982). We will have an engineer look into the issue and follow up with you.

Tue, 06/14/2011 - 17:14
perry
Offline
Joined: 10/11/2010
Groups:

Yaroslav, I think the issue here has to do with how many buckets you have configured. We do have some known issues with supporting large numbers of buckets. Would it be possible to try the system with fewer than 10 buckets and check that it works properly for you?

Perry

__________________

Forum support is great for free but sometimes you need a guaranteed response time and dedicated resources for your questions or issues.
Consider purchasing enterprise-level support from Couchbase: http://www.couchbase.com/products-and-services/overview
Call or email "sales -at- couchbase-dot- com" today!

Wed, 06/15/2011 - 06:54
delphimaster
Offline
Joined: 06/14/2011
Groups: None

Hi,

We have one bucket per website and use memcached buckets only. We use one bucket per site mostly because it is the only way to invalidate all keys that belong to one site. Otherwise we would need to keep the site <-> bucket/keys relationships somewhere else and sync them every time something changes.

The documentation says that we can have up to 1024 buckets, and we assumed that a few hundred would be just fine. Is this a limitation of Membase Server or of memcached?

Would it be possible to tune some configuration parameters to fix this issue?
I saw some suggestions at http://streamhacker.com/2008/12/10/how-to-eliminate-mnesia-overload-events/. Are they relevant?

Thanks,
Yaroslav

Thu, 06/16/2011 - 06:26
delphimaster
Offline
Joined: 06/14/2011
Groups: None

Hi Perry,

We can't try to set up even 10 buckets now because the server is down. I guess this is the biggest problem: Membase Server crashed unrecoverably. It wouldn't be such a huge problem if it worked after a service restart.

We could try to set up only 10 buckets, but then we need to reinstall the packages and the current service state will be lost.

Do you need any data before we uninstall it?

Thanks,
Yaroslav

Thu, 06/16/2011 - 06:54
minimedj
Offline
Joined: 06/15/2011
Groups: None

Hi Yaroslav,

Being scared by your post about Membase crashing, I did the following. I set up a cluster of two machines and put a little load on it: about 1 000 000 000 get-sets over 10 hours. For me everything runs smoothly. I use Ubuntu 10.10 Server.

Could you please describe your environment so people will know where the problem could arise?

Thu, 06/16/2011 - 06:58
alkondratenko
Offline
Joined: 12/01/2010
Groups: None

If you can send us the output of /opt/membase/bin/mbbrowse_logs from the node that fails to start, that'll help.

Thu, 06/16/2011 - 08:04
delphimaster
Offline
Joined: 06/14/2011
Groups: None

Hi,

We run just one node on a dedicated Dell PowerEdge 2970 server:

Dell Memory: 16 GB DELL RAM, GB Memory: 16
Dell Servers: Dual Socket Quad Core AMD Opteron 2374HE 2.2 GHz, #Processors: 2, #Cores per Proc: 4
Hard Drive: 146GB SAS 15K RPM Drive, HDD RPM: 15000, GB Hard Drive: 146
Hard Drive: 146GB SAS 15K RPM Drive, HDD RPM: 15000, GB Hard Drive: 146
Hard Drive Size: 3.5 in. Hard Drives
IP Allocation: 1 IP, # IPs: 1
Linux OS: Red Hat Enterprise Linux 5 - 64 bit
RAID Configuration: RAID 1
advanced_networking: 1000Mb Port
Antivirus: Sophos
Membase Server Community Edition: 64-bit Red Hat Linux (RPM) 1.7

Thanks,
Yaroslav

Thu, 06/16/2011 - 07:56
delphimaster
Offline
Joined: 06/14/2011
Groups: None

Hi,

I have uploaded the full report to https://rapidshare.com/files/2809827931/browse_logs.log

Please let me know if you need anything else.

Thanks,
Yaroslav

Thu, 06/16/2011 - 16:16
perry
Offline
Joined: 10/11/2010
Groups:

Thanks Yaroslav, I'll take a look at those, but I'm pretty sure you'll have to reinstall the software in order to properly test it.

I believe you may have misread the documentation. For Membase buckets, we create 1024 "vbuckets", which are the underlying data structures used for "auto-sharding" and rebalancing.

As I mentioned, we have some known issues with supporting large numbers of buckets, and I think even a few hundred would be quite problematic. We've identified some areas for optimization/improvement, but at this time the software is not going to work well for you like that.

To invalidate the items for a specific dataset (website), you might consider using key versioning instead:
-Basically, you have a "version" key for a website, let's call it "website1_version", and set it to "0".
-In your app code, you do a get on this key first to retrieve the "0".
-When accessing/creating a particular key for this website, simply append the version to the key name, after a prefix such as "website1_users_". At first, that will be "website1_users_0".
-Now, when you want to "flush" that particular website, you can simply increment the version key to "1".
-When you go to access "website1_users_1", the key won't exist, and your application will regenerate it like it would anyway.
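
The steps above can be sketched in Python as follows. This is only an illustration: FakeCache is a hypothetical in-memory stand-in for a real memcached client, and the function names are made up. It relies on nothing beyond get, set and incr.

```python
class FakeCache:
    """In-memory stand-in for a memcached client (get/set/incr only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key, delta=1):
        self._data[key] = int(self._data[key]) + delta
        return self._data[key]


def versioned_key(cache, site, name):
    """Return the current key for `name`, e.g. 'website1_users_0'."""
    version_key = f"{site}_version"
    version = cache.get(version_key)
    if version is None:
        # First use of this site: start at version 0.
        # (Not atomic in this sketch.)
        version = 0
        cache.set(version_key, version)
    return f"{site}_{name}_{version}"


def flush_site(cache, site):
    """Invalidate every key of a site by bumping its version counter."""
    version_key = f"{site}_version"
    if cache.get(version_key) is None:
        cache.set(version_key, 0)
    return cache.incr(version_key)


def get_or_build(cache, site, name, build):
    """Fetch a cached value, regenerating it with build() on a miss."""
    key = versioned_key(cache, site, name)
    value = cache.get(key)
    if value is None:
        value = build()
        cache.set(key, value)
    return value
```

Note that the old "website1_users_0" entries are never deleted explicitly; they are simply no longer looked up and are left for the cache to evict or expire.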

While this does create a "second hop" for getting to the actual data, memcached is so fast that it really doesn't matter. This method is used almost across the board, even by very large websites, to get granular control over the versioning of keys.

You may still want multiple buckets, but you don't need one for each website, which also helps when deploying new sites because you are not forced to create and configure a new bucket each time.

How does that sound?

You can also search for "memcache key versioning" and get more tutorials online.

Perry

Fri, 06/17/2011 - 12:34
delphimaster
Offline
Joined: 06/14/2011
Groups: None

Perry,

We will try versioning. What is the maximum number of keys, and what key name length do you suggest?

Thanks,
Yaroslav

Mon, 06/20/2011 - 10:38
perry
Offline
Joined: 10/11/2010
Groups:

There is no technical limit on the number of keys...it's just related to how much RAM you have.

There is a 255 byte limit on the keyname. Shorter ones will take up less space, longer ones more.
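
Since the limit is in bytes, not characters, it can be worth checking the encoded length of generated key names, for example with a small Python helper like the one below. The 255 figure is simply the one quoted above and is kept as a parameter; stock memcached documents a 250-character key limit in its protocol, so a lower value may be the safer choice. The helper name is hypothetical.

```python
MAX_KEY_BYTES = 255  # figure quoted in the reply above; treat as configurable


def key_fits(key: str, limit: int = MAX_KEY_BYTES) -> bool:
    """True if the UTF-8 encoding of key is non-empty and within limit bytes."""
    return 0 < len(key.encode("utf-8")) <= limit
```

A versioned key such as "website1_users_0" is far below the limit, but non-ASCII characters count for more than one byte each, so long generated names can overflow sooner than their character count suggests.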

Perry
