Analytics no longer working
Hello,
Analytics were working fine for our test cluster, but now I am seeing "NaN" on all counters in all buckets. This seems to have started once we created a cluster. The cluster itself seems to work fine (we can put objects into the cache and get them back). Even the Test Server is not showing anything in the test_application bucket.
Thanks,
Chanan.
Hi Matt,
Yes, I can spend time on this. Let me know what you need.
The computers are not firewalled from each other. I just rebooted mine, which is one of the two computers in the cluster, and I still do not see analytics. I can reboot the other once the user comes in.
I did not see any error messages in the log, however I do see a repetition of a few lines regarding the configuration of the default pool. Not sure if that is normal.
Let me know what you need done.
Regards,
Chanan
Matt,
More info. When I try to expand one of the two nodes in the cluster view I get the following error:
I also get a popup dialog saying that there were too many errors. I can take a screen shot of that one if you need it.
Chanan
Sorry one more thing! :) Both machines are on XP, so they most likely both have the Windows Firewall turned on...
Hi Chanan,
In the documentation for the product, you'll see that there are several ports which must be open between the servers. You would also need the memcached, proxy and web services ports available between the servers and the clients.
If analytics worked before, then the ports must have been open at some point, unless you were looking at a single node configuration.
Please confirm your firewall settings match the documentation. I'll also send you a private message to get some other diagnostic info. I won't need the screenshots, thanks though.
Thanks,
- Matt
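One way to confirm that the required ports are reachable between the machines is a small TCP connect test run from each server (and from a client). The sketch below is only an illustration: the host name and the port list are placeholders, and the real port numbers should be taken from the product documentation. Only 11211, the conventional memcached port, is a common default; 8080 is a hypothetical stand-in for the web console port.

```python
import socket


def check_ports(host, ports, timeout=2.0):
    """Try a TCP connection to each port; return {port: True/False}."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError if the port is closed,
            # filtered by a firewall, or the attempt times out.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


if __name__ == "__main__":
    # Hypothetical values: replace with the other node's address and
    # the ports listed in the product documentation.
    other_node = "cache-node-2.example.local"
    ports_to_test = [11211, 8080]
    for port, ok in check_ports(other_node, ports_to_test).items():
        print(port, "reachable" if ok else "NOT reachable")
```

A port that shows as not reachable from one node but reachable locally usually points at a firewall rule (for example the Windows Firewall on XP) rather than at the service itself.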
Hi Matt and Chanan
I would be interested to know whether you were able to solve this problem, since I have a very similar one. In development I tested with one northscale memcached server. No problem. Then I went live, using two servers with northscale memcached server. Everything went well and I saw activity. Then, some 90 seconds later, the analytics and all bucket curves froze. I could do nothing. Logging back into the console, changing views, nothing helped.
I found this thread and restarted the server which started the log server. First it didn't come back up since I had tinkered with the config (see http://forums.northscale.com/showthread.php?21-Memcache-failing). I changed the config back and the service started. I joined the cluster without problems. But the analytics still show NaN, and in the cluster overview I get the message already mentioned by Chanan about "An error was encountered when..." In the log, something popped up only now (around 15 minutes later), maybe because the server rejoined the cluster and wrote over the log? I stored a diagnostic report and can private-message it on request.
Firewall is disabled on both machines since they are in an internal network. Since the whole caching still works and analytics worked for around 90 seconds, I doubt the problem lies there.
Nevertheless I'm happy that the caching still works. But I would be glad to be able to see some statistics and analytics...
Can either of you (or anyone in the forum) give me some hints? I would really like to see Northscale memcache server give me some analytics again.
Thanks,
Christian
Hi Christian,
I got it "somewhat" working. I had to delete the contents of the config folder. This of course reset all the configuration (cluster and buckets), so proceed at your own risk. I still don't think that everything is correct though. It seems to me that "Total Items" always shows 0.
Hi again
I followed Chanan's instructions (thanks!) and was able to enable analytics again, but at the cost of re-initializing the cluster (deleting the ns_x folder from the install-dir config directory, booting the cache servers, configuring buckets and security, and finally joining the cluster again with the second server).
Analytics run again, but again for just around 90 seconds. Then they froze and now show the same state all the time. The bucket list shows that 1 bucket is 101% full. This can't be right, since 90 seconds of cache activity cannot fill a 4GB cache: there are 72 objects in the cache, all smaller than 1MB. Also, the task managers on both servers show 120MB each; for two servers this makes 240MB, not 4GB... At least creating and deleting a bucket still works and creates log entries. But around these log entries from after the freeze, I found the following bug entry:
I wonder why and how that can happen, since caching itself is running perfectly. What I don't like is the way I'm flying blindfolded now... I would like to have some control over the cache.
Any more hints?
Thanks, regards,
Christian
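The arithmetic behind Christian's doubt about the "101% full" reading can be written out as a quick upper-bound check. This is only a sketch using the figures from his post (72 objects, each under 1MB, a 4GB cache); treating 4GB as 4096MB and as the quota of the bucket in question is an assumption made for the illustration.

```python
def upper_bound_fill_percent(object_count, max_object_mb, quota_mb):
    """Worst-case fill level if every object were at the maximum size."""
    return 100.0 * object_count * max_object_mb / quota_mb


# Figures from the post; 4 GB taken as 4096 MB (assumption).
worst_case = upper_bound_fill_percent(72, 1, 4 * 1024)
print(f"{worst_case:.2f}% of the quota at most")
```

Even in the worst case the 72 objects account for under 2% of the quota, so a reading above 100% is more likely a reporting problem than real memory use.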
Chanan, Christian,
Thanks for the continued info on this. We're attempting to reproduce in our labs and I'll report soon on these attempts.
The firewall parts are important, but it sounds like you've hopefully been able to turn the firewalls off (or open the right ports through them), if analytics was able to run for at least several seconds.
By the way, what browser are you using?
Steve
I am mainly on Chrome. Sometimes I check IE to make sure that all is ok.
My problem is no longer the analytics disappearing (I think because the machine that was causing the problem is no longer part of the cluster). However, Total Items always shows as 0 for me.
Hi!
The Total Items graph is actually graphing Total Item changes or deltas from the previous sampling. So, if you're not sending new items into your memcached bucket, the graph should be showing 0.
The docs and labels/description on that graph should be fixed, and I've recorded a bug against this.
Cheers,
Steve
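To illustrate the behaviour Steve describes, here is a minimal sketch of how a graph of per-sample deltas differs from a graph of the running total. The sample values are made up for the example, and the function name is hypothetical; this is not the console's actual code.

```python
def item_deltas(samples):
    """Change in item count between each pair of consecutive samples."""
    return [curr - prev for prev, curr in zip(samples, samples[1:])]


# Hypothetical item counts sampled at regular intervals: 72 items are
# stored between the first two samples, then nothing new arrives.
total_items = [0, 72, 72, 72]
print(item_deltas(total_items))  # [72, 0, 0]
```

The bucket still holds 72 items in the later samples, but a delta graph sits at 0 because no new items are arriving.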
Ok, in that case, as long as I don't use my computer that causes the problem, I am all good.
Hi, quick followup on reproducing the issue of Analytics freezing up....
- QA reports some similar issues with Windows Internet Explorer browsers (7 and 8). After running on the analytics/graphs screens for awhile (> 90 seconds, sometimes 4 to 5 minutes), IE 7/8 sometimes (incorrectly) jumps over to the main console start page. Once on the main web console page, though, you can still click on Analytics and get back to seeing working graphs, at least for awhile. So, it might be a somewhat different issue than what you're running into, if I understand correctly, that Analytics seems to stop permanently for you? This has been bugged and raised to developers for more analysis.
- Firefox, however, doesn't seem to have these issues. Our QA folk have not been able to replicate on Firefox.
- Google Chrome (unfortunately) isn't on the "supported browsers list", and we haven't gotten around to trying this on Chrome.
Thanks,
Steve
I don't think what I was getting was a browser issue since the problem showed up in the logs. I sent my logs to Matt a week or two back.
For me it isn't a browser problem either. I use IE and Firefox, both on the server causing problems and from my workstation. Logging into the web console on both dedicated memcached machines (i.e. entering the URL of each and navigating to the analytics, or to the bucket list and selecting any bucket) gives the same result: a frozen graph that no longer moves, along with the error message in the log I posted above. I could also deliver a diagnostic report if that would help.
Thanks for any hint,
Christian
Hi Christian,
Let me private message you to get the diagnostic report and logs.
Thanks,
Steve
Hi Chanan,
The cluster is designed such that analytics collection can fail without it affecting cluster operations, so it is possible there is a failure with analytics collections.
On the cluster overview, do all of your nodes show as up? If a node is firewalled or doing something other than refusing connections, this could cause an issue with the analytics. We may have to get some additional information to determine what is causing the analytics to misbehave.
If you look at the logs, you should see a message about the node on which logs are being gathered. If you restart that node, there is a possibility that the analytics will recover.
If you're willing to spend another minute or two on it, I'd like to get some additional information from your cluster. If you have the time, let me know and I can describe to you how we'll gather the additional information.
Thanks,
- Matt