CBLite size, bandwidth and replication considerations for 1M docs, 100,000 of them updated every 10 sec

Hi @heretyk,

the spatial features in Couchbase are currently a bit limited, but you could certainly use them to get a subset of the people around you. Your question is a bit fuzzy to me at the moment, so if you have any more specific questions, feel free to ask.

Cheers,
Volker

Hello @vmx,

Thanks for your feedback.

What I would like to know is this:

We developed an app which uses Couchbase Mobile.
We chose this solution to ensure fast access to information, even for "big documents"… and to handle "out of network" situations.
The project is in its final stage of testing and is still confidential, but here is what I can tell you:

· In our Couchbase Server, each document is a user profile
· 1 channel (doc.type == profile) assigned dynamically through Sync Gateway
· 1,000,000 total users (documents), but "only" 100,000 users connected simultaneously
· Each user is on either Android or iPhone
· Those 100,000 users update 1 JSON key (location) in their own document every 10 seconds (maybe not at the exact same moment, though)
· Each update is about 100 bytes
· The average document size is 1 MB
· Continuous replication between the datacenter and every client (Couchbase Mobile)

To summarize: there are 100,000 users connected simultaneously, and each user updates a 100-byte string key in HIS OWN document every 10 seconds… We want the 99,999 other users to receive this update (using push/pull replication) within less than 1 minute (in normal conditions with good connectivity).

What I basically need to know is:

- Can Couchbase Mobile support this scenario or not?
- The bandwidth needed per device in this scenario would be 1 Mbit/s… is this correct?
- Does the Couchbase Mobile replication system replicate the WHOLE database each time, or only the difference between the client database and the server? How much data will be replicated between each device and the Couchbase servers?
- What would be the average database size on the phones? (There are 1,000,000 documents of 1 MB each on the server.)
- We want each user (still in this scenario) to receive all the other users' updates within less than 1 minute. Is this OK or not?

@jens has already helped, and I thank him for that. He advised using a dedicated app server to pre-process data and control what is replicated. It is in the plan… but only in a year or two. For now, we need to use Couchbase Mobile.

So can you help with these questions, please?

Regards

I feel like we’re going around in circles. We’ve already established that the system as you’re describing it is infeasible:

  • 1e2 bytes/msg x 1e-1 msgs/sec x 1e5 other users = 1e6 bytes/sec (that's a megabyte/sec, not a megabit.) Not a realistic expectation for a mobile user's available bandwidth except under ideal conditions in an area with the latest cell network.
  • On the server side, 1e6 bytes/sec/user x 1e5 users = 1e11 bytes/sec, i.e. the server is sending out 100 gigabytes/sec, or about 1 terabit/sec. That sounds pretty expensive, both in terms of hardware requirements and bandwidth costs. Also, as I pointed out before, if your users are in the same place, like a conference or festival, there's absolutely no way the facility can send all that data to them. (Both calculations are sketched out just below.)
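
For reference, here is that back-of-envelope arithmetic as a runnable sketch. All figures are the scenario's assumptions from this thread, and protocol overhead (revision metadata, JSON, HTTP) is ignored, so reality is somewhat worse:

```python
# Back-of-envelope bandwidth estimate for the scenario in this thread.
# All figures are the thread's assumptions; protocol overhead is ignored.

UPDATE_SIZE_BYTES = 100        # one location update
UPDATE_INTERVAL_S = 10         # each user updates every 10 seconds
ACTIVE_USERS      = 100_000    # simultaneously connected users

# Per-client download: every user must receive every other user's updates.
per_client_Bps = UPDATE_SIZE_BYTES * (1 / UPDATE_INTERVAL_S) * (ACTIVE_USERS - 1)
print(f"per client: ~{per_client_Bps / 1e6:.1f} MB/s (~{per_client_Bps * 8 / 1e6:.0f} Mbit/s)")

# Server side: that stream has to go out to every connected client.
server_Bps = per_client_Bps * ACTIVE_USERS
print(f"server: ~{server_Bps / 1e9:.0f} GB/s (~{server_Bps * 8 / 1e12:.1f} Tbit/s)")
```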

Unless you change the architecture of your system to reduce the bandwidth requirements, there’s not really much point in discussing this any further, IMHO.

This just means that users won't receive the updates within the delay I want in this scenario, and for some users (or a lot of them), their database might not be updated as fast as the "1-minute" delay I want.

But current telecom protocols/standards such as HSDPA (or even the older EDGE) and W-CDMA at Layer 1/2 are able to 'split' frames and manage traffic, so even if the radio tower doesn't deliver at 1 MB/s… users will receive their data in the end.
No?

Also, I clearly said

meaning that in the worst case I want: 'CB Lite local database on a device' = 'CB Server database' - 1 minute

Why is it "infeasible"? I don't understand.

No, this is a bandwidth problem not a latency problem. As you’ve specified it, the data simply can’t be delivered (in real-world conditions.)

If you want to relax the constraints by sacrificing latency, you’ll need to change that “10 second” number to something larger.

You’ve said your product is “in the final stage of testing”, so you must have an implementation that’s running already, and you must have already done some extrapolation to figure out how it will scale, at least just at the bandwidth/latency level without taking server overhead into account.

There must be more details than what you’ve given us so far, otherwise you’d have arrived at the same results as my quick calculations above and changed the architecture to something more realistic. I don’t think there’s more insight we can provide without more info from you about what’s being sent and received.

Well, "performance/load/stress testing" is that "final stage of testing". We have only been able to test it with 10 users so far.
Larger tests (beta testing) are planned for the coming weeks.

But to be honest, I thought that even in a "bandwidth available < bandwidth needed" scenario on the phone, the replication time between the CB server and the device would simply be longer… the user would have to deal with less accurate information, yes… but the data would arrive anyway.

Can you help me with this:
What would happen on the phone/user side, then?
Would replication crash? Would the app crash? Would the phone catch fire?

Most likely the client would get farther and farther behind, and keep getting time-outs and dropped sockets. In this particular case with lots of very small updates, the volume of data on the changes feed would be at least as large as the size of the updates. Since the server would be trying to send data across the changes feed faster than the network can transmit it, the socket buffers would overflow on the server side. I don’t know offhand what will break, but as Americans say, “you can’t stuff ten pounds of crap into a five-pound bag”. Either the kernel will close the socket, or the data queue inside the process will hit a limit and abort the socket, or the server process will just start using more and more RAM to hold the queue. Meanwhile the client gets farther and farther behind because the rate at which it learns about updates is a lot less than the rate at which the updates happen.
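
To make that "farther and farther behind" effect concrete, here is a toy sketch. The sustained link speed is an arbitrary assumption for illustration only, not a statement about Couchbase internals:

```python
# Toy model: updates are produced faster than the link can deliver them,
# so the undelivered backlog grows without bound. Numbers are illustrative.

PRODUCED_BYTES_PER_S  = 1_000_000   # ~1 MB/s of updates generated (this thread's scenario)
DELIVERED_BYTES_PER_S =   250_000   # assume the phone actually sustains ~2 Mbit/s

backlog = 0
for minute in range(1, 11):
    backlog += (PRODUCED_BYTES_PER_S - DELIVERED_BYTES_PER_S) * 60
    lag = backlog / PRODUCED_BYTES_PER_S
    print(f"after {minute:2d} min: ~{backlog / 1e6:.0f} MB queued, client ~{lag:.0f} s behind")
```

The queue keeps growing as long as the production rate exceeds the delivery rate; where it physically accumulates (kernel socket buffers, in-process queues, server RAM) is exactly the open question above.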

Understood, but you still have to do the initial design with an understanding of how it needs to scale. Like, if you were going to build a skyscraper you’d plan from the start to put elevators in it, even if your two-story test building didn’t need them. Or if you were designing a nightclub you’d work out how many amps the sound and light system would need before you install it, so that it doesn’t blow the circuit breakers the first time you turn it on. If you’re making an MMO game you work out how much bandwidth and server CPU each player will consume because that affects how you shard the instances. Etc.

Why do you think the title of the topic is "cblite-size-bandwidth-and-replication-considerations-for-1m-docs-100-000-of-them-updated-every-10-sec"?

This thread is slowly decaying into noise and banter. The summary is this:

@jens's response is that your design is not feasible, as there is too much data to reliably transmit over a cellular network. He advised you to reconsider your design so that it uses less bandwidth. One suggestion is to relax the 10-second requirement to something less frequent. Another suggestion, from another user, was to have a client send out an update only if its location has changed significantly since the last update. What are your thoughts on these proposals?

My thoughts are that even if these options are valid, and we are considering them, I want to know all the options. It is not my way of doing things to choose an option without actually knowing all the options.

I do not agree with you… it is rather interesting.
Please read the following.

Now, we are getting somewhere…

Reading this, I understand that this "overflow" situation would be handled server-side.
That is rather logical to me… so OK, the servers would close the socket and/or the client would receive a time-out.

But there is something that I don't understand:

I am sure such a situation must have been thought of when Couchbase Server was designed… hasn't it?
I mean a situation where lots of clients request a replication at the same moment and, eventually, the server cannot send all the data needed because of bandwidth issues…

This kind of scenario is pretty common for database solutions, don't you agree?

So can you, or anyone in your company, answer this: "What happens if Couchbase Server cannot process all the replication requests at the same moment?"

It would have been a great thread to start when you were starting your implementation, or maybe when it was at the alpha stage. “Final testing” seems really late, though. Which is exactly what I was trying to convey by the paragraph you replied to.

I am sure such a situation must have been thought of when Couchbase Server was designed… hasn't it?
I mean a situation where lots of clients request a replication at the same moment and, eventually, the server cannot send all the data needed because of bandwidth issues…

Sure. If you push anything past its limits it will fail. I’m not sure what your point is. If you want to know exactly how it will fail, that’s harder to answer because it’s a complex system with a lot of asynchronous behaviors, and also because details like network behavior and server CPU/memory will make a difference. But the important thing is that it will fail. In that case the appropriate response is to modify your design to make it realistic. Arguing with people won’t change the behavior of the system.

If you have some concrete ideas about how to modify your requirements to make them implementable (with any sort of server; this isn’t Couchbase-specific), I can respond to those. Otherwise I’ll sit this out.

My point:

Is it possible that Couchbase implements a 'system' where, when it cannot push all the changes (for example, due to heavy load and not enough bandwidth to the client), it keeps them in a queue and waits until the bandwidth to the client improves? Or uses some algorithm/method to segment the data or split it up…?

If yes, then my point is this: in my scenario, 100,000 updates arrive simultaneously at the server, which is not able to deliver these changes through replication to 75% of users within the one-minute delay I need (because of bandwidth)… would it THEN deliver the updates to the 25% of USERS that have the 'best' connection/bandwidth, and cache/queue the other 75% so that they are sent as soon as possible?

Do you follow me?

By the way… a side question… does the replication use a delta, or does it replicate the entire document when a change occurs?

Hello @Jens,

Following our discussions, the following actions have been taken:

I modified the code design so that updates happen only if the user has travelled at least 10 m since the last location,
OR
if his location has not been updated for 3 minutes.
I talked with some telecommunications engineers, and we believe this should reduce the bandwidth usage by 5 to 10 times, depending on the context.
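
In case it helps to make the change concrete, here is a minimal sketch of that throttle logic, assuming it runs client-side; the class, function, and constant names are hypothetical, and the real implementation would live in the mobile app and use its platform's location API:

```python
import math
import time

MIN_DISTANCE_M  = 10.0      # push an update only after moving at least 10 m ...
MAX_SILENCE_SEC = 3 * 60    # ... or if nothing has been pushed for 3 minutes

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

class LocationThrottle:
    """Decides whether a new GPS fix is worth writing to the user's document."""

    def __init__(self):
        self.last_lat = None
        self.last_lon = None
        self.last_push = 0.0

    def should_push(self, lat, lon, now=None):
        now = time.time() if now is None else now
        if self.last_lat is None:
            moved_enough = True   # the first fix always goes out
        else:
            moved_enough = haversine_m(self.last_lat, self.last_lon, lat, lon) >= MIN_DISTANCE_M
        stale = (now - self.last_push) >= MAX_SILENCE_SEC
        if moved_enough or stale:
            self.last_lat, self.last_lon, self.last_push = lat, lon, now
            return True           # caller then updates the 'location' key in the user's doc
        return False
```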

What's your opinion on this?

On another point, if you could provide some answers to the previous post (the last one), it would really help.

We are aware that the scenario we debated earlier is very "extreme", but it is probable that in 2 or 3 years we will have to manage this volume of simultaneous updates… (it won't be every 10 seconds now anyway).
Yes, we plan to deploy an app server to reduce bandwidth usage, and yes, we agree it is the best solution.

But I need to know what would happen if, by any chance, there are too many updates to replicate for the available bandwidth…

Thanks
Regards,

That will help somewhat. But downloading 100,000 documents to a device (on first sync, or after you’ve been offline for a while) will still take a long time. It’s not just 100 bytes per user; there’s metadata associated with documents and revisions too, and there’s overhead for things like JSON parsing and database operations. I don’t have an exact figure because it depends on a lot of variables like network conditions, the device’s CPU and storage speed, and server load. (It wouldn’t be hard to run a simulation, though, if you want to try it out.)
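
If you do want to rough out that first-sync cost before running a real simulation, a parameterized estimate could look like this; every number below is a placeholder assumption to be replaced with your own measurements:

```python
def first_sync_estimate_sec(n_docs, doc_body_bytes, per_doc_overhead_bytes,
                            link_bytes_per_sec, per_doc_processing_sec):
    """Very rough first-sync time: transfer time plus per-document processing
    (JSON parsing, revision bookkeeping, local database writes)."""
    transfer = n_docs * (doc_body_bytes + per_doc_overhead_bytes) / link_bytes_per_sec
    processing = n_docs * per_doc_processing_sec
    return transfer + processing

# Purely hypothetical numbers for illustration; measure on real devices.
est = first_sync_estimate_sec(
    n_docs=100_000,
    doc_body_bytes=2_000,           # small profile body
    per_doc_overhead_bytes=200,     # docID/revID/multipart/JSON overhead
    link_bytes_per_sec=1_000_000,   # ~8 Mbit/s sustained
    per_doc_processing_sec=0.005,   # 5 ms to parse and insert one doc
)
print(f"first sync: roughly {est / 60:.0f} minutes with these assumptions")
```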

At a higher level, what the heck is the device going to do with 100,000 data points? The user certainly can’t make sense of them all. The two choices seem to be (a) display some sort of aggregate heat-map, or (b) let the user search for people s/he knows and ‘favorite’ them, then only display those users. Both of those can better be done without loading all the data onto the device. Have the server generate a heat-map and store it in a document that gets updated periodically. Use a server-side search API to let clients find people, and use channels to subscribe to the people the user wants to track.
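
To illustrate the channel idea: the real routing would be written as a Sync Gateway sync function (in JavaScript); the Python below only sketches the concept, and every name in it is hypothetical:

```python
# Concept sketch only: each profile document goes into its own channel,
# and a client pulls just the channels of the users it has 'favorited'.

def channel_for_profile(doc):
    """Channel name a profile document would be routed to."""
    return f"profile-{doc['user_id']}"

def docs_visible_to(favorites, all_docs):
    """Documents a client would receive if it pulls only its favorites' channels."""
    wanted = {f"profile-{uid}" for uid in favorites}
    return [d for d in all_docs if channel_for_profile(d) in wanted]

docs = [{"user_id": "alice", "location": [48.85, 2.35]},
        {"user_id": "bob",   "location": [40.71, -74.00]}]
print(docs_visible_to({"alice"}, docs))   # only alice's profile is pulled
```

The point of that design is that the device only ever stores and syncs the documents in the channels it subscribes to, instead of the full data set.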

Thanks.

We agree that the two solutions you suggest would be good ones. Definitely.

But as said before, the scenario of "1,000,000 docs in the database (with 10% of them being updated at almost the same moment)" won't happen for another 2 or 3 years.
During that period, we would like to use CBLite as described before… We will then have about 2 years to deploy either an API or an app server, or to change the channel assignment based on whatever criteria (the user's city, for example).
On the server side, even at the beginning, we will have top-class servers with SSDs and a lot of bandwidth. That's not a problem for us.

2 questions:

- Basically, you are saying that the new update rate "solves" the critical bandwidth situations, so our design will work, but that later we will have to rethink which data gets replicated. Am I correct?

- What would be the size of a CBLite DB on a device for 1M docs of 1 MB each (server-side)?

Regards

Hello @Jens,

Can you provide an estimate (based perhaps on some of your customers or your experience) of the size of an average "update" that would be pushed to a mobile device? (Let's say a 1 MB doc has a 100-byte string KEY updated.) How much data does this update generate?

Regards

If you’re curious, you can find some informal docs of the replication algorithm here. The actual REST calls made are part of the general REST API.

First off, doc updates are not transmitted as deltas, though we have been doing some work to enable that in the future. So if someone updates 100 bytes of a 1MB doc, the full 1MB is uploaded to the server and then downloaded by the other clients. (The one exception to this is attachments: any attachments that didn’t change as part of a doc update do not get re-transmitted.)

During a pull, the server’s _changes feed sends the client the docIDs of all documents that changed, plus the current revID of each one (revIDs are about 34-40 bytes long), plus a few extra bytes for JSON syntax. The client then identifies which of those revisions are new to it and sends back _bulk_get requests for them in batches of 100-200. Each of those requests includes a list of (docID, revID) pairs. The response to a _bulk_get is MIME multipart; for each doc there’s something like 50 bytes of overhead for multipart delimiters and headers, and then the JSON, again including the docID and revID, plus a list of ancestor revIDs that the client doesn’t have yet.
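
Putting rough numbers on that description for the earlier question (a 100-byte change to a 1 MB document); the sizes below are approximations taken from this description, not exact protocol figures:

```python
# Rough wire cost, per pulling client, of one small update to a large document.
# Sizes are approximations from the description above, not exact protocol figures.

DOC_BODY_BYTES  = 1_000_000   # the full document is re-sent; updates are not deltas
DOC_ID_BYTES    = 36          # e.g. a UUID-style docID
REV_ID_BYTES    = 40          # revIDs are roughly 34-40 bytes
JSON_SYNTAX     = 20          # braces, quotes, commas around each entry
MULTIPART_BYTES = 50          # per-doc multipart delimiters/headers in _bulk_get

changes_row   = DOC_ID_BYTES + REV_ID_BYTES + JSON_SYNTAX   # _changes notification
bulk_get_req  = DOC_ID_BYTES + REV_ID_BYTES + JSON_SYNTAX   # entry in the _bulk_get request
bulk_get_resp = MULTIPART_BYTES + DOC_BODY_BYTES + DOC_ID_BYTES + REV_ID_BYTES

total = changes_row + bulk_get_req + bulk_get_resp
print(f"~{total / 1e6:.2f} MB downloaded for a 100-byte change")   # dominated by the body
```

In other words, the per-update metadata is negligible here; the cost is dominated by re-sending the full document body, which is why delta encoding (or smaller documents) matters so much.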

As you can see this is not optimized for huge numbers of tiny updates. Delta-encoding of the JSON will help a lot, and I’ve done some work on a new replication protocol that’s more compact, but it’s always going to be less efficient than a custom broadcast protocol for the specific task of sending huge numbers of tiny updates.

Thanks a lot, very instructive.
Indeed, the new replication protocol would help a lot in my situation… even if we later implement a solution to "pre-process" data before replication (through an app server, eventually).

What is the compression ratio between the main DB and CB Lite?
What size would this 1 MB document be once replicated to the device?

Thanks