Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.0
Fix Version/s: 2.0-beta-2
Component/s: cross-datacenter-replication
Security Level: Public
Labels:
Description
The replicator has logic that, when there are fewer than 500 items in the queue, waits a while (possibly too long) for more items to show up.
This is why the replication rate drops when the queue holds about 600K items or fewer.
When the queue holds 1M items or more, the replication rate stays at 5-10K even when the destination is under load.
We need a solution, because this logic means that the closer the destination gets to the source, the lower the replication rate becomes (there are fewer items per vbucket on the XDCR replication queue).
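To see why a minimum-batch wait makes the rate fall as the queue drains, here is a hypothetical cost model of the rule described above. Only the 500-item threshold comes from this ticket; the wait time, per-batch overhead, per-item cost, the 1024-vbucket count and the even spread of items across vbuckets are all illustrative assumptions, not values from the replicator.

```python
# Hypothetical model of the "wait for more items" batching rule described above.
# Only MIN_BATCH (500) comes from the ticket; every other constant is made up
# for illustration and is not taken from the actual replicator.

MIN_BATCH = 500            # below this many queued items, the replicator waits
WAIT_S = 0.5               # hypothetical wait applied to a short batch
BATCH_OVERHEAD_S = 0.01    # hypothetical fixed cost per batch
PER_ITEM_S = 0.0001        # hypothetical cost to replicate one item
NUM_VBUCKETS = 1024        # assumed vbucket count


def per_vbucket_backlog(total_queued: int, vbuckets: int = NUM_VBUCKETS) -> int:
    """Average items queued per vbucket, assuming an even spread."""
    return total_queued // vbuckets


def batch_cost(n_items: int) -> float:
    """Seconds to send one batch of n_items under this model."""
    wait = WAIT_S if n_items < MIN_BATCH else 0.0
    return wait + BATCH_OVERHEAD_S + n_items * PER_ITEM_S


def effective_rate(n_items: int) -> float:
    """Items/sec for one vbucket when each batch holds n_items."""
    return n_items / batch_cost(n_items)


if __name__ == "__main__":
    for total in (1_000_000, 600_000, 500_000, 100_000):
        n = per_vbucket_backlog(total)
        print(f"{total:>9} queued -> {n:>4}/vbucket -> {effective_rate(n):8.0f} items/s")
```

Under these assumptions the threshold corresponds to a total queue of about 512K items (500 × 1024), so once the whole queue drops into that range every vbucket starts paying the wait and the modeled rate falls sharply, which is consistent with the drop observed near 600K.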
The good news is that front-end load on the source does not have a major impact on the destination.
Damien is looking at the code in question: I don't see where the batch size would cause this slowdown. I believe this is a problem in ep-engine, where it gets into a state in which waking the flusher doesn't work for some reason, so it must wait for the flusher to wake itself.
I'm going to add some code to see how long we wait for full commits to happen, versus how long we spend doing the other replication work.
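One way to collect the split between waiting for full commits and doing other replication work is a per-phase timer. The sketch below is hypothetical: the `PhaseTimer` class and the phase names `full_commit_wait` and `other_work` are illustrative and not taken from the replicator code. It accepts an injectable clock so the accounting can be checked deterministically.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class PhaseTimer:
    """Accumulates elapsed time per named phase (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.totals: Dict[str, float] = defaultdict(float)

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add it to the total for `name`."""
        start = self._clock()
        try:
            yield
        finally:
            self.totals[name] += self._clock() - start

    def share(self, name: str) -> float:
        """Fraction of all measured time spent in phase `name` (0.0 if none)."""
        total = sum(self.totals.values())
        return self.totals.get(name, 0.0) / total if total else 0.0


if __name__ == "__main__":
    timer = PhaseTimer()
    with timer.phase("full_commit_wait"):
        time.sleep(0.02)  # stand-in for waiting on a full commit
    with timer.phase("other_work"):
        time.sleep(0.01)  # stand-in for reading and sending mutations
    print(dict(timer.totals), f"commit-wait share: {timer.share('full_commit_wait'):.0%}")
```

If the commit-wait share dominates while the queue is small, that would point toward the flusher wake-up hypothesis rather than the batch size.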
Activity
Peter Wansch
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Assignee | Junyi Xie [ junyi ] | Damien Katz [ damien ] |
Peter Wansch
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Priority | Major [ 3 ] | Blocker [ 1 ] |
Farshid Ghods
made changes -
| Fix Version/s | 2.0-beta [ 10113 ] |
Peter Wansch
made changes -
| Fix Version/s | 2.0-beta [ 10113 ] |
Peter Wansch
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Fix Version/s | 2.0-beta-refresh [ 10385 ] | |
| Fix Version/s | 2.0 [ 10114 ] | |
| Priority | Blocker [ 1 ] | Critical [ 2 ] |
Peter Wansch
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Summary | XDCR ops/sec is low or at 0 for long period of times on destination | Replication rate is dropping when the queue size becomes small |
Farshid Ghods
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Summary | Replication rate is dropping when the queue size becomes small | Replication rate is dropping when the queue size becomes less than 500 items |
Farshid Ghods
made changes -
| Labels | 2.0-beta-release-notes |
Dipti Borkar
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Summary | Replication rate is dropping when the queue size becomes less than 500 items | Replication rate may drop when the XDCR replication queue size becomes less than 500 items |
Junyi Xie
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Assignee | Damien Katz [ damien ] | Junyi Xie [ junyi ] |
Junyi Xie
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Summary | Replication rate may drop when the XDCR replication queue size becomes less than 500 items | Replication rate may drop when the XDCR replication queue size becomes less than 500k items |
| Attachment | | Screen Shot 2012-09-19 at 12.09.13 PM.png [ 15108 ] |
Peter Wansch
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Resolution | | Fixed [ 1 ] |
Peter Wansch
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Reporter | Peter Wansch [ peter ] | Ketaki Gangal [ ketaki ] |
Farshid Ghods
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Status | Resolved [ 5 ] | Closed [ 6 ] |