Import 6 Million Records


I am using Spring Batch to read from CSV files and save the records to Couchbase. What is the best and fastest way to save them? Currently I am reaching 5K ops, but I am looking for more.

1- What are the most important settings to check on the Couchbase server?
2- What is the best way to save documents?

Best Regards

Are you also using spring-data-couchbase? First, you need to decide if you want to use the 1.4 SDK or the new 2.0 one. For a batch import, I'd say you'll probably be better off with the async reactive ops in the new one.

Can you share the code you use so we can take a look?
Also, what are your hardware specs, and what size are the documents you want to store? Are they JSON or binary?
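
As a rough illustration of why async ops tend to help a batch import: the idea is to issue every write in the batch first and wait only once at the end, instead of waiting on each write in turn. The sketch below shows that pattern with plain `CompletableFuture` from the JDK. It is not the Couchbase SDK; `AsyncBulkSketch`, `store`, `upsertAsync` and `writeBatch` are hypothetical stand-ins, and the in-memory map only simulates a remote bucket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class AsyncBulkSketch {
    // Hypothetical in-memory stand-in for a remote bucket (NOT the Couchbase API).
    static final Map<String, String> store = new ConcurrentHashMap<>();

    // Simulated async upsert: returns at once, completes later on the common pool.
    static CompletableFuture<String> upsertAsync(String id, String json) {
        return CompletableFuture.supplyAsync(() -> {
            store.put(id, json);
            return id;
        });
    }

    // Issue every write first, then block a single time for the whole batch.
    static int writeBatch(Map<String, String> docs) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            pending.add(upsertAsync(e.getKey(), e.getValue()));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        return pending.size();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        for (int i = 0; i < 5000; i++) {
            docs.put("cdr::" + i, "{\"recordSequenceNumber\":\"" + i + "\"}");
        }
        int written = writeBatch(docs);
        System.out.println("written=" + written + " stored=" + store.size());
    }
}
```

With a real client the same shape applies: the batch size and the number of writes in flight are the knobs to experiment with.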


I am using the new 2.0.2 Couchbase client. My testing environment is a Mac mini (2.8 GHz i5, Fusion Drive) with 2 GB given to Couchbase. (Note: I can test on a bigger machine if needed.)

As for the code, I am using the following in the item writer, with a batch size of 5000:

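    public void write(List<? extends JsonDocument> list) throws Exception {
        Observable.from(list) // assumed: the start of the chain was cut off in the post
            .flatMap(new Func1<JsonDocument, Observable<JsonDocument>>() {
                public Observable<JsonDocument> call(final JsonDocument docToInsert) {
                    return bucket.async().upsert(docToInsert);
                }
            }).toBlocking().lastOrDefault(null); // assumed: block until every upsert is done
    }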

The documents are JSON; below is a sample:

   "disconnectDate": "01/03/2015",
   "recordSequenceNumber": "1986102064",
   "egressSignalingType": "00",
   "egressIPCircuitEndPoints": "",
   "routeSelectedEgressTrunkGroupName": "TG_XXXX_UK_IP",
   "routeSelected": "LONXXX01:TG_XXXX_UK_IP",
   "callingNumber": "4164824896",
   "vendorId": "199",
   "billingNumber": "6473084200",
   "dialedNumberNOA": "02",
   "routeSelectedEgressGateway": "LONXXX01",
   "psxIndex": "1",
   "psxProcessingTime": "06",
   "calledNumber": "6473084200",
   "terminatedWithScript": "0",
   "egressCodecType": "",
   "callServiceDuration": 0,
   "ingressSignalingType": "012",
   "callDisconnectReasonTXEgress": "0",
   "ingressPSTNCircuitEndPoints": "",
   "vendorName": "XXXXX",
   "callDisconnectReason": "041",
   "customerName": "EAD",
   "disconnectInitiator": "02",
   "serviceProvider": "WHOLESALE",
   "scriptName": "TANDEM",
   "selectedRouteType": "7",
   "callDisconnectReasonTXIngress": "0503",
   "startDate": "01/03/2015",
   "egressRemoteSignalingIPAddr": "XXXXXXX",
   "chargeFlag": "0",
   "routeSelectedEgressData": "LONGSX01:TG_XXXX_UK_IP",
   "routeIndexUsed": "01",
   "ingressCodecType": "",
   "dialedNumber": "6473084200",
   "egressLocalSignalingIPAddr": "XXXXXXX",
   "ingressTrunkGroupName": "TG_XXX_UK_IP",
   "callDisconnectLocation": "9",
   "overloadStatus": "0",
   "customerId": "279",
   "startTime": "00:14:19.8",
   "gsxCallID": "0x7E0D77E9",
   "calledPartyNOA": "03",
   "egressProtocolVariantSpecData": "SIP,2114811881_130391831@XXXXXX,<sip:4164824896@>;tag=gK0d4d737a,<sip:6473084200@>,0,,,,sip:6473084200@,,,,sip:4164824896@XXXXXXX5060,,,,,,503,,0,0,,0,0,,,,,,,,1,0,0,0,,,,",
   "callingPartyNOA": "03",
   "egressTrunkGroupName": "TG_XXXX_UK_IP",
   "ingressIPCircuitEndPoints": "XXXXXXXX:22324/",
   "gatewayName": "LONGSX01",
   "timeElapsedRXAlert": "00",
   "recordType": "ATTEMPT",
   "disconnectTime": "00:14:20.2",
   "routeAttemptNumber": "1",
   "ingressRemoteSignalingIPAddr": "XXXXXXXX",
   "ingressLocalSignalingIPAddr": "XXXXXXXX",
   "timeElapsedDiscRXCompofCall": "30",
   "callSetupDelay": "22,327,3,352",
   "incomingCallingNumberNOA": "02",
   "callingName": "",
   "routeLabel": "RL_WS_64",
   "egressPSTNCircuitEndPoints": "",
   "timeElapsedSetupMsgRXLastCallRteAtt": "20",
   "incomingCallingNumber": "4164824896",
   "ingressProtocolVariantSpecData": "SIP,154200731-1-376759418@,<sip:4164824896@XXXXXXXX>;tag=sansay336316389rdb3714,<sip:6473084200@XXXXXX>;tag=gK0dcd7228,0,,,,sip:6473084200@,4164824896@,,,sip:4164824896@XXXXXXXX:5060,,,,,,503,,0,0,,0,0,,,,,,,,1,0,0,0,,,,",
   "timeElapsedRXPSXRsp": "10"


Hello Michael

Do you think the above method is the optimal way to do a bulk insert?