cbimport --field-separator

Hi!
I have an issue with the --field-separator option on the cbimport command line. My CSV file is an ordinary one, with lines separated by tabs and values by spaces or semicolons. I cannot find the right command to make the first line supply the keys in the JSON representation of each record.
Do you know how to set this up?
Thank you very much for your answer.

Do you mean your Comma-Separated-Value file is a single line (no newline characters) with each “row” separated by a TAB character and within each “row”, values have spaces or semi-colons between them ? ("row"s to be converted to documents.)

Could you provide a (fabricated?) example of the file?

ANN;JAN;FEB;MAR tab (tab is newline); semicolons are the separators, and the first line holds the keys for the JSON dictionary
1989;10;20;30 tab
1990;11;25;28 tab

It works if I replace the separators with commas, but not with spaces or other separators in the commands I used.

What version are you using? I’ve used 7.2.0 for these:

$ cat t.csv
a;b;c
1;2;3
4;5;6

$ xxd t.csv
00000000: 613b 623b 630a 313b 323b 330a 343b 353b  a;b;c.1;2;3.4;5;
00000010: 360a                                     6.

$ cbimport csv -c localhost -u Administrator -p password -b default -d file:///tmp/t.csv -g '#UUID#' --field-separator ';'
CSV `/tmp/t.csv` imported to `localhost` successfully
Documents imported: 2 Documents failed: 0

$ curl -su Administrator:password http://localhost:8093/query/service -d 'statement=select * from default'
{
"requestID": "1b27f235-5708-49f0-b3da-28dfbb99aa96",
"signature": {"*":"*"},
"results": [
{"default":{"a":"1","b":"2","c":"3"}},
{"default":{"a":"4","b":"5","c":"6"}}
],
"status": "success",
"metrics": {"elapsedTime": "61.525682ms","executionTime": "61.40333ms","resultCount": 2,"resultSize": 74,"serviceLoad": 3}
}

$ cat t.csv
a b c
1 2 3
4 5 6

$ xxd t.csv
00000000: 6120 6220 630a 3120 3220 330a 3420 3520  a b c.1 2 3.4 5 
00000010: 360a                                     6.

$ cbimport csv -c localhost -u Administrator -p password -b default -d file:///tmp/t.csv -g '#UUID#' --field-separator ' '
CSV `/tmp/t.csv` imported to `localhost` successfully
Documents imported: 2 Documents failed: 0

$ curl -su Administrator:password http://localhost:8093/query/service -d 'statement=select * from default'
{
"requestID": "854e8518-4f68-4747-95b0-4009e2baf5be",
"signature": {"*":"*"},
"results": [
{"default":{"a":"1","b":"2","c":"3"}},
{"default":{"a":"4","b":"5","c":"6"}}
],
"status": "success",
"metrics": {"elapsedTime": "64.229753ms","executionTime": "63.97735ms","resultCount": 2,"resultSize": 74,"serviceLoad": 3}
}

I don’t know which version, but it returns this error: missing field-separator

Could you:

cat /opt/couchbase/VERSION.txt

or run:

SELECT ds_version();

to let us know the version?

I had a scratch through the source and “field-separator” was added as an option in 2016 - so present since at least version 5. This makes me think it is a command-line syntax error rather than the option (or a really out-of-date version).

Could you share your command line (along with the version obtained from a method noted above) ?

No, because it works with other field separators; it is just that the CSV is not loaded in the right format.

Do my commands with their example data not work for you? I included the hex dumps so you could see the precise content of the files. Do they hold any clues as to why your files won’t load and report the “missing field-separator” error?

I’ve assumed you’ve specified “(tab is newline)” to indicate clearly where newline characters exist in your data. Is this not the case? Can you hex-dump your data and verify only the expected (0x0d, 0x0a or 0x0d0a) sequences terminate each line?
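As a rough sketch of that check, the helper below counts how many lines of a file end in a carriage return (0x0d) before the newline. The function name `count_cr_lines` is made up for illustration, and it assumes a POSIX-style shell with `grep` available. A count of 0 suggests plain LF line endings, as in the t.csv examples above.

```shell
# Hypothetical helper: count lines that end with CR (0x0d) before the LF.
count_cr_lines() {
    # Build a literal carriage return portably (no $'\r' needed).
    cr=$(printf '\r')
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so "|| true" keeps the function's status clean.
    grep -c "${cr}\$" "$1" || true
}
```

For example, `count_cr_lines src.csv` (with `src.csv` standing in for your own file) prints the number of CRLF-terminated lines.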

Have you got any embedded new line characters in your data?

I included my commands to show how I passed the --field-separator argument. Does this differ from your specification?

It is not a problem of tab and indeed it is the right interpretation. The commands don’t work with my file

OK, so you’d need to compare your file content. If it is as demonstrated then it should load. You’ll have to investigate to see where the specified format is broken in your file.

(I have tried breaking the file in various ways but have not succeeded in reproducing the error you’ve noted.)

I presume you’ve tried using just the first two lines of your file for testing - “head -2 src.csv > test.csv” or similar ?

Not yet, but I was wondering if it could be a syntax problem: when I type the semicolon, the ’ changes from a straight one to an italic one. This happens only with the semicolon.

That will be something in your shell - how different delimiters are entered and interpreted. And yes, the meaning will change. You can use " " and ";" too. (If you’re on Windows, then double quotes (ASCII character code 0x22) are a must.)

Even with double quotes

OK, so the cbimport command requires the single-character value be passed as the argument value. How precisely this is supplied to cbimport is a function of your shell.

What is your OS & shell ?

If you can’t get these characters passed by your shell (have you tried a copy-n-paste of the commands from the examples?) try a different shell, if one is available.
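One way to see exactly which bytes your shell hands over as an argument is to dump them in hex. The sketch below is only an illustration: `show_bytes` is a made-up name, and it assumes a POSIX-style shell with `od` and `tr` available. A correctly passed semicolon should show as 3b and a space as 20, matching the hex dumps earlier in the thread; anything longer (for instance e28099, the UTF-8 encoding of a typographic right single quote) means extra characters reached the command.

```shell
# Hypothetical helper: print the bytes of its first argument as hex.
show_bytes() {
    # od -An -tx1 dumps one hex byte per column without offsets;
    # tr strips the padding so the result is one compact string.
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

show_bytes ';'
show_bytes ' '
```

Typing `show_bytes` followed by the separator exactly as you enter it for --field-separator shows what cbimport would receive.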

MacOS terminal, I tried copy and paste, but the style changes anyways

Do you experience this character change when using an editor running in your terminal, or only at the command line? (Which shell are you using? ps is the simplest way to tell.) If it is just at the command line, try switching shell.

I’m not a Mac user so I’ve asked about and with iTerm2 there is apparently no issue with input of ';' etc. – perhaps that’s an option if it is your terminal emulator itself.

An alternative may be to use a GUI editor to save the precise (i.e. without the character changes) command to a file and just run that instead.

HTH.

Actually, it does not appear in the editor (nano for instance). However, if I copy and paste from an alternative GUI editor, there is a change when the command is pasted.

OK, so likely your terminal emulator itself. Pasting will be processed like key presses typically.

So I’d suggest examining your terminal settings and/or writing the command in nano and saving it to a file, then execute the file (set permissions and execute or simply sh /path/to/file).
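Before running the saved file, it may be worth confirming that no altered characters ended up in it. The following is a minimal sketch, assuming a POSIX-style shell with `grep`; the function name `ascii_only` and the file name `import.sh` are placeholders. It succeeds only when the file holds nothing but printable ASCII, tabs and newlines, so a typographic quote (or a stray carriage return) makes it fail.

```shell
# Hypothetical helper: succeed only if the file contains nothing but
# printable ASCII characters, tabs and newlines.
ascii_only() {
    tab=$(printf '\t')
    # In the C locale every byte is one character, so the bracket
    # expression matches any byte outside space..tilde other than tab.
    ! LC_ALL=C grep -q "[^ -~${tab}]" "$1"
}

# Usage (import.sh is a placeholder for the file saved from nano):
#   ascii_only import.sh && sh import.sh
```

Chaining the check with `&&` means the import command only runs if the file passed the check.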

That is a good idea ! Thanks I’ll try that !