Using YCSB to Benchmark JSON Databases

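Bruce Lindsay once said, "There are three things important in the database world: performance, performance, and performance." Most enterprise architects know that, as database features and architectures evolve, it is important to measure performance in an open way so that total cost of ownership can be compared reliably.

YCSB has done a great job of benchmarking the data stores that serve "cloud OLTP" applications. These data stores were simple, with simple get, put, and delete operations. The original YCSB benchmark consists of simple insert, update, delete, and scan operations on a simple document of 10 key-value pairs, and each workload is defined as a mix of these operations in different proportions.

JSON databases such as Couchbase and MongoDB have a richer data model that includes scalars, nested objects, arrays, arrays of objects, and arrays of arrays and objects. JSON databases also support more sophisticated query languages, indexes, and features. Beyond CRUD operations, applications routinely use the declarative query languages of these databases to search, paginate, and run reports. To help architects evaluate platforms effectively, we therefore need an additional benchmark that measures these capabilities in addition to the basic CRUD operations. This YCSB tutorial describes the features that fill that gap.

The YCSB paper states that, by making the benchmark tool available as open source, its authors also hope to foster the development of additional cloud benchmark suites that represent other classes of applications. In this regard, a key feature of the YCSB framework/tool is that it is extensible: it supports easy definition of new workloads, in addition to making it easy to benchmark new systems.

This benchmark extends YCSB to JSON databases by extending the existing operations to JSON and then defining new operations and new workloads.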

Here is the outline.

Introduction
Data model
Benchmark operations
Benchmark workloads
YCSB-JSON implementation
How to run YCSB-JSON?
References

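1. Introduction

YCSB was developed to measure the performance of scalable NoSQL key-value data stores. The YCSB infrastructure does this job well. YCSB uses a simple, flat key-value model. Couchbase uses a JSON model, which customers use for massive interactive applications. We have built, and continue to build, features into the product that help customers build these applications effectively. We need performance measurements for these use cases.

Other databases also support the JSON model: MongoDB, DocumentDB, DynamoDB, RethinkDB, and Oracle NoSQL. When YCSB runs against a JSON database (Couchbase, MongoDB, and so on), the driver simply stores and retrieves strings in a JSON key-value structure. All of these databases need a new benchmark that measures how they handle the rich structure of JSON (nested objects, arrays) and operations such as paging, grouping, and aggregation.

The purpose of YCSB-JSON is to extend the YCSB benchmark to measure JSON database capabilities by covering these two things:

Operations representative of massive interactive applications.
- Operations on the JSON data model, including nested objects and arrays.
Workloads that represent the operation mix of these applications.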

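Consider these customer use cases:

Marriott built its reservation system on an IBM mainframe and DB2. As more and more customers began browsing the available inventory, it ran into cost and performance problems. The DB2 system was originally built to take reservations from phone systems or agents, so the look-to-book ratio was low. Today that ratio is high because look requests have grown exponentially, which has also driven up database costs significantly. Marriott moved all of its inventory data to Couchbase, kept continuously synchronized from the mainframe system, and its web applications use Couchbase for the look/search operations.
Cars.com is a portal for listing and selling cars. They keep the listing data in Oracle. When serving car information on the web, they need to provide not only the basic car information but also additional insights, such as how many users are looking at a car or have saved it to a wish list. This increases engagement and creates a sense of urgency. All of the data needed for these interactive operations is stored in Couchbase.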

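In general, massive interactive applications include:

Browsing room availability, pricing details, and amenities (lookups by end customers)
Browsing car make/model or repair shop information (web-scale consumer and partner enablement)
Providing information in context to the customer (location-based services)
Serving both master data and transactional data (at scale)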

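To support these requirements, the applications and databases do the following:

Offload queries from high-cost system-of-record (mainframe, Oracle) databases
- (Reservation and revenue applications)
Open up back-office functions for web/mobile access
- (Let web users check room details)
Scale databases/queries with better TCO
- (Scale out the mainframe with commodity servers)
Modernize legacy systems with capabilities demanded by new collaboration/engagement applications
- (Browse inventory, flights, room availability, departmental analytics)

The new benchmark needs to measure the performance of the queries that implement these operations.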

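2. Data model

We consider customers and orders as two distinct collections of JSON documents. Each order has a reference to its customer.

Below are sample customer and order documents. They were generated with the fakeit data generator, which is available at: https://github.com/bentonam/fakeit

See the appendix for the YAML file used to define the data model and domain.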



Sample customer document
Document Key: 100_advjson
{
  "_id": "100_advjson",
  "doc_id": 100,
  "gid": "48a8e177-15e5-5116-95d0-41478601bbdd",
  "first_name": "Stella",
  "middle_name": "Jackson",
  "last_name": "Toy",
  "ballance_current": "$1084.94",
  "dob": "2016-05-11",
  "email": "Alysson83@yahoo.com",
  "isActive": true,
  "linear_score": 31,
  "weighted_score": 40,
  "phone_country": "fr",
  "phone_by_country": "01 80 03 25 39",
  "age_group": "child",
  "age_by_group": 12,
  "url_protocol": "http",
  "url_site": "twitter",
  "url_domain": "gov",
  "url": "https://www.twitter.gov/Stella",
  "devices": [
    "EE-245",
    "FF-012",
    "GG-789",
    "HH-246"
  ],
  "linked_devices": [
    [
      "AA-038",
      "BB-577"
    ],
    [
      "OO-565",
      "KK-448",
      "FF-281"
    ],
    [
      "BB-495",
      "AA-374"
    ],
    [
      "BB-609",
      "VV-899",
      "LL-675",
      "BB-291"
    ],
    [
      "CC-048"
    ]
  ],
  "address": {
    "street": "6392 Crona Rue Curve",
    "city": "Simeonland",
    "zip": "98316",
    "country": "Bahrain",
    "prev_address": {
      "street": "9063 Johns Islands Divide",
      "city": "South Jayme",
      "zip": "34950-8194",
      "country": "Bulgaria",
      "property_current_owner": {
        "first_name": "Weston",
        "middle_name": "Clyde",
        "last_name": "Considine",
        "phone": "(665) 343-9468"
      }
    }
  },
  "children": [
    {
      "first_name": "Darrel",
      "gender": null,
      "age": 10
    },
    {
      "first_name": "Shea",
      "gender": null,
      "age": 6
    }
  ],
  "visited_places": [
    {
      "country": "Iran",
      "cities": [
        "Heidenreichshire",
        "West Luciano",
        "Haroldmouth",
        "West Jakeburgh"
      ]
    },
    {
      "country": "Comoros",
      "cities": [
        "New Valliemouth",
        "East Kaleighland"
      ]
    },
    {
      "country": "Israel",
      "cities": [
        "East Kali",
        "Pabloport"
      ]
    },
    {
      "country": "French Guiana",
      "cities": [
        "North Zachary",
        "Kielmouth"
      ]
    }
  ]
}
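
To make the shapes in this document concrete, here is a small Python sketch (not part of YCSB-JSON) that reads the kinds of fields the new operations target: a nested object, an array of scalars, and an array of objects that themselves contain arrays. The trimmed `customer` dictionary copies a few values from the sample above; the helper name `get_path` is hypothetical.

```python
# Trimmed copy of the sample customer document above.
customer = {
    "_id": "100_advjson",
    "devices": ["EE-245", "FF-012", "GG-789", "HH-246"],
    "address": {
        "zip": "98316",
        "prev_address": {"zip": "34950-8194", "country": "Bulgaria"},
    },
    "visited_places": [
        {"country": "Iran", "cities": ["Heidenreichshire", "West Luciano"]},
        {"country": "Israel", "cities": ["East Kali", "Pabloport"]},
    ],
}


def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.prev_address.zip' (hypothetical helper)."""
    value = doc
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # The path does not exist in this document.
        value = value[part]
    return value


# Nested object lookup, array membership, and flattening an array of arrays.
prev_zip = get_path(customer, "address.prev_address.zip")
has_device = "FF-012" in customer["devices"]
visited_cities = [c for place in customer["visited_places"] for c in place["cities"]]
```

A flat 10-field YCSB record has none of these shapes, which is why the standard operations alone cannot exercise them.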

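3. Benchmark operations:

The first four operations are the same as in standard YCSB, except that they operate on JSON documents. The remaining operations are new.

Insert: Insert a new JSON document.
Update: Update a JSON document by replacing the value of one scalar field.
Read: Read a JSON document, either one randomly chosen field or all the fields.
Delete: Delete the JSON document with the given key.
Scan: Scan JSON documents in order, starting at a randomly chosen record key. The number of records to scan is randomly chosen (LIMIT).
Search: Search JSON documents based on range predicates on 3 fields (customizable to n fields).
Page: Paginate the result set of a query that has a predicate on a field in the document.
- All customers in a zip, with randomly chosen OFFSET and LIMIT in SQL, N1QL.
NestScan: Query JSON documents based on a predicate on a 1-level nested field.
ArrayScan: Query JSON documents based on a predicate within a single-level array field.
ArrayDeepScan: Query JSON documents based on a predicate within a two-level array field (array of arrays).
Report: Query customer order details for customers in a specific zip code.
- Each customer has multiple orders.
- The order document has the order details.
Report2: Generate a sales order summary for a given day, grouped by zip code.
Load: Load the data.
Sync: Stream and synchronize data from another system.
Aggregate: Perform grouping and aggregation.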

For Couchbase: example implementations of the benchmark operations

Couchbase implements YCSB in two modes, selected with the KV setting.

KV stands for key-value. The simple YCSB operations INSERT, UPDATE, and DELETE can be implemented through the KV API instead of queries. Setting KV=true means the KV API is used; setting KV=false means N1QL (SQL for JSON) queries are used. For a tutorial on N1QL, see https://query-tutorial.couchbase.com
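
The two modes can be pictured with a short Python sketch. It does not call any Couchbase SDK; the function name `plan_read` and the returned tuples are hypothetical and only show which path a read would take under each setting.

```python
import json


def plan_read(document_key, kv):
    """Return a (mode, payload) pair describing how a read would be issued."""
    if kv:
        # KV=true: a direct key-value lookup, no query text involved.
        return ("kv", document_key)
    # KV=false: the same read expressed as a N1QL statement.
    key_literal = json.dumps(document_key)  # Quote the key as a JSON string.
    return ("n1ql", f"SELECT * FROM customer USE KEYS [{key_literal}]")


kv_plan = plan_read("100_advjson", kv=True)
query_plan = plan_read("100_advjson", kv=False)
```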

1. Insert: Insert a new JSON document.

KV=true: KV call to insert
KV=false: INSERT INTO customer VALUES(...)

2. Update: Update a JSON document by replacing the value of one scalar field.

KV=true: KV call to UPDATE a single document.
KV=false: UPDATE customer USE KEYS [documentkey] SET field1 = value

3. Read: Fetch the JSON document with the given key, either one randomly chosen field or all the fields.

KV=true: KV call to fetch a single document.
KV=false: SELECT * FROM customer USE KEYS [documentkey]

4. Delete: Delete the JSON document with the given key.

KV=true: KV call to delete a single document.
KV=false: DELETE FROM customer USE KEYS [documentkey]

5. Scan: Scan JSON documents in order, starting at a randomly chosen record key. The number of records to scan is randomly chosen (LIMIT).

KV=true:
SELECT META().id FROM customer WHERE META().id > "val" ORDER BY META().id LIMIT <num>
Fetch the actual documents directly using KV calls from the benchmark driver.

KV=false: SELECT * FROM customer WHERE META().id > "val" ORDER BY META().id LIMIT <num>
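
As a rough illustration of the scan semantics, the following Python sketch uses an in-memory dictionary in place of the database. The names `scan_ids` and `scan` are hypothetical; the two steps mirror the KV=true variant, where the query returns only document ids and the driver then fetches each document by key. Keys compare as strings, as document ids do.

```python
from bisect import bisect_right


def scan_ids(sorted_ids, start_key, limit):
    """Return up to `limit` ids strictly greater than `start_key`, in order."""
    first = bisect_right(sorted_ids, start_key)
    return sorted_ids[first:first + limit]


def scan(store, start_key, limit):
    """Two-step scan: find the ids first, then fetch each document by key."""
    ids = scan_ids(sorted(store), start_key, limit)
    return [store[doc_id] for doc_id in ids]


# Tiny stand-in data set keyed like the sample documents.
store = {f"{n}_advjson": {"doc_id": n} for n in (100, 101, 102, 103)}
result = scan(store, "100_advjson", 2)
```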

6. Page: Paginate the result set of a query that has a predicate on a field in the document.

All customers in address.zip with randomly chosen OFFSET and LIMIT in SQL, N1QL

KV=true:
SELECT META().id FROM customer WHERE address.zip = "value" OFFSET <num> LIMIT <num>
Fetch the actual documents directly using KV calls from the benchmark driver.

KV=false: SELECT * FROM customer WHERE address.zip = "value" OFFSET <num> LIMIT <num>
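
A minimal Python sketch of the same OFFSET/LIMIT logic over an in-memory list follows. The function name `page` is hypothetical. Because the query has no ORDER BY, a database may return matching rows in any order; the sketch simply keeps list order.

```python
def page(docs, zip_code, offset, limit):
    """Filter on address.zip, then skip `offset` rows and return up to `limit`."""
    matches = [d for d in docs if d.get("address", {}).get("zip") == zip_code]
    return matches[offset:offset + limit]


# Five customers in one zip code and one customer elsewhere.
docs = [{"doc_id": n, "address": {"zip": "98316"}} for n in range(1, 6)]
docs.append({"doc_id": 6, "address": {"zip": "00000"}})

second_page = page(docs, "98316", offset=2, limit=2)
last_page = page(docs, "98316", offset=4, limit=10)
```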

7. Search: Search JSON documents based on range predicates on 3 fields (customizable to n fields).

All customers WHERE (country = "value1" AND age_group = "value2" AND YEAR(dob) = "value")
All customers retrieved with randomly chosen OFFSET and LIMIT in SQL, N1QL

KV=true:
SELECT META().id FROM customer WHERE country = "value1" AND age_group = "value2" AND YEAR(dob) = "value" ORDER BY country, age_group, YEAR(dob) OFFSET <num> LIMIT <num>
Fetch the actual documents directly using KV calls from the benchmark driver.

KV=false: SELECT * FROM customer WHERE country = "value1" AND age_group = "value2" AND YEAR(dob) = "value" ORDER BY country, age_group, YEAR(dob) OFFSET <num> LIMIT <num>
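
The three-field filter can be sketched in Python as below. This is an illustration under assumptions, not the benchmark code: the sample document keeps country under address, so the sketch reads it from there; the year is taken from the first four characters of the ISO date string; and results are sorted by `_id` only to make the paging deterministic. The function name `search` is hypothetical.

```python
def search(docs, country, age_group, year, offset, limit):
    """Filter on three fields, order the hits, then apply OFFSET and LIMIT."""

    def matches(doc):
        dob = doc.get("dob")  # May be None: the generator emits null dates.
        return (
            doc.get("address", {}).get("country") == country
            and doc.get("age_group") == age_group
            and dob is not None
            and dob[:4] == str(year)
        )

    hits = sorted((d for d in docs if matches(d)), key=lambda d: d["_id"])
    return hits[offset:offset + limit]


docs = [
    {"_id": "2", "address": {"country": "Bahrain"}, "age_group": "child", "dob": "2016-05-11"},
    {"_id": "1", "address": {"country": "Bahrain"}, "age_group": "child", "dob": "2016-01-02"},
    {"_id": "3", "address": {"country": "Bahrain"}, "age_group": "child", "dob": None},
    {"_id": "4", "address": {"country": "Bahrain"}, "age_group": "adult", "dob": "2016-03-03"},
]
all_hits = search(docs, "Bahrain", "child", 2016, offset=0, limit=10)
```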

8. NestScan: Query JSON documents based on a predicate on a 1-level nested field.

KV=true:
SELECT META().id FROM customer WHERE address.prev_address.zip = "value" LIMIT <num>
Fetch the actual documents directly using KV calls from the benchmark driver.

KV=false: SELECT * FROM customer WHERE address.prev_address.zip = "value" LIMIT <num>

9. ArrayScan: Query JSON documents based on a predicate within a single-level array field.

Find all customers who have devices with a value. E.g. FF-012
Sample devices field
 "devices": [
   "EE-245",
   "FF-012",
   "GG-789",
   "HH-246"
 ],
KV=true:
SELECT META().id FROM customer WHERE ANY v IN devices SATISFIES v = "FF-012" END ORDER BY META().id LIMIT <num>
Fetch the actual documents directly using KV calls from the benchmark driver.
KV=false: SELECT * FROM customer WHERE ANY v IN devices SATISFIES v = "FF-012" END ORDER BY META().id LIMIT <num>
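
In Python terms, the array predicate behaves like `any()` over the array. The sketch below runs against a small in-memory list; the function name `has_device` and the two extra customers are made up for illustration.

```python
def has_device(doc, device_id):
    """True if any element of the devices array equals device_id."""
    # any() plays the role of ANY ... SATISFIES ... END in the query.
    return any(v == device_id for v in doc.get("devices", []))


customers = [
    {"_id": "100_advjson", "devices": ["EE-245", "FF-012", "GG-789", "HH-246"]},
    {"_id": "101_advjson", "devices": ["AA-001"]},
    {"_id": "102_advjson"},  # No devices field at all.
]

limit = 10
# Order by document id, then cut to the limit, as the query does.
matching_ids = sorted(c["_id"] for c in customers if has_device(c, "FF-012"))[:limit]
```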

10. ArrayDeepScan: Query JSON documents based on a predicate within a two-level array field (array of arrays).

Get the list of all customers who have visited Paris, France.

KV=true:

SELECT META().id FROM customer
WHERE ANY v IN visited_places SATISFIES v.country = "France" AND
           ANY c IN v.cities SATISFIES c = "Paris" END
      END
ORDER BY META().id
LIMIT <num>

Fetch the actual documents directly using KV calls from the benchmark driver.

KV=false:

SELECT * FROM customer
WHERE ANY v IN visited_places SATISFIES v.country = "France" AND
           ANY c IN v.cities SATISFIES c = "Paris" END
      END
ORDER BY META().id
LIMIT <num>
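
The nested predicate can be read as one `any()` inside another. In the Python sketch below (function name `visited` and the three test customers are hypothetical), both conditions must hold for the same element of visited_places, so a customer who visited France and, separately, a city named Paris in another country does not match.

```python
def visited(doc, country, city):
    """True if one visited_places entry has this country and contains this city."""
    return any(
        place.get("country") == country
        and any(c == city for c in place.get("cities", []))
        for place in doc.get("visited_places", [])
    )


customers = [
    {"_id": "a", "visited_places": [{"country": "France", "cities": ["Lyon", "Paris"]}]},
    {"_id": "b", "visited_places": [
        {"country": "France", "cities": ["Lyon"]},
        {"country": "USA", "cities": ["Paris"]},
    ]},
    {"_id": "c", "visited_places": []},
]

matching_ids = sorted(c["_id"] for c in customers if visited(c, "France", "Paris"))
```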

11. Report: Query customer order details for customers in a specific zip code.

Each customer has multiple orders.
Order document has order details.

KV=true: Not possible (at least not easily, without a significant performance impact).
KV=false:

ANSI JOIN:
SELECT *
FROM customer c INNER JOIN orders o
ON (META(o).id IN c.order_list)
WHERE c.address.zip = "val"

ANSI JOIN with HASH join:
SELECT *
FROM customer c INNER JOIN orders o USE HASH (probe)
ON (META(o).id IN c.order_list)
WHERE c.address.zip = "val"
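
The join can be sketched in plain Python as follows. This is only an illustration of the result the queries describe, not of how the query engine executes them: a dictionary keyed by order id stands in for whatever join method is chosen. The `order_list` field comes from the queries above (it does not appear in the sample customer document), and the order fields, ids, and the function name `report` are hypothetical.

```python
def report(customers, orders_by_id, zip_code):
    """Pair each customer in the zip code with every order in its order_list."""
    rows = []
    for c in customers:
        if c.get("address", {}).get("zip") != zip_code:
            continue
        for order_id in c.get("order_list", []):
            order = orders_by_id.get(order_id)
            if order is not None:  # Inner join: skip references with no order.
                rows.append({"c": c, "o": order})
    return rows


customers = [
    {"_id": "c1", "address": {"zip": "98316"}, "order_list": ["o1", "o2", "o9"]},
    {"_id": "c2", "address": {"zip": "00000"}, "order_list": ["o3"]},
]
orders_by_id = {
    "o1": {"_id": "o1", "day": "2018-01-01"},
    "o2": {"_id": "o2", "day": "2018-01-02"},
    "o3": {"_id": "o3", "day": "2018-01-02"},
}

rows = report(customers, orders_by_id, "98316")
```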

12. Report2: Generate a sales order summary for a given day, grouped by zip code.

KV=true: Need to write a program.
KV=false:

----ANSI join

SELECT o.day, c.address.zip, SUM(o.salesamt)
FROM customer c INNER JOIN orders o
ON (META(o).id IN c.order_list)
WHERE c.address.zip = "value"
AND o.day = "value"
GROUP BY o.day, c.address.zip
ORDER BY SUM(o.salesamt)

------ANSI join with HASH join

SELECT o.day, c.address.zip, SUM(o.salesamt)
FROM customer c INNER JOIN orders o USE HASH (probe)
ON (META(o).id IN c.order_list)
WHERE c.address.zip = "value"
AND o.day = "value"
GROUP BY o.day, c.address.zip
ORDER BY SUM(o.salesamt)
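
For KV=true the text says a program has to be written. A minimal sketch of what such a program would compute is shown below, over in-memory data. It is an assumption-laden illustration: zip is read from address as in the sample customer document, the order fields `day` and `salesamt` and the customer field `order_list` are taken from the queries, and the function name `report2` and all data values are made up.

```python
from collections import defaultdict


def report2(customers, orders_by_id, zip_code, day):
    """Sum sales per (day, zip) for one zip code and one day, smallest total first."""
    totals = defaultdict(float)
    for c in customers:
        customer_zip = c.get("address", {}).get("zip")
        if customer_zip != zip_code:
            continue
        for order_id in c.get("order_list", []):
            order = orders_by_id.get(order_id)
            if order is not None and order.get("day") == day:
                totals[(order["day"], customer_zip)] += order["salesamt"]
    rows = [(d, z, total) for (d, z), total in totals.items()]
    return sorted(rows, key=lambda row: row[2])


customers = [
    {"_id": "c1", "address": {"zip": "98316"}, "order_list": ["o1", "o2"]},
    {"_id": "c2", "address": {"zip": "98316"}, "order_list": ["o3", "o4"]},
]
orders_by_id = {
    "o1": {"day": "2018-01-01", "salesamt": 10.0},
    "o2": {"day": "2018-01-01", "salesamt": 5.5},
    "o3": {"day": "2018-01-01", "salesamt": 4.5},
    "o4": {"day": "2018-01-02", "salesamt": 99.0},  # Different day, excluded.
}

summary = report2(customers, orders_by_id, "98316", "2018-01-01")
```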

13. Load: Load the data.

Load 1 million documents.
Load 10 million documents.

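14. Sync: Stream and synchronize data from another system.

We need to measure the performance of data synchronization.
1. Sync 1 million documents: 50% updates, 50% inserts.
2. Sync 10 million documents: 80% updates, 20% inserts.
Ideally, this synchronization is done by a connector that pulls the data from Kafka or another source.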
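
The update/insert split of a sync run can be pictured with the following Python sketch, which upserts a batch into an in-memory dictionary and counts how many documents already existed. The function name `sync` and the sample keys are hypothetical; a real run would stream from a connector rather than a list.

```python
def sync(store, incoming):
    """Upsert each incoming document; report how many were updates vs inserts."""
    updated = inserted = 0
    for doc in incoming:
        if doc["_id"] in store:
            updated += 1
        else:
            inserted += 1
        store[doc["_id"]] = doc  # Same write for both cases (upsert).
    return updated, inserted


store = {"1_advjson": {"_id": "1_advjson", "v": 1}, "2_advjson": {"_id": "2_advjson", "v": 1}}
incoming = [
    {"_id": "1_advjson", "v": 2},
    {"_id": "2_advjson", "v": 2},
    {"_id": "3_advjson", "v": 1},
    {"_id": "4_advjson", "v": 1},
]

counts = sync(store, incoming)  # A 50% update, 50% insert batch.
```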

15. Aggregate: Perform grouping and aggregation.

---GROUP BY query 1

SELECT c.address.zip, COUNT(1)
FROM customer c
WHERE c.address.zip BETWEEN "value1" AND "value2"
GROUP BY c.address.zip

---GROUP BY query 2

SELECT o.day, SUM(o.salesamt)
FROM orders o
WHERE o.day BETWEEN "value1" AND "value2"
GROUP BY o.day;
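
As a sketch of what the first grouping query computes, the Python below counts customers per zip code within an inclusive string range, reading zip from address as in the sample customer document. The function name `count_by_zip` and the data are made up. Note that the range comparison is on strings, so "98316" sorts by character, not by numeric value.

```python
from collections import Counter


def count_by_zip(customers, low, high):
    """Count customers per zip code for zips in the inclusive range [low, high]."""
    zips = (c.get("address", {}).get("zip") for c in customers)
    return Counter(z for z in zips if z is not None and low <= z <= high)


customers = [
    {"address": {"zip": "98316"}},
    {"address": {"zip": "98316"}},
    {"address": {"zip": "98400"}},
    {"address": {"zip": "10001"}},  # Outside the range.
    {},                             # No address at all.
]

counts = count_by_zip(customers, "98000", "98999")
```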

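4. Benchmark workloads

A workload is a combination of these operations.

To start with, the workload definitions can reuse the definitions of workload-A through workload-E from YCSB. Details are available at https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads. Additional workloads need to be defined as combinations of the operations defined above.

Workload SA is the same as workload A, but on the new model. The same holds for workloads B through F; we call them SB through SF to distinguish them from the original workloads B through F.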

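Workload	Operations	Record selection	Application example
SA - Update heavy	Read: 50% Update: 50%	Zipfian	Session store recording recent actions in a user session
SB - Read heavy	Read: 95% Update: 5%	Zipfian	Photo tagging; adding a tag is an update, but most operations are to read tags
SC - Read only	Read: 100%	Zipfian	User profile cache, where profiles are constructed elsewhere (e.g., Hadoop)
SD - Read latest	Read: 95% Insert: 5%	Latest	User status updates; people want to read the latest statuses
SE - Short ranges	Scan: 95% Insert: 5%	Zipfian/Uniform	Threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread ID)
SF - Read-modify-write	Read: 50% Write: 50%	Zipfian	User database, where user records are read and modified by the user or to record user activity
SG - Page heavy	Page: 90% Insert: 5% Update: 5%	Zipfian	User database, where new users are added, existing records are updated, and pagination queries run on the system
SH - Search heavy	Search: 90% Insert: 5% Update: 5%	Zipfian	User database, where new users are added, existing records are updated, and search queries run on the system
SI - NestScan heavy	NestScan: 90% Insert: 5% Update: 5%	Zipfian	User database, where new users are added, existing records are updated, and nested-field queries run on the system
SJ - ArrayScan heavy	ArrayScan: 90% Insert: 5% Update: 5%	Zipfian
SK - ArrayDeepScan heavy	ArrayDeepScan: 90% Insert: 5% Update: 5%	Zipfian
SL - Report	Report: 100%
SL - Report2	Report2: 100%
SLoad - Load	Load: 100%	All	Data load to set up the SoE
SN - Aggregate (SN1, SN2)	Aggregate: 90% Insert: 5% Update: 5%
SMIX - Mixed workload	Page: 20% Search: 20% NestScan: 15% ArrayScan: 15% ArrayDeepScan: 10% Aggregate: 10% Report: 10%		See below.
SSync - Sync	Sync: 100% Merge/Update: 70% New/Insert: 30%		Continuous synchronization of data from other systems into the system of engagement. See below.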
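
To show how a row of operation proportions turns into a stream of operations, here is a small Python sketch using the SMIX percentages. It is not the YCSB scheduler: the function name `choose_operations` is hypothetical, and it only draws operation names at random with the given weights.

```python
import random

# Proportions for the mixed workload, written as fractions of 1.
SMIX = {
    "page": 0.20,
    "search": 0.20,
    "nestscan": 0.15,
    "arrayscan": 0.15,
    "arraydeepscan": 0.10,
    "aggregate": 0.10,
    "report": 0.10,
}


def choose_operations(mix, count, seed=None):
    """Draw `count` operation names according to the proportions in `mix`."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("operation proportions must add up to 1")
    rng = random.Random(seed)  # Seeded so a run can be repeated.
    names = list(mix)
    return rng.choices(names, weights=[mix[n] for n in names], k=count)


ops = choose_operations(SMIX, 1000, seed=42)
```

With enough operations, the share of each name in `ops` approaches its configured proportion.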

Example configuration for a YCSB/JSON workload



recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
Filternumlow = 2
Filternumhigh = 14
Sortnumlow = 3
Sortnumhigh = 6
page1propotion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100
scanlengthdistribution=uniform
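
A workload file like the one above is a list of key=value properties. The Python sketch below is a minimal reader for that shape, useful for checking a file before a run. It is not the parser YCSB uses, it handles only the `key=value` form (with optional spaces around `=`), and the function name `parse_properties` is hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines into a dict of strings."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # Skip blank lines and comments.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Not a key=value line.
        props[key.strip()] = value.strip()
    return props


# A few lines copied from the example configuration.
sample = """
recordcount=1000
operationcount=1000
Filternumlow = 2
requestdistribution=zipfian
maxscanlength=100
"""

props = parse_properties(sample)
record_count = int(props["recordcount"])
```

Values stay strings until converted, so a check such as comparing `recordcount` with `operationcount` needs an explicit `int(...)`.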

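Acknowledgements

Thanks to Raju Suravarjjala, Senior Director of QE and Performance at Couchbase, for driving this work, and to the entire performance team for supporting the effort. The YCSB-JSON benchmark was developed in collaboration with Alex Gyryk, Principal Performance Engineer at Couchbase. He developed the data model for customers and orders used in this article and implemented the operations and workloads in YCSB-JSON for Couchbase and MongoDB. The YCSB-JSON implementation is available at: https://github.com/couchbaselabs/YCSB

Thanks to Aaron Benton, Solutions Architect at Couchbase, who developed fakeit, an easy-to-use and efficient JSON data generator. He developed it before joining Couchbase. It is available at: https://github.com/bentonam/fakeit

Next part

In the next article on YCSB-JSON, Alex will describe the implementation of this benchmark for Couchbase and MongoDB. The source code for the implementation is available at: https://github.com/couchbaselabs/YCSB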

References

Benchmarking Cloud Serving Systems with YCSB: https://www.cs.duke.edu/courses/fall13/cps296.4/838-CloudPapers/ycsb.pdf
JSON: https://json.org
JSON generator: https://www.json-generator.com/
YCSB-JSON implementation: https://github.com/couchbaselabs/YCSB

Appendix

YAML used to generate the customer data set.


name: AdvJSON
type: object
key: _id
data:
  fixed: 10000
properties:
  _id:
    type: string
    data:
      post_build: "return '' + this.doc_id + '_advjson';"
  doc_id:
    type: integer
    description: The document id
    data:
      build: "return document_index + 1"
  gid:
    type: string
    description: "guid"
    data:
        build: "return chance.guid();"
  first_name:
    type: string
    description: "First name - string, linked to url as the personal page"
    data:
      fake: "{{name.firstName}}"
  middle_name:
    type: string
    description: "Middle name - string"
    data:
      build: "return chance.bool() ? chance.name({middle: true}).split(' ')[1] : null;"
  last_name:
    type: string
    description: "Last name - string"
    data:
      fake: "{{name.lastName}}"
  ballance_current:
    type: string
    description: "currency"
    data:
      build: "return chance.dollar();"
  dob:
    type: string
    description: "Date"
    data:
      build: "return chance.bool() ? new Date(faker.date.past()).toISOString().split('T')[0] : null;"
  email:
    type: string
    description: "email"
    data:
      fake: "{{internet.email}}"
  isActive:
    type: boolean
    description: "active boolean"
    data:
      build: "return chance.bool();"
  linear_score:
    type: integer
    description: "integer 0 - 100"
    data:
      build: "return chance.integer({min: 0, max: 100});"
  weighted_score:
    type: integer
    description: "integer 0 - 100 with zipf distribution"
    data:
      build: "return chance.weighted([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 0.4, 0.3, 0.25, 0.2, 0.17, 0.13, 0.11, 0.1, 0.09]) * 10 + chance.integer({min: 0, max: 10});"
  phone_country:
    type: string
    description: "field linked to phone, choices: us, uk, fr"
    data:
      build: "return  chance.pickone(['us', 'uk', 'fr']);"
  phone_by_country:
    type: string
    description: "phone number by country code, linked to phone_country field"
    data:
      post_build: "return chance.phone({country: this.phone_country});"
  age_group:
    type: string
    description: "field linked to age, choices: child, teen, adult, senior"
    data:
      build: "return  chance.pickone(['child', 'teen', 'adult', 'senior']);"
  age_by_group:
    type: integer
    description: "age by group, linked to age_group field"
    data:
      post_build: "return chance.age({type: this.age_group});"
  url_protocol:
    type: string
    description: "linked to url"
    data:
      build: "return  chance.pickone(['http', 'https']);"
  url_site:
    type: string
    description: "linked to url"
    data:
      build: "return  chance.pickone(['twitter', 'facebook', 'flixter', 'instagram', 'last', 'linkedin', 'xing', 'google', 'snapchat', 'tumblr', 'pinterest', 'youtube', 'vine', 'whatsapp']);"
  url_domain:
    type: string
    description: "linked to url"
    data:
      build: "return  chance.pickone(['com', 'org', 'net', 'int', 'edu', 'gov', 'mil', 'us', 'uk', 'ft', 'it', 'de']);"
  url:
    type: string
    description: "user profile url, linked to other document fields"
    data:
      post_build: "return '' + this.url_protocol + '://www.' + this.url_site + '.' + this.url_domain + '/' + this.first_name;"
  devices:
    type: array
    description: "Array of strings - device"
    items:
      $ref: '#/definitions/Device'
      data:
        min: 2
        max: 6
  linked_devices:
    type: array
    description: "Array of array of string"
    items:
      $ref: '#/definitions/Device'
      data:
        min: 3
        max: 6
        submin: 1
        submax: 4
  address:
    type: object
    description: An object of the Address
    schema:
      $ref: '#/definitions/Address'
  children:
    type: array
    description: "An array of Children objects"
    items:
      $ref: '#/definitions/Children'
      data:
        min: 0
        max: 5
  visited_places:
    type: array
    description: "Array of objects with arrays"
    items:
      $ref: '#/definitions/Visited_places'
      data:
        min: 1
        max: 4

definitions:
  Device:
    type: string
    description: "string AA-001 with zipf step distribution"
    data:
      build: "return chance.weighted(['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG', 'HH', 'II', 'JJ', 'KK', 'LL', 'MM', 'NN', 'OO', 'PP', 'QQ', 'RR', 'SS', 'TT', 'UU', 'VV', 'WW', 'XX', 'YY', 'ZZ'], [1, 0.5, 0.333, 0.25, 0.2, 0.167, 0.143, 0.125, 0.111, 0.1, 0.091, 0.083, 0.077, 0.071, 0.067, 0.063, 0.059, 0.056, 0.053, 0.050, 0.048, 0.045, 0.043, 0.042, 0.04, 0.038]).concat('-').concat(chance.string({length: 3, pool: '0123456789'}));"
  Address:
    type: object
    properties:
      street:
        type: string
        description: The address 1
        data:
          build: "return faker.address.streetAddress() + ' ' + faker.address.streetSuffix();"
      city:
        type: string
        description: The locality
        data:
          build: "return faker.address.city();"
      zip:
        type: string
        description: The zip code / postal code
        data:
          build: "return faker.address.zipCode();"
      country:
        type: string
        description: The country
        data:
          build: "return faker.address.country();"
      prev_address:
        type: object
        description: An object of the Address
        schema:
          $ref: '#/definitions/Previous_address'
  Previous_address:
    type: object
    properties:
      street:
        type: string
        description: The address 1
        data:
          build: "return faker.address.streetAddress() + ' ' + faker.address.streetSuffix();"
      city:
        type: string
        description: The locality
        data:
          build: "return faker.address.city();"
      zip:
        type: string
        description: The zip code / postal code
        data:
          build: "return faker.address.zipCode();"
      country:
        type: string
        description: The country
        data:
          build: "return faker.address.country();"
      property_current_owner:
        type: object
        description: "owner object"
        schema:
          $ref: '#/definitions/Property_owner'
  Children:
    type: object
    properties:
      first_name:
        type: string
        description: "first name - string"
        data:
          fake: "{{name.firstName}}"
      gender:
        type: string
        description: "gender M or F"
        data:
          build: "return chance.bool({likelihood: 50})? faker.random.arrayElement(['M', 'F']) : null;"
      age:
        type: integer
        description: "age - 1 to 17"
        data:
          build: "return chance.integer({min: 1, max: 17})"
  Visited_cities:
    type: string
    description: "city"
    data:
      build: "return faker.address.city();"
  Visited_places:
    type: object
    properties:
      country:
        type: string
        data:
          build: "return faker.address.country();"
      cities:
        type: array
        description: "Array of strings - city"
        items:
          $ref: '#/definitions/Visited_cities'
          data:
            min: 1
            max: 5
  Property_owner:
    type: object
    properties:
      first_name:
        type: string
        description: "First name - string, linked to url as the personal page"
        data:
          fake: "{{name.firstName}}"
      middle_name:
        type: string
        description: "Middle name - string"
        data:
          build: "return chance.bool() ? chance.name({middle: true}).split(' ')[1] : null;"
      last_name:
        type: string
        description: "Last name - string"
        data:
          fake: "{{name.lastName}}"
      phone:
        type: string
        description: "phone"
        data:
          build: "return chance.phone();"
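
The Device definition above draws a two-letter prefix with weights 1, 0.5, 0.333, and so on, which is approximately 1/rank, and then appends three random digits. The Python sketch below reproduces that idea with the standard library instead of the chance library used by fakeit; the function name `make_device`, the seed, and the sample size are arbitrary choices for illustration.

```python
import random
import string

# 'AA', 'BB', ... 'ZZ', weighted by 1/rank so early prefixes dominate.
PREFIXES = [letter * 2 for letter in string.ascii_uppercase]
WEIGHTS = [1 / rank for rank in range(1, len(PREFIXES) + 1)]


def make_device(rng):
    """Return a device id such as 'AA-038' with a skewed prefix distribution."""
    prefix = rng.choices(PREFIXES, weights=WEIGHTS, k=1)[0]
    digits = "".join(rng.choice(string.digits) for _ in range(3))
    return f"{prefix}-{digits}"


rng = random.Random(7)  # Fixed seed so the sample is repeatable.
devices = [make_device(rng) for _ in range(2000)]
aa_count = sum(d.startswith("AA") for d in devices)
zz_count = sum(d.startswith("ZZ") for d in devices)
```

The skew matters for the ArrayScan operation: a predicate on a common prefix such as AA matches far more documents than one on ZZ.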

Keshav Murthy

Comments

헤이파라데이, February 5, 2019 at 5:56 am

Is there a YAML file for orders, to generate the orders data set?

1. 3bst0r, August 26, 2021 at 8:32 am
  
  I am looking for this too. The YAML in the appendix is missing the "order_list" key.

3bst0r, July 14, 2021 at 9:03 am

Hi, nice work! Could you provide more detailed instructions on how to get to the implementation mentioned here? I just checked out the master branch from https://github.com/couchbaselabs/YCSB and I cannot seem to find the workloads or the implementation of the new operations mentioned here.

Keshav Murthy, July 14, 2021 at 9:12 am

Please see the follow-up article for details: https://www.couchbase.com/ycsb-json-implementation-for-couchbase-and-mongodb/

1. 3bst0r, July 21, 2021 at 12:49 am
  
  Great, thank you!

알플라히, August 9, 2021 at 2:34 pm

Thank you.
I have a question: how do I create a new workload based on new requirements? An example would be appreciated.