Monitoring Couchbase Cluster Environment

Overview

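A few years ago I wrote a blog post reflecting how many of our customers monitor their Couchbase clusters (https://www.couchbase.com/blog/monitoring-couchbase-cluster). It is dated, but the information is still relevant, so I plan to update the original. The statistics and metrics described in that general overview are often used in monitoring frameworks to gauge the health and performance of Couchbase for a specific use case. Since then, we have frequently met people asking for implementation examples that use these statistics. Typically, what a development team really needs is not an extensive monitoring framework but a simple script it can put to use. Cross-functional teams are often in a 'war room' setting, diagnosing a problem, and want a simple way to review recent events in the cluster.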

The goal here is not to provide a comprehensive model but to describe a simple way to consume Couchbase metrics through the REST API and command-line tools. Once you have something in place that digests these metrics on a regular basis, you need to store them somewhere. In this walkthrough we store them in a Couchbase bucket and use N1QL to understand what is going on in a particular cluster. There are other ways to visualize this kind of information (https://www.couchbase.com/blog/2016/march/http-packages.couchbase.com-releases-4.5.0-dp1-couchbase-server-enterprise_4.5.0-dp1-windows_amd64.exe.md5), but this example focuses on Couchbase monitoring and the related N1QL queries.

Digesting the Information

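I wanted to provide a relevant, useful example that teams can understand and use. Python is familiar to many people and easy to deploy on a wide range of systems. You could of course apply the same approach in shell or almost any other language, but Python seemed a natural fit for a general discussion.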

The full code can be found here ... pythonlab/MonitorStats.py ... but let's walk through the different ways of collecting information from Couchbase and a few of the details.

The first thing to set is where the local Couchbase binaries can be found, which is given by binPath.

# binPath = r"C:\Program Files\Couchbase\Server\bin"
binPath = "/Applications/couchbase-server-enterprise_4/Couchbase Server.app/Contents/Resources/couchbase-core/bin"
# binPath = "/opt/couchbase/bin"
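
If the same script is copied to hosts with different operating systems, editing binPath by hand on each one gets tedious. A minimal sketch of one alternative is to probe a list of candidate directories and take the first that exists. The helper name find_bin_path and the candidate list are hypothetical, and the Windows entry is an assumed default install location rather than something the original script confirms.

```python
import os

# Candidate install locations. The macOS and Linux paths come from the
# article; the Windows path is an assumed default and may differ on your host.
CANDIDATE_BIN_PATHS = [
    r"C:\Program Files\Couchbase\Server\bin",
    "/Applications/couchbase-server-enterprise_4/Couchbase Server.app"
    "/Contents/Resources/couchbase-core/bin",
    "/opt/couchbase/bin",
]


def find_bin_path(candidates=CANDIDATE_BIN_PATHS, exists=os.path.isdir):
    """Return the first candidate directory that exists, or None."""
    for path in candidates:
        if exists(path):
            return path
    return None


# Usage (hypothetical): binPath = find_bin_path() or "/opt/couchbase/bin"
```

The exists parameter is only there so the lookup can be exercised without touching the real filesystem.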

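The admin UI is a great Couchbase feature and is installed by default on every node of a cluster, but it is designed to give a view of what is happening "right now". The cluster aggregates statistics over time, so the UI does not offer the fine-grained detail needed to review "what happened" in the cluster. To make use of the statistics the cluster generates in real time, you therefore need somewhere to store the results. There may be other options, but here we use Couchbase itself to store the historical statistics. The cluster used for storage is controlled by seedNode and seedBucket (the Couchbase SDK needs only one node and a bucket name to establish connections to every node in the cluster).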

seedNode = "192.168.61.101"
seedBucket = "testload"
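
For readers who have not used the SDK, here is a minimal sketch of how those two values could turn into a connection. It assumes the Couchbase Python SDK 2.x (the Bucket class and couchbase:// connection strings). The helper names connection_string and connect are hypothetical, and newer server or SDK versions may also require credentials.

```python
def connection_string(seed_node, seed_bucket):
    """Build an SDK 2.x style connection string from one node and a bucket."""
    return "couchbase://" + seed_node + "/" + seed_bucket


def connect(seed_node, seed_bucket):
    """Open the bucket that will hold the historical statistics."""
    # Imported lazily so connection_string() works without the SDK installed.
    # Assumes Couchbase Python SDK 2.x.
    from couchbase.bucket import Bucket
    return Bucket(connection_string(seed_node, seed_bucket))


print(connection_string("192.168.61.101", "testload"))
# couchbase://192.168.61.101/testload
```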

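The script runs locally on each node of the cluster and captures a statistics profile of that node's health and its view of the cluster. As a result, the cluster being monitored is defined as "localhost", and the Couchbase REST API is used to resolve the name of the local host.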

clusterNode = "localhost"

Everything is based on understanding which node the script is running on and how many nodes make up the cluster. Here we capture that information to drive the rest of the monitoring script.

import requests

# Port 8091 serves the REST API over plain HTTP, so the scheme is http.
rtn = requests.get('http://' + clusterNode + ':8091/pools/default', auth=('Administrator', 'password'))
if rtn.status_code != 200:
    # This means something went wrong.
    print("oh crap " + str(rtn.status_code))
z = rtn.json()
numNodes = len(z['nodes'])
for i in range(numNodes):
    # Only the node that answered the request carries thisNode == true.
    if z['nodes'][i].get('thisNode'):
        # otpNode looks like "ns_1@192.168.61.101"; keep the host part.
        thisNode = z['nodes'][i]['otpNode'].split("@")[1]
        #print("this node " + str(thisNode))
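
The lookup is easier to test when it is separated from the HTTP call. The sketch below takes the decoded /pools/default payload and returns the node count and the local host name. The function name find_this_node and the trimmed sample payload are made up for illustration; a real response carries many more fields.

```python
def find_this_node(pools_default):
    """Return (node_count, host) for the node that served the payload.

    pools_default is the decoded JSON from GET /pools/default.
    host is None if no node is flagged with thisNode.
    """
    nodes = pools_default.get("nodes", [])
    for node in nodes:
        # Only the responding node carries thisNode == true.
        if node.get("thisNode"):
            # otpNode looks like "ns_1@192.168.61.101"; keep the host part.
            return len(nodes), node["otpNode"].split("@", 1)[1]
    return len(nodes), None


# Trimmed, made-up payload for illustration only.
sample = {
    "nodes": [
        {"otpNode": "ns_1@192.168.61.101", "status": "healthy", "thisNode": True},
        {"otpNode": "ns_1@192.168.61.102", "status": "healthy"},
    ]
}
print(find_this_node(sample))  # (2, '192.168.61.101')
```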

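In the rest of the script, specific statistics can now be consumed in a few different ways: from the Couchbase utility cbstats or directly from the REST API. Here we use cbstats to get the node's memory usage and the Python REST client (requests) to fetch the disk drain queue, which gauges how the cluster is persisting data to disk. These are some of the most important statistics to monitor in a Couchbase cluster.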

import subprocess
import requests

# clusterBucket is the name of the bucket being monitored; it is not defined in this excerpt.
# subprocess.getoutput replaces the Python 2-only commands.getoutput.
# binPath is quoted because the macOS path contains a space.
memUsed = int(subprocess.getoutput('"' + binPath + '/cbstats" ' + str(thisNode) + ':11210 -b ' + clusterBucket + ' all -j | jq -r .mem_used'))

# diskDrain = subprocess.getoutput("curl -s -u Administrator:password http://" + seedNode + ":8091/pools/default/buckets/testload/stats | jq .op.samples.ep_diskqueue_drain[59]")
# Drain Queue via REST API
resp = requests.get('http://' + str(thisNode) + ':8091/pools/default/buckets/' + clusterBucket + '/stats', auth=('Administrator', 'password'))
if resp.status_code != 200:
    # This means something went wrong.
    print("oh crap " + str(resp.status_code))
a = resp.json()
# Index 0 is the first sample in the returned window.
diskDrain = int(a['op']['samples']['ep_diskqueue_drain'][0])
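
The stats endpoint returns a list of samples per statistic, so a single index is only one way to read it. As a rough sketch, the helpers below pull the last sample and the window average from a decoded stats payload. It assumes samples are ordered oldest to newest, which the commented-out [59] index above suggests but which you should verify against your own cluster. The function names and sample values are hypothetical.

```python
def latest_sample(stats, name):
    """Return the last value of one stat, or None if it is missing.

    stats is the decoded JSON from
    GET /pools/default/buckets/<bucket>/stats.
    Assumes samples are ordered oldest to newest.
    """
    samples = stats.get("op", {}).get("samples", {}).get(name) or []
    return samples[-1] if samples else None


def average_sample(stats, name):
    """Return the mean of one stat over the returned window, or None."""
    samples = stats.get("op", {}).get("samples", {}).get(name) or []
    return sum(samples) / float(len(samples)) if samples else None


# Made-up payload for illustration only.
sample_stats = {"op": {"samples": {"ep_diskqueue_drain": [0, 12, 6]}}}
print(latest_sample(sample_stats, "ep_diskqueue_drain"))   # 6
print(average_sample(sample_stats, "ep_diskqueue_drain"))  # 6.0
```

Averaging over the window smooths out a single quiet second, which can matter when the script only runs once a minute.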

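The actual script collects additional information, but ultimately everything is captured as a JSON document and stored with a TTL of about 30 days (the 2505600 seconds used below is exactly 29 days).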

json_str = {
    'type': "stats",
    'flush': flushFail,
    'drain': diskDrain,
    'OOM': tempOOM,
    'miss': cacheMiss,
    'memory': memUsed,
    'operations': opsPer,
    'nodes': nodeHealth
}

cb.upsert(str(thisNode) + "::" + nowStamp, json_str, ttl=2505600)
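
The excerpt does not show how nowStamp is built. One possible sketch, assuming a sortable UTC timestamp is wanted so that keys for a node order by time, is shown below. The helper name make_key and the timestamp format are assumptions, not the original script's choice. The arithmetic also shows why 2505600 is a safe TTL: Couchbase treats expiry values above 30 days (2592000 seconds) as absolute Unix timestamps rather than relative offsets.

```python
import time

THIRTY_DAYS = 30 * 24 * 60 * 60  # 2592000 seconds
SCRIPT_TTL = 2505600             # value used in the upsert call


def make_key(node, now=None):
    """Build '<node>::<UTC timestamp>' so keys sort by time within a node."""
    now = time.time() if now is None else now
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(now))
    return node + "::" + stamp


print(make_key("192.168.61.101", now=0))  # 192.168.61.101::1970-01-01T00:00:00
print(SCRIPT_TTL / 86400.0)               # 29.0
print(SCRIPT_TTL < THIRTY_DAYS)           # True
```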

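With the monitoring data in Couchbase, you can use the N1QL query language to look for anomalies. The data follows a time-series model keyed on a timestamp, so you can query what was happening in the cluster, and the key (meta().id) tells you the period of interest. You may need to be smart about the indexes required to support your analysis, but N1QL, the Couchbase query language, makes this very easy.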

SELECT meta().id, operations FROM `testload` WHERE OOM = 0
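
Because the keys have the form node::timestamp, a range on meta().id can narrow a query to one node and one time window. The sketch below builds such a statement with positional parameters. It assumes the timestamp portion of the key sorts lexicographically (for example an ISO-style string), that the bucket name comes from trusted configuration, and that a primary index or a suitable secondary index exists on the bucket. The function name window_query is hypothetical.

```python
def window_query(bucket, node, start, end):
    """Return (statement, params) selecting one node's stats in a time window.

    start and end are timestamp strings in the same format as the key suffix.
    """
    statement = (
        "SELECT META().id, memory, drain, OOM "
        "FROM `" + bucket + "` "
        "WHERE type = 'stats' AND META().id BETWEEN $1 AND $2"
    )
    params = [node + "::" + start, node + "::" + end]
    return statement, params


stmt, params = window_query(
    "testload", "192.168.61.101", "2016-03-01T00:00:00", "2016-03-02T00:00:00"
)
print(stmt)
print(params)

# With the Python SDK 2.x (an assumption), the pair could be run as:
#   from couchbase.n1ql import N1QLQuery
#   for row in cb.n1ql_query(N1QLQuery(stmt, *params)):
#       print(row)
```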

Conclusion:

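This is not necessarily something to deploy to production, but it should offer some guidance on how to monitor a Couchbase environment. A great deal is available through the REST interface and cbstats, and all of it can be consumed and monitored in a similar way.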

Justin Michaels, Solutions Engineer, Couchbase
