Analyze Donald Trump Tweets with Couchbase and N1QL

AWS Serverless Lambda Scheduled Events to Store Tweets in Couchbase explained how to store tweets in Couchbase using AWS Serverless Lambda. That Lambda function has now been running for a few days and has collected 269 tweets from @realDonaldTrump. This blog, inspired by SQL on Twitter: Analysis Made Easy Using N1QL, shows how to analyze those tweets using N1QL.

N1QL is Couchbase's SQL-like query language and operates on JSON documents. N1QL and SQL differences explains how the two differ. Let's use N1QL to uncover some interesting information in @realDonaldTrump's tweets. Thanks to Sitaram from the N1QL team for helping hack the queries.

Number of Tweets

The first query finds out how many tweets are in the database. The query is very simple. Query:

SELECT COUNT(*) tweet_count 
FROM twitter;

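As you can see, the syntax is very similar to SQL. The SELECT and FROM clauses and the COUNT function are already familiar from SQL. tweet_count is an alias defined for the returned result. twitter is the bucket where all the JSON documents are stored. Result: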

[
  {
    "tweet_count": 269
  }
]

The result is also a JSON document.

Sample JSON Document for a Tweet

To write queries against a JSON document, you need to know its structure. The following query shows it. Query:

SELECT * 
FROM twitter 
LIMIT 1;

The new clause introduced here is LIMIT. It restricts the number of objects returned in the result set of a SELECT. Result:

[
  {
    "twitter": {
      "accessLevel": "0",
      "contributors": [],
      "createdAt": "1480828438000",
      "currentUserRetweetId": "-1",
      "displayTextRangeEnd": "-1",
      "displayTextRangeStart": "-1",
      "favoriteCount": "116356",
      "favorited": false,
      "geoLocation": null,
      "hashtagEntities": [],
      "id": "805278955150471168",
      "inReplyToScreenName": null,
      "inReplyToStatusId": "-1",
      "inReplyToUserId": "-1",
      "lang": "en",
      "mediaEntities": [],
      "place": null,
      "possiblySensitive": false,
      "quotedStatus": null,
      "quotedStatusId": "-1",
      "rateLimitStatus": null,
      "retweet": false,
      "retweetCount": "28330",
      "retweeted": false,
      "retweetedByMe": false,
      "retweetedStatus": null,
      "scopes": null,
      "source": "<a href=\"https://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
      "symbolEntities": [],
      "text": "Just tried watching Saturday Night Live - unwatchable! Totally biased, not funny and the Baldwin impersonation just can't get any worse. Sad",
      "truncated": false,
      "urlentities": [],
      "user": {
        "accessLevel": "0",
        "biggerProfileImageURL": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_bigger.jpg",
        "biggerProfileImageURLHttps": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_bigger.jpg",
        "contributorsEnabled": false,
        "createdAt": "1237383998000",
        "defaultProfile": false,
        "defaultProfileImage": false,
        "description": "President-elect of the United States",
        "descriptionURLEntities": [],
        "email": null,
        "favouritesCount": "46",
        "followRequestSent": false,
        "followersCount": "19294404",
        "friendsCount": "42",
        "geoEnabled": true,
        "id": "25073877",
        "lang": "en",
        "listedCount": "52499",
        "location": "New York, NY",
        "miniProfileImageURL": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_mini.jpg",
        "miniProfileImageURLHttps": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_mini.jpg",
        "name": "Donald J. Trump",
        "originalProfileImageURL": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2.jpg",
        "originalProfileImageURLHttps": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2.jpg",
        "profileBackgroundColor": "6D5C18",
        "profileBackgroundImageURL": "https://pbs.twimg.com/profile_background_images/530021613/trump_scotland__43_of_70_cc.jpg",
        "profileBackgroundImageUrlHttps": "https://pbs.twimg.com/profile_background_images/530021613/trump_scotland__43_of_70_cc.jpg",
        "profileBackgroundTiled": true,
        "profileBannerIPadRetinaURL": "https://pbs.twimg.com/profile_banners/25073877/1479776952/ipad_retina",
        "profileBannerIPadURL": "https://pbs.twimg.com/profile_banners/25073877/1479776952/ipad",
        "profileBannerMobileRetinaURL": "https://pbs.twimg.com/profile_banners/25073877/1479776952/mobile_retina",
        "profileBannerMobileURL": "https://pbs.twimg.com/profile_banners/25073877/1479776952/mobile",
        "profileBannerRetinaURL": "https://pbs.twimg.com/profile_banners/25073877/1479776952/web_retina",
        "profileBannerURL": "https://pbs.twimg.com/profile_banners/25073877/1479776952/web",
        "profileImageURL": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_normal.jpg",
        "profileImageURLHttps": "https://pbs.twimg.com/profile_images/1980294624/DJT_Headshot_V2_normal.jpg",
        "profileLinkColor": "0D5B73",
        "profileSidebarBorderColor": "BDDCAD",
        "profileSidebarFillColor": "C5CEC0",
        "profileTextColor": "333333",
        "profileUseBackgroundImage": true,
        "protected": false,
        "rateLimitStatus": null,
        "screenName": "realDonaldTrump",
        "showAllInlineMedia": false,
        "status": null,
        "statusesCount": "34269",
        "timeZone": "Eastern Time (US & Canada)",
        "translator": false,
        "url": "https://t.co/mZB2hymxC9",
        "urlentity": {
          "displayURL": "https://t.co/mZB2hymxC9",
          "end": "23",
          "expandedURL": "https://t.co/mZB2hymxC9",
          "start": "0",
          "text": "https://t.co/mZB2hymxC9",
          "url": "https://t.co/mZB2hymxC9"
        },
        "utcOffset": "-18000",
        "verified": true,
        "withheldInCountries": null
      },
      "userMentionEntities": [],
      "withheldInCountries": null
    }
  }
]

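Top 5 Days by Number of Tweets

With the basic queries out of the way, let's look at some more interesting data. What are the top 5 days on which @realDonaldTrump tweeted the most, and how many tweets were posted on each? Query: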

SELECT SUBSTR(MILLIS_TO_STR(TO_NUM(createdAt)), 0, 10) tweet_date, 
       COUNT(1) tweet_count
FROM   twitter 
GROUP  BY SUBSTR(MILLIS_TO_STR(TO_NUM(createdAt)), 0, 10) 
ORDER  BY COUNT(1) DESC 
LIMIT  5;

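The usual SQL GROUP BY and ORDER BY clauses perform the same function here. N1QL functions are applied to the value of the createdAt field, which holds a number stored as a string. The TO_NUM function converts the string to a number. The MILLIS_TO_STR function converts that number of epoch milliseconds to a date string. Finally, the SUBSTR function extracts the relevant part of the date (here the first 10 characters, YYYY-MM-DD). Result: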

[
  {
    "tweet_count": 13,
    "tweet_date": "2017-01-17"
  },
  {
    "tweet_count": 12,
    "tweet_date": "2017-01-06"
  },
  {
    "tweet_count": 11,
    "tweet_date": "2016-12-04"
  },
  {
    "tweet_count": 10,
    "tweet_date": "2017-01-03"
  },
  {
    "tweet_count": 10,
    "tweet_date": "2017-01-04"
  }
]

January 17, 2017 is the day with the most tweets. Of course, this result is limited to the data in the JSON documents stored in the database. Does anyone have a more comprehensive database of @realDonaldTrump tweets?
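
The TO_NUM, MILLIS_TO_STR, and SUBSTR chain used in the query can be mirrored outside the database. The following is a minimal Python sketch, not the N1QL implementation: the helper names (tweet_date, tweet_hour, top_days) are hypothetical, and it assumes the timestamp is formatted in UTC, whereas the query service may format dates in its own time zone.

```python
from collections import Counter
from datetime import datetime, timezone


def _to_iso(created_at: str) -> str:
    """TO_NUM then MILLIS_TO_STR: string of epoch millis -> ISO date string (UTC assumed)."""
    millis = int(created_at)
    stamp = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return stamp.isoformat()


def tweet_date(created_at: str) -> str:
    """SUBSTR(..., 0, 10): keep the YYYY-MM-DD part."""
    return _to_iso(created_at)[0:10]


def tweet_hour(created_at: str) -> str:
    """SUBSTR(..., 11, 2): keep the two-digit hour."""
    return _to_iso(created_at)[11:13]


def top_days(created_at_values, limit=5):
    """GROUP BY tweet_date, ORDER BY COUNT DESC, LIMIT n."""
    counts = Counter(tweet_date(value) for value in created_at_values)
    return counts.most_common(limit)


# createdAt value taken from the sample document shown earlier
print(tweet_date("1480828438000"))  # 2016-12-04
```

Passing every createdAt value in the bucket to top_days would reproduce the grouping, subject to the time zone assumption above.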

Frequency of Tweets

OK, according to the database the maximum number of tweets in a single day is 13. On how many days did @realDonaldTrump tweet a given number of times? Query:

SELECT a.tweet_count, count(1) days FROM (
SELECT SUBSTR(millis_to_str(to_num(createdAt)), 0, 10) tweet_date, 
       COUNT(1) tweet_count
FROM   twitter 
GROUP  BY SUBSTR(millis_to_str(to_num(createdAt)), 0, 10)
) a
GROUP BY a.tweet_count
ORDER BY a.tweet_count DESC;

This is easily done using an N1QL nested query. Result:

[
  {
    "days": 1,
    "tweet_count": 13
  },
  {
    "days": 1,
    "tweet_count": 12
  },
  {
    "days": 1,
    "tweet_count": 11
  },
  {
    "days": 2,
    "tweet_count": 10
  },
  {
    "days": 1,
    "tweet_count": 9
  },
  {
    "days": 7,
    "tweet_count": 8
  },
  {
    "days": 3,
    "tweet_count": 7
  },
  {
    "days": 7,
    "tweet_count": 6
  },
  {
    "days": 5,
    "tweet_count": 5
  },
  {
    "days": 5,
    "tweet_count": 4
  },
  {
    "days": 11,
    "tweet_count": 3
  },
  {
    "days": 3,
    "tweet_count": 2
  },
  {
    "days": 1,
    "tweet_count": 1
  }
]

Over this period there is only one day with just a single tweet. The days values add up to 48, which suggests there was not a single day without a tweet :)
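
The nested query is a two-level count: first tweets per day, then days per tweet count. A rough Python sketch of that idea follows, with a hypothetical helper name (tweet_frequency); it also sums the result rows shown above as a consistency check against the 269 tweets counted earlier.

```python
from collections import Counter


def tweet_frequency(tweet_dates):
    """Two-level aggregation: date -> tweet_count, then tweet_count -> days."""
    per_day = Counter(tweet_dates)            # inner query: GROUP BY tweet_date
    frequency = Counter(per_day.values())     # outer query: GROUP BY tweet_count
    return sorted(frequency.items(), reverse=True)  # ORDER BY tweet_count DESC


# (tweet_count, days) pairs copied from the result above
result = [(13, 1), (12, 1), (11, 1), (10, 2), (9, 1), (8, 7), (7, 3),
          (6, 7), (5, 5), (4, 5), (3, 11), (2, 3), (1, 1)]

total_days = sum(days for _, days in result)
total_tweets = sum(count * days for count, days in result)
print(total_days, total_tweets)  # 48 269
```

The weighted sum matches the tweet_count returned by the first query, so the distribution accounts for every stored tweet.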

Most Common Hour of the Day for Tweets

@realDonaldTrump is known for tweeting at 3am. Let's see at what hour he tweets the most. Query:

SELECT SUBSTR(MILLIS_TO_STR(TO_NUM(createdAt)), 11, 2) tweet_hour, 
       COUNT(1) tweet_count
FROM   twitter 
GROUP  BY SUBSTR(MILLIS_TO_STR(TO_NUM(createdAt)), 11, 2) 
ORDER  BY tweet_count DESC 
LIMIT  5;

Result:

[
  {
    "tweet_count": 39,
    "tweet_hour": "13"
  },
  {
    "tweet_count": 27,
    "tweet_hour": "12"
  },
  {
    "tweet_count": 26,
    "tweet_hour": "11"
  },
  {
    "tweet_count": 20,
    "tweet_hour": "14"
  },
  {
    "tweet_count": 15,
    "tweet_hour": "00"
  }
]

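The controversial tweets may well be the ones posted at 3am. But the largest group, 39 tweets, is posted in the 1pm hour (Eastern Time), right after lunch and over dessert.

Most Common Day of the Week for Tweets

Let's find out on which day of the week he tweets the most. Query: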

SELECT DATE_PART_STR(MILLIS_TO_STR(TO_NUM(createdAt)), "day_of_week") day_of_week, 
       COUNT(1) tweet_count
FROM   twitter 
GROUP  BY DATE_PART_STR(MILLIS_TO_STR(TO_NUM(createdAt)), "day_of_week")
ORDER  BY tweet_count DESC;

DATE_PART_STR is a new function that returns a part of a date. The additional day_of_week argument extracts the day of the week as a number. Result:

[
  {
    "day_of_week": 2,
    "tweet_count": 49
  },
  {
    "day_of_week": 3,
    "tweet_count": 40
  },
  {
    "day_of_week": 0,
    "tweet_count": 40
  },
  {
    "day_of_week": 5,
    "tweet_count": 38
  },
  {
    "day_of_week": 4,
    "tweet_count": 36
  },
  {
    "day_of_week": 6,
    "tweet_count": 33
  },
  {
    "day_of_week": 1,
    "tweet_count": 33
  }
]

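Tuesday (2) appears to be the day with the most tweets, followed by Sunday (0) and Wednesday (3), which are tied. The count tends to drop off toward the weekend.

#22417 should make it possible to report the weekday part in English.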
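
Until weekday names are available from the query itself, the numeric value can be mapped to a name on the client side. This is a small Python sketch under two assumptions: 0 stands for Sunday, which is how the results above are read, and the timestamp is interpreted in UTC. The names DAY_NAMES, day_of_week, and day_name are hypothetical.

```python
from datetime import datetime, timezone

# Assumption: index 0 is Sunday, matching how the article reads the results
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def day_of_week(created_at: str) -> int:
    """Day of week for a createdAt string of epoch milliseconds (UTC assumed)."""
    stamp = datetime.fromtimestamp(int(created_at) / 1000, tz=timezone.utc)
    # Python's weekday() uses Monday=0; shift so that Sunday=0
    return (stamp.weekday() + 1) % 7


def day_name(index: int) -> str:
    """Translate a 0-6 day_of_week value to an English name."""
    return DAY_NAMES[index]


# createdAt value taken from the sample document shown earlier
print(day_name(day_of_week("1480828438000")))  # Sunday
```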

Top 5 User Mentions in Tweets

Query:

SELECT COUNT(1) user_count, ue.screenName 
    FROM twitter 
    UNNEST userMentionEntities ue 
    GROUP by ue.screenName 
    ORDER by user_count DESC
    LIMIT 5;

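userMentionEntities is a nested array in the JSON document. UNNEST conceptually performs a join of the nested array with its parent object. Each resulting joined object becomes an input to the query. Result: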

[
  {
    "screenName": "realDonaldTrump",
    "user_count": 11
  },
  {
    "screenName": "FoxNews",
    "user_count": 7
  },
  {
    "screenName": "CNN",
    "user_count": 6
  },
  {
    "screenName": "NBCNews",
    "user_count": 5
  },
  {
    "screenName": "DanScavino",
    "user_count": 5
  }
]

Needless to say, he mentions himself the most in his tweets! Next come his two favorite TV stations, FoxNews and CNN.
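
Conceptually, UNNEST pairs each tweet with every element of its userMentionEntities array, and tweets with an empty array drop out. The Python sketch below illustrates that flattening and the subsequent count; the function name top_mentions and the sample documents are hypothetical and only carry the fields needed here.

```python
from collections import Counter


def top_mentions(tweets, limit=5):
    """Flatten userMentionEntities (like UNNEST), then count by screenName."""
    # One (tweet, mention) pair per array element; empty arrays yield nothing
    unnested = [
        (tweet, mention)
        for tweet in tweets
        for mention in tweet.get("userMentionEntities", [])
    ]
    counts = Counter(mention["screenName"] for _, mention in unnested)
    return counts.most_common(limit)  # ORDER BY user_count DESC LIMIT n


# Hypothetical sample documents, not taken from the real bucket
sample = [
    {"userMentionEntities": [{"screenName": "FoxNews"}, {"screenName": "CNN"}]},
    {"userMentionEntities": []},
    {"userMentionEntities": [{"screenName": "FoxNews"}]},
]
print(top_mentions(sample))  # [('FoxNews', 2), ('CNN', 1)]
```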

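Top 5 Tweets with Most RTs

The Lambda function wakes up every 3 hours and fetches the latest tweets. So the database is a snapshot of the tweets and of associated information such as RTs and likes. Depending on when a tweet was archived, its RT and like counts may therefore not be accurate. Still, let's look at the most retweeted tweets based on this information. Query: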

SELECT retweetCount, text
FROM twitter
ORDER BY retweetCount
LIMIT 5;

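A very simple query. Note that retweetCount is stored as a string and the query has no DESC, so the rows below are ordered ascending by string comparison rather than by highest numeric count; ordering by TO_NUM(retweetCount) DESC would rank by the actual number of retweets. Result: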

[
  {
    "retweetCount": "10110",
    "text": "the American people. I have no doubt that we will, together, MAKE AMERICA GREAT AGAIN!"
  },
  {
    "retweetCount": "10140",
    "text": "Thank you to all of the men and women who protect & serve our communities 24/7/365! \n#LawEnforcementAppreciationDay… https://t.co/aqUbDipSgv"
  },
  {
    "retweetCount": "10370",
    "text": "We had a great News Conference at Trump Tower today. A couple of FAKE NEWS organizations were there but the people truly get what's going on"
  },
  {
    "retweetCount": "10414",
    "text": "these companies are able to move between all 50 states, with no tax or tariff being charged. Please be forewarned prior to making a very ..."
  },
  {
    "retweetCount": "10416",
    "text": "Somebody hacked the DNC but why did they not have \"hacking defense\" like the RNC has and why have they not responded to the terrible......"
  }
]
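
Because retweetCount is a string in these documents, sorting on it directly compares text, not numbers. The following Python sketch contrasts the two orderings; the function names are hypothetical, three of the counts are taken from documents shown in this post, and "9875" is a made-up value added to show the effect.

```python
def as_stored(rows, limit=5):
    """Ascending sort on the raw string, like ORDER BY retweetCount."""
    return sorted(rows, key=lambda row: row["retweetCount"])[:limit]


def by_number(rows, limit=5):
    """Descending sort on the numeric value, like ORDER BY TO_NUM(retweetCount) DESC."""
    return sorted(rows, key=lambda row: int(row["retweetCount"]), reverse=True)[:limit]


rows = [
    {"retweetCount": "28330"},   # from the sample document
    {"retweetCount": "10416"},   # from the result above
    {"retweetCount": "9875"},    # hypothetical value
    {"retweetCount": "10110"},   # from the result above
]

print([row["retweetCount"] for row in as_stored(rows)])
# ['10110', '10416', '28330', '9875']
print([row["retweetCount"] for row in by_number(rows)])
# ['28330', '10416', '10110', '9875']
```

In the string ordering "9875" sorts after "28330" because "9" is greater than "2" as a character, which is why a numeric conversion is needed to rank by retweets.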

Original vs. RT

How many tweets are original and how many are retweets? Query:

SELECT retweet, count(1) count
FROM twitter
GROUP BY retweet;

Result:

[
  {
    "count": 253,
    "retweet": false
  },
  {
    "count": 15,
    "retweet": true
  }
]

Most of the tweets are original, with very few retweets.

Most Common Words in Tweets

Query:

SELECT COUNT(1) count, word 
FROM twitter 
UNNEST SPLIT(text) word
GROUP BY word
ORDER BY count DESC;

This query uses the SPLIT function to break the tweet text into words. Result:

[
  {
    "count": 189,
    "word": "the"
  },
  {
    "count": 151,
    "word": "to"
  },
  {
    "count": 115,
    "word": "and"
  },

  . . .

  {
    "count": 1,
    "word": "presented...Trump's"
  },
  {
    "count": 1,
    "word": "jobs."
  },
  {
    "count": 1,
    "word": "Doing"
  }
]

Frequency of the Words "media", "fake", and "America" in Tweets

Query:

SELECT COUNT(1) count, LOWER(w) word
FROM twitter  
UNNEST SPLIT(text) w  
WHERE LOWER(w) IN [ "media", "fake", "america"] 
GROUP by LOWER(w) 
ORDER BY count DESC;

The LOWER function is used to compare words regardless of case. Result:

[
  {
    "count": 12,
    "word": "media"
  },
  {
    "count": 9,
    "word": "fake"
  },
  {
    "count": 8,
    "word": "america"
  }
]
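
The SPLIT, LOWER, and IN steps can be approximated in a few lines of Python. This is a sketch with a hypothetical function name (word_counts) and made-up sample texts. Like the query, it splits on whitespace only, so punctuation stays attached to words (for example "jobs." in the earlier result).

```python
from collections import Counter


def word_counts(texts, only=None):
    """Split on whitespace, lowercase, optionally filter, then count."""
    words = (word.lower() for text in texts for word in text.split())
    if only is not None:
        words = (word for word in words if word in only)   # WHERE LOWER(w) IN [...]
    return Counter(words).most_common()                    # ORDER BY count DESC


# Hypothetical sample texts, not real tweets
texts = [
    "FAKE NEWS media",
    "The media is fake",
    "dishonest Media in America",
]
print(word_counts(texts, only={"media", "fake", "america"}))
# [('media', 3), ('fake', 2), ('america', 1)]
```

Stripping punctuation before counting would merge tokens such as "jobs." and "jobs", at the cost of a slightly longer query or script.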

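The Lambda function continues to store tweets in the database.

Want to try these queries yourself?

Start Couchbase Server
Restore the data into Couchbase from the archive twitter-backups-2017-01-20-06-07-49.tar, following the restore instructions
Use Query Workbench to run the queries

N1QL References

Arun Gupta, Vice President of Developer Advocacy, Couchbase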
