Ensuring High Availability with Automatic Failover for App Services

In modern applications, continuous connectivity is essential, especially for mobile apps that depend on back-end services. In this blog post, we present a Python-based solution that monitors the health of your App Services servers and automatically fails over to a secondary server when necessary. The sample code uses HTTP health checks and WebSocket connection endpoints so that your application connects to a healthy service whenever one is available.

Overview

The solution involves two types of endpoints:

1. Health check URLs
  - These endpoints (for example, https://.../_ping) are polled using HTTP HEAD requests.
  - They determine whether the App Services server is healthy.
2. Connection endpoints
  - These are the WebSocket URLs (for example, wss://.../primary) that your application uses to interact with the back end.
  - The active connection endpoint is updated based on the health check results.

If the health check for the primary server fails more than nine times in a row (the threshold used in the sample code), the failover logic switches the application's connection to the secondary server, provided the secondary passes its own health check.
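The failover decision described above can be isolated from networking and threads, which makes it easier to reason about and test. The following is a minimal sketch, not the article's implementation: `FailoverState`, `standby_of`, `apply_check`, and the small `FAILURE_THRESHOLD` value are hypothetical names and settings chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold for this sketch. A failover is attempted once the
# number of consecutive failures exceeds this value.
FAILURE_THRESHOLD = 2


@dataclass(frozen=True)
class FailoverState:
    active: str = "primary"
    consecutive_failures: int = 0


def standby_of(active):
    """Return the name of the server that is not currently active."""
    return "secondary" if active == "primary" else "primary"


def apply_check(state, active_healthy, standby_healthy=True):
    """Apply one health-check result and return the resulting state."""
    if active_healthy:
        # A single success resets the failure streak.
        return FailoverState(state.active, 0)
    failures = state.consecutive_failures + 1
    if failures <= FAILURE_THRESHOLD:
        return FailoverState(state.active, failures)
    # Threshold exceeded: switch only if the standby is healthy.
    # The counter is reset whether or not the switch happens.
    new_active = standby_of(state.active) if standby_healthy else state.active
    return FailoverState(new_active, 0)


state = FailoverState()
for healthy in [False, False, True, False, False, False]:
    state = apply_check(state, healthy)
    print(state)
```

In this run the single successful check resets the streak, so the switch to the secondary happens only after three uninterrupted failures at the end of the sequence.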

The Code in Detail

Below is the complete code with inline comments and detailed explanations:

import logging
import threading
import requests
from time import sleep

# Configure logging to show time-stamped messages at INFO level.
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s: %(message)s')

# --------------------------------------
# Health Check URLs (App Service Servers)
# --------------------------------------
# These URLs are used for health checking the servers by sending HEAD requests.
health_check_urls = {
    "primary": "https://XXXXXXXXXXXXXX.apps.cloud.couchbase.com:4984/_ping",
    "secondary": "https://XXXXXXXXXXXXXX.apps.cloud.couchbase.com:4984/_ping"
}

# -------------------------------------
# Connection Endpoints (WebSocket URLs)
# -------------------------------------
# These endpoints are what your application actually uses for connections.
connection_urls = {
    "primary": "wss://XXXXXXXXXXXXXX.apps.cloud.couchbase.com:4984/primary",
    "secondary": "wss://XXXXXXXXXXXXXX.apps.cloud.couchbase.com:4984/secondary"
}

# The variable `active_cluster` tracks which server is currently active.
active_cluster = "primary"

# This variable holds the actual WebSocket URL used by your application.
active_connection_url = connection_urls[active_cluster]

def is_cluster_healthy(url):
    """
    Perform a health check using a HEAD request against the provided URL.
    Returns True if the response status is 200; otherwise, returns False.
    Logs the status code and headers for troubleshooting.
    """
    try:
        response = requests.head(url, timeout=5)
        logging.info(f"Health Check Response for {url}")
        logging.info(f"  Status Code: {response.status_code}")
        logging.info("  Headers:")
        for header, value in response.headers.items():
            logging.info(f"    {header}: {value}")

        if response.status_code == 200:
            logging.info(f"{url} is healthy!")
            return True
        else:
            logging.warning(f"{url} might be unhealthy or unreachable.")
            return False
    except requests.exceptions.RequestException as e:
        logging.error(f"Health check failed for {url}: {e}")
        return False

def health_check_worker():
    """
    A background worker that checks the health of the active server every 3 seconds.
    If the active server fails health checks for more than 9 consecutive times,
    the worker attempts to switch to the other server.
    """
    global active_cluster
    global active_connection_url

    consecutive_failures = 0

    while True:
        sleep(3)  # Wait 3 seconds between checks.

        # Use the HTTP health check endpoint for the active cluster.
        current_health_url = health_check_urls[active_cluster]
        logging.info(f"Health check: Checking {active_cluster} at {current_health_url}...")

        if is_cluster_healthy(current_health_url):
            consecutive_failures = 0  # Reset counter if healthy.
        else:
            consecutive_failures += 1
            logging.warning(f"{active_cluster} health check failed {consecutive_failures} time(s).")

            # If failures exceed 9 consecutive attempts, try to fail over.
            if consecutive_failures > 9:
                logging.error(f"{active_cluster} is considered down. Attempting to fail over...")

                # Determine the new active cluster.
                new_cluster = "secondary" if active_cluster == "primary" else "primary"
                new_health_url = health_check_urls[new_cluster]

                # Check if the new cluster is healthy.
                if is_cluster_healthy(new_health_url):
                    active_cluster = new_cluster
                    # Update the connection endpoint.
                    active_connection_url = connection_urls[new_cluster]
                    logging.warning(f"Switched active cluster to {new_cluster}.")
                    logging.warning(f"New WebSocket connection endpoint: {active_connection_url}")
                else:
                    logging.critical("Both clusters appear to be down!")

                consecutive_failures = 0  # Reset the failure counter after the attempt.

def main():
    """
    Main function to start the health-check worker in a background thread.
    Keeps the script running indefinitely until interrupted.
    """
    thread = threading.Thread(target=health_check_worker, daemon=True)
    thread.start()

    logging.info("Health check worker started. Press Ctrl+C to exit.")
    logging.info(f"Application will initially connect to: {active_connection_url}")

    try:
        while True:
            sleep(1)  # Main thread remains alive.
    except KeyboardInterrupt:
        logging.info("Shutting down health check script.")

if __name__ == "__main__":
    main()
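The script above keeps the active endpoint in module-level globals that the worker thread rewrites. One way an application could consume that value from other threads is a small lock-protected holder that also notifies listeners when the endpoint changes, so a client can reconnect. This is a sketch under that assumption: the `ActiveEndpoint` class, its methods, and the `example.invalid` URLs are hypothetical and not part of the original script.

```python
import threading


class ActiveEndpoint:
    """Hypothetical thread-safe holder for the current WebSocket URL."""

    def __init__(self, url):
        self._lock = threading.Lock()
        self._url = url
        self._listeners = []

    def get(self):
        with self._lock:
            return self._url

    def subscribe(self, callback):
        """Register a callable that receives the new URL after each change."""
        with self._lock:
            self._listeners.append(callback)

    def set(self, url):
        """Store a new URL; return True and notify listeners only if it changed."""
        with self._lock:
            if url == self._url:
                return False
            self._url = url
            listeners = list(self._listeners)
        # Call listeners outside the lock so a slow callback cannot block readers.
        for callback in listeners:
            callback(url)
        return True


endpoint = ActiveEndpoint("wss://primary.example.invalid:4984/primary")
reconnects = []
endpoint.subscribe(reconnects.append)

endpoint.set("wss://secondary.example.invalid:4984/secondary")
endpoint.set("wss://secondary.example.invalid:4984/secondary")  # unchanged, no callback

print(endpoint.get())
print(len(reconnects))
```

In the health check worker, the assignment to `active_connection_url` could be replaced by a call to `endpoint.set(...)`, with the WebSocket client subscribing to reconnect on change.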

Key Technical Points

- Health checks on the App Services servers:
  The code separates the health check endpoints (used for monitoring) from the connection endpoints (used by your application). This lets you verify server health independently while maintaining a stable connection endpoint.
- HTTP HEAD requests:
  Using HEAD requests against /_ping minimizes data transfer while still returning status codes and headers for diagnostics.
- Background thread:
  The health_check_worker runs in its own daemon thread, allowing continuous health monitoring without blocking the application's main thread.
- Failover logic:
  - A counter (consecutive_failures) tracks consecutive failures.
  - If the count exceeds the defined threshold (9 failures), the script attempts a failover by checking the health of the alternate server.
  - After a successful health check on the alternate server, the active connection endpoint is updated.
- Logging:
  Detailed logging provides information on the health check process, including HTTP response status, headers, and failover events. This helps with troubleshooting and monitoring.
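The check interval, request timeout, and failure threshold together determine how long an outage lasts before a failover is attempted. The arithmetic below is a rough estimate using the values from the sample script (3-second interval, 5-second timeout, failover when the count exceeds 9). The `detection_window` helper is hypothetical, and it assumes each failed check takes between zero seconds and the full timeout; a slow response can take longer, because `requests` applies a single `timeout` value to the connect and read phases separately.

```python
def detection_window(interval_s, timeout_s, threshold):
    """Estimate the fastest and slowest time to reach a failover attempt."""
    # The script fails over when the count exceeds the threshold,
    # so one more failure than the threshold is needed.
    failures_needed = threshold + 1
    fastest = failures_needed * interval_s                 # checks fail immediately
    slowest = failures_needed * (interval_s + timeout_s)   # every check times out
    return fastest, slowest


fastest, slowest = detection_window(interval_s=3, timeout_s=5, threshold=9)
print(f"Failover is attempted roughly {fastest} to {slowest} seconds after the outage begins.")
```

With these settings the estimate is roughly 30 to 80 seconds, which may be worth tuning if your application needs faster recovery.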

Adapting to Your Application

- You can translate and adapt this code to your preferred programming language, such as Swift or Kotlin, to meet your application's needs.
- You can integrate this script or logic into your mobile code (iOS/Android) or into a back-end service that updates the active endpoint.
- If you are on iOS or Android, consider how often and where you run this code. For example, background tasks or push notifications can trigger health checks in a mobile context.
- If you have a microservice architecture, you can run this failover logic in a small service that exposes the current active URL to the mobile apps, so that they always connect to the correct WSS endpoint.
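The microservice option in the last item could look something like the sketch below, which serves the current active URL as JSON using only the Python standard library. Everything named here is an assumption for illustration: the `/active-endpoint` path, the `set_active_url`, `get_active_url`, `render_payload`, and `make_server` helpers, the port, and the `example.invalid` URLs. A real service would also need authentication and TLS.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical shared state. The health check worker would call
# set_active_url() whenever it fails over.
_state_lock = threading.Lock()
_state = {"url": "wss://primary.example.invalid:4984/primary"}


def set_active_url(url):
    with _state_lock:
        _state["url"] = url


def get_active_url():
    with _state_lock:
        return _state["url"]


def render_payload():
    """Return the JSON body that clients receive."""
    return json.dumps({"url": get_active_url()}).encode("utf-8")


class ActiveEndpointHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/active-endpoint":
            self.send_error(404)
            return
        body = render_payload()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8080):
    """Build the server; call serve_forever() on the result to start it."""
    return ThreadingHTTPServer((host, port), ActiveEndpointHandler)
```

Mobile clients would then fetch the URL before opening a WebSocket connection and again after a disconnect, rather than running health checks on the device.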

Conclusion

This sample code provides a simple yet effective mechanism for high availability: it automatically fails over to a backup server when the primary becomes unreachable. By separating health checks from connection endpoints, the application can direct its WebSocket connection to a server that has passed a health check.

In a production environment, you may need to adapt and extend the logic to fit your specific requirements, network conditions, and security policies.

Implementing this logic in your mobile app or back-end service can significantly improve uptime and resilience, helping your users stay connected even during unexpected service interruptions.

Author

Posted by Nishant Bhatia - Cloud Architect
