In modern applications, continuous connectivity is key, especially for mobile apps that rely on backend services. In this blog post, we'll walk through a Python-based solution that monitors the health of your app service servers and automatically fails over to a secondary server when needed. The sample code uses HTTP health checks and WebSocket connection endpoints to help ensure that your application always connects to a healthy service.
Overview
The solution involves two types of endpoints:
- Health check URLs
  - These endpoints (e.g., https://.../_ping) are polled using HTTP HEAD requests.
  - They determine whether the app service server is healthy.
- Connection endpoints
  - These are the WebSocket URLs (e.g., wss://.../primary) that your application uses to interact with the backend.
  - The active connection endpoint is updated based on the health check results.
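In the sample code, each cluster has two URLs that are expected to point at the same host and port: an HTTPS health-check URL and a WSS connection URL. Keeping them together and checking that they agree can catch a mistyped hostname before it silently breaks failover. The following is a minimal sketch; the `ClusterEndpoints` class and the example.com hostnames are hypothetical, and it assumes both URLs for a cluster share a host and port, as they do in the sample.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ClusterEndpoints:
    """Hypothetical pairing of one cluster's health-check URL and WebSocket URL."""
    name: str
    health_url: str      # polled with HEAD, e.g. https://<host>:4984/_ping
    connection_url: str  # used by the app, e.g. wss://<host>:4984/<database>

    def __post_init__(self):
        health = urlsplit(self.health_url)
        conn = urlsplit(self.connection_url)
        # Assumption: both URLs for one cluster live on the same host and port.
        if (health.hostname, health.port) != (conn.hostname, conn.port):
            raise ValueError(
                f"{self.name}: health URL {health.hostname}:{health.port} does not "
                f"match connection URL {conn.hostname}:{conn.port}"
            )


# Example with hypothetical hostnames; a typo in either hostname raises ValueError.
primary = ClusterEndpoints(
    name="primary",
    health_url="https://primary.example.com:4984/_ping",
    connection_url="wss://primary.example.com:4984/primary",
)
```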
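If the primary server's health check fails repeatedly in a row (more than nine consecutive times in the sample code), the failover logic switches the application's connection to the secondary server, provided the secondary passes its own health check.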
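The failover rule itself can be separated from the networking so it is easier to reason about and test. The sketch below mirrors the sample's behavior (fail over after more than 9 consecutive failures, probe the standby before switching, reset the counter after every attempt); the `FailoverTracker` name and its `record` method are hypothetical, not part of any library.

```python
class FailoverTracker:
    """Hypothetical helper: fail over after more than `threshold` consecutive
    failed health checks on the active cluster, as the sample code does."""

    def __init__(self, threshold=9, active="primary"):
        self.threshold = threshold
        self.active = active
        self.consecutive_failures = 0

    def record(self, active_healthy, standby_healthy):
        """Record one health-check result for the active cluster.

        `standby_healthy` is a zero-argument callable, so the standby is only
        probed when a failover is actually being considered. Returns the name
        of the cluster that should be active after this check.
        """
        if active_healthy:
            self.consecutive_failures = 0  # any success resets the streak
            return self.active

        self.consecutive_failures += 1
        if self.consecutive_failures > self.threshold:
            standby = "secondary" if self.active == "primary" else "primary"
            if standby_healthy():
                self.active = standby
            # Reset after every failover attempt, successful or not.
            self.consecutive_failures = 0
        return self.active
```

With the sample's 3-second polling interval and a threshold of 9, failover begins only after roughly 30 seconds of continuous failures. Lowering the threshold reacts faster, but makes it more likely that a brief network blip triggers a switch.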
The code in detail
Below is the complete code with inline comments and detailed explanations:
```python
import logging
import threading
import requests
from time import sleep

# Configure logging to show time-stamped messages at INFO level.
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s: %(message)s')

# Replace the XXXXXXXXXXXXXX placeholders with your primary and secondary hostnames.

# --------------------------------------
# Health Check URLs (App Service Servers)
# --------------------------------------
# These URLs are used for health checking the servers by sending HEAD requests.
health_check_urls = {
    "primary": "https://XXXXXXXXXXXXXX.apps.cloud.couchbase.com:4984/_ping",
    "secondary": "https://XXXXXXXXXXXXXX.apps.cloud.couchbase.com:4984/_ping"
}

# -------------------------------------
# Connection Endpoints (WebSocket URLs)
# -------------------------------------
# These endpoints are what your application actually uses for connections.
connection_urls = {
    "primary": "wss://XXXXXXXXXXXXXX.apps.cloud.couchbase.com:4984/primary",
    "secondary": "wss://XXXXXXXXXXXXXX.apps.cloud.couchbase.com:4984/secondary"
}

# The variable `active_cluster` tracks which server is currently active.
active_cluster = "primary"

# This variable holds the actual WebSocket URL used by your application.
active_connection_url = connection_urls[active_cluster]


def is_cluster_healthy(url):
    """
    Perform a health check using a HEAD request against the provided URL.
    Returns True if the response status is 200; otherwise, returns False.
    Logs the status code and headers for troubleshooting.
    """
    try:
        response = requests.head(url, timeout=5)
        logging.info(f"Health Check Response for {url}")
        logging.info(f"  Status Code: {response.status_code}")
        logging.info("  Headers:")
        for header, value in response.headers.items():
            logging.info(f"    {header}: {value}")

        if response.status_code == 200:
            logging.info(f"{url} is healthy!")
            return True
        else:
            logging.warning(f"{url} might be unhealthy or unreachable.")
            return False
    except requests.exceptions.RequestException as e:
        logging.error(f"Health check failed for {url}: {e}")
        return False


def health_check_worker():
    """
    A background worker that checks the health of the active server every 3 seconds.
    If the active server fails health checks more than 9 consecutive times,
    the worker attempts to switch to the other server.
    """
    global active_cluster
    global active_connection_url
    consecutive_failures = 0

    while True:
        sleep(3)  # Wait 3 seconds between checks.

        # Use the HTTP health check endpoint for the active cluster.
        current_health_url = health_check_urls[active_cluster]
        logging.info(f"Health check: Checking {active_cluster} at {current_health_url}...")

        if is_cluster_healthy(current_health_url):
            consecutive_failures = 0  # Reset counter if healthy.
        else:
            consecutive_failures += 1
            logging.warning(f"{active_cluster} health check failed {consecutive_failures} time(s).")

            # If failures exceed 9 consecutive attempts, try to fail over.
            if consecutive_failures > 9:
                logging.error(f"{active_cluster} is considered down. Attempting to fail over...")

                # Determine the new active cluster.
                new_cluster = "secondary" if active_cluster == "primary" else "primary"
                new_health_url = health_check_urls[new_cluster]

                # Check if the new cluster is healthy.
                if is_cluster_healthy(new_health_url):
                    active_cluster = new_cluster
                    # Update the connection endpoint.
                    active_connection_url = connection_urls[new_cluster]
                    logging.warning(f"Switched active cluster to {new_cluster}.")
                    logging.warning(f"New WebSocket connection endpoint: {active_connection_url}")
                else:
                    logging.critical("Both clusters appear to be down!")

                consecutive_failures = 0  # Reset the failure counter after the attempt.


def main():
    """
    Main function to start the health-check worker in a background thread.
    Keeps the script running indefinitely until interrupted.
    """
    thread = threading.Thread(target=health_check_worker, daemon=True)
    thread.start()
    logging.info("Health check worker started. Press Ctrl+C to exit.")
    logging.info(f"Application will initially connect to: {active_connection_url}")
    try:
        while True:
            sleep(1)  # Main thread remains alive.
    except KeyboardInterrupt:
        logging.info("Shutting down health check script.")


if __name__ == "__main__":
    main()
```
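In the sample, the worker rebinds `active_cluster` and `active_connection_url` in two separate statements, so another thread that reads both between those statements could briefly see a cluster name paired with the other cluster's URL. If other threads in your application read these values, one option is to keep them behind a single lock. The sketch below assumes a hypothetical `ActiveEndpoint` class with `switch_to` and `snapshot` methods; it is not part of the original script.

```python
import threading


class ActiveEndpoint:
    """Hypothetical thread-safe holder for the active cluster and its WebSocket URL.

    The health-check worker calls switch_to(); application threads call snapshot().
    Both values change together under one lock, so readers always get a matching pair.
    """

    def __init__(self, connection_urls, initial="primary"):
        self._urls = dict(connection_urls)  # copy so later edits to the caller's dict don't leak in
        self._lock = threading.Lock()
        self._cluster = initial
        self._url = self._urls[initial]

    def switch_to(self, cluster):
        """Atomically make `cluster` the active one."""
        with self._lock:
            self._cluster = cluster
            self._url = self._urls[cluster]

    def snapshot(self):
        """Return (cluster_name, websocket_url) as a consistent pair."""
        with self._lock:
            return self._cluster, self._url
```

Note that neither this sketch nor the original script reconnects an open WebSocket; your application still needs to notice the new URL and reconnect.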
Key technical points
- Health checks on App Service servers: The code separates the health-check endpoints (used for monitoring) from the connection endpoints (used by your application). This lets you monitor server health over plain HTTP, independently of the WebSocket connection your application uses.
- HTTP HEAD requests: Sending HEAD requests to the /_ping endpoint minimizes data transfer, because the server returns only the status code and headers, which is still enough for diagnostics.
- Background thread: The health_check_worker function runs in its own daemon thread, allowing continuous health monitoring without blocking the main application thread. Because it is a daemon thread, the program can exit without waiting for it.
- Failover logic:
  - A counter (consecutive_failures) tracks consecutive failed health checks on the active server and resets to zero after any successful check.
  - If the count exceeds a set threshold (more than 9 consecutive failures, roughly 30 seconds at the 3-second polling interval), the script attempts a failover by checking the health of the alternate server.
  - If the alternate server passes its health check, the active connection endpoint is updated; otherwise, the script logs that both clusters appear to be down. Either way, the counter is reset.
- Logging: Detailed logging provides insight into the health check process, including HTTP response status, headers, and failover events. This aids troubleshooting and monitoring.
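To see what a HEAD-based health check returns without touching a real cluster, you can point one at a local stand-in. The sketch below starts a throwaway server with Python's built-in `http.server` that answers HEAD `/_ping` with 200 and any other path with 503, and checks it with the standard library's `urllib.request` instead of `requests`, so it runs with no extra dependencies. `PingHandler`, `head_status`, and the convention of returning status 0 on a network error are assumptions for illustration.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Local stand-in for a /_ping endpoint: 200 for /_ping, 503 for anything else."""

    def do_HEAD(self):
        self.send_response(200 if self.path == "/_ping" else 503)
        self.send_header("Content-Type", "application/json")
        self.end_headers()  # a HEAD response has headers but no body

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def head_status(url, timeout=5):
    """Send a HEAD request; return (status_code, headers), or (0, {}) on a network error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers)
    except urllib.error.HTTPError as e:
        # urllib raises for non-2xx statuses, but the error still carries the code and headers.
        return e.code, dict(e.headers)
    except OSError:
        # Connection refused, DNS failure, timeout, and similar network errors.
        return 0, {}


server = HTTPServer(("127.0.0.1", 0), PingHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

print(head_status(f"{base_url}/_ping")[0])    # 200
print(head_status(f"{base_url}/missing")[0])  # 503

server.shutdown()
server.server_close()
```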
Adapting for your application
- You can easily translate and adapt this code to your preferred programming language, such as Swift or Kotlin, to fit your application's needs.
- You might integrate this script or logic into your mobile code (iOS/Android) or a backend service that updates the active endpoint.
- If you are on iOS or Android, consider how often and where you run this code. For example, background tasks or push notifications can trigger health checks in a mobile context.
- If you have a microservice architecture, you might run this failover logic in a small service that exposes a current active URL to the mobile apps, so they always connect to the correct WSS endpoint.
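For the microservice approach, the service only needs to expose the current WebSocket URL. The following is a minimal sketch using Python's built-in `http.server`; the `/active-endpoint` path, the JSON shape, the `set_active`, `create_endpoint_server`, and `fetch_active` helpers, and the example.com URL are all hypothetical. In a real deployment, the health-check worker would call `set_active` after a successful switch, and the service would sit behind TLS and authentication.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared state; the failover worker would update it via set_active().
_state_lock = threading.Lock()
_state = {"cluster": "primary", "url": "wss://primary.example.com:4984/primary"}


def set_active(cluster, url):
    """Record the newly active cluster and its WebSocket URL."""
    with _state_lock:
        _state["cluster"] = cluster
        _state["url"] = url


class ActiveEndpointHandler(BaseHTTPRequestHandler):
    """Answers GET /active-endpoint with the active cluster and URL as JSON."""

    def do_GET(self):
        if self.path != "/active-endpoint":
            self.send_error(404)
            return
        with _state_lock:
            body = json.dumps(_state).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def create_endpoint_server(host="127.0.0.1", port=8080):
    """Create (but do not start) the HTTP server; port 0 picks a free port."""
    return HTTPServer((host, port), ActiveEndpointHandler)


def fetch_active(base_url, timeout=5):
    """Client side: ask the service which WebSocket URL to connect to."""
    with urllib.request.urlopen(f"{base_url}/active-endpoint", timeout=timeout) as response:
        return json.load(response)
```

To run it, call `create_endpoint_server(port=8080).serve_forever()`, passing `host="0.0.0.0"` if other machines need to reach it. A client would fetch the endpoint before opening its WebSocket, and again whenever the connection drops.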
Conclusion
This sample code provides a straightforward yet powerful mechanism for improving availability by automatically failing over to a backup server when the primary stops responding to health checks. Because the health checks are separate from the connection endpoints, the application only switches its WebSocket endpoint to a server that has just passed a health check.
In a production environment, you may need to adapt and extend the logic to suit your specific requirements, network conditions, and security policies.
Implementing this logic in your mobile application or backend service can improve uptime and resilience, helping your users stay connected during unexpected service interruptions.