We have a high availability setup in Azure for Passwordstate that consists of a Traffic Manager in front of 2 App Gateway load balanced clusters in different regions. We have run into an issue with what happens when the DB is unreachable from a single region. Currently in Azure, the load balancer health checks only look for "good" HTTP status codes and strings to determine the health of the backend pools. In the case of Passwordstate, if one of the redundant servers/clusters behind the load balancer loses its connection to the DB, the webserver still returns a good status code, the health probe doesn't fail, and requests are still sent to the affected server/cluster. There doesn't seem to be a way in Azure to probe for, say, a 200 status code and then trip the load balancer when the body matches a string from the DB error page (again, it can only match on good results for a health check).
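To make the problem concrete, here is a simplified model of the probe rule described above. It is an assumption about the behaviour, not Azure's implementation: a backend is healthy only if the status code is inside an accepted range (default 200-399) and, if a match string is configured, the body contains it. The error page text and the marker string are hypothetical placeholders.

```python
# Simplified model of an HTTP health probe (assumption: mirrors the behaviour
# described above, not Azure's actual implementation).
DEFAULT_HEALTHY_RANGES = ((200, 399),)


def probe_is_healthy(status, body, healthy_ranges=DEFAULT_HEALTHY_RANGES, body_match=None):
    # Unhealthy if the status code falls outside every accepted range.
    if not any(lo <= status <= hi for lo, hi in healthy_ranges):
        return False
    # Optional positive match: the body must CONTAIN the string.
    # There is no "unhealthy if the body contains X" option.
    if body_match is not None and body_match not in body:
        return False
    return True


# Hypothetical DB error page: served with a 200, so a status-only probe passes it.
error_status, error_body = 200, "placeholder DB error page text"
print(probe_is_healthy(error_status, error_body))  # True
# The same page served with a 503 would trip the probe.
print(probe_is_healthy(503, error_body))  # False
# A positive match on a hypothetical marker that only the normal page contains.
print(probe_is_healthy(error_status, error_body, body_match="HYPOTHETICAL_OK_MARKER"))  # False
```

The only levers are the status code and a positive body match, which is why either a non-2xx/3xx status or a reliable "healthy page" string is needed.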
Any thoughts on how to solve this? The limitation is in Azure, but I'm wondering if it makes more sense (or could be offered as an option) to return a true HTTP error code for that condition? A 500 or a 503 (I'd settle for a 418 just to get past this), just something outside the 200-399 range, which is the default "good" range for the Azure App Gateways and Traffic Managers.
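Until something like that exists in the product, one possible workaround is a small sidecar health endpoint on each web server that returns 503 when the DB can't be reached, with the probe pointed at it instead of the Passwordstate login page. A minimal sketch, assuming a SQL Server backend on the default port 1433; the host name, the `/health` path and port 8081 are hypothetical, and a plain TCP connect is only a proxy for "the DB is usable" (it won't catch login or permission failures). Whether the App Gateway probe can target a separate port depends on your SKU and probe settings, so treat that part as an assumption; Traffic Manager monitoring does let you set the port and path.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

DB_HOST = "sql.example.internal"  # hypothetical SQL Server host
DB_PORT = 1433                    # default SQL Server port; adjust for your instance
TIMEOUT_SECONDS = 3


def db_reachable(host, port, timeout=TIMEOUT_SECONDS):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def health_status(host=DB_HOST, port=DB_PORT):
    """Map DB reachability to an HTTP status code a probe can act on."""
    if db_reachable(host, port):
        return 200, b"OK"
    return 503, b"Database unreachable"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":  # hypothetical probe path
            self.send_error(404)
            return
        status, body = health_status()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8081):
    """Serve the health endpoint on an arbitrary, hypothetical port (blocks forever)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` on each web server would expose `GET /health`, which returns 200 while the DB port answers and 503 otherwise, so the probe fails exactly in the single-region DB outage described above.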
I'm digging into a string match, but since we use SAML and AAD for authentication, the number of redirects makes it hard to find a string that consistently appears on the normal page but not on the DB error page that gets served instead. This is probably doable, but it seems like a long way around.
Anyway, appreciate any thoughts here. Thanks.