ToolsLib services status

  1. 2017-02-15 09:35:00
    [Fixed] ToolsLib outage

    The whole platform has been unreachable for a few minutes. We’re investigating and trying to recover as soon as possible.

    [09:35] The first frontend becomes unreachable.

    [09:40] All frontends are unreachable.

    [09:55] The hosting provider explains that the issue is due to a DHCPv6 flood, but our monitoring doesn’t show any unusual DHCPv6 traffic.

    [10:05] Servers are being rebooted in rescue mode to investigate further.

    [10:20] The servers are unreachable even in rescue mode. The hosting provider’s support is investigating. We start migrating the DNS record to a new frontend.

    [10:51] The support ticket is escalated to a higher level, and we are asked to leave the servers down, with no ETA for a proper fix.

    [11:05] Another independent frontend is deployed for HTTP only - services are running, but in a degraded mode.

    [11:18] HTTPS deployed, all services are up and no longer running in a degraded mode.

    [07:30] All servers are reachable again, but only in rescue mode. We’re asked to disable IPv6.

    [08:30] Still issues on the internal network.

    [09:00] Issue resolved, all servers are up and running. Beginning the DNS migration back to the default configuration, without IPv6 support.

    [10:00] Migration done. Still working with the hosting provider to get more details.

    [2017/02/20] All corrective actions have been applied on both sides to avoid any further downtime from this issue.