
The service life-cycle interface

Stopping the service

A service can be stopped by sending a SIGINT (<ctrl+c>) or SIGTERM signal to the tomodachi Python process, or by invoking the tomodachi.exit() function, which initiates the termination processing flow. The tomodachi.exit() call optionally takes an exit code as an argument; if none is given, exit code 0 is used.

  • SIGINT signal (equivalent to using <ctrl+c>)
  • SIGTERM signal
  • tomodachi.exit() or tomodachi.exit(exit_code)

The process's exit code can also be altered by changing the value of tomodachi.SERVICE_EXIT_CODE. However, calling tomodachi.exit with an integer argument overrides any value previously set in tomodachi.SERVICE_EXIT_CODE.
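For scripts or tests that need to stop a running service process from the outside, the following is a minimal standard-library sketch of sending SIGTERM to a Python process and reading its exit code. The child script here is a hypothetical stand-in, not a tomodachi service, and the names CHILD_SOURCE and stop_child_with_sigterm are made up for illustration. It assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a service process: it exits with the code
# given as its first argument when it receives SIGTERM.
CHILD_SOURCE = """
import signal, sys, time

exit_code = int(sys.argv[1])

def handle_sigterm(signum, frame):
    sys.exit(exit_code)

signal.signal(signal.SIGTERM, handle_sigterm)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def stop_child_with_sigterm(exit_code: int = 0, timeout_seconds: float = 5.0) -> int:
    """Start the stand-in process, send it SIGTERM and return its exit code."""
    command = [sys.executable, "-c", CHILD_SOURCE, str(exit_code)]
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as child:
        # Wait until the child has installed its signal handler.
        child.stdout.readline()
        child.send_signal(signal.SIGTERM)
        return child.wait(timeout=timeout_seconds)
```

A supervising script could compare the returned value against the exit code it expects from the service to tell a clean stop from a failure.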

Graceful shutdown

All of the above ways of initiating the termination flow perform a graceful shutdown of the service. During shutdown, the service tries to await open HTTP handlers, currently running tasks started through tomodachi's scheduling functionality, and tasks processing messages from queues such as AWS SQS or RabbitMQ.

Long-running tasks may time out during termination, depending on the configuration in use (see options such as http.termination_grace_period_seconds). Additionally, container handlers may impose their own timeouts on how long termination is allowed to take. If there are no ongoing tasks to await and the service lifecycle can be cleanly terminated, the shutdown usually completes within milliseconds.
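The idea of awaiting ongoing work for a bounded time can be illustrated with plain asyncio. This is a sketch of the general technique only, not tomodachi's internal implementation; the function name drain and its parameters are hypothetical.

```python
import asyncio


async def drain(tasks, grace_period_seconds):
    """Await in-flight tasks for at most grace_period_seconds.

    Tasks still running after the grace period are cancelled.
    Returns (finished, cancelled) as two sets of tasks.
    """
    if not tasks:
        # Nothing to await: shutdown can proceed immediately.
        return set(), set()
    finished, unfinished = await asyncio.wait(tasks, timeout=grace_period_seconds)
    for task in unfinished:
        task.cancel()
    # Let the cancelled tasks run their cleanup before returning.
    await asyncio.gather(*unfinished, return_exceptions=True)
    return finished, unfinished


async def demo():
    fast = asyncio.create_task(asyncio.sleep(0.01, result="fast"))
    slow = asyncio.create_task(asyncio.sleep(10, result="slow"))
    finished, cancelled = await drain({fast, slow}, grace_period_seconds=0.2)
    return sorted(task.result() for task in finished), len(cancelled)
```

In the demo, the short task completes within the grace period while the long one is cancelled, which mirrors how a long-running task may time out during termination.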

Function hooks for service lifecycle changes

To initialize connections to external resources, or to gracefully shut down connections made by a service, a service can define a few functions that hook into its lifecycle changes.

Magic function name   When is the function called?                     What is suitable to put here
_start_service        Called before invokers / servers have started.   Initialize connections to databases, etc.
_started_service      Called after invokers / servers have started.    Start reporting or start tasks to run once.
_stopping_service     Called on termination signal.                    Cancel eventual internal long-running tasks.
_stop_service         Called after tasks have gracefully finished.     Close connections to databases, etc.

Changes to a service's settings / configuration (for example, modifying the options values) should be made in the __init__ function rather than in any of the lifecycle function hooks.

Good practice – in general, make use of the _start_service (for setting up connections) in addition to the _stop_service (to close connections) lifecycle hooks. The other hooks may be used for more uncommon use-cases.

import tomodachi


class Service(tomodachi.Service):
    name = "example"

    async def _start_service(self):
        # The _start_service function is called during initialization,
        # before consumers or an eventual HTTP server has started.
        # It's suitable to set up or connect to external resources here.
        return

    async def _started_service(self):
        # The _started_service function is called after invoker
        # functions have been set up and the service is up and running.
        # The service is ready to process messages and requests.
        return

    async def _stopping_service(self):
        # The _stopping_service function is called the moment the
        # service is instructed to terminate - usually this happens
        # when a termination signal is received by the service.
        # This hook can be used to cancel ongoing tasks or similar.
        # Note that some tasks may be processing during this time.
        return

    async def _stop_service(self):
        # Finally the _stop_service function is called after HTTP server,
        # scheduled functions and consumers have gracefully stopped.
        # Previously ongoing tasks have been awaited for completion.
        # This is the place to close connections to external services and
        # clean up eventual tasks you may have started previously.
        return

Exceptions raised in _start_service or _started_service will gracefully terminate the service.
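As a concrete illustration of the good practice above, the following sketch opens a connection in _start_service and closes it in _stop_service. FakeConnection is a hypothetical stand-in for a real database client, and run_lifecycle is an illustrative harness that only mimics the documented hook order so the sketch runs without tomodachi installed. In a real service the class would subclass tomodachi.Service and tomodachi would call the hooks.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for a database client."""

    def __init__(self):
        self.is_open = False

    async def connect(self):
        self.is_open = True

    async def close(self):
        self.is_open = False


class Service:  # in a real service: class Service(tomodachi.Service)
    name = "example"

    def __init__(self):
        self.db = FakeConnection()

    async def _start_service(self):
        # Set up external resources before handlers start receiving work.
        await self.db.connect()

    async def _stop_service(self):
        # Close external resources after handlers have finished.
        await self.db.close()


async def run_lifecycle(service):
    """Illustrative harness: call implemented hooks in the documented order."""
    states = []
    for hook_name in (
        "_start_service",
        "_started_service",
        "_stopping_service",
        "_stop_service",
    ):
        hook = getattr(service, hook_name, None)
        if hook is not None:
            await hook()
        states.append((hook_name, service.db.is_open))
    return states
```

Running the harness shows the connection being open from _start_service until _stop_service has returned, so handlers running in between can rely on it.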

Graceful termination of a service (SIGINT / SIGTERM)

When the service process receives a SIGINT or SIGTERM signal (or tomodachi.exit() is called) the service begins the process for graceful termination, which in practice means:

  • The service's _stopping_service method, if implemented, is called immediately when the signal is received.
  • The service stops accepting new HTTP connections and closes keep-alive HTTP connections as early as possible.
  • Already established HTTP connections for which a handler call is being awaited are allowed to finish their work before the service stops (up to options.http.termination_grace_period_seconds seconds, after which the open TCP connections for those HTTP connections are forcefully closed if they have still not completed).
  • Any AWS SQS / AMQP handlers (decorated with @aws_sns_sqs or @amqp) stop receiving new messages. However, handlers already processing a received message are awaited until they return their result. Unlike the HTTP handler connections, there is no grace period for these queue-consuming handlers.
  • Currently running scheduled handlers are also awaited until they fully complete their execution before the service terminates. No new scheduled handlers are started.
  • When all HTTP connections are closed, all scheduled handlers have completed and all pub-sub handlers have been awaited, the service's _stop_service method is finally called (if implemented), where for example database connections can be closed. When the _stop_service method returns (or immediately after handler invocations complete, if no _stop_service is implemented), the service finally terminates.

It's recommended to set the http.termination_grace_period_seconds option to around 30 seconds to allow for graceful termination of HTTP connections. This value can be adjusted based on how long the service is expected to take to finish processing incoming requests.

Make sure that the orchestration engine (such as Kubernetes) waits at least 30 seconds after sending the SIGTERM before removing the pod. For extra compatibility when operating services in k8s, and to get around most kinds of edge cases involving intermittent timeouts and problems with ingress connections, set the pod spec terminationGracePeriodSeconds to 90 seconds and use a preStop lifecycle hook of 20 seconds (unless your setup includes long-running queue-consuming handler calls, which require an even longer grace period).

Keep the http.termination_grace_period_seconds options value lower than the pod spec's terminationGracePeriodSeconds value, as the latter is a hard limit for how long the pod will be allowed to run after receiving a SIGTERM signal.
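The relationship between these values can be checked with simple arithmetic. The sketch below assumes, as is the case in Kubernetes, that the time spent in the preStop hook counts against terminationGracePeriodSeconds and that SIGTERM is delivered once the hook has finished. The function name is hypothetical.

```python
def remaining_termination_budget(
    pod_grace_period_seconds: int,
    pre_stop_seconds: int,
    http_grace_period_seconds: int,
) -> int:
    """Seconds left for other shutdown work before the pod is killed.

    A negative result means the pod's hard limit would be reached before
    the HTTP grace period has elapsed.
    """
    return pod_grace_period_seconds - pre_stop_seconds - http_grace_period_seconds


# Values recommended in the text: 90 second pod grace period, 20 second
# preStop hook, 30 second http.termination_grace_period_seconds.
budget = remaining_termination_budget(90, 20, 30)
```

With the recommended values, 40 seconds remain for queue handlers, scheduled handlers and the _stop_service hook to complete.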

In a setup where long-running queue-consuming handler calls commonly occur, any grace period the orchestration engine uses has to take that into account. It's generally advised to split work into manageable chunks that complete quickly or, if handlers are idempotent, to cancel long-running handlers as part of the _stopping_service implementation.
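One way to make long-running work cancellable from _stopping_service is to keep track of the asyncio tasks a service starts and cancel them when termination begins. This is a minimal sketch under assumptions: the track helper and the _long_running attribute are hypothetical names, and the class would subclass tomodachi.Service in a real service. Cancelling is only safe if the work is idempotent and can be retried later.

```python
import asyncio


class Service:  # in a real service: class Service(tomodachi.Service)
    name = "example"

    def __init__(self):
        self._long_running = set()

    def track(self, coro):
        """Run a coroutine as a task and remember it until it is done."""
        task = asyncio.create_task(coro)
        self._long_running.add(task)
        task.add_done_callback(self._long_running.discard)
        return task

    async def _stopping_service(self):
        # Called when termination starts: cancel tracked work and wait
        # for the cancelled tasks to finish their cleanup.
        tasks = list(self._long_running)
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def demo():
    service = Service()
    task = service.track(asyncio.sleep(60))
    await asyncio.sleep(0)  # give the task a chance to start
    await service._stopping_service()
    return task.cancelled()
```

A handler would call self.track(...) for the part of its work that can be interrupted, so that the orchestration engine's grace period does not have to cover the full duration of that work.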