The service life-cycle interface
Stopping the service
Stopping a service can be achieved either by sending a `SIGINT` (<ctrl+c>) or `SIGTERM` signal to the `tomodachi` Python process, or by invoking the `tomodachi.exit()` function, which initiates the termination processing flow. The `tomodachi.exit()` call can additionally take an optional exit code as an argument; without one, the exit code defaults to 0.
- `SIGINT` signal (equivalent to using <ctrl+c>)
- `SIGTERM` signal
- `tomodachi.exit()` or `tomodachi.exit(exit_code)`
The process's exit code can also be altered by changing the value of `tomodachi.SERVICE_EXIT_CODE`; however, calling `tomodachi.exit` with an integer argument overrides any value previously set in `tomodachi.SERVICE_EXIT_CODE`.
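The signal-based route can be tried out from another process. The following is a minimal sketch, assuming a POSIX system: the child process is a plain Python stand-in (not a tomodachi service) that exits with a chosen code when it receives `SIGTERM`, and the helper name `stop_with_sigterm` is hypothetical. It only illustrates how a signal is delivered and how the resulting exit code can be read back.

```python
import signal
import subprocess
import sys

# Stand-in child process (an assumption for illustration, not a tomodachi service):
# it exits with the code given on its command line when SIGTERM arrives.
CHILD_SOURCE = """
import signal, sys, time

def on_term(signum, frame):
    sys.exit(int(sys.argv[1]))

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def stop_with_sigterm(exit_code: int = 0) -> int:
    """Start the stand-in child, send it SIGTERM and return its exit code."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, str(exit_code)],
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        # Wait until the child has installed its signal handler.
        proc.stdout.readline()
        proc.send_signal(signal.SIGTERM)
        return proc.wait(timeout=10)
```

Calling `stop_with_sigterm()` returns the child's exit code, which is the same value a supervisor or shell would observe after the process has terminated.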
Graceful shutdown
All of the above ways of initiating the termination flow perform a graceful shutdown of the service: it will try to await open HTTP handlers, currently running tasks started through tomodachi's scheduling functionality, and tasks processing messages from queues such as AWS SQS or RabbitMQ.
Long-running tasks may time out during termination, depending on the configuration in use (see options such as `http.termination_grace_period_seconds`). Additionally, container orchestrators may impose their own timeouts on how long termination is allowed to take. If there are no ongoing tasks to await and the service lifecycle can be terminated cleanly, the shutdown usually happens within milliseconds.
Function hooks for service lifecycle changes
To initialize connections to external resources, or to gracefully shut down connections made by a service, a service can define a few functions that hook into its lifecycle changes.
Magic function name | When is the function called? | What is suitable to put here |
---|---|---|
_start_service | Called before invokers / servers have started. | Initialize connections to databases, etc. |
_started_service | Called after invokers / servers have started. | Start reporting or start tasks to run once. |
_stopping_service | Called on termination signal. | Cancel any internal long-running tasks. |
_stop_service | Called after tasks have gracefully finished. | Close connections to databases, etc. |
Changes to a service's settings / configuration (for example by modifying the `options` values) should be done in the `__init__` function rather than in any of the lifecycle function hooks.
Good practice: in general, make use of the `_start_service` hook (for setting up connections) together with the `_stop_service` hook (for closing connections). The other hooks may be used for more uncommon use cases.
```python
import tomodachi


class Service(tomodachi.Service):
    name = "example"

    async def _start_service(self):
        # The _start_service function is called during initialization,
        # before consumers or a possible HTTP server have started.
        # It's suitable to set up or connect to external resources here.
        return

    async def _started_service(self):
        # The _started_service function is called after invoker
        # functions have been set up and the service is up and running.
        # The service is ready to process messages and requests.
        return

    async def _stopping_service(self):
        # The _stopping_service function is called the moment the
        # service is instructed to terminate - usually this happens
        # when a termination signal is received by the service.
        # This hook can be used to cancel ongoing tasks or similar.
        # Note that some tasks may be processing during this time.
        return

    async def _stop_service(self):
        # Finally the _stop_service function is called after HTTP server,
        # scheduled functions and consumers have gracefully stopped.
        # Previously ongoing tasks have been awaited for completion.
        # This is the place to close connections to external services and
        # clean up any tasks you may have started previously.
        return
```
Exceptions raised in `_start_service` or `_started_service` will gracefully terminate the service.
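To make the recommended `_start_service` / `_stop_service` pairing concrete, here is a minimal sketch that opens a connection in the first hook and closes it in the second. Everything besides the hook names is an assumption for illustration: the class is a plain stand-in that does not inherit from `tomodachi.Service`, the in-memory SQLite database and the `handle` method are hypothetical, and `demo` calls the hooks by hand where the framework would normally call them.

```python
import asyncio
import sqlite3


class Service:  # stand-in for illustration; a real service would subclass tomodachi.Service
    name = "example"

    def __init__(self):
        self.db = None

    async def _start_service(self):
        # Set up the connection before any handler can be invoked.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE events (name TEXT)")

    async def handle(self, name):
        # Hypothetical handler body that relies on the open connection.
        self.db.execute("INSERT INTO events (name) VALUES (?)", (name,))
        return self.db.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    async def _stop_service(self):
        # Close the connection once all handlers have finished.
        if self.db is not None:
            self.db.close()
            self.db = None


async def demo():
    # Calls the hooks manually, in the order a running service would see them.
    service = Service()
    await service._start_service()
    count = await service.handle("started")
    await service._stop_service()
    return count, service.db is None
```

Keeping setup and teardown in this pair of hooks means handlers never see a half-initialized resource, and the connection is only closed once no handler is using it.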
Graceful termination of a service (SIGINT / SIGTERM)
When the service process receives a `SIGINT` or `SIGTERM` signal (or `tomodachi.exit()` is called), the service begins the process of graceful termination, which in practice means:
- The service's `_stopping_service` method, if implemented, is called immediately upon the received signal.
- The service stops accepting new HTTP connections and closes keep-alive HTTP connections as early as possible.
- Already established HTTP connections for which a handler call is being awaited are allowed to finish their work before the service stops (up to `options.http.termination_grace_period_seconds` seconds, after which the open TCP connections for those HTTP connections will be forcefully closed if still not completed).
- Any AWS SQS / AMQP handlers (decorated with `@aws_sns_sqs` or `@amqp`) will stop receiving new messages. However, handlers already processing a received message will be awaited until they return their result. Unlike the HTTP handler connections, there is no grace period for these queue consuming handlers.
- Currently running scheduled handlers will also be awaited to fully complete their execution before the service terminates. No new scheduled handlers will be started.
- When all HTTP connections are closed, all scheduled handlers have completed and all pub-sub handlers have been awaited, the service's `_stop_service` method is finally called (if implemented), where for example database connections can be closed. When the `_stop_service` method returns (or immediately after completion of handler invocations if `_stop_service` isn't implemented), the service will finally terminate.
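The difference between the bounded wait for HTTP handlers and the unbounded wait for queue handlers can be sketched with plain `asyncio`. This is not tomodachi's internal implementation: the function `graceful_stop` and its parameters are hypothetical, handlers are modelled as `asyncio` tasks, and the two groups are awaited one after the other for simplicity.

```python
import asyncio


async def graceful_stop(http_tasks, queue_tasks, http_grace_period_seconds):
    """Illustrative model of the waits described in the list above."""
    forced = set()
    if http_tasks:
        # HTTP handlers get at most the grace period to finish.
        _, pending = await asyncio.wait(http_tasks, timeout=http_grace_period_seconds)
        for task in pending:
            task.cancel()  # stands in for forcefully closing the connection
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)
        forced = pending
    if queue_tasks:
        # Queue handlers are awaited without any time limit.
        await asyncio.gather(*queue_tasks, return_exceptions=True)
    return len(forced)


async def demo():
    quick_http = asyncio.create_task(asyncio.sleep(0.01))
    stuck_http = asyncio.create_task(asyncio.sleep(60))
    queue_job = asyncio.create_task(asyncio.sleep(0.3, result="processed"))
    forced = await graceful_stop({quick_http, stuck_http}, {queue_job}, 0.1)
    return forced, queue_job.result()
```

In `demo`, the stuck HTTP task is cut off after the grace period, while the queue task is allowed to run to completion even though it takes longer than that period.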
It's recommended to use an `http.termination_grace_period_seconds` options value of around 30 seconds to allow for the graceful termination of HTTP connections. This value can be adjusted based on the expected time it takes for the service to complete the processing of incoming requests.
Make sure that the orchestration engine (such as Kubernetes) waits at least 30 seconds after sending the `SIGTERM` before removing the pod. For extra compatibility when operating services in k8s, and to get around most kinds of edge cases with intermittent timeouts and problems with ingress connections, set the pod spec `terminationGracePeriodSeconds` to 90 seconds and use a `preStop` lifecycle hook of 20 seconds (unless your setup includes long-running queue consuming handler calls, which require an even longer grace period).
Keep the `http.termination_grace_period_seconds` options value lower than the pod spec's `terminationGracePeriodSeconds` value, as the latter is a hard limit for how long the pod will be allowed to run after receiving a `SIGTERM` signal.
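The recommended values can be checked with a little arithmetic. The sketch below assumes, as is the case in Kubernetes, that the time spent in the `preStop` hook counts against `terminationGracePeriodSeconds` and that `SIGTERM` is only sent once the hook has finished. The function name `shutdown_slack` is hypothetical.

```python
def shutdown_slack(http_grace_seconds, pre_stop_seconds, pod_grace_seconds):
    """Seconds left over once the preStop hook and the HTTP grace period are spent.

    A negative result means the pod could be killed before the service has
    finished its graceful termination.
    """
    return pod_grace_seconds - (pre_stop_seconds + http_grace_seconds)


# Values recommended in the text: 30 s HTTP grace, 20 s preStop, 90 s pod grace.
recommended_slack = shutdown_slack(30, 20, 90)
```

With the recommended values, 20 + 30 = 50 seconds are accounted for, leaving 40 seconds of headroom for queue consuming and scheduled handlers to finish and for `_stop_service` to run.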
In a setup where long-running queue consuming handler calls commonly occur, any grace period the orchestration engine uses has to take that into account. It's generally advised to split work up into manageable chunks that complete quickly or, if handlers are idempotent, to make it possible to cancel long-running handlers as part of the `_stopping_service` implementation.
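One way to make long-running work cancellable is to keep a reference to it as an `asyncio` task and cancel that task in `_stopping_service`. The following is a minimal sketch under stated assumptions: the class is a plain stand-in rather than a `tomodachi.Service` subclass, `_long_running` is a hypothetical placeholder for idempotent work, and `demo` calls the hooks manually in place of the framework.

```python
import asyncio


class Service:  # stand-in for illustration; a real service would subclass tomodachi.Service
    name = "example"

    def __init__(self):
        self._background_task = None
        self.cancelled = False

    async def _long_running(self):
        # Hypothetical idempotent work that is safe to interrupt and redo later.
        try:
            await asyncio.sleep(3600)
        except asyncio.CancelledError:
            self.cancelled = True
            raise

    async def _started_service(self):
        self._background_task = asyncio.create_task(self._long_running())

    async def _stopping_service(self):
        # Request cancellation as soon as termination begins.
        if self._background_task and not self._background_task.done():
            self._background_task.cancel()

    async def _stop_service(self):
        # Wait for the cancelled task to unwind before the process exits.
        if self._background_task:
            await asyncio.gather(self._background_task, return_exceptions=True)


async def demo():
    service = Service()
    await service._started_service()
    await asyncio.sleep(0)  # give the background task a chance to start
    await service._stopping_service()
    await service._stop_service()
    return service.cancelled
```

Cancelling in `_stopping_service` rather than in `_stop_service` matters because the latter only runs after outstanding work has been awaited, which for an hour-long task would exceed most grace periods.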