# Dynamic supervision

In the previous chapter, we had a constant number of processes running simultaneously. In many use cases, we want to spawn processes dynamically, as we need them. For that, we can use `Supervisor.start_child/2`. First, we start a supervisor. It doesn't have any child processes for now:

```elixir
child_specs = []
{:ok, supervisor} = Supervisor.start_link(child_specs, strategy: :one_for_one)
```

Then, we spawn a task under the supervisor. The `Task` module implements a `child_spec/1` function, so we can pass `{Task, fn -> ... end}` as a child spec:

```elixir
Supervisor.start_child(supervisor, {
  Task,
  fn ->
    IO.puts("Hello from a task spawned dynamically under a supervisor")
  end
})
```

💡 Run the above snippet a few times to spawn more tasks.

As you can see, `Supervisor` allows spawning children dynamically. However, when children are started at runtime, especially if there may be a lot of them at some point, it's better to use `DynamicSupervisor`. Unlike `Supervisor`, it keeps no ordering between its children, which lets it use more efficient data structures and perform some operations, such as shutting down, concurrently. From the API perspective, `DynamicSupervisor` is similar to `Supervisor`. Here are the main differences:

- `DynamicSupervisor` doesn't allow starting any children at startup: `DynamicSupervisor.start_child/2` is the only way to add them.
- The only supported strategy is `:one_for_one`. Dynamically started children are independent of each other and unordered, so strategies such as `:one_for_all` or `:rest_for_one` would make little sense and would reduce performance.

💡 Change the above snippets to use `DynamicSupervisor`. Note that `DynamicSupervisor.start_link/1` doesn't accept the `child_specs` argument, and `DynamicSupervisor.start_child/2` must be used instead of `Supervisor.start_child/2`.
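
One more practical difference shows up with child ids. The sketch below uses `Agent` purely as a placeholder child (it isn't part of this chapter's example). `Agent.child_spec/1` always uses `Agent` as the child id, so a regular `Supervisor` refuses to start a second one while the first is still running. A `DynamicSupervisor` ignores the id (it must be present in the spec but doesn't need to be unique), so the same spec can be started many times:

```elixir
# A regular Supervisor identifies children by id. Agent.child_spec/1 always
# uses the id Agent, so a second Agent is rejected while the first one runs.
{:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)
{:ok, _first} = Supervisor.start_child(sup, {Agent, fn -> 1 end})
{:error, {:already_started, _pid}} = Supervisor.start_child(sup, {Agent, fn -> 2 end})

# A DynamicSupervisor ignores child ids, so the same kind of spec
# can be started as many times as we like.
{:ok, dyn_sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

Enum.each(1..1_000, fn i ->
  {:ok, _pid} = DynamicSupervisor.start_child(dyn_sup, {Agent, fn -> i end})
end)

# Returns a map with the :specs, :active, :supervisors and :workers counts
counts = DynamicSupervisor.count_children(dyn_sup)
```

Each pattern match (`{:ok, _} = ...`, `{:error, {:already_started, _}} = ...`) doubles as a check: if the call returned something else, the cell would fail with a `MatchError`.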

## Example: Job queue

As an example of dynamic supervision, we'll create a very simple job queue. It's going to be a GenServer receiving calls with jobs (which are just anonymous functions). For each job, the queue spawns a task, runs the job in it, and sends the result back to the caller.

```elixir
defmodule JobQueue do
  use GenServer

  @type job_result :: any()
  @type job :: (() -> job_result())

  def start_link(options) do
    # Using the module name as a name for the process
    # is a common pattern.
    GenServer.start_link(__MODULE__, options, name: __MODULE__)
  end

  @spec schedule_job(job()) :: job_result()
  def schedule_job(job) do
    GenServer.call(__MODULE__, {:schedule_job, job})
  end

  @impl true
  def init(_options) do
    {:ok, %{}}
  end

  @impl true
  def handle_call({:schedule_job, job}, from, state) do
    # Prepare a spec for the task that will run the job
    # and send the result back.
    # We don't want to run the job in this GenServer,
    # as it could become a bottleneck.
    task_spec =
      {Task,
       fn ->
         # Note that we're passing `from`, the second argument
         # of handle_call/3. It allows replying to the call
         # from another process.
         run_job(from, job)
       end}

    # Start the task under a dynamic supervisor
    DynamicSupervisor.start_child(JobSupervisor, task_spec)

    # Even though this is handle_call/3, we return a :noreply tuple,
    # because run_job/2 takes care of replying.
    {:noreply, state}
  end

  defp run_job(from, job) do
    # Run the actual job
    result = job.()

    # This is equivalent to returning a :reply
    # tuple from handle_call/3, but we can call
    # it from any process.
    GenServer.reply(from, result)
  end
end
```

Real-world job queues have a lot of features we didn't implement, but the core idea is the same: a job scheduler delegates work to short-lived processes. Because processes in the Erlang VM are lightweight and cheap to spawn, this simple architecture scales very well.

Let's start our queue:

```elixir
# Check if the supervisor is already running and if so, stop it.
# This makes it possible to avoid a name conflict when you rerun this cell.
if Process.whereis(:my_app_supervisor) do
  Supervisor.stop(:my_app_supervisor)
end

child_specs = [{DynamicSupervisor, name: JobSupervisor}, JobQueue]
Supervisor.start_link(child_specs, strategy: :one_for_one, name: :my_app_supervisor)
```

Note that the job queue and the job supervisor are started under another, top-level supervisor. Our architecture now forms a tree:

```text
          my_app_supervisor
           |             |
           V             V
     JobSupervisor    JobQueue
      |    |    |
      V    V    V
    Job1 Job2 Job3 ...
```

Such a tree is called a _supervision tree_. Its inner nodes are supervisors, its leaves are workers, and its edges represent supervision relationships. Supervision trees are a common and convenient way of organizing Elixir applications in a fault-tolerant way.
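
You can walk such a tree level by level from code. The following standalone sketch builds a tiny tree shaped like ours, with an `Agent` standing in for `JobQueue` and no process names (so it can be rerun freely), and then inspects both levels with `Supervisor.which_children/1` and `DynamicSupervisor.which_children/1`:

```elixir
# Root supervisor with two children: a dynamic supervisor and a placeholder worker
children = [
  {DynamicSupervisor, strategy: :one_for_one},
  {Agent, fn -> :some_state end}
]

{:ok, root} = Supervisor.start_link(children, strategy: :one_for_one)

# First level: each entry is {id, pid, type, modules}
root_children = Supervisor.which_children(root)

# Pick the dynamic supervisor out of the root's children by its type
{_id, dyn_sup, :supervisor, _modules} =
  Enum.find(root_children, fn {_id, _pid, type, _modules} -> type == :supervisor end)

# Second level: start a long-running task under the dynamic supervisor
{:ok, _task} =
  DynamicSupervisor.start_child(dyn_sup, {Task, fn -> Process.sleep(:infinity) end})

# Children of a DynamicSupervisor always report the id :undefined
dyn_children = DynamicSupervisor.which_children(dyn_sup)
```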

Now that the queue is running, let's make it run some jobs:

```elixir
JobQueue.schedule_job(
  fn ->
    IO.puts("#{inspect(self())}: Running a job")
    Process.sleep(100)
    "Job result"
  end
)
```

This job is quite simple, but we could run more complex jobs that could potentially fail, like querying a database. Let's simulate that: the cell below runs a job that fails about 30% of the time:

<!-- langtour:{"test_replace_code":"JobQueue.schedule_job(fn -> \"Job result\" end)"} -->
```elixir
JobQueue.schedule_job(
  fn ->
    IO.puts("#{inspect(self())}: Running a job")

    # :rand.uniform() returns a float between 0.0 (inclusive)
    # and 1.0 (exclusive) from a uniform distribution,
    # so the condition below is true about 30% of the time
    if :rand.uniform() > 0.7 do
      raise "Job failure"
    end

    "Job result"
  end
)
```

💡 Keep rerunning the cell above until it fails.

As you can see, when the job fails, the caller exits with a timeout. The task crashed before calling `GenServer.reply/2`, so the call was never answered, and `GenServer.call/2` gave up after its default timeout of 5 seconds. That's not the behavior we want: we'd like the supervisor to restart the job. Do you have an idea why it didn't?

The reason is `Task.child_spec/1`: it sets the restart mode to `:temporary`, so supervisors don't restart tasks by default.
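
You can confirm this by inspecting the child spec that `{Task, fun}` expands to. The function passed here is just a placeholder:

```elixir
# Build the same child spec that {Task, fun} expands to
spec = Task.child_spec(fn -> :ok end)

# The supervisor consults :restart whenever the child exits
spec.restart
# => :temporary
```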

💡 Let's fix it by changing the part of `JobQueue` that starts the task under the dynamic supervisor. Use [Supervisor.child_spec/2](https://hexdocs.pm/elixir/Supervisor.html#child_spec/2) to convert the task spec so that its restart mode is `:transient`, which makes the supervisor restart a child only when it exits abnormally. Then rerun the cell above until the job fails: this time, the task should be restarted as expected.
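
To see what `:transient` changes in isolation, here is a standalone sketch that doesn't touch `JobQueue`. It uses a hypothetical flaky job that fails on its first two attempts, and a hand-written child spec map (rather than `Supervisor.child_spec/2`) that is identical to the one `{Task, fun}` produces except for `restart: :transient`. Expect crash logs for the failed attempts:

```elixir
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# An Agent counts attempts across restarts, because a crashed task loses its own state
{:ok, attempts} = Agent.start_link(fn -> 0 end)
parent = self()

# Hypothetical flaky job: raises on attempts 1 and 2, succeeds on attempt 3
flaky = fn ->
  attempt = Agent.get_and_update(attempts, fn n -> {n + 1, n + 1} end)

  if attempt < 3 do
    raise "Failure on attempt #{attempt}"
  end

  send(parent, {:succeeded_on, attempt})
end

# Same fields as Task.child_spec(flaky), except the restart mode
spec = %{id: Task, start: {Task, :start_link, [flaky]}, restart: :transient}
{:ok, _pid} = DynamicSupervisor.start_child(sup, spec)

# Wait for the successful attempt to report back
result =
  receive do
    {:succeeded_on, attempt} -> attempt
  after
    1_000 -> :no_success
  end
```

With the default `:temporary` restart mode, the first crash would be final and `result` would be `:no_success`. Two restarts also stay within the supervisor's default restart intensity (3 restarts in 5 seconds).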