Understanding tokio::spawn and tokio::spawn_blocking

kdheepak · July 2, 2024, 11:33pm

I’m getting some unexpected results with tokio::spawn_blocking and wanted to open a post to discuss it.

I wanted an example to show how spawn_blocking and spawn work and when I should use one over the other but I’m getting code that doesn’t halt with tokio::spawn_blocking.

1 Spawn `fib` and run print to stdout in `main`:

    let r = tokio::task::spawn(async { fib(NUMBER) });

    while !r.is_finished() {
        let s: String = rand::thread_rng()
            .sample_iter(&Alphanumeric)
            .take(1000)
            .map(char::from)
            .collect();
        println!("{}", s);
    }

    let r = r.await?;

78.52s user 2.01s system 198% cpu 40.498 total

2 Spawn `fib` and sleep in `main`:

    let r = tokio::task::spawn(async { fib(NUMBER) });

    while !r.is_finished() {
        tokio::time::sleep(tokio::time::Duration::from_millis(20)).await;
    }

    let r = r.await?;

39.34s user 0.03s system 99% cpu 39.521 total

3 Spawn blocking `fib` and print to stdout in main:

    let r = tokio::task::spawn_blocking(|| fib(NUMBER));

    while !r.is_finished() {
        let s: String = rand::thread_rng()
            .sample_iter(&Alphanumeric)
            .take(1000)
            .map(char::from)
            .collect();
        println!("{}", s);
    }

    let r = r.await?;

78.41s user 2.04s system 199% cpu 40.325 total

4 Spawn blocking `fib` and sleep in `main`:

    let r = tokio::task::spawn_blocking(|| fib(NUMBER));

    while !r.is_finished() {
        tokio::time::sleep(tokio::time::Duration::from_millis(20)).await;
    }

    let r = r.await?;
    println!("{r}");

39.18s user 0.03s system 99% cpu 39.347 total

These are results that make sense to me so far.

The following results don’t make sense to me:

5 Spawn sleep loop and call `fib` in `main`

    let r = tokio::task::spawn(async {
        loop {
            tokio::time::sleep(tokio::time::Duration::from_millis(20)).await;
        }
    });

    fib(NUMBER);

    r.abort();

39.16s user 0.02s system 99% cpu 39.299 total

6 Spawn print loop and call `fib` in `main`

    let r = tokio::task::spawn(async {
        loop {
            let s: String = rand::thread_rng()
                .sample_iter(&Alphanumeric)
                .take(1000)
                .map(char::from)
                .collect();
            println!("{}", s);
        }
    });
    fib(NUMBER);

    r.abort();

7 Spawn blocking sleep loop and call `fib` in `main`

    let r = tokio::task::spawn_blocking(|| loop {
        std::thread::sleep(std::time::Duration::from_millis(20));
    });

    fib(NUMBER);

    r.abort();

DOES NOT HALT

8 Spawn blocking print loop and call `fib` in `main`

    let r = tokio::task::spawn_blocking(|| loop {
        let s: String = rand::thread_rng()
            .sample_iter(&Alphanumeric)
            .take(1000)
            .map(char::from)
            .collect();
        println!("{}", s);
    });

    fib(NUMBER);

    r.abort();

DOES NOT HALT

Here’s a link to the source code if someone wants to play around with it:

github.com

kdheepak/test-tokio/blob/28d0fc9ec1965c27bd0d5d34bccbb27256aa5f73/README.md

# test-tokio

## Results

### Spawn fib and print to stdout:

```rust
    let r = tokio::task::spawn(async { fib(NUMBER) });

    while !r.is_finished() {
        let s: String = rand::thread_rng()
            .sample_iter(&Alphanumeric)
            .take(1000)
            .map(char::from)
            .collect();
        println!("{}", s);
    }

    let r = r.await?;
```

This file has been truncated. show original

joshka · July 3, 2024, 10:19am

I added numbers to the above post to make it easier to call out, and created GitHub - joshka/spike-tokio

I Added a loop counter (either AtomicUsize or Arc<AtomicUsize>) to each example to show what’s going on a bit better. You’ll note that there’s two broadly similar counts, ~70 where the loop is waiting 20ms between iterations, and ~6300 where it is printing characters to the terminal as fast as possible. The amount of time spent is similar because this is controlled by the time it takes to run the fibonacci method (I’m using 42 for the number).

1 Fibonacci number: 267914296, loop_count: 6431

cargo run --bin item1 3.82s user 0.09s system 164% cpu 2.374 total

2 Fibonacci number: 267914296, loop_count: 73

cargo run --bin item2 1.82s user 0.02s system 92% cpu 1.977 total

3 Fibonacci number: 267914296, loop_count: 6385

cargo run --bin item3 3.84s user 0.10s system 167% cpu 2.355 total

4 Fibonacci number: 267914296, loop_count: 74

cargo run --bin item4 1.82s user 0.02s system 92% cpu 1.999 total

5 Fibonacci number: 267914296, loop_count: 73

cargo run --bin item5 1.82s user 0.02s system 93% cpu 1.975 total

I’m not sure what’s not understandable about this one. It spawns a task, which starts running while fib is running on the main thread.

6 Never returns

Spawned tasks may be cancelled using the JoinHandle::abort or AbortHandle::abort methods. When one of these methods are called, the task is signalled to shut down next time it yields at an .await point. If the task is already idle, then it will be shut down as soon as possible without running again before being shut down. Additionally, shutting down a Tokio runtime (e.g. by returning from #[tokio::main]) immediately cancels all tasks on it.

…

Be aware that calls to JoinHandle::abort just schedule the task for cancellation, and will return before the cancellation has completed. To wait for cancellation to complete, wait for the task to finish by awaiting the JoinHandle. Similarly, the JoinHandle::is_finished method does not return true until the cancellation has finished.

There’s no await point in the loop, so it never gets interrupted. Add yield_now().await; and it does. Also calling std::process::exit(1) makes things die.
Also, anything that runs in the code after that point runs at the same time as the async task, so if for instance you’re printing the result of fib, then it will be interspersed with the random junk, so it’s important to await the handle.

So adding:

            yield_now().await;
        }
    });
    let result = fib(NUMBER);
    handle.abort();
    if let Err(err) = handle.await {
        eprintln!("Error: {:?}", err);
    }

gives:
Error: JoinError::Cancelled(Id(13))
Fibonacci number: 267914296, loop_count: 6305
cargo run --bin item6 3.88s user 0.14s system 154% cpu 2.609 total

7 Never completes

But it does run any code after the abort (e.g. printing the result / loop count. The problem here is also mentioned at tokio::task - Rust

Be aware that tasks spawned using spawn_blocking cannot be aborted because they are not async. If you call abort on a spawn_blocking task, then this will not have any effect, and the task will continue running normally. The exception is if the task has not started running yet; in that case, calling abort may prevent the task from starting.

To get this one to succeed, you’d want to add some form of cancelation token / message that the loop can check.

    let (tx, mut rx) = tokio::sync::oneshot::channel::<()>();

// then in the loop

        if rx.try_recv().is_ok() {
            break;
        }

// and after

    tx.send(()).unwrap();

Fibonacci number: 267914296, loop_count: 77
cargo run --bin item7 1.84s user 0.02s system 59% cpu 3.150 total

8 never completes

same deal as 7. blocking task can’t be aborted
Applying the same change

Fibonacci number: 267914296, loop_count: 6448
cargo run --bin item8 3.84s user 0.10s system 156% cpu 2.526 total

I suspect given this note about canceling, you’d probably get the same effect to calling abort as just letting the method end.

Additionally, shutting down a Tokio runtime (e.g. by returning from #[tokio::main]) immediately cancels all tasks on it.

See fix: make things work right · joshka/spike-tokio@18e6c4f · GitHub for corrections

joshka · July 3, 2024, 10:57am

Getting to the why… (should you use the blocking tasks threadpool)

If you run non-async blocking code on a non blocking task, then it may cause the blocking task to never yield to the properly async tasks. This is especially problematic if you happen to find those tasks scheduled on the same thread - they may wait forever. You can simulate this by tweaking the number of threads so there’s only a single thread in the worker pool, and then running a blocking task and a non-blocking task simultaneously.

use std::{thread, time::Duration};

use color_eyre::Result;
use tokio::join;

fn main() -> Result<()> {
    color_eyre::install()?;
    tokio::runtime::Builder::new_multi_thread()
        .worker_threads(1)
        .enable_all()
        .build()?
        .block_on(run())
}

async fn run() -> Result<()> {
    let (tx, mut rx) = tokio::sync::oneshot::channel();
    let task1 = tokio::task::spawn(async move {
        println!("Task 1 waiting for message");
        loop {
            thread::sleep(Duration::from_millis(1));
            if let Ok(()) = rx.try_recv() {
                break;
            }
        }
        unreachable!("Task 1 received message");
    });
    let task2 = tokio::task::spawn(async move {
        println!("Task 2 sleeping for 20ms");
        tokio::time::sleep(Duration::from_millis(20)).await; // ensure the other task runs first
        unreachable!("Task 2 sending message");
        tx.send(()).unwrap();
        unreachable!("Task 2 sent message");
    });
    let _ = join!(task1, task2);
    Ok(())
}

This is likely something that in a real app would come up only if the tasks happen to be scheduled on the same thread like this.I’m not sure whether that will happen. Probably not on a small scale, but it’s the sort of thing that if it does happen you’ll never be able to easily reproduce it. I’m also not certain whether the scheduler might avoid threads that are blocked like this.

The assumption for a TUI app that’s async is that you’ll have some portion of async code and the main loop which is sync and blocking. In the main loop, you’ll have points where you might have to block and wait for mutexes / locks / channels. It’s best to minimize those waits.

One thing that bevy app loops do is copy the data needed to render from the main “world” into the render “world” before rendering. This means there’s no conflict / waiting for mutexes when rendering, and the main world can continue working on the application data without waiting on the rendering. For ui data which is cheap to clone, this is a possible way to do things. There are many though.

joshka · July 11, 2024, 9:10am

Async: What is blocking? – Alice Ryhl is worth a read for some general guidelines like:

Async code should never spend a long time without reaching an .await.

By using tokio::join!, all three tasks are guaranteed to run on the same thread, but if you replace it with tokio::spawn and use a multi-threaded runtime, you will be able to run multiple blocking tasks until you run out of threads. The default Tokio runtime spawns one thread per CPU core, and you will typically have around 8 CPU cores. This is enough that you can miss the issue when testing locally, but sufficiently few that you will very quickly run out of threads when running the code for real.

To give a sense of scale of how much time is too much, a good rule of thumb is no more than 10 to 100 microseconds between each .await. That said, this depends on the kind of application you are writing.

The article presents an alternative to spawn_blocking that is to just put the IO on another thread. This could be the best approach for async Ratatui Apps

kdheepak · July 11, 2024, 11:07am

If a blocking operation keeps running forever, you should run it on a dedicated thread. For example consider a thread that manages a database connection using a channel to receive database operations to perform. Since this thread is listening on that channel in a loop, it never exits.

In the templates we currently have a task that never exits:

github.com

ratatui-org/templates/blob/5e823efc871107345d59e5deff9284235c1f0bbc/component/template/src/tui.rs#L98-L151


      
          self.task = tokio::spawn(async move {
            let mut reader = crossterm::event::EventStream::new();
            let mut tick_interval = tokio::time::interval(tick_delay);
            let mut render_interval = tokio::time::interval(render_delay);
            _event_tx.send(Event::Init).unwrap();
            loop {
              let tick_delay = tick_interval.tick();
              let render_delay = render_interval.tick();
              let crossterm_event = reader.next().fuse();
              tokio::select! {
                _ = _cancellation_token.cancelled() => {
                  break;
                }
                maybe_event = crossterm_event => {
                  match maybe_event {
                    Some(Ok(evt)) => {
                      match evt {
                        CrosstermEvent::Key(key) => {
                          if key.kind == KeyEventKind::Press {
                            _event_tx.send(Event::Key(key)).unwrap();

This file has been truncated. show original

But it does hit an .await (inside tokio::select!) every few milliseconds because of the Tick or Render event. So that should be fine, right?

joshka · July 13, 2024, 9:59pm

Nothing in that loop is blocking, but there is a few errors / non-ideal code there. PR incoming.

Understanding tokio::spawn and tokio::spawn_blocking

1 Spawn fib and run print to stdout in main:

2 Spawn fib and sleep in main:

3 Spawn blocking fib and print to stdout in main:

4 Spawn blocking fib and sleep in main:

5 Spawn sleep loop and call fib in main

6 Spawn print loop and call fib in main

7 Spawn blocking sleep loop and call fib in main

8 Spawn blocking print loop and call fib in main