Async: Guidelines

It might be tempting to just throw async, await and Awaitable on all your code and go on with your life. And while it is OK to have more async functions than not -- in fact, you should generally not be afraid to make a function async since there is no performance penalty for doing so -- there are some guidelines you should follow in order to make the most efficient use of async.

Be Liberal, but Careful, with Async

If you are struggling to decide whether your code should be async, you can generally start with the answer yes and look for a reason to say no. For example, a simple hello world program can be made async with no performance penalty. You will likely not get any gain, but you will not get any loss, and the code is set up for any future changes that may require async.

These two programs are, for all intents and purposes, equivalent.

<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\NonAsyncHello;

function get_hello(): string {
  return "Hello";
}

function run_na_hello(): void {
  var_dump(get_hello());
}

run_na_hello();
Output
string(5) "Hello"
<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\Hello;

async function get_hello(): Awaitable<string> {
  return "Hello";
}

async function run_a_hello(): Awaitable<void> {
  $x = await get_hello();
  var_dump($x);
}

\HH\Asio\join(run_a_hello());
Output
string(5) "Hello"

Just make sure you are following the rest of the guidelines. Async is great, but you still have to consider things like caching, batching and efficiency.

Use Async Extensions

For the common cases where async would provide maximum benefit, HHVM provides convenient extension libraries that make writing the code much easier. Depending on your use case, you should liberally use:

  • MySQL for database access and queries.
  • cURL for web page data and transfer.
  • McRouter for memcached-based operations.
  • Streams for stream-based resource operations.
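
As a rough sketch of what using one of these extensions looks like, the following fetches two pages concurrently with the cURL extension's \HH\Asio\curl_exec(). The names fetch_pages() and run_fetch_pages() and the URLs are made up for illustration, and actually running it requires network access.

```hack
<?hh

// Hypothetical helper: fetch two URLs at the same time.
async function fetch_pages(string $first, string $second)
  : Awaitable<Vector<string>> {
  // Both requests are created before either is awaited, so the
  // network waits overlap instead of adding up.
  return await \HH\Asio\v(array(
    \HH\Asio\curl_exec($first),
    \HH\Asio\curl_exec($second),
  ));
}

// Hypothetical driver: prints the size of each response body.
function run_fetch_pages(): void {
  $pages = \HH\Asio\join(
    fetch_pages("http://example.com", "http://example.org")
  );
  var_dump(strlen($pages[0]), strlen($pages[1]));
}
```

Calling run_fetch_pages() from the global scope would start both transfers and block until both have finished.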

Do Not Use Async in Loops

If you only remember one rule, remember this:

** DON'T await IN A LOOP **

It totally defeats the purpose of async.

<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\Loop;

class User {
  public string $name;

  protected function __construct(string $name) { $this->name = $name; }

  static function get_name(int $id): User {
    return new User(str_shuffle("ABCDEFGHIJ") . strval($id));
  }
}

async function load_user(int $id): Awaitable<User> {
  // Load user from somewhere (e.g., database).
  // Fake it for now
  return User::get_name($id);
}

async function load_users_await_loop(array<int> $ids): Awaitable<Vector<User>> {
  $result = Vector {};
  foreach ($ids as $id) {
    $result[] = await load_user($id);
  }
  return $result;
}

function runMe(): void {
  $ids = array(1, 2, 5, 99, 332);
  $result = \HH\Asio\join(load_users_await_loop($ids));
  var_dump($result[4]->name);
}

runMe();
Output
string(13) "JFHBIAEDGC332"

In the above example, the loop causes two problems:

  1. It makes the loop iterations the limiting factor on how this code runs. Because each iteration awaits before the next one starts, the users are guaranteed to load sequentially.
  2. It creates false dependencies. Loading one user does not depend on loading another user.

Instead, you will want to use our async-aware mapping function, vm().

<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\NoLoop;

class User {
  public string $name;

  protected function __construct(string $name) { $this->name = $name; }

  static function get_name(int $id): User {
    return new User(str_shuffle("ABCDEFGHIJ") . strval($id));
  }
}

async function load_user(int $id): Awaitable<User> {
  // Load user from somewhere (e.g., database).
  // Fake it for now
  return User::get_name($id);
}

async function load_users_no_loop(array<int> $ids): Awaitable<Vector<User>> {
  return await \HH\Asio\vm(
    $ids,
    fun('\Hack\UserDocumentation\Async\Guidelines\Examples\NoLoop\load_user')
  );
}

function runMe(): void {
  $ids = array(1, 2, 5, 99, 332);
  $result = \HH\Asio\join(load_users_no_loop($ids));
  var_dump($result[4]->name);
}

runMe();
Output
string(13) "AJBIHCDGFE332"
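
Because the fake load_user() above never actually waits on anything, both versions behave the same when you run them. The difference only shows up once each call has real latency. The following sketch simulates that latency, assuming \HH\Asio\usleep() is available in your HHVM build; slow_double(), the 100 ms delay and the other helper names are invented for this illustration.

```hack
<?hh

// Hypothetical slow operation: about 100 ms of simulated latency.
async function slow_double(int $n): Awaitable<int> {
  await \HH\Asio\usleep(100000);
  return $n * 2;
}

// Awaits inside the loop: each call waits for the previous one.
async function doubles_in_loop(Vector<int> $nums): Awaitable<Vector<int>> {
  $out = Vector {};
  foreach ($nums as $n) {
    $out[] = await slow_double($n);
  }
  return $out;
}

// Starts every call first, then waits for all of them together.
async function doubles_with_vm(Vector<int> $nums): Awaitable<Vector<int>> {
  return await \HH\Asio\vm($nums, $n ==> slow_double($n));
}

function compare_timings(): void {
  $nums = Vector {1, 2, 3, 4, 5};

  $start = microtime(true);
  \HH\Asio\join(doubles_in_loop($nums));
  $loop_time = microtime(true) - $start;

  $start = microtime(true);
  \HH\Asio\join(doubles_with_vm($nums));
  $vm_time = microtime(true) - $start;

  // The loop pays the latency once per element; vm() overlaps the waits.
  var_dump($vm_time < $loop_time);
}

compare_timings();
```

With five elements you would expect the loop version to take roughly five times the simulated latency and the vm() version roughly one, so the comparison should come out true.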

Considering Data Dependencies Is Important

Possibly the most important aspect of learning how to structure async code is understanding data dependency patterns. Here is the general flow for making sure your async code respects its data dependencies:

  1. Put each sequence of dependencies with no branching (chain) into its own async function.
  2. Put each bundle of parallel chains into its own async function.
  3. Repeat to see if there are further reductions.

Let's say we are getting blog posts of an author. This would involve the following steps:

  1. Get the post ids for an author.
  2. Get the post text for each post id.
  3. Get comment count for each post id.
  4. Generate the final page of information.
<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\DataDependencies;

// Note: v() and vm(), used below, come from asio-utilities
class PostData {
  // using constructor argument promotion
  public function __construct(public string $text) {}
}

async function fetch_all_post_ids_for_author(int $author_id)
  : Awaitable<array<int>> {

  // Query database, etc., but for now, just return made up stuff
  return array(4, 53, 99);
}

async function fetch_post_data(int $post_id): Awaitable<PostData> {
  // Query database, etc. but for now, return something random
  return new PostData(str_shuffle("ABCDEFGHIJKLMNOPQRSTUVWXYZ"));
}

async function fetch_comment_count(int $post_id): Awaitable<int> {
  // Query database, etc., but for now, return something random
  return rand(0, 50);
}

async function fetch_page_data(int $author_id)
  : Awaitable<Vector<(PostData, int)>> {

  $all_post_ids = await fetch_all_post_ids_for_author($author_id);
  // An async closure that will turn a post ID into a tuple of
  // post data and comment count
  $post_fetcher = async function(int $post_id): Awaitable<(PostData, int)> {
    list($post_data, $comment_count) =
      await \HH\Asio\v(array(
        fetch_post_data($post_id),
        fetch_comment_count($post_id),
      ));
    /* The problem is that v takes Traversable<Awaitable<T>> and returns
     * Awaitable<Vector<T>>, but there isn't a good value of T that represents
     * both ints and PostData, so they're currently almost a union type.
     *
     * Now we need to tell the typechecker what's going on.
     * In the future, we plan to add HH\Asio\va() - VarArgs - to support this.
     * This will have a type signature that varies depending on the number of
     * arguments, for example:
     *
     *  - va(Awaitable<T1>, Awaitable<T2>): Awaitable<(T1, T2)>
     *  - va(Awaitable<T1>,
     *       Awaitable<T2>,
     *       Awaitable<T3>): Awaitable<(T1, T2, T3)>
     *
     * And so on, with no need for T1, T2, ... Tn to be related types.
     */
    invariant($post_data instanceof PostData, "This is good");
    invariant(is_int($comment_count), "This is good");
    return tuple($post_data, $comment_count);
  };

  // Transform the array of post IDs into an array of results,
  // using the vm() function from asio-utilities
  return await \HH\Asio\vm($all_post_ids, $post_fetcher);
}

async function generate_page(int $author_id): Awaitable<string> {
  $tuples = await fetch_page_data($author_id);
  $page = "";
  foreach ($tuples as $tuple) {
    list($post_data, $comment_count) = $tuple;
    // Normally render the data into HTML, but for now, just create a
    // normal string
    $page .= $post_data->text . " " . $comment_count . PHP_EOL;
  }
  return $page;
}

$page = \HH\Asio\join(generate_page(13324)); // just made up a user id
var_dump($page);
Output
string(89) "AGEDMJQTFIVSCPHKLURWXNOZBY 9
ALSJURTKYIFBQMHXPNVWCDGZOE 25
GFMEYPITXDBORLVCKNAWJSUZQH 10
"

The above example follows our flow:

  1. One function for each fetch operation (post ids, post text, comment count).
  2. One function for the bundle of data operations (post text and comment count).
  3. One top function that coordinates everything.

Consider Batching

Wait handles can be rescheduled. This means that a wait handle is sent back to the scheduler's queue, where it waits until other awaitables have run. Batching can be a good use of rescheduling. For example, say you have a high-latency data lookup, but you can send multiple keys for the lookup in a single request.

<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\Batching;

// Note: later() and v(), used below, come from asio-utilities
async function b_one(string $key): Awaitable<string> {
  $subkey = await Batcher::lookup($key);
  return await Batcher::lookup($subkey);
}

async function b_two(string $key): Awaitable<string> {
  return await Batcher::lookup($key);
}

async function batching(): Awaitable<void> {
  $results = await \HH\Asio\v(array(b_one('hello'), b_two('world')));
  echo $results[0] . PHP_EOL;
  echo $results[1];
}

\HH\Asio\join(batching());

class Batcher {
  private static array<string> $pendingKeys = array();
  private static ?Awaitable<array<string, string>> $aw = null;

  public static async function lookup(string $key): Awaitable<string> {
    // Add this key to the pending batch
    self::$pendingKeys[] = $key;
    // If there's no awaitable about to start, create a new one
    if (self::$aw === null) {
      self::$aw = self::go();
    }
    // Wait for the batch to complete, and get our result from it
    $results = await self::$aw;
    return $results[$key];
  }

  private static async function go(): Awaitable<array<string, string>> {
    // Let other awaitables get into this batch
    await \HH\Asio\later();
    // Now this batch has started; clear the shared state
    $keys = self::$pendingKeys;
    self::$pendingKeys = array();
    self::$aw = null;
    // Do the multi-key roundtrip
    return await multi_key_lookup($keys);
  }
}

async function multi_key_lookup(array<string> $keys)
  : Awaitable<array<string, string>> {

  // lookup multiple keys, but, for now, return something random
  $r = array();
  foreach ($keys as $key) {
    $r[$key] = str_shuffle("ABCDEF");
  }
  return $r;
}
Output
BEACFD
FDCEBA

In the example above, we reduce the number of roundtrips to the data server from three to two by batching the first lookup in b_one() with the lookup in b_two(). The Batcher::lookup() function is what enables this reduction.

The await HH\Asio\later() in Batcher::go() basically allows Batcher::go() to be deferred until other pending awaitables have run.

So, await HH\Asio\v(array(b_one..., b_two...)); has two pending awaitables. If b_one() is called first, it calls Batcher::lookup(), which calls Batcher::go(), which reschedules via later(). Then HHVM looks for other pending awaitables. b_two() is also pending. It calls Batcher::lookup() and then it gets suspended via await self::$aw because Batcher::$aw is not null any longer. Now Batcher::go() resumes, fetches and returns the result.

Don't Forget to Await an Awaitable

What do you think happens here?

<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\ForgetAwait;

async function speak(): Awaitable<void> {
  echo "one";
  await \HH\Asio\later();
  echo "two";
  echo "three";
}

async function forget_await(): Awaitable<void> {
  $handle = speak(); // This just gets you the handle
}

forget_await();
Output
one

The answer is undefined. You might get all three echoes. You might only get the first echo. You might get nothing at all. The only way to guarantee that speak() will run to completion is to await it. await is the trigger to the async scheduler that allows HHVM to appropriately suspend and resume speak(); otherwise, the async scheduler will provide no guarantees with respect to speak().
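
For comparison, here is a sketch of the same code with the awaitable actually awaited. The name remember_await() is made up for this illustration; join() is used at the top level because await is only available inside an async function.

```hack
<?hh

async function speak(): Awaitable<void> {
  echo "one";
  await \HH\Asio\later();
  echo "two";
  echo "three";
}

async function remember_await(): Awaitable<void> {
  // Awaiting the handle guarantees speak() runs to completion
  // before remember_await() itself finishes.
  await speak();
}

// join() blocks until remember_await(), and therefore speak(), is done.
\HH\Asio\join(remember_await());
```

This version prints all three words, in order, every time.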

Minimize Undesired Side Effects

In order to minimize any unwanted side effects (e.g., ordering disparities), you should create and await an awaitable as close together as possible.

<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\SideEffects;

async function get_curl_data(string $url): Awaitable<string> {
  return await \HH\Asio\curl_exec($url);
}

function possible_side_effects(): int {
  sleep(1);
  echo "Output buffer stuff";
  return 4;
}

async function proximity(): Awaitable<void> {
  $handle = get_curl_data("http://example.com");
  possible_side_effects();
  await $handle; // instead you should await get_curl_data("....") here
}

\HH\Asio\join(proximity());
Output
Output buffer stuff

In the above example, possible_side_effects() could cause some undesired behavior when you get to the point of awaiting the handle associated with getting the data from the website.

Basically, don't depend on the order of output between runs of the same code. That is, don't write async code where ordering is important; instead, express dependencies via awaitables and await.
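
A minimal sketch of the recommended shape follows. To keep it runnable without network access, it uses a hypothetical fake_fetch() in place of the cURL call; proximity_fixed() is likewise an invented name.

```hack
<?hh

// Hypothetical stand-in for a real fetch such as \HH\Asio\curl_exec().
async function fake_fetch(string $url): Awaitable<string> {
  await \HH\Asio\later();
  return "body of " . $url;
}

function possible_side_effects(): int {
  echo "Output buffer stuff" . PHP_EOL;
  return 4;
}

async function proximity_fixed(): Awaitable<string> {
  possible_side_effects();
  // The awaitable is created and awaited in one step, so no other
  // code runs between starting the fetch and waiting for it.
  return await fake_fetch("http://example.com");
}

var_dump(\HH\Asio\join(proximity_fixed()));
```

Here the side effects have finished before the fetch is even created, so there is no window in which the two can interleave.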

Memoization May Be Good, but Only of Awaitables

Given that async is commonly used in operations that are time-consuming, memoizing (i.e., caching) the result of an async call can definitely be worthwhile.

The <<__Memoize>> attribute does the right thing. So, if you can, use that. However, if you need explicit control of the memoization, make sure you memoize the awaitable and not the result of awaiting it.

<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\MemoizeResult;

async function time_consuming(): Awaitable<string> {
  sleep(5);
  return "This really is not time consuming, but the sleep fakes it.";
}

async function memoize_result(): Awaitable<string> {
  static $result = null;
  if ($result === null) {
    $result = await time_consuming(); // don't memoize the resulting data
  }
  return $result;
}

function runMe(): void {
  $t1 = microtime(true); // pass true to get a float, so subtraction works
  \HH\Asio\join(memoize_result());
  $t2 = microtime(true) - $t1;
  $t3 = microtime(true);
  \HH\Asio\join(memoize_result());
  $t4 = microtime(true) - $t3;
  var_dump($t4 < $t2); // The memoized result will get here a lot faster
}

runMe();
Output
bool(true)

On the surface, this seems reasonable. We want to cache the actual data associated with the awaitable. However, this can cause an undesired race condition.

Imagine that there are two other async functions awaiting the result of memoize_result(), call them A() and B(). The following sequence of events can happen:

  1. A() gets to run, and awaits memoize_result().
  2. memoize_result() finds that the memoization cache is empty ($result is null), so it awaits time_consuming(). It gets suspended.
  3. B() gets to run, and awaits memoize_result(). Note that this is a new awaitable; it’s not the same awaitable as in 1.
  4. memoize_result() again finds that the memoization cache is empty, so it awaits time_consuming() again. Now the time-consuming operation will be done twice.

If time_consuming() has side effects (e.g. a database write), then this could end up being a serious bug. Even if there are no side effects, it’s still a bug; the time-consuming operation is being done multiple times when it only needs to be done once.
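
You can observe the duplicated work with a small counter. This is only a sketch: CallCounter, counted_lookup() and memoize_counted_result() are hypothetical names, and later() stands in for a real time-consuming operation so that the function actually suspends.

```hack
<?hh

class CallCounter {
  public static int $calls = 0;
}

// Hypothetical expensive operation; counts how often it is started.
async function counted_lookup(): Awaitable<string> {
  CallCounter::$calls++;
  await \HH\Asio\later(); // forces a suspension, like real I/O would
  return "data";
}

// Same pattern as memoize_result(): caches the awaited data.
async function memoize_counted_result(): Awaitable<string> {
  static $result = null;
  if ($result === null) {
    $result = await counted_lookup();
  }
  return $result;
}

function run_race(): void {
  // Two callers ask for the value before the first lookup has finished.
  \HH\Asio\join(\HH\Asio\v(array(
    memoize_counted_result(),
    memoize_counted_result(),
  )));
  var_dump(CallCounter::$calls);
}

run_race();
```

The second caller arrives while $result is still null, so the counter ends at 2 rather than 1.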

Instead, memoize the awaitable:

<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\MemoizeAwaitable;

async function time_consuming(): Awaitable<string> {
  sleep(5);
  return "Not really time consuming but sleep."; // For type-checking purposes
}

function memoize_handle(): Awaitable<string> {
  static $handle = null;
  if ($handle === null) {
    $handle = time_consuming(); // memoize the awaitable
  }
  return $handle;
}

function runMe(): void {
  $t1 = microtime(true); // pass true to get a float, so subtraction works
  \HH\Asio\join(memoize_handle());
  $t2 = microtime(true) - $t1;
  $t3 = microtime(true);
  \HH\Asio\join(memoize_handle());
  $t4 = microtime(true) - $t3;
  var_dump($t4 < $t2); // The memoized result will get here a lot faster
}

runMe();
Output
bool(true)

This simply caches the handle and returns it verbatim; see Async Vs Awaitable for more detail.

This would also work if it were an async function that awaited the handle after caching. This may seem unintuitive, because the function awaits every time it’s executed, even on the cache-hit path. But that’s fine: on every execution except the first, $handle is not null, so a new instance of time_consuming() will not be started. The result of the one existing instance will be shared.
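
A sketch of that async variant, with a counter added to show that only one instance of the slow operation is started. HandleCounter, counted_fetch() and memoize_handle_async() are hypothetical names, and later() stands in for the real work.

```hack
<?hh

class HandleCounter {
  public static int $calls = 0;
}

// Hypothetical expensive operation; counts how often it is started.
async function counted_fetch(): Awaitable<string> {
  HandleCounter::$calls++;
  await \HH\Asio\later();
  return "data";
}

// Caches the awaitable itself, then awaits it on every call.
async function memoize_handle_async(): Awaitable<string> {
  static $handle = null;
  if ($handle === null) {
    $handle = counted_fetch(); // started once; no await before caching
  }
  return await $handle;
}

function run_shared(): void {
  \HH\Asio\join(\HH\Asio\v(array(
    memoize_handle_async(),
    memoize_handle_async(),
  )));
  var_dump(HandleCounter::$calls);
}

run_shared();
```

Both callers await the same cached handle, so the counter ends at 1.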

Either approach works, but the non-async caching wrapper can be easier to reason about.
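
For the simple case mentioned at the start of this section, the <<__Memoize>> attribute removes the need for a hand-written cache. The sketch below uses hypothetical names (MemoCounter, memoized_lookup()) and the same counter trick to check how many times the body runs.

```hack
<?hh

class MemoCounter {
  public static int $calls = 0;
}

// Hypothetical expensive operation, memoized by the runtime.
<<__Memoize>>
async function memoized_lookup(): Awaitable<string> {
  MemoCounter::$calls++;
  await \HH\Asio\later(); // stands in for real time-consuming work
  return "data";
}

function run_memoized(): void {
  \HH\Asio\join(\HH\Asio\v(array(
    memoized_lookup(),
    memoized_lookup(),
  )));
  var_dump(MemoCounter::$calls);
}

run_memoized();
```

If the attribute caches the awaitable as described, the function body should run only once even though it is called twice.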

Use Lambdas Where Possible

Lambdas can cut down on code verbosity that comes with writing full closure syntax. They are quite useful in conjunction with the async utility helpers.

For example, compare the following three ways of accomplishing the same thing; the lambda version is the shortest.

<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\Lambdas;

// Note: vm(), used below, comes from asio-utilities (installed via composer)
async function fourth_root(num $n): Awaitable<float> {
  return sqrt(sqrt($n));
}

async function normal_call(): Awaitable<Vector<float>> {
  $nums = Vector {64, 81};
  return await \HH\Asio\vm(
    $nums,
    fun('\Hack\UserDocumentation\Async\Guidelines\Examples\Lambdas\fourth_root')
  );
}

async function closure_call(): Awaitable<Vector<float>> {
  $nums = Vector {64, 81};
  $froots = async function(num $n): Awaitable<float> {
    return sqrt(sqrt($n));
  };
  return await \HH\Asio\vm($nums, $froots);
}

async function lambda_call(): Awaitable<Vector<float>> {
  $nums = Vector {64, 81};
  return await \HH\Asio\vm($nums, async $num ==> sqrt(sqrt($num)));
}

async function use_lambdas(): Awaitable<void> {
  $nc = await normal_call();
  $cc = await closure_call();
  $lc = await lambda_call();
  var_dump($nc);
  var_dump($cc);
  var_dump($lc);
}

\HH\Asio\join(use_lambdas());
Output
object(HH\Vector)#8 (2) {
  [0]=>
  float(2.8284271247462)
  [1]=>
  float(3)
}
object(HH\Vector)#16 (2) {
  [0]=>
  float(2.8284271247462)
  [1]=>
  float(3)
}
object(HH\Vector)#24 (2) {
  [0]=>
  float(2.8284271247462)
  [1]=>
  float(3)
}

Use join in Non-async Functions

Imagine you are calling an async function join_async() from a non-async scope. Because you cannot await there, you must use join() to get the result from the awaitable.

<?hh

namespace Hack\UserDocumentation\Async\Guidelines\Examples\Join;

async function join_async(): Awaitable<string> {
  return "Hello";
}

// In an async function, you would await an awaitable.
// In a non-async function, or the global scope, you can
// use `join` to force the awaitable to run to its completion.
$s = \HH\Asio\join(join_async());
var_dump($s);
Output
string(5) "Hello"

This scenario normally occurs in the global scope (but can occur anywhere).

Remember Async Is NOT Multi-threading

Async functions do not run at the same time. They share the CPU by switching at changes in wait state in the executing code (i.e., cooperative multitasking). Async still lives in the single-threaded world of normal PHP and Hack.
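
The following sketch makes the switching visible. worker() and run_workers() are hypothetical names; each worker gives up the CPU only at its own await, which is where the scheduler is free to run something else.

```hack
<?hh

async function worker(string $name): Awaitable<void> {
  echo $name . " start" . PHP_EOL;
  // Yield voluntarily so that other pending awaitables can run.
  await \HH\Asio\later();
  echo $name . " end" . PHP_EOL;
}

function run_workers(): void {
  \HH\Asio\join(\HH\Asio\v(array(worker("A"), worker("B"))));
}

run_workers();
```

Both "start" lines are printed before either "end" line: each worker runs until its await, then another gets a turn. At no point are two workers executing simultaneously.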

await Is Not an Expression

You can use await in three places:

  1. As a statement by itself (e.g., await func())
  2. On the right-hand side (RHS) of an assignment (e.g., $r = await func())
  3. As an argument to return (e.g., return await func())

You cannot, for example, use await in var_dump().
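
Under these rules, the usual workaround is to assign the awaited value to a variable first. A minimal sketch, where func() and show() are placeholder names:

```hack
<?hh

async function func(): Awaitable<string> {
  return "Hello";
}

async function show(): Awaitable<void> {
  // Not allowed under the rules above: var_dump(await func());
  // Instead, await on the right-hand side of an assignment...
  $r = await func();
  // ...and then use the plain value.
  var_dump($r);
}

\HH\Asio\join(show());
```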