Collections: Introduction

In addition to fully supporting the one-size-fits-all PHP array container type, Hack allows additional typing to be placed on arrays, and provides a number of classes which implement more specialized collection patterns.

Array Typing

Arrays in Hack behave similarly to Generics in that additional type information may be provided about how they are specialized. For example, an unindexed array of strings takes the form array<string>. Similarly, for indexed arrays, the key type (int or string) may be specified first, followed by the value type. So a dictionary with string keys and floating point values might look like array<string, float>.

Since the value side of an array may itself be an array (or a Generic class), type specialization may be nested as with array<int, array<string, string>> which is a numerically indexed array containing string dictionaries.

<?hh

namespace Hack\UserDocumentation\Collections\Intro\Examples\Arr;

// array<string, string> is an array map with string keys and string values
function array_as_map(array<string, string> $arr): string {
  $r = substr(str_shuffle('ABCDEF'), 0, 1); // random letter
  return array_key_exists($r, $arr) ? $arr[$r] : 'Z';
}

// array<int> is an array vector with integer keys and integer values
function array_as_vector(array<int> $arr): int {
  $r = rand(0, 10);
  return array_key_exists($r, $arr) ? $arr[$r] : PHP_INT_MAX;
}

function run(): void {
  $v = array(100, 200, 300, 400);
  $v[] = 500; // index 5, value 500
  var_dump($v);
  var_dump(array_as_vector($v));

  $m = array('A' => 'California', 'B' => 'Oregon', 'C' => 'North Carolina');
  $m['D'] = 'Florida';
  var_dump($m);
  var_dump(array_as_map($m));
}

run();
Output
array(5) {
  [0]=>
  int(100)
  [1]=>
  int(200)
  [2]=>
  int(300)
  [3]=>
  int(400)
  [4]=>
  int(500)
}
int(9223372036854775807)
array(4) {
  ["A"]=>
  string(10) "California"
  ["B"]=>
  string(6) "Oregon"
  ["C"]=>
  string(14) "North Carolina"
  ["D"]=>
  string(7) "Florida"
}
string(7) "Florida"

Hack Collections

While PHP Arrays are extremely versatile, that flexibility occaisionally comes at a cost either in terms of performance, correctness, or readability. Hack Collection classes seek to resolve those issues by providing deeply specialized object versions of arrays in the form of common container patterns: Vector, Map, and Set.

Type of Collections

There are seven collections in Hack:

Type Description
Vector Mutable, integer-indexed, ordered sequence of values. Values can be of any type. The indicies start at 0 and end at n-1, where n is the number of elements.
ImmVector An immutable version of Vector. Once the ImmVector is created, elements cannot be changed, removed or added.
Map Mutable, string or integer-indexed, ordered sequence of values. Values can be of any type. Order is remembered. This is most similar to the array in usage.
ImmMap An immutable version of Map. Once the ImmMap is created, elements cannot be changed, removed or added.
Set Mutable, ordered set of unique values. The values can be only int or string. There are no keys in a Set.
ImmSet An immutable version of Set. Once the ImmSet is created, elements cannot be changed, removed or added.
Pair An immutable sequence of exactly two values. The keys are 0 and 1. They are similar to tuples, but less flexible.

Readability

Imagine a 3rd party function declared like:

function foo(array<int, string> $arr): void {}

Is this array a vector-style array with sequential integer keys? Or maybe a map-style array with potentially non-sequential integer keys? Are the values in this array unique?

Now imagine this function declaration:

function bar(Vector<string> $vec): void {}

By looking at the function signature, we know those integer keys are ordered and sequential. That the actual key number is probably much less relevant that the thing its holding.

Similarly, if the function declaration were:

function bar(Set<string> $set): void {}

Then we realize that those integer keys were irrelevant, and that the strings contained in this parameter will all have unique values.

Performance

The obvious benefit in Vectors and Sets are that the keys can be ignored by the runtime. For Vectors that means layouting out the values contiguously and having fast lookup and iteration based on index. In the case of Sets it actually means (as an implementation detail) turning the internal structure inside out, using the values as keys (which must be unique), and ignoring the other half of the array pair. This is a pattern already available to PHP code, but it removes the cognitive overhead of having to remember that the array is inside out, and does it in a more efficient way than script code is able to do.

Here is the typical-case amortized time complexity for operations on the key collection classes and array:

Class Access Iteration Insert Removal
Vector O(1) O(n) O(n) O(n)
Map O(1) O(n) O(1) O(1)
Set O(1) O(n) O(1) O(1)
Pair O(1) O(1) N/A N/A
array O(1) O(n) O(1) O(1)

Type Checking

Related to readability, the Hack typechecker cannot tell, for example, whether the context of your code is passing an array used like a vector to a function with a parameter of an array used like a map. For example, the following code passes the typechecker.

<?hh

namespace Hack\UserDocumentation\Collections\Intro\Examples\ArrTypeCheck;

// This is a array used like a map, but it can be passed an array used like
// a vector.
function array_as_map(array<int, int> $arr): int {
  $r = rand(0, 10); // random letter
  return array_key_exists($r, $arr) ? $arr[$r] : PHP_INT_MAX;
}

function run(): void {
  $v = array(100, 200, 300, 400);
  $v[] = 500; // index 5, value 500
  var_dump($v);
  var_dump(array_as_map($v));
}

run();
Output
array(5) {
  [0]=>
  int(100)
  [1]=>
  int(200)
  [2]=>
  int(300)
  [3]=>
  int(400)
  [4]=>
  int(500)
}
int(9223372036854775807)

However, if you take a similar style of code, but instead use Vector and Map, the typechecker can easily tell the intent.

<?hh

namespace Hack\UserDocumentation\Collections\Intro\Examples\MapTypeCheck;

// array<string, string> is an array map with string keys and string values
function array_as_map(Map<int, int> $arr): int {
  $r = rand(0, 10); // random letter
  return array_key_exists($r, $arr) ? $arr[$r] : -1;
}

function run(): void {
  $v = Vector { 100, 200, 300, 400 };
  $v[] = 500; // index 5, value 500
  var_dump($v);
  // The call to array_as_map will not typecheck because you are trying to pass
  // a Vector into a function expecting a Map. You will also get a runtime
  // error as well trying to do this.
  var_dump(array_as_map($v));
}

run();
Output
object(HH\Vector)#1 (5) {
  [0]=>
  int(100)
  [1]=>
  int(200)
  [2]=>
  int(300)
  [3]=>
  int(400)
  [4]=>
  int(500)
}

Catchable fatal error: Argument 1 passed to Hack\UserDocumentation\Collections\Intro\Examples\MapTypeCheck\array_as_map() must be an instance of HH\Map, HH\Vector given in /data/users/joelm/user-documentation/guides/hack/23-collections/01-introduction-examples/map-typecheck.php.type-errors on line 9

Using Hack collections gives the typechecker more data to work with in trying to decide whether you have typing problems in your code.