Collections: Semantics

In general, Hack collections should be your first choice when deciding between them and arrays for new code. They provide the readability, performance and type-checkability needed, without sacrificing much in terms of flexibility.

That said, there is one key area where you must be cognizant of the differences between collections and arrays.

Reference Semantics

Hack collections have reference semantics. This means that a collection is treated like an object: assigning it to another variable or passing it to a function does not copy it, so a modification made through one variable is visible through every other variable that refers to the same collection.

Arrays have value semantics. Thus, a modification to an array has no effect on any array that was assigned or otherwise copied from it.

<?hh

namespace Hack\UserDocumentation\Collections\Semantics\Examples\RefVal;

function foo(Vector<int> $vec): void {
  $vec[1] = 500;
  var_dump($vec);
}

function bar(array<int> $arr): void {
  $arr[1] = 500;
  var_dump($arr);
}

function reference_semantics(): void {
  $vec = Vector {1, 2, 3};
  var_dump($vec);
  $cp_vec = $vec;
  var_dump($cp_vec); // The two vectors are the same reference
  $vec[0] = 100; // $cp_vec is also affected by the change. They are the same.
  var_dump($vec);
  var_dump($cp_vec);
  foo($vec); // $vec will be affected by anything foo does to it.
  var_dump($vec);
}

function value_semantics(): void {
  $arr = array (1, 2, 3);
  var_dump($arr);
  $cp_arr = $arr;
  var_dump($cp_arr); // The two arrays have the same values, but are copies.
  $arr[0] = 100; // $cp_arr is not affected by this
  var_dump($arr);
  var_dump($cp_arr);
  bar($arr); // $arr is not affected by anything bar does to it
  var_dump($arr);
}

function run(): void {
  echo "--- REFERENCE SEMANTICS ---\n\n";
  reference_semantics();
  echo "\n\n--- VALUE SEMANTICS ---\n\n";
  value_semantics();
}

run();
Output
--- REFERENCE SEMANTICS ---

object(HH\Vector)#1 (3) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
}
object(HH\Vector)#1 (3) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
}
object(HH\Vector)#1 (3) {
  [0]=>
  int(100)
  [1]=>
  int(2)
  [2]=>
  int(3)
}
object(HH\Vector)#1 (3) {
  [0]=>
  int(100)
  [1]=>
  int(2)
  [2]=>
  int(3)
}
object(HH\Vector)#1 (3) {
  [0]=>
  int(100)
  [1]=>
  int(500)
  [2]=>
  int(3)
}
object(HH\Vector)#1 (3) {
  [0]=>
  int(100)
  [1]=>
  int(500)
  [2]=>
  int(3)
}


--- VALUE SEMANTICS ---

array(3) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
}
array(3) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
}
array(3) {
  [0]=>
  int(100)
  [1]=>
  int(2)
  [2]=>
  int(3)
}
array(3) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
}
array(3) {
  [0]=>
  int(100)
  [1]=>
  int(500)
  [2]=>
  int(3)
}
array(3) {
  [0]=>
  int(100)
  [1]=>
  int(2)
  [2]=>
  int(3)
}

The above example shows the difference between reference and value semantics. The difference holds across function calls as well.
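When you do want an independent collection, you have to copy it explicitly. The following is a minimal sketch rather than part of the original examples: the function name copy_demo is hypothetical, and it relies only on the Vector constructor accepting an existing collection.

```hack
<?hh

// Hypothetical demo function, not part of the original examples.
function copy_demo(): void {
  $vec = Vector {1, 2, 3};
  $alias = $vec;            // plain assignment: same object as $vec
  $copy = new Vector($vec); // explicit copy: new object, same values

  $vec[0] = 100;

  var_dump($alias[0]); // int(100), the alias sees the change
  var_dump($copy[0]);  // int(1), the copy does not
}

copy_demo();
```

Plain assignment never copies a collection, so the explicit copy is the only one of the two that is isolated from later changes to $vec.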

Converting Arrays to Collections

The fact that arrays have value semantics and collections have reference semantics matters a great deal when converting existing array-based code to collections.

<?hh

namespace Hack\UserDocumentation\Collections\Semantics\Examples\Converting;

function foo_with_vector(Vector<int> $vec): void {
  $vec[] = 5;
}

function foo_with_array(array<int> $arr): void {
  $arr[] = 5;
}

function run(): void {
  $arr = array (1, 2, 3);
  foo_with_array($arr);
  $arr[] = 4; // The call to foo_with_array did not affect this $arr.
  var_dump($arr);

  // Many would expect the same sequence of code to work the same
  $vec = Vector {1, 2, 3};
  foo_with_vector($vec);
  $vec[] = 4; // Uh oh, reference semantics at work. foo_with_vector affects us
  var_dump($vec);
}

run();
Output
array(4) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
  [3]=>
  int(4)
}
object(HH\Vector)#1 (5) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
  [3]=>
  int(5)
  [4]=>
  int(4)
}

So if you used an automated tool to convert every array to a Vector, your code could break as shown in the example above.

One way to help remedy this is to use ImmVector and Vector::immutable() so that the function cannot modify the collection you pass to it.

<?hh

namespace Hack\UserDocumentation\Collections\Semantics\Examples\ConvertingImm;

function foo_with_vector(ImmVector<int> $vec): void {
  try {
    // The type checker actually won't allow this, but you can run this
    // and catch the exception
    $vec[] = 5;
  } catch (\InvalidOperationException $ex) {
    echo "Cannot modify immutable collection. Create copy first\n";
    $cp_vec = new Vector($vec);
    $cp_vec[] = 5;
    var_dump($cp_vec);
  }
}

function foo_with_array(array<int> $arr): void {
  $arr[] = 5;
}

function run(): void {
  $arr = array (1, 2, 3);
  foo_with_array($arr);
  $arr[] = 4; // The call to foo_with_array did not affect this $arr.
  var_dump($arr);

  $vec = Vector {1, 2, 3};
  // Now any change in foo_with_vector won't affect this $vec
  foo_with_vector($vec->immutable());
  $vec[] = 4;
  var_dump($vec);
}

run();
Output
array(4) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
  [3]=>
  int(4)
}
Cannot modify immutable collection. Create copy first
object(HH\Vector)#4 (4) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
  [3]=>
  int(5)
}
object(HH\Vector)#1 (4) {
  [0]=>
  int(1)
  [1]=>
  int(2)
  [2]=>
  int(3)
  [3]=>
  int(4)
}
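If you would rather avoid the exception path entirely, a variant that should also satisfy the type checker might look like the sketch below. The function name append_five is hypothetical. It copies the immutable input with new Vector(), as the catch block above does, and returns the copy instead of modifying its argument.

```hack
<?hh

// Hypothetical helper: never touches its argument, returns a modified copy.
function append_five(ImmVector<int> $vec): Vector<int> {
  $copy = new Vector($vec); // mutable copy of the immutable input
  $copy[] = 5;
  return $copy;
}

function immutable_copy_demo(): void {
  $vec = Vector {1, 2, 3};
  $result = append_five($vec->immutable());
  $vec[] = 4;

  var_dump($vec);    // holds 1, 2, 3, 4
  var_dump($result); // holds 1, 2, 3, 5
}

immutable_copy_demo();
```

Returning the new Vector makes the data flow explicit, which is closer to how the array version of the code behaved.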

Equality on Collections ==

Collections can be compared for equality.

$coll1 == $coll2;

Here are the rules:

  1. Are the two collections the same type of collection (mutability ignored)? If not, then equality is false.
  2. Do the two collections have the same number of values (or keys for maps)? If not, then equality is false.
  3. For vectors and pairs, is the value at each index equal via ==? If not, then equality is false; otherwise, equality is true.
  4. For sets, is every value in one contained in the other? If not, then equality is false; otherwise, equality is true.
  5. For maps, does every key in one exist in the other via ===? If not, then equality is false. If so, do the identical keys map to equal values in each collection via ==? If not, then equality is false; otherwise equality is true.
<?hh

namespace Hack\UserDocumentation\Collections\Semantics\Examples\Equality;

function run(): void {
  $vecA = Vector {1, 2, 3};
  $vecB = Vector {1, 2, 3};
  $vecC = Vector {4, 5, 6};
  $vecD = Vector {2, 1, 3};
  $setA = Set {1, 2, 3};
  $setB = Set {3, 2, 1};
  $mapA = Map {1 => 'A', 2 => 'B'};
  $mapB = Map {2 => 'B', 1 => 'A'};

  var_dump($vecA == $vecB); // true
  var_dump($vecA == $vecC); // false, different values
  var_dump($vecA == $vecD); // false, same values, but different order
  var_dump($setA == $setB); // true, same values, order doesn't matter
  var_dump($mapA == $mapB); // true, ordering of keys doesn't matter
}

run();
Output
bool(true)
bool(false)
bool(false)
bool(true)
bool(true)
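The example above exercises rules 3 through 5. As a further sketch (the function name equality_rules_demo is hypothetical), the comparisons below target rules 1, 2 and 5. The comments give the result each rule predicts as stated above; they are not captured program output.

```hack
<?hh

// Hypothetical demo function, not part of the original examples.
function equality_rules_demo(): void {
  // Rule 1: mutability is ignored, but the kind of collection is not.
  var_dump(Vector {1, 2, 3} == ImmVector {1, 2, 3}); // rule 1 predicts true
  var_dump(Vector {1, 2, 3} == Set {1, 2, 3});       // rule 1 predicts false

  // Rule 2: the number of values must match.
  var_dump(Vector {1, 2} == Vector {1, 2, 3});       // rule 2 predicts false

  // Rule 5: keys are matched with ===, so int 1 and string '1' differ.
  var_dump(Map {1 => 'A'} == Map {'1' => 'A'});      // rule 5 predicts false
}

equality_rules_demo();
```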

Identity on Collections ===

Collections can be compared for identity.

$coll1 === $coll2;

Identity evaluates to true only if both operands are the same object. Otherwise, it is false.

<?hh

namespace Hack\UserDocumentation\Collections\Semantics\Examples\Identity;

function run(): void {
  $vecA = Vector {1, 2, 3};
  $vecB = Vector {1, 2, 3};
  $vecC = $vecA;
  $setA = Set {1, 2, 3};
  $setB = Set {3, 2, 1};
  $setC = $setB;
  $mapA = Map {1 => 'A', 2 => 'B'};
  $mapB = Map {2 => 'B', 1 => 'A'};

  var_dump($vecA === $vecB); // false, not the same object
  var_dump($vecA === $vecC); // true, the same object
  var_dump($setA === $setB); // false, not the same object
  var_dump($setA === $setC); // false, not the same object
  var_dump($setB === $setC); // true, the same object
  var_dump($mapA === $mapB); // false, not the same object
}

run();
Output
bool(false)
bool(true)
bool(false)
bool(false)
bool(true)
bool(false)
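Equality and identity can disagree for the same pair of collections. A minimal sketch, using the hypothetical function name identity_vs_equality_demo:

```hack
<?hh

// Hypothetical demo function, not part of the original examples.
function identity_vs_equality_demo(): void {
  $vecA = Vector {1, 2, 3};
  $alias = $vecA;             // same object
  $copy = new Vector($vecA);  // different object, same values

  var_dump($vecA == $alias);  // true, equal values
  var_dump($vecA === $alias); // true, the same object
  var_dump($vecA == $copy);   // true, equal values
  var_dump($vecA === $copy);  // false, not the same object
}

identity_vs_equality_demo();
```

Use == when you care about contents, and === when you need to know whether a change through one variable will be visible through the other.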

Using list()

You can use list() with Vector and Pair just like you can with arrays.

While you can use list() with Map and Set at runtime, the Hack typechecker will report an error. Note that the Map or Set must contain the integer key 0 and every consecutive integer key that list() reads (for a Set, the values act as keys); otherwise you will get an OutOfBoundsException.

<?hh

namespace Hack\UserDocumentation\Collections\Semantics\Examples\Liust;

function run(): void {
  $vecA = Vector {1, 2, 3};
  $setA = Set {0, 1, 2};
  $mapA = Map {1 => 'A', 0 => 'B'};
  $pairA = Pair {999, 9999};
  $setB = Set {200, 300};
  list($v1, $v2, $v3) = $vecA;
  list($s1, $s2, $s3) = $setA;
  list($m1, $m2) = $mapA;
  list($p1, $p2) = $pairA;
  try {
    // Exception will be thrown since there is no 0 and 1 value in the Set
    // to serve as a key-like value
    list($x, $y) = $setB;
  } catch (\OutOfBoundsException $ex) {
    var_dump($ex->getMessage());
  }
  var_dump($v1);
  var_dump($v2);
  var_dump($v3);
  var_dump($s1);
  var_dump($s2);
  var_dump($s3);
  var_dump($m1);
  var_dump($m2);
  var_dump($p1);
  var_dump($p2);
}

run();
Output
string(28) "Integer key 1 is not defined"
int(1)
int(2)
int(3)
int(0)
int(1)
int(2)
string(1) "B"
string(1) "A"
int(999)
int(9999)

Using Array Built-In Functions

Hack collections support some built-in functions that take arrays.

Sorting

Pairs do not support sorting since they are immutable. You can convert the Pair to a mutable collection and then do a sort.
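A rough sketch of that conversion follows. The function name sort_pair_demo is hypothetical, and the sketch assumes a runtime of the same era as the other examples on this page, in which sort() accepts a Vector by reference; check that assumption against the HHVM version you are running.

```hack
<?hh

// Hypothetical demo function, not part of the original examples.
function sort_pair_demo(): void {
  $pair = Pair {9999, 999};

  // A Pair cannot be sorted in place, so copy its values into a Vector.
  $vec = $pair->toVector();

  // Assumption: sort() accepts a Vector by reference on this runtime.
  sort($vec);

  var_dump($vec);  // the sorted copy
  var_dump($pair); // the original Pair is left as it was
}

sort_pair_demo();
```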

There is currently a discrepancy between the Hack typechecker and HHVM when sorting Sets. For example, the typechecker accepts a call to sort() on a Set, but HHVM does not; and HHVM accepts a call to asort() on a Set, but the typechecker does not. We are working on fixing this issue.

Querying

Method              Valid Collection
array_keys()        All
array_key_exists()  All
array_values()      All
count()             All
idx()               Vector, Map

idx() takes a collection, an index, and an optional default value to return if the index isn't found (null is returned if you don't specify one).
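A short sketch of idx() on the two supported collection types follows; the function name idx_demo and the sample data are made up for illustration.

```hack
<?hh

// Hypothetical demo function, not part of the original examples.
function idx_demo(): void {
  $map = Map {'a' => 1, 'b' => 2};
  $vec = Vector {10, 20};

  var_dump(idx($map, 'a'));    // int(1), key found
  var_dump(idx($map, 'z'));    // NULL, key missing and no default given
  var_dump(idx($map, 'z', 0)); // int(0), key missing so the default is used
  var_dump(idx($vec, 1));      // int(20), index found
  var_dump(idx($vec, 5, -1));  // int(-1), index out of range so the default is used
}

idx_demo();
```

Unlike a direct subscript such as $vec[5], which throws for a missing index, idx() lets you handle the missing case with a value.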

Manipulation

Modification

Introspection

Method             Valid Collection
var_dump()         All
print_r()          All
var_export()       All
debug_zval_dump()  All

APC

Method       Valid Collection
apc_store()  All

Extending

All of the concrete collection classes are final (i.e., they cannot be sub-classed). However, you can create new concrete collection classes from the various interfaces provided by the collections infrastructure.