Converting an Existing Project to Hack

Once you have a project running under HHVM and the type checker, we provide several automated tools to help with different aspects of converting it fully to Hack, especially to its static type system. The suggested workflow is:

  1. Make as much PHP code visible to the type checker as possible.

  2. Guess type annotations with a global inference tool, but log when they fail instead of failing hard.

  3. Parse error logs and remove annotations that do not match at runtime.

  4. Make the remaining annotations fail hard at runtime.

Moving individual files over to Hack

HHVM ships with a tool called the hackificator that attempts to move as many files as possible into Hack. It does not rewrite the code in a file to use any of Hack's new features; it only changes the file header from <?php to <?hh wherever that conversion can be made cleanly. (The one exception: it marks type-hinted function parameters that have a null default value as nullable.)

Running the hackificator

To use the hackificator, first make sure the type checker is running on your code and reports no errors. Then run hackificator /path/to/your/project. The tool converts your code file by file, putting each file into the strictest mode that still produces no errors. This can take a while for projects with many files.
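The per-file behavior described above can be sketched as follows. This is a minimal illustration, not the hackificator's actual implementation: `hackify_file`, `hackify_project`, and the `type_checks` callback are hypothetical names, and the sketch assumes the only candidate headers are `<?hh // strict` (strict mode) and plain `<?hh` (partial mode), tried strictest first.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional

# Candidate headers, strictest first. Assumption: only strict and partial
# mode are tried; the real tool's internals may differ.
HACK_HEADERS = ("<?hh // strict", "<?hh")


def hackify_file(path: Path, type_checks: Callable[[], bool]) -> Optional[str]:
    """Give one PHP file the strictest Hack header that keeps the project clean.

    `type_checks` is a hypothetical stand-in for re-running the type checker
    over the whole project; it returns True when no errors are reported.
    Returns the header that was kept, "<?php" if the file was reverted, or
    None if the file did not start with <?php in the first place.
    """
    original = path.read_text()
    if not original.startswith("<?php"):
        return None
    body = original[len("<?php"):]  # everything after the opening tag
    for header in HACK_HEADERS:
        path.write_text(header + body)
        if type_checks():
            return header  # first (strictest) header with no errors wins
    path.write_text(original)  # every mode introduced an error: revert
    return "<?php"


def hackify_project(files: Iterable[Path], type_checks: Callable[[], bool]) -> Dict[str, Optional[str]]:
    """Convert files one at a time, in the order given."""
    return {str(f): hackify_file(Path(f), type_checks) for f in files}
```

Because each file is tried against the state left by the files before it, the order in which files are visited affects the final result.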

Thoughts on conversion process

For simple projects, this automated conversion may achieve good coverage. For projects that use PHP's more dynamic features, which Hack does not support, the automated tool's coverage may not be good enough. Now is a good time to review which files did and did not convert cleanly, and to find out why the files still in PHP mode failed. Facebook has found that it is often easier to address conversion problems right at the start; errors that are simple now have a way of propagating into inconsistencies that are much gnarlier to untangle later.

In particular, you should consider the difference between a wide conversion and a deep conversion. To see the difference, consider the following three files:

<?php

abstract class WorkItem {
  public abstract function subclassDoWork();

  final public function beforeWork() {
    // ...
  }

  final public function run() {
    $this->beforeWork();
    $this->subclassDoWork();
  }

  // ...
}

<?php

final class WorkItemA extends WorkItem {
  final public function subclassDoWork() {
    $this->foo = 1;
  }

  // ...
}

<?php

final class WorkItemB extends WorkItem {
  final public function subclassDoWork() {
    $this->bar = true;
  }

  // ...
}

If all three files are converted to Hack files, then there will be a type error. WorkItemA and WorkItemB refer to undefined member variables, $this->foo and $this->bar respectively.

Other than manually diagnosing and fixing the error, there are two ways an automated tool like the hackificator can avoid producing this type error. First, it could move the WorkItem superclass into a Hack file and leave WorkItemA and WorkItemB in PHP files. This is a deep conversion: because WorkItem itself is in a Hack file, any other subclass of WorkItem that also converts cleanly will have its entire inheritance hierarchy in Hack files. With the whole hierarchy visible, the type checker can run much more aggressive checks against every converted subclass of WorkItem; the unconverted subclasses can be fixed and converted one by one, each gaining the full benefit of static checking once it has been moved into a Hack file.

The other way for an automated tool to reconcile this is to move WorkItemA and WorkItemB into Hack files and leave WorkItem itself unconverted in a PHP file. Since the type checker can no longer see the entire inheritance hierarchy, it assumes the undefined members are defined in the PHP superclass and lets both WorkItem subclasses type check with no errors. This is a wide conversion: many more subclasses are now in Hack files than with the other approach, and many checks can be run against them. However, that checking is considerably less complete than when the entire hierarchy is visible to the type checker. A subclass that is completely clean today (one that would type check even with WorkItem in a Hack file) can silently acquire new errors, because nothing enforces that previously clean subclasses stay clean.

This is a tradeoff that each project will have to decide one way or the other. The hackificator tends to do wide conversions rather than deep ones. Because it converts files one at a time and reverts any file that introduces an error, it is much more likely to encounter and convert a broken subclass before it reaches the superclass; once that subclass is in a Hack file, converting the superclass would expose the subclass's errors, so the superclass is reverted and stays in PHP. Automating a deep conversion is considerably trickier and is not currently implemented in the hackificator, though for small projects that want a deep conversion, converting key superclasses by hand first is likely a reasonable approach.
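The order dependence described above can be reproduced in a toy model. Everything here is hypothetical: `ClassFile`, `has_error`, and `greedy_convert` are illustrative names, and `has_error` models only the single rule from the WorkItem example (undefined members are reported only when a class and all of its ancestors are in Hack files). The real type checker does far more.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class ClassFile:
    """One class per file, with the members it defines and the members it uses."""
    name: str
    parent: Optional[str]
    defines: FrozenSet[str] = frozenset()
    uses: FrozenSet[str] = frozenset()


def has_error(cls: ClassFile, classes: Dict[str, ClassFile], hack: Set[str]) -> bool:
    """Toy rule: report undefined members only when the whole hierarchy is in Hack."""
    if cls.name not in hack:
        return False  # PHP files are not checked at all
    known: Set[str] = set()
    node: Optional[ClassFile] = cls
    while node is not None:
        if node.name not in hack:
            return False  # a PHP ancestor might define anything, so stay quiet
        known |= node.defines
        node = classes[node.parent] if node.parent else None
    return not cls.uses <= known


def greedy_convert(order: Iterable[str], classes: Dict[str, ClassFile]) -> Set[str]:
    """Move files to Hack one at a time, reverting any move that adds an error."""
    hack: Set[str] = set()
    for name in order:
        hack.add(name)
        if any(has_error(c, classes, hack) for c in classes.values()):
            hack.discard(name)  # revert this file to PHP
    return hack


# The three WorkItem files from the example above, encoded as data.
WORK_ITEMS = {
    "WorkItem": ClassFile("WorkItem", None,
                          frozenset({"subclassDoWork", "beforeWork", "run"}),
                          frozenset({"beforeWork", "subclassDoWork"})),
    "WorkItemA": ClassFile("WorkItemA", "WorkItem",
                           frozenset({"subclassDoWork"}), frozenset({"foo"})),
    "WorkItemB": ClassFile("WorkItemB", "WorkItem",
                           frozenset({"subclassDoWork"}), frozenset({"bar"})),
}
```

Visiting the subclasses first, greedy_convert(["WorkItemA", "WorkItemB", "WorkItem"], WORK_ITEMS) returns the set {"WorkItemA", "WorkItemB"}, a wide conversion; visiting the superclass first, greedy_convert(["WorkItem", "WorkItemA", "WorkItemB"], WORK_ITEMS) returns {"WorkItem"}, a deep one.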

Facebook did a wide conversion instead of a deep one. In our experience, having many classes "converted" but not fully checked, because a single key superclass remains in a PHP file, is a big deal. The WorkItem example above is a (dramatically) simplified version of this situation in our codebase. Our central "batched work item" superclass has over 25,000 recursive subclasses, none of which can be fully checked until all of them are fully converted and the superclass is moved into a Hack file, a herculean effort. We used many one-off hacks to get around this. For example, we defined a CrippleTypeChecking trait that does nothing except live in a PHP file. Because the type checker cannot see into PHP files, a class that uses this trait is checked as leniently as one with a PHP superclass; this lets us move superclasses into Hack files and include the trait only in the subclasses where errors are exposed.

Of course, since we didn't do a deep conversion, we don't know what its pitfalls would have been. That said, even in classes that aren't fully checked for things like undefined methods and undefined instance variables, the type checker can still perform many other kinds of useful checks, so even a wide conversion brings a massive benefit.

Each project should therefore be aware of this tradeoff and consider whether a deep conversion, a wide conversion, or some hybrid is appropriate.

Inferring type annotations

After moving as much code into Hack files as possible, that code will still be largely missing type annotations. We also provide a tool that attempts to infer parameter, return, and member variable types where possible. This inference is far from perfect. It always produces a set of types that are self-consistent and cause no errors according to the type checker, but self-consistency does not guarantee that those types match the values that actually appear at runtime. This is why every type inserted by the inference engine is a "soft" type. Instead of failing hard at runtime like a normal type annotation, a soft type produces a log message and execution continues. These log messages can be used to find and remove incorrect annotations.
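The difference between soft and hard annotations can be illustrated with a Python analogy. This is not how HHVM implements soft types: `annotated` is a hypothetical decorator that checks only parameters annotated with plain classes such as int or str, logging a mismatch and continuing when `soft=True`, and raising `TypeError` when `soft=False`.

```python
import functools
import inspect
import logging
import typing
from typing import Any, Callable

logger = logging.getLogger("soft_types")


def annotated(*, soft: bool) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Check plain-class parameter annotations on every call.

    soft=True: log a mismatch and run the function anyway (like a soft type).
    soft=False: raise TypeError on a mismatch (like a normal, hard annotation).
    Analogy only; this is not how HHVM checks Hack annotations.
    """

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        sig = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            bound = sig.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                param = sig.parameters[name]
                if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                    continue  # *args / **kwargs are not checked
                expected = param.annotation
                # Skip unannotated parameters and generics such as list[int].
                if (expected is inspect.Parameter.empty
                        or not isinstance(expected, type)
                        or typing.get_origin(expected) is not None):
                    continue
                if not isinstance(value, expected):
                    message = (f"{fn.__name__}(): parameter {name} expected "
                               f"{expected.__name__}, got {type(value).__name__}")
                    if soft:
                        logger.warning(message)  # log and keep going
                    else:
                        raise TypeError(message)  # fail hard
            return fn(*args, **kwargs)

        return wrapper

    return decorator
```

For example, with `@annotated(soft=True)` on `def double(x: int)`, calling `double("ab")` logs a warning and still returns "abab"; with `soft=False`, the same call raises TypeError. The logged warnings play the role of the log messages used below to find incorrect annotations.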

To add annotations, first move as much code into Hack files as possible: the more information the inference tool has access to, the better it does and the more consistency it can ensure. Then run hh_server --convert directory-to-add-annotations project-root. This checks project-root for consistency, adding annotations only in the subdirectory directory-to-add-annotations while keeping the entire project free of type errors. Again, since the tool works best when it can see and modify as much of the code as possible, making directory-to-add-annotations the same as project-root (or as close to it as possible) is likely to give the best results.

Because this inference process is holistic, it is considerably more resource-intensive than the hackificator, which can operate on one file at a time. It scales well enough to run on all of Facebook's library code at once, tens of millions of lines, though that takes about half a day and 10 GB of RAM. Most projects are dramatically smaller than that and should have no problems. For larger projects, RAM is more of a limiting factor than CPU, since parts of the process are unfortunately inherently serial. (More cores still help, of course, and are used when possible!)

Hardening type annotations

Once annotations have been added, the logs from the "soft" failures can be parsed automatically to remove the annotations that mismatch at runtime, and then the annotations that match can be turned into hard failures. Good places to collect such logs include unit test runs and even production.

To parse a log file and remove the annotations it complains about, use hack_remove_soft_types --delete-from-log your-log.log. Keep in mind that this tool is fairly unintelligent: it uses regular expressions to grab certain key parts of the log, including the path of the file to modify. If your log was generated on a machine where the PHP code lives at a different path than in your development environment, you may want to use sed or a similar tool to correct the paths before running hack_remove_soft_types.
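The path-correction step mentioned above could also be scripted. The sketch below uses hypothetical helpers that are not part of hack_remove_soft_types: `rewrite_paths` performs the same literal substitution as `sed 's#PROD_ROOT#DEV_ROOT#g'`, and `missing_paths` lists any absolute `.php` path in the log that does not exist locally, a cheap sanity check before handing the log to the tool. It assumes only that logged paths are absolute and end in `.php`; it makes no other assumption about the log format.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Assumption: logged file paths are absolute and end in ".php". The
# lookbehind stops a match from starting in the middle of a relative path.
PHP_PATH = re.compile(r"(?<!\w)/[^\s:,()]+\.php")


def rewrite_paths(lines: Iterable[str], prod_root: str, dev_root: str) -> List[str]:
    """Literal substitution, like `sed 's#PROD_ROOT#DEV_ROOT#g'`."""
    return [line.replace(prod_root, dev_root) for line in lines]


def missing_paths(lines: Iterable[str]) -> List[str]:
    """Absolute .php paths mentioned in the log that do not exist on this machine."""
    found = {m.group(0) for line in lines for m in PHP_PATH.finditer(line)}
    return sorted(p for p in found if not Path(p).is_file())
```

If missing_paths returns an empty list for the rewritten lines, write them to a new file and pass that file to hack_remove_soft_types --delete-from-log.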

Finally, once all annotations are known to be correct, hack_remove_soft_types --harden file.php will turn all annotations in that file into hard failures. This currently works one file at a time, so you may want something like find dir -type f -name '*.php' -exec hack_remove_soft_types --harden '{}' ';' to harden every file in a directory.
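If you prefer a script to the find one-liner, the same loop can be written in Python. This is a minimal sketch with hypothetical helper names (`harden_commands`, `harden_tree`); it invokes hack_remove_soft_types --harden exactly as the find command does, but defaults to a dry run that only returns the commands it would execute.

```python
import subprocess
from pathlib import Path
from typing import List


def harden_commands(root: str) -> List[List[str]]:
    """One `hack_remove_soft_types --harden FILE` command per regular .php file."""
    return [
        ["hack_remove_soft_types", "--harden", str(path)]
        for path in sorted(Path(root).rglob("*.php"))
        if path.is_file()  # mirrors find's -type f
    ]


def harden_tree(root: str, dry_run: bool = True) -> List[List[str]]:
    """Run (or, by default, just list) the hardening commands for a directory."""
    commands = harden_commands(root)
    if not dry_run:
        for command in commands:
            # check=True stops at the first failure, unlike `find -exec ... ';'`
            subprocess.run(command, check=True)
    return commands
```

Reviewing the dry-run output first and stopping at the first failing file are the only behavioral differences from the find command; call harden_tree(dir, dry_run=False) to actually harden the files.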
