[Pl-seminar] 10/25 Seminar: Julien Verlaguet & Brett Simmers: The Hack Experiment & HHVM, Then And Now

Gabriel Scherer gabriel.scherer at gmail.com
Tue Oct 25 16:02:27 EDT 2016


Below are the notes I took during the talk. Thanks!

----

# Brett Simmers on HHVM (HipHop Virtual Machine)

October 25, 2016

    HHVM is a virtual machine for running PHP, built at Facebook. The
    previous PHP implementation at Facebook was HPHPc, a compiler from
    PHP to C++, which was retired from production in 2013.

(Gabriel: do they get external contributions?)

Alexa top websites: 3 to 6 run on PHP, 3 to 5 run on HHVM.

Basic principle: most of the time types are stable.

They use bytecode as an intermediate step (an in-house, stack-based
bytecode language; Zend uses something similar but register-based).
Then they apply speculative optimization.
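
(A rough OCaml sketch of what "speculative optimization on stable
types" means, to make the idea concrete; the value representation and
the Add operation are invented for illustration, not HHVM's actual
code.)

    (* A value carries a runtime tag. If profiling only ever saw ints at
       some Add instruction, the JIT can emit a guarded fast path and
       fall back to the generic path (the interpreter) otherwise. *)
    type value = VInt of int | VStr of string

    let generic_add a b = match a, b with
      | VInt x, VInt y -> VInt (x + y)
      | VStr x, VStr y -> VStr (x ^ y)   (* real PHP coerces; not modeled here *)
      | _ -> failwith "unsupported operand types"

    (* The "compiled" version, specialized for the observed Int/Int case. *)
    let speculative_add a b = match a, b with
      | VInt x, VInt y -> VInt (x + y)   (* fast path: type guard passed *)
      | _ -> generic_add a b             (* side exit: back to the slow path *)

    let () =
      assert (speculative_add (VInt 1) (VInt 2) = VInt 3);
      assert (speculative_add (VStr "a") (VStr "b") = VStr "ab")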

Three versions of the JIT.

1. Trace-based JIT (translation unit is a "tracelet")

   Direct translation from the (branch-free) bytecode to x86

   "Very simple, very fast, pretty straightforward, but not a very good
compiler"

   Just translating one bytecode at a time gives unoptimized native code.

   But it still beats HPHPc. (Gabriel: Why was HPHPc so bad?)

(Gabriel: how do you make the distinction, when you switch
implementation techniques (eg. trace-based to method-based JIT),
between the actual effectiveness of the technique used and the pros
and cons related to the engineer efforts that went in and who worked
on the project?)

Still, beating HPHPc was difficult. We decided to go into lockdown:
focus only on ideas expected to take no longer than a few days (two or
three).

M.: How accurate do you think these time estimations actually were?

Brett: in practice very few things actually took less than three
days. We've done this a couple of times and we know now that our
accuracy is very bad; now we just categorize short/medium/long. One
important thing is that we time out aggressively: after six days, we
kill the thing if it was supposed to be short.

M.: can you name some of the high-reward things?

Brett: I'll come back to it if they don't show in a future slide.

November 2012: 108% of HPHPc's performance.

Sam: during this time, did the performance of HPHPc change?

Brett: yes, this graph is normalized to HPHPc at 1.0, but between the
beginning and the end HPHPc's performance more than doubled. We were
chasing a moving goalpost.

Gabriel: so how did you know you really wanted HHVM in production
instead of HPHPC?

Brett: with HPHPc the build process took about 20-30 minutes. Even if
you changed one line of PHP you had to do that build. We built
a separate interpreter for developers to use in feedback loops. But
then you have two systems that are slightly different... That's one
thing we wanted to fix in HHVM, with the same JIT between sandbox and
production. We just thought that the design of HHVM was more likely to
yield future performance wins. I'm biased as I never worked on the old
system, but HHVM feels faster and easier to add features to.


  2. JIT v2: introduces HHIR, a strongly-typed intermediate
  representation between HHBC and x86-64.

M. What do you mean by strongly-typed?

Brett: the bytecode is untyped -- the output type for each instruction
depends on the input types. Here it is typed.

M.: what do the types look like? Just basic types or also objects?

Brett: like what you have in PHP, plus arbitrary unions.

M.: If you have objects in the union, do you insist that the methods
have distinct types? Otherwise you don't know how to distinguish.

Brett: either we say "of some class" or "of specific class X" --
nominal system.

Brett: if the type is just Integer, the value is stored just in
a register; but if the type is "Integer ∪ Bool", then we use two
registers, one for the value and one for a type tag.
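
(A sketch of this representation point in OCaml, since it is the core
of what the strongly-typed HHIR buys; the type grammar and the
register accounting are invented, not HHVM's IR.)

    type hhir_type = TInt | TBool | TStr | TUnion of hhir_type list

    (* A value of a single concrete type fits in one machine register
       (just the payload); a union like Int|Bool also needs a runtime
       type tag, hence a second register. *)
    let registers_needed = function
      | TInt | TBool | TStr -> 1
      | TUnion _ -> 2

    let () =
      assert (registers_needed TInt = 1);
      assert (registers_needed (TUnion [TInt; TBool]) = 2)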

M.: does PHP have first-class functions?

Brett: kind of, but... [Didn't take notes here]

   HHIR allows copy/constant propagation, dead code elimination etc.

   PHP-specific optimizations: reference count elision, inlining
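
(A very naive OCaml sketch of what "reference count elision" could
mean; the IR and the cancellation rule are invented and much weaker
than what a real compiler needs -- no aliasing or escape analysis.)

    type instr = IncRef of string | DecRef of string | Other of string

    (* Cancel an IncRef immediately followed by a DecRef of the same
       temporary: in this toy setting the pair cannot be observed. *)
    let rec elide = function
      | IncRef t :: DecRef t' :: rest when t = t' -> elide rest
      | i :: rest -> i :: elide rest
      | [] -> []

    let () =
      assert (elide [IncRef "t1"; DecRef "t1"; Other "ret"] = [Other "ret"])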

Oli.: did you keep refcounting in your implementation?

Brett: X is working on that, has been for two years.

Gabriel: is there any success story somewhere of moving away from
refcount while it's been used for years in a system?

Brett: PHP has exact destruction semantics, you can't support that
without refcounting.

M.: there seems to be significant evidence that GC is faster than
refcounting.

Brett: it wasn't our choice, we had to implement PHP as it was. Users
can observe copies. You can observe when COW-arrays are copied.

   3. JIT v3.
     - Increase the size of compilation units (not just "one basic
       block at a time"), to increase the scope of optimizations.

     - enable profile-guided optimization (PGO)

   Repurpose the tracing JIT to emit profiling blocks: tracelets plus
   profiling probes at the entry of each trace.

Max: what do you measure performance with?

Brett: we measure the capacity of a server: how many queries can it
answer per second just before it overloads.

M.: do you do any array analysis?

Brett: we tried getting fancy with arrays, but nothing has resulted in
performance gains yet. We keep working on that. We know many arrays
are used as records in PHP; it should be possible to exploit that. But
what we tried didn't work.

Oli: how do you profile?

Brett: one of the things we profile is the JIT-compiled code, another
is the C++ runtime. Only 25%-30% of time is spent in JIT-compiled
code; the rest is in the runtime. For the C++ code we use perf.

Brett: for PGO we just have a single global counter at the start of
each basic block.
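
(A sketch in OCaml of the per-block profiling counters, as
I understood them; the block ids, the table, and the hotness threshold
are invented.)

    let counters : (int, int ref) Hashtbl.t = Hashtbl.create 64

    (* Instrumentation emitted at the entry of each profiling block. *)
    let bump block_id =
      match Hashtbl.find_opt counters block_id with
      | Some c -> incr c
      | None -> Hashtbl.add counters block_id (ref 1)

    (* Later, the optimizing JIT asks which blocks are hot enough to be
       retranslated with the larger-scope compiler. *)
    let hot_blocks ~threshold =
      Hashtbl.fold
        (fun id c acc -> if !c >= threshold then id :: acc else acc)
        counters []

    let () =
      List.iter bump [1; 1; 1; 2];
      assert (hot_blocks ~threshold:3 = [1])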

M.: have you considered giving feedback to the programmer from
profiling info?

Brett: we have. There is no automated way; we try to convince the
community to give up on specific patterns that we find to be slow.

M.: there is a paper at the last ECOOP (in Prague) from the Mozilla
folks about developer tools in Firefox; it's interesting, you should
go read it.

Brett: yes, there are a lot of things around that, especially around
arrays. In PHP, if all of the contents of an array are known at
compile time, we can create a static shared array. But if one element
is a class constant that's not known at compile time, then we have to
create a new instance each time.
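
(A toy OCaml version of that static-array remark; the expression type,
the constant test, and the lookup function are mine, not HHVM's.)

    type expr = ConstInt of int | ClassConst of string  (* unknown until run time *)

    let as_constant = function ConstInt n -> Some n | ClassConst _ -> None

    (* If every element is a compile-time constant, build one shared
       array; otherwise return code that rebuilds the array on every
       execution. *)
    let compile_array elems =
      if List.for_all (fun e -> as_constant e <> None) elems then
        let shared = List.filter_map as_constant elems in
        (fun _lookup -> shared)        (* the same list, every time *)
      else
        (fun lookup ->                 (* a fresh list on every call *)
           List.map (function ConstInt n -> n | ClassConst c -> lookup c) elems)

    let () =
      let f = compile_array [ConstInt 1; ConstInt 2] in
      assert (f (fun _ -> 0) = [1; 2])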

Gabriel: the "larger scope", is it still trace-based?

Brett: yes, we stitch traces together. Sometimes it's a full method,
sometimes it's more, or something more transversal.

   3+n: things we want in the future

   We would like to do layout optimizations. Cluster the hot functions
   together whenever we can. We do a good job inside blocks, but not
   at a more macro scale.

   (see slides for the rest)

   At the end of each tracelet, we sink everything back in memory so
   that we can always jump back to the interpreter. It's nice, but it
   also constrains our memory layout.

   Loop optimizations (maybe): but so far we have not been able to get
   much from them. The loops we see in typical PHP code do a pretty
   big amount of work on each iteration, so there are not many wins to
   be had from clever loop optimizations.

   Ed can answer GC questions.

M. : Before you went to the slides, we had this short discussion on
refcounting. But now you go back to GC. Have you considered how to
transition transparently?

Brett: it's impossible to make it completely transparent.

M.: well you could have a static check that the code is GC-impl-agnostic.

Ed: we have not done that; we're still going to run destructors, but
we will work with the community to make sure they fix code that
assumes exact collection semantics.

Will: what do you mean by scheduling collections?

Ed: deciding when to run the GC. We have a system that's mostly
refcounting, so it's mostly freeing its garbage as it runs. You want
to decide when to do a backup collection: a full mark-and-sweep pass
as a backup. You have to decide what counts as leakage, and that's
harder than it sounds. We are experimenting, but also working with the
teams that develop PHP frameworks to figure out with them when the
best time to collect is.

M.: is it conservative?

Ed: C++ data can only be scanned exactly if we know the validity of
its fields. Stuff in the JIT can be conservatively scanned. We try to
move the needle.

M.: In Racket, before we compile to C++, we leave layout information
behind.

Ed: something we may never be able to do is precise C++ stack
scanning. But the LLVM community is working on that.

Oli: the VM parts of the stack you scan exactly.

Ed: we can scan exactly, we haven't got to that part yet.

Gabriel: couldn't you use some abstraction for PHP values to enable
scanning? You would have to rewrite the runtime code...

Ed: currently we use the same kind of handles for stack and heap value
references, so that would add too much overhead. We would need to
distinguish them in the runtime code.

Gabriel: I remember when Mozilla did the transition to precise GC they
used a static analysis in their C++ codebase to find out the parts to
update.

Ed: I could definitely see something like that working for us.

Gabriel: do you get external contributions?

Brett: we get a lot. Most contributions come from a small number of
people, around 10 people (outside Facebook) who contribute something
every month or two.

M.: how big is your group?

Brett: 14 people working on HHVM now, including 6 in Boston.



# Julien Verlaguet on Hack

There are a ton of people from Facebook here. We have people from
mobile development, one of the Flow guys, a guy that knows everything
about dubdubdub (our codebase), etc. We've got you covered. Hack is
a gradually typed programming language that we develop at Facebook. It
is a dialect of PHP, not fully compatible, but deeply rooted in
PHP. Since we don't support the whole language, I think it would be
unfair to call it typed PHP; it's a dialect.

Thousands of engineers working on the same codebase.

Performance matters: a 1% performance regression does not go in. 1% is
a really big deal.

Super-fast development cycle: we push to production three times
a day. We're going to switch to something continuous
soonish. (Gabriel: I guess 3/day is close to the Planck constant.)

There was a good reason to choose PHP back in 2004: fast feedback
loops. But then the number of engineers working on the same codebase
exploded. It became very difficult to change the code. "How does this
change affect the rest of the code?".


M.: do you assume no modularization?

Julien: it's still difficult to refactor even with modules. A function
you can grep for; but suppose you have a method called "render": go
grep for "render" in a millions-of-lines codebase.

M.: that's precisely why you should use lexical scope as a builtin
information system. Let's talk offline.


Hack can be seen as "statically typed"
(or gradually typed). Compatible with PHP. No overhead when moving
from PHP to Hack and vice versa. Instantaneous type-checking. The
type-checker is a daemon that starts before you begin working, and it
keeps track of all the changes on disk. When a user asks about typing
errors we can answer instantly; that's how we avoided breaking the
fast feedback loop.
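
(A cartoon, in OCaml, of that incremental daemon loop as I understood
it; the file-watching, the dependency table, and the error store are
stand-ins, not Hack's actual architecture.)

    module FileSet = Set.Make (String)

    type server_state = {
      mutable errors : (string * string) list;  (* file, error message *)
      deps : (string, FileSet.t) Hashtbl.t;      (* file -> its dependents *)
    }

    (* On each batch of filesystem changes, recheck the changed files
       plus their dependents and keep the other results; an editor query
       for errors then just reads [state.errors], hence the instant
       answers. *)
    let on_changes state ~check changed =
      let to_recheck =
        FileSet.fold
          (fun f acc ->
             let dependents =
               Option.value (Hashtbl.find_opt state.deps f)
                 ~default:FileSet.empty
             in
             FileSet.union acc (FileSet.add f dependents))
          changed FileSet.empty
      in
      state.errors <-
        List.filter (fun (f, _) -> not (FileSet.mem f to_recheck)) state.errors
        @ List.concat_map check (FileSet.elements to_recheck)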

In our case gradual typing was more something we had to put up with
(to allow incremental adoption) than a deliberate choice. We had to go
gradual.


M.: did you want to convert all PHP code to Hack?

Julien: we did that, in incremental changes; that's why we needed
gradual.

M.: that's the point!

Julien: some people would claim that gradual has other advantages. But
I like to think of just the static fragment now.


  function hello (bool $polite): string { ... }

The types have been enriched. We added generics and nullability. It
might seem like a detail but it's pretty important because in PHP null
is really evil, you really want to know if something can be null or
not. (Gabriel: when exactly is it not?).

Demo of Nuclide, based on Atom.


M.: what about first-class classes?

Julien: ClassNameOf<X>.

M.: then you have row polymorphism to handle that?

Julien: no you don't.


Demo with artificially many users of a function definition (1000
users, generated code): it stays "fast", with only 2s to type-check
everything.

Julien: this is a strange version of OCaml where the runtime has been
ripped out and replaced with another runtime. We wrote a concurrent
runtime.


M.: did you contribute that back to upstream OCaml?

Julien: the code is open-source, but we haven't pushed for something
reusable really. The community is working on multicore OCaml, I'm
waiting to see how it goes.


M.: are you part of the Caml consortium?

Julien: yes, we are.


When we started working on Hack, a lot of people were not convinced
that static typing was a good idea. When we started the migration, we
had a plan to roll it all back if it were not working.

We were also working on automated conversion, figuring out the types
and adding them for you. You have to be careful though. It's very
tempting to break type erasure; but if you do that, it means that
removing a type annotation changes the semantics. It can bite you
really big time! Consider the automated system adding annotations:
unless you can remove them easily, it's too hard to manually edit its
output.


M.: We could not figure out how to do polymorphic arrays without
changes to the language semantics.

M.: oh sorry, you are not sound.

M.: If you have any idea how to do that without type erasure, I would
love to know. We had to add a branch to every primitive to enforce
parametricity.


Julien: we have a system that does the dual of Hack. Hack assumes that
missing annotations are correct; that's what you want when you develop
code. When you compile code, you want to treat missing annotations as
a black hole; that's what HHBBC does.

M.: is that whole-program? for millions of lines of code?

Brett: yes. Only two to three minutes.

M.: what is the summary format?

Brett: We add annotations to the bytecode.

M.: but sometimes annotations are huge, polynomially bigger than the whole
code.

Brett: we use unions for numbers, but we widen very aggressively to
keep the types short.
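
(A sketch of "widen aggressively so the annotations stay short" in
OCaml; the type grammar and the size cutoff are invented.)

    type ty = Int | Dbl | Str | Bool | Union of ty list | Top

    let union a b = match a, b with
      | Union xs, Union ys -> Union (List.sort_uniq compare (xs @ ys))
      | Union xs, t | t, Union xs -> Union (List.sort_uniq compare (t :: xs))
      | t, u when t = u -> t
      | t, u -> Union (List.sort_uniq compare [t; u])

    (* Keep annotations short: above a small size, give up and widen. *)
    let widen = function
      | Union xs when List.length xs > 2 -> Top
      | t -> t

    let () = assert (widen (union (union Int Dbl) Str) = Top)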

M.: but that also means that you never surface the types you analyzed
back to the programmer.

Will: no but programmers may prefer the short imprecise types to huge
precise types.

Julien: no no, those types never resurface to the programmer, they're
only in the bytecode.


Strict mode should behave like a static language; we shouldn't have
anything that is unsound by design (like covariant arrays in Dart).

When an annotation is missing, we don't try to infer anything on that
type. I like it because it's very easy to give a mental model of what
this form of gradualness really is.
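
(A toy OCaml version of that mental model: a missing annotation
becomes a "dynamic" type that the checker never complains about. The
compatibility relation is invented, not Hack's.)

    type ty = Dynamic | Int | Str | Nullable of ty

    let rec compatible actual expected = match actual, expected with
      | Dynamic, _ | _, Dynamic -> true   (* no annotation: no inference, no error *)
      | Nullable a, Nullable b -> compatible a b
      | a, Nullable b -> compatible a b   (* a non-null value fits a nullable slot *)
      | a, b -> a = b

    let () =
      assert (compatible Dynamic Str);    (* unannotated code is accepted *)
      assert (not (compatible Int Str))   (* annotated code is checked *)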


Quite a few people were unhappy about our transition strategy. You have two
choices:

- bottom-up, from the leaves and move up slowly

- tightening the belt: try to get everything in Hack at once with very
  few annotations, and then add them one by one

What's nice with the belt strategy is social: you stop the bleeding
early. Your developers know that there is this type-checker and they
have to deal with it somehow. People cannot keep adding
difficult-to-convert code.

What's not nice is that the core abstractions stay dynamic, and there
is a false sense of security: people think their code is type-safe but
it isn't, so they end up not trusting the programmer.

M.: if you go from Python to Reticulated Python in one jump, you take
a massive performance hit. But you erase types so you are not
concerned with that.

Julien: not quite; we keep the types that PHP knows about, and we
forget about the more expressive ones. But they don't quite go away,
they are fed into HHBC. I'm pretty sure we also use them on guards and
parameters.

Brett: we do actually check types at runtime: on function entry we
check the types and throw an exception on mismatch.

Julien: those annotations you have seen are kind of half-checked; if
it's a primitive type we check it, but if it's an array of integers
we're not going to scan it to check. Some of the types can be used for
optimization, but not all of them.
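
(An OCaml sketch of that "half-checked" behaviour: primitives are
verified at function entry, a container is only checked shallowly. The
value and annotation grammars are invented, not HHVM's.)

    type value = VInt of int | VStr of string | VArr of value list
    type annot = AInt | AStr | AArrayOf of annot  (* element part not enforced *)

    let shallow_check v a = match v, a with
      | VInt _, AInt | VStr _, AStr -> true
      | VArr _, AArrayOf _ -> true      (* "it is an array"; elements not scanned *)
      | _ -> false

    let enforce name v a =
      if not (shallow_check v a) then
        failwith (Printf.sprintf "parameter %s failed its type check at entry" name)

    let () =
      (* passes, even though the element annotation is violated *)
      enforce "xs" (VArr [VInt 1; VStr "oops"]) (AArrayOf AInt)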

M.: performance-wise you made the correct choice. If you add too many
checks you easily get a 5X slowdown.

M.: so did your users hate you?

Kendall: when you add a type like "Int", you don't take a performance
regression. You are never discouraged from adding types.

M.: I know, that's what is difficult with Reticulated Python.

Kendall: we have a tool in Nuclide that tells you what can be trusted.

Julien: it's not exactly true; given a file it will show you which
parts the type-checker has a type for. But just because it has a type
doesn't mean it can be trusted. You can be in an array and it's
corrupted... But overall people like it better than no information at
all.

Basil: type-checking is not associated with performance regressions,
in Hack or Flow... Hack and Flow are both strongly tilted toward
optional typing as a design.


We had different ways to infer annotations: instrument what's going on
at runtime, type inference... But if you add an annotation that adds
a check, and the check fails in a critical place inside Facebook, you
can take Facebook down. So we have a "non-error version", @String
instead of String, that just logs the violation. We can audit that. So
the process is: (1) infer annotations, (2) add the @-versions of the
types, (3) log for a while, (4) if it works, remove the @-sign.
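
(An OCaml sketch of the soft-annotation idea: the @-version logs (with
sampling) instead of raising. The enforcement modes and the sampling
rate are my invention, not Hack's actual mechanism.)

    type mode = Hard | Soft of float  (* Soft p: log about a fraction p of violations *)

    let enforce ~mode ~check name v =
      if check v then ()
      else match mode with
        | Hard -> failwith (name ^ ": type annotation violated")
        | Soft rate ->
            if Random.float 1.0 < rate then
              prerr_endline ("[soft-type] " ^ name ^ ": would have failed; logged")

    let () =
      (* steps (2)-(3) of the rollout: ship the soft version, watch the logs *)
      enforce ~mode:(Soft 0.01) ~check:(fun v -> v = "hello") "greeting" "bonjour"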


Will: @ is a warning, right?

Julien: yeah and you can't log each time, we sample, because otherwise
performance...


M. : I've been thinking about it for ten years. I want to have some
evidence that programmers become more productive afterwards. I would
love to have a large pool of programmers volunteer for an experiment.

Julien: I think most programmers would agree that it's better now than
it used to be.

M.: but you have no evidence.

Julien: some people wanted to measure the number of bugs and see if it
goes down. Yohann was really into that. But the measure is
questionable. Before a bug makes it into production, it has to get
past a lot of safety nets: tests, QA, etc. Going through the whole
process of running our tests is long, and having our quick checks is
more valuable. That doesn't show up in the bugs-per-capita metric. And
the same goes for refactoring. I don't think we are optimizing for
bugs, really.

Basil: one thing we noticed is that it changes the shape of the
codebase. With dynamic typing, refactoring is harder. You get variants
of functions because the old code cannot be safely changed, etc.

M.: I agree. Types catch typos. But from the code structure we can
tell whether it's typed or not. Types give better shapes.

Ed: if we gave you precise version-control information, could you
extract the information you need?

M.: do you know Steve ... at Brown? He ran that kind of analysis on
a git history around ten years ago. I don't know exactly what his
results were.

Will: we also miss data on how much time was spent.

Ed: we have that data too. We know everything anyone did. We have
holidays too.

M.: There was the "San Francisco" project by Kathi Border(?), who
converted from C++ to Java at IBM. They claimed an improvement of
a factor of over 4 in productivity, but they thought it was not the
types that contributed, but garbage collection.

Julien: one thing that you do get from static typing is the IDE
feedback.

M.: you get IntelliJ in Python.

Julien: no it doesn't work.

Gabriel: if we knew of someone who has the skills to do this work,
could we send them to you?

Julien: we would have to check with someone, but I think that should
be possible.

Julien: I've also been thinking of working from the error
messages. I've spent some time optimizing error messages. The errors
in Hack tell you a story: where the error is, and then they point you
to the places in the code that you would have to modify to fix the
problem. Usually when you have a typing error there is a constructor
that imposes a type equality or subtyping constraint. But what is
missing is "what makes you think that this has this type and that has
that type".

Mitch: how long does the list get?

Julien: usually only 2. Whenever there is a join/unification, we keep
a witness around. When there is more than one witness we only keep
one. We play whack-a-mole: we call the type-checker again. If you have
2 witnesses that it's an int and 2000 that it's a string, usually it's
the int. Often you don't need to keep more than one witness for each,
but sometimes you do; explaining the variance of a generic parameter,
for example, is very difficult. Enforcing that a T only occurs in
a covariant position is complicated. In that case we keep a longer
trace: "here is why we think this is a contravariant position".

M.: but usually there are three sites.

Julien: the two are the ones you want to fix; the third site is the
error site.


Jan: where is Hack going?

Julien: we hired a bunch of people to make the system better. Andrew
Kennedy just joined us in London; he is doing very nice work. The type
system has accumulated a lot of complexity, and I think it's time for
a cleanup. When we introduced Hack we optimized for avoiding
friction. The way we removed friction was by adding complexity to the
type system to accommodate idioms that already existed. But now this
has changed: people like Hack, so we should be able to take that
complexity away and replace it with more principled constructions,
even if it provokes changes in the codebase. Now people are willing to
help us refactor that code. Another big chunk is performance: we need
to consume less memory and be more efficient; that's an ongoing effort
all the time.

Jan: any plans to retain types throughout? Why throw them away?

Ed: yeah, that's part of the plan too.


M.: you want to move to soundness. Even when types are missing.

Julien: HHBBC is sound but more restrictive. The two should meet in
the middle; Hack should be able to give hints to HHBBC for things it
cannot infer, such as parametric polymorphism. Right now we are still
trying to type-check more and more of our codebase.

Kendall: if you look at the points that could have an annotation, we're
at about 70% of sites annotated, as opposed to 0% two years ago.

M.: we had to add 5% more lines of code to make typing work. If I have
an algebraic datatype, I have to write it down. Class declarations,
struct declarations.

Kendall: I would guess it would be a small number. With generics it
can get larger.

Julien: bear in mind we have no duck typing. Everything is nominal. It
was copied from Java, including visibility; visibility works
differently, but I think that is by mistake. No duck typing means that
in PHP you have to write the class hierarchy anyway.

Gabriel: did you get complaints when asking people to write OCaml code?

Julien: the Hack team is not that big, so a lot of those problems are
avoided in our case. Something that comes back very often is people
who would like to contribute to the checker but don't know OCaml.
I think it's fishy: they want to hack on a programming language, but
can't learn a new language?

Basil: The biggest complaint is the lack of a debugger.

...: Have you considered F#?

Julien: I like F#, but at the time it was not open source. I would
consider it today, but I would be concerned with the complexity of
reasoning about performance. What's nice with OCaml is that it has
a really simple, predictable compiler. What you see is what you
get. Little global optimization, etc. But yeah, today we would
consider F# more seriously.

M.: how many OCaml programmers do you have?

Julien: There is our group, but also Flow, Reason... We would say
around 30-35 if we had to guess; that's the number of people who write
OCaml *every day* at Facebook. Maybe a hundred in total if you count
occasional committers.

Julien: most of the OCaml stuff that we do today has to do with static
analysis. For those kinds of profiles, it's very rare for people not
to know or not to like ML. Usually what they say is that they don't
care as long as they have algebraic datatypes. As long as it's
functional. As long as OCaml is confined to that world, I see no
problem going forward. If we wanted to grow to a larger audience, we
would have to invest in the tooling. Some investment is going on; look
at Merlin, for example.

M.: if you are mostly focused on static analysis, have you considered
going on with a DSL? Is there a language that expresses your thoughts
more clearly?

Julien: my knee-jerk reaction as an engineer is "does your thing
scale?". That's an interesting question. What's nice with having
a general language is that if you need something outside the scope...

Gabriel: before a DSL there is a library. Do you reuse code between,
say, Flow and Hack?

Basil: all the parallelization, incrementalization work is
shared. It's a set of APIs.


On Thu, Oct 20, 2016 at 5:09 PM, Daniel Patterson <dbp at ccs.neu.edu> wrote:

> NUPRL Seminar presents a *dual seminar*
>
> Julien Verlaguet & Brett Simmers
> Facebook
> Host: Jan Vitek
>
> 1:30 - 3:00PM
> Tuesday, October 25
> Room 366 WVH (http://www.ccs.neu.edu/home/wand/directions.html)
>
>
> The Hack Experiment & HHVM, Then And Now
>
> Abstract:
>
> The Hack programming language developed at Facebook is an evolution of
> PHP. With tens of millions of lines of PHP code in house, the lack of types
> and poor IDE support were felt to be major threats to Facebook. We designed
> Hack gradually, adding features at regular intervals and encouraging
> developers to adopt them in their code. This talk will touch on the design
> of the Hack language, its type system, the implementation of the type
> checker, and the social processes involved in converting 20 million lines
> of untyped code to a rich type system.
>
> HHVM is an open source virtual machine for PHP and Hack programs. It was
> developed by Facebook but is also used by Wikipedia, Baidu, Etsy, and many
> others. This talk will discuss the history of the project, what goes on
> behind the scenes while executing your PHP code, and what we're currently
> working on.
>
> Bio:
>
> Julien Verlaguet is an OCaml programmer with a Master's from Université
> Pierre et Marie Curie in Paris. Before joining Facebook in 2011, he worked
> at Esterel Technologies on verified compilation. At Facebook he designed the
> Hack programming language, a typed offspring of PHP, and managed to convince
> the majority of developers in the company to add types to their code.
>
> Brett Simmers is a Software Engineer at Facebook, where he's been working
> on HHVM for the past five years. He primarily works on the internals of the
> JIT compiler and HHVM's profile-guided optimizations. Before joining
> Facebook he spent two years at VMware working on the Linux version of
> VMware Workstation.
>
> _______________________________________________
> pl-seminar mailing list
> pl-seminar at lists.ccs.neu.edu
> https://lists.ccs.neu.edu/bin/listinfo/pl-seminar
>
>