Cyber Grand Shellphish


Antonio Bianchi, Kevin Borgolte, Jacopo Corbetta, Francesco Disperati, Andrew Dutcher, John Grosen, Paul Grosen, Aravind Machiry, Christopher Salls, Yan Shoshitaishvili, Nick Stephens, Giovanni Vigna, Ruoyu Wang


Phrack (Volume 15, Issue 70), January 2017


Hacking is often considered more than a skill. In popular culture, hackers are seen as wizards of sorts, artists with powers to access the inaccessible, or perform acts that seem impossible. Hacking, like art, has great people, who are recognized for their skills, but whose abilities cannot be captured or reproduced. In fact, a single great hacker in a team is better than a hundred mediocre ones, similar to as none of the paintings from a hundred mediocre artists can match a painting from van Gogh.

Vulnerability analysis is the science of capturing and reproducing what some hackers do. Vulnerability analysis studies how one can reason, in a principled way, about finding and exploiting vulnerabilities in all types of software or hardware. By developing algorithms and tools to help humans identify flaws in software, researchers "codify" the knowledge that hackers use, in an organic way, to analyze systems and find their flaws. The resulting tools can then be used at scale, and composed to create new analysis systems.

This scientific process has generated a number of useful tools, such as static analysis tools, fuzzers, and symbolic execution frameworks. However, these tools codify only a subset of the skills of a hacker, and they are still used only to augment the abilities of humans.

One approach to push the codification of what human hackers do, is to take the hackers out of the equation. This is precisely what the DARPA Cyber Grand Challenge was set out to do.

The DARPA Cyber Grand Challenge (CGC) was designed as a Capture The Flag (CTF) competition among autonomous systems without any humans being involved. During the competition, Cyber Reasoning Systems (CRSs) would find vulnerabilities in binaries, exploit them, and generate patches to protect them from attacks, without any human involvement at all.

The separation between human and machine is key, as it forces the participants to codify, in algorithms, the techniques used for both attack and defense. Although the competition was only a first step toward capturing the art of hacking, it was an important one: for the first time, completely autonomous systems were hacking one another with code, and not human intuition, driving the discovery of flaws in complex software systems.

Shellphish is a team that was founded by Professor Giovanni Vigna at UC Santa Barbara in 2005 to participate in the DEF CON CTF with his graduate students. Since then, Shellphish has evolved to include dozens of individuals (graduate students - now professors elsewhere, undergraduate students, visitors, their friends, etc.) who are somewhat connected by the Security Lab at UC Santa Barbara, but who are now spread all across the world. Nonetheless, Shellphish has never lost its "hackademic" background and its interest in the science behind hacking. Participation in many CTF competitions sparked novel research ideas, which, in addition to publications, resulted in tools, which, in turn, were put to good use during CTF competitions.

Given the academic focus of Shellphish, it is no surprise that the DARPA Cyber Grand Challenge seemed like a great opportunity to put the research carried out at the UC Santa Barbara SecLab to work. Unfortunately, when the call for participation for the funded track came out, the lab was (as usual) busy with a great number of research projects and research endeavors, and there were simply no cycles left to dedicate to this effort. However, when the call for the qualification round that was open to anybody who wanted to throw their hat in the ring was announced, a group of dedicated students from the SecLab decided to participate, entering the competition as team Shellphish.

With only a few weeks to spare, the Shellphish team put together a prototype of a system that automatically identifies crashes in binaries using a novel composition of fuzzing and symbolic execution. Unsurprisingly, the system was largely unstable and crashed more than the binaries it was supposed to crash. Yet, it performed well and Shellphish was one of the seven teams (out of more than a hundred participants) that qualified for the final event. Since Shellphish was not initially funded by DARPA through the funded track, it received a $750,000 award to fund the creation of the autonomous system that would participate in the final competition.

The following few months focused mostly on basic research, which resulted in a number of interesting scientific results in the field of binary analysis [Driller16, ArtOfWar16], and to the dramatic improvement of angr [angr], an open-source framework created at the UC Santa Barbara SecLab to support the analysis of binaries.

Eventually, the pressure to create a fully autonomous system increased to the point that academic research had to be traded for system-building. During the several months preceding the final competition event, all the energy of the Shellphish team focused on creating a solid system that could be resilient to failure, perform at scale, and be able not only to crash binaries, but also to generate reliable exploits and patches.

After gruesome months of Sushi-fueled work that lead to severe Circadian rhythm sleep disorder [Inversion] in many of the team's members, the Mechanical Phish Cyber Reasoning System was born. Mechanical Phish is a highly-available, distributed system that can identify flaws in DECREE binaries, generate exploits (called Proofs Of Vulnerability, or POVs), and patched binaries, without human intervention. In a way, Mechanical Phish represents a codification of some of the hacking skills of Shellphish.

Mechanical Phish participated in the final event, held in Las Vegas on August 4th, in conjunction with DEF CON, and placed third, winning a $750,000 prize. As a team, we were ecstatic: our system performed more successful exploits than any other CRS, and it was exploited on fewer challenges than any other CRS. Of course, in typical Shellphish style, the game strategy is where we lost points, but the technical aspects of our CRS were some of the best.

We decided to make our system completely open-source (at the time of writing, Shellphish is the only team that decided to do so), so that others can build upon and improve what we put together.

The rest of this article describes the design of our system, how it performed, and the many lessons learned in designing, implementing, and deploying Mechanical Phish.

  title     = {{Cyber Grand Shellphish}},
  author    = {Bianchi, Antonio and Borgolte, Kevin and Corbetta, Jacopo and Disperati, Francesco and Dutcher, Andrew and Grosen, John and Grosen, Paul and Machiry, Aravind and Salls, Christopher and Shoshitaishvili, Yan and Stephens, Nick and Vigna, Giovanni and Wang, Ruoyu},
  month     = {January},
  year      = {2017},
  journal   = {Phrack},
  note      = {(Authors listed alphabetically)},
  number    = {70},
  url       = {},
  volume    = {15}