Tutorial: JSON

JSON, or JavaScript Object Notation, is a widely used data interchange format that uses human-readable text to store and transmit data. Although it originated as a subset of JavaScript, JSON itself is language independent.

Several specifications standardize the JSON format, including RFC 4627, RFC 7159, RFC 8259, ECMA-262, ECMA-404, and ISO/IEC 21778:2017. As json.org indicates, ECMA-404 is now the JSON data interchange standard.

In this tutorial, we will write a JSON parser using Peppa PEG. Many JSON parsers already exist, but writing a complete one is a good way to understand Peppa PEG. By the end, you should be able to write parsers for other grammars, since the workflow for developing any new grammar in Peppa PEG is the same.

Step 1: Define Grammar

Let’s create a new file “json.h”.

#include "peppa.h"

We’ll create the grammar with P4_LoadGrammar(). The grammar in Peppa PEG form mirrors the one described on json.org.

P4_Grammar* P4_CreateJSONGrammar(void) {
    return P4_LoadGrammar(
        /* @lifted: drop this node and keep its children in the AST. */
        /* &. requires non-empty input; !. requires the end of input. */
        "@lifted\n"
        "entry = &. value !.;\n"

        "@lifted\n"
        "value = object / array / string / number / true / false / null;\n"

        "object = \"{\" (item (\",\" item)*)? \"}\";\n"
        "item = string \":\" value;\n"

        "array = \"[\" (value (\",\" value)*)? \"]\";\n"

        /* Unescaped: any code point from U+0020 except U+0022 and U+005C. */
        "@tight\n"
        "string = \"\\\"\" ([\\u0020-\\u0021] / [\\u0023-\\u005b] / [\\u005d-\\U0010ffff] / escape )* \"\\\"\";\n"

        "true = \"true\";\n"
        "false = \"false\";\n"
        "null = \"null\";\n"

        /* @tight: no implicit whitespace; @squashed: drop the children. */
        "@tight @squashed\n"
        "number = minus? integral fractional? exponent?;\n"

        "@tight @squashed @lifted\n"
        "escape = \"\\\\\" (\"\\\"\" / \"/\" / \"\\\\\" / \"b\" / \"f\" / \"n\" / \"r\" / \"t\" / unicode);\n"

        "@tight @squashed\n"
        "unicode = \"u\" ([0-9] / [a-f] / [A-F]){4};\n"

        "minus = \"-\";\n"
        "plus = \"+\";\n"

        "@squashed @tight\n"
        "integral = \"0\" / [1-9] [0-9]*;\n"

        "@squashed @tight\n"
        "fractional = \".\" [0-9]+;\n"

        "@tight\n"
        "exponent = i\"e\" (plus / minus)? [0-9]+;\n"

        /* @spaced: implicitly allowed between members of non-@tight rules. */
        "@spaced @lifted\n"
        "whitespace = \" \" / \"\\r\" / \"\\n\" / \"\\t\";\n"
    );
}
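To make the lexical rules concrete, here is a standalone sketch in plain C, with no Peppa PEG involved, that hand-codes the number rule and the unescaped-character ranges of the string rule. The names json_number_len and json_is_unescaped are hypothetical. Each helper mirrors one grammar rule and, like the PEG, consumes as much as it can without backtracking, so "1." matches only the "1"; in the real grammar it is the !. in entry that rejects the leftover dot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* integral = "0" / [1-9] [0-9]*;  returns chars consumed, 0 on failure */
static size_t match_integral(const char* s) {
    if (s[0] == '0') return 1;              /* ordered choice: "0" first */
    if (s[0] < '1' || s[0] > '9') return 0;
    size_t n = 1;
    while (s[n] >= '0' && s[n] <= '9') n++;
    return n;
}

/* fractional = "." [0-9]+;  needs at least one digit after the dot */
static size_t match_fractional(const char* s) {
    if (s[0] != '.') return 0;
    size_t n = 1;
    while (s[n] >= '0' && s[n] <= '9') n++;
    return n > 1 ? n : 0;
}

/* exponent = i"e" (plus / minus)? [0-9]+;  i"e" accepts 'e' or 'E' */
static size_t match_exponent(const char* s) {
    if (s[0] != 'e' && s[0] != 'E') return 0;
    size_t n = 1;
    if (s[n] == '+' || s[n] == '-') n++;
    size_t digits_start = n;
    while (s[n] >= '0' && s[n] <= '9') n++;
    return n > digits_start ? n : 0;
}

/* number = minus? integral fractional? exponent?;
 * Returns the length of the number at the start of s, or 0 if none. */
size_t json_number_len(const char* s) {
    size_t n = (s[0] == '-') ? 1 : 0;
    size_t integral = match_integral(s + n);
    if (integral == 0) return 0;            /* integral is mandatory */
    n += integral;
    n += match_fractional(s + n);           /* optional: adds 0 on failure */
    n += match_exponent(s + n);             /* optional: adds 0 on failure */
    return n;
}

/* Unescaped characters allowed inside a string, given an already-decoded
 * code point: U+0020 and above, except the double quote (U+0022) and
 * the backslash (U+005C), which must be written through the escape rule. */
bool json_is_unescaped(uint32_t cp) {
    return (cp >= 0x20 && cp <= 0x21) ||
           (cp >= 0x23 && cp <= 0x5B) ||
           (cp >= 0x5D && cp <= 0x10FFFF);
}
```

Writing the rules out by hand also shows the effect of ordered choice: integral tries "0" first, so "01" stops after the leading zero, which is why the grammar rejects JSON numbers with leading zeros.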

Step 2: Parse

Let’s create a new file “parse_json.c” and parse a JSON array.

The main function loads the grammar, parses the input, prints the resulting AST as JSON, and then frees the source and the grammar:

#include <stdio.h>
#include "json.h"

int main(int argc, char* argv[]) {
    P4_Grammar* grammar = P4_CreateJSONGrammar();
    const P4_String input = "[1,2.0,3e1,true,false,null,\"xyz\",{},[]]";
    P4_Source* source = P4_CreateSource(input, P4_JSONEntry);
    P4_Error err = P4_Parse(grammar, source);
    /* Print the AST as JSON only if parsing succeeded. */
    if (err != P4_Ok) {
        fprintf(stderr, "parse error: %d\n", err);
    } else {
        P4_Token* root = P4_GetSourceAst(source);
        P4_JsonifySourceAst(stdout, root);
    }
    P4_DeleteSource(source);
    P4_DeleteGrammar(grammar);
    return err == P4_Ok ? 0 : 1;
}

Run:

$ gcc ../examples/parse_json.c ../peppa.c
$ ./a.out
[{"slice":[0,39],"type":"array","children":[{"slice":[1,2],"type":"number"},{"slice":[3,6],"type":"number"},{"slice":[7,10],"type":"number"},{"slice":[11,15],"type":"true"},{"slice":[16,21],"type":"false"},{"slice":[22,26],"type":"null"},{"slice":[27,32],"type":"string"},{"slice":[33,35],"type":"object"},{"slice":[36,38],"type":"array"}]}]
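The "slice" pairs in the output appear to be half-open [start, stop) byte offsets into input: [3,6] covers 2.0, [11,15] covers true, and [0,39] spans the entire 39-byte string. Assuming that reading, the plain-C sketch below recovers each node's text from the original input; copy_slice is a hypothetical helper, not a Peppa PEG function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Peppa PEG.
 * Copies the half-open byte range input[start, stop) into out and
 * NUL-terminates it. Returns 0 on success, or -1 if the range falls
 * outside input or out cannot hold the text plus the terminator. */
int copy_slice(const char* input, size_t start, size_t stop,
               char* out, size_t out_size) {
    size_t len = strlen(input);
    if (start > stop || stop > len || stop - start >= out_size) {
        return -1;
    }
    memcpy(out, input + start, stop - start);
    out[stop - start] = '\0';
    return 0;
}
```

With the input from main, copy_slice(input, 7, 10, buf, sizeof buf) yields 3e1, and the range [27, 32) yields "xyz" including its surrounding quotes.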