Getting Started

Header File and Source File

Peppa PEG consists of one header file and one C source file, so you can add it to your project by copying peppa.h and peppa.c.

Peppa PEG assumes your project is ANSI C (C89, or C90) compatible.

Once the files are copied, add the include directive and start using the library:

#include "peppa.h"

Data Structures

All of the data structures and functions provided by Peppa PEG start with P4_ (count the letter “P” in “Peppa PEG”: there are four).

Let’s learn some basic P4 data structures:

  • P4_Grammar: The grammar object that defines all grammar rules.

  • P4_Expression: The grammar rule object.

  • P4_Source: The content to parse.

  • P4_Node: A node of the AST (abstract syntax tree), i.e. the final result of parsing the source with a grammar.

Step 1: Create Grammar

In Peppa PEG, we always start by creating a P4_Grammar. We create this data structure using P4_LoadGrammar().

    grammar = P4_LoadGrammar("entry = i\"hello\\nworld\";");
    if (grammar == NULL) {
        printf("Error: CreateGrammar: Error.\n");
        return 1;
    }
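The i prefix in front of the quoted literal makes the match case-insensitive, which is why a grammar written with lowercase hello and world can accept differently cased input. As a rough illustration of that comparison only (equals_ignoring_case is a hypothetical helper written for this tutorial, not a Peppa PEG function, and it handles ASCII only), a sketch might look like this:

```c
#include <ctype.h>

/* Hypothetical helper, not part of Peppa PEG.
   Returns 1 if a and b are equal when ASCII case is ignored, else 0. */
int equals_ignoring_case(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same time. */
    return *a == '\0' && *b == '\0';
}
```

With this helper, the literal text hello\nworld (with a real newline) compares equal to Hello\nWORLD, while strings of different length or content do not.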

Step 2: Create Source

We create a P4_Source using P4_CreateSource().

    source = P4_CreateSource("Hello\nWORLD", "entry");
    if (source == NULL) {
        printf("Error: CreateSource: MemoryError.\n");
        return 1;
    }

The first parameter is the content of the source. The second parameter is the name of entry grammar rule.

In this example, the grammar has only a single rule, entry, so the name can only be “entry”.
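Both string literals pass through the C compiler before Peppa PEG sees them, so it helps to know what each one actually contains. In the grammar literal, \\n leaves a backslash followed by the letter n, which the grammar presumably treats as its own newline escape, given that the parse in this tutorial succeeds. In the source literal, \n is already a real newline character. The sketch below checks this with plain C; GRAMMAR_TEXT, SOURCE_TEXT, count_newlines, and show_literals are names made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* The same two literals used in Step 1 and Step 2. */
const char GRAMMAR_TEXT[] = "entry = i\"hello\\nworld\";";
const char SOURCE_TEXT[] = "Hello\nWORLD";

/* Count real newline characters in a string. */
size_t count_newlines(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == '\n') {
            n++;
        }
    }
    return n;
}

/* Report the length and newline count of each string as stored in memory. */
void show_literals(void) {
    printf("grammar: %lu chars, %lu newlines\n",
           (unsigned long)strlen(GRAMMAR_TEXT),
           (unsigned long)count_newlines(GRAMMAR_TEXT));
    printf("source:  %lu chars, %lu newlines\n",
           (unsigned long)strlen(SOURCE_TEXT),
           (unsigned long)count_newlines(SOURCE_TEXT));
}
```

Calling show_literals() reports 24 characters and no newline for the grammar text, and 11 characters and one newline for the source text.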

Step 3: Parse

Now that the stage is set, call P4_Parse(). If everything is okay, it returns P4_Ok, a zero value.

    if (P4_Parse(grammar, source) != P4_Ok) {
        printf("Error: Parse: ErrCode[%u] Err[%s] Message[%s]\n",
            P4_GetError(source),
            P4_GetErrorString(P4_GetError(source)),
            P4_GetErrorMessage(source)
        );
        return 1;
    }

Step 4: Traverse AST

If parsing succeeds, the P4_Source contains a tree. We get the root node of that tree using P4_GetSourceAst().

To traverse the AST, use the following fields and function:

  • node->head is the first child.

  • node->tail is the last child.

  • node->next is the next sibling.

  • node->slice.start is the start position in the source string that the slice covers.

  • node->slice.stop is the end position in the source string that the slice covers.

  • P4_CopyNodeString() returns a newly allocated copy of the string the AST node covers; release it with free() when you are done with it.

    root = P4_GetSourceAst(source);
    text = P4_CopyNodeString(root);

    printf("root span: [%lu %lu]\n", root->slice.start.pos, root->slice.stop.pos);
    printf("root start: line=%lu offset=%lu\n", root->slice.start.lineno, root->slice.start.offset);
    printf("root stop: line=%lu offset=%lu\n", root->slice.stop.lineno, root->slice.stop.offset);
    printf("root next: %p\n", (void *)root->next);
    printf("root head: %p\n", (void *)root->head);
    printf("root tail: %p\n", (void *)root->tail);
    printf("root text: %s\n", text);
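The fields listed above are enough to visit every node: start at head and follow next until it is NULL, recursing into each child. The sketch below shows that loop on DemoNode, a hypothetical stand-in that copies only the three link fields; it is not the real P4_Node definition from peppa.h, and count_nodes is an illustrative helper rather than a library function.

```c
#include <stddef.h>

/* Hypothetical stand-in for P4_Node: only the link fields described
   in this step, not the real definition from peppa.h. */
typedef struct DemoNode {
    struct DemoNode *head; /* first child */
    struct DemoNode *tail; /* last child */
    struct DemoNode *next; /* next sibling */
} DemoNode;

/* Count a node together with all of its descendants, depth first. */
size_t count_nodes(const DemoNode *node) {
    size_t total = 1; /* the node itself */
    const DemoNode *child;

    if (node == NULL) {
        return 0;
    }
    /* Walk the child list: head first, then each next sibling. */
    for (child = node->head; child != NULL; child = child->next) {
        total += count_nodes(child);
    }
    return total;
}
```

In this tutorial the root node has no children (head is NULL), so such a count would be 1; grammars with more rules produce deeper trees where the same loop shape applies.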

It may be helpful to output the source AST in JSON format:

    P4_JsonifySourceAst(stdout, root, NULL);

Step 5: Clean Up

Last but not least, don’t forget to free all the allocated memory.

    P4_DeleteSource(source);
    P4_DeleteGrammar(grammar);

Full Example Code

The complete code for this example:

#include <stdio.h>
#include <stdlib.h> /* free() */
#include "peppa.h"

int main() {
    P4_Grammar* grammar;
    P4_Source* source;
    P4_Node* root;
    char* text;

    grammar = P4_LoadGrammar("entry = i\"hello\\nworld\";");
    if (grammar == NULL) {
        printf("Error: CreateGrammar: Error.\n");
        return 1;
    }

    source = P4_CreateSource("Hello\nWORLD", "entry");
    if (source == NULL) {
        printf("Error: CreateSource: MemoryError.\n");
        return 1;
    }

    if (P4_Parse(grammar, source) != P4_Ok) {
        printf("Error: Parse: ErrCode[%u] Err[%s] Message[%s]\n",
            P4_GetError(source),
            P4_GetErrorString(P4_GetError(source)),
            P4_GetErrorMessage(source)
        );
        return 1;
    }

    root = P4_GetSourceAst(source);
    text = P4_CopyNodeString(root);

    printf("root span: [%lu %lu]\n", root->slice.start.pos, root->slice.stop.pos);
    printf("root start: line=%lu offset=%lu\n", root->slice.start.lineno, root->slice.start.offset);
    printf("root stop: line=%lu offset=%lu\n", root->slice.stop.lineno, root->slice.stop.offset);
    printf("root next: %p\n", (void *)root->next);
    printf("root head: %p\n", (void *)root->head);
    printf("root tail: %p\n", (void *)root->tail);
    printf("root text: %s\n", text);

    free(text);

    P4_JsonifySourceAst(stdout, root, NULL);

    P4_DeleteSource(source);
    P4_DeleteGrammar(grammar);

    return 0;
}

The output of the example looks like:

$ gcc -o example ../example.c ../peppa.c
$ ./example
root span: [0 11]
root start: line=1 offset=1
root stop: line=2 offset=6
root next: (nil)
root head: (nil)
root tail: (nil)
root text: Hello
WORLD
[{"slice":[0,11],"type":"entry"}]
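The position numbers in this output are consistent with pos being a 0-based index into the source text, and lineno and offset being 1-based. The sketch below recomputes them from the text alone to make that convention concrete; DemoPosition and locate are hypothetical names, and this is not how Peppa PEG itself tracks positions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a position; the field names mirror
   the ones printed by the example. */
typedef struct DemoPosition {
    size_t pos;    /* 0-based index into the text */
    size_t lineno; /* 1-based line number */
    size_t offset; /* 1-based column within the line */
} DemoPosition;

/* Compute the line and column of index pos by scanning from the start.
   Scanning stops early if the text is shorter than pos. */
DemoPosition locate(const char *text, size_t pos) {
    DemoPosition p;
    size_t i;

    p.lineno = 1;
    p.offset = 1;
    for (i = 0; i < pos && text[i] != '\0'; i++) {
        if (text[i] == '\n') {
            p.lineno++;   /* a newline starts the next line */
            p.offset = 1; /* and resets the column */
        } else {
            p.offset++;
        }
    }
    p.pos = i;
    return p;
}
```

Under these assumptions, locate("Hello\nWORLD", 0) gives line 1, offset 1, and locate("Hello\nWORLD", 11) gives line 2, offset 6, the same values printed for root start and root stop.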

Conclusion

In this tutorial, we walked through the basic data structures and combined them in one example. The example simply parses a “Hello World” input into a single AST node.

I hope this example gives you a basic understanding of Peppa PEG. Now you can go back to Peppa PEG’s Documentation and pick more docs to read!