Parse `.ini`

Monadic Parsing Theory2

What is parsing?

  • Define grammer;
  • Takes input string;
  • Generate data with structure you want.

What do we need?

  • LL, LR, LALR, CYK? No.
  • Manually? No.
  • YACC, Antlr? No.
  • Monadic parsers & combinations! Yes.

Simple:

type Parser a = String -> [(a, String)]

Parser abstraction and Monads:

;; takes any input and "consume" first char from it
(defn any [input]
  (if (empty? input) '()
      (list [(first input)
             (apply str (rest input))])))

;; this one doesn't accept any input
(defn failure [_] '())

(defn parse [parser input]
  (parser input))

(defn parse-all [parser input]
  (->> input
       (parse parser)
       (filter #(= "" (second %)))
       ffirst))

;; builds parser that always returns given element without consuming (changing) input
(defn return [v]
  (fn [input] (list [v input])))

;; takes parser and function that builds new parsers from (each) result of applying first one
(defn >>= [m f]
  (fn [input]
    (->> input
         (parse m)
         (mapcat (fn [[v tail]] (parse (f v) tail))))))

(defn merge-bind [body bind]
  (if (and (not= clojure.lang.Symbol (type bind))
           (= 3 (count bind))
           (= '<- (second bind)))
    `(>>= ~(last bind) (fn [~(first bind)] ~body))
    `(>>= ~bind (fn [~'_] ~body))))

(defmacro do* [& forms]
  (reduce merge-bind (last forms) (reverse (butlast forms))))

Basic Parsers and Combinators

(defn sat [pred]
  (>>= any (fn [v] (if (pred v) (return v) failure))))

;; just a helper
(defn char-cmp [f]
  (fn [c] (sat (partial f (first c)))))

;; recognizes given char
(def match (char-cmp =))

;; rejects given char
(def noneOf (char-cmp not=))

;; just a helper
(defn from-re [re]
  (sat (fn [v] (not (nil? (re-find re (str v)))))))

;; recognizes any digit
(def digit (from-re #"[0-9]"))

;; recognizes any letter
(def letter (from-re #"[a-zA-Z]"))

;; (ab)
(defn and-then [p1 p2]
  (do*
   (r1 <- p1)
   (r2 <- p2)
   ;; xxx: note, that it's dirty hack to use STR to concat outputs
   ;; Full functional implementation should use MonadPlus protocol
   (return (str r1 r2))))

;; (a|b)
(defn or-else [p1 p2]
  (fn [input]
    (lazy-cat (parse p1 input) (parse p2 input))))

(declare plus)
(declare optional)

;; (a*)
(defn many [parser] (optional (plus parser)))

;; (a+) equals to (aa*)
(defn plus [parser]
  (do*
   (a <- parser)
   (as <- (many parser))
   (return (cons a as))))

;; (a?)
(defn optional [parser] (or-else parser (return "")))

What is INI file?

The INI file format is an informal standard for configuration files for some platforms or software. INI files are simple text files with a basic structure composed of “sections” and “properties”.1 It is human-readable and simple to parse, so it is a usable format for configuration files that do not require much greater complexity.

Format

  • Keys

The basic element contained in an INI file is the key or property. Every key has a name and a value, delimited by an equals sign (=). The name appears to the left of the equals sign.

name=value
  • Sections

Keys may (but need not) be grouped into arbitrarily named sections. The section name appears on a line by itself, in square brackets ([ and ]). All keys after the section declaration are associated with that section. There is no explicit “end of section” delimiter; sections end at the next section declaration, or the end of the file. Sections may not be nested.

  • Case Not Sensitive

  • Comments

Semicolons (;) at the beginning of the line indicate a comment. Comment lines are ignored.

Example

; Development configuration

[database]
host=localhost
port=3306
db=test_vagrant9200
user=eye
passwd=sauron

What do we want to get?

{
  :database {
    :host "localhost",
    :port "3306",
    :db "test_vagrant9200",
    :user "eye",
    :passwd "sauron"
  }
 }

INI Rule

From wikipedia ini definition:

data Property = Property String String deriving Show
data Section = Section String [Property] deriving Show

In Clojure we can use records to represent data types:

(defrecord property [key value])
(defrecord section [name properties])

Now lets define property and section parsers:

(def property-parser
  (do*
   (k <- (many (noneOf "[")))
   spaces
   (match "=")
   spaces
   (v <- (many (noneOf "\n")))
   spaces
   (return (Property. (trim (apply str k)) (trim (apply str v)) ))))
(def section-parser
  (do*
   spaces
   (match "[")
   (name <- (plus (noneOf "]")))
   (match "]")
   spaces
   (properties <- (plus property-parser))
   (return (Section. (apply str name) properties))))

(parse-all property-parser "port = 3306  ")
;= #user.Property{:key "port", :value "3306"}
(parse-all section-parser "
[database]

host=127.0.0.1
port=3306
db=test_vagrant9200
user=eye
passwd=sauron

")
;= #user.Section{:name "database", :properties (#user.Property{:key "host", :value "127.0.0.1"} #user.Property{:key "port", :value "3306"} #user.Property{:key "db", :value "test_vagrant9200"} #user.Property{:key "user", :value "eye"} #user.Property{:key "passwd", :value "sauron"})}

Resource

Monadic Parsing in Python by Alexey Kachayev