Parse `.ini`
Monadic Parsing Theory2
What is parsing?
- Define grammer;
- Takes input string;
- Generate data with structure you want.
What do we need?
- LL, LR, LALR, CYK? No.
- Manually? No.
- YACC, Antlr? No.
- Monadic parsers & combinations! Yes.
Simple:
type Parser a = String -> [(a, String)]
Parser abstraction and Monads:
;; takes any input and "consume" first char from it
(defn any [input]
(if (empty? input) '()
(list [(first input)
(apply str (rest input))])))
;; this one doesn't accept any input
(defn failure [_] '())
(defn parse [parser input]
(parser input))
(defn parse-all [parser input]
(->> input
(parse parser)
(filter #(= "" (second %)))
ffirst))
;; builds parser that always returns given element without consuming (changing) input
(defn return [v]
(fn [input] (list [v input])))
;; takes parser and function that builds new parsers from (each) result of applying first one
(defn >>= [m f]
(fn [input]
(->> input
(parse m)
(mapcat (fn [[v tail]] (parse (f v) tail))))))
(defn merge-bind [body bind]
(if (and (not= clojure.lang.Symbol (type bind))
(= 3 (count bind))
(= '<- (second bind)))
`(>>= ~(last bind) (fn [~(first bind)] ~body))
`(>>= ~bind (fn [~'_] ~body))))
(defmacro do* [& forms]
(reduce merge-bind (last forms) (reverse (butlast forms))))
Basic Parsers and Combinators
(defn sat [pred]
(>>= any (fn [v] (if (pred v) (return v) failure))))
;; just a helper
(defn char-cmp [f]
(fn [c] (sat (partial f (first c)))))
;; recognizes given char
(def match (char-cmp =))
;; rejects given char
(def noneOf (char-cmp not=))
;; just a helper
(defn from-re [re]
(sat (fn [v] (not (nil? (re-find re (str v)))))))
;; recognizes any digit
(def digit (from-re #"[0-9]"))
;; recognizes any letter
(def letter (from-re #"[a-zA-Z]"))
;; (ab)
(defn and-then [p1 p2]
(do*
(r1 <- p1)
(r2 <- p2)
;; xxx: note, that it's dirty hack to use STR to concat outputs
;; Full functional implementation should use MonadPlus protocol
(return (str r1 r2))))
;; (a|b)
(defn or-else [p1 p2]
(fn [input]
(lazy-cat (parse p1 input) (parse p2 input))))
(declare plus)
(declare optional)
;; (a*)
(defn many [parser] (optional (plus parser)))
;; (a+) equals to (aa*)
(defn plus [parser]
(do*
(a <- parser)
(as <- (many parser))
(return (cons a as))))
;; (a?)
(defn optional [parser] (or-else parser (return "")))
What is INI file?
The INI file format is an informal standard for configuration files for some platforms or software. INI files are simple text files with a basic structure composed of “sections” and “properties”.1 It is human-readable and simple to parse, so it is a usable format for configuration files that do not require much greater complexity.
Format
- Keys
The basic element contained in an INI file is the key or property. Every key has a name and a value, delimited by an equals sign (=). The name appears to the left of the equals sign.
name=value
- Sections
Keys may (but need not) be grouped into arbitrarily named sections. The section name appears on a line by itself, in square brackets ([ and ]). All keys after the section declaration are associated with that section. There is no explicit “end of section” delimiter; sections end at the next section declaration, or the end of the file. Sections may not be nested.
-
Case Not Sensitive
-
Comments
Semicolons (;) at the beginning of the line indicate a comment. Comment lines are ignored.
Example
; Development configuration
[database]
host=localhost
port=3306
db=test_vagrant9200
user=eye
passwd=sauron
What do we want to get?
{
:database {
:host "localhost",
:port "3306",
:db "test_vagrant9200",
:user "eye",
:passwd "sauron"
}
}
INI Rule
From wikipedia ini definition:
data Property = Property String String deriving Show
data Section = Section String [Property] deriving Show
In Clojure we can use records to represent data types:
(defrecord property [key value])
(defrecord section [name properties])
Now lets define property
and section
parsers:
(def property-parser
(do*
(k <- (many (noneOf "[")))
spaces
(match "=")
spaces
(v <- (many (noneOf "\n")))
spaces
(return (Property. (trim (apply str k)) (trim (apply str v)) ))))
(def section-parser
(do*
spaces
(match "[")
(name <- (plus (noneOf "]")))
(match "]")
spaces
(properties <- (plus property-parser))
(return (Section. (apply str name) properties))))
(parse-all property-parser "port = 3306 ")
;= #user.Property{:key "port", :value "3306"}
(parse-all section-parser "
[database]
host=127.0.0.1
port=3306
db=test_vagrant9200
user=eye
passwd=sauron
")
;= #user.Section{:name "database", :properties (#user.Property{:key "host", :value "127.0.0.1"} #user.Property{:key "port", :value "3306"} #user.Property{:key "db", :value "test_vagrant9200"} #user.Property{:key "user", :value "eye"} #user.Property{:key "passwd", :value "sauron"})}