Introducing scanf in Common Lisp

There are many ways to extract data from strings or files. The scanf family of function offers one of them.

These functions scan input according to a provided format string. The format string might contain conversion specifiers or conversion directives to extract integers, floating-point numbers, characters, strings, etc. from the input and store it in the arguments.

For example, a format string for parsing components of an IP address might look like: “%3d.%3d.%3d.%3d” — the scanf function will parse 4 integers (maximum 3 digits each) that are delimited with dots and return them to the caller.

There are basically two ways to implement scanf:

  1. as an interpreter, that scans format string and executes commands as they are retrieved.
  2. as a translator to an intermediate language that, in turn, is compiled into machine code.

The trivial-scanf package (that comes as the part of the CL-STRING-MATCH library) takes the first approach. The trivial-scanf implementation reads one character at a time, and depending on the read character performs the designated operation. Underneath, it uses PROC-PARSE library to deal with the input. Outline of the function’s main loop looks as follows:

 (while (< fmt-pos fmt-len))
 (for c = (char fmt fmt-pos))
 (case c
    ;; process conversion directive
   ((#\Space #\Tab #\Return #\Newline #\Page)
    ;; process white space characters
    ;; process ordinary characters

Conversion directives might have optional flags and parameters that must be taken into account. Simple directives, like %d, are handled in a straightforward way: input matching to the designated data type (digits) are bound to a string that is then parsed using corresponding function (parse-integer in this case).

However, the standard scanf also specifies a directive to match a set of designated characters. For example, directive ‘%[a-z0-9-]’ would scan input and return a string composed of letters, digits, and a dash from the current position, until first mismatch. In case, if we dealt with an octet-string (a string where every character is guaranteed to be a single byte in size), it would be feasible to interpret this directive using a table to mark characters that belong to the set. The trivial-scanf takes another approach: characters set directive is converted into a list of closures that serve as predicates for the input string binding operation. In our example, the list of closures would contain predicates for: (range #\a…#\z), (range #\0…#\9) (character #\).

trivial-scanf will be accessible through Quicklisp after the next packages update. At the moment you can clone the repository and install it locally.

Some usage examples:

(ql:quickload :trivial-scanf)
(snf:scanf "%3d.%3d.%3d.%3d" "") => (127 0 0 1)
(snf:scanf "%d %[A-C] %d" "1  ABBA  2") => (1 "ABBA" 2)

This the first (almost alpha) release of the code, so some bugs are expected. Feel free to comment or submit them.

trivial-scanf is the part of the CL-STRING-MATCH library.

CL-GDAL hits a milestone

CL-GDAL project that aims at creating high-level Lisp bindings to the GDAL Geospatial Data Abstraction Library has hit an important milestone: the small demo app was able to render a simple map of a vector geographic map using the high-level wrappers to the lower-level CFFI bindings. The sources are not yet ready for a production use and can be seen in the devel branch of the code repository.


There is still a long path to walk so any contributions are welcome. Please take a look at the list of open issues and feel free to either comment or join.