Introducing scanf in Common Lisp

There are many ways to extract data from strings or files. The scanf family of functions offers one of them.

These functions scan input according to a provided format string. The format string may contain conversion specifiers (also called conversion directives) that extract integers, floating-point numbers, characters, strings, etc. from the input and store them in the supplied arguments.

For example, a format string for parsing the components of an IP address might look like “%3d.%3d.%3d.%3d”: the scanf function parses four integers (at most three digits each) delimited by dots and returns them to the caller.

There are basically two ways to implement scanf:

  1. as an interpreter that scans the format string and executes commands as they are retrieved;
  2. as a translator into an intermediate language that is, in turn, compiled into machine code.
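To make the second approach more concrete, here is a minimal sketch, assuming a format language reduced to %d and ordinary characters. The names FORMAT->LAMBDA-FORM and COMPILE-SCANF are hypothetical and not part of trivial-scanf: the format string is translated into a LAMBDA form (the intermediate language here is Lisp itself), and the standard COMPILE function turns that form into a compiled function, which is native code on implementations such as SBCL.

```lisp
;;; Hypothetical sketch of the translator approach (not trivial-scanf's code).
;;; FORMAT->LAMBDA-FORM turns a format string into Lisp source and
;;; COMPILE-SCANF hands that source to the Lisp compiler.
;;; Only "%d" and ordinary characters are supported.

(defun format->lambda-form (fmt)
  "Translate FMT into a LAMBDA form that parses one input string."
  (let ((steps '())
        (i 0))
    (loop while (< i (length fmt))
          do (if (and (char= (char fmt i) #\%)
                      (< (1+ i) (length fmt))
                      (char= (char fmt (1+ i)) #\d))
                 ;; %d: PARSE-INTEGER reads an optionally signed integer and
                 ;; returns the index of the first character it did not use.
                 (progn
                   (push '(multiple-value-bind (value end)
                              (parse-integer input :start pos :junk-allowed t)
                            (unless value (return-from scan nil))
                            (push value results)
                            (setf pos end))
                         steps)
                   (incf i 2))
                 ;; Ordinary character: it must match the input exactly.
                 (let ((c (char fmt i)))
                   (push `(if (and (< pos (length input))
                                   (char= (char input pos) ,c))
                              (incf pos)
                              (return-from scan nil))
                         steps)
                   (incf i))))
    `(lambda (input)
       (block scan
         (let ((pos 0) (results '()))
           (declare (ignorable pos))
           ,@(reverse steps)
           (nreverse results))))))

(defun compile-scanf (fmt)
  "Return a compiled function that scans strings according to FMT."
  (compile nil (format->lambda-form fmt)))
```

For instance, `(funcall (compile-scanf "%d.%d.%d.%d") "127.0.0.1")` returns `(127 0 0 1)`, while a mismatch on an ordinary character makes the compiled function return NIL. The translation cost is paid once per format string, which is the main appeal of this approach when the same format is applied to many inputs.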

The trivial-scanf package (part of the CL-STRING-MATCH library) takes the first approach: it reads the format string one character at a time and, depending on the character read, performs the designated operation. Underneath, it uses the PROC-PARSE library to deal with the input. An outline of the function’s main loop looks as follows:

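;; Outline only: ITER is the macro from the ITERATE library, and each
;; branch is responsible for advancing FMT-POS past what it has consumed.
(iter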
 (while (< fmt-pos fmt-len))
 (for c = (char fmt fmt-pos))
 (case c
   (#\%
    ;; process conversion directive
    )
   ((#\Space #\Tab #\Return #\Newline #\Page)
    ;; process white space characters
    )
   (otherwise
    ;; process ordinary characters
    )))

Conversion directives may have optional flags and parameters that must be taken into account. Simple directives, like %d, are handled in a straightforward way: the input characters matching the designated data type (digits, in this case) are bound to a string, which is then parsed by the corresponding function (PARSE-INTEGER here).
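The bind-then-parse strategy for %d can be sketched as two small functions. This is an illustration under assumptions, not trivial-scanf's implementation: BIND-DIGITS and SCAN-DECIMAL are hypothetical names, signs are not accepted, and a width of NIL means "no limit".

```lisp
;;; Hypothetical helpers illustrating "bind, then parse" for an unsigned %d:
;;; the digits at POS (at most WIDTH of them, when WIDTH is non-NIL) are
;;; copied into a fresh string, which PARSE-INTEGER then converts.

(defun bind-digits (input pos &optional width)
  "Return the run of decimal digits starting at POS and the index after it."
  (let* ((limit (if width (min (length input) (+ pos width)) (length input)))
         (end (or (position-if-not #'digit-char-p input :start pos :end limit)
                  limit)))
    (values (subseq input pos end) end)))

(defun scan-decimal (input pos &optional width)
  "Parse an unsigned %d field; return the integer (or NIL) and the new position."
  (multiple-value-bind (digits end) (bind-digits input pos width)
    (if (zerop (length digits))
        (values nil pos)                 ; no digits: the directive fails
        (values (parse-integer digits) end))))
```

For example, `(scan-decimal "127.0.0.1" 4 3)` returns `0` and `5`: the converted value and the position just after the field.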

However, the standard scanf also specifies a directive that matches a set of designated characters. For example, the directive ‘%[a-z0-9-]’ scans the input from the current position up to the first mismatch and returns a string composed of lowercase letters, digits, and dashes. If we were dealing with an octet string (a string in which every character is guaranteed to occupy a single byte), it would be feasible to interpret this directive using a table that marks the characters belonging to the set. trivial-scanf takes another approach: a character-set directive is converted into a list of closures that serve as predicates for the input-string binding operation. In our example, the list would contain predicates for (range #\a…#\z), (range #\0…#\9), and (character #\-).
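The closure-based representation of a character set can be sketched as follows, assuming a simplified set syntax: ranges written as x-y and single characters, with a dash in the first or last position taken literally. Negated sets (%[^...]) and a literal ] are not handled, and CHAR-SET->PREDICATES and SCAN-CHAR-SET are hypothetical names rather than trivial-scanf's API.

```lisp
;;; Hypothetical sketch: turn the body of a %[...] directive into a list of
;;; predicate closures, then bind the longest prefix of the input whose
;;; characters satisfy at least one of them.

(defun char-set->predicates (spec)
  "Return a list of closures, one per range or single character in SPEC."
  (let ((predicates '())
        (i 0)
        (n (length spec)))
    (loop while (< i n)
          do (let ((c (char spec i)))
               (if (and (< (+ i 2) n) (char= (char spec (1+ i)) #\-))
                   ;; "x-y": a character range.
                   (let ((lo c) (hi (char spec (+ i 2))))
                     (push (lambda (ch) (char<= lo ch hi)) predicates)
                     (incf i 3))
                   ;; A single character (including a leading or trailing "-").
                   (progn
                     (push (lambda (ch) (char= ch c)) predicates)
                     (incf i)))))
    (nreverse predicates)))

(defun scan-char-set (spec input pos)
  "Bind the longest run of INPUT starting at POS whose characters are in SPEC.
Return the matched string (NIL if nothing matched) and the end index."
  (let* ((predicates (char-set->predicates spec))
         (end (or (position-if-not
                   (lambda (ch) (some (lambda (p) (funcall p ch)) predicates))
                   input :start pos)
                  (length input))))
    (if (= end pos)
        (values nil pos)             ; like scanf, an empty match is a failure
        (values (subseq input pos end) end))))
```

Here `(char-set->predicates "a-z0-9-")` yields three closures, matching the example above, and `(scan-char-set "a-z0-9-" "abc-123 XYZ" 0)` returns `"abc-123"` and `7`. The table-based alternative would replace the list of closures with a bit vector indexed by CHAR-CODE, which stays small only when the character codes are known to be small.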

trivial-scanf will be available through Quicklisp after the next package update. Until then, you can clone the repository and install it locally.

Some usage examples:

(ql:quickload :trivial-scanf)
(snf:scanf "%3d.%3d.%3d.%3d" "127.0.0.1")  ; => (127 0 0 1)
(snf:scanf "%d %[A-C] %d" "1  ABBA  2")     ; => (1 "ABBA" 2)

This is the first (almost alpha) release of the code, so some bugs are to be expected. Feel free to comment or report them.

trivial-scanf is part of the CL-STRING-MATCH library.


2 thoughts on “Introducing scanf in Common Lisp”

  1. Hi Patrick,

    no, I didn’t, since FORMAT directives are much more sophisticated and complicated. Scanf is rather simple and plain.

    However, the current implementation is still far from perfect: it cannot yet handle input streams, and it would also be possible to implement it as a compiler.

    Thanks,
    Victor
