in which the cost of structured data is reduced


Last year I got the wonderful opportunity to attend RacketCon as it was hosted only 30 minutes away from my home. The two-day conference had a number of great talks on the first day, but what really impressed me was the fact that the entire second day was spent focusing on contribution. The day started out with a few 15- to 20-minute talks about how to contribute to a specific codebase (including that of Racket itself), and after that people just split off into groups focused around specific codebases. Each table had maintainers helping guide other folks towards how to work with the codebase and construct effective patch submissions.

lensmen chronicles

I came away from the conference with a great sense of appreciation for how friendly and welcoming the Racket community is, and how great Racket is as a swiss-army-knife type tool for quick tasks. (Not that it's unsuitable for large projects, but I don't have the opportunity to start any new large projects very frequently.)

The other day I wanted to generate colored maps of the world by categorizing countries interactively, and Racket seemed like it would fit the bill nicely. The job is simple: show an image of the world with one country selected; when a key is pressed, categorize that country, then show the map again with all categorized countries colored, and continue with the next country selected.

I have yet to see a language/framework more accessible and straightforward out of the box for drawing1. Here's the entry point which sets up state and then constructs a canvas that handles key input and display:

(define (main path)
  (let ([frame (new frame% [label "World color"])]
        [categorizations (box '())]
        [doc (call-with-input-file path read-xml/document)])
    (new (class canvas%
           (define/override (on-char event)
             (handle-key this categorizations (send event get-key-code)))
         [parent frame]
         [paint-callback (draw doc categorizations)])
    (send frame show #t)))

While the class system is not one of my favorite things about Racket (most newer code seems to avoid it in favor of generic interfaces in the rare case that polymorphism is truly called for), the fact that classes can be constructed in a light-weight, anonymous way makes it much less onerous than it could be. This code sets up all mutable state in a box which you use in the way you'd use a ref in ML or Clojure: a mutable wrapper around an immutable data structure.

The world map I'm using is an SVG of the Robinson projection from Wikipedia. If you look closely there's a call to bind doc that calls call-with-input-file with read-xml/document which loads up the whole map file's SVG; just about as easily as you could ask for.

The data you get back from read-xml/document is in fact a document struct, which contains an element struct containing attribute structs and lists of more element structs. All very sensible, but maybe not what you would expect in other dynamic languages like Clojure or Lua where free-form maps reign supreme. Racket really wants structure to be known up-front when possible, which is one of the things that help it produce helpful error messages when things go wrong.

Here's how we handle keyboard input; we're displaying a map with one country highlighted, and key here tells us what the user pressed to categorize the highlighted country. If that key is in the categories hash then we put it into categorizations.

(define categories #hash((select . "eeeeff")
                         (#\1 . "993322")
                         (#\2 . "229911")
                         (#\3 . "ABCD31")
                         (#\4 . "91FF55")
                         (#\5 . "2439DF")))

(define (handle-key canvas categorizations key)
  (cond [(equal? #\backspace key) ; undo
         (swap! categorizations cdr)]
        [(member key (dict-keys categories)) ; categorize
         (swap! categorizations (curry cons key))]
        [(equal? #\space key) ; print state
         (display (unbox categorizations))])
  (send canvas refresh))

Finally once we have a list of categorizations, we need to apply it to the map document and display. We apply a fold reduction over the XML document struct and the list of country categorizations (plus 'select for the country that's selected to be categorized next) to get back a "modified" document struct where the proper elements have the style attributes applied for the given categorization, then we turn it into an image and hand it to draw-pict:

(define (update original-doc categorizations)
  (for/fold ([doc original-doc])
            ([category (cons 'select (unbox categorizations))]
             [n (in-range (length (unbox categorizations)) 0 -1)])
    (set-style doc n (style-for category))))

(define ((draw doc categorizations) _ context)
  (let* ([newdoc (update doc categorizations)]
         [xml (call-with-output-string (curry write-xml newdoc))])
    (draw-pict (call-with-input-string xml svg-port->pict) context 0 0)))

The problem is in that pesky set-style function. All it has to do is reach deep down into the document struct to find the nth path element (the one associated with a given country), and change its 'style attribute. It ought to be a simple task. Unfortunately this function ends up being anything but simple:

;; you don't need to understand this; just grasp how huge/awkward it is
(define (set-style doc n new-style)
  (let* ([root (document-element doc)]
         [g (list-ref (element-content root) 8)]
         [paths (element-content g)]
         [path (first (drop (filter element? paths) n))]
         [path-num (list-index (curry eq? path) paths)]
         [style-index (list-index (lambda (x) (eq? 'style (attribute-name x)))
                                  (element-attributes path))]
         [attr (list-ref (element-attributes path) style-index)]
         [new-attr (make-attribute (source-start attr)
                                   (source-stop attr)
                                   (attribute-name attr)
         [new-path (make-element (source-start path)
                                 (source-stop path)
                                 (element-name path)
                                 (list-set (element-attributes path)
                                           style-index new-attr)
                                 (element-content path))]
         [new-g (make-element (source-start g)
                              (source-stop g)
                              (element-name g)
                              (element-attributes g)
                              (list-set paths path-num new-path))]
         [root-contents (list-set (element-content root) 8 new-g)])
    (make-document (document-prolog doc)
                   (make-element (source-start root)
                                 (source-stop root)
                                 (element-name root)
                                 (element-attributes root)
                   (document-misc doc))))

The reason for this is that while structs are immutable, they don't support functional updates. Whenever you're working with immutable data structures, you want to be able to say "give me a new version of this data, but with field x replaced by the value of (f (lookup x))". Racket can do this with dictionaries but not with structs2. If you want a modified version you have to create a fresh one3.

first lensman

When I brought this up in the #racket channel on Freenode, I was helpfully pointed to the 3rd-party Lens library. Lenses are a general-purpose way of composing arbitrarily nested lookups and updates. Unfortunately at this time there's a flaw preventing them from working with xml structs, so it seemed I was out of luck.

But then I was pointed to X-expressions as an alternative to structs. The xml->xexpr function turns the structs into a deeply-nested list tree with symbols and strings in it. The tag is the first item in the list, followed by an associative list of attributes, then the element's children. While this gives you fewer up-front guarantees about the structure of the data, it does work around the lens issue.

For this to work, we need to compose a new lens based on the "path" we want to use to drill down into the nth country and its style attribute. The lens-compose function lets us do that. Note that the order here might be backwards from what you'd expect; it works deepest-first (the way compose works for functions). Also note that defining one lens gives us the ability to both get nested values (with lens-view) and update them.

(define (style-lens n)
  (lens-compose (dict-ref-lens 'style)
                (list-ref-lens (add1 (* n 2)))
                (list-ref-lens 10)))

Our <path> XML elements are under the 10th item of the root xexpr, (hence the list-ref-lens with 10) and they are interspersed with whitespace, so we have to double n to find the <path> we want. The second-lens call gets us to that element's attribute alist, and dict-ref-lens lets us zoom in on the 'style key out of that alist.

Once we have our lens, it's just a matter of replacing set-style with a call to lens-set in our update function we had above, and then we're off:

(define (update doc categorizations)
  (for/fold ([d doc])
            ([category (cons 'select (unbox categorizations))]
             [n (in-range (length (unbox categorizations)) 0 -1)])
    (lens-set (style-lens n) d (list (style-for category)))))
second stage lensman

Often times the trade-off between freeform maps/hashes vs structured data feels like one of convenience vs long-term maintainability. While it's unfortunate that they can't be used with the xml structs4, lenses provide a way to get the best of both worlds, at least in some situations.

The final version of the code clocks in at 51 lines and is is available on GitLab.

[1] The LÖVE framework is the closest thing, but it doesn't have the same support for images as a first-class data type that works in the repl.

[2] If you're defining your own structs, you can make them implement the dictionary interface, but with the xml library we have to use the struct definitions provided us.

[3] Technically you can use the struct-copy function, but it's not that much better. The field names must be provided at compile-time, and it's no more efficient as it copies the entire contents instead of sharing internal structure. And it still doesn't have an API that allows you to express the new value as a function of the old value.

[4] Lenses work with most regular structs as long as they are transparent and don't use subtyping. Subtyping and opaque structs are generally considered bad form in modern Racket, but you do find older libraries that use them from time to time.

« older | 2018-01-12T19:53:51Z