in which five different paths lead to methods

I recently made a change in a codebase I've been working on which illustrated an interesting trade-off around modeling in software. The project was Taverner, an IRC server written in Fennel.

In particular it had to do with the way that channels were modeled. A channel is basically a "chat room"; it's just something that users can join which lets them send messages to anyone else who's also in the channel. In many languages you would define a Channel class which has a bunch of methods like join, part, send, etc. Fennel doesn't have classes, but there are a few different alternatives available[1].

Take 1: Module-based methods

The most obvious approach is to have a channel module which just exports the functions that would have been methods along with a constructor function:

(fn send [{: buffer} nick ...]
  (table.insert buffer [nick (table.concat [...] " ")]))

(fn join [{: members : name &as ch} nick conn]
  (tset members nick conn)
  (send ch "" (.. ":" nick) :JOIN name))

;; empty? has to be defined before part, since part calls it
(fn empty? [{: members}] (= nil (next members)))

(fn part [{: members : name &as ch} nick ?cmd]
  (tset members nick nil)
  (send ch "" (.. ":" nick) (or ?cmd :PART) name)
  (when (empty? ch)
    (ch.remove))) ; last one out removes the channel from the server state

(fn flush [{: members : buffer}]
  ;; deliver every buffered message to everyone except its sender
  (each [nick conn (pairs members)]
    (each [_ [sender msg] (ipairs buffer)]
      (when (not= nick sender)
        (conn:send (.. msg "\r\n")))))
  ;; then empty the buffer
  (while (next buffer)
    (table.remove buffer)))

(fn member-names [{: members}] (icollect [k (pairs members)] k))
(fn member? [{: members} nick] (not= nil (. members nick)))

(fn make-channel [name state]
  {: name :members {} :buffer []
   :remove #(tset state.channels name nil)})

{: send : join : part : flush
 : empty? : member-names : member?
 : make-channel}

There's nothing particularly clever going on here, which I believe is a big strength. Everything is obvious. The make-channel function acts as a constructor, while every other function in the module takes a channel as its first argument, so you can write code like (channel.join ch client.nick client.conn) where ch is a channel table you got from calling the constructor.

The biggest downside here is that it lacks encapsulation. All the data for a channel is exposed in the table that gets passed around to other modules, and it isn't clear which fields are safe to use and which are implementation details which might change later on. In a small codebase maybe this is no problem, but as it grows and changes over time, it will make it more difficult to know what effect a given change might have in a different part of the codebase.
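To make that concern concrete, here is a small self-contained sketch. The cut-down make-channel and the empty table standing in for a connection are hypothetical, not Taverner's actual code; the point is only that nothing stops a caller from reaching past the module's functions.

```fennel
;; Sketch only: a cut-down channel in the module-based style.
(fn make-channel [name]
  {: name :members {} :buffer []})

(fn join [{: members} nick conn]
  (tset members nick conn))

(fn member? [{: members} nick]
  (not= nil (. members nick)))

(local ch (make-channel "#fennel"))
(local fake-conn {}) ; hypothetical stand-in for a real connection

;; The intended route goes through the module's functions...
(join ch "alice" fake-conn)

;; ...but nothing prevents other code from editing the table directly,
;; skipping whatever bookkeeping join is supposed to do.
(tset ch.members "mallory" fake-conn)

(print (member? ch "alice"))   ; true
(print (member? ch "mallory")) ; true
```

Both calls print true, and the channel has no way to tell which member arrived through join and which was written straight into the table.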

Take 2: Closure-based methods

There's an old saying that "closures are a poor man's objects" and "objects are a poor man's closures". Keeping internal data private by exporting functions whose scope closes over the internal data is one of the oldest tricks in the book:

(fn make-channel [name server-state]
  (let [members {}
        buffer []]

    (fn flush []
      (each [nick conn (pairs members)]
        (each [_ [sender msg] (ipairs buffer)]
          (when (not= nick sender)
            (conn:send (.. msg "\r\n")))))
      (while (next buffer)
        (table.remove buffer)))

    (fn send [nick ...]
      (table.insert buffer [nick (table.concat [...] " ")]))

    (fn join [nick conn]
      (tset members nick conn)
      (send "" (.. ":" nick) :JOIN name))

    (fn part [nick ?cmd]
      (tset members nick nil)
      (send "" (.. ":" nick) (or ?cmd :PART) name)
      (when (= nil (next members)) ; last one out turns off the lights
        (tset server-state.channels name nil)))

    {: name : send : join : part : flush
     :empty? #(= nil (next members))
     :member-names #(icollect [k (pairs members)] k)
     :member? #(not= nil (. members $))}))

{: make-channel}

Now the module only exports one thing: the make-channel function, which returns a table that you can think of as if it were an instance of a Channel class. It has functions inside the table which act like methods would. This makes the interface of the channel very clear and well-defined. If you want to do anything with a channel, you have to use one of the functions in the channel table. You can change anything about the internals, and as long as you update everything in the make-channel function, you know you won't break something elsewhere. In a word, it's encapsulated.
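As a rough illustration of what that buys you, here is a cut-down sketch in the same style. The reduced make-channel and the empty table passed as a connection are hypothetical stand-ins; only the shape matters.

```fennel
;; Sketch only: a cut-down channel in the closure-based style.
(fn make-channel [name]
  (let [members {}]
    {: name
     :join (fn [nick conn] (tset members nick conn))
     :member? #(not= nil (. members $))}))

(local ch (make-channel "#fennel"))
(ch.join "alice" {}) ; {} is a hypothetical stand-in for a connection

(print (ch.member? "alice")) ; true
(print ch.members)           ; nil
```

The members table is only reachable through the functions that close over it. Note that these functions are called with a dot rather than a colon, since they take no self argument; (ch:join "alice" {}) would pass the channel table itself as the nick.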

But there is one very serious downside to this style[2]: reloading the code would only affect new channels; existing ones would keep the same old code as before, since only the module gets the new functions[3]. In a normal program I might put up with this, even though I really love reloading. But in a long-running IRC server, it's really not a good idea! Getting everyone to leave a channel so you can recreate it with the new version of the code is extremely disruptive. I absolutely need the ability to fix bugs and add features while the server is running without disrupting the users, and that means that as nice as this code feels, it's not going to cut it. How can we get both encapsulation and reloadability?
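The reload problem can be reproduced without a real reload. In this sketch the mod table is a hypothetical stand-in for a module table, and overwriting its make-channel field stands in for what reloading would do to it; the describe function is invented purely so the two versions can be told apart.

```fennel
;; Sketch only: `mod` plays the role of a loaded module table.
(local mod {})

;; Version 1 of the constructor, as first loaded.
(tset mod :make-channel
      (fn [name] {: name :describe #(.. name " (v1)")}))

(local ch (mod.make-channel "#fennel"))

;; Simulate a reload: the same module table gets a new constructor.
(tset mod :make-channel
      (fn [name] {: name :describe #(.. name " (v2)")}))

(local fresh (mod.make-channel "#lua"))

(print (ch.describe))    ; #fennel (v1)
(print (fresh.describe)) ; #lua (v2)
```

The channel created before the simulated reload still holds the closures it was built with; only channels constructed afterwards pick up the new code.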

Take 3: Metatable-based methods

Lua tables have one feature which gives them an extraordinary amount of flexibility: metatables. There's a lot you can do with metatables, but for the purposes of this code the most important thing is that you can set an __index field on them which provides a fallback for when you try to look up a field which does not exist in the table. This lets us keep doing method lookups using the module table directly (allowing reloads) while also keeping the data itself out of the table which is exposed as the public interface:

(fn send [{: buffer} nick ...]
  (table.insert buffer [nick (table.concat [...] " ")]))

(fn join [{: members : name &as ch} nick conn]
  (tset members nick conn)
  (send ch "" (.. ":" nick) :JOIN name))

;; ... all the methods are the same as the first version

(fn make-channel [name state]
  (let [public {: name}
        channel-state {:members {} :buffer []
                       :remove #(tset state.channels name nil)}]
    (setmetatable public {:__index channel-state})
    (setmetatable channel-state {:__index (require :channel)})
    public))

{: send : join : part : flush
 : empty? : member-names : member?
 : make-channel}

This looks nice! It's very clear what the public fields are (only the channel's name) and the private fields are attached using the first metatable. But if we put the method functions directly into the channel-state table during the constructor, we would have the same reload problem as the previous version: the module containing the methods would change after we had already pulled the functions out of it, and we wouldn't see the new values. Because of that, we give channel-state a metatable of its own whose __index is the module itself, so every method lookup falls through to whatever the module currently contains.
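The two-level lookup is easier to see in isolation. This is a minimal sketch, assuming a hypothetical methods table in place of the table returned by (require :channel) and an invented shout method:

```fennel
;; Sketch only: `methods` stands in for the module table.
(local methods {})
(fn methods.shout [self] (.. (string.upper self.name) "!"))

(fn make-channel [name]
  (let [public {: name}
        private {:members {}}]
    (setmetatable public {:__index private})  ; public falls back to private
    (setmetatable private {:__index methods}) ; private falls back to methods
    public))

(local ch (make-channel "#fennel"))

(print (rawget ch :members)) ; nil: not stored in the public table itself
(print (type ch.members))    ; table: found via the first fallback
(print (ch:shout))           ; #FENNEL!: found via the second fallback

;; Replacing the function in the shared table affects existing instances.
(fn methods.shout [self] (.. self.name "?"))
(print (ch:shout))           ; #fennel?
```

Because the lookup happens at call time, the existing channel sees the replaced function without being rebuilt. The sketch also shows that a plain field access on the public table still reaches the private data through the fallback.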

There's one big downside to this compared to the previous version, which is that it lacks transparency:

>> (local {: make-channel} (require :channel))
>> (local ch (make-channel "#mychannel" state))
>> ch
{:name "#mychannel"} ; wait, where are the methods?
>> ch.join
#<function: 0x55c7d468f0> ; but it's found if you ask for it directly
>> (ch:join client.nick client.conn) ; and this works fine!

The functions are found (via __index) when you go look them up, but they do not show up otherwise. This is a common problem with using metatables; they can lead to surprising, unpredictable behavior. While there is a workaround to this (the __pairs metamethod) it's error-prone and does not work on all versions of the Lua runtime; Lua 5.1, for instance, ignores it. Personally I try to avoid metatables unless the downsides of the alternatives are too great. But what other options are there?

Take 4: Class-based methods

Just because Lua and Fennel don't have classes as part of the language doesn't mean you can't use classes; metatables give you the flexibility to construct your own class system if that's what you really want. The middleclass library is one of the most popular implementations of this for Lua, which means of course that we can use it from Fennel too:

(local class (require :middleclass))

(local Channel (class :Channel))

(fn Channel.send [{: buffer} nick ...]
  (table.insert buffer [nick (table.concat [...] " ")]))

(fn Channel.join [{: members : name &as ch} nick conn]
  (tset members nick conn)
  (ch:send "" (.. ":" nick) :JOIN name))

;; the methods are the same as before

(fn Channel.initialize [self name state]
  (set self.name name)
  (set self.members {})
  (set self.buffer [])
  (set self.remove #(tset state.channels name nil)))

Channel

If you're used to Java or Ruby or another class-based language, this may look comfortingly familiar to you. You define a class, and you give it methods. You invoke them using (ch:join client.nick client.conn) notation. But how does it fare on the encapsulation and reloadability fronts?

>> (Channel:new "mychannel" {})
{:buffer {}
 :class {:__declaredMethods {:__tostring #<function: 0x55c7c28450>
                             :empty? #<function: 0x55c7b56660>
                             :flush #<function: 0x55c7bd4aa0>
                             :initialize #<function: 0x55c7b711c0>
                             :isInstanceOf #<function: 0x55c7d45000>
                             :join #<function: 0x55c7d36da0>
                             :member-names #<function: 0x55c7c06720>
                             :member? #<function: 0x55c7f06a80>
                             :part #<function: 0x55c7b710b0>
                             :send #<function: 0x55c7f03e40>}
         :__instanceDict @3{:__index @3{...}
                            :__tostring #<function: 0x55c7c28450>
                            :empty? #<function: 0x55c7b56660>
                            :flush #<function: 0x55c7bd4aa0>
                            :initialize #<function: 0x55c7b711c0>
                            :isInstanceOf #<function: 0x55c7d45000>
                            :join #<function: 0x55c7d36da0>
                            :member-names #<function: 0x55c7c06720>
                            :member? #<function: 0x55c7f06a80>
                            :part #<function: 0x55c7b710b0>
                            :send #<function: 0x55c7f03e40>}
         :name "Channel"
         :static {:allocate #<function: 0x55c7c0a930>
                  :include #<function: 0x55c7b8a5d0>
                  :isSubclassOf #<function: 0x55c7d44f70>
                  :new #<function: 0x55c7d44b40>
                  :subclass #<function: 0x55c7b8a590>
                  :subclassed #<function: 0x55c7af5600>}
         :subclasses {}}
 :members {}
 :name "mychannel"
 :remove #<function: 0x55c7d2e570>}

Yikes! That's a lot of ... stuff. The methods are just dumped straight into a nested table inside the instance itself (twice, for some reason?) and the fields are not encapsulated away at all. The middleclass wiki has some suggestions for how to keep data private but they are quite inconvenient compared to simply using closures. On top of that, the printed representation of the instance is very cluttered and messy. Overall it's not clear that we gain much from this approach beyond a sense of familiarity for people who come from certain other languages.

Take 5: Reloadable, encapsulated methods

So far the closure version from take 2 has appealed to me the most; the tight encapsulation there just feels so tidy. What if we could go back to that but do something about the reloading? Well, there is actually one other concern we haven't touched on with reloading yet, and it leads us to our solution.

When you reload, you're bringing a new version of a module into play in a system that's already running. When your program is a server, that means that you've got "in-flight" connections with active users of your program. What happens when you add a function that expects some new fields that didn't exist when your users initially connected? For example, let's say we add a ban list to the channels. This data wasn't included in the existing channels, but now you need to check it when you join:

(fn join [{: members : banned : name &as ch} nick conn]
  (assert (not (lume.find (or banned []) nick)) "Cannot join channel; banned.")
  (tset members nick conn)
  (send ch "" (.. ":" nick) :JOIN name))

You could code defensively and make sure that every single reference to the field is wrapped in an or, but that's a drag. You're sure to miss one. And do you really want that check sticking around in your codebase forever? What we really want here is something like Erlang's upgrade process[4] for when it hot loads a new module. Here we provide an upgrade function which takes the existing table and replaces its contents with the closures from the new version:

(local lume (require :lume)) ; third-party library that provides lume.find

(fn make-channel [name server-state ?members ?buffer ?banned]
  (let [members (or ?members {})
        banned (or ?banned [])
        buffer (or ?buffer [])]

    (fn send [nick ...]
      (table.insert buffer [nick (table.concat [...] " ")]))

    (fn join [nick conn]
      (assert (not (lume.find banned nick))
              "Cannot join channel; banned.")
      (tset members nick conn)
      (send "" (.. ":" nick) :JOIN name))

    ;; ... the methods are all the same as the closure-based version

    (fn upgrade [self new-make]
      (each [k v (pairs (new-make name server-state members buffer banned))]
        (tset self k v)))

    {: name : send : join : part : flush
     : empty? : member-names : member?
     : upgrade}))

{: make-channel}

We've extended the constructor to accept all the state fields as optional arguments (the ones beginning with a question mark), allowing you to build a new version of an existing channel by passing the existing state on in. The upgrade function does exactly this with the private data it's closed over. We'll need to modify the server's reload command to call upgrade on every one of the channels with the new constructor as its second argument. The upgrade function calls the new constructor to get an updated version of the channel, then takes all the new functions from it and drops them into the existing channel, seamlessly upgrading it in place without dropping any connections. Any currently-running code which had access to the old channel can now see all the new methods from the new constructor. It's the best of both worlds, and it didn't require sacrificing encapsulation. Best of all, it only took a few lines of code to accomplish.
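Here is roughly what that reload step could look like, as a self-contained sketch. Everything in it is hypothetical: make-v1 and make-v2 stand in for the old and new versions of the constructor, member-count is an invented method that only the new version has, and upgrade-all is a guess at the shape of the reload command's loop, not Taverner's actual code.

```fennel
;; Sketch only: two hypothetical versions of a tiny constructor.
(fn make-v1 [name ?members]
  (let [members (or ?members {})]
    (fn join [nick conn] (tset members nick conn))
    (fn member? [nick] (not= nil (. members nick)))
    (fn upgrade [self new-make]
      (each [k v (pairs (new-make name members))]
        (tset self k v)))
    {: name : join : member? : upgrade}))

;; Version 2 adds a method that version 1 did not have.
(fn make-v2 [name ?members]
  (let [members (or ?members {})]
    (fn join [nick conn] (tset members nick conn))
    (fn member? [nick] (not= nil (. members nick)))
    (fn member-count []
      (var n 0)
      (each [_ (pairs members)]
        (set n (+ n 1)))
      n)
    (fn upgrade [self new-make]
      (each [k v (pairs (new-make name members))]
        (tset self k v)))
    {: name : join : member? : member-count : upgrade}))

;; A channel created by the old code, with one member already in it.
(local channels {"#fennel" (make-v1 "#fennel")})
(let [ch (. channels "#fennel")]
  (ch.join "alice" {})) ; {} is a stand-in for a connection

;; What a reload command might do once the new code is loaded.
(fn upgrade-all [channels new-make]
  (each [_ ch (pairs channels)]
    (ch:upgrade new-make)))

(upgrade-all channels make-v2)

(let [ch (. channels "#fennel")]
  (print (ch.member? "alice")) ; true
  (print (ch.member-count)))   ; 1
```

The existing table keeps its identity and its members, and gains the new method. One thing to keep in mind with an upgrade written this way is that it only adds and overwrites keys, so a method dropped from the new version would linger in already-upgraded channels unless upgrade also clears the old entries.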

But I do want to stress that each of these five approaches is a trade-off, and none of them is universally wrong. If you're not writing a server that keeps live connections open, it might not make sense to care about hot-loading upgrades. If you're writing a program that launches, prints its output, and immediately exits, you might not care about reloading, and the second approach is probably fine. If you've got a high tolerance for weird/unexpected behavior, maybe metatables are fine. If serialization is important to you, the first one might come out ahead. Even though the class-based approach is my least favorite, it could suit some projects if the people working on the codebase have a background in object-oriented languages and aren't comfortable changing their style. Context is everything.

All in all I have to say that writing an IRC server has been a lot of fun and not as difficult as I expected it to be. At this point my code is only 366 lines but it supports channels, private messages, channel operators, bans, kicks, listing, and more. Writing an IRC bot is of course easier (a simple one is under a hundred lines) but a server could be a good project if you're looking for a little more of a challenge when picking up a new language.


[1] If you don't know Fennel, you can probably still follow along if you understand scope and closures; the main things to know are that fn declares a function, the curly brackets in the argument list are used to pull fields out of a table argument, curly brackets outside the argument list are used to make tables, :colon-style is string shorthand, and #(+ 2 $) is shorthand for a function that adds 2 to its argument.

[2] Another smaller problem with this approach is that closures cannot be serialized, so if you had to save off a channel, you can't just take the channel table and write it out to disk. This isn't an issue in Taverner, but it could be for other things which could be modeled this way.

[3] This is because in the Lua runtime used by Fennel, modules are the unit of reloading. Reloading a module involves taking the module table, emptying it out, re-executing the module's file, and pouring the resulting fields back into the original table, meaning that any existing code which had access to the module table can see the new fields. I wrote about this in more detail in a previous blog post.

[4] Of course, Erlang's version is much more sophisticated; it allows the old and new versions of the module to both exist simultaneously, moving processes over when it detects an opportune time to call the upgrade function. Since we don't have to worry about concurrency in Lua, it's much simpler.

2022-02-08 08:52:53