Updates on the ruby-lang.org overhaul

As you may know, Ruby’s official website sucks. As one of its current maintainers, I couldn’t agree more. Man, just compare it with Haskell’s or Perl’s! I am partly responsible for that. The facts are: I made several attempts at triggering an update process over the past two years; they mostly failed due to lack of motivation and feedback. But now is the time for a change!

Not long after Peter Cooper published his post on Ruby Inside, the so-called VIT core team (I mean, the active maintainers…) stepped in. The current platform is a Radiant application used concurrently by several small teams of translators. Content synchronization is cumbersome, and the process is plagued by what I’d call the “no-one-is-in-charge” syndrome: everyone would be happy to improve things a little, but nobody feels responsible or authorized to do so, so nothing ever happens. Content decays, translations die and alternative resources spring up, making it difficult to remember where to find what.

So we have numerous issues to solve. What do we really need?

  • an efficient translation process, with cheap sync built-in
  • a platform allowing easy contributions (especially content updates)
  • soft leadership to ensure everything runs smoothly

Among several ideas, the “git workflow with a static content generator” seemed to put everyone on the same wavelength (there is a “less is more” feel to it), so that’s where we’re heading right now. Git gives us i18n with cheap sync and a fully-fledged contribution tool, whereas a static content generator makes contributing a no-brainer. It does introduce some new challenges, like the deploy process, but we’re confident we can design a good solution. There are many options when it comes to static content generation and git; we had to choose.

We’re currently tweaking a Jekyll instance as the static generator, and leveraging github as the contribution platform: both tools seem to be widely accepted in the Ruby community. Should this combination fall short, we will be able to switch to another backend, but the duo is doing the job pretty well so far! Obviously, some parts of the website could be dynamic, but we basically don’t need that at the moment, so let’s KISS!

Our goal is to release a brand-new ruby-lang.org, updating or restructuring any part of it that needs it. And when I say “our goal”, I really mean OURS. If you’re part of the community, dive in and start contributing. If you don’t feel like a member of the great Ruby community yet, this could be a great way to kick it off! How can you help?

  • join the debates about the content update, the website structure, missing features…
  • start contributing by making a pull request, for instance with a news item or a translation update
  • talk to your fellow rubyists about the project

It’s a community effort whose purpose is to switch from a rather closed, inefficient process to an open, streamlined one. This doesn’t mean everything will be accepted without inspection and thought, though. One important aspect of this “Ruby Refresh” is to ensure we build something solid, hopefully making rubyists and matz proud of their main frontend.

There is a live preview at http://ruby.github.com/ruby-lang.org. Caution, this is a work in progress! We haven’t settled on the design yet: should we stick to the legacy one or build something new? I’m currently using a custom version of Brandon Mathis’ Octopress theme, but help from a professional designer would be welcome (I already asked Brandon; wait & see!). As for the content and the general structure, there is an ongoing discussion and I invite you to participate. Feedback appreciated.

Be aware that I’m currently working on merging postmodern’s content into the project. He built a crawler able to fetch all existing content and convert it into markdown templates. This will be used as the basis for a content overhaul, so you may not want to hurry to push new translations, as things will break next week :)

Which concurrency model to pick?

clojure, concurrency, erlang

I’ve been working with Erlang lately. While implementing a simple gen_event in U.C.Engine, something struck me as odd. Although it was not a blocker, I found it weird that the language forces each actor to bind to the others in such an explicit way (message passing). Every actor must know about its mates before anything can happen; otherwise it cannot notify, reply or do anything else. This seems to lead either to more coupling than expected (“it didn’t look that complicated on the diagrams”) or to adding some kind of proxy layer to abstract things out (supervisors and wrappers).
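To make the “explicit binding” point concrete, here is a rough Ruby approximation. Ruby has no Erlang processes, so a Thread draining a Queue stands in for an actor; the Actor class and its send_message/stop methods are made up for illustration. Note how the echo actor can only answer because the caller ships its own mailbox inside the message, which is exactly the kind of wiring I mean.

```ruby
# A toy actor: one thread draining one mailbox (a thread-safe Queue).
# This only approximates Erlang processes; all names are hypothetical.
class Actor
  attr_reader :mailbox

  def initialize(&handler)
    @mailbox = Queue.new
    @thread = Thread.new do
      loop do
        message = @mailbox.pop
        break if message == :stop
        handler.call(message)
      end
    end
  end

  def send_message(message)
    @mailbox << message
  end

  # Asks the actor to finish and waits for its thread.
  def stop
    @mailbox << :stop
    @thread.join
  end
end

# The echo actor cannot reply unless the sender hands over its own mailbox:
# every message has to carry an explicit reply-to address.
echo = Actor.new do |(reply_to, text)|
  reply_to << "echo: #{text}"
end

inbox = Queue.new                   # the caller's own mailbox
echo.send_message([inbox, "hello"])
puts inbox.pop                      # prints "echo: hello"
echo.stop
```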

Then I looked at Clojure in my spare time, and stumbled upon this:

It [the Actor Model] reduces your flexibility in modeling - this is a world in which everyone sits in a windowless room and communicates only by mail. Programs are decomposed as piles of blocking switch statements. You can only handle messages you anticipated receiving. Coordinating activities involving multiple actors is very difficult. You can’t observe anything without its cooperation/coordination - making ad-hoc reporting or analysis impossible, instead forcing every actor to participate in each protocol.

Add to that the heavy payloads. But I don’t think it’s a real issue, for Erlang’s general model is quite consistent: it has been designed for multilateral communication tunneling, and its concurrency model assumes a distributed architecture, with the possibility of transparent distribution and all the related issues (a point made clear in Joe Armstrong’s book, available from PragProg). I have never worked on a project of this kind, but I can see where Erlang comes from.

All in all, picking up a new language to achieve concurrency, or for any task in general, is all about understanding the domain and the issues it was created to solve. As far as concurrency is concerned, it’s a real jungle out there. Should I pick lock-based sync, an STM, an AM, or even consider FBP? (+1 XP if you knew every one of those acronyms.) It all depends on your problems. Don’t rush to pick up the latest, hippest language (node.js is fine, but it may not be what you need despite its high watcher count on github; Scala’s great, but Foursquare won’t tell you how to run your business; …). Take some time to gain insight into the different concurrency models and into your own problem. At a higher level, there may be only two or three “problem genres” (as in distributed vs. local); still, scale, load and costs must be taken into account.
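For a taste of how differently two of these models tackle the very same problem, here is a sketch of a shared counter bumped by several threads, first with a lock (Mutex), then actor-style, where a single thread owns the state and everyone else only sends it messages. The method names are hypothetical and this is no benchmark.

```ruby
# 1. Lock-based synchronization: threads share the state, a Mutex guards it.
def locked_count(threads: 10, increments: 1_000)
  count = 0
  lock = Mutex.new
  workers = Array.new(threads) do
    Thread.new do
      increments.times { lock.synchronize { count += 1 } }
    end
  end
  workers.each(&:join)
  count
end

# 2. Actor-style message passing: one thread owns the state,
#    the others only push messages into its mailbox.
def actor_count(threads: 10, increments: 1_000)
  mailbox = Queue.new
  owner = Thread.new do
    count = 0
    while (message = mailbox.pop) != :done
      count += 1 if message == :increment
    end
    count
  end
  senders = Array.new(threads) do
    Thread.new { increments.times { mailbox << :increment } }
  end
  senders.each(&:join)
  mailbox << :done # sent last, so every :increment is processed first
  owner.value      # Thread#value joins the thread and returns its result
end

puts locked_count # prints 10000
puts actor_count  # prints 10000
```

The lock-based version is shorter, but every thread touches the shared state; the actor version never shares count, at the price of a mailbox and an explicit shutdown message.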

It’s fun to experiment with all of them though!

Hooks-as-a-service

ruby

One little quirk some people notice while coding in Ruby is the #new/#initialize weirdness. To create an object, one calls the class’ #new method, but its implementation is (well, seems to be) found under #initialize. How come such a flaw made its way into the language design? Well, actually, there is no flaw there, just a lack of knowledge. I’d like to talk a little bit about hooks.

Ruby has several “hooks”, which are methods automatically fired when a particular event occurs at runtime. A hook is a method call explicitly designed to provide future extensibility. For instance, #included is a hook triggered when a module is included into a class (mixin). The whole point is that it does nothing by default, but you may override it to perform arbitrary actions, like a simple informative output:

module Foo
  def self.included(base)
    puts "I (#{self}) have been included into #{base}!"
  end
end

class Bar
  include Foo # will trigger the #included hook
end

Notice we never explicitly called Foo.included: we implemented our own version of #included and it got fired. The #included method is a hook, and the particular implementation we provide is a callback associated with it, for a hook is a kind of placeholder, gently lurking around waiting for specific orders. Some other hooks are #extended, #inherited, #method_added, #method_removed, the (in)famous #method_missing, and many others. Some are defined at class level, some as instance methods. Getting to know and use them lets you leverage a bit more of Ruby’s dynamic nature. One hook remains mostly unknown though, and yet it may be the method rubyists use most: our beloved #initialize.
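Here is a quick sketch of a few of those other hooks at work. The Base/Child classes and the fetch_ prefix are invented for the example; inherited and method_added are class-level hooks, whereas method_missing (paired with respond_to_missing?) is an instance-level one.

```ruby
class Base
  # Class-level hook: fired when a subclass is created.
  def self.inherited(subclass)
    super
    puts "#{subclass} inherits from #{self}"
  end

  # Class-level hook: fired each time an instance method is defined.
  def self.method_added(name)
    super
    puts "#{self}##{name} was just defined"
  end

  # Instance-level hook: fired when a call matches no method.
  def method_missing(name, *args, &block)
    return "no such thing as ##{name}" if name.to_s.start_with?("fetch_")
    super
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("fetch_") || super
  end
end

class Child < Base # fires Base.inherited(Child)
  def hello; end   # fires method_added(:hello) on Child
end

puts Child.new.fetch_answer # prints "no such thing as #fetch_answer"
```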

What’s going on when calling Bar.new?

  • some space is allocated in memory for the new object, using Bar.allocate which returns a vanilla Bar instance;
  • the initialize message is sent to this vanilla instance, along with the arguments given to #new, which calls the matching private #initialize method.
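The two steps can be spelled out by hand. This is only a sketch of the behaviour (the real Class#new is implemented in C), and Bar here is given a name attribute purely for illustration:

```ruby
class Bar
  attr_reader :name

  def initialize(name) # methods named initialize are always private
    @name = name
  end
end

# Roughly what Bar.new("baz") does, one step at a time.
bar = Bar.allocate           # 1. a bare, uninitialized Bar instance
p bar.name                   # prints nil: initialize has not run yet
bar.send(:initialize, "baz") # 2. trigger the private initialize hook
p bar.name                   # prints "baz"
```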

If no #initialize method is found in your class implementation, everything still works smoothly: we get a new Bar instance, because Ruby falls back on the default, empty #initialize implementation (inherited from BasicObject). Ok, hooks are great. But there is more to it than that, because:

Hooks can make you a better lover

In Ruby, hooks are nothing special, just normal methods; triggering their associated callbacks is done using the standard message-passing style. This means one may bypass #initialize altogether by redefining #new. You may also redefine #new and rely on super to alter the object at creation time. The existence of the #initialize hook is a blessing in that it keeps us from accidentally overriding the core behaviour (memory allocation) while making it easy to change the workflow extensively.
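Both tricks fit in a few lines. In this sketch (Widget, Settings and created_by are hypothetical), Widget keeps the normal workflow but tweaks the object through super, while Settings redefines new so that #initialize is never called and a single cached instance is handed out:

```ruby
# 1. Keep the normal workflow, alter the object at creation time via super.
class Widget
  attr_accessor :created_by

  def self.new(*args, &block)
    super.tap { |widget| widget.created_by = :factory } # super forwards args and block
  end
end

# 2. Bypass #initialize altogether by redefining #new.
class Settings
  def self.new
    @instance ||= allocate # initialize is never called
  end

  def initialize
    raise "never reached"
  end
end

p Widget.new.created_by             # prints :factory
p Settings.new.equal?(Settings.new) # prints true
```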

This applies to third-party libraries as well. By implementing the hooks provided by a (kind) library, we may tweak our program to suit our implementation-specific requirements the way the library author intended, avoiding monkey-patching. This pattern is so neat and simple that I’d advise you not to stop at Ruby’s hooks but to become a hook provider yourself. By spreading hooks over your public APIs, you give other developers some simple niceties:

  • it prevents them from monkey-patching when it is not needed (in my experience, that is most of the time, if hooks are placed at key locations; see below)
  • it makes it easier to lay the groundwork for Inversion of Control patterns
  • it clarifies the program workflow, when properly documented
  • it makes you stop & think to actually design your code for its users, not for you-as-a-coder
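What “becoming a hook provider” could look like, as a minimal sketch: the Hookable mixin, define_hook and run_hook are hypothetical names (this is not the API of any particular gem). The library declares the hook and decides where it runs; users only register blocks.

```ruby
module Hookable
  # Uses the #included hook to add class-level helpers to the host class.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def hooks
      @hooks ||= Hash.new { |hash, name| hash[name] = [] }
    end

    # Declares a hook and a class method to register callbacks on it.
    def define_hook(name)
      hooks[name] # the hook exists even before any callback is added
      define_singleton_method(name) { |&callback| hooks[name] << callback }
    end
  end

  # Runs every callback registered for +name+, in registration order.
  # Unknown hook names raise KeyError.
  def run_hook(name, *args)
    self.class.hooks.fetch(name).each { |callback| instance_exec(*args, &callback) }
  end
end

class Importer
  include Hookable
  define_hook :before_import

  def import(rows)
    run_hook(:before_import, rows)
    rows.size # the core work would happen here
  end
end

# A library user plugs in without monkey-patching Importer#import:
Importer.before_import { |rows| rows.reject!(&:empty?) }
p Importer.new.import(["a", "", "b"]) # prints 2
```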

But providing a bunch of hooks does not benefit only the others. By thinking about which hooks would be useful (and which would not), and where to place them, you may well shed light on implementation flaws, find better design ideas or simplify the workflow. Writing the documentation with your well-defined hooks in mind is a great way to ensure the resulting doc targets library users, not hackers/fellow coders/nobody.

So the core issue here is: what is a good hook?

Hook duties

The first important thing to pay attention to is separating the core, inviolable behaviour of your codebase from the more “foggy” parts. When talking to a database, atomic operations like read/write/delete are not to be messed with, whereas validations, filtering, consolidation… may be less critical. It could be acceptable to let users bypass, redefine or extend them. Depending on the needs and on how much is to be allowed, you then design a public surface so that library users cannot impinge on the core.

There should not be too many hooks, but there should be enough of them to cover your domain. Place them at hot-spots: logical junctures in the code flow, such as around core operations, in initializers, gate keepers and sweeper methods, or any place where it would make sense to check specific requirements at a specific time to provide flexibility.
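A sketch of both ideas together, with hypothetical names: Store#save fixes the order of operations and keeps the core write private, while filter, valid? and after_save are the overridable hooks sitting at the hot-spots around it. Ruby has no final methods, so private here is a strong hint rather than a lock.

```ruby
class Store
  def initialize
    @data = {}
  end

  # Public entry point: the library fixes the order of the steps.
  def save(key, value)
    value = filter(value)                  # hot-spot: before validation
    return false unless valid?(key, value) # hot-spot: gate keeper
    write(key, value)                      # core operation, not a hook
    after_save(key)                        # hot-spot: after the core operation
    true
  end

  def [](key)
    @data[key]
  end

  protected

  # Hooks: permissive no-ops by default, meant to be overridden.
  def filter(value)
    value
  end

  def valid?(_key, _value)
    true
  end

  def after_save(_key); end

  private

  # Core behaviour: not part of the public surface.
  def write(key, value)
    @data[key] = value
  end
end

# A library user only touches the hooks.
class TrimmedStore < Store
  protected

  def filter(value)
    value.to_s.strip
  end

  def valid?(_key, value)
    !value.empty?
  end
end

store = TrimmedStore.new
p store.save(:a, "  hi  ") # prints true
p store[:a]                # prints "hi"
p store.save(:b, "   ")    # prints false
```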

As the library author, do not trust hooks’ callbacks. Consider them to be dumb, unicellular code units; they should not introduce coupling. This is obviously hard to enforce by design. One way to do it is to register callbacks as closures, then run them under the control of exceptions. This requires a rigorous coding style, and may not prove efficient or safe enough. Another way is to provide good documentation and let users do whatever they want, provided they have been warned and fed with good usage examples.
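The “closures under the control of exceptions” approach could look like this minimal sketch (CallbackRunner, register and run are hypothetical). Each callback receives a frozen, shallow copy of the payload, and any StandardError it raises is recorded instead of breaking the core operation.

```ruby
class CallbackRunner
  attr_reader :errors

  def initialize
    @callbacks = []
    @errors = []
  end

  # Callbacks are plain closures.
  def register(&callback)
    @callbacks << callback
    self
  end

  # Each callback gets a frozen (shallow) copy of the payload, so it cannot
  # modify the library's own array; errors are collected, not propagated.
  def run(payload)
    @callbacks.each do |callback|
      begin
        callback.call(payload.dup.freeze)
      rescue StandardError => e
        @errors << e
      end
    end
  end
end

runner = CallbackRunner.new
runner.register { |data| data << "oops" } # tries to mutate the frozen copy
runner.register { |_data| raise ArgumentError, "bad" }
runner.run(["row"])
p runner.errors.map(&:class) # prints [FrozenError, ArgumentError]
```

FrozenError exists since Ruby 2.5; on older versions, the same mistake raises a RuntimeError.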

It’s easy!

All of this is by no means new stuff, just Plain Old Good Sense. It is not specific to Ruby. It is not a design pattern per se. Yet it is f*ckin awesome! But you don’t get to see many libraries providing hooks nowadays, especially in the Ruby ecosystem, where many standard practices seem to have been forgotten or never took root. But it’s never too late!

To help you get started, my friend Nick wrote a little gem to formalize things a bit: hooks. You may want to give it a try. There are other libs as well, but most rely on what I’d consider excessive metaprogramming magic to “hook around” method calls. Sticking to a more declarative style is enough most of the time.

How to Host a Jekyll App on Github Pages With Plugins

gh-pages, github, jekyll

Github Pages runs your gh-pages content through jekyll --pygments --safe. The safe option prevents any Ruby script from being loaded, thus disabling all your lovely plugins. There is no way to bypass this option, but here’s a workaround.

The idea is to launch the jekyll command locally (allowing for the use of plugins) and to push the static result to Github.

Say you have the following Jekyll project, on branch master:

_config.yml _plugins  _posts index.html

Running jekyll should generate your static content into a public folder (this assumes destination: public is set in _config.yml; Jekyll’s own default destination is _site).

Then, Github Pages’ help instructions require you to create a root branch (kind of a new, independent “master”-like branch within the same git repository):

$ cd /path/to/fancypants
$ git symbolic-ref HEAD refs/heads/gh-pages
$ rm .git/index

Here is the important step: remove everything except public/:

$ git clean -fdx -e public

You may add an empty .nojekyll file to keep Github Pages from running its Jekyll processing when data is pushed, saving some time and CPU:

$ touch .nojekyll

Then you’re done:

$ git add .
$ git commit -m "my website (first commit)"
$ git push origin gh-pages

Easy as pie!

By the way, this simple scheme results in public/ being part of your base URL (me.github.com/myproject/public), so it may not fit your requirements if you’re not using a custom domain name. Fear not, because there is another way. I’ll just drop the script. On the master branch, save it under, say, deploy.sh:

#!/bin/sh
# Build the site on master, then publish the content of public/ at the root of gh-pages.
git checkout master
jekyll                          # with Jekyll >= 1.0, use: jekyll build
git add -A .
git commit -m "static content update"
git checkout gh-pages
rm -rf *
# Fetch the generated public/ folder from master and move its content to the root.
git checkout master -- public
cp -r public/* .
rm -rf public
touch .nojekyll
git add -A .
git commit -m "content update"
git push origin gh-pages
git checkout master

Then, use it:

$ chmod +x deploy.sh
$ ./deploy.sh

But this is not over! Most of the time, you’ll need different configuration options for publishing. For instance, on Github Pages, you have to set the root option to the name of your project: if you’re publishing to me.github.com/myproject, then root (in _config.yml) must be set to /myproject (otherwise all links will break). You’d also need to prefix every link in your Liquid templates with that root value, so you’d better use a Liquid filter to automate this work (ask me if you have no idea how to do that). To handle this, you can branch from master, edit your config and commit it in this “publishing-only” branch; the overall workflow is then slightly different:

#!/bin/sh
# Assumes `git checkout -b publish` was once run from master, and that the
# publishing-only config edits (e.g. root: /myproject) were committed on publish.
git checkout -- public          # discard uncommitted changes to the generated content
git checkout publish
git rebase master               # replay the config commits on top of the latest master
jekyll                          # with Jekyll >= 1.0, use: jekyll build
git add -A .
git commit -m "static content update"
git checkout gh-pages
rm -rf *
# Fetch the generated public/ folder from publish and move its content to the root.
git checkout publish -- public
cp -r public/* .
rm -rf public
touch .nojekyll
git add -A .
git commit -m "content update"
git push origin gh-pages
git checkout master

Using git checkout -- public lets you easily discard changes to the static content when switching branches.

Instead of a raw shell script, I use a Thor task to deploy: thor deploy:github. It ends up looking nice!
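As for the Liquid filter that prefixes links with the root option, here is a minimal sketch of a Jekyll plugin (say, _plugins/root_filter.rb, a hypothetical file name). The with_root filter name is hypothetical too; it assumes the root value lives in _config.yml as described above, and reads the site configuration through @context.registers[:site], the way Jekyll’s own filters reach the site.

```ruby
# Prefixes site-relative links with the `root` value from _config.yml.
module RootFilter
  # Pure helper: join("/myproject", "/about/") gives "/myproject/about/".
  def self.join(root, path)
    prefix = root.to_s.chomp("/")
    path = path.to_s
    path = "/#{path}" unless path.start_with?("/")
    "#{prefix}#{path}"
  end

  # Liquid filter (hypothetical name), e.g. {{ "/about/" | with_root }}.
  # Jekyll exposes the site object through the Liquid context registers.
  def with_root(path)
    site = @context.registers[:site]
    RootFilter.join(site.config["root"], path)
  end
end

# Only register when loaded by Jekyll, i.e. when Liquid is available.
Liquid::Template.register_filter(RootFilter) if defined?(Liquid::Template)
```

A template link would then read {{ "/about/" | with_root }}, yielding /myproject/about/ once root is set to /myproject, and /about/ with a root of /. Keeping the joining logic in a plain module method makes it easy to test without Jekyll.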

Logg your messages, dispatch events

gem, github, ruby

Unsatisfied with Rails’ internal logger and the available alternatives (too light, too heavy), I created my own little logging gem. It’s called Logg and strives for both simplicity and efficiency. Actually, it can be used for much more than logging, but logging is what I’ve been using it for from the beginning, so I’ll just stick with “Logg”.

My main goal here was to create a versatile “method builder”. When it comes to logging, I may not need LOG_LEVELS flags, I may not need complex filters, I may not need anything but methods acting as little logging helpers. Whenever I need a specific feature, I’ll just add it on top of that (maybe as a plugin? a simple mixin? using a gem?). What I do need all the time, however, is:

  • Simple, dynamic formatting. I thus focused on this aspect, and leveraged Tilt features.
  • Easy “dispatching”. I’m not talking about a protocol or anything, just the ability to yell something to whoever wants to know.
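The “method builder” idea boils down to something like this sketch. MiniLogg, as and subscribe are hypothetical names, NOT Logg’s actual API: each call to as defines a helper method from a formatting block, and every formatted message is yelled to whoever subscribed.

```ruby
class MiniLogg
  def initialize
    @subscribers = []
  end

  # Anyone interested in messages registers a block.
  def subscribe(&listener)
    @subscribers << listener
    self
  end

  # Defines a helper method named +name+ that formats its arguments with
  # +formatter+, dispatches the result to every subscriber, and returns it.
  def as(name, &formatter)
    define_singleton_method(name) do |*args|
      message = formatter.call(*args)
      @subscribers.each { |listener| listener.call(name, message) }
      message
    end
    self
  end
end

logger = MiniLogg.new
logger.subscribe { |kind, message| $stdout.puts("[#{kind}] #{message}") }
logger.as(:slow_query) { |sql, ms| "#{sql} took #{ms}ms" }
logger.slow_query("SELECT 1", 250) # prints "[slow_query] SELECT 1 took 250ms"
```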

The first point is built into Logg: it supports templating. The README gives several examples. To address the second point, I’d recommend using a third-party library, such as ActiveSupport::Notifications or Onfire. Logg just makes it easy to plug things together thanks to its block-everywhere interface and cheap method definition.

So here it is: a simple logging/dispatching/whatever library. Tell me if you happen to use it!