PLOS recently launched ALM Reports, implemented as a Ruby on Rails (RoR) app. I’m one of two engineers who implemented this site. I’m not going to say too much about the functional aspects of the app, other than that it’s a really cool way to compare groups of PLOS research articles and that you should check it out. The point of this blog post is to talk about the technical architecture of the implementation, and what I think worked and didn’t work well with RoR. Also, I should add that this was my first Rails (and Ruby) app ever, although I luckily had the help of another talented developer with previous Rails experience.
The Rails Way
The Rails getting started guide states very clearly that “Rails is opinionated software.” They are not kidding. Well, I’m an opinionated developer, hence this blog post.
“Convention over configuration” (CoC) as a framework paradigm has been gaining in popularity for many years now. Maven, which we use as our Java build framework at PLOS, is another product that heavily uses CoC. The reason this idea has become so popular is simple: no developer wants to go through the headache of customizing dozens of configuration files just to get to “first light” of their app, as seemed to be necessary with many of the earlier Java web application frameworks.
RoR takes CoC a step further, the goal being to make it easy to do simple things and possible to do hard ones. The framework offers many useful features out of the box, from XSRF and XSS protection to a schema migration framework, and everything in between. But the catch is that these features expect you to do things “the Rails way.” For example, form validation is primarily done at the model layer, instead of at the controller layer as in other MVC-based frameworks. But what if you don’t have a model instance associated with a form? (More on why that might happen below.) Turns out you are pretty much out of luck. You can try to shim in a bogus model that exists only to do validation, and I tried that, but it felt like such a hack that I ended up doing all the validation in the controller by hand for this particular form.
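To make “validation in the controller by hand” concrete, here is a minimal sketch in plain Ruby. It is not the code from ALM Reports: the module name, the `query` and `year` fields, and the length limit are all hypothetical, and `params` is modeled as an ordinary Hash so the example runs without Rails.

```ruby
# Hypothetical stand-in for hand-rolled validation called from a controller
# action. Nothing here is Rails API; `params` is just a Hash.
module SearchFormValidator
  MAX_QUERY_LENGTH = 200 # assumed limit, for illustration only

  # Returns an array of human-readable error messages (empty if valid).
  def self.errors_for(params)
    errors = []

    query = params[:query].to_s.strip
    errors << "Query can't be blank" if query.empty?
    errors << "Query is too long" if query.length > MAX_QUERY_LENGTH

    # The year field is optional, but must be four digits when present.
    year = params[:year].to_s.strip
    unless year.empty? || year.match?(/\A\d{4}\z/)
      errors << "Year must be a four-digit number"
    end

    errors
  end
end

# Example: a blank query and a malformed year both produce an error.
puts SearchFormValidator.errors_for({ query: "", year: "20x1" })
```

In a controller, an action would presumably call something like this and re-render the form with the messages when the returned array is non-empty. The cost is that every rule and every message is written by hand, which is exactly the work the model-layer validations are meant to save.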
Another interesting aspect of our app that Rails wasn’t particularly happy about: about half of it is backed by Active Record models, while the other half is not. In the first part of the app flow, the user assembles a collection of PLOS articles by performing searches. You can think of it a little bit like filling your shopping cart while browsing amazon.com or any other ecommerce site. We made the decision that we wanted to store the article IDs in the user’s session (backed by memcached), rather than in the database, to keep things simple and avoid having to garbage-collect abandoned sessions. Furthermore, our representation of a PLOS article, and the information about it that we want to display in the app, never comes from our app’s database. Instead, we query our Solr instance for this information (with caching as necessary for performance). These two factors meant that we simply couldn’t use Active Record for the first part of the app flow, and instead arrived at a “heavy controller” architecture that Rails advocates would consider a code smell.
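The shape of that first half can be sketched in a few lines of plain Ruby. This is an illustration under stated assumptions, not our actual code: `ArticleCart`, `CachedArticleLookup`, and the `:article_ids` session key are made-up names, the session is modeled as a Hash, and the block passed to the lookup stands in for a real Solr query.

```ruby
# Hypothetical sketch: the "cart" of article IDs lives in the session,
# modeled here as a plain Hash rather than a real Rails session.
class ArticleCart
  SESSION_KEY = :article_ids # assumed key name

  def initialize(session)
    @session = session
    @session[SESSION_KEY] ||= []
  end

  # Adds an ID once; returns self so calls can be chained.
  def add(article_id)
    ids = @session[SESSION_KEY]
    ids << article_id unless ids.include?(article_id)
    self
  end

  def remove(article_id)
    @session[SESSION_KEY].delete(article_id)
    self
  end

  # Returns a copy so callers cannot mutate the session by accident.
  def ids
    @session[SESSION_KEY].dup
  end
end

# Hypothetical cache in front of an article lookup. The block is a stand-in
# for whatever actually queries the search index.
class CachedArticleLookup
  def initialize(cache = {}, &fetcher)
    @cache = cache
    @fetcher = fetcher
  end

  def fetch(article_id)
    return @cache[article_id] if @cache.key?(article_id)
    @cache[article_id] = @fetcher.call(article_id)
  end
end

session = {}
cart = ArticleCart.new(session)
cart.add("example-id-1").add("example-id-2").add("example-id-1")

lookup = CachedArticleLookup.new { |id| { id: id, title: "Title for #{id}" } }
cart.ids.each { |id| puts lookup.fetch(id)[:title] }
```

Note that nothing in this sketch is an Active Record model, so whatever glues these objects to the views ends up in the controller.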
There are a couple of frameworks that claim to back Active Record entities with Solr entities. We didn’t try them since, frankly, we were more interested in getting our site out the door than playing around with some third-party code that may or may not have worked for us.
Once a user generates a report (think of this as “proceeding to checkout” with your shopping cart), we save the report to the DB, so we then have Active Record entities to back our app. But even then, we seemed to be fighting against Rails at times. Rails really, really wants each page of an application to deal with a single database row (or all the rows for a particular table, with paging if necessary). But what if you want to render a form with more than one record, but not all of them (like here)? Turns out it’s a pain, and again I avoided Active Record to get the feature out the door sooner.
Simplicity vs. Naiveté
Because of these and other examples, I am starting to question the Rails dogma that web application developers should always do things “the Rails way.” My retort would be: are any modern websites simple enough to be done purely the Rails way? The example app that ships with Rails is a simple blogging site, where indeed, each form deals with either exactly one row in a table, or all of them. But who nowadays finds the need to implement a blog from scratch?
To paraphrase Einstein, “make things as simple as possible, but not simpler.” I would argue that Rails is a bit too simple. Or rather, it assumes that the world is simpler than it really is.
Dynamic Typing and Technical Debt
I’d like to make one other point, which is not Rails or even Ruby specific, but has to do with type systems in general, and statically-typed languages vs. dynamically-typed ones. I am not religious about my programming languages, and I don’t really have a favorite language. They all have their niches (except perhaps for this one), and I hope to continue to learn new languages over the course of my career. But I do have strong opinions about using dynamically-typed languages (Ruby, Python, PHP, et al.) for large projects. In short, I think this usually leads to a maintenance nightmare. And in my mind, ALM Reports is approaching the upper limit of what I’d want to maintain in such a language (right now it is about 3k lines of Ruby, and it will undoubtedly grow).
Statically-typed languages like Java, C++, C#, etc. give you the benefit of knowing the type of a variable while you are editing the code. (Indeed, better names might be compile-time-typed and run-time-typed.) This gives you the obvious benefit of finding certain bugs at compile time instead of at test time or run time. But another, less obvious benefit is that the extra type information serves as a layer of commenting in the source code. It’s much easier for a new developer to become familiar with a large codebase in a statically-typed language for this reason: even if the original developers were total slackers and included no comments, the types serve as a basic roadmap.
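In Ruby, that roadmap has to be supplied by hand. The sketch below shows one way to do it, using YARD-style `@param` and `@return` comment tags plus a run-time check. The `ReportSummary` struct and the `summarize` method are invented for illustration and are not part of ALM Reports.

```ruby
# Hypothetical example; ReportSummary and summarize are illustrative names.
ReportSummary = Struct.new(:article_count, :total_citations)

# Summarizes citation counts for a set of articles.
#
# The tags below are ordinary comments: they document the expected types
# for a human reader, but Ruby itself does not check them.
#
# @param citations_by_id [Hash{String => Integer}] article ID to citation count
# @return [ReportSummary]
# @raise [ArgumentError] if the argument is not a Hash
def summarize(citations_by_id)
  # This check only fires at run time, when the method is actually called.
  unless citations_by_id.is_a?(Hash)
    raise ArgumentError, "expected a Hash, got #{citations_by_id.class}"
  end

  ReportSummary.new(citations_by_id.size, citations_by_id.values.sum)
end

summary = summarize({ "example-id-1" => 3, "example-id-2" => 4 })
puts "#{summary.article_count} articles, #{summary.total_citations} citations"
```

In a statically-typed language, the method signature alone would carry the information in those comment tags, and passing the wrong kind of argument would be rejected before the program ever ran.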
It’s also undeniable that you can get a prototype of a new app up much more quickly when using a dynamically-typed language. It would take a few more blog posts to explain why this is so, but suffice it to say that dynamically-typed languages allow you to do more with fewer lines of code. So it’s common for startups to implement their site with Ruby or Python, in the interests of speed, but doing so incurs real technical debt. Comments become more important in the absence of explicit types, test coverage is more essential, and so forth. Ideally, I think that when the codebase for a single app gets above 10 KLOC or so, it needs to be reimplemented in a statically-typed language. Maybe we’ll eventually do this with ALM Reports. I’ve worked at companies where this didn’t happen, and they wound up with hundreds of thousands of lines of dynamically-typed code. I don’t want to ever work at such a place again! I know this is a controversial opinion, and the prevailing thinking these days is that frameworks like Rails and Django, built atop dynamically-typed languages, are the future for all sites, large or small.
Of course, there is also a performance argument for reimplementing a high-traffic site in a statically-typed language, but I’m not going to get into that here.
To sum up, I was impressed with Rails overall, and the plethora of things you get in the core platform that would be considered superfluous extras in other frameworks. But I am not going to blindly just do everything The Rails Way and give up my technical judgement as a developer.
Whether you agree with me or not, I hope you’ve found this interesting. Please leave a comment if you have the time: am I on to something, or just a clueless Rails noob?