Dan Chak’s Blog

Speeding up ActiveRecord with Hashes, Take 2

Posted in plugins, rails by Dan Chak on February 9, 2008

A few weeks ago, I posted about the release of my gem, hash_extension, which makes Ruby hashes act a little more like plain old objects. That’s a good thing, because ActiveRecord requests that return hashes instead of full-blown ActiveRecord objects are about 50% faster. 50% is not a performance tweak — it’s basically the best thing that’s ever happened to ActiveRecord.

In version 0.0.1, the only way to get back hashes from ActiveRecord for use with hash_extension was to use select_all:

Foo.connection.select_all("select * from foo")

In the docs for hash_extension I put out a call for someone to extend ActiveRecord itself to return hashes from more natural, ActiveRecord-esque methods. Elliot Laster answered that call, and as of version 0.0.2, find_as_hashes and find_by_sql_as_hashes are available:

Foo.find_as_hashes(:all)
Foo.find_as_hashes(:first)
Foo.find_as_hashes(:all, :conditions => "bar = 'baz'")
Foo.find_by_sql_as_hashes("complex sql goes here")

To learn more about the gem and to download, go here.

ActiveRecord is slow. Hashes are fast.

Posted in plugins, rails, scaling to enterprise by Dan Chak on January 22, 2008

Introducing the hash_extension gem…

…the first gem associated with my upcoming book, Scaling to Enterprise with Ruby on Rails.

Are you tired of hearing that Ruby is slow? Well, Ruby is slow, in many ways. The trick to a fast site is to not use the parts of Ruby and Rails that are slow in places where performance counts. For example, loading ActiveRecord objects happens to be extremely slow. Something simple like the following statement may take very little database time, but then will spin through the slow process of ActiveRecord object creation.

MyObject.find(:all)

On my dual-core MacBook Pro, on a table with 40k records, this takes 7 seconds of Ruby time. By contrast, the following query, which returns not an array of ActiveRecord (MyObject) objects but an array of hashes with all the same properties, takes just over 3 seconds:

MyObject.connection.select_all("select * from my_objects")

So if you don’t need the associations or methods that come with the full ActiveRecord version of your data, you can save a lot of Ruby cycles by using hashes instead — over 50% of the overhead. The problem is that the two statements above are not drop-in replacements for each other. Objects follow dot notation (f.attr) whereas hashes follow, well, hash notation (f['attr']). So to switch to the hash result, you would have to update all your code to follow hash notation instead of dot notation, and that’s a pain (not to mention ugly).

hash_extension to the rescue! This gem allows you to access hashes just as you would regular objects. With this gem, the following is possible:

>> hash = Hash.new
>> hash.foo = 'bar'
>> hash.foo
=> 'bar'

Now the two statements above are interchangeable. If you have slow pages in need of tuning, and you’re loading lots of objects for display purposes only (i.e., you don’t actually need the weight associated with the full objects), this is an easy way to eke out some more performance. You can download the gem here and read more about how to set it up and use it here.
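
To make the idea concrete, here is a minimal sketch of how dot-notation access to a hash can be built with method_missing. This is not the hash_extension source: the class name DotHash is hypothetical, and the gem extends Hash itself rather than using a subclass. Raising on unknown keys instead of returning nil is also my own choice for the sketch.

```ruby
# Hypothetical sketch; NOT the hash_extension gem's actual implementation.
# DotHash is an illustrative name chosen for this example.
class DotHash < Hash
  # Route otherwise-unknown method calls to hash keys:
  #   h.foo = 'bar'  stores the value under the string key 'foo'
  #   h.foo          reads the key 'foo' (or :foo) if it exists
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?('=') && args.length == 1
      self[key.chomp('=')] = args.first
    elsif args.empty? && key?(key)
      self[key]
    elsif args.empty? && key?(name)
      self[name]
    else
      super # unknown key: fall through to the usual NoMethodError
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    key = name.to_s
    key.end_with?('=') || key?(key) || key?(name) || super
  end
end

row = DotHash.new
row.foo = 'bar'
puts row.foo     # dot notation
puts row['foo']  # hash notation still works
```

One caveat with any approach like this: names that Hash already defines (size, count, key, and so on) never reach method_missing, so columns with those names still need hash notation.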

Geographic Distance in Postgres

Posted in scaling to enterprise by Dan Chak on December 25, 2007

To help spread the idea that the database is very much a part of your Rails app, I’m going to be posting useful tidbits of PL/pgSQL that can help your application go, and go fast. Feel free to send in your own PL/pgSQL snippets and I’ll post them here as well.

I find that when folks say Ruby/Rails is slow, they’re doing things in Ruby/Rails application code that they shouldn’t be. The following PL/pgSQL function computes an approximate distance in miles between two latitude/longitude pairs, using roughly 69.1 miles per degree and scaling the longitude difference by the cosine of the latitude.

create or replace function miles_between_lat_long(
  lat1 numeric, long1 numeric, lat2 numeric, long2 numeric
) returns numeric
language 'plpgsql' as $$
declare
  x numeric = 69.1 * (lat2 - lat1);
  y numeric = 69.1 * (long2 - long1) * cos(lat1/57.3);
begin
  return sqrt(x * x + y * y);
end
$$;
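
If you want to sanity-check the approximation outside the database, the same arithmetic can be transcribed into Ruby. This is only a sketch for checking numbers, not a suggestion to move the computation into the application layer. The constant names and the sample coordinates are mine; 57.3 is approximately the number of degrees per radian (180/pi).

```ruby
# Ruby transcription of the arithmetic in miles_between_lat_long.
# Constant names are illustrative; the values come from the function above.
MILES_PER_DEGREE   = 69.1 # approximate miles per degree of latitude
DEGREES_PER_RADIAN = 57.3 # approximately 180 / pi

def miles_between_lat_long(lat1, long1, lat2, long2)
  # North-south distance: degrees of latitude converted to miles.
  x = MILES_PER_DEGREE * (lat2 - lat1)
  # East-west distance: longitude lines converge toward the poles,
  # so scale by the cosine of the latitude (converted to radians).
  y = MILES_PER_DEGREE * (long2 - long1) * Math.cos(lat1 / DEGREES_PER_RADIAN)
  Math.sqrt(x * x + y * y)
end

# Sample coordinates are made up for illustration.
one_degree_north = miles_between_lat_long(42.0, -71.0, 43.0, -71.0)
one_degree_east  = miles_between_lat_long(42.0, -71.0, 42.0, -70.0)
puts one_degree_north.round(1)
puts one_degree_east.round(1)
```

One degree of latitude comes out at about 69.1 miles by construction, while one degree of longitude at 42 degrees north is noticeably shorter. The flat-plane treatment is reasonable for short distances such as a 10-mile radius, and gets less accurate as the two points move farther apart.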

Why is this useful? Well, if your web application needs to find distances, such as “drug stores within 10 miles of zip code 02139,” then you can construct a single SQL query that will give you the answer. This can be far more efficient than loading the entire zip code database in memory and trudging through the data in the application layer, which may not be optimized for doing these sorts of computations. For example, assume you’ve got a table drug_stores with latitude and longitude columns, and a table zip_codes that has the latitude and longitude for each zip code in the United States. To find the drug stores within 10 miles of 02139, you issue the following query:

select *
  from drug_stores d,
       zip_codes z
 where z.zip_code = '02139'
   and miles_between_lat_long(d.latitude, d.longitude, z.latitude, z.longitude) < 10;

CourseAdvisor acquired by Washington Post Company

Posted in courseadvisor by Dan Chak on October 12, 2007

Scaling to Enterprise

Posted in rails, scaling to enterprise by Dan Chak on October 12, 2007

It’s time to let the proverbial cat out of the bag (meow!). Over the past several months, I’ve been working on a book for O’Reilly Media called Scaling to Enterprise with Ruby on Rails.

This book teaches you how to think like an architect, and therefore it picks up where other Rails books leave off. What do I mean by that?

Most books out there teach you how to use tools: Ruby the language, Rails the framework, or Rails plug-in xyz. These books are great if you’re reading them purely to learn syntax.  Syntax is one thing; how you put everything together to make a site scale to millions of users is quite another thing altogether.

Also, this book is not focused exclusively on Ruby/Rails. There’s a lot of theory in building enterprise web sites that doesn’t map directly into Ruby statements. Examples of such topics are schema design, different caching techniques and when each is appropriate, and service-oriented architecture. These are all questions of web architecture, and the concepts hold regardless of the language or framework you choose. In the end (and in the book), the theory does translate into Rails code, but a lot of time is devoted to exploring the theories themselves. Armed with the why in addition to the what, you can make intelligent design decisions in your own projects.

The topics in this book will not be new to seasoned web software architects. However, there’s a new spin on old tried and true ideas, as the ideas will finally make their way into the Rails community discussion. This is a book for beginners (who have at least read the Programming Ruby and Agile Web Development With Rails books), intermediate, and advanced Rails users alike. I’m looking forward to feedback, especially once we start posting Rough Cuts online.

CourseAdvisor is hiring!

Posted in courseadvisor by Dan Chak on October 3, 2007

We’re hiring Ruby on Rails (or would-be Ruby on Rails) developers. We’re looking for designer-front-end types, back-end-service types, and database-wizard types. If you’re in the Boston area, please check out the job description and get in touch!

CourseAdvisor Case Study at Rails For All

Posted in courseadvisor, rails by Dan Chak on July 13, 2007

Rails For All has as one of its goals gaining acceptance for Ruby on Rails in the enterprise. They recently did a case study on CourseAdvisor’s ability to handle 1.5M users per month on a Ruby on Rails stack.

Click for the case study.

Some notes that weren’t covered in the piece:

  • We use Postgres on our production sites. While MySQL has come a long way (especially with the latest spate of Google patches), Postgres is still in our view much better suited for a stack that treats the database as a real component of the application rather than simply a place to dump data.
  • Our peak traffic is currently around 32k hits in an hour, which is about 9 hits per second.
  • We have an SLA of one second for generation of all user-facing pages. The one exception is our “money” page, where we run our complex matching algorithms to suggest a personalized list of schools to a visitor based on their individual profile. For that page we have a two-second SLA.