Dan Chak’s Blog

Materialized Views in Postgres

Posted in postgres, scaling to enterprise by Dan Chak on June 2, 2008

I got a lot of great feedback on my tutorial at PGCon. The slides, as well as a sample chapter from my upcoming O’Reilly Book are posted on the PGCon web site. You can also grab them here:

I highly recommend the sample chapter, as it takes you through an example in prose form. The slides from the talk are pretty content-heavy but some code slides may be difficult to grasp without commentary.

PGCon 2008

Posted in postgres by Dan Chak on February 19, 2008

My presentation on materialized views in Postgres, and especially materialized views with time dependencies, has been accepted to PGCon 2008.

Abstract: Complex queries repeated in application code often are recast as views. This makes programming easier, but it does nothing to improve the performance of the underlying query. Materializing the view is the next logical step. Materialized views are stored in standard tables, so joins, procedure evaluation, and conditions from the original query are pre-computed before queries are run against the materialized view. Because the data is in a table, performance can be further improved by adding indexes.

Many developers shy away from materializing views in Postgres because the functionality is not built in, as it is in Oracle. In this talk, I will present a straightforward approach that can be applied to materialize any view, even those that Oracle cannot handle. Case in point: views that have a time-dependency (such as those based on the current time, now()) don’t have convenient trigger points (such as commits) to signal that the view should be refreshed, and are a major stumbling block for view materialization. This talk presents a technique to handles those cases as well, bringing materialized views even more powerful than those available in commercial products to Postgres.

Hope to see you in Ottawa!

Tagged with: ,

Geographic Distance in Postgres

Posted in scaling to enterprise by Dan Chak on December 25, 2007

To help proliferate the idea that the database is very much a part of your Rails app, I’m going to be posting useful tidbits of PL/pgSQL that can help you application go and go fast. Feel free to send in your own PL/pgSQL snippets and I’ll post them here as well.

I find that when folks say Ruby/Rails is slow, they’re doing things in Ruby/Rails application code that they shouldn’t be. The following PL/pgSQL function will compute an approximate distance between two sets of latitude/longitude pairs.

create or replace function miles_between_lat_long(
  lat1 numeric, long1 numeric, lat2 numeric, long2 numeric
) returns numeric
language 'plpgsql' as $$
declare
  x numeric = 69.1 * (lat2 - lat1);
  y numeric = 69.1 * (long2 - long1) * cos(lat1/57.3);
begin
  return sqrt(x * x + y * y);
end
$$;

Why is this useful? Well, if your web application needs to find distances, such as “drug stores within 10 miles of zip code 02139,” then you can construct a single SQL query that will give you the answer. This can be far more efficient than loading the entire zip code database in memory and trudging through the data in the application layer, which may not be optimized for doing these sorts of computations. For example, assume you’ve got a table drug_stores with latitude and longitude columns, and a table zip_codes that has the latitude and longitude for each zip code in the United States. To find the drug stores within 10 miles of 02139, you issue the following query:

select *
  from drug_stores d,
       zip_codes z
 where z.zip_code = '02139'
   and miles_between_lat_long(d.latitude, d.longitude, z.latitude, z.longitude) < 10;
Tagged with: ,