Tutorial: Algorithm improvement on a simple-looking PostgreSQL query



Question:

High-level: Can I do this ORDER BY / GROUP BY based on a sum any faster? (PG 8.4, FWIW, on a non-tiny table ... think O(millions of rows).)

Suppose I had a table like this:

                             Table "public.summary"
       Column    |       Type        |                      Modifiers
    -------------+-------------------+------------------------------------------------------
     ts          | integer           | not null default nextval('summary_ts_seq'::regclass)
     field1      | character varying | not null
     otherfield  | character varying | not null
     country     | character varying | not null
     lookups     | integer           | not null

    Indexes:
        "summary_pk" PRIMARY KEY, btree (ts, field1, otherfield, country)
        "ix_summary_country" btree (country)
        "ix_summary_field1" btree (field1)
        "ix_summary_otherfield" btree (otherfield)
        "ix_summary_ts" btree (ts)

And the query I want is:

    select summary.field1,
           summary.country,
           summary.ts,
           sum(summary.lookups) as lookups
    from summary
    where summary.country = 'za'
      and summary.ts = 1275177600
    group by summary.field1, summary.country, summary.ts
    order by summary.ts, lookups desc, summary.field1
    limit 100;

(In English: the top 100 field1 values at a particular (ts, country), where "topness" is the sum of lookups over all matching rows, regardless of the value of otherfield.)

Is there anything I can really do to speed this up? Algorithmically this seems to be a full table scan kind of thing, but I might be missing something.


Solution:1

Any query plan for this query will have to scan every row that matches the WHERE conditions, rolling them up by the grouping conditions - that is, the amount of work is proportional to the number of input rows to the group by, not the number of result rows.

The most efficient query plan possible for a query like this is a single index scan. This ought to be possible if you build an index on (country, ts) in that order; with that index, every possible query of this form resolves to a contiguous range over the index. This will still require an in-memory sort, though - it may be possible to avoid this with a different index.
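A minimal sketch of that suggestion (the index name here is illustrative, not from the original post):

```sql
-- Composite index matching the WHERE clause: with equality conditions on
-- both country and ts, every qualifying row lies in one contiguous range
-- of this index, so a single index scan can feed the aggregation.
CREATE INDEX ix_summary_country_ts ON summary (country, ts);
```

With this index in place, the existing single-column indexes on country and ts become largely redundant for this particular query, though other queries may still use them.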

As others have said, though, posting an execution plan is your best option.


Solution:2

In order to be able to suggest anything, you should post the execution plan of the query.

And "OMG Ponies" is right: limit 100 will limit the overall result to 100 rows, it will not work on individual groups!
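To illustrate the point about LIMIT: it is applied after grouping and sorting are complete, so it trims the final result set rather than each group. A simplified sketch:

```sql
-- LIMIT trims the final, already-grouped and already-sorted result:
SELECT field1, sum(lookups) AS lookups
FROM summary
GROUP BY field1
ORDER BY lookups DESC
LIMIT 100;   -- 100 rows total, not 100 rows per group
```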

There is a nice article in the Postgres Wiki that explains how to post a question related to a slow query:

http://wiki.postgresql.org/wiki/SlowQueryQuestions


Solution:3

An index on (country, ts) is the best bet (as Nick Johnson suggests), and additionally you may want to raise work_mem if it's not set very high. You can SET it at runtime if needed (recommended if you're making it very large). That will help keep your sorts in memory rather than spilling to disk (if that's what's happening).
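For example, work_mem can be raised for the current session only (the 256MB value below is illustrative; size it to your actual sort):

```sql
-- Raise work_mem for this session so the sort can complete in memory.
SET work_mem = '256MB';

-- ... run the expensive query here ...

-- Restore the server default afterwards.
RESET work_mem;
```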

For real help, though, we'll need to see an EXPLAIN ANALYZE; posting it on explain.depesz.com makes it much more readable.
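Concretely, that just means prefixing the original query with EXPLAIN ANALYZE and pasting the resulting plan into explain.depesz.com:

```sql
-- EXPLAIN ANALYZE actually executes the query and reports, per plan node,
-- the estimated vs. actual row counts and timings.
EXPLAIN ANALYZE
SELECT summary.field1, summary.country, summary.ts,
       sum(summary.lookups) AS lookups
FROM summary
WHERE summary.country = 'za'
  AND summary.ts = 1275177600
GROUP BY summary.field1, summary.country, summary.ts
ORDER BY summary.ts, lookups DESC, summary.field1
LIMIT 100;
```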

