
Magnus Hagander's PostgreSQL blog

Finding gaps in partitioned sequences

Fri, 01/27/2012 - 17:53

There is an almost unlimited number of articles on the web about how to find gaps in sequences in SQL, and it doesn't have to be very hard. Doing it in a "partitioned sequence" makes it a bit harder, but still not very hard. But when I turned to a window aggregate to do it, I was immediately told "hey, that's a good example of using a window aggregate to solve your daily chores, you should blog about that". So here we go - yet another example of finding a gap in a sequence using SQL.

I have a database that is very simply structured - it's got a primary key made out of (groupid, year, month, seq), all integers. On top of that it has a couple of largish text fields and an fti field for full text search. (Initiated people will know right away which database this is). The sequence in the seq column resets to zero for each combination of (groupid, year, month). And I wanted to find out where there were gaps in it, and how big they were, to debug the tool that wrote the data into the database. This is really easy with a window aggregate:


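SELECT * FROM (
   SELECT
      groupid,
      year,
      month,
      seq,
      -- distance to the previous seq within the same (groupid, year, month)
      seq - lag(seq, 1) OVER (PARTITION BY groupid, year, month ORDER BY seq) AS gap
   FROM mytable
) AS t
-- gap is NULL for the first row of each partition, so that row is never reported
WHERE NOT (t.gap = 1)
ORDER BY groupid, year, month, seq;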
 

One advantage of using a window aggregate for this is that we actually get the whole row back, and not just the primary key - so it's easy to add whatever other columns you need to the select list to figure something out.
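To make the behaviour concrete, here is a minimal sketch on made-up data. The table name gap_demo and its rows are assumptions for illustration only, not the real table. It is also a variation rather than the exact query from this post: it uses the three-argument form of lag() with a default of -1. Since seq restarts at zero, that makes a partition whose first row is not 0 show up as a gap too, whereas the plain two-argument form yields NULL for the first row of each partition, so that row never matches the WHERE clause.

```sql
-- Hypothetical demo table; only the key columns of the real table are modelled.
CREATE TEMPORARY TABLE gap_demo (
   groupid integer NOT NULL,
   year    integer NOT NULL,
   month   integer NOT NULL,
   seq     integer NOT NULL,
   PRIMARY KEY (groupid, year, month, seq)
);

INSERT INTO gap_demo (groupid, year, month, seq) VALUES
   (1, 2012, 1, 0), (1, 2012, 1, 1), (1, 2012, 1, 4),  -- 2 and 3 are missing
   (1, 2012, 2, 0), (1, 2012, 2, 1),                   -- complete
   (2, 2012, 1, 2), (2, 2012, 1, 3);                   -- 0 and 1 are missing

SELECT * FROM (
   SELECT
      groupid,
      year,
      month,
      seq,
      -- the default of -1 stands in for "the row before seq 0"
      seq - lag(seq, 1, -1) OVER (PARTITION BY groupid, year, month ORDER BY seq) AS gap
   FROM gap_demo
) AS t
WHERE t.gap <> 1
ORDER BY groupid, year, month, seq;
```

On this sample data the query should return two rows, (1, 2012, 1, 4) and (2, 2012, 1, 2), each with gap = 3. The number of missing values just before the reported row is gap - 1.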

What about performance? I don't really have a big database to test this on, so I can't say for sure. It's going to be a sequential scan, since I look at the whole table, and not just parts of it. It takes about 4 seconds to run over a table of about a million rows (2.7 GB), on a modest VM with no actual I/O capacity to speak of and a very limited amount of memory, returning about 100 rows. That's certainly more than fast enough for me in this case.
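If you want to check how this behaves on your own data, one option is to look at the plan PostgreSQL actually chooses. This is only a sketch: mytable is the placeholder name used in this post, and whether the planner goes for a sequential scan plus a sort or walks the primary key index depends on your data and settings.

```sql
-- ANALYZE really executes the query; BUFFERS adds block hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM (
   SELECT
      groupid,
      year,
      month,
      seq,
      seq - lag(seq, 1) OVER (PARTITION BY groupid, year, month ORDER BY seq) AS gap
   FROM mytable
) AS t
WHERE NOT (t.gap = 1)
ORDER BY groupid, year, month, seq;
```

The node to look for is WindowAgg. What sits underneath it tells you how the rows were fed to the window function in (groupid, year, month, seq) order.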

And as a bonus, it found me two bugs in the loading script and at least one bug in somebody else's code that I'm now waiting on to get fixed...

Categories: Blogs, Open Source

www.postgresql.org - brand new, yet old and familiar

Wed, 12/21/2011 - 14:33

Most of the visitors to www.postgresql.org probably never noticed that a couple of weeks back, the entire site was replaced with a new one. In fact, we didn't just change the website - just days before, we made large changes to our ftp network as well (more about that in another post, from me or others). And we actually hope that most people didn't notice. The changes were mainly a technical refresh, and there hasn't been much change to the contents at all yet. We did sneak in a few content changes that had been requested for a while, so I'm going to start by listing those:

  • The developer version of the documentation (updated several times per day from the tip of the HEAD branch that will eventually become the next version of PostgreSQL) now lives on the main website, and uses the same stylesheets, so it looks a lot nicer than before.
  • Anybody who submits content to our site (news, events, professional services, products, etc) will notice there is now a new concept of an Organisation. This means that it will finally be possible to have more than one person manage the submissions from a single company or group.
  • Again for those that submit content, it is now possible to view which of your submissions are still in the moderation queue, and it's also possible to edit something after it's been submitted. In fact, you can edit your items even after they've been approved. Any such editing will be post-moderated, and if this is abused that organization will be banned from post-moderation - but we don't expect that to ever be necessary.
  • And finally, for those that submit content again, we've switched to markdown to format your submissions, instead of a very random subset of allowed HTML tags.
The rest of the changes are under the hood, and were made mostly for two reasons:
  • The technology powering the site was simply very old
  • The frameworks used were quite obscure, which severely limited the number of people who could (or wanted to) work with them

Hopefully, fixing these two problems will make it easier to contribute to the website, so if you're potentially interested in doing that, please read on!


Categories: Blogs, Open Source