
Multi-column indexes - PostgreSQL / PGSQL


  1. #1

    Multi-column indexes

    Greetings!

    I have a technical question concerning multi-column indexes and their
    implementation. I tried looking for the answer in the docs but couldn't
    find anything.

    I have the following table:

    eventlog=> \d agent.record
                                         Table "agent.record"
           Column       |           Type           |                        Modifiers
    --------------------+--------------------------+----------------------------------------------------------
     luid               | bigint                   | not null default nextval('agent.record_luid_seq'::text)
     host_luid          | bigint                   |
     remote_system_luid | bigint                   |
     log_luid           | bigint                   | not null
     time_logged        | timestamp with time zone | not null default now()
     record             | bytea                    | not null
     error              | boolean                  |
     error_reason       | text                     |
    Indexes:
        "record_pkey" primary key, btree (luid)
        "record_to_process_idx" unique, btree (host_luid, log_luid, luid) WHERE (error IS NULL)
        "record_to_process_idx2" unique, btree (luid) WHERE (error IS NULL)
        "record_last_logged_idx" btree (time_logged, host_luid, log_luid, luid)
    Foreign-key constraints:
        "$1" FOREIGN KEY (host_luid) REFERENCES eventlog.host(luid) ON UPDATE CASCADE ON DELETE CASCADE
        "$2" FOREIGN KEY (remote_system_luid) REFERENCES eventlog.remote_system(luid)
        "$3" FOREIGN KEY (log_luid) REFERENCES eventlog.log(luid) ON UPDATE CASCADE ON DELETE CASCADE
    consisting of 27306578 rows.


    So I try running the following query:

    explain analyze
    select record
    from agent.record
    where host_luid = 3::bigint
    and log_luid = 2::bigint
    and error is null
    order by host_luid desc, log_luid desc, luid desc
    limit 1

    I get the following query plan:

                                                                                  QUERY PLAN
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..1.47 rows=1 width=286) (actual time=249064.949..249064.950 rows=1 loops=1)
       ->  Index Scan Backward using record_to_process_idx on record  (cost=0.00..13106.73 rows=8898 width=286) (actual time=249064.944..249064.944 rows=1 loops=1)
             Index Cond: ((host_luid = 3::bigint) AND (log_luid = 2::bigint))
             Filter: (error IS NULL)
     Total runtime: 249065.004 ms
    (5 rows)

    Now, this plan seems rather slow because of the backward scan, and it
    takes quite a long time (compared to seeking the last row based only on
    <luid>, for example). It feels as though, if I have <host_luid> values of
    (1,2,3,4,5), the scan above walks through _all_ the rows for 5, then the
    rows for 4, and only then reaches 3.

    So, now to my question: is this really happening?

    I guess it breaks down to how these indexes are implemented. Are
    multi-column indexes implemented as true multiple level indexes, in the
    sense there is a level 1 index on <host_luid>, pointing to a level 2 index
    on <log_luid>, pointing to a level 3 index on <luid>? Or are they the
    equivalent of a <host_luid,log_luid,luid> single index (ie, as if I
    created a functional index consisting of
    host_luid || ',' || log_luid || ',' || luid
    )?
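
    For concreteness, the two alternatives I have in mind would look roughly
    like this (the index names are made up, and the second one is only an
    illustration -- I have not actually created it):

    -- A true composite (multi-column) index, like the one I already have,
    -- minus the partial WHERE clause:
    create index record_composite_idx
        on agent.record (host_luid, log_luid, luid);

    -- Versus the "concatenated key" equivalent I described, as an expression
    -- index over a single text value (illustration only):
    create index record_concat_idx
        on agent.record ((host_luid::text || ',' || log_luid::text || ',' || luid::text));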

    My initial guess was that PostgreSQL would seek first to the <host_luid>
    desc, then from there to the specific <log_luid> desc, and then from there
    to the <luid> (ie, the equivalent of 3 btree jumps), essentially skipping
    over the inappropriate <host_luid>'s of 5 and 4. But it seems to be
    scanning through them, even though I have a low cost for random page
    accesses in my postgresql.conf file. Are there components of the index
    that allow it to "skip" backwards over lots of pages rather than loading
    them from disk?

    Any ideas? How do multi-column indexes really work? I would hate to have
    to define a specific index for each <host_luid>, as that would be
    unmaintainable.

    Thanks!
    Ed



    Edmund Dengler Guest

  2. #2

    Re: Multi-column indexes

    On Sat, Jan 15, 2005 at 04:00:03PM -0500, Edmund Dengler wrote:
    > Greetings!
    >
    > I have a technical question concerning multi-column indexes and their
    > implementation. I tried looking for the answer in the docs but couldn't
    > find anything.
    <snip>
    > I guess it breaks down to how these indexes are implemented. Are
    > multi-column indexes implemented as true multiple level indexes, in the
    > sense there is a level 1 index on <host_luid>, pointing to a level 2 index
    > on <log_luid>, pointing to a level 3 index on <luid>? Or are they the
    > equivalent of a <host_luid,log_luid,luid> single index (ie, as if I
    > created a functional index consisting of
    > host_luid || ',' || log_luid || ',' || luid
    > )?
    It's a true multi-column index, so it'll go straight to the right
    host_luid, then the log_luid and then start scanning. My guess is that
    your problem is that there are lots of rows that match but have a
    non-NULL error. Are you suffering from index bloat?
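
    One rough way to check for bloat, using nothing but the system catalogs
    (the numbers are only as fresh as your last VACUUM / ANALYZE), would be
    something like:

    -- Compare the index's page count against the table's; a partial index
    -- that should only cover the unprocessed rows but still occupies a huge
    -- number of pages is a bloat suspect.
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname IN ('record', 'record_to_process_idx');

    -- If it is bloated, REINDEX rebuilds it (it blocks writes while it runs):
    REINDEX INDEX agent.record_to_process_idx;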

    As a suggestion, if (error IS NULL) is something you search on
    regularly and it matches only a small portion of the table, you should
    create a partial index:

    CREATE INDEX ... WHERE error IS NULL;
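
    Something along these lines, matched to your query (the index name here is
    made up, and it is essentially the record_to_process_idx you already
    have -- shown only as an illustration):

    -- Hypothetical partial index covering only the rows the query cares about:
    CREATE INDEX record_pending_idx
        ON agent.record (host_luid, log_luid, luid)
        WHERE error IS NULL;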

    Hope this helps,
    --
    Martijn van Oosterhout <kleptog@svana.org> [url]http://svana.org/kleptog/[/url]
    > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
    > tool for doing 5% of the work and then sitting around waiting for someone
    > else to do the other 95% so you can sue them.

    Martijn van Oosterhout Guest

  3. #3

    Re: Multi-column indexes

    Edmund Dengler <edmundd@eSentire.com> writes:
    > "record_to_process_idx" unique, btree (host_luid, log_luid, luid) WHERE (error IS NULL)
    > explain analyze
    > select record
    > from agent.record
    > where host_luid = 3::bigint
    > and log_luid = 2::bigint
    > and error is null
    > order by host_luid desc, log_luid desc, luid desc
    > limit 1
    > Limit (cost=0.00..1.47 rows=1 width=286) (actual time=249064.949..249064.950 rows=1 loops=1)
    > -> Index Scan Backward using record_to_process_idx on record (cost=0.00..13106.73 rows=8898 width=286) (actual time=249064.944..249064.944 rows=1 loops=1)
    > Index Cond: ((host_luid = 3::bigint) AND (log_luid = 2::bigint))
    > Filter: (error IS NULL)
    > Total runtime: 249065.004 ms
    Are there a whole lotta rows with that host_luid/log_luid combination?

    What's happening is that the index search initially finds the first such
    row, and then it has to step to the last such row to start the backwards
    scan. This is fixed as of 8.0, but all earlier releases are going to be
    slow in that scenario. It's got nothing to do with single vs multi
    column indexes, it is just a shortcoming of the startup code for
    backwards index scans. (I get the impression that the original
    implementation of Postgres' btree indexes only supported unique indexes,
    because there were a number of places where it was horridly inefficient
    for large numbers of equal keys. I think this 8.0 fix is the last such
    issue.)

    Since your index has an additional column, there is a hack you can use
    to get decent performance in 7.4 and before. Add a dummy condition on
    the last column:
    where host_luid = 3::bigint
    and log_luid = 2::bigint
    AND LUID <= someverylargevalue::bigint
    and error is null
    order by host_luid desc, log_luid desc, luid desc
    limit 1
    Now, instead of positioning to the first row with value (3,2,anything)
    the initial btree descent will home in on the end of that range, and
    so the expensive stepping over all the rows between is avoided.
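
    Spelled out, with the largest possible bigint standing in as a placeholder
    for someverylargevalue, the query would look roughly like:

    explain analyze
    select record
    from agent.record
    where host_luid = 3::bigint
    and log_luid = 2::bigint
    -- 9223372036854775807 is the maximum bigint, used purely as the dummy
    -- upper bound so the btree descent lands at the end of the (3, 2, *) range
    and luid <= 9223372036854775807::bigint
    and error is null
    order by host_luid desc, log_luid desc, luid desc
    limit 1;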

    regards, tom lane


    Tom Lane Guest

  4. #4

    Re: Multi-column indexes

    Hi Tom!

    Yep, there are a large number of rows for each host_luid/log_luid
    combination (there are approximately 5-10 hosts and 1-3 logs per system we
    are running).

    Thanks for the recommended workaround, I'll have a try at it at some point
    tomorrow.

    Regards!
    Ed


    On Sat, 15 Jan 2005, Tom Lane wrote:
    > Edmund Dengler <edmundd@eSentire.com> writes:
    > > "record_to_process_idx" unique, btree (host_luid, log_luid, luid) WHERE (error IS NULL)
    >
    > > explain analyze
    > > select record
    > > from agent.record
    > > where host_luid = 3::bigint
    > > and log_luid = 2::bigint
    > > and error is null
    > > order by host_luid desc, log_luid desc, luid desc
    > > limit 1
    >
    > > Limit (cost=0.00..1.47 rows=1 width=286) (actual time=249064.949..249064.950 rows=1 loops=1)
    > > -> Index Scan Backward using record_to_process_idx on record (cost=0.00..13106.73 rows=8898 width=286) (actual time=249064.944..249064.944 rows=1 loops=1)
    > > Index Cond: ((host_luid = 3::bigint) AND (log_luid = 2::bigint))
    > > Filter: (error IS NULL)
    > > Total runtime: 249065.004 ms
    >
    > Are there a whole lotta rows with that host_luid/log_luid combination?
    >
    > What's happening is that the index search initially finds the first such
    > row, and then it has to step to the last such row to start the backwards
    > scan. This is fixed as of 8.0, but all earlier releases are going to be
    > slow in that scenario. It's got nothing to do with single vs multi
    > column indexes, it is just a shortcoming of the startup code for
    > backwards index scans. (I get the impression that the original
    > implementation of Postgres' btree indexes only supported unique indexes,
    > because there were a number of places where it was horridly inefficient
    > for large numbers of equal keys. I think this 8.0 fix is the last such
    > issue.)
    >
    > Since your index has an additional column, there is a hack you can use
    > to get decent performance in 7.4 and before. Add a dummy condition on
    > the last column:
    > where host_luid = 3::bigint
    > and log_luid = 2::bigint
    > AND LUID <= someverylargevalue::bigint
    > and error is null
    > order by host_luid desc, log_luid desc, luid desc
    > limit 1
    > Now, instead of positioning to the first row with value (3,2,anything)
    > the initial btree descent will home in on the end of that range, and
    > so the expensive stepping over all the rows between is avoided.
    >
    > regards, tom lane

    Edmund Dengler Guest
