Professional Web Applications Themes

does "select count(*) from mytable" always do a seq scan? - PostgreSQL / PGSQL

Hi, I am using Postgresql 7.4. I have a table with 1.5 million rows. It has a primary key. VACUUM FULL YZE is run every night. There are 2000-5000 inserts on this table every day but very few updates and deletes. When I select count(*) from this table it is using a sequence scan. Is this just life or is there some way to get this to do an index scan? culley ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? [url]http://www.postgresql.org/docs/faqs/FAQ.html[/url]...

  1. #1

    Default does "select count(*) from mytable" always do a seq scan?

    Hi,

    I am using Postgresql 7.4. I have a table with 1.5 million rows. It
    has a primary key. VACUUM FULL YZE is run every night. There are
    2000-5000 inserts on this table every day but very few updates and
    deletes. When I select count(*) from this table it is using a
    sequence scan. Is this just life or is there some way to get this to
    do an index scan?

    culley

    ---------------------------(end of broadcast)---------------------------
    TIP 5: Have you checked our extensive FAQ?

    [url]http://www.postgresql.org/docs/faqs/FAQ.html[/url]

    Culley Harrelson Guest

  2. #2

    Default Re: does "select count(*) from mytable" always do a seq

    Am Freitag, den 07.01.2005, 06:45 -0800 schrieb Culley Harrelson:
    > Hi,
    >
    > I am using Postgresql 7.4. I have a table with 1.5 million rows. It
    > has a primary key. VACUUM FULL YZE is run every night. There are
    > 2000-5000 inserts on this table every day but very few updates and
    > deletes. When I select count(*) from this table it is using a
    > sequence scan. Is this just life or is there some way to get this to
    > do an index scan?
    How do you think an index would help if you do an unconditional
    count(*)?

    Regards
    Tino


    ---------------------------(end of broadcast)---------------------------
    TIP 9: the planner will ignore your desire to choose an index scan if your
    joining column's datatypes do not match

    Tino Wildenhain Guest

  3. #3

    Default Re: does "select count(*) from mytable" always do a seq

    On Fri, Jan 07, 2005 at 16:17:16 +0100,
    Tino Wildenhain <tinowildenhain.de> wrote:
    > Am Freitag, den 07.01.2005, 06:45 -0800 schrieb Culley Harrelson:
    > > Hi,
    > >
    > > I am using Postgresql 7.4. I have a table with 1.5 million rows. It
    > > has a primary key. VACUUM FULL YZE is run every night. There are
    > > 2000-5000 inserts on this table every day but very few updates and
    > > deletes. When I select count(*) from this table it is using a
    > > sequence scan. Is this just life or is there some way to get this to
    > > do an index scan?
    >
    > How do you think an index would help if you do an unconditional
    > count(*)?
    Some systems can just run through the index without having to access the
    tuples. This can result in you having to read significantly fewer disk blocks
    to get the count. Unfortunately, postgres still needs to check visibility
    for each tuple and so an using index scan for count will be slower than
    a sequential scan if a significant fraction of the table is being counted.

    If an approximate answer is OK there is some information calculated when
    you vacuum a table and you could query this value in the pg catalog.
    I don't remember the name of what you want, but this should be in the
    archives.

    Another solution is to use a trigger to keep a count in another table.
    from what you say above, this might be a practical solution for you.
    Doing this has also been discussed in the archives.

    ---------------------------(end of broadcast)---------------------------
    TIP 6: Have you searched our list archives?

    [url]http://archives.postgresql.org[/url]

    Bruno Wolff III Guest

  4. #4

    Default Re: does "select count(*) from mytable" always do a seq scan?

    On Fri, 07 Jan 2005 16:17:16 +0100, Tino Wildenhain <tinowildenhain.de> wrote:
    >
    > How do you think an index would help if you do an unconditional
    > count(*)?
    I really don't know <grin>. I don't know the inner workings of
    database internals but I would guess that there would be some
    optimized way of counting the nodes in an index tree that would be
    faster than sequentially going through a table.... I suppose there is
    no free lunch.

    One row, two rows, three rows, four rows, five rows.... <snore>

    culley

    ---------------------------(end of broadcast)---------------------------
    TIP 4: Don't 'kill -9' the postmaster

    Culley Harrelson Guest

  5. #5

    Default Re: does "select count(*) from mytable" always do a seq scan?

    This is interesting... Perhaps a more knowledgable person for pgsql
    could help us here...

    I seem to remember something to do with the fact that You can't use
    aggregate functions over an index... I'm not sure why though.

    You can do:
    create index foo on my_table (lower(my_column))

    but not
    create index foo on my_table(min(my_column)) - I guess it wouldn't be
    much of an index - it would be a single value.
    You could reproduce that functionality with a trigger that updated a
    table that contained the value of (min(my_column)), and I guess you
    could do the same fo count(*) too.

    I guess what I"m really asking is why can't you run aggregates over an index?

    Alex Turner
    NetEconomist

    On Fri, 7 Jan 2005 09:09:49 -0800, Culley Harrelson <harrelson> wrote:
    > On Fri, 07 Jan 2005 16:17:16 +0100, Tino Wildenhain <tinowildenhain.de> wrote:
    > >
    > > How do you think an index would help if you do an unconditional
    > > count(*)?
    >
    > I really don't know <grin>. I don't know the inner workings of
    > database internals but I would guess that there would be some
    > optimized way of counting the nodes in an index tree that would be
    > faster than sequentially going through a table.... I suppose there is
    > no free lunch.
    >
    > One row, two rows, three rows, four rows, five rows.... <snore>
    >
    > culley
    >
    > ---------------------------(end of broadcast)---------------------------
    > TIP 4: Don't 'kill -9' the postmaster
    >
    ---------------------------(end of broadcast)---------------------------
    TIP 9: the planner will ignore your desire to choose an index scan if your
    joining column's datatypes do not match

    Alex Turner Guest

  6. #6

    Default Re: does "select count(*) from mytable" always do a seq

    > I guess what I"m really asking is why can't you run aggregates over an index?

    It's got to do with MVCC and transaction consistency. Running count(*) or an
    aggregate function on an index could include records that should not be
    visible to your current transaction.


    --
    Scott Ribe
    [email]scott_ribekillerbytes.com[/email]
    [url]http://www.killerbytes.com/[/url]
    (303) 665-7007 voice



    ---------------------------(end of broadcast)---------------------------
    TIP 8: explain yze is your friend

    Scott Ribe Guest

  7. #7

    Default Re: does "select count(*) from mytable" always do a seq

    No offense or anything, but that doesn't make any sense. If you are
    running count(*) against a table, it still has to worry about MVCC,
    and which rows are visible to your transaction. What difference does
    it make, table or index, the system still has to figure out which rows
    are visible in the current transaction, so why not use the index?

    (The example is really count(pkey) because count(*) is always going to
    do a seq scan I reckon - and could probably never use an index).

    Alex Turner
    NetEconomist


    On Fri, 07 Jan 2005 11:17:32 -0700, Scott Ribe
    <scott_ribekillerbytes.com> wrote:
    > > I guess what I"m really asking is why can't you run aggregates over an index?
    >
    > It's got to do with MVCC and transaction consistency. Running count(*) or an
    > aggregate function on an index could include records that should not be
    > visible to your current transaction.
    >
    > --
    > Scott Ribe
    > [email]scott_ribekillerbytes.com[/email]
    > [url]http://www.killerbytes.com/[/url]
    > (303) 665-7007 voice
    >
    > ---------------------------(end of broadcast)---------------------------
    > TIP 8: explain yze is your friend
    >
    ---------------------------(end of broadcast)---------------------------
    TIP 8: explain yze is your friend

    Alex Turner Guest

  8. #8

    Default Re: does "select count(*) from mytable" always do a seq

    > No offense or anything, but that doesn't make any sense. If you are
    > running count(*) against a table, it still has to worry about MVCC,
    > and which rows are visible to your transaction. What difference does
    > it make, table or index, the system still has to figure out which rows
    > are visible in the current transaction, so why not use the index?
    Your mistake seems to be assuming that row visibility is tracked in the
    index. As was stated earlier in the thread, row visibility information is
    not available in the index, therefore rows have to be looked at to determine
    whether they're visible. What this means is that using the index would only
    add an additional unnecessary step.
    > (The example is really count(pkey) because count(*) is always going to
    > do a seq scan I reckon - and could probably never use an index).
    No, if there is an index on a column that is required, such as a primary
    key, then count(pkey) is equal to count(*). Many databases make use of this
    fact to optimize performance of count(*) by using an index scan.


    --
    Scott Ribe
    [email]scott_ribekillerbytes.com[/email]
    [url]http://www.killerbytes.com/[/url]
    (303) 665-7007 voice



    ---------------------------(end of broadcast)---------------------------
    TIP 8: explain yze is your friend

    Scott Ribe Guest

  9. #9

    Default Re: does "select count(*) from mytable" always do a seq

    > (The example is really count(pkey) because count(*) is always going to
    > do a seq scan I reckon - and could probably never use an index).
    postgres knows that count(*) is just "count the rows", you can use
    count(1), it makes no difference...
    >
    > Alex Turner
    > NetEconomist
    >
    >
    > On Fri, 07 Jan 2005 11:17:32 -0700, Scott Ribe
    > <scott_ribekillerbytes.com> wrote:
    >> > I guess what I"m really asking is why can't you run aggregates over
    >> an index?
    >>
    >> It's got to do with MVCC and transaction consistency. Running count(*)
    >> or an
    >> aggregate function on an index could include records that should not be
    >> visible to your current transaction.
    >>
    >> --
    >> Scott Ribe
    >> [email]scott_ribekillerbytes.com[/email]
    >> [url]http://www.killerbytes.com/[/url]
    >> (303) 665-7007 voice
    >>
    >> ---------------------------(end of broadcast)---------------------------
    >> TIP 8: explain yze is your friend
    >>
    >
    > ---------------------------(end of broadcast)---------------------------
    > TIP 8: explain yze is your friend
    >


    ---------------------------(end of broadcast)---------------------------
    TIP 5: Have you checked our extensive FAQ?

    [url]http://www.postgresql.org/docs/faqs/FAQ.html[/url]

    Pierre-Frédéric Caillaud Guest

  10. #10

    Default Re: does "select count(*) from mytable" always do a seq

    Forgive my ignorance, but I'm still learning about much of this stuff.
    If you perform:

    select an_id, int_value from my_table where int_value>400;

    The table has an index on int_value and there are enough rows to
    warrant using it. Doesn't the database perform in index scan on
    int_value followed by a retrieve for the datablocks with relavent oids
    to get the an_id field?

    If another transaction has inserted rows into this table, won't the
    index have been updated, and contain new row references? Does this
    imply that the database must retrieve the row information to determine
    if the row is a row from a different transaction?

    thanks,

    Alex Turner


    On Sat, 08 Jan 2005 12:39:41 -0700, Scott Ribe
    <scott_ribekillerbytes.com> wrote:
    > > No offense or anything, but that doesn't make any sense. If you are
    > > running count(*) against a table, it still has to worry about MVCC,
    > > and which rows are visible to your transaction. What difference does
    > > it make, table or index, the system still has to figure out which rows
    > > are visible in the current transaction, so why not use the index?
    >
    > Your mistake seems to be assuming that row visibility is tracked in the
    > index. As was stated earlier in the thread, row visibility information is
    > not available in the index, therefore rows have to be looked at to determine
    > whether they're visible. What this means is that using the index would only
    > add an additional unnecessary step.
    >
    > > (The example is really count(pkey) because count(*) is always going to
    > > do a seq scan I reckon - and could probably never use an index).
    >
    > No, if there is an index on a column that is required, such as a primary
    > key, then count(pkey) is equal to count(*). Many databases make use of this
    > fact to optimize performance of count(*) by using an index scan.
    >
    > --
    > Scott Ribe
    > [email]scott_ribekillerbytes.com[/email]
    > [url]http://www.killerbytes.com/[/url]
    > (303) 665-7007 voice
    >
    >
    ---------------------------(end of broadcast)---------------------------
    TIP 3: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to [email]majordomopostgresql.org[/email] so that your
    message can get through to the mailing list cleanly

    Alex Turner Guest

  11. #11

    Default Re: does "select count(*) from mytable" always do a seq

    On Mon, Jan 10, 2005 at 10:26:46 -0500,
    Alex Turner <armtuk> wrote:
    > Forgive my ignorance, but I'm still learning about much of this stuff.
    > If you perform:
    >
    > select an_id, int_value from my_table where int_value>400;
    >
    > The table has an index on int_value and there are enough rows to
    > warrant using it. Doesn't the database perform in index scan on
    > int_value followed by a retrieve for the datablocks with relavent oids
    > to get the an_id field?
    I don't think that oids are used in the process, but if the planner thinks
    an index scan would be better it will use one.
    > If another transaction has inserted rows into this table, won't the
    > index have been updated, and contain new row references? Does this
    > imply that the database must retrieve the row information to determine
    > if the row is a row from a different transaction?
    When doing an index scan, the heap tuples still need to be checked for
    visibility to the current transaction.

    ---------------------------(end of broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to [email]majordomopostgresql.org[/email]

    Bruno Wolff III Guest

  12. #12

    Default Re: does "select count(*) from mytable" always do a seq

    I'm no database writing guru, but wouldn't it just be a matter of
    adding a transaction number to an index entry so as to determine it's
    newness and only retrieve entries with an older transaction number?

    I'm guessing that the theory is that most insert transactions will be
    committed, or only contain a small number of rows relative to the
    overall size of the table, and therefore the extra overhead of
    checking newer tuples won't impact the overall performance that much?

    I know I'm asking kind of deep questions that really don't affect much
    of anything, but I'm a devilishly curious individual, and I like
    understanding things that I use well. Feel free to tell me that it's
    irrelavant, or that I'm full of hot air and I don't have a good
    question ;)

    Alex Turner
    NetEconomist


    On Mon, 10 Jan 2005 10:34:46 -0600, Bruno Wolff III <brunowolff.to> wrote:
    > On Mon, Jan 10, 2005 at 10:26:46 -0500,
    > Alex Turner <armtuk> wrote:
    > > Forgive my ignorance, but I'm still learning about much of this stuff.
    > > If you perform:
    > >
    > > select an_id, int_value from my_table where int_value>400;
    > >
    > > The table has an index on int_value and there are enough rows to
    > > warrant using it. Doesn't the database perform in index scan on
    > > int_value followed by a retrieve for the datablocks with relavent oids
    > > to get the an_id field?
    >
    > I don't think that oids are used in the process, but if the planner thinks
    > an index scan would be better it will use one.
    >
    > > If another transaction has inserted rows into this table, won't the
    > > index have been updated, and contain new row references? Does this
    > > imply that the database must retrieve the row information to determine
    > > if the row is a row from a different transaction?
    >
    > When doing an index scan, the heap tuples still need to be checked for
    > visibility to the current transaction.
    >
    >
    ---------------------------(end of broadcast)---------------------------
    TIP 9: the planner will ignore your desire to choose an index scan if your
    joining column's datatypes do not match

    Alex Turner Guest

  13. #13

    Default Re: does "select count(*) from mytable" always do a seq

    On Mon, Jan 10, 2005 at 11:51:51AM -0500, Alex Turner wrote:
    > I'm no database writing guru, but wouldn't it just be a matter of
    > adding a transaction number to an index entry so as to determine it's
    > newness and only retrieve entries with an older transaction number?
    No, it's more complex than that. Index entries would have to be labeled
    with both a creation transaction Id and a destruction transaction Id
    (xmin and xmax. Probably it'd also need Cmin and Cmax to be completely
    consistent.) Keeping them in sync would be prone to deadlock because
    it'd have to simultaneously update the table proper and the possibly
    multiple indexes. Plus, having all those identifiers in the index file
    would imply more I/O costs.
    > I'm guessing that the theory is that most insert transactions will be
    > committed, or only contain a small number of rows relative to the
    > overall size of the table, and therefore the extra overhead of
    > checking newer tuples won't impact the overall performance that much?
    The choice is yours (or whoever's): if you absolutely need exact
    numbers, you pay the cost of having a trigger. OTOH if you can do with
    estimates, you can use the reltuples column from pg_class.

    --
    Alvaro Herrera (<alvherre[]dcc.uchile.cl>)
    "There is evil in the world. There are dark, awful things. Occasionally, we get
    a glimpse of them. But there are dark corners; horrors almost impossible to
    imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972)

    ---------------------------(end of broadcast)---------------------------
    TIP 9: the planner will ignore your desire to choose an index scan if your
    joining column's datatypes do not match

    Alvaro Herrera Guest

  14. #14

    Default Re: does "select count(*) from mytable" always do a seq

    On Mon, Jan 10, 2005 at 11:51:51 -0500,
    Alex Turner <armtuk> wrote:
    > I'm no database writing guru, but wouldn't it just be a matter of
    > adding a transaction number to an index entry so as to determine it's
    > newness and only retrieve entries with an older transaction number?
    No, because transactions don't complete in the order they are started.
    So you will need to be able to maintain some kind of list to cover
    exceptions.
    > I'm guessing that the theory is that most insert transactions will be
    > committed, or only contain a small number of rows relative to the
    > overall size of the table, and therefore the extra overhead of
    > checking newer tuples won't impact the overall performance that much?
    >
    > I know I'm asking kind of deep questions that really don't affect much
    > of anything, but I'm a devilishly curious individual, and I like
    > understanding things that I use well. Feel free to tell me that it's
    > irrelavant, or that I'm full of hot air and I don't have a good
    > question ;)
    There have been discussions in the past about why the core developers
    feel that moving visibility status into indexes would be a net loss
    on average. I don't think there has been one for a while, but you can
    try searching the hackers archive.

    ---------------------------(end of broadcast)---------------------------
    TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [email]majordomopostgresql.org[/email])

    Bruno Wolff III Guest

  15. #15

    Default Re: does "select count(*) from mytable" always do a seq

    On Mon, Jan 10, 2005 at 11:23:42AM -0600, Bruno Wolff III wrote:
    > There have been discussions in the past about why the core developers
    > feel that moving visibility status into indexes would be a net loss
    > on average. I don't think there has been one for a while, but you can
    > try searching the hackers archive.
    I wonder if a technical discussion or a link to one could find its
    way into the FAQ. Has anybody ever put together a single doent
    that describes in detail the issue, various proposed solutions and
    their tradeoffs, and why the developers think the chosen implementation
    is best?

    --
    Michael Fuhr
    [url]http://www.fuhr.org/~mfuhr/[/url]

    ---------------------------(end of broadcast)---------------------------
    TIP 8: explain yze is your friend

    Michael Fuhr Guest

Similar Threads

  1. Replies: 0
    Last Post: August 26th, 06:01 AM
  2. what is javascript syntax of 'member("world").model.count'?
    By lucihuge in forum Macromedia Director 3D
    Replies: 0
    Last Post: August 12th, 02:27 AM
  3. "Select" & "Order By" OK- ''WHERE'' Does'nt Work !
    By ian justice in forum PHP Development
    Replies: 23
    Last Post: November 20th, 06:10 AM
  4. Replies: 28
    Last Post: October 21st, 04:48 PM
  5. DataGrid Issue: "Cannot compute Count" for AllowCustomPaging
    By Steve Litvack in forum ASP.NET Data Grid Control
    Replies: 4
    Last Post: September 1st, 10:16 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139