Strange Index behavior

Ask a Question related to PostgreSQL / PGSQL, Design and Development.

  1. #1

    Default Strange Index behavior

    Hi,
    Is it a normal behavior that if I give a where clause with an existent index
    key, then postgres uses the index, but if I give it a non existent value
    than it refuses to use the index.

    An example to make it more clear:

    CREATE TABLE measured_1
    (
    tstamp timestamp(0) NOT NULL,
    meterid int4 NOT NULL,
    pp numeric NOT NULL DEFAULT 0,
    pm numeric NOT NULL DEFAULT 0,
    qp numeric NOT NULL DEFAULT 0,
    qm numeric NOT NULL DEFAULT 0,
    status bit(5),
    CONSTRAINT measured_1_pkey PRIMARY KEY (tstamp, meterid)
    ) ;

    --
    -- The table contains rows with tstamp values from '2004.12.22 12:00' so the
    00:00 row does not exists!!!
    --

    # explain analyze select meterid, tstamp, pp, pm, status from measured_1
    where tstamp >= '2004.12.22 30:00' and tstamp <= '2004.12.22 23:59' order by
    tstamp, meterid;
    "Index Scan using measured_1_pkey on measured_1 (cost=0.00..5.34 rows=1
    width=42) (actual time=0.000..111.000 rows=6016 loops=1)"
    " Index Cond: ((tstamp >= '2004-12-22 13:00:00'::timestamp without time
    zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
    "Total runtime: 111.000 ms"
    -- This is quite an acceptable result time

    -- BUT!!!!

    # explain analyze select meterid, tstamp, pp, pm, status from measured_1
    where tstamp >= '2004.12.22 00:00' and tstamp <= '2004.12.22 23:59' order by
    tstamp, meterid;
    "Sort (cost=2619.02..2622.78 rows=1505 width=42) (actual
    time=1672.000..1682.000 rows=14523 loops=1)"
    " Sort Key: tstamp, meterid"
    " -> Seq Scan on measured_1 (cost=0.00..2539.59 rows=1505 width=42)
    (actual time=0.000..1292.000 rows=14523 loops=1)"
    " Filter: ((tstamp >= '2004-12-22 00:00:00'::timestamp without time
    zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
    "Total runtime: 1802.000 ms"
    -- this is definitely not acceptable.

    Is this normal??? Or what do I wrong???

    Thanks,

    -- Csaba

    --
    No virus found in this outgoing message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.




    --
    No virus found in this outgoing message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


    ---------------------------(end of broadcast)---------------------------
    TIP 4: Don't 'kill -9' the postmaster

    Együd Csaba Guest

  2. Similar Questions and Discussions

    1. Strange behavior when setting gridview selected index
      Hello there, I have an asp.net 2.0 web app that uses gridview and formview. When a user selects an item in the gridview the formview appears to...
    2. Strange behavior
      The problem seems to be in c code calling ruby calling c code. ======== start test.rb puts "about to require curses" require "curses" puts...
    3. Strange behavior of $.
      Apparently $. is not always set correct (see second ruby 1liner). Is this a bug? 12:12:42 : cat -n n 1 2 3 BAR="hello" 4 12:12:47 : ruby...
    4. Strange cookie behavior
      Maybe the remote server is in a different time zone. Ray at work -- Will trade ASP help for SQL Server help "AHN" <anerse@excite.com>...
    5. Why strange IF...ELSE behavior
      Hi all, I'm getting a strange result with the following IF statement: $bar = ($foo == 'last') ? true : false; In my script $foo normaly has...
  3. #2

    Default Re: Strange Index behavior

    =?iso-8859-2?Q?Egy=FCd_Csaba?= <csegyud@vnet.hu> writes:
    > # explain analyze select meterid, tstamp, pp, pm, status from measured_1
    > where tstamp >= '2004.12.22 00:00' and tstamp <= '2004.12.22 23:59' order by
    > tstamp, meterid;
    > "Sort (cost=2619.02..2622.78 rows=1505 width=42) (actual
    > time=1672.000..1682.000 rows=14523 loops=1)"
    > " Sort Key: tstamp, meterid"
    > " -> Seq Scan on measured_1 (cost=0.00..2539.59 rows=1505 width=42)
    > (actual time=0.000..1292.000 rows=14523 loops=1)"
    > " Filter: ((tstamp >= '2004-12-22 00:00:00'::timestamp without time
    > zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
    > "Total runtime: 1802.000 ms"
    Why do you say there are no such rows, when the explain output clearly
    shows there are 14523 of them? The plan shift looks fairly reasonable
    to me given the large number of rows to be retrieved.

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 6: Have you searched our list archives?

    [url]http://archives.postgresql.org[/url]

    Tom Lane Guest

  4. #3

    Default Re: Strange Index behavior

    On Wed, Dec 22, 2004 at 03:09:08PM +0100, Együd Csaba wrote:
    > Hi,
    > Is it a normal behavior that if I give a where clause with an existent index
    > key, then postgres uses the index, but if I give it a non existent value
    > than it refuses to use the index.
    Whether the value exists is irrelevent. What matters is the number of
    rows expected to be returned. Notice:
    > "Index Scan using measured_1_pkey on measured_1 (cost=0.00..5.34 rows=1
    > width=42) (actual time=0.000..111.000 rows=6016 loops=1)"
    > " Index Cond: ((tstamp >= '2004-12-22 13:00:00'::timestamp without time
    > zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
    > "Total runtime: 111.000 ms"
    > -- This is quite an acceptable result time
    > " -> Seq Scan on measured_1 (cost=0.00..2539.59 rows=1505 width=42)
    > (actual time=0.000..1292.000 rows=14523 loops=1)"
    > " Filter: ((tstamp >= '2004-12-22 00:00:00'::timestamp without time
    > zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
    > "Total runtime: 1802.000 ms"
    > -- this is definitely not acceptable.
    Note how the first one expected to return only one value, so an index
    in a good choice. The second expected 1500 matches, so it decided a seq
    scan would be faster.

    Maybe you need to review your value for random_page_cost and maybe an
    ANALYZE would be good too... Look at the effects of enabling and
    disabling index scans.

    Hope this helps,
    --
    Martijn van Oosterhout <kleptog@svana.org> [url]http://svana.org/kleptog/[/url]
    > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
    > tool for doing 5% of the work and then sitting around waiting for someone
    > else to do the other 95% so you can sue them.
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.0.6 (GNU/Linux)
    Comment: For info see [url]http://www.gnupg.org[/url]

    iD8DBQFByan0Y5Twig3Ge+YRAv1FAJ4jYQ9SK1V2zK+ZhZAJ2X wFmKJV3gCgryNC
    a4fTU5ZQ0EAOuT++FMs+ics=
    =oADR
    -----END PGP SIGNATURE-----

    Martijn van Oosterhout Guest

  5. #4

    Default Re: Strange Index behavior

    Hi Tom,
    may be it was misunderstandable...

    I meant that there is no rows with tstamp='2004.12.22 00:00'. Certainly
    there are rows matching the range (00:00 - 23:59); from 13:00 to 23:00, but
    not before 13:00.


    The only difference betwen the two queries is this value. If the begining of
    the range is a non existent value, the index is not used.
    The point is that there are cases where a primary key index is not used -
    even if the condition is formaly good. I was wonder what can be the reason
    of this. What can I do in a different way to avoid this.

    Thank you.

    bye,
    -- Csaba



    -----Original Message-----
    From: [email]pgsql-general-owner@postgresql.org[/email]
    [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Tom Lane
    Sent: Wednesday, December 22, 2004 5:47 PM
    To: Együd Csaba
    Cc: [email]pgsql-general@postgresql.org[/email]
    Subject: Re: [GENERAL] Strange Index behavior

    =?iso-8859-2?Q?Egy=FCd_Csaba?= <csegyud@vnet.hu> writes:
    > # explain analyze select meterid, tstamp, pp, pm, status from
    > measured_1 where tstamp >= '2004.12.22 00:00' and tstamp <=
    > '2004.12.22 23:59' order by tstamp, meterid; "Sort
    > (cost=2619.02..2622.78 rows=1505 width=42) (actual
    > time=1672.000..1682.000 rows=14523 loops=1)"
    > " Sort Key: tstamp, meterid"
    > " -> Seq Scan on measured_1 (cost=0.00..2539.59 rows=1505 width=42)
    > (actual time=0.000..1292.000 rows=14523 loops=1)"
    > " Filter: ((tstamp >= '2004-12-22 00:00:00'::timestamp without time
    > zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
    > "Total runtime: 1802.000 ms"
    Why do you say there are no such rows, when the explain output clearly shows
    there are 14523 of them? The plan shift looks fairly reasonable to me given
    the large number of rows to be retrieved.

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 6: Have you searched our list archives?

    [url]http://archives.postgresql.org[/url]



    --
    No virus found in this incoming message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


    --
    No virus found in this incoming message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


    --
    No virus found in this outgoing message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.




    --
    No virus found in this outgoing message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


    ---------------------------(end of broadcast)---------------------------
    TIP 7: don't forget to increase your free space map settings

    Együd Csaba Guest

  6. #5

    Default Re: Strange Index behavior

    =?iso-8859-2?Q?Egy=FCd_Csaba_=28Freemail=29?= <csegyud@freemail.hu> writes:
    > The point is that there are cases where a primary key index is not used -
    > even if the condition is formaly good.
    You haven't actually shown us such a case. In the case you gave,
    I think the planner probably made the right choice not to use the index.

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [email]majordomo@postgresql.org[/email])

    Tom Lane Guest

  7. #6

    Default Re: Strange Index behavior

    But why? I thought the planner is for choose the quicker way to the target
    point. If there is an index which is probably would speed up the query then
    why does the planner ignore that?

    The difference between the result times is 16x. I can't understand why the
    planner thinks it is the better way...

    -- Csaba

    -----Original Message-----
    From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    Sent: Wednesday, December 22, 2004 6:34 PM
    To: Együd Csaba (Freemail)
    Cc: 'Együd Csaba'; [email]pgsql-general@postgresql.org[/email]
    Subject: Re: [GENERAL] Strange Index behavior

    =?iso-8859-2?Q?Egy=FCd_Csaba_=28Freemail=29?= <csegyud@freemail.hu> writes:
    > The point is that there are cases where a primary key index is not
    > used - even if the condition is formaly good.
    You haven't actually shown us such a case. In the case you gave, I think
    the planner probably made the right choice not to use the index.

    regards, tom lane



    --
    No virus found in this incoming message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


    --
    No virus found in this incoming message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


    --
    No virus found in this outgoing message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.




    --
    No virus found in this outgoing message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


    ---------------------------(end of broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to [email]majordomo@postgresql.org[/email]

    Együd Csaba Guest

  8. #7

    Default Re: Strange Index behavior

    =?iso-8859-2?Q?Egy=FCd_Csaba_=28Freemail=29?= <csegyud@freemail.hu> writes:
    > The difference between the result times is 16x. I can't understand why the
    > planner thinks it is the better way...
    The planner thinks that because it thinks the second query will fetch
    1500 times as many rows as the first. Now that I look more carefully
    I see that both of those row estimates are off, the first by a lot.
    Have you ANALYZEd this table lately? If you have, it might be a good
    idea to increase the statistics target for the tstamp column.

    You might also need to play around with random_page_cost, but it will be
    counterproductive to touch that until the rowcount estimates are in the
    vicinity of reality.

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 6: Have you searched our list archives?

    [url]http://archives.postgresql.org[/url]

    Tom Lane Guest

  9. #8

    Default Re: Strange Index behavior

    On Wed, Dec 22, 2004 at 18:44:00 +0100,
    "Együd Csaba (Freemail)" <csegyud@freemail.hu> wrote:
    > But why? I thought the planner is for choose the quicker way to the target
    > point. If there is an index which is probably would speed up the query then
    > why does the planner ignore that?
    Because doing an index scan is slower if you are retrieving a significant
    (very roughly 5%) fraction of a table.

    ---------------------------(end of broadcast)---------------------------
    TIP 3: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to [email]majordomo@postgresql.org[/email] so that your
    message can get through to the mailing list cleanly

    Bruno Wolff III Guest

  10. #9

    Default Re: Strange Index behavior

    Yes, I vacuum analyze it once in every hour.

    Thank you for the suggestions, now I'm getting understand the point...

    With default_statistics_target=500 and random_page_cost=0.1 and reanalyzing
    the db the planner always uses the index. Certainly if there were more rows
    in the table the planner somewhere would switch to seq scan.


    Some exlpain results:
    ------------------------
    #explain analyze select tstamp, pp, pm, status from measured_1 where tstamp
    >= '2004.12.22 00:00' and tstamp <= '2004.12.22 23:59' order by tstamp;
    "Index Scan using measured_1_pkey on measured_1 (cost=0.00..2258.91
    rows=26769 width=38) (actual time=0.000..271.000 rows=26743 loops=1)"
    " Index Cond: ((tstamp >= '2004-12-22 00:00:00'::timestamp without time
    zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
    "Total runtime: 361.000 ms"

    #explain analyze select tstamp, pp, pm, status from measured_1 where tstamp
    >= '2004.12.22 13:00' and tstamp <= '2004.12.22 23:59' order by tstamp;
    "Index Scan using measured_1_pkey on measured_1 (cost=0.00..1560.35
    rows=18377 width=38) (actual time=0.000..140.000 rows=18424 loops=1)"
    " Index Cond: ((tstamp >= '2004-12-22 13:00:00'::timestamp without time
    zone) AND (tstamp <= '2004-12-22 23:59:00'::timestamp without time zone))"
    "Total runtime: 200.000 ms"


    Tom and Martijn,
    many thanks.

    -- Csaba

    -----Original Message-----
    From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
    Sent: Wednesday, December 22, 2004 7:01 PM
    To: Együd Csaba (Freemail)
    Cc: 'Együd Csaba'; [email]pgsql-general@postgresql.org[/email]
    Subject: Re: [GENERAL] Strange Index behavior

    =?iso-8859-2?Q?Egy=FCd_Csaba_=28Freemail=29?= <csegyud@freemail.hu> writes:
    > The difference between the result times is 16x. I can't understand why
    > the planner thinks it is the better way...
    The planner thinks that because it thinks the second query will fetch 1500
    times as many rows as the first. Now that I look more carefully I see that
    both of those row estimates are off, the first by a lot.
    Have you ANALYZEd this table lately? If you have, it might be a good idea
    to increase the statistics target for the tstamp column.

    You might also need to play around with random_page_cost, but it will be
    counterproductive to touch that until the rowcount estimates are in the
    vicinity of reality.

    regards, tom lane


    --
    No virus found in this incoming message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


    --
    No virus found in this incoming message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


    --
    No virus found in this outgoing message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.




    --
    No virus found in this outgoing message.
    Checked by AVG Anti-Virus.
    Version: 7.0.296 / Virus Database: 265.6.3 - Release Date: 2004.12.21.


    ---------------------------(end of broadcast)---------------------------
    TIP 4: Don't 'kill -9' the postmaster

    Együd Csaba Guest

Posting Permissions

  • You may not post new threads
  • You may post replies
  • You may not post attachments
  • You may not edit your posts

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139