Professional Web Applications Themes

'distinct on' and 'order by' conflicts of interest - PostgreSQL / PGSQL

It has come up several times on the various postgresql lists that in order to get around the requirement of DISTINCT ON parameters matching the first ORDER BY parameters, wrap the distinct query in a new 'order by' query: select * from (select distinct on (a) a,b,c from foo order by a) order by c however, this will not work when ordering by 'a' will put the wrong record first, making it choose the wrong record via distinct on. The 'order by c' superquery no longer has the correct recordset to sort. I cannot figure out how to have postgresql ...

  1. #1

    Default 'distinct on' and 'order by' conflicts of interest

    It has come up several times on the various postgresql lists that in order
    to get around the requirement of DISTINCT ON parameters matching the first
    ORDER BY parameters, wrap the distinct query in a new 'order by' query:

    select * from (select distinct on (a) a,b,c from foo order by a) order by c

    however, this will not work when ordering by 'a' will put the wrong record
    first, making it choose the wrong record via distinct on. The 'order by
    c' superquery no longer has the correct recordset to sort.

    I cannot figure out how to have postgresql first sort the results (on
    something other than the 'distinct on' parameters) and then do a recordset
    culling by only part of the record. Does anyone have the syntax for this?

    ---------------------------(end of broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to [email]majordomopostgresql.org[/email]

    stephen@thunkit.com Guest

  2. #2

    Default Re: 'distinct on' and 'order by' conflicts of interest

    On Fri, Dec 31, 2004 at 10:48:21AM -0600, [email]stephenthunkit.com[/email] wrote:
    > It has come up several times on the various postgresql lists that in order
    > to get around the requirement of DISTINCT ON parameters matching the first
    > ORDER BY parameters, wrap the distinct query in a new 'order by' query:
    >
    > select * from (select distinct on (a) a,b,c from foo order by a) order by c
    >
    > however, this will not work when ordering by 'a' will put the wrong record
    > first, making it choose the wrong record via distinct on. The 'order by
    > c' superquery no longer has the correct recordset to sort.
    You should be ordering by more than just "a" in the DISTINCT ON
    query. SELECT DISTINCT ON (a) gives you the first row for each "a"
    value; if you don't specify an order for additional fields then
    you'll get some arbitrary row. See the weather_reports example in
    the doentation for SELECT in the "SQL Commands" part of the
    doentation.
    > I cannot figure out how to have postgresql first sort the results (on
    > something other than the 'distinct on' parameters) and then do a recordset
    > culling by only part of the record. Does anyone have the syntax for this?
    Define the problem, not how you think it should be solved. What
    are you trying to do? If you can't get the query to work, then
    please post SQL statements to create and populate a table and
    describe the query results you'd like to see.

    --
    Michael Fuhr
    [url]http://www.fuhr.org/~mfuhr/[/url]

    ---------------------------(end of broadcast)---------------------------
    TIP 8: explain yze is your friend

    Michael Fuhr Guest

  3. #3

    Default Re: 'distinct on' and 'order by' conflicts of interest


    On Fri, 31 Dec 2004 [email]stephenthunkit.com[/email] wrote:
    > It has come up several times on the various postgresql lists that in order
    > to get around the requirement of DISTINCT ON parameters matching the first
    > ORDER BY parameters, wrap the distinct query in a new 'order by' query:
    >
    > select * from (select distinct on (a) a,b,c from foo order by a) order by c
    >
    > however, this will not work when ordering by 'a' will put the wrong record
    > first, making it choose the wrong record via distinct on. The 'order by
    > c' superquery no longer has the correct recordset to sort.
    I'm not entirely sure from the above which record you want to get from the
    distinct on. In general for something like the above, in the subselect you
    order by the distinct on column(s) and then the column(s) which control
    the which row you want (for example order by a,c). Then you can use the
    outer order by to change the ordering to no longer use the distinct on
    columns.

    ---------------------------(end of broadcast)---------------------------
    TIP 7: don't forget to increase your free space map settings

    Stephan Szabo Guest

  4. #4

    Default Re: 'distinct on' and 'order by' conflicts of interest

    > Define the problem, not how you think it should be solved. What
    > are you trying to do? If you can't get the query to work, then
    > please post SQL statements to create and populate a table and
    > describe the query results you'd like to see.
    the situation is i have a set of records in a table (actually, their
    values span multiple tables, but that is not important for this example).
    I then have a second table that allows me to draw n-ary relationships
    between them. so:

    create table nodes (
    id sequence not null,
    property1 varchar(12),
    property2 varchar(9),
    etc....,

    primary key (id)
    );

    create table arcs (
    arcstart integer not null,
    arcend integer not null,
    type varchar(12) not null,
    ordinal integer,

    primary key( arcstart,arcend,type )
    );

    if i have one node loaded, and i want to retrieve nodes related to it via
    the arcs table where the 'type' field is of a certain value, and ordered
    by the 'ordinal' field, i'd do something like this:

    select id,property1,property2,ordinal from nodes,arcs where id=arcstart
    and type='contains' order by ordinal;

    which would be great, except that arc types can have inverses. That is,
    if type is 'contains', there's also a 'contained_by' where the arcstart
    and arcend fields are flipped. This isn't data duplication, because
    depending on which way the arc is drawn between the two nodes, the ordinal
    information is different. So now i have:

    select id,property1,property2,ordinal from nodes,arcs where (id=arcstart
    and type='contains') or (id=arcend and type='contained_by').

    This way I don't have to worry about which end of the arc got defined. As
    long as it's defined from one node's point of view, the other one can find
    it. But, if it's been defined by both nodes, because they needed to both
    specify ordinal information, then I get duplicates. actual results from my
    db:

    select nodes.title, nodes.name, nodes.id, arcs.ordinal, from Nodes,Arcs
    where (Arcs.ArcEnd=Nodes.id and Arcs.ArcStart in ('638') and Arcs.Type=
    'contains') or (Arcs.ArcStart=Nodes.id and Arcs.ArcEnd in ('638') and
    Arcs.type = 'contained_by' ) order by arcs.ordinal

    title | name | id | ordinal
    ------------------+------------------------------+-----+---------
    Recent | Public: Recent Exhibitions | 870 | 0
    Upcoming | Public: Upcoming Exhibitions | 852 | 0
    Current | Public: Current Exhibitions | 802 | 0
    Upcoming | Public: Upcoming Exhibitions | 852 | 1 *
    Recent | Public: Recent Exhibitions | 870 | 2 *
    Hands-on History | Public: Hands-on History | 931 | 3 *
    Current | Public: Current Exhibitions | 802 | 4 *
    Hands-on History | Public: Hands-on History | 931 | 5

    I've put an '*' next to the rows I want. So my dilemma is two part.
    First, I want to sort by the ordinal information only when the arc is
    pointing from the source object (id 638) to the other objects. Well, it's
    pretty easy to determine which arcs are pointing the right way with this
    addition:

    select nodes.title, nodes.name, nodes.id, arcs.ordinal, CASE WHEN
    nodes.id=arcstart THEN '1' ELSE '0' END as direction from Nodes,Arcs where
    (Arcs.ArcEnd=Nodes.id and Arcs.ArcStart in ('638') and Arcs.Type=
    'contains') or (Arcs.ArcStart=Nodes.id and Arcs.ArcEnd in ('638') and
    Arcs.type = 'contained_by' ) order by direction, arcs.ordinal

    title | name | id | ordinal | direction
    ------------------+------------------------------+-----+---------+-----------
    Upcoming | Public: Upcoming Exhibitions | 852 | 1 | 0 *
    Recent | Public: Recent Exhibitions | 870 | 2 | 0 *
    Hands-on History | Public: Hands-on History | 931 | 3 | 0 *
    Current | Public: Current Exhibitions | 802 | 4 | 0 *
    Current | Public: Current Exhibitions | 802 | 0 | 1
    Recent | Public: Recent Exhibitions | 870 | 0 | 1
    Upcoming | Public: Upcoming Exhibitions | 852 | 0 | 1
    Hands-on History | Public: Hands-on History | 931 | 5 | 1

    So i've got a sort order i like, now i just want to use 'distinct on (id)'
    to give me back the first four rows. But that won't work, because it will
    sort by id first, giving me:

    title | name | id | ordinal | direction
    ------------------+------------------------------+-----+---------+-----------
    Current | Public: Current Exhibitions | 802 | 4 | 0 *
    Upcoming | Public: Upcoming Exhibitions | 852 | 1 | 0 *
    Recent | Public: Recent Exhibitions | 870 | 2 | 0 *
    Hands-on History | Public: Hands-on History | 931 | 3 | 0 *

    thus destroying the 'ordinal' field's ability to sort them in the
    direction i wish.

    I hope that clearly defines the problem defined. I'm not able to modify
    the overal db schema because it is central to how the software depending
    on it operates. So i really need a query-level solution for this problem.

    -Stephen

    ---------------------------(end of broadcast)---------------------------
    TIP 9: the planner will ignore your desire to choose an index scan if your
    joining column's datatypes do not match

    stephen@thunkit.com Guest

  5. #5

    Default Re: 'distinct on' and 'order by' conflicts of interest

    On Fri, Dec 31, 2004 at 15:02:56 -0600,
    [email]stephenthunkit.com[/email] wrote:
    >
    > I've put an '*' next to the rows I want. So my dilemma is two part.
    > First, I want to sort by the ordinal information only when the arc is
    > pointing from the source object (id 638) to the other objects. Well, it's
    > pretty easy to determine which arcs are pointing the right way with this
    > addition:
    >
    > select nodes.title, nodes.name, nodes.id, arcs.ordinal, CASE WHEN
    > nodes.id=arcstart THEN '1' ELSE '0' END as direction from Nodes,Arcs where
    > (Arcs.ArcEnd=Nodes.id and Arcs.ArcStart in ('638') and Arcs.Type=
    > 'contains') or (Arcs.ArcStart=Nodes.id and Arcs.ArcEnd in ('638') and
    > Arcs.type = 'contained_by' ) order by direction, arcs.ordinal
    You want to use this ordering to do the distinct and make it a subselect
    so that you get the output order you want.

    Something like:
    SELECT
    title, name, id, ordinal, direction
    FROM
    (SELECT
    DISTINCT ON (nodes.id)
    nodes.title, nodes.name, nodes.id, arcs.ordinal,
    CASE WHEN nodes.id=arcstart THEN '1' ELSE '0' END as direction
    FROM Nodes, Arcs
    WHERE
    (Arcs.ArcEnd=Nodes.id
    AND Arcs.ArcStart in ('638')
    AND Arcs.Type= 'contains')
    OR
    (Arcs.ArcStart=Nodes.id
    AND Arcs.ArcEnd in ('638')
    AND Arcs.type = 'contained_by')
    ORDER BY nodes.id, direction, arcs.ordinal
    ) AS a
    ORDER BY ordinal

    ---------------------------(end of broadcast)---------------------------
    TIP 9: the planner will ignore your desire to choose an index scan if your
    joining column's datatypes do not match

    Bruno Wolff III Guest

Similar Threads

  1. Using DISTINCT wirh ORDER BY
    By Rustywater in forum Coldfusion - Advanced Techniques
    Replies: 2
    Last Post: July 17th, 06:35 AM
  2. Interest calculation
    By Allan in forum Macromedia Flash
    Replies: 1
    Last Post: August 11th, 06:56 PM
  3. Callback: any interest in this?
    By Kevin Michael Vail in forum PERL Modules
    Replies: 0
    Last Post: September 28th, 02:58 AM
  4. Replies: 3
    Last Post: April 18th, 12:52 PM
  5. Replies: 0
    Last Post: April 15th, 01:22 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139