Professional Web Applications Themes

Large Dataset Question - MySQL

Hi all. I have a decent sized dataset (300,000 entries, give or take) that I am trying to use in a web page I am coding. In a nutshell: I have a website that displays information about various makes of cars, including the highest selling make for a given year. In one table (called 'name'), I have the name of the car, with columns linked to a maker table, engine, etc. In another table (called 'rank'), I have the sales ranking for all cars, with columns 'name' (linked back to the 'name' table above), year and sales rank. Finally, this ...

  1. #1

    Default Large Dataset Question

    Hi all.

    I have a decent sized dataset (300,000 entries, give or take) that I am
    trying to use in a web page I am coding.

    In a nutshell: I have a website that displays information about various
    makes of cars, including the highest selling make for a given year.

    In one table (called 'name'), I have the name of the car, with columns
    linked to a maker table, engine, etc.

    In another table (called 'rank'), I have the sales ranking for all cars,
    with columns 'name' (linked back to the 'name' table above), year and sales
    rank. Finally, this table has data for the top 100 selling cars of each
    year.

    I can easily pull the top 100 ranked cars out for a given year and do an
    inside join to retrieve the relative name, maker, etc. The problem is that
    it leaves out any names of cars that did not have a ranking for that year,
    which is what I want to ultimately do (i.e. the 'top cars' for 2004 would be
    ranked first through 100, then all other cars not ranked for 2004 would be
    listed below this in the query results). A left join would work if the
    ranking table contained only rankings for a given year, but as they contain
    rankings for all years, in my left joined query I still need to specify a
    where clause item that says "where year=2004", which of course then excludes
    those cars that have no entry in the ranking table for 2004.

    Any ideas how I could solve this issue with displaying all cars, ordered by
    rank where only a small percentage of the cars have a rank, and all the
    ranks are in one shared table? Should I break the rank table into multiple
    tables which only have entries by year (i.e. rank_2004, rank_2003)? I
    believe a left join would work in this case, but I would need to add several
    dozen tables, and this solution does not seem very elegant. Sub-queries are
    not an option either, as I am using MYSQL 4.0.24, which does not support
    them. And I would prefer to do this with a single query, if possible.

    I have been trying to solve this for a few days, and am stumped at this
    point.

    Any suggestions are greatly appreciated!

    GS


    GS Guest

  2. #2

    Default Re: Large Dataset Question


    GS wrote:
    > Hi all.
    >
    > I have a decent sized dataset (300,000 entries, give or take) that I am
    > trying to use in a web page I am coding.
    >
    > In a nutshell: I have a website that displays information about various
    > makes of cars, including the highest selling make for a given year.
    >
    > In one table (called 'name'), I have the name of the car, with columns
    > linked to a maker table, engine, etc.
    >
    > In another table (called 'rank'), I have the sales ranking for all cars,
    > with columns 'name' (linked back to the 'name' table above), year and sales
    > rank. Finally, this table has data for the top 100 selling cars of each
    > year.
    >
    > I can easily pull the top 100 ranked cars out for a given year and do an
    > inside join to retrieve the relative name, maker, etc. The problem is that
    > it leaves out any names of cars that did not have a ranking for that year,
    > which is what I want to ultimately do (i.e. the 'top cars' for 2004 would be
    > ranked first through 100, then all other cars not ranked for 2004 would be
    > listed below this in the query results). A left join would work if the
    > ranking table contained only rankings for a given year, but as they contain
    > rankings for all years, in my left joined query I still need to specify a
    > where clause item that says "where year=2004", which of course then excludes
    > those cars that have no entry in the ranking table for 2004.
    >
    > Any ideas how I could solve this issue with displaying all cars, ordered by
    > rank where only a small percentage of the cars have a rank, and all the
    > ranks are in one shared table? Should I break the rank table into multiple
    > tables which only have entries by year (i.e. rank_2004, rank_2003)? I
    > believe a left join would work in this case, but I would need to add several
    > dozen tables, and this solution does not seem very elegant. Sub-queries are
    > not an option either, as I am using MYSQL 4.0.24, which does not support
    > them. And I would prefer to do this with a single query, if possible.
    >
    > I have been trying to solve this for a few days, and am stumped at this
    > point.
    >
    > Any suggestions are greatly appreciated!
    >
    > GS
    So, just so I understand;

    At the moment unranked cars will have a NULL rating - and therefore
    appear at the top of the list?

    strawberry Guest

  3. #3

    Default Re: Large Dataset Question


    strawberry wrote:
    > GS wrote:
    > > Hi all.
    > >
    > > I have a decent sized dataset (300,000 entries, give or take) that I am
    > > trying to use in a web page I am coding.
    > >
    > > In a nutshell: I have a website that displays information about various
    > > makes of cars, including the highest selling make for a given year.
    > >
    > > In one table (called 'name'), I have the name of the car, with columns
    > > linked to a maker table, engine, etc.
    > >
    > > In another table (called 'rank'), I have the sales ranking for all cars,
    > > with columns 'name' (linked back to the 'name' table above), year and sales
    > > rank. Finally, this table has data for the top 100 selling cars of each
    > > year.
    > >
    > > I can easily pull the top 100 ranked cars out for a given year and do an
    > > inside join to retrieve the relative name, maker, etc. The problem is that
    > > it leaves out any names of cars that did not have a ranking for that year,
    > > which is what I want to ultimately do (i.e. the 'top cars' for 2004 would be
    > > ranked first through 100, then all other cars not ranked for 2004 would be
    > > listed below this in the query results). A left join would work if the
    > > ranking table contained only rankings for a given year, but as they contain
    > > rankings for all years, in my left joined query I still need to specify a
    > > where clause item that says "where year=2004", which of course then excludes
    > > those cars that have no entry in the ranking table for 2004.
    > >
    > > Any ideas how I could solve this issue with displaying all cars, ordered by
    > > rank where only a small percentage of the cars have a rank, and all the
    > > ranks are in one shared table? Should I break the rank table into multiple
    > > tables which only have entries by year (i.e. rank_2004, rank_2003)? I
    > > believe a left join would work in this case, but I would need to add several
    > > dozen tables, and this solution does not seem very elegant. Sub-queries are
    > > not an option either, as I am using MYSQL 4.0.24, which does not support
    > > them. And I would prefer to do this with a single query, if possible.
    > >
    > > I have been trying to solve this for a few days, and am stumped at this
    > > point.
    > >
    > > Any suggestions are greatly appreciated!
    > >
    > > GS
    >
    > So, just so I understand;
    >
    > At the moment unranked cars will have a NULL rating - and therefore
    > appear at the top of the list?
    scrub my last post

    Not sure I understand what you're after - or the structure of the
    tables. That's quite a 'nutshell'!

    But, *if* i do understand it correctly then maybe something like this
    will work:

    SELECT n.name,r.year,r.rank,
    IF (r.rank IS NULL or r.rank='', 1, 0)
    AS isnull
    FROM rank r WHERE r.year = 2004
    LEFT JOIN name n USING name
    ORDER BY isnull`ASC,rank DESC;

    strawberry Guest

  4. #4

    Default Re: Large Dataset Question

    GS wrote:
    > A left join would work if the
    > ranking table contained only rankings for a given year, but as they contain
    > rankings for all years, in my left joined query I still need to specify a
    > where clause item that says "where year=2004", which of course then excludes
    > those cars that have no entry in the ranking table for 2004.
    Put the year condition into the ON clause in the LEFT JOIN. That will
    apply the condition to the "outer" table in the join before the join is
    made.

    SELECT ...
    FROM `name` n LEFT JOIN rank r
    ON n.name = r.name AND r.year = 2004;

    Regards,
    Bill K.
    Bill Karwin Guest

  5. #5

    Default Re: Large Dataset Question

    "Bill Karwin" <billkarwin.com> wrote in message
    news:ebtfd703edenews1.newsguy.com...
    > GS wrote:
    >> A left join would work if the ranking table contained only rankings for a
    >> given year, but as they contain rankings for all years, in my left joined
    >> query I still need to specify a where clause item that says "where
    >> year=2004", which of course then excludes those cars that have no entry
    >> in the ranking table for 2004.
    >
    > Put the year condition into the ON clause in the LEFT JOIN. That will
    > apply the condition to the "outer" table in the join before the join is
    > made.
    >
    > SELECT ...
    > FROM `name` n LEFT JOIN rank r
    > ON n.name = r.name AND r.year = 2004;
    >
    > Regards,
    > Bill K.
    Bingo! This looks like it will work! I will test it tonight and let you
    know.

    Thank you.


    GS Guest

  6. #6

    Default Re: Large Dataset Question

    "strawberry" <zac.carey> wrote in message
    news:1155678185.348652.153800m79g2000cwm.googlegr oups.com...
    >
    > strawberry wrote:
    >> GS wrote:
    >> > Hi all.
    >> >
    >> > I have a decent sized dataset (300,000 entries, give or take) that I am
    >> > trying to use in a web page I am coding.
    >> >
    >> > In a nutshell: I have a website that displays information about various
    >> > makes of cars, including the highest selling make for a given year.
    >> >
    >> > In one table (called 'name'), I have the name of the car, with columns
    >> > linked to a maker table, engine, etc.
    >> >
    >> > In another table (called 'rank'), I have the sales ranking for all
    >> > cars,
    >> > with columns 'name' (linked back to the 'name' table above), year and
    >> > sales
    >> > rank. Finally, this table has data for the top 100 selling cars of
    >> > each
    >> > year.
    >> >
    >> > I can easily pull the top 100 ranked cars out for a given year and do
    >> > an
    >> > inside join to retrieve the relative name, maker, etc. The problem is
    >> > that
    >> > it leaves out any names of cars that did not have a ranking for that
    >> > year,
    >> > which is what I want to ultimately do (i.e. the 'top cars' for 2004
    >> > would be
    >> > ranked first through 100, then all other cars not ranked for 2004 would
    >> > be
    >> > listed below this in the query results). A left join would work if the
    >> > ranking table contained only rankings for a given year, but as they
    >> > contain
    >> > rankings for all years, in my left joined query I still need to specify
    >> > a
    >> > where clause item that says "where year=2004", which of course then
    >> > excludes
    >> > those cars that have no entry in the ranking table for 2004.
    >> >
    >> > Any ideas how I could solve this issue with displaying all cars,
    >> > ordered by
    >> > rank where only a small percentage of the cars have a rank, and all the
    >> > ranks are in one shared table? Should I break the rank table into
    >> > multiple
    >> > tables which only have entries by year (i.e. rank_2004, rank_2003)? I
    >> > believe a left join would work in this case, but I would need to add
    >> > several
    >> > dozen tables, and this solution does not seem very elegant.
    >> > Sub-queries are
    >> > not an option either, as I am using MYSQL 4.0.24, which does not
    >> > support
    >> > them. And I would prefer to do this with a single query, if possible.
    >> >
    >> > I have been trying to solve this for a few days, and am stumped at this
    >> > point.
    >> >
    >> > Any suggestions are greatly appreciated!
    >> >
    >> > GS
    >>
    >> So, just so I understand;
    >>
    >> At the moment unranked cars will have a NULL rating - and therefore
    >> appear at the top of the list?
    >
    > scrub my last post
    >
    > Not sure I understand what you're after - or the structure of the
    > tables. That's quite a 'nutshell'!
    >
    > But, *if* i do understand it correctly then maybe something like this
    > will work:
    >
    > SELECT n.name,r.year,r.rank,
    > IF (r.rank IS NULL or r.rank='', 1, 0)
    > AS isnull
    > FROM rank r WHERE r.year = 2004
    > LEFT JOIN name n USING name
    > ORDER BY isnull`ASC,rank DESC;
    >
    Sorry for the confusion.... it is actually a more complicated problem that
    I am trying to distill down to something manageable.

    Let me give this a whirl and get back to you tonight. Thanks for the
    thought!


    GS Guest

  7. #7

    Default Re: Large Dataset Question

    >> SELECT n.name,r.year,r.rank,
    >> IF (r.rank IS NULL or r.rank='', 1, 0)
    >> AS isnull
    >> FROM rank r WHERE r.year = 2004
    >> LEFT JOIN name n USING name
    >> ORDER BY isnull`ASC,rank DESC;
    >>
    >
    > Sorry for the confusion.... it is actually a more complicated problem
    > that I am trying to distill down to something manageable.
    >
    > Let me give this a whirl and get back to you tonight. Thanks for the
    > thought!
    >
    Follow-up, it worked, thanks a million!


    GS Guest

  8. #8

    Default Re: Large Dataset Question

    "GS" <goldstandard> wrote in message
    news:vErEg.7339$%j7.2540newssvr29.news.prodigy.ne t...
    > "Bill Karwin" <billkarwin.com> wrote in message
    > news:ebtfd703edenews1.newsguy.com...
    >> GS wrote:
    >>> A left join would work if the ranking table contained only rankings for
    >>> a given year, but as they contain rankings for all years, in my left
    >>> joined
    >>> query I still need to specify a where clause item that says "where
    >>> year=2004", which of course then excludes those cars that have no entry
    >>> in the ranking table for 2004.
    >>
    >> Put the year condition into the ON clause in the LEFT JOIN. That will
    >> apply the condition to the "outer" table in the join before the join is
    >> made.
    >>
    >> SELECT ...
    >> FROM `name` n LEFT JOIN rank r
    >> ON n.name = r.name AND r.year = 2004;
    >>
    >> Regards,
    >> Bill K.
    >
    > Bingo! This looks like it will work! I will test it tonight and let you
    > know.
    >
    > Thank you.
    >
    Your response in combination with the info from 'strawberry' solved it
    perfectly.

    Thanks!


    GS Guest

Similar Threads

  1. DataSet Change Question
    By B.Prior in forum Macromedia Flash Data Integration
    Replies: 0
    Last Post: August 3rd, 11:04 PM
  2. Large dataset displayed but kills application
    By nkunkov@plusfunds.com in forum ASP.NET Data Grid Control
    Replies: 2
    Last Post: May 30th, 01:39 PM
  3. Typed DataSet Question
    By Calvin Willman in forum Dreamweaver AppDev
    Replies: 0
    Last Post: March 17th, 10:26 AM
  4. Replies: 2
    Last Post: June 1st, 03:43 AM
  5. dataset efficiency question
    By Trevor Hartman in forum ASP.NET General
    Replies: 0
    Last Post: July 3rd, 08:20 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139