Professional Web Applications Themes

simple SQL select optimisation problem for sql wizzard ;-) - MySQL

Hello everybody, here is a simple SQL problem. Imagine a table which describes classes of students. -- -- Structure of the table `students` -- CREATE TABLE `students` ( `IdStudent` int(10) unsigned NOT NULL auto_increment, `IdClass` int(10) unsigned NOT NULL default '0', `FirstName` char(30) NOT NULL default '', PRIMARY KEY (`IdStudent`) ) ENGINE=MyISAM DEFAULT CHT=latin1 AUTO_INCREMENT=11 ; -- -- Content of the table `students` -- INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (1, 1, 'Pierre'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (2, 1, 'Paul'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (3, 1, 'Jacques'); INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) ...

  1. #1

    Default simple SQL select optimisation problem for sql wizzard ;-)

    Hello everybody,

    here is a simple SQL problem.

    Imagine a table which describes classes of students.

    --
    -- Structure of the table `students`
    --

    CREATE TABLE `students` (
    `IdStudent` int(10) unsigned NOT NULL auto_increment,
    `IdClass` int(10) unsigned NOT NULL default '0',
    `FirstName` char(30) NOT NULL default '',
    PRIMARY KEY (`IdStudent`)
    ) ENGINE=MyISAM DEFAULT CHT=latin1 AUTO_INCREMENT=11 ;

    --
    -- Content of the table `students`
    --

    INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (1, 1,
    'Pierre');
    INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (2, 1,
    'Paul');
    INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (3, 1,
    'Jacques');
    INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (4, 2,
    'Yann');
    INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (5, 3,
    'Damien');
    INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (6, 3,
    'Sam');
    INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (7, 3,
    'George');
    INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (8, 3,
    'Robert');
    INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (9, 4,
    'Michel');
    INSERT INTO `students` (`IdStudent`, `IdClass`, `FirstName`) VALUES (10, 4,
    'Toto');

    The goal is to get by the simplest and fastest way a maximum of N distinct
    students of each class.

    For N=1, a simple solution is : SELECT * FROM students GROUP BY IdClass

    IdStudent IdClass FirstName
    1 1 Pierre
    4 2 Yann
    5 3 Damien
    9 4 Michel


    For N=2 with the same data, it would give :

    IdStudent IdClass FirstName
    1 1 Pierre
    2 1 Paul
    4 2 Yann
    5 3 Damien
    6 3 Sam
    9 4 Michel
    10 4 Toto


    For N=3, we would get:

    IdStudent IdClass FirstName
    1 1 Pierre
    2 1 Paul
    3 1 Jacques
    4 2 Yann
    5 3 Damien
    6 3 Sam
    7 3 George
    9 4 Michel
    10 4 Toto


    Question :

    What are the best solutions for N between 1 and 10?

    I would like to avoid to do:

    SELECT DISTINCT IdClass FROM student ;
    For each IdClass "id" found do the following query:
    SELECT * FROM student WHERE IdClass="id" LIMIT N ;

    Thank you in advance for your help.

    Paul.


    paul Guest

  2. #2

    Default Re: simple SQL select optimisation problem for sql wizzard ;-)

    "paul" <zerawarezfree.fr> wrote in message
    news:440c73bd$0$27055$626a54cenews.free.fr...
    > The goal is to get by the simplest and fastest way a maximum of N distinct
    > students of each class.
    Note that using a single SQL query to generate the result set is not always
    the best or fastest solution to every problem. Sometimes it's better to use
    a procedural language to issue several queries to generate the full result
    set. It's certainly more straightforward to most programmers, thus it is
    easier to find someone who can implement and maintain the software.

    The "Top-N per group" problem is tricky. Oracle has a pseudocolumn called
    ROWNUM (which is offensive to relational purists) that can be used to
    achieve this. MS SQL Server has its "SELECT TOP n ..." proprietary
    extension to the SQL language.

    MySQL has neither of these extensions, so there's a classic
    correlated-subquery method:

    SELECT s.*
    FROM students AS s
    WHERE 2 >= (SELECT COUNT(DISTINCT IdStudent) FROM students AS s2 WHERE
    s.IdClass = s2.IdClass AND s2.IdStudent >= s.IdStudent)

    Note that you didn't state whether you wanted any specific top 2 students
    per class, so I'm assuming it's based on IdStudent.

    Regards,
    Bill K.


    Bill Karwin Guest

  3. #3

    Default Re: simple SQL select optimisation problem for sql wizzard ;-)


    "Bill Karwin" <billkarwin.com> wrote in message
    news:dui51h0s2denews3.newsguy.com...
    > "paul" <zerawarezfree.fr> wrote in message
    > news:440c73bd$0$27055$626a54cenews.free.fr...
    > > The goal is to get by the simplest and fastest way a maximum of N
    distinct
    > > students of each class.
    >
    > Note that using a single SQL query to generate the result set is not
    always
    > the best or fastest solution to every problem. Sometimes it's better to
    use
    > a procedural language to issue several queries to generate the full result
    > set. It's certainly more straightforward to most programmers, thus it is
    > easier to find someone who can implement and maintain the software.

    You are right, but in my case, I want to avoid to do "Number Of Class"+1
    queries to get my result set.

    SELECT DISTINCT IdClass FROM student ;
    For each IdClass "id" found do the following query:
    SELECT * FROM student WHERE IdClass="id" LIMIT N ;


    > The "Top-N per group" problem is tricky. Oracle has a pseudocolumn called
    > ROWNUM (which is offensive to relational purists) that can be used to
    > achieve this. MS SQL Server has its "SELECT TOP n ..." proprietary
    > extension to the SQL language.
    You are right (again ;), I did not know this ROWNUM functionnality. Thank
    you.
    There is a work around with user variables in SQL to simultate Rownum.

    SET a=0;
    SELECT a:=a+1 as rownum, s.* from students;

    But my need is a little more specific. I would need an array of user
    variables, but it does not seem possible to have this in MySQL.
    So I would be able to write something like:

    SELECT s.IdStudent, s.IdClass, s.FirstName, myArray[ s.IdClass ]:=
    myArray[ s.IdClass ] + 1
    FROM student s
    WHERE myArray[ s.IdClass ] < N

    Do you see a way to write this?

    > MySQL has neither of these extensions, so there's a classic
    > correlated-subquery method:
    >
    > SELECT s.*
    > FROM students AS s
    > WHERE 2 >= (SELECT COUNT(DISTINCT IdStudent) FROM students AS s2 WHERE
    > s.IdClass = s2.IdClass AND s2.IdStudent >= s.IdStudent)
    >
    > Note that you didn't state whether you wanted any specific top 2 students
    > per class, so I'm assuming it's based on IdStudent.
    Yes, congratulation, It works! You are a wizzard! :) I did not know such
    conditions in subqueries were possible.
    Indeed, I don't mind which students are returned in each class.

    But, in terms of efficiency, the subselect of your query seems to be
    evaluated for each row in the table, isn't it?
    So I fear this solution is even worse in terms of speed and load than the
    one I wanted to avoid...

    SELECT DISTINCT IdClass FROM student ;
    For each IdClass "id" found do the following query:
    SELECT * FROM student WHERE IdClass="id" LIMIT N ;

    What do you think?


    Thanks a lot for your help!
    Best regards,

    paul.


    paul Guest

  4. #4

    Default Re: simple SQL select optimisation problem for sql wizzard ;-)

    "paul" <zerawarezfree.fr> wrote in message
    news:440d9a02$0$12856$626a54cenews.free.fr...
    >> SELECT s.*
    >> FROM students AS s
    >> WHERE 2 >= (SELECT COUNT(DISTINCT IdStudent) FROM students AS s2 WHERE
    >> s.IdClass = s2.IdClass AND s2.IdStudent >= s.IdStudent)
    >
    > Yes, congratulation, It works! You are a wizzard! :) I did not know such
    > conditions in subqueries were possible.
    Yep! That's called a correlated subquery.
    > But, in terms of efficiency, the subselect of your query seems to be
    > evaluated for each row in the table, isn't it?
    Right. A non-correlated subquery can be factored out the subquery, executed
    once, and its results used repeatedly, for the rows of the outer query. But
    a correlated subquery must be evaluated for each matching row in the outer
    query. Make sure the relevant columns (IdClass and IdStudent) are indexed
    and there's a good chance of being pretty efficient.
    > So I fear this solution is even worse in terms of speed and load than the
    > one I wanted to avoid...
    Not necessarily; try both methods, do some timing measurements, and see
    which one works best.

    And I agree with Peter Coffin's point. Don't get too hung up on optimizing
    every query. Don't spend days working on a query that will be run
    infrequently and with no strong requirement for speed.

    Regards,
    Bill K.


    Bill Karwin Guest

  5. #5

    Default Re: simple SQL select optimisation problem for sql wizzard ;-)

    > Why? I mean, sure, if you're going to be running this job several times
    > per minute, then some attention may be necessary, but it looks like
    > you're running a class list. Who cares if it takes 30 second and 200
    > queries instead of four seconds and one query, when the job's only going
    > to run a couple of times per year?
    The problem I am exposing is conceptually the same as the one I am facing in
    my developpements. I reassure you, it has nothing to do with students or
    classes... It was just a way to make the problem look easier ;)
    Thanks.

    paul.


    paul Guest

  6. #6

    Default Re: simple SQL select optimisation problem for sql wizzard ;-)


    "Bill Karwin" <billkarwin.com> wrote in message
    news:dukimd0id0enews1.newsguy.com...
    > And I agree with Peter Coffin's point. Don't get too hung up on
    optimizing
    > every query. Don't spend days working on a query that will be run
    > infrequently and with no strong requirement for speed.
    Unfortunately, I do have strong requirement for speed. And this query might
    be triggered often ;)
    Do you have an idea of work around to implement a dynamic array of MySQL
    user variables as described here below?

    There is a work around with user variables in MySQL to simultate Rownum.

    SET a=0;
    SELECT a:=a+1 as rownum, s.* from students;

    But my need is a little more specific. I would need an array of user
    variables, but it does not seem possible to have this in MySQL.
    So I would be able to write something like:

    SELECT s.IdStudent, s.IdClass, s.FirstName, myArray[ s.IdClass ]:=
    myArray[ s.IdClass ] + 1
    FROM student s
    WHERE myArray[ s.IdClass ] < N

    Do you see a way to write this?

    Thanks a lot for your kind help.
    Regards,

    paul.


    paul Guest

  7. #7

    Default Re: simple SQL select optimisation problem for sql wizzard ;-)

    "paul" <zerawarezfree.fr> wrote in message
    news:440eb90b$0$11675$626a54cenews.free.fr...
    > There is a work around with user variables in MySQL to simultate Rownum.
    > Do you have an idea of work around to implement a dynamic array of
    > MySQL user variables ...?
    There's often a way to use pre-initialized data to achieve performance.

    You could create a new column in the table, and make sure it contains the
    ordinal number of the student in his/her class; this is the equivalent of a
    rownum per group.

    ALTER TABLE students ADD COLUMN ordinal INTEGER UNSIGNED NOT NULL;
    SET a = 0; UPDATE students SET ordinal = a := a + 1 WHERE IdClass = 1;
    SET a = 0; UPDATE students SET ordinal = a := a + 1 WHERE IdClass = 2;
    SET a = 0; UPDATE students SET ordinal = a := a + 1 WHERE IdClass = 3;
    SET a = 0; UPDATE students SET ordinal = a := a + 1 WHERE IdClass = 4;
    ... repeat for each class to initialize the ordinals ...

    Now you can get the top N per group very easily:

    SELECT * FROM students WHERE ordinal <= 2;

    Update this sequence of numbers anytime you insert or delete records from a
    group.
    For instance, after adding or removing a student in class 3, re-initialize
    the ordinals for that class:

    SET a = 0; UPDATE students SET ordinal = a := a + 1 WHERE IdClass = 3;

    Regards,
    Bill K.


    Bill Karwin Guest

  8. #8

    Default Re: simple SQL select optimisation problem for sql wizzard ;-)


    "Bill Karwin" <billkarwin.com> wrote in message
    news:dung3k01m9genews4.newsguy.com...
    > There's often a way to use pre-initialized data to achieve performance.
    > [...]
    Your solution is great. It fits well to my needs. Thanks a lot Bill.
    You are very good at finding solutions! Congratulation!
    Best regards,

    paul.



    paul Guest

Similar Threads

  1. Combing two simple select statements. There must be a way?
    By ukr_bend@yahoo.com in forum MySQL
    Replies: 5
    Last Post: March 24th, 06:10 PM
  2. DateTime search optimisation
    By Salagir in forum MySQL
    Replies: 2
    Last Post: February 14th, 08:31 AM
  3. functions - optimisation
    By hairybobby webforumsuser@macromedia.com in forum Macromedia Director Lingo
    Replies: 3
    Last Post: July 21st, 09:28 AM
  4. A simple problem with a really simple form
    By Simon Harvey in forum Macromedia Dreamweaver
    Replies: 0
    Last Post: July 13th, 04:34 PM
  5. Replies: 0
    Last Post: July 13th, 11:41 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139