Professional Web Applications Themes

query optimization - MySQL

i've got a huge table that contains part numbers with matching descriptions. it's a standard price master type of table. i have to use this file to simply tell if a repair has used OEM or NON-OEM parts. if a part used on a repair has a matching record in the price master, then that part is considered an OEM part. there are millions of repairs and millions of parts. i've opted to put a flag (IsOEM INT NULL DEFAULT 0) on the repair table. there is no manual entry of data, only imports into mysql from both repair xml ...

  1. #1

    Default query optimization

    i've got a huge table that contains part numbers with matching descriptions.
    it's a standard price master type of table. i have to use this file to
    simply tell if a repair has used OEM or NON-OEM parts. if a part used on a
    repair has a matching record in the price master, then that part is
    considered an OEM part.

    there are millions of repairs and millions of parts. i've opted to put a
    flag (IsOEM INT NULL DEFAULT 0) on the repair table. there is no manual
    entry of data, only imports into mysql from both repair xml files and
    price master files. i am trying to optimize a query that runs against all
    repairs to set the IsOEM flag when either a repair or price master file
    is imported.

    this query is one that i can write too, however it takes a long damn
    time to run...almost as bad as a cartesian. is there any advice you may
    have that could speed things up? here's my basic update query:

    UPDATE Repair_Order_Lines
    SET IsOEM = 0
    ;
    UPDATE Repair_Order_Lines
    SET IsOEM = 1
    WHERE PartNumber IN
    (
    SELECT DISTINCT
    PartNumber
    FROM priceMaster
    UNION
    SELECT DISTINCT
    PrecedentPartNumber
    FROM priceMaster
    )
    ;

    tia.

    btw, both PartNumber fields and the PrecedentPartNumber field are/is a
    VARCHAR(50).



    Steve Guest

  2. #2

    Default Re: query optimization

    On 2 May, 15:08, "Steve" <com> wrote: 

    Well you seem to have lots of scope to improve on this.
    1) If a repair used an OEM part then it will have used that OEM part
    always. So there is to need to run the query
    UPDATE Repair_Order_Lines
    SET IsOEM = 0
    ;
    which makes all previously located repairs with OEM parts seem as if
    they have not used OEM parts, only for them to be set back to having
    used OEM parts again.

    2) You are using a subselect with a union. Subselects are bad!
    Especially the way you are using them. I would change them to LEFT
    JOINS (with NOT NULL checks - I would use a COALESCE() with this too)

    3) To help support the LEFT JOINS, I would suggest indexes on the
    PartNumber and PrecedentPartNumber columns.

    4) Since you are no longer initialising all the IsOEM fields, you can
    build an index on this column and use a WHERE clause to efficiently
    process only those rows in Repair_Order_Lines where IsOEM = 0, since
    if IsOEM = 1 you don't need to try to set it to 1 again.

    In general it is a good idea to think carefully about what the data
    means and what you are intending to do about it. It's easy to build a
    query to give you a required end result, the real talent is in
    thinking about what is likely to have happened to the data and to
    construct an efficient query doing only the work that is actually
    required.

    Also regarding your last question it should be:
    "btw, both the fields PartNumber and PrecedentPartNumber are
    VARCHAR(50)."
    Or
    "btw, both the PartNumber and PrecedentPartNumber fields are
    VARCHAR(50)."

    Hope this helps.

    Captain Guest

  3. #3

    Default Re: query optimization

    thanks for the comments...read my in-line follow-up, please.

    | Well you seem to have lots of scope to improve on this.

    tis true...i've posted below a newer version.

    | 1) If a repair used an OEM part then it will have used that OEM part
    | always. So there is to need to run the query
    | UPDATE Repair_Order_Lines
    | SET IsOEM = 0
    | ;
    | which makes all previously located repairs with OEM parts seem as if
    | they have not used OEM parts, only for them to be set back to having
    | used OEM parts again.

    makes sense...that's my v-8 moment for the morning.

    | 2) You are using a subselect with a union. Subselects are bad!
    | Especially the way you are using them. I would change them to LEFT
    | JOINS (with NOT NULL checks - I would use a COALESCE() with this too)

    haven't used coalesce yet. the queries i write don't stray too far from
    ansi-sql so that if my boss wants to move to sql server, oracle, terdata,
    etc. then there is little work to do programmatically. anyway, here's my
    newer version:

    UPDATE Repair_Order_Lines l ,
    priceMaster p
    SET l.IsOEM = 1
    WHERE l.PartNumber = p.PartNumber
    OR l.PartNumber = p.PrecedentPartNumber

    i was trying to originally do this with joins but forgot the syntax...hence
    the above. really, if i'm just setting IsOEM to 1, i only want to do that as
    written in the where clause above (appropriately changing WHERE to JOIN)
    then i don't want a left join...a join would be perfect as fewer rows would
    be involved.

    | 3) To help support the LEFT JOINS, I would suggest indexes on the
    | PartNumber and PrecedentPartNumber columns.

    i'll put an index on the table.

    | 4) Since you are no longer initialising all the IsOEM fields, you can
    | build an index on this column and use a WHERE clause to efficiently
    | process only those rows in Repair_Order_Lines where IsOEM = 0, since
    | if IsOEM = 1 you don't need to try to set it to 1 again.

    true enough. i'll put that in the where clause as well. one other change
    that i've made is by incorporating a date/time stamp on the repair table.
    i'm only updating repairs that were just imported. that should help. that
    won't help when i import the price master though.

    anyway, yes, i'll IsOEM = 0.

    | In general it is a good idea to think carefully about what the data
    | means and what you are intending to do about it. It's easy to build a
    | query to give you a required end result, the real talent is in
    | thinking about what is likely to have happened to the data and to
    | construct an efficient query doing only the work that is actually
    | required.

    true enough. that's what i'm going for...just couldn't remember the join
    syntax on the update query - so the two methods posted are my time tests
    until i find a syntactially correct joined update query example. :)

    | Also regarding your last question it should be:
    | "btw, both the fields PartNumber and PrecedentPartNumber are
    | VARCHAR(50)."
    | Or
    | "btw, both the PartNumber and PrecedentPartNumber fields are
    | VARCHAR(50)."

    lol...yes, subject verb agreement. 'both' is the subject denoting a group,
    which requires plurality of being; 'are'. :) up till 3.30 with an early
    start of 6.00...i'll be lenient on myself for now.

    cheers...and thanks for the suggestions.


    Steve Guest

  4. #4

    Default Re: query optimization

    Steve wrote: 
    if you take the union away from your sql, i think it'll run a lot
    faster. you can create two stored procedures each with one table
    referenced in your union statement and then have them run right after
    your imports.

    --
    lark -- net
    To reply to me directly, delete "despam".
    lark Guest

  5. #5

    Default Re: query optimization

    | > UPDATE Repair_Order_Lines
    | > SET IsOEM = 0
    | > ;
    | > UPDATE Repair_Order_Lines
    | > SET IsOEM = 1
    | > WHERE PartNumber IN
    | > (
    | > SELECT DISTINCT
    | > PartNumber
    | > FROM priceMaster
    | > UNION
    | > SELECT DISTINCT
    | > PrecedentPartNumber
    | > FROM priceMaster
    | > )
    | > ;
    | >
    | > tia.
    | >
    | > btw, both PartNumber fields and the PrecedentPartNumber field are/is a
    | > VARCHAR(50).
    | >
    | >
    | >
    | if you take the union away from your sql, i think it'll run a lot
    | faster. you can create two stored procedures each with one table
    | referenced in your union statement and then have them run right after
    | your imports.

    thanks lark. i didn't want to use a sub-select but couldn't remember the syntax for what i wanted to do. i worked up this (which still runs like a dog even with indicies):

    UPDATE Repair_Order_Lines l ,
    priceMaster p
    SET l.IsOEM = 1
    WHERE l.IsOEM = 0
    AND
    (
    l.PartNumber = p.PartNumber
    OR l.PartNumber = p.PrecedentPartNumber
    )

    as for using stored procs., they are not easy to port and don't easily scale. if i use the query in-line and called in my app, i can maintain it and implement it in a far more flexible and transparent manner.

    i do appreciate the suggestion though.

    cheers.
    Steve Guest

  6. #6

    Default Re: query optimization

    Steve wrote: 

    would you run explain on this puppy and let the group see the results?

    --
    lark -- net
    To reply to me directly, delete "despam".
    lark Guest

  7. #7

    Default Re: query optimization


    "lark" <net> wrote in message news:bX2_h.2878$news.prodigy.net...
    | Steve wrote:
    | > 
    | > | > UPDATE Repair_Order_Lines
    | > | > SET IsOEM = 0
    | > | > ;
    | > | > UPDATE Repair_Order_Lines
    | > | > SET IsOEM = 1
    | > | > WHERE PartNumber IN
    | > | > (
    | > | > SELECT DISTINCT
    | > | > PartNumber
    | > | > FROM priceMaster
    | > | > UNION
    | > | > SELECT DISTINCT
    | > | > PrecedentPartNumber
    | > | > FROM priceMaster
    | > | > )
    | > | > ;
    | > | >
    | > | > tia.
    | > | >
    | > | > btw, both PartNumber fields and the PrecedentPartNumber field are/is a
    | > | > VARCHAR(50).
    | > | >
    | > | >
    | > | >
    | > | if you take the union away from your sql, i think it'll run a lot
    | > | faster. you can create two stored procedures each with one table
    | > | referenced in your union statement and then have them run right after
    | > | your imports.
    | >
    | > thanks lark. i didn't want to use a sub-select but couldn't remember the
    | > syntax for what i wanted to do. i worked up this (which still runs like
    | > a dog even with indicies):
    | >
    | > UPDATE Repair_Order_Lines l ,
    | > priceMaster p
    | > SET l.IsOEM = 1
    | > WHERE l.IsOEM = 0
    | > AND
    | > (
    | > l.PartNumber = p.PartNumber
    | > OR l.PartNumber = p.PrecedentPartNumber
    | > )
    | >
    | > as for using stored procs., they are not easy to port and don't easily
    | > scale. if i use the query in-line and called in my app, i can maintain
    | > it and implement it in a far more flexible and transparent manner.
    | >
    | > i do appreciate the suggestion though.
    | >
    | > cheers.
    |
    | would you run explain on this puppy and let the group see the results?

    here they are...nothing but errors. however executing it shows none.

    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'UPDATE Repair_Order_Lines l ,
    priceMaster p
    SET l.' at line 1
    Steve Guest

  8. #8

    Default Re: query optimization

    Steve wrote: 


    how about this:

    update repair_order_lines
    set isoem = 1
    where isoem =0
    and partnumer in (
    (select distinct partnumber from pricemaster) or
    (select distince precedentpartnumer from pricemaster)
    )

    run explain on this first and see what the results are. also to make
    this run optimally you'd need a separate index on partnumber and
    precedentpartnumber. don't need an index on isoem because its
    cardinality is only 2.

    --
    lark -- net
    To reply to me directly, delete "despam".
    lark Guest

  9. #9

    Default Re: query optimization

    this query:

    SELECT IsOEM
    FROM Repair_Order_Lines
    WHERE IsOEM = 0
    AND PartNumber IN
    (
    (
    SELECT DISTINCT
    PartNumber
    FROM priceMaster
    )
    OR
    (
    SELECT DISTINCT
    PrecedentPartNumber
    FROM priceMaster
    )
    )

    is explained this way:

    1, 'PRIMARY', 'Repair_Order_Lines', 'ALL', '', '', , '', 188267, 'Using
    where'

    3, 'SUBQUERY', 'priceMaster', 'index', '', 'priceMaster_IDX', 511, '',
    418728, 'Using index; Using temporary'

    2, 'SUBQUERY', 'priceMaster', 'index', '', 'priceMaster_IDX', 511, '',
    418728, 'Using index'


    i can't get it to explain it as an update statement...but a select should
    let you see what you wanted.

    thx,

    me


    Steve Guest

  10. #10

    Default Re: query optimization

    Steve wrote: 

    doesn't look good. no wonder it's taking a while. ok how about this:

    select isoem from repair_order_lines join pricemaster on
    repair_order_lines.partnumber = pricemaster.partnumber or
    repair_order_lines.partnumber = pricemaster.precedentpartnumber
    where repair_order_lines = 0

    post the results of explain. also did you add the two index fields (i
    see one but i don't see the other)?

    --
    lark -- net
    To reply to me directly, delete "despam".
    lark Guest

  11. #11

    Default Re: query optimization

    lark wrote: 
    >
    > doesn't look good. no wonder it's taking a while. ok how about this:
    >
    > select isoem from repair_order_lines join pricemaster on
    > repair_order_lines.partnumber = pricemaster.partnumber or
    > repair_order_lines.partnumber = pricemaster.precedentpartnumber
    > where repair_order_lines = 0
    >
    > post the results of explain. also did you add the two index fields (i
    > see one but i don't see the other)?[/ref]

    I thought we'd already solved this one with a JOIN and a couple of other
    things


    Paul Guest

Similar Threads

  1. need help with query optimization
    By Bosconian in forum MySQL
    Replies: 8
    Last Post: October 10th, 10:30 PM
  2. Assistance with Query Optimization
    By GS in forum MySQL
    Replies: 4
    Last Post: September 13th, 09:27 PM
  3. MySQL Query Optimization
    By Bacci in forum PHP Development
    Replies: 4
    Last Post: February 7th, 05:57 PM
  4. Query Optimization ... Is this all I can get??
    By Anthony in forum Informix
    Replies: 7
    Last Post: January 16th, 07:51 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139