Professional Web Applications Themes

Comparing rows in 2 tables - Oracle Server

Jake wrote: > Hi, I'm trying to merge data from 2 tables. One table exists locally, > and the other lives across the network on another box. Each table has ~1M > rows. I > want to update the remote table with the values from the local table > where the key is the same but the data is different. > > My example has been (greatly) simplified. the table is created on both > database like: > > create table TABLEA > ( > ID Number not null, > DATA Varchar2(200) not null, > constraint PK_TABLEA primary key (ID) ...

  1. #1

    Default Re: Comparing rows in 2 tables

    Jake wrote:
    > Hi, I'm trying to merge data from 2 tables. One table exists locally,
    > and the other lives across the network on another box. Each table has ~1M
    > rows. I
    > want to update the remote table with the values from the local table
    > where the key is the same but the data is different.
    >
    > My example has been (greatly) simplified. the table is created on both
    > database like:
    >
    > create table TABLEA
    > (
    > ID Number not null,
    > DATA Varchar2(200) not null,
    > constraint PK_TABLEA primary key (ID)
    > )
    > /
    >
    > when I do:
    > select count(*)
    > from tablea local,
    > tableadblink remote
    > where local.id = remote.id
    > and nvl(local.data,'1') != nvl(remote.data,'1')
    >
    > I get my result back pretty quickly, ~2 mins (with a count of 212
    > different rows).
    > But when I update using a correlated subquery it takes 30ish mins.
    >
    > update tableadblink remote
    > set (data) =
    > (select data from tablea local
    > where local.id = remote.id)
    > where exists
    > (select 1 from
    > tablea local2
    > where local2.id = remote.id
    > and nvl(local.data,'1') != nvl(remote.data,'1')
    > )
    > /
    >
    > I think there has got to be a way to speed it up. I want to drive the
    > update off the small set of different rows. I can do it programmatically
    > but I'd like a SQL solution.
    >
    > Thanks
    > Jay
    Why? Why not use the solution you have in hand that works?

    Daniel Morgan

    DA Morgan Guest

  2. #2

    Default Re: Comparing rows in 2 tables

    Thanks Sybrand,

    I don't know how to give the /*+ driving_site */ hint (or which site should
    be driving). But I think you want me to try something like:

    connect scott/tigerremotesite

    update tablea local
    set (data) =
    (select data from tableareverselink remote1
    where local.id = remote.id)
    where id in
    (select id
    from tablea local2,
    tableareverselink remote2
    where local2.id = remote2.id
    and nvl(local2.data,'1') != nvl(remote2.data,'1')
    )

    right?

    Thanks
    Jake

    "Sybrand Bakker" <postbussybrandb.demon.nl> wrote in message
    news:a20d28ee.0212170031.1de8d939posting.google.c om...
    > "Jake" <Imnottelling.com> wrote in message
    news:<atm3qm$t8a$1bob.news.rcn.net>...
    > > Hi, I'm trying to merge data from 2 tables. One table exists locally,
    > > and the other lives across the network on another box. Each table has
    ~1M
    > > rows. I
    > > want to update the remote table with the values from the local table
    > > where the key is the same but the data is different.
    > >
    > > My example has been (greatly) simplified. the table is created on both
    > > database like:
    > >
    > > create table TABLEA
    > > (
    > > ID Number not null,
    > > DATA Varchar2(200) not null,
    > > constraint PK_TABLEA primary key (ID)
    > > )
    > > /
    > >
    > > when I do:
    > > select count(*)
    > > from tablea local,
    > > tableadblink remote
    > > where local.id = remote.id
    > > and nvl(local.data,'1') != nvl(remote.data,'1')
    > >
    > > I get my result back pretty quickly, ~2 mins (with a count of 212
    > > different rows).
    > > But when I update using a correlated subquery it takes 30ish mins.
    > >
    > > update tableadblink remote
    > > set (data) =
    > > (select data from tablea local
    > > where local.id = remote.id)
    > > where exists
    > > (select 1 from
    > > tablea local2
    > > where local2.id = remote.id
    > > and nvl(local.data,'1') != nvl(remote.data,'1')
    > > )
    > > /
    > >
    > > I think there has got to be a way to speed it up. I want to drive the
    > > update off the small set of different rows. I can do it
    programmatically
    > > but I'd like a SQL solution.
    > >
    > > Thanks
    > > Jay
    >
    > I would try to reverse the process: ie update locally and select
    > remotely, so connect to the remote database and use a reverse link.
    > In that case you don't have a distributed transaction (so no need for
    > 2 phase commit) and you can use the /*+ driving_site */ hint in your
    > subqueries.
    > Given the nature of your statement in that case you would probably do
    > better to rewrite the exists subquery to an in subquery and make sure
    > that this is executed once.
    >
    > Hth
    >
    > Sybrand Bakker
    > Senior Oracle DBA

    Jake Guest

  3. #3

    Default Re: Comparing rows in 2 tables

    I think that solution is to create temporary table on remote serwer and then apply update.

    connect scott/tigerremotesite

    create temp_tableName * from my_tablemy_dblink;

    and then update at remotesite locally.

    sorry for english.
    avram

    Jake <Imnottelling.com> wrote in message news:atna7c$c2j$1bob.news.rcn.net...
    > Thanks Sybrand,
    >
    > I don't know how to give the /*+ driving_site */ hint (or which site should
    > be driving). But I think you want me to try something like:
    >
    > connect scott/tigerremotesite
    >
    > update tablea local
    > set (data) =
    > (select data from tableareverselink remote1
    > where local.id = remote.id)
    > where id in
    > (select id
    > from tablea local2,
    > tableareverselink remote2
    > where local2.id = remote2.id
    > and nvl(local2.data,'1') != nvl(remote2.data,'1')
    > )
    >
    > right?
    >
    > Thanks
    > Jake
    >
    > "Sybrand Bakker" <postbussybrandb.demon.nl> wrote in message
    > news:a20d28ee.0212170031.1de8d939posting.google.c om...
    > > "Jake" <Imnottelling.com> wrote in message
    > news:<atm3qm$t8a$1bob.news.rcn.net>...
    > > > Hi, I'm trying to merge data from 2 tables. One table exists locally,
    > > > and the other lives across the network on another box. Each table has
    > ~1M
    > > > rows. I
    > > > want to update the remote table with the values from the local table
    > > > where the key is the same but the data is different.
    > > >
    > > > My example has been (greatly) simplified. the table is created on both
    > > > database like:
    > > >
    > > > create table TABLEA
    > > > (
    > > > ID Number not null,
    > > > DATA Varchar2(200) not null,
    > > > constraint PK_TABLEA primary key (ID)
    > > > )
    > > > /
    > > >
    > > > when I do:
    > > > select count(*)
    > > > from tablea local,
    > > > tableadblink remote
    > > > where local.id = remote.id
    > > > and nvl(local.data,'1') != nvl(remote.data,'1')
    > > >
    > > > I get my result back pretty quickly, ~2 mins (with a count of 212
    > > > different rows).
    > > > But when I update using a correlated subquery it takes 30ish mins.
    > > >
    > > > update tableadblink remote
    > > > set (data) =
    > > > (select data from tablea local
    > > > where local.id = remote.id)
    > > > where exists
    > > > (select 1 from
    > > > tablea local2
    > > > where local2.id = remote.id
    > > > and nvl(local.data,'1') != nvl(remote.data,'1')
    > > > )
    > > > /
    > > >
    > > > I think there has got to be a way to speed it up. I want to drive the
    > > > update off the small set of different rows. I can do it
    > programmatically
    > > > but I'd like a SQL solution.
    > > >
    > > > Thanks
    > > > Jay
    > >
    > > I would try to reverse the process: ie update locally and select
    > > remotely, so connect to the remote database and use a reverse link.
    > > In that case you don't have a distributed transaction (so no need for
    > > 2 phase commit) and you can use the /*+ driving_site */ hint in your
    > > subqueries.
    > > Given the nature of your statement in that case you would probably do
    > > better to rewrite the exists subquery to an in subquery and make sure
    > > that this is executed once.
    > >
    > > Hth
    > >
    > > Sybrand Bakker
    > > Senior Oracle DBA
    >
    >

    avram Guest

  4. #4

    Default Re: Comparing rows in 2 tables

    "Jake" <Imnottelling.com> wrote in message
    news:atna7c$c2j$1bob.news.rcn.net...
    > Thanks Sybrand,
    >
    > I don't know how to give the /*+ driving_site */ hint (or which site
    should
    > be driving). But I think you want me to try something like:
    >
    > connect scott/tigerremotesite
    >
    > update tablea local
    > set (data) =
    > (select data from tableareverselink remote1
    > where local.id = remote.id)
    > where id in
    > (select id
    > from tablea local2,
    > tableareverselink remote2
    > where local2.id = remote2.id
    > and nvl(local2.data,'1') != nvl(remote2.data,'1')
    > )
    >
    > right?
    >
    > Thanks
    > Jake
    >
    > "Sybrand Bakker" <postbussybrandb.demon.nl> wrote in message
    > news:a20d28ee.0212170031.1de8d939posting.google.c om...
    > > "Jake" <Imnottelling.com> wrote in message
    > news:<atm3qm$t8a$1bob.news.rcn.net>...
    > > > Hi, I'm trying to merge data from 2 tables. One table exists locally,
    > > > and the other lives across the network on another box. Each table has
    > ~1M
    > > > rows. I
    > > > want to update the remote table with the values from the local table
    > > > where the key is the same but the data is different.
    > > >
    > > > My example has been (greatly) simplified. the table is created on
    both
    > > > database like:
    > > >
    > > > create table TABLEA
    > > > (
    > > > ID Number not null,
    > > > DATA Varchar2(200) not null,
    > > > constraint PK_TABLEA primary key (ID)
    > > > )
    > > > /
    > > >
    > > > when I do:
    > > > select count(*)
    > > > from tablea local,
    > > > tableadblink remote
    > > > where local.id = remote.id
    > > > and nvl(local.data,'1') != nvl(remote.data,'1')
    > > >
    > > > I get my result back pretty quickly, ~2 mins (with a count of 212
    > > > different rows).
    > > > But when I update using a correlated subquery it takes 30ish mins.
    > > >
    > > > update tableadblink remote
    > > > set (data) =
    > > > (select data from tablea local
    > > > where local.id = remote.id)
    > > > where exists
    > > > (select 1 from
    > > > tablea local2
    > > > where local2.id = remote.id
    > > > and nvl(local.data,'1') != nvl(remote.data,'1')
    > > > )
    > > > /
    > > >
    > > > I think there has got to be a way to speed it up. I want to drive the
    > > > update off the small set of different rows. I can do it
    > programmatically
    > > > but I'd like a SQL solution.
    > > >
    > > > Thanks
    > > > Jay
    > >
    > > I would try to reverse the process: ie update locally and select
    > > remotely, so connect to the remote database and use a reverse link.
    > > In that case you don't have a distributed transaction (so no need for
    > > 2 phase commit) and you can use the /*+ driving_site */ hint in your
    > > subqueries.
    > > Given the nature of your statement in that case you would probably do
    > > better to rewrite the exists subquery to an in subquery and make sure
    > > that this is executed once.
    > >
    > > Hth
    > >
    > > Sybrand Bakker
    > > Senior Oracle DBA
    >
    >
    Jake,

    I agree with Sybrand; I'd do it as a 'pull' rather than a 'push'.

    But two questions:

    a) Have you tried minus rather than in or exists?
    b) Did you consider Oracle tion?

    Regards,
    Paul



    Paul Brewer Guest

  5. #5

    Default Re: Comparing rows in 2 tables

    Sybrand was correct (though I didn't need the SQL hint). The query looks
    like:

    connect scott/tigerremote.world

    update tablea local
    set (data) =
    (select data from tableareversedblink remote
    where remote.id = local.id)
    where (id) in
    (select id from tableareversedblink remote
    where remote.id = local.id
    and nvl(local.data,'1') <> nvl(remote.data,'1')
    )

    Wall clock drops from 6 minutes programmatically to 20sec in SQL

    Thanks
    Jay

    "avram" <aa181078zodiac.mimuw.edu.pl> wrote in message
    news:atnik2$ed7$1news.tpi.pl...
    > I think that solution is to create temporary table on remote serwer and
    then apply update.
    >
    > connect scott/tigerremotesite
    >
    > create temp_tableName * from my_tablemy_dblink;
    >
    > and then update at remotesite locally.
    >
    > sorry for english.
    > avram
    >
    > Jake <Imnottelling.com> wrote in message
    news:atna7c$c2j$1bob.news.rcn.net...
    > > Thanks Sybrand,
    > >
    > > I don't know how to give the /*+ driving_site */ hint (or which site
    should
    > > be driving). But I think you want me to try something like:
    > >
    > > connect scott/tigerremotesite
    > >
    > > update tablea local
    > > set (data) =
    > > (select data from tableareverselink remote1
    > > where local.id = remote.id)
    > > where id in
    > > (select id
    > > from tablea local2,
    > > tableareverselink remote2
    > > where local2.id = remote2.id
    > > and nvl(local2.data,'1') != nvl(remote2.data,'1')
    > > )
    > >
    > > right?
    > >
    > > Thanks
    > > Jake
    > >
    > > "Sybrand Bakker" <postbussybrandb.demon.nl> wrote in message
    > > news:a20d28ee.0212170031.1de8d939posting.google.c om...
    > > > "Jake" <Imnottelling.com> wrote in message
    > > news:<atm3qm$t8a$1bob.news.rcn.net>...
    > > > > Hi, I'm trying to merge data from 2 tables. One table exists
    locally,
    > > > > and the other lives across the network on another box. Each table
    has
    > > ~1M
    > > > > rows. I
    > > > > want to update the remote table with the values from the local table
    > > > > where the key is the same but the data is different.
    > > > >
    > > > > My example has been (greatly) simplified. the table is created on
    both
    > > > > database like:
    > > > >
    > > > > create table TABLEA
    > > > > (
    > > > > ID Number not null,
    > > > > DATA Varchar2(200) not null,
    > > > > constraint PK_TABLEA primary key (ID)
    > > > > )
    > > > > /
    > > > >
    > > > > when I do:
    > > > > select count(*)
    > > > > from tablea local,
    > > > > tableadblink remote
    > > > > where local.id = remote.id
    > > > > and nvl(local.data,'1') != nvl(remote.data,'1')
    > > > >
    > > > > I get my result back pretty quickly, ~2 mins (with a count of 212
    > > > > different rows).
    > > > > But when I update using a correlated subquery it takes 30ish mins.
    > > > >
    > > > > update tableadblink remote
    > > > > set (data) =
    > > > > (select data from tablea local
    > > > > where local.id = remote.id)
    > > > > where exists
    > > > > (select 1 from
    > > > > tablea local2
    > > > > where local2.id = remote.id
    > > > > and nvl(local.data,'1') != nvl(remote.data,'1')
    > > > > )
    > > > > /
    > > > >
    > > > > I think there has got to be a way to speed it up. I want to drive
    the
    > > > > update off the small set of different rows. I can do it
    > > programmatically
    > > > > but I'd like a SQL solution.
    > > > >
    > > > > Thanks
    > > > > Jay
    > > >
    > > > I would try to reverse the process: ie update locally and select
    > > > remotely, so connect to the remote database and use a reverse link.
    > > > In that case you don't have a distributed transaction (so no need for
    > > > 2 phase commit) and you can use the /*+ driving_site */ hint in your
    > > > subqueries.
    > > > Given the nature of your statement in that case you would probably do
    > > > better to rewrite the exists subquery to an in subquery and make sure
    > > > that this is executed once.
    > > >
    > > > Hth
    > > >
    > > > Sybrand Bakker
    > > > Senior Oracle DBA
    > >
    > >
    >
    >

    Jake Guest

Similar Threads

  1. comparing 2 tables 1 database
    By adult.swim in forum Coldfusion Database Access
    Replies: 6
    Last Post: April 28th, 04:47 PM
  2. Replies: 3
    Last Post: October 10th, 03:16 PM
  3. Updating number of rows in tables
    By Scott Brown in forum PHP Development
    Replies: 1
    Last Post: September 14th, 05:05 AM
  4. Comparing two tables
    By mkarja in forum Microsoft SQL / MS SQL Server
    Replies: 2
    Last Post: July 11th, 12:42 PM
  5. comparing rows in a multidimensional array
    By perl user in forum PERL Miscellaneous
    Replies: 5
    Last Post: June 25th, 01:13 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139