Professional Web Applications Themes

data access in a udf table function - IBM DB2

Toralf Kirsten <tkirstenizbi.uni-leipzig.de> wrote: > Hi, > it is possible to refer to the data of an existing table in a udf table > function? I only saw examples which includes the data in a special > definition part of the function. I'm not really sure what's the sence of > it. Any ideas are welcome. Yes, you can access tables. For example: CREATE FUNCTION all_tables() RETURNS TABLE ( table_schema VARCHAR(128), table_name VARCHAR(128) ) LANGUAGE SQL DETERMINISTIC NO EXTERNAL ACTION READS SQL DATA RETURN SELECT tabschema, tabname FROM syscat.tables; SELECT * FROM TABLE ( all_tables() ) AS t; Or a ...

  1. #1

    Default Re: data access in a udf table function

    Toralf Kirsten <tkirstenizbi.uni-leipzig.de> wrote:
    > Hi,
    > it is possible to refer to the data of an existing table in a udf table
    > function? I only saw examples which includes the data in a special
    > definition part of the function. I'm not really sure what's the sence of
    > it. Any ideas are welcome.
    Yes, you can access tables. For example:

    CREATE FUNCTION all_tables()
    RETURNS TABLE ( table_schema VARCHAR(128), table_name VARCHAR(128) )
    LANGUAGE SQL
    DETERMINISTIC
    NO EXTERNAL ACTION
    READS SQL DATA
    RETURN SELECT tabschema, tabname
    FROM syscat.tables;

    SELECT *
    FROM TABLE ( all_tables() ) AS t;


    Or a bit more complex:

    -- get all the tables that refer directly or indirectly to the given table
    CREATE FUNCTION dep_tables(
    table_schema VARCHAR(128), table_name VARCHAR(128) )
    RETURNS TABLE ( table_schema VARCHAR(128), table_name VARCHAR(128) )
    LANGUAGE SQL
    DETERMINISTIC
    NO EXTERNAL ACTION
    READS SQL DATA
    RETURN WITH t(schema, name) AS
    ( VALUES ( table_schema, table_name )
    UNION ALL
    SELECT r.tabschema, r.tabname
    FROM syscat.references AS r, t
    WHERE ( r.reftabschema, r.reftabname ) =
    ( t.schema, t.name) )
    SELECT schema, name
    FROM t
    WHERE schema <> table_schema OR
    name <> table_name;

    CREATE TABLE s.a ( a INT NOT NULL PRIMARY KEY );
    CREATE TABLE s.b ( b INT NOT NULL PRIMARY KEY,
    FOREIGN KEY (b) REFERENCES s.a(a) );
    CREATE TABLE s.c ( c INT NOT NULL PRIMARY KEY,
    FOREIGN KEY (c) REFERENCES s.b(b) );

    SELECT *
    FROM TABLE ( dep_tables('S', 'A') ) AS t;

    TABLE_SCHEMA TABLE_NAME
    ------------------------ ----------------------------
    S B
    S C

    --
    Knut Stolze
    Information Integration
    IBM Germany / University of Jena
    Knut Stolze Guest

  2. #2

    Default Re: data access in a udf table function

    Hi Knut,
    thank you for your fast response.
    I'd like to use the scratchpad in order to store and manipulate the
    value from the last row.
    May be I should describe it in more detail. What I want to do is as follows.
    Given is the following table:
    --------------------
    | category | value |
    --------------------
    | A | 1 |
    | A | 4 |
    | A | 11 |
    | A | 12 |
    | B | 2 |
    ....

    Now, I want to group all values for each category by a given distance.
    For example, when I specify 5 as distance value I'd like to have the
    following result

    ----------------------------
    | category | value | group |
    ----------------------------
    | A | 1 | 1 |
    | A | 4 | 1 |
    | A | 11 | 2 |
    | A | 12 | 2 |
    | B | 2 | 1 |
    ....

    My be it is very easy and I don't see the forest for the trees ;-)
    What do you think?
    Tori


    Knut Stolze wrote:
    > Toralf Kirsten <tkirstenizbi.uni-leipzig.de> wrote:
    >
    >
    >>Hi,
    >>it is possible to refer to the data of an existing table in a udf table
    >>function? I only saw examples which includes the data in a special
    >>definition part of the function. I'm not really sure what's the sence of
    >>it. Any ideas are welcome.
    >
    >
    > Yes, you can access tables. For example:
    >
    > CREATE FUNCTION all_tables()
    > RETURNS TABLE ( table_schema VARCHAR(128), table_name VARCHAR(128) )
    > LANGUAGE SQL
    > DETERMINISTIC
    > NO EXTERNAL ACTION
    > READS SQL DATA
    > RETURN SELECT tabschema, tabname
    > FROM syscat.tables;
    >
    > SELECT *
    > FROM TABLE ( all_tables() ) AS t;
    >
    >
    > Or a bit more complex:
    >
    > -- get all the tables that refer directly or indirectly to the given table
    > CREATE FUNCTION dep_tables(
    > table_schema VARCHAR(128), table_name VARCHAR(128) )
    > RETURNS TABLE ( table_schema VARCHAR(128), table_name VARCHAR(128) )
    > LANGUAGE SQL
    > DETERMINISTIC
    > NO EXTERNAL ACTION
    > READS SQL DATA
    > RETURN WITH t(schema, name) AS
    > ( VALUES ( table_schema, table_name )
    > UNION ALL
    > SELECT r.tabschema, r.tabname
    > FROM syscat.references AS r, t
    > WHERE ( r.reftabschema, r.reftabname ) =
    > ( t.schema, t.name) )
    > SELECT schema, name
    > FROM t
    > WHERE schema <> table_schema OR
    > name <> table_name;
    >
    > CREATE TABLE s.a ( a INT NOT NULL PRIMARY KEY );
    > CREATE TABLE s.b ( b INT NOT NULL PRIMARY KEY,
    > FOREIGN KEY (b) REFERENCES s.a(a) );
    > CREATE TABLE s.c ( c INT NOT NULL PRIMARY KEY,
    > FOREIGN KEY (c) REFERENCES s.b(b) );
    >
    > SELECT *
    > FROM TABLE ( dep_tables('S', 'A') ) AS t;
    >
    > TABLE_SCHEMA TABLE_NAME
    > ------------------------ ----------------------------
    > S B
    > S C
    >
    Toralf Kirsten Guest

  3. #3

    Default Re: data access in a udf table function

    Toralf Kirsten <tkirstenizbi.uni-leipzig.de> wrote:
    > Hi Knut,
    > thank you for your fast response.
    > I'd like to use the scratchpad in order to store and manipulate the
    > value from the last row.
    That's not possible. Once a table function returns a row to the DB2 engine,
    the row cannot be modified subsequently. After all, the row could already
    have been further progressed or shipped to another node in an MPP system or
    whatever.
    > May be I should describe it in more detail. What I want to do is as
    > follows. Given is the following table:
    > --------------------
    > | category | value |
    > --------------------
    > | A | 1 |
    > | A | 4 |
    > | A | 11 |
    > | A | 12 |
    > | B | 2 |
    > ...
    >
    > Now, I want to group all values for each category by a given distance.
    > For example, when I specify 5 as distance value I'd like to have the
    > following result
    >
    > ----------------------------
    > | category | value | group |
    > ----------------------------
    > | A | 1 | 1 |
    > | A | 4 | 1 |
    > | A | 11 | 2 |
    > | A | 12 | 2 |
    > | B | 2 | 1 |
    > ...
    >
    > My be it is very easy and I don't see the forest for the trees ;-)
    > What do you think?
    I don't know how exactly you imagine the computation of the groups and
    categories takes place and how the numbers are assigned. So it's a bit
    difficult to answer your question.

    I see two options for you (there might be more I don't know yet):
    (1) use SQL to express your logic. Using a GROUP BY or the ROW_NUMBER()
    function could give you the desired group numbers.
    (2) use an external (C/C++/Java code) function which gets all the input,
    computes the results you want. The result is then stored in some (shared)
    memory area. The problem is that you need to know when the final input was
    received and the whole result is to be flushed and returned to DB2. Given
    that SQL is a set-oriented language and sets are, per definition, not
    ordered, you don't know what the last row would be.

    Your description sounds a bit like user-defined aggregates/column functions.
    Is that right?

    --
    Knut Stolze
    Information Integration
    IBM Germany / University of Jena
    Knut Stolze Guest

  4. #4

    Default Re: data access in a udf table function



    <snip>
    Given is the following table:
    >>--------------------
    >>| category | value |
    >>--------------------
    >>| A | 1 |
    >>| A | 4 |
    >>| A | 11 |
    >>| A | 12 |
    >>| B | 2 |
    >>...
    >>
    >>Now, I want to group all values for each category by a given distance.
    >>For example, when I specify 5 as distance value I'd like to have the
    >>following result
    >>
    >>----------------------------
    >>| category | value | group |
    >>----------------------------
    >>| A | 1 | 1 |
    >>| A | 4 | 1 |
    >>| A | 11 | 2 |
    >>| A | 12 | 2 |
    >>| B | 2 | 1 |
    >>...
    >>
    The actions to get this result set are:
    1) select category, value from x order by category, value
    2) for each category:
    first time: group = 1
    value_old = select min(value) from x group by category
    all the other fetches: if ((value_old + distance) <= value)
    then print(group++)
    else print(group)
    value_old = value
    > I see two options for you (there might be more I don't know yet):
    > (1) use SQL to express your logic. Using a GROUP BY or the ROW_NUMBER()
    > function could give you the desired group numbers.
    > (2) use an external (C/C++/Java code) function which gets all the input,
    > computes the results you want. The result is then stored in some (shared)
    > memory area. The problem is that you need to know when the final input was
    > received and the whole result is to be flushed and returned to DB2. Given
    > that SQL is a set-oriented language and sets are, per definition, not
    > ordered, you don't know what the last row would be.
    >
    > Your description sounds a bit like user-defined aggregates/column functions.
    > Is that right?
    That's right, therefore I find it very hard to crack the nut.
    I agree with you, may be there is only the way to write an external
    application which get the whole result set of the sql statement
    select category, value, min(value) over (partition by category) from x
    and compute the group numbers.
    By the way, what do you think promises the highest performance (the
    result set of the statement above contains at least 200,000 rows) :
    a) writing back the computed result set to a temporary table
    b) handling the result set in the application to join with other
    selected data / apply some filter conditions
    c) compute the result set in an external function / app which returns
    the whole result set to db2 (if this way is possible)

    In the first attempt we only want the rows which comes from the group
    with the highest number of rows per category (like a rank per category)
    and join these with other related data.
    Thanks, Tori

    Toralf Kirsten Guest

  5. #5

    Default Re: data access in a udf table function

    Thank you for your response. It works great.
    Please allow me a further stupid question: How can I apply this
    statement in a table function?
    I just get the error DB21034E, SQL0104N, SQLSTATE=42601
    I use the following table function definition which differ in the second
    select where I replaced the value 5 with the variable distance from the
    function parameter

    CREATE FUNCTION cat_group(distance INTEGER)
    RETURNS TABLE (category char(1), value INTEGER, group INTEGER)
    LANGUAGE SQL
    DETERMINISTIC
    NO EXTERNAL ACTION
    READS SQL DATA
    RETURN
    WITH t(category, value) AS(
    SELECT category, value from x
    )
    SELECT category,
    value,
    SUM(new_group) OVER(
    partition by category
    order by value range
    between unbounded preceding and current row) as group
    FROM
    (
    SELECT t.category
    , t.value
    , CASE WHEN
    MIN(value) OVER(partition by category order by value
    range between distance preceding and current row)
    = value
    THEN 1
    ELSE
    0
    END As New_group
    from
    t
    )
    AS SS
    ;

    Thank you for your help, Tori

    Paul Vernon wrote:
    > You can do this kind 'sequential' processing using the OLAP functions as well as more
    > the more standard SQL that Kunt used. Often it will be much quicker as well. Not
    > sure if the SQL is easier or more difficult to understand however....
    >
    > WITH t(category, value) AS( VALUES
    > ('A',1)
    > , ('A',4)
    > , ('A',5)
    > , ('A',11)
    > , ('A',12)
    > , ('A',18)
    > , ('B',2)
    > , ('B',3)
    > , ('B',13)
    > )
    > SELECT category
    > , value
    > , SUM(new_group) OVER(
    > partition by category
    > order by value range
    > between unbounded preceding and current row) as group
    > FROM
    > (
    > SELECT t.category
    > , t.value
    > , CASE WHEN
    > MIN(value) OVER(partition by category order by value
    > range between 5 preceding and current row)
    > = value
    > THEN 1
    > ELSE
    > 0
    > END As New_group
    > from
    > t
    > )
    > AS SS
    > ;
    >
    >
    > CATEGORY VALUE GROUP
    > -------- ----------- -----------
    > A 1 1
    > A 4 1
    > A 5 1
    > A 11 2
    > A 12 2
    > A 18 3
    > B 2 1
    > B 3 1
    > B 13 2
    >
    > 9 record(s) selected.
    >
    >
    > Regards
    > Paul Vernon
    > Business Intelligence, IBM Global Services
    >
    >
    Toralf Kirsten Guest

  6. #6

    Default Re: data access in a udf table function

    "Toralf Kirsten" <tkirstenizbi.uni-leipzig.de> wrote in message
    news:bg6o0s$9f7$1news.uni-leipzig.de...
    > Thank you for your response. It works great.
    > Please allow me a further stupid question: How can I apply this
    > statement in a table function?
    > I just get the error DB21034E, SQL0104N, SQLSTATE=42601
    > I use the following table function definition which differ in the second
    > select where I replaced the value 5 with the variable distance from the
    > function parameter
    Ah well the value in a RANGE clause needs to be a constant so my guess is that you
    won't be able to parameterise it in a SQL Function.

    Bit of a shame really. Sometimes one does need to use a variable RANGE (or ROWS).

    Regards
    Paul Vernon
    Business Intelligence, IBM Global Services


    Paul Vernon Guest

  7. #7

    Default Re: data access in a udf table function

    Toralf Kirsten <tkirstenizbi.uni-leipzig.de> wrote:
    > BTW, I use Chamberlin's Complete Guide to DB2 UDB which describes
    > designing functions very well. It includes also an example for creating
    > an aggregate function using the scratchpad.
    Well, there are a few restrictions with the approach described in the book.
    Have a look at the other thread "Column UDFs" where this is discussed.

    --
    Knut Stolze
    Information Integration
    IBM Germany / University of Jena
    Knut Stolze Guest

Similar Threads

  1. Replies: 0
    Last Post: June 1st, 03:15 AM
  2. Saving arabic data back into an Access table
    By gafoorgk in forum Macromedia Flash Data Integration
    Replies: 0
    Last Post: March 30th, 10:15 PM
  3. Replies: 4
    Last Post: October 27th, 03:01 PM
  4. Newbie Q: calling access data on onclick function
    By Jeremy in forum ASP Database
    Replies: 1
    Last Post: July 8th, 11:35 AM
  5. Converting data in a PDF doent into an Access table
    By L. T. Portella in forum Adobe Acrobat Windows
    Replies: 0
    Last Post: April 16th, 03:02 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139