Lee Lindley

I'm Just Another Perl Hacker who wound up in a big Oracle Database playground

Cost of UDT Object Methods in SQL

02 Apr 2022 » oracle, sql, plsql

Introduction

In a prior post, CSV Clob and Private Temporary Table, I described how one could use some new tools I created to parse a CSV clob into records and fields, then use that parsed data in a SQL statement, perhaps including DML.

The article was about a work in progress and the code has changed, but it will give you an idea of how we got here if you are interested.

I ran into performance issues for large datasets while testing it. This article is about the techniques I tried for tuning it, and a fact about Oracle Object Type methods that I had not thought about much before.

Note: you will see what may be unfamiliar methods of the perlish_util_udt Object type in this code. They work like Perl operators on lists.
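In case they are new to you, here is a throwaway sketch of the ones that show up in this post. The ARR_VARCHAR2_UDT collection constructor and the behavior of get, map, and join are my reading of how the repository code uses them; check the plsql_utilities README for the real details.

DECLARE
    -- construct from a collection of strings, then treat it like a Perl list
    v_p perlish_util_udt := perlish_util_udt(ARR_VARCHAR2_UDT('a','b','c'));
BEGIN
    -- get(i) returns element i, like $list[$i-1] in Perl
    DBMS_OUTPUT.put_line(v_p.get(2));                   -- b
    -- map('...$_...') transforms every element (Perl map); join concatenates (Perl join)
    DBMS_OUTPUT.put_line(v_p.map('"$_"').join(', '));   -- "a", "b", "c"
END;
/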

SQL Engine and PL/SQL Method Calls

Here is an example query that is built as dynamic SQL in a PL/SQL procedure. It returns a resultset built from reading CSV data provided to the program as a CLOB.

You can find the packages at https://github.com/lee-lindley/plsql_utilities. The procedure I’m showing examples from is app_csv_pkg.create_ptt_csv.

WITH a AS (
    SELECT perlish_util_udt(
            app_csv_pkg.split_csv(t.s, p_separator => :p_separator, p_strip_dquote => :p_strip_dquote, p_keep_nulls => 'Y')
        ) AS p
    FROM TABLE( app_csv_pkg.split_clob_to_lines(:p_clob, p_skip_lines => 1) ) t
) SELECT 
    x.p.get(1) AS "Employee ID"
    ,x.p.get(2) AS "Last Name"
    ,x.p.get(3) AS "First Name"
    ,x.p.get(4) AS "nickname"
FROM a x

The pipelined table function split_clob_to_lines is efficient. What turned out not to be efficient is the call to the perlish_util_udt object constructor and/or the call to the function split_csv in the SQL SELECT list. Both are efficient when called inside a PL/SQL program.

This should not have surprised me. I know that the SQL engine must do a context switch for each call to a PL/SQL program unless it is cached. In this case the cost was much, much larger than I expected.

For a 10K row CLOB of CSV data (20 fields per row), the above construct ran for 20 minutes on an AIX-based database, and 6 minutes on my Intel NUC running in a Hyper-V Linux partition. It is difficult to tell what it is doing from the WAIT events, but I suspected in-memory operations and context switching.

Removing the Object

I also suspected the object method get calls might be doing context switches. My first refinement attempt was to remove the object from the picture and use a WITH FUNCTION to snag the field values from the array returned by split_csv.

WITH FUNCTION wget(
    p_arr   ARR_VARCHAR2_UDT
    ,p_i    NUMBER
) RETURN VARCHAR2
AS
BEGIN
    RETURN p_arr(p_i);
END;
a AS (
    SELECT app_csv_pkg.split_csv(t.s
                    , p_separator => :p_separator, p_strip_dquote => :p_strip_dquote, p_keep_nulls => 'Y'
            ) AS arr
    FROM TABLE( app_csv_pkg.split_clob_to_lines(:p_clob, p_skip_lines => 1) ) t
) SELECT 
     wget(a.arr, 1) AS "Employee ID"
    ,wget(a.arr, 2) AS "Last Name"
    ,wget(a.arr, 3) AS "First Name"
    ,wget(a.arr, 4) AS "nickname"
FROM a

This also ran for 6 minutes on my Intel NUC database. We haven't ruled out that the object plays a part, but we know that the call to split_csv is a problem.

If you are thinking about using the UDF pragma on that puppy (split_csv), I could not get any of the variations I tried to show an improvement. There are plenty of published examples of the limitations of that pragma (or of a WITH FUNCTION) when the function does anything very complicated.
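For reference, applying the pragma itself is trivial; here is a generic sketch (not the actual split_csv source) showing where it goes. It tells the compiler the function will mostly be called from SQL so it can cheapen the call, though as noted above it did not help here.

CREATE OR REPLACE FUNCTION my_upper(p_s VARCHAR2) RETURN VARCHAR2
IS
    PRAGMA UDF; -- compile for calls from SQL rather than from PL/SQL
BEGIN
    RETURN UPPER(p_s); -- stand-in body; split_csv does far more work than this
END my_upper;
/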

Make the Function Pipelined in a Chain

The next refactoring I tried was moving the call to split_csv into another Pipelined Table Function to be called in a chain with split_clob_to_lines. I called it split_lines_to_fields (catcalls from the peanut gallery may be deserved). Maybe I’ll come up with a better name before I merge this code.

From the package header:

    -- one parsed CSV line: the array of field values plus the line number from split_clob_to_lines
    TYPE t_csv_fields_rec IS RECORD(
        arr ARR_VARCHAR2_UDT
        ,rn NUMBER
    );
    TYPE t_arr_csv_fields_rec IS TABLE OF t_csv_fields_rec;

    FUNCTION split_lines_to_fields(
        p_curs          t_curs_csv_row_rec
        ,p_separator    VARCHAR2    DEFAULT ','
        ,p_strip_dquote VARCHAR2    DEFAULT 'Y' -- also unquotes \" and "" pairs within the field to just "
        ,p_keep_nulls   VARCHAR2    DEFAULT 'Y'
    ) 
    RETURN t_arr_csv_fields_rec
    PIPELINED
    ;

and from the package body:

    FUNCTION split_lines_to_fields(
        p_curs          t_curs_csv_row_rec
        ,p_separator    VARCHAR2    DEFAULT ','
        ,p_strip_dquote VARCHAR2    DEFAULT 'Y' -- also unquotes \" and "" pairs within the field to just "
        ,p_keep_nulls   VARCHAR2    DEFAULT 'Y'
    ) 
    RETURN t_arr_csv_fields_rec
    PIPELINED
    IS
        v_row       t_csv_fields_rec;
        v_in_row    t_csv_row_rec;
    BEGIN
        LOOP
            FETCH p_curs INTO v_in_row;
            EXIT WHEN p_curs%NOTFOUND;
            v_row.rn := v_in_row.rn;
            v_row.arr := split_csv(v_in_row.s
                                    , p_separator       => p_separator
                                    , p_strip_dquote    => p_strip_dquote
                                    , p_keep_nulls      => p_keep_nulls
            );
            PIPE ROW(v_row);
        END LOOP;
        RETURN;
    END split_lines_to_fields
    ;

The documentation never shows a BULK COLLECT in the chained-pipeline examples, and the descriptions it uses and other hints make me believe it is not needed. I wish it were more explicit.
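For what it is worth, here is an untested sketch of what a bulk collecting version of split_lines_to_fields might look like. The local collection type t_arr_csv_row_rec is my own invention for the sketch, not something in the repository.

    FUNCTION split_lines_to_fields(
        p_curs          t_curs_csv_row_rec
        ,p_separator    VARCHAR2    DEFAULT ','
        ,p_strip_dquote VARCHAR2    DEFAULT 'Y'
        ,p_keep_nulls   VARCHAR2    DEFAULT 'Y'
    ) 
    RETURN t_arr_csv_fields_rec
    PIPELINED
    IS
        -- hypothetical local collection type to hold a batch of fetched lines
        TYPE t_arr_csv_row_rec IS TABLE OF t_csv_row_rec;
        v_in_rows   t_arr_csv_row_rec;
        v_row       t_csv_fields_rec;
    BEGIN
        LOOP
            FETCH p_curs BULK COLLECT INTO v_in_rows LIMIT 100;
            EXIT WHEN v_in_rows.COUNT = 0;
            FOR i IN 1..v_in_rows.COUNT
            LOOP
                v_row.rn := v_in_rows(i).rn;
                v_row.arr := split_csv(v_in_rows(i).s
                                        , p_separator       => p_separator
                                        , p_strip_dquote    => p_strip_dquote
                                        , p_keep_nulls      => p_keep_nulls
                );
                PIPE ROW(v_row);
            END LOOP;
        END LOOP;
        RETURN;
    END split_lines_to_fields;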

As recommended in the documentation, split_lines_to_fields is called in a chain of pipelined table functions directly in SQL with CURSOR casts as follows:

WITH FUNCTION wget(
    p_arr   ARR_VARCHAR2_UDT
    ,p_i    NUMBER
) RETURN VARCHAR2
AS
BEGIN
    RETURN p_arr(p_i);
END;
a AS (
    SELECT t.arr 
    FROM TABLE(
                app_csv_pkg.split_lines_to_fields(
                    CURSOR(SELECT * 
                           FROM TABLE( app_csv_pkg.split_clob_to_lines(:p_clob, p_skip_lines => 1) )
                    )
                    , p_separator => :p_separator, p_strip_dquote => :p_strip_dquote, p_keep_nulls => 'Y'
                )
    ) t
) SELECT 
     wget(a.arr, 1) AS "Employee ID"
    ,wget(a.arr, 2) AS "Last Name"
    ,wget(a.arr, 3) AS "First Name"
    ,wget(a.arr, 4) AS "nickname"
FROM a

This ran in 23 seconds. Moving the function call into a pipelined table chain solves the problem, but I still want to know whether or how much the Object plays a part.

Call the Object Constructor in SQL

I could have moved the object constructor into the pipeline chain (and did try that), but it turns out not to be necessary.

WITH a AS (
    SELECT perlish_util_udt(t.arr) AS p
    FROM TABLE(
                app_csv_pkg.split_lines_to_fields(
                    CURSOR(SELECT * 
                           FROM TABLE( app_csv_pkg.split_clob_to_lines(:p_clob, p_skip_lines => 1) )
                    )
                    , p_separator => :p_separator, p_strip_dquote => :p_strip_dquote, p_keep_nulls => 'Y'
                )
    ) t
) SELECT 
     X.p.get(1) AS "Employee ID"
    ,X.p.get(2) AS "Last Name"
    ,X.p.get(3) AS "First Name"
    ,X.p.get(4) AS "nickname"
FROM a X

This also ran in 23 seconds. I prefer the object way to the WITH FUNCTION way, but if you don't already have an object wrapped around your array with a handy method, the WITH FUNCTION is just dandy.

Conclusion

Calls to a PL/SQL function in a SELECT list can lead to significant cost via context switching. This is well known, and there are several caching strategies (DETERMINISTIC, RESULT_CACHE, Scalar Subquery) to mitigate it. Using a Pipelined Table Function chain as I did here is also a viable mitigation strategy, and I believe the appropriate one for this use case.
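In case those strategies are unfamiliar, here is a minimal sketch of two of them with made up function and table names; nothing here is from the CSV package.

-- DETERMINISTIC lets Oracle cache the result for repeated identical arguments
CREATE OR REPLACE FUNCTION slow_lookup(p_key VARCHAR2) RETURN VARCHAR2
    DETERMINISTIC
IS
BEGIN
    RETURN UPPER(p_key); -- stand-in for something expensive
END slow_lookup;
/
-- Scalar subquery caching: wrapping the call in (SELECT ... FROM dual) lets
-- the SQL engine cache one result per distinct input value during the query
SELECT (SELECT slow_lookup(t.some_col) FROM dual) AS looked_up
FROM some_table t;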

My concern that object construction and simple method function calls in the SQL engine could cause context switches seems to be unfounded. I re-read the Object-Relational Developer's Guide and searched Google several ways looking for more information about how object methods are implemented, but it is not clearly stated. They seem to be neither fish nor fowl, neither SQL engine nor PL/SQL engine.

I wonder why using an object type is not touted as another mitigation strategy for PL/SQL function context switching (at least when the method does not need to call anything other than built-in functions). I realize we have WITH FUNCTION, the UDF pragma, and, with Oracle 21c, SQL Macros, but it would be nice to know whether Object methods are an alternative.
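For comparison, here is a minimal scalar SQL Macro sketch in 21c syntax, again with made up names unrelated to this package. The function body returns SQL text that the optimizer splices into the statement at parse time, so there is no per-row PL/SQL call.

CREATE OR REPLACE FUNCTION fmt_date(p_d DATE) RETURN VARCHAR2 SQL_MACRO(SCALAR)
IS
BEGIN
    -- the returned text replaces the call in the SQL statement; p_d is
    -- substituted with the actual argument expression at parse time
    RETURN q'{TO_CHAR(p_d, 'YYYY-MM-DD')}';
END fmt_date;
/
SELECT fmt_date(hire_date) AS hired FROM hr.employees;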

I may explore object methods as a context switching mitigation strategy another day.

Appendix - The Full Monty

If you have a use case that calls for the ultimate in performance (this one does not, but I happened to try this early in my analysis), you can get down and dirty with DBMS_SQL and what I believe is called “Method 4” dynamic SQL. It earns that classification because we have a variable number of bind elements.

Unlike the other examples, this one shows the entire procedure that creates a Private Temporary Table (PTT) and populates it from the CSV Clob.

The best I could achieve with the code shown in the article was 23 seconds. This version runs in 19 seconds. It isn't worth the complexity for this package, but it is nice to know how to do it for the rare occasions when you might need to squeeze out that last little bit of efficiency.

The first part is mostly the same as the code I didn’t show you for the other variations. It parses the first row of the CLOB to get the column names and creates the PTT.

    PROCEDURE create_ptt_csv (
         --
         -- creates private temporary table "ora$ptt_csv" with columns named in first row of data (case preserved).
         -- from a CLOB containing CSV lines.
         -- All fields are varchar2(4000)
         --
         p_clob         CLOB
        ,p_separator    VARCHAR2    DEFAULT ','
        ,p_strip_dquote VARCHAR2    DEFAULT 'Y' -- also unquotes \" and "" pairs within the field to just "
    ) IS
        v_cols          perlish_util_udt; -- for manipulating column names into SQL statement
        v_sql           CLOB;
        v_first_row     VARCHAR2(32767);
        v_ins_curs      INT;
        v_num_rows      INT;
        v_last_row_cnt  BINARY_INTEGER := 0;
        v_col_cnt       BINARY_INTEGER;
        v_vals_1_row    ARR_VARCHAR2_UDT;  -- from split_csv on 1 line
        v_rows          DBMS_SQL.varchar2a;     -- from split_clob_to_lines fetch
        --
        -- variable number of columns, each of which has a bind array.
        --
        TYPE varchar2a_tab  IS TABLE OF DBMS_SQL.varchar2a INDEX BY BINARY_INTEGER;
        v_vals          varchar2a_tab;          -- array of columns each of which holds array of values
        --
        -- We get all but the header row when we read the clob in a loop.
        --
        CURSOR c_read_rows IS
            SELECT t.s
            FROM TABLE(app_csv_pkg.split_clob_to_lines(p_clob, p_skip_lines => 1))  t
            ;
    BEGIN
        BEGIN
            -- read the first row only
            SELECT s INTO v_first_row 
            FROM TABLE( app_csv_pkg.split_clob_to_lines(p_clob, p_max_lines => 1) )
            ;
            IF v_first_row IS NULL THEN
                raise_application_error(-20222,'app_csv_pkg.create_ptt_csv did not find csv rows in input clob.');
            END IF;
        EXCEPTION WHEN NO_DATA_FOUND THEN
            raise_application_error(-20222,'app_csv_pkg.create_ptt_csv did not find csv rows in input clob.');
        END;
        -- split the column header values into collection
        v_cols := perlish_util_udt(split_csv(v_first_row, p_separator => p_separator, p_strip_dquote => 'Y'));
        v_col_cnt := v_cols.arr.COUNT;

        -- create the private global temporary table with "known" name and columns matching names found
        -- in csv first record
        --
        v_sql := 'DROP TABLE ora$ptt_csv';
        BEGIN
            EXECUTE IMMEDIATE v_sql;
        EXCEPTION WHEN OTHERS THEN NULL;
        END;

        v_sql := 'CREATE PRIVATE TEMPORARY TABLE ora$ptt_csv(
'
            ||v_cols.map('"$_"    VARCHAR2(4000)').join('
,')
            ||'
)'
            ;
        DBMS_OUTPUT.put_line(v_sql);
        EXECUTE IMMEDIATE v_sql;

The next part populates the PTT. This variation uses bulk collect to read the data as lines, splits the lines into columns, stuffs the column data into individual arrays, then executes a DBMS_SQL INSERT cursor bound to those arrays. Pretty slick, if a little hard core.

        -- 
        -- Dynamic SQL for DBMS_SQL. It will be used with bind arrays.
        -- Of note is that it reports a conventional load even if you specify APPEND.
        -- I don't understand that, as I've seen other reports that direct path load works.
        -- It does not seem to matter here though.
        --
        v_sql := 'INSERT INTO ora$ptt_csv(
'
        ||v_cols.map('"$_"').join(', ')             -- the column names in dquotes
        ||'
) VALUES (
'
        ||v_cols.map(':$##index_val##').join(', ')  -- :1, :2, :3, etc... bind placeholders
        ||'
)';
        DBMS_OUTPUT.put_line(v_sql);
        v_ins_curs := DBMS_SQL.open_cursor;
        DBMS_SQL.parse(v_ins_curs, v_sql, DBMS_SQL.native);

        OPEN c_read_rows;
        LOOP
            FETCH c_read_rows BULK COLLECT INTO v_rows LIMIT 100;
            EXIT WHEN v_rows.COUNT = 0;
            FOR i IN 1..v_rows.COUNT
            LOOP
                v_vals_1_row := app_csv_pkg.split_csv(v_rows(i), p_separator => p_separator, p_strip_dquote => p_strip_dquote, p_keep_nulls => 'Y');
                -- j is column number
                FOR j IN 1..v_col_cnt
                LOOP
                    v_vals(j)(i) := v_vals_1_row(j);
                END LOOP;
            END LOOP;

            IF v_last_row_cnt != v_rows.COUNT THEN -- will be true on first loop iteration and maybe last
                v_last_row_cnt := v_rows.COUNT;
                -- bind each column array. v_vals has an array for every column
                FOR j IN 1..v_col_cnt
                LOOP
                    DBMS_SQL.bind_array(v_ins_curs, ':'||TO_CHAR(j), v_vals(j), 1, v_last_row_cnt);
                END LOOP;
            END IF;

            v_num_rows := DBMS_SQL.execute(v_ins_curs);

        END LOOP;
        DBMS_SQL.close_cursor(v_ins_curs);
        CLOSE c_read_rows;
    END create_ptt_csv
    ;