SQL passthrough in SAS


Our main data source at work is an Oracle data warehouse, which we (mainly) query from SAS. There are two main ways we can do this – one is to use a libname statement with the OLEDB engine. You can then extract and process data directly from the Oracle tables exactly as you would from a SAS dataset, in a data step, like so:

libname oralib oledb
    Provider=MSDAORA
    Properties=("User ID"=&name;
    Password=&password; "Data Source"=oradw);

data newset;
  set oralib.oldset;
run;

This, however, is far less efficient than the alternative, which is to use the Proc SQL pass-through facility to extract the data before processing it:

proc sql;
  connect to oledb
    init_string="Provider=msdaora; User ID=&name;
    Password=&password; Data Source=oradw";
  create table newset as
  select * from connection to oledb
  (select * from oldset);
quit;

Here’s a nice visual explanation of why this is a more efficient data access method:


To paraphrase, with the pass-through any selection or processing of data is done in the database before pulling it into SAS. This makes it significantly faster, and the more tables being referenced the better the gain in speed over the use of the libname engine.

However, one thing you do miss out on with the pass-through is the ability to use results from an earlier query stored in a temporary dataset to restrict data extracted from a later one. For example, the following stores a dataset id_list in the SAS workspace:

proc sql;
  create table id_list as
  select
    a.id
  from
    oralib.table1 a
    inner join oralib.table2 b
        on a.emp_id = b.sp_emp_id
  where
    b.rate >= 10000
  group by a.id
  ;
quit;

id_list can then be used in subsequent queries, like so:

proc sql;
  create table using_ids as
  select *
  from oralib.table3 c
  inner join id_list d
    on c.id = d.id
  ;
quit;

Of course, the same end result can be achieved in a pass-through by using a subquery:

proc sql;
  connect to oledb
    (init_string="Provider=MSDAORA; User ID=&name;
    Password=&password; Data Source=oradw");
  create table using_ids as
  select * from connection to oledb
  (select *
  from table3 c
  where c.id in
    (select
        a.id
     from
        table1 a
        inner join table2 b
            on a.emp_id = b.sp_emp_id
     where
        b.rate >= 10000
     group by a.id
    )
  );
quit;

What I don’t like about this approach is the overhead of evaluating the inner query before going to work on the outer one. With a simple query like the above, the slow-down is not likely to be that noticeable, but as more tables and joined and more complex selection criteria are added, this becomes more of a concern. In terms of time taken, SQL code like this can often be greater than the sum of its parts.
An alternative approach is to create a temporary table on the database, which can then be referenced in the pass-through:

proc sql;
  create table oralib.id_list as
  select
    a.id
  from
    oralib.table1 a
    inner join oralib.table2 b
        on a.emp_id = b.sp_emp_id
  where
    b.rate >= 10000
  group by a.id
  ;
quit;

libname oralib clear; /*the libname can interfere with the pass-through*/

proc sql;
  connect to oledb
    (init_string="Provider=MSDAORA; User ID=&name;
    Password=&password; Data Source=oradw");
  create table using_ids as
  select * from connection to oledb
  (select *
  from table3 c
  inner join id_list d
    on c.id = d.id
  );
quit;

This of course assumes that the user has permissions to create tables on the database.

Advertisements

9 thoughts on “SQL passthrough in SAS

  1. Rich says:

    Thanks for the analysis. It’s funny that I came to your site looking for something else and found this post. I just learned that you can create a libname using an oracle table last week and was wondering what was more efficient.
    One other thing for me is that I prefer the simplicity of using the proc sql method, which allows me to easily see the database, table and schema being looking at, versus a libname which requires me (or anyone else working through the code) to go back to the original reference to find out where the data is originating from.

  2. Chris says:

    Hey, I recognize that video! That’s my then-two-year-old daughter who is threatening my data integrity by trying to eat a few records (near the end of the video).

    Thanks for sharing your tips.

    Chris

  3. Britt says:

    You may not have the solution and maybe there isn’t one but I’m looking to see if there’s a way to use an earlier SAS dataset when pulling data from an Oracle database without write access to the database. I believe the Oracle is currently 10g and SAS is 9.1.3 in case that helps any. Nothing horribly complex; the dataset only has one field and I want to pull claims from the Oracle database when there’s a match to one of the values in that field.

    • geoffness says:

      Britt – one method you could use is to create a macro variable with proc sql, then refer to it in your pass-through:

      proc sql;
      select var 
      into :varlist
      separated by ', '
      from dsname;
      quit;
      
      proc sql;
        connect to oledb
          (init_string="Provider=MSDAORA; User ID=&name;
          Password=&password; Data Source=oradw");
        create table using_vars as
        select * from connection to oledb
        (select *
        from tablename a
        where a.var in (&varlist.)
        );
      quit;
      

      Whether this is appropriate would depend on how many values you’re putting into the macro variable – there is a limit of 65,534 characters on the length of the variable value.

      Hope this helps 🙂
      Geoff

      • John says:

        I have the same problem as Brit, however I have more values that would fit in a macro var. Ideally I would LEFT JOIN my values, but I do not have write access to the Oracle DB.

        Any alternative approaches outside of segmenting into macro vars blocks of 65k and looping?

      • geoffness says:

        Sorry John, if there is one I’m not aware of it. The alternative of course is to avoid the explicit pass-through and assign a libname to the Oracle DB. It’s possible this will be a little slower, but if you have SAS /ACCESS to Oracle installed (rather than oledb or odbc), it might not be too bad.

      • Santhosh says:

        Iam new to SAS. I have following code:
        proc sql ;
        reset noprint;
        select data_type as datt, proc_code as procc
        into :datatype1 – :datatype20, :proccode1 – :proccode20
        from UCR1.testresult1;
        reset print;
        quit;

        proc sql;
        reset noprint;

        connect to odbc as mydb
        (datasrc=”uatdb01″ user=Santhosh
        password=Santhosh);
        create table UCR1.testraw1 AS
        select * from connection to mydb
        (select * from geico_ucr_file_split_good
        where Procedure_Code = ‘&proccode20’);

        disconnect from mydb;
        quit;

        proccode20 has a value of 87205
        but when the passthrough query is going to sqlserver, it is
        getting zero records. But if I am putting the actual value
        instead of the &proccode20, I am getting some 400 records.
        I am unable to find what is wrong with the passthrough query.

        I will appreciate your help.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s