Friday, May 18, 2007

SAS - 3 : General Arithmetic

SAS sample statistic functions

Sample statistics for a single variable across all observations are easy to obtain with procedures such as PROC MEANS or PROC UNIVARIATE. The simplest way to obtain similar statistics across several variables within an observation is with a 'sample statistics function'.

For example:

sum_wt=sum(of weight1 weight2 weight3 weight4 weight5);

Note that this is equivalent to

sum_wt=sum(of weight1-weight5);

but is not equivalent to

sum_wt=weight1 + weight2 + weight3 + weight4 + weight5;

since the SUM function returns the sum of the non-missing arguments (and a missing value only if every argument is missing), whereas the '+' operator returns a missing value if any of its operands is missing.
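
To make the difference concrete, here is a minimal sketch using three made-up weights, one of which is missing. The variable names sum_fn and sum_op are only for illustration.

```sas
data _null_;
    weight1 = 10;
    weight2 = .;                              /* missing value */
    weight3 = 30;
    sum_fn = sum(of weight1-weight3);         /* skips the missing value: 40 */
    sum_op = weight1 + weight2 + weight3;     /* missing propagates: . */
    put sum_fn= sum_op=;                      /* writes both values to the log */
run;
```

The log should show sum_fn=40 and sum_op=., together with a note that missing values were generated by the '+' expression.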

The following are all valid arguments for the SUM function:
sum(of variable1-variablen) where n is an integer greater than 1
sum(of x y z)
sum(of array-name{*})
sum(of _numeric_)
sum(of x--a)
where x precedes a in the PDV order
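
As a rough illustration of these argument forms, the sketch below defines four variables in the order x, y, z, a and sums them three ways. The array name vals and the result names are hypothetical.

```sas
data _null_;
    /* Variables enter the PDV in the order x, y, z, a */
    x = 1; y = 2; z = 3; a = 4;
    array vals{*} x y z a;
    by_list  = sum(of x y z);       /* 1 + 2 + 3 = 6 */
    by_array = sum(of vals{*});     /* every element of the array: 10 */
    by_range = sum(of x--a);        /* x through a in PDV order: 10 */
    put by_list= by_array= by_range=;
run;
```

Take care with sum(of _numeric_): it covers every numeric variable defined up to that point in the DATA step, which may include results you assigned earlier.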

A comma delimited list is also a valid argument, for example:
sum(x, y, z)

However, I recommend always using an argument preceded by OF, since this minimises the chance that you write something like
sum_wt=sum(weight1-weight5);

which is a valid SAS expression, but evaluates to the difference between weight1 and weight5.
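
A small sketch of this pitfall, with made-up values and illustrative result names. It assumes a SAS release in which SUM accepts a single argument, as the text above implies.

```sas
data _null_;
    weight1 = 5; weight2 = 6; weight3 = 7; weight4 = 8; weight5 = 9;
    with_of    = sum(of weight1-weight5);   /* variable list: 5+6+7+8+9 = 35 */
    without_of = sum(weight1-weight5);      /* one argument, the difference 5 - 9 */
    put with_of= without_of=;
run;
```

Under that assumption the log would show with_of=35 and without_of=-4, with no warning that the second call ignored three of the variables.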

Other useful sample statistic functions are:

MAX(argument,...) returns the largest value

MIN(argument,...) returns the smallest value

MEAN(argument,...) returns the arithmetic mean (average)

N(argument,...) returns the number of nonmissing arguments

NMISS(argument,...) returns the number of missing values

STD(argument,...) returns the standard deviation

STDERR(argument,...) returns the standard error of the mean

VAR(argument,...) returns the variance
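
The functions above can be tried on a single made-up observation. This is only a sketch; the score values and result names are invented for illustration.

```sas
data _null_;
    score1 = 4; score2 = .; score3 = 8; score4 = 6;
    n_ok     = n(of score1-score4);       /* 3 nonmissing values */
    n_miss   = nmiss(of score1-score4);   /* 1 missing value */
    avg      = mean(of score1-score4);    /* (4 + 8 + 6) / 3 = 6 */
    largest  = max(of score1-score4);     /* 8 */
    smallest = min(of score1-score4);     /* 4 (missing is ignored) */
    sd       = std(of score1-score4);     /* sqrt((4 + 4 + 0) / 2) = 2 */
    put n_ok= n_miss= avg= largest= smallest= sd=;
run;
```

As with SUM, each of these functions works on the non-missing arguments only.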

Example usage
You may, for example, have collected weekly test scores over a 20 week period and wish to calculate the average score for all observations with the proviso that a maximum of 2 scores may be missing.

if nmiss(of test1-test20) le 2 then
testmean=mean(of test1-test20);
else testmean=.;

Friday, May 11, 2007

SAS - 2 : PROC PRINT and PROC FREQ

Objectives

  1. Use PROC PRINT

  2. Use PROC FREQ

Procedure (PROC) statements specify the procedure to be used on the data set you created. The general form of the statements needed to execute a SAS procedure is:

    PROC procedure_name options parameters;
    Additional procedure information statements;

Example:

    PROC PRINT N;
    VAR NAME AGE;

Beginning SAS users should become familiar with these procedures:

    PROC PRINT;
    PROC FREQ;
    PROC UNIVARIATE;
    PROC MEANS;
    PROC PLOT;
    PROC SORT;

In this session you will learn to use PROC PRINT and PROC FREQ.

PROC PRINT statement

    Use:

    Lists data as a table of observations

    Syntax:

    PROC PRINT;

    Result:

    SAS is asked to print a table of all observations in your data set

PROC PRINT lists data as a table of observations by variables. The general form of the PRINT procedure is:

    PROC PRINT;
    VAR NAME AGE SEX Q1 Q2;

VAR is an additional procedure information statement in the PRINT procedure that allows you to pick out specific variables to be printed in a certain order. Without the VAR statement, all variables in the data set would be output in the print listing.
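
A minimal sketch of the PRINT procedure on a three-row data set follows. The examples in this tutorial omit the DATA= option, in which case SAS uses the most recently created data set. Naming the data set explicitly and ending the step with RUN are choices made here, not part of the original examples.

```sas
DATA SURVEY;
    INPUT NAME $ 1-14 AGE 16-17 SEX 19 Q1 21 Q2 23;
    CARDS;
JANE           20 2 2 5
MICHAEL        18 1 5 2
MARIA          21 2 2 4
;
RUN;

/* List only NAME and AGE, in that order; the N option */
/* prints the number of observations after the listing */
PROC PRINT DATA=SURVEY N;
    VAR NAME AGE;
RUN;
```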

Sample output from the PRINT procedure

PROC FREQ statement

    Use:

    To produce a frequency table or 2-way table of your data

    Syntax:

    PROC FREQ;
    TABLES var_name;

PROC FREQ shows the distribution of variable values through a one-way table or through crosstabulation tables. The general form of the PROC FREQ statement is:

    PROC FREQ;
    TABLES list of table requests/options and parameters;

Sample output from the frequency on sex


In order to produce a two-way crosstabulation table the variables to be used in the table should be entered in the TABLES statement as follows:

    PROC FREQ;
    TABLES SEX*Q1;
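
The two kinds of table request can be sketched on a small data set as follows. The NOROW, NOCOL and NOPERCENT options are an addition here, used to show how options follow the slash; they suppress the percentages so that each cell shows only a count.

```sas
DATA SURVEY;
    INPUT NAME $ 1-14 AGE 16-17 SEX 19 Q1 21 Q2 23;
    CARDS;
JANE           20 2 2 5
MICHAEL        18 1 5 2
MARIA          21 2 2 4
JUAN           26 1 4 3
;
RUN;

PROC FREQ DATA=SURVEY;
    TABLES SEX;                              /* one-way frequency table */
    TABLES SEX*Q1 / NOROW NOCOL NOPERCENT;   /* two-way table, counts only */
RUN;
```

In the two-way table, the first variable (SEX) forms the rows and the second (Q1) forms the columns.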

Session 3 Exercise

  1. Using Pico, edit your SAS program file called survey.sas.

  2. Add a PROC PRINT statement and a PROC FREQ statement to the end of your program by typing:

      PROC PRINT;
      VAR NAME AGE SEX Q1 Q2;
      PROC FREQ;
      TABLES AGE SEX Q1 Q2;

    Your SAS program should now look like this:

  3. Now save your changes and submit your program for processing by SAS, at the $ prompt, type:

      sas survey.sas

    You will know your job is completed when you see the $ prompt again. Now edit the survey.log file and check for errors, warnings, and notes. If your program ran without errors go on to the next session. If your program output shows errors, check the program carefully to make sure it is exactly like the examples given in this tutorial. If you find the errors and correct them, make sure you save your changes and then resubmit your job as above.
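
For reference, the complete survey.sas after step 2 can be reconstructed from the statements given in this tutorial, roughly as sketched below. The original listing is not reproduced here, so treat this as an approximation: the data lines are aligned to the columns named in the INPUT statement, and the lone semicolon after the data and the final RUN statement are additions.

```sas
TITLE 'Sample SAS Program';
DATA SURVEY;
INPUT NAME $ 1-14 AGE 16-17 SEX 19 Q1 21 Q2 23;
CARDS;
JANE           20 2 2 5
MICHAEL        18 1 5 2
MARIA          21 2 2 4
JUAN           26 1 4 3
MILDRED        28 2 3 4
GUNTHER        30 1 5 2
JOSEPH         25 1 4 4
JULIA          19 2 2 2
CODY           27 1 1 1
AARON          29 1 2 2
;
PROC PRINT;
VAR NAME AGE SEX Q1 Q2;
PROC FREQ;
TABLES AGE SEX Q1 Q2;
RUN;
```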

SAS - 1 : Data Step

Objectives

  1. Learn to use the DATA statement

  2. Learn to use the INPUT statement

  3. Learn to use the CARDS statement

  4. Learn how to use the semicolon (;)

  5. Learn how to include TITLES on your output

  6. Learn how to run a SAS program

DATA statement

    Use:

    Names the SAS data set

    Syntax:

    DATA SOMENAME;

    Result:

    A temporary SAS data set named SOMENAME is created

The DATA statement signals the beginning of a DATA step. The general form of the SAS DATA statement is:

    DATA SOMENAME;

The DATA statement names the data set you are creating. The name should be from 1 to 8 characters long (SAS Version 7 and later allow up to 32) and must begin with a letter or underscore.

INPUT statement

    Use:

    Defines names and order of variables to SAS

    Syntax:

    INPUT variable variable_type column(s);

    Result:

    Input data are defined for SAS

The INPUT statement specifies the names and order of the variables in your data. Although there are three types of INPUT statements which can be mixed, the beginning SAS user should only be concerned with learning how to use the Column Input style.

The INPUT statement should indicate in which columns each variable may be found. In addition, the INPUT statement should indicate how many lines it takes to record all of the information for each case or observation. The general form of the SAS INPUT statement is:

    INPUT NAME $ 1-14 AGE 16-17 SEX 19 Q1 21 Q2 23;

The variable NAME is a character variable as is indicated by the dollar sign ($) after the variable name. All of the other variables are numeric.

If there are multiple lines of data for each observation, use a forward slash ('/') in the INPUT statement to indicate where a new data line begins.

The general form of the SAS INPUT statement with multiple lines of data per observation is:

    INPUT NAME $ 1-14 AGE 16-17 / SEX 1 Q1 3 Q2 5;

Note: When describing the second line of input data, you begin with column one again. Each piece of data, or variable, will be read from the same columns for each of your observations. Only one INPUT statement is necessary to describe the data for all of your cases.
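
A small sketch of two-line input, using two of the survey respondents. The data set name SURVEY2 is made up for this example. Each observation takes two data lines, and the columns on the second line are counted from one again.

```sas
DATA SURVEY2;
    INPUT NAME $ 1-14 AGE 16-17 / SEX 1 Q1 3 Q2 5;
    CARDS;
JANE           20
2 2 5
MICHAEL        18
1 5 2
;
RUN;

/* Two observations, five variables each */
PROC PRINT DATA=SURVEY2;
RUN;
```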

CARDS statement

    Use:

    Signals that input data will follow

    Syntax:

    CARDS;

    Result:

    Data can be processed for the SAS data set

The CARDS statement signals that the data will follow next and immediately precedes your input data lines. The general form of the CARDS statement is:

    DATA SURVEY;
    INPUT NAME $ 1-14 AGE 16-17 SEX 19 Q1 21 Q2 23;
    CARDS;

    (the data goes here)

Note: If the data is contained in an external file, instead of the CARDS statement you will use an INFILE statement to specify where that file resides. (Example: INFILE 'survey.dat';).
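
The INFILE variant might look like the sketch below. So that the example runs on its own, a first step writes two sample lines to survey.dat in the current directory; in practice the file would already exist and that step would be dropped.

```sas
/* Create a small survey.dat for illustration only */
DATA _NULL_;
    FILE 'survey.dat';
    PUT 'JANE' @16 '20' @19 '2' @21 '2' @23 '5';
    PUT 'MICHAEL' @16 '18' @19 '1' @21 '5' @23 '2';
RUN;

/* Read the external file; INFILE takes the place of CARDS */
DATA SURVEY;
    INFILE 'survey.dat';
    INPUT NAME $ 1-14 AGE 16-17 SEX 19 Q1 21 Q2 23;
RUN;
```

Note that the INFILE statement comes before the INPUT statement, and no data lines follow the DATA step.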

Semicolon

    Use:

    Signals the end of any SAS statement

    Syntax:

    A DATA Step or PROCedure statement;

    Result:

    SAS is signaled that the statement is complete

The semicolon (;) is used as a delimiter to indicate the end of SAS statements.

    DATA SURVEY;
    INPUT NAME $ 1-14 AGE 16-17 SEX 19 Q1 21 Q2 23;
    CARDS;

TITLE statement

    Use:

    Puts TITLES on your output

    Syntax:

    TITLE 'some title';

    Result:

    A TITLE is added at the top of each page

The TITLE statement assigns a title which appears at the top of the output page. The general form of the TITLE statement is:

    TITLE 'some title';

How to run a SAS program

Once you have used your editor to type in a SAS program and have saved that program you're ready to run the program. At the Linux ($) prompt, type:

    sas programname.sas

When SAS is finished running the program, it will return two files to the current directory: (1) programname.log, which contains a log of the job execution, including errors, warnings and notes, and (2) programname.lst, which contains output from the procedures in the program.

For example, let's say you have used the Pico editor to enter a SAS program named survey.sas and have saved it in your home directory. To run that program from the Linux ($) prompt, type:

    sas survey.sas

You will know that SAS has finished running the program when the $ prompt reappears. You will have two new files in your directory, survey.log and survey.lst. You can now open the .log file to check for errors, warnings, and notes, and the .lst file to look at the output from the procedures.

A SAS Program Example

Session 2 Exercise

In the following exercise you will enter the first five SAS statements in your SAS program and you will enter the data from the sample survey.

  1. Invoke the Pico editor to create a new file. At the $ prompt, type:

      pico survey.sas

  2. Enter the first five lines of a SAS program. In order for the exercises in this tutorial to work successfully, you must type the program statements exactly as they are presented here.

      Type:

      TITLE 'Sample SAS Program';

      Result:

      A title is added to the program

      Type:

      DATA SURVEY;

      Result:

      The SAS data set named SURVEY is created

      Type:

      INPUT NAME $ 1-14 AGE 16-17 SEX 19 Q1 21 Q2 23;

      Result:

      The format for your data is described to SAS

      Type:

      CARDS;

      Result:

      SAS is given the signal that data will follow directly

      Type:


      JANE           20 2 2 5
      MICHAEL        18 1 5 2
      MARIA          21 2 2 4
      JUAN           26 1 4 3
      MILDRED        28 2 3 4
      GUNTHER        30 1 5 2
      JOSEPH         25 1 4 4
      JULIA          19 2 2 2
      CODY           27 1 1 1
      AARON          29 1 2 2

      Result:

      Your raw data are entered to be read by your INPUT statement.

    Note: Remember to enter the data in the exact columns you have specified in your INPUT statement. For example, AGE must be in columns 16 and 17.

  3. Now, save this program under the name survey.sas. (In Pico, you will type Ctrl-X, answer yes when asked if you want to save changes, and press Enter when asked if you want to call the program survey.sas.)

    Your SAS program should look like this: