How to use SAS statistical software: SAS 101

The following guide will walk you through:

  1. How to install a free copy of SAS University Edition on your personal computer

  2. How to enter data into an Excel spreadsheet so that SAS can read it

  3. How to have SAS read your data

  4. How to have SAS analyze your data

SAS is probably the most powerful tool out there for analyzing data. Although people can be afraid of SAS because coding is involved, I will walk you through all the steps here so that you can start analyzing your data in no time, even without any prior coding experience.

Step 1: Install a free copy of SAS University Edition

Go to the SAS University Edition download page, click on “Get free software”, and then follow all the instructions.

You will have to install VirtualBox and then SAS.

SAS will run within a web browser. I recommend Microsoft Edge if you have Windows, because SAS did not work as intended for me inside Chrome.

After you have followed all the instructions, turn on VirtualBox and, once you are connected, open your browser (Microsoft Edge or Internet Explorer) to start SAS per the instructions provided.


Step 2: How to enter data into an Excel spreadsheet so that SAS can read it

Each row in Excel is an observation (patient) and each column is a variable. You don’t want information about the same patient spread over multiple rows. For example, a study where one patient’s information takes up 2 rows (one for each kidney) can be a mess to analyze. This happened to me, and because it was a large study I had to find a SAS expert to help me code my way around it. So try to keep one patient per row if possible.


Variable titles

Capitalize the variable title and do not use any spaces or special characters (do not write Date of Surgery; write DATEOFSURGERY). Variable titles have to be one word only. You can have numbers in the word, but it all has to be one word (PROCEDURE1, PROCEDURE2, etc.).

Because the title of a variable is sometimes not self-explanatory (PROCEDURE), you need a separate data dictionary where you explain what each variable means: PROCEDURE 0=Deflux, 1=Open reimplant, 2=Robotic reimplant; SIDE 0=left, 1=right, 2=bilateral, etc. It’s good to have a data dictionary entry for each variable so that it is perfectly clear what the options are. I usually put the data dictionary on a new sheet within the same document:

Once you are about to get SAS to analyze the data, copy and paste the master data into another Excel spreadsheet without any fancy formatting and, more importantly, save the sheet as an Excel 97-2003 Workbook:

You might be wondering why you have to save it as Excel 97-2003 and not just as a regular Excel file. The reason is that this is the way it works for me and the way I learned (i.e., I am too lazy to find the workaround). SAS is extremely redundant, so there are tons of ways to do the same thing in SAS.

Format each column accordingly in Excel

Right-click on the Excel column and then click on Format Cells:

Numbers can be categorical variables, like VUR grade, or continuous variables, like milliliters of Deflux injected. If you are dealing with a categorical variable, format it as General; if it is a continuous variable, format it as a Number with however many decimals you want.

Step 3: How to have SAS read your data

During the installation of SAS you created a folder named SASUniversityEdition:

Inside that folder you have another folder named myfolders:

Inside myfolders is where you will put your Excel 97-2003 file. In the following example, the file name is penilecurvature. So the path to this file would be C:\SASUniversityEdition\myfolders.

The following code will tell SAS to read your data:

[Screenshot: extraction code for SAS]

Here is the above code in a form you can copy and paste:
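A typical import step for SAS University Edition looks roughly like the sketch below; it is not necessarily the exact code from the screenshot. It assumes the file was saved as penilecurvature.xls and that the myfolders folder appears inside SAS under the default shared path /folders/myfolders. The imported data lands in work.import, which is the name used in the data steps later in this guide.

```sas
/* Assumption: default shared-folder path inside the virtual machine */
filename reffile '/folders/myfolders/penilecurvature.xls';

/* Read the Excel 97-2003 workbook into the data set work.import */
proc import datafile=reffile
    dbms=xls
    out=work.import
    replace;
    getnames=yes;   /* use the first row as variable names */
run;
```

If SAS cannot find the file, checking the file name, the extension, and the shared-folder path is usually the first thing to try.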







When I learned SAS in my master’s program, we used the paid version of SAS, and the code to get the data was different from the one above. It took me a few Google searches to find code that would work with SAS University Edition.


Step 4: How to analyze your data with SAS

Download the SAS cheat sheet with most of the info described below:


This part actually consists of 2 subparts:

Part 1: Creating a data set

This is where you can eliminate some entries that are incomplete or do not meet criteria, create new variables by using existing variables etc.

For dataset creation, you will use logical statements. For example, if you want to exclude bilateral cases you will say:

If SIDE=2 then delete;

Or to create a variable you will say:

Assuming today is 7/30/2018, you can create a variable that counts the days since surgery:
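One way this could be written is sketched below. It assumes DATEOFSURGERY was imported as a true SAS date (not as text); DAYSSINCESURGERY and the data set name EXAMPLE are made-up names used only for illustration.

```sas
data EXAMPLE;                        /* hypothetical data set name */
    set work.import;
    /* '30JUL2018'd is a SAS date literal for 7/30/2018.              */
    /* SAS stores dates as day counts, so subtracting two dates gives */
    /* the number of days between them.                               */
    DAYSSINCESURGERY = '30JUL2018'd - DATEOFSURGERY;
run;
```

To count from the actual current date instead of a fixed one, the literal can be replaced with the today() function.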


Each statement in SAS ends with a semicolon. If you forget it, you will get an error. Leaving out a semicolon is the most common mistake people make in SAS.

The code to create a data set is as follows:

data NAMEOFDATASET (drop=F16);
set work.import;

Then you can add all the requirements for the dataset:

data NAMEOFDATASET (drop=F16);
set work.import;
if vent=0 then lma=0;
if vent=1 then lma=1;
run;

It will look like this on SAS:

The data set above did not require too much manipulation. However, some datasets can be more complex:

Note: green lines are comments and are not read by SAS. They work as reminders or notes-to-self placed between lines of code. To create a green line, start it with an asterisk (*) and end it with a semicolon.
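As a small illustration, the data step below puts a comment line in front of the exclusion rule from earlier. It reuses NAMEOFDATASET and SIDE from the examples above.

```sas
data NAMEOFDATASET;
    set work.import;
    * Exclude bilateral cases (coded as SIDE=2);
    if SIDE=2 then delete;
run;
```

The line starting with the asterisk shows up in green in the editor and is skipped when the code runs.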

Here are a couple more examples of dataset creation. If you can think of it, SAS can do it, and Google will tell you how:

Part 2: Analyzing your data set (the fun part at last…)

PROC is the name of the game now.

PROC commands SAS to do something with your dataset.

For example, you can just have SAS show you the new variables you created on the dataset by using PROC PRINT or you can have SAS organize and sort the data:
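A minimal sketch of both steps, assuming the data set is called NAMEOFDATASET and contains the PROCEDURE and SIDE variables from the data dictionary example:

```sas
/* Sort the data set by procedure type */
proc sort data=NAMEOFDATASET;
    by PROCEDURE;
run;

/* Print only the listed variables */
proc print data=NAMEOFDATASET;
    var PROCEDURE SIDE;
run;
```

Leaving out the var statement makes PROC PRINT show every variable in the data set.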


PROC FREQ is super useful. It tells you how many patients are in each group or subgroup and can also tell you whether there is a statistically significant difference between groups (Chi-Square):
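For instance, a two-way table with a Chi-Square test could look like the sketch below. SUCCESS is a hypothetical 0/1 outcome variable used only for illustration; PROCEDURE comes from the data dictionary example.

```sas
proc freq data=NAMEOFDATASET;
    /* Cross-tabulate procedure type against outcome            */
    /* and request the Chi-Square test with the chisq option    */
    tables PROCEDURE*SUCCESS / chisq;
run;
```

A single variable name in the tables statement (for example, tables PROCEDURE;) gives simple counts and percentages per group.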

With PROC MEANS you get means, medians, interquartile ranges (p25, p75), and standard deviations, and if you add the CLASS and TYPES statements you can get all of the above for all your different subgroups.
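A sketch of such a call is shown here. AGE is a hypothetical continuous variable; PROCEDURE and SIDE are the grouping variables from the earlier examples.

```sas
proc means data=NAMEOFDATASET n mean median p25 p75 std;
    class PROCEDURE SIDE;             /* grouping variables */
    /* Report by procedure alone, and by procedure and side together */
    types PROCEDURE PROCEDURE*SIDE;
    var AGE;                          /* hypothetical continuous variable */
run;
```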

Logistic regression is the “meat” of most retrospective studies. PROC LOGISTIC is how you do it.
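A basic model might be set up as in the sketch below. SUCCESS (a 0/1 outcome) and AGE are hypothetical variables, and choosing PROCEDURE=0 as the reference group is just an assumption for the example.

```sas
proc logistic data=NAMEOFDATASET;
    /* Treat PROCEDURE as categorical, with 0 as the reference level */
    class PROCEDURE (ref='0') / param=ref;
    /* Model the probability that SUCCESS equals 1 */
    model SUCCESS(event='1') = PROCEDURE AGE;
run;
```

The output includes odds ratios with confidence intervals for each predictor.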

Let’s get a little fancy now and go beyond 20% of what you need to know. The closest thing to a randomized clinical trial you can do with retrospectively collected data is to do a PROPENSITY SCORE ANALYSIS. This is how you do it:
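The sketch below covers only the first step of a propensity score analysis, estimating the score itself; the matching or weighting that follows is not shown, and this is not necessarily the approach used in the original example. TREATED (0/1 treatment indicator), AGE, PSDATA, and PS are hypothetical names.

```sas
proc logistic data=NAMEOFDATASET;
    class SIDE / param=ref;
    /* Model the probability of receiving the treatment */
    /* from baseline characteristics                    */
    model TREATED(event='1') = AGE SIDE;
    /* Save each patient's predicted probability as PS  */
    output out=PSDATA pred=PS;
run;
```

The new data set PSDATA then holds one propensity score per patient, which can be used for matching, stratifying, or weighting.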

I don’t use PROC UNIVARIATE as much, but here is how to use it:
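A minimal sketch, assuming a hypothetical continuous variable AGE:

```sas
/* The normal option adds tests for normality to the output */
proc univariate data=NAMEOFDATASET normal;
    var AGE;
    /* Histogram with a fitted normal curve overlaid */
    histogram AGE / normal;
run;
```

This is a handy way to check whether a variable looks normally distributed before choosing a test.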

You can create charts in SAS, as you would in Excel, using PROC CHART:
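For example, a simple bar chart of how many patients had each procedure could be sketched like this, using PROCEDURE from the data dictionary example:

```sas
proc chart data=NAMEOFDATASET;
    /* One vertical bar per procedure code; discrete keeps */
    /* each numeric code as its own bar                    */
    vbar PROCEDURE / discrete;
run;
```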

If you are a purist then you don’t do t-tests (because most biologic data does not follow a normal distribution). A purist does non-parametric tests using PROC NPAR1WAY:
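A sketch of a non-parametric comparison between groups, assuming a hypothetical continuous variable AGE and using SIDE as the grouping variable:

```sas
proc npar1way data=NAMEOFDATASET wilcoxon;
    class SIDE;   /* grouping variable */
    var AGE;      /* hypothetical continuous variable */
run;
```

With two groups this gives the Wilcoxon rank-sum test; with more than two groups the same call reports the Kruskal-Wallis test.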

Pearson and Spearman’s correlations are all over the scientific literature, and PROC CORR gives you both:
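A minimal sketch, assuming two hypothetical continuous variables, AGE and VOLUME (for example, milliliters of Deflux injected):

```sas
/* Request both Pearson and Spearman correlation coefficients */
proc corr data=NAMEOFDATASET pearson spearman;
    var AGE VOLUME;
run;
```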


PROC PLOT to make a scatter plot for a linear regression:
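The sketch below plots one hypothetical continuous variable (VOLUME) against another (AGE). PROC PLOT only draws the points, so a PROC REG step is added here as one possible way to get the fitted regression line and its statistics; that second step is an addition for illustration.

```sas
/* Scatter plot: vertical variable * horizontal variable */
proc plot data=NAMEOFDATASET;
    plot VOLUME*AGE;
run;
quit;

/* Fit the linear regression of VOLUME on AGE */
proc reg data=NAMEOFDATASET;
    model VOLUME = AGE;
run;
quit;
```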


And last but not least, the extremely useful survival curves (PROC LIFETEST) and Cox regression modeling (PROC PHREG):
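A sketch of both procedures follows. FOLLOWUP (time to event or last follow-up), FAILED (1=event, 0=censored), and AGE are hypothetical variables; PROCEDURE is the grouping variable from the earlier examples.

```sas
/* Kaplan-Meier survival curves, one per procedure type */
proc lifetest data=NAMEOFDATASET plots=survival;
    time FOLLOWUP*FAILED(0);     /* 0 marks censored patients */
    strata PROCEDURE;
run;

/* Cox proportional hazards model with hazard ratio confidence limits */
proc phreg data=NAMEOFDATASET;
    class PROCEDURE (ref='0') / param=ref;
    model FOLLOWUP*FAILED(0) = PROCEDURE AGE / risklimits;
run;
```

The strata statement in PROC LIFETEST also produces a log-rank test comparing the curves.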



Of course, none of the above is simple, but with some patience, persistence, trial and error, and Google, it is very doable. Ask any questions below or send us an email and we will gladly help out with your specific project!





