Biostatistics for Busy Doctors: The Basics

Clinicians need a basic understanding of biostatistical variables and the tests used to compare them. The following are the 20% of statistical terms that appear in 80% of journal articles.



There are 3 types of variables:

  1. Fixed characteristics (sex, gender)
  2. Modifiable characteristics (blood pressure, blood glucose)
  3. Events (death, surgical complication, infection, metastasis-free survival).


Different statistical tests are used depending on the type of variable in question.

  • For a modifiable characteristic such as blood pressure, parametric tests (t-test, ANOVA) or nonparametric tests (Mann-Whitney U) can be used.
  • For events, a chi-square test, a Fisher exact test or a survival analysis (Kaplan-Meier) can be done.
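As a rough illustration of the tests used for events, the sketch below computes a chi-square statistic and a two-sided Fisher exact p-value for a 2x2 table using only the Python standard library. The infection counts and the function names are made up for the example; in practice a statistics package would normally be used.

```python
from math import comb

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value: add up the probabilities of every table
    with the same margins that is no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x events in the first row
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical data: (infected, not infected) in treated vs. control patients
treated = (2, 18)
control = (9, 11)

chi2 = chi_square_2x2(*treated, *control)
p_fisher = fisher_exact_2x2(*treated, *control)
print(f"chi-square = {chi2:.2f}, Fisher exact p = {p_fisher:.3f}")
```

The Fisher exact test is usually preferred when some of the cell counts are small, as they are here.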


An unpaired t-test is used for unpaired data, for example comparing cholesterol levels among college students from 2 different cities.

A paired t-test is used for paired data, for example comparing cholesterol levels in the same subject before and after a diet.
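To show why the distinction matters, here is a small sketch that computes the t statistic both ways on the same made-up cholesterol values. The data and function names are assumptions for illustration, and the unpaired version uses Welch's form of the t-test (which does not assume equal variances). Only the t statistic is computed; turning it into a p-value needs the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def unpaired_t(x, y):
    """Welch's t statistic for two independent groups."""
    standard_error = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / standard_error

def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the within-subject differences."""
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical cholesterol values (mg/dL) for 6 subjects before and after a diet
before = [220, 198, 245, 210, 232, 205]
after = [210, 190, 236, 204, 221, 199]

t_unpaired = unpaired_t(before, after)
t_paired = paired_t(before, after)
print(f"analyzed as unpaired: t = {t_unpaired:.2f}")
print(f"analyzed as paired:   t = {t_paired:.2f}")
```

With these numbers the unpaired t is about 0.85 and the paired t is about 9.9. Every subject improved by a similar amount, but the differences between subjects are large, so ignoring the pairing hides the effect.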


Linear Regression 


Another way of analyzing data is with an XY plot, where the value of one modifiable variable in one subject (blood cholesterol) is plotted against the value of another modifiable variable in the same subject (consumption of butter) to see if one is associated with the other. These associations are usually referred to as correlations. When linear, they can be represented by an equation with a Y-intercept and a slope (simple linear regression).
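The slope, the Y-intercept and the correlation coefficient can be computed by hand with the usual least-squares formulas. The sketch below does this for made-up butter and cholesterol values; the numbers and function names are illustrative only.

```python
from math import sqrt
from statistics import mean

def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: butter intake (g/day) and blood cholesterol (mg/dL)
butter = [0, 10, 20, 30, 40]
cholesterol = [180, 190, 205, 210, 225]

slope, intercept = simple_linear_regression(butter, cholesterol)
r = pearson_r(butter, cholesterol)
print(f"cholesterol = {intercept:.1f} + {slope:.2f} * butter, r = {r:.3f}")
```

Here the slope reads as "each extra gram of butter per day goes with about 1.1 mg/dL more cholesterol" in this invented sample. An association like this does not by itself show that one variable causes the other.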


Statistical tests when controlling for other variables: “models” 

A “model” is an equation that includes a set of independent (predictive) variables that together help to calculate a dependent (outcome) variable. There is a lot of freedom as to how many variables go into the model, and these variables are rarely pre-specified. Researchers can then test various models (by adding or removing variables) to see which one predicts the outcome best.

Multiple linear regression

In multiple linear regression, an equation uses multiple independent variables to calculate the value of a dependent modifiable variable. For example, if blood pressure is the dependent (outcome) variable, the independent variables can be weight, race, sex, cholesterol level, experimental drug (0=no, 1=yes), etc. The t-test and simple linear regression are the equivalent of multiple linear regression when no covariates are used.
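A minimal sketch of such a model, fitted by ordinary least squares in plain Python, is shown below. The patients, the helper names and the "true" coefficients are all invented: the outcome is built without noise from known coefficients (intercept 90, +0.5 per kg, -10 for the drug) so that the fitted values can be checked by eye. Real data would be noisy and would normally be fitted with a statistics package.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear_model(rows, y):
    """Ordinary least squares; returns [intercept, one coefficient per predictor]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical patients: (weight in kg, experimental drug 0=no / 1=yes)
patients = [(60, 0), (70, 1), (80, 0), (90, 1), (100, 0), (65, 1)]
# Noise-free systolic pressure generated from known coefficients (an assumption)
systolic = [90 + 0.5 * weight - 10 * drug for weight, drug in patients]

intercept, per_kg, drug_effect = fit_linear_model(patients, systolic)
print(f"systolic = {intercept:.1f} + {per_kg:.2f} * weight + {drug_effect:.1f} * drug")
```

The drug coefficient is the difference in blood pressure between treated and untreated patients of the same weight, which is what "controlling for" weight means.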

Multiple logistic regression

Multiple logistic regression is used to predict the odds of an event happening, such as stroke or infection. As in multiple linear regression, several independent variables are entered together, and the effect of each one is reported as an odds ratio. Without confounding variables to adjust for, logistic regression is analogous to the chi-square or Fisher exact test.
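For a single yes/no predictor and no covariates, the odds ratio that a logistic model would report is the same one that can be computed directly from a 2x2 table. The sketch below does that calculation and adds an approximate 95% confidence interval using the log (Woolf) method. The stroke counts and the function name are made up for illustration.

```python
from math import exp, log, sqrt

def odds_ratio(a, b, c, d):
    """Odds ratio with an approximate 95% confidence interval (log method).
    Exposed group: a events, b non-events. Unexposed group: c events, d non-events."""
    ratio = (a / b) / (c / d)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(odds ratio)
    lower = exp(log(ratio) - 1.96 * standard_error)
    upper = exp(log(ratio) + 1.96 * standard_error)
    return ratio, lower, upper

# Hypothetical data: stroke in 30 of 100 smokers vs. 15 of 100 nonsmokers
ratio, lower, upper = odds_ratio(30, 70, 15, 85)
print(f"odds ratio = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An odds ratio above 1 with a confidence interval that excludes 1 suggests the event is more likely in the exposed group. A full logistic model would report one such odds ratio per variable, each adjusted for the others.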

A Cox regression model is used when the outcome is the time to an event (survival time); the effect of each variable is reported as a hazard ratio. Kaplan-Meier curves and the log-rank test are the analogous methods without covariates.
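The Kaplan-Meier estimate itself is simple enough to compute by hand: at each time an event occurs, the running survival probability is multiplied by the fraction of at-risk patients who did not have the event. The sketch below uses made-up follow-up times, with 0 marking a censored patient (one who left the study or reached its end without the event). The function name and data are illustrative assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time an event occurred.
    events[i] is 1 if the event was observed at times[i], 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0  # everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months; died = 0 means the patient was censored
months = [2, 3, 3, 5, 8, 8, 12]
died = [1, 1, 0, 1, 1, 0, 0]

curve = kaplan_meier(months, died)
for month, probability in curve:
    print(f"month {month}: estimated survival {probability:.3f}")
```

Censored patients count as at risk until they leave, which is how the method uses partial follow-up instead of discarding it.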



3 types of variables exist: fixed characteristics, modifiable characteristics and events. Data can be paired or unpaired. Models are mathematical equations where 2 or more variables predict either the value of a modifiable variable or the risk/survival time of an event.
