# Research Statistics

### T-Test: Independent Samples

A t-test is used to determine if two sets of data are different enough to conclude that the underlying populations from which the data were drawn are also different. In a controlled experiment, a t-test could be used to determine if control and experimental conditions differed after the application of some treatment, such as evaluating the effectiveness of some learning intervention. In survey research, a t-test could be used to determine if responses drawn from different populations differ, such as evaluating whether domestic and international students view the Georgia Tech OMS program equally favorably. The following sources explain simple t-tests.
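To make the comparison concrete, here is a minimal sketch of the statistic behind an independent-samples t-test. It uses Welch's unequal-variance form, which is one common variant and not necessarily the one the sources below emphasize. The scores are made-up illustration data, and `welch_t` is a hypothetical helper name. Turning the statistic into a p-value requires a t-distribution table or a statistics package (for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance uses the n - 1 (sample) denominator.
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Squared standard error contributed by each group.
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical post-test scores for a control and an experimental group.
control = [70, 72, 68, 75, 71]
treatment = [78, 74, 80, 77, 76]

t_stat, dof = welch_t(treatment, control)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A large absolute value of t relative to the t-distribution with the computed degrees of freedom suggests the group means differ by more than sampling noise would explain.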

The t-test assesses whether the means of two groups are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups, and especially appropriate as the analysis for the posttest-only two-group randomized experimental design. Trochim, W. M. K. (2006) The T-Test. Retrieved from: http://www.socialresearchmethods.net/kb/stat_t.php

What Is a t-test? And Why Is It Like Telling a Kid to Clean Up that Mess in the Kitchen? A t-test is one of the most frequently used procedures in statistics. But even people who frequently use t-tests often don't know exactly what happens when their data are wheeled away and operated upon behind the curtain using statistical software like Minitab. Runkel, P. (2013) What Is a t-test? And Why Is It Like Telling a Kid to Clean Up that Mess in the Kitchen? Retrieved from: http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen

A t-test helps you compare whether two groups have different average values (for example, whether men and women have different average heights). T-Test (Independent Samples) (n.d.) Retrieved from: http://docs.statwing.com/examples-and-definitions/t-test/

### T-Test: Matched Pairs

Simple t-tests will take care of many analyses, but there are slightly more complicated versions for more complex analyses. For example, independent-samples t-tests assume that the two groups are independent of each other. But what if you wanted to test the effectiveness of a learning tool without doing a controlled experiment? What if you simply wanted to evaluate whether the students knew more after using the tool than they knew before? For that, you would use a matched t-test, where you pair up connected values, such as each student's score before and after using the tool. The following sources explain matched t-tests.
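As a rough illustration of the pairing idea, the sketch below computes the matched-pairs t statistic from the per-student differences between a pre-test and a post-test. The scores are invented for the example and `paired_t` is a hypothetical helper name. The statistic still has to be compared against a t-distribution with n - 1 degrees of freedom to obtain a p-value.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, degrees_of_freedom) for a matched-pairs t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The test works on the per-pair differences, not the raw scores.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (stdev uses n - 1).
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1


# Hypothetical scores for the same five students before and after the tool.
pre_test = [60, 65, 70, 55, 80]
post_test = [68, 70, 75, 65, 82]

t_stat, dof = paired_t(pre_test, post_test)
print(f"t = {t_stat:.3f}, df = {dof}")
```

Because each student serves as their own comparison, differences between students drop out of the calculation, and only the change within each pair matters.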

Inferences from Matched Pairs, from Chapter 8 of Elementary Statistics. Triola, M. F. (2010). Elementary statistics. Boston: Addison-Wesley.

Dependent T-Test for Paired Samples. The dependent t-test is testing the null hypothesis that there are no differences between the means of the two related groups. If we get a statistically significant result, we can reject the null hypothesis that there are no differences between the means in the population and accept the alternative hypothesis that there are differences between the means in the population. Dependent T-Test for Paired Samples (n.d.) Retrieved from: https://statistics.laerd.com/statistical-guides/dependent-t-test-statistical-guide-3.php

Hypothesis Test: Difference Between Paired Means. This lesson explains how to conduct a hypothesis test for the difference between paired means. Hypothesis Test: Difference Between Paired Means. (n.d.). Retrieved April 30, 2017, from http://stattrek.com/hypothesis-test/paired-means.aspx?Tutorial=AP

This tutorial will walk you through the use of the paired t-test. Paired t-test (n.d.) Retrieved from: http://sites.nicholas.duke.edu/statsreview/means/paired/

### T-Test: One-Sample

A third kind of t-test, the one-sample t-test, can be used when we know the population mean and want to evaluate whether or not a particular sample matches that population mean. For example, we may want to evaluate whether incoming Georgia Tech OMS students have average GRE scores that match the overall GRE average. The overall GRE average is known, so we can take one sample of incoming OMS students and compare its mean GRE score to that average. The following sources explain one-sample t-tests.
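The calculation can be sketched in a few lines. The sample scores and the population mean below are hypothetical placeholders rather than real GRE figures, and `one_sample_t` is an illustrative helper name. As with the other t-tests, the resulting statistic is compared against a t-distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def one_sample_t(sample, population_mean):
    """Return (t, degrees_of_freedom) for a one-sample t-test."""
    n = len(sample)
    # Standard error of the sample mean (stdev uses n - 1).
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - population_mean) / se, n - 1


# Hypothetical scores for six incoming students (not real data).
sample_scores = [154, 158, 160, 156, 162, 158]
# Hypothetical known population average (not the real GRE average).
known_average = 155

t_stat, dof = one_sample_t(sample_scores, known_average)
print(f"t = {t_stat:.3f}, df = {dof}")
```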

The One Sample t Test determines whether the sample mean is statistically different from a known or hypothesized population mean. LibGuides: SPSS Tutorials: One Sample t Test. (n.d.). Retrieved April 30, 2017, from http://libguides.library.kent.edu/SPSS/OneSampletTest

The one-sample t-test is used when we want to know whether our sample comes from a particular population but we do not have full population information available to us. One-Sample t-Test. (n.d.). Retrieved April 29, 2017, from http://www.psychology.emory.edu/clinical/bliwise/Tutorials/TOM/meanstests/tone.htm

An independent one-sample t-test is used to test whether the average of a sample differs significantly from a population mean, a specified value μ0. Independent One-Sample T-Test (n.d.) Retrieved from: https://explorable.com/independent-one-sample-t-test

### T-Test: Calculators

Generally, though, you won't do the math for t-tests by hand. You might use advanced statistical software like SPSS or R, but you can also take advantage of simple online calculators like the ones below.

T-Test Calculator for 2 Dependent Means, from SocialScienceStatistics.com

### ANOVA

t-tests test for differences between two groups. However, if you read that information carefully, you may notice something problematic. t-tests usually use a confidence level of 95%, which means that about one in every twenty tests of groups that do not actually differ will still report a difference. What if we want to compare five different groups for differences between any pair of groups? We would need ten comparisons, raising the odds of getting at least one false positive. An Analysis of Variance, or ANOVA, test helps prevent this by giving one test that compares for differences among multiple groups.
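The arithmetic behind this concern can be checked directly. The sketch below counts the distinct pairs among five groups and estimates the chance of at least one false positive across all of those tests. It assumes the tests are independent, which is a simplification because pairwise comparisons share data. The group labels are placeholders.

```python
from itertools import combinations

groups = ["A", "B", "C", "D", "E"]  # placeholder labels for five groups
alpha = 0.05                        # per-test false-positive rate (95% confidence)

# Every unordered pair of groups needs its own t-test.
pairs = list(combinations(groups, 2))

# Probability that at least one test gives a false positive,
# under the simplifying assumption that the tests are independent.
familywise_error = 1 - (1 - alpha) ** len(pairs)

print(f"{len(pairs)} pairwise comparisons")
print(f"chance of at least one false positive: {familywise_error:.1%}")
```

Under that assumption, the chance of at least one spurious difference is about 40%, far above the 5% accepted for a single test.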

Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments when treatments, processes, materials or products are being compared. Cornish, R. (2006) Oneway Analysis of Variance. Retrieved from: http://www.statstutor.ac.uk/resources/uploaded/onewayanova.pdf

ANOVA: ANalysis Of VAriance between groups. ANOVA: ANalysis Of VAriance between groups (n.d.) Retrieved from: http://www.physics.csbsju.edu/stats/anova.html

Online ANOVA calculator

The Analysis Of Variance, popularly known as the ANOVA, can be used in cases where there are more than two groups. ANOVA (n.d.) Retrieved from: https://explorable.com/anova

Analysis of Variance. A statistical test for heterogeneity of means by analysis of group variances. ANOVA is implemented as ANOVA[data] in the Wolfram Language package ANOVA. Weisstein, E. W. (n.d.) ANOVA. From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/ANOVA.html

Introduction to Analysis of Variance. Learning Objectives: 1) What null hypothesis is tested by ANOVA 2) Describe the uses of ANOVA. Lane, D. M. (n.d.) Introduction to Analysis of Variance. Retrieved from: http://onlinestatbook.com/2/analysis_of_variance/intro.html

One-Way Analysis of Variance for Independent or Correlated Samples. Online ANOVA calculator
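For readers who want to see what these calculators compute, here is a minimal sketch of the one-way ANOVA F statistic: the variance between group means divided by the variance within groups. The three groups are toy numbers chosen so the arithmetic is easy to follow, and `one_way_anova_f` is a hypothetical helper name. A p-value would come from an F-distribution table or a statistics package.

```python
import statistics


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: three groups with means 5, 7, and 9.
group_1 = [4, 5, 6]
group_2 = [6, 7, 8]
group_3 = [8, 9, 10]

f_stat, df_b, df_w = one_way_anova_f(group_1, group_2, group_3)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```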

### MANOVA

A MANOVA, or Multivariate Analysis of Variance, extends ANOVA to evaluate whether groups differ across several outcome variables at once. For example, a MANOVA could tell us whether gender and international status, including any interaction between them, predict students' ratings of several aspects of the OMS program. Here are some sources on MANOVA.

A Primer on Multivariate Analysis of Variance (MANOVA) for Behavioral Scientists. Reviews of statistical procedures (e.g., Bangert & Baumberger, 2005; Kieffer, Reese, & Thompson, 2001; Warne, Lazo, Ramos, & Ritter, 2012) show that one of the most common multivariate statistical methods in psychological research is multivariate analysis of variance (MANOVA). However, MANOVA and its associated procedures are often not properly understood, as demonstrated by the fact that few of the MANOVAs published in the scientific literature were accompanied by the correct post hoc procedure, descriptive discriminant analysis (DDA). The purpose of this article is to explain the theory behind and meaning of MANOVA and DDA. I also provide an example of a simple MANOVA with real mental health data from 4,384 adolescents to show how to interpret MANOVA results. Warne, R. T. (2014). A primer on multivariate analysis of variance (MANOVA) for behavioral scientists. Practical Assessment, Research & Evaluation, 19(17).

Multivariate Analysis of Variance (MANOVA). The Multivariate Analysis of Variance (MANOVA) is the multivariate analog of the Analysis of Variance (ANOVA) procedure used for univariate data. We will introduce the Multivariate Analysis of Variance with the Romano-British Pottery data example. Lesson 8: Multivariate Analysis of Variance (MANOVA). (n.d.). Retrieved April 29, 2017, from https://onlinecourses.science.psu.edu/stat505/node/159

If you have been analyzing ANOVA designs in traditional statistical packages, you are likely to find R's approach less coherent and user-friendly. A good online presentation on ANOVA in R can be found in ANOVA section of the Personality Project. Kabacoff, R. (n.d.). ANOVA. Retrieved April 30, 2017, from http://www.statmethods.net/stats/anova.html

### Linear Regression

t-tests and ANOVA both aim to determine whether groups differ from one another. Thus, they rely on data that can be split based on some discrete categories, such as whether the data comes from a pre-test or a post-test, or whether the data comes from a first-year student or a second-year student. However, what if our explanatory variable is continuous instead of discrete? What if, for example, we wanted to see if class performance varied as a function of number of forum contributions or time spent watching videos? In this case, we would be looking at regression. The simplest form of regression is linear regression, where one variable is assumed to be a linear function of another.
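A minimal sketch of simple linear regression follows, using the ordinary least-squares formulas for the slope and intercept. The forum-contribution counts and grades are invented for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: forum contributions and final grade for five students.
forum_posts = [0, 2, 4, 6, 8]
grades = [70, 74, 79, 83, 89]

slope, intercept = fit_line(forum_posts, grades)
print(f"grade = {intercept:.2f} + {slope:.2f} * forum_posts")
```

The slope estimates how much the grade changes per additional forum contribution in this toy data set. It describes an association and does not by itself show that posting causes higher grades.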

Simple Linear Regression Using Excel. YouTube video tutorial. Snider, J. (n.d.) Simple Linear Regression Using Excel. Retrieved from: https://www.youtube.com/watch?t=128&v=IO8RQ-V3Xmw

Introduction to Linear Regression. Learning Objectives: 1) Define linear regression 2) Identify errors of prediction in a scatter plot with a regression line. Lane, D. M. (2015) Introduction to Linear Regression. Retrieved from: http://onlinestatbook.com/2/regression/intro.html

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model. Linear Regression (n.d.). Retrieved April 30, 2017, from http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm

Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables. This lesson introduces the concept and basic procedures of simple linear regression. We will also learn two measures that describe the strength of the linear association that we find in data. Lesson 1: Simple Linear Regression. (n.d.). Retrieved April 30, 2017, from https://onlinecourses.science.psu.edu/stat501/node/250

### Multiple Linear Regression

Linear regression attempts to find a linear relationship between one explanatory variable and one outcome variable. Multiple linear regression allows the same type of analysis, but with multiple explanatory variables. For example, perhaps class performance is a function of both time spent watching class videos and time spent interacting on Piazza. Multiple linear regression would allow us to evaluate both of these together. Here are some sources on multiple linear regression.
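The two-predictor case can be sketched with the normal equations, solved here by plain Gaussian elimination so the example needs no external libraries. The data are synthetic: grades are generated exactly from an assumed rule (50 + 2 per video hour + 3 per Piazza hour) so that the fit can be checked against known coefficients. Both function names are hypothetical. In practice a statistics package would do this and also report standard errors.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple_regression(predictors, outcomes):
    """Return [intercept, b1, b2, ...] minimizing squared error."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1.0 is the intercept
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcomes)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic (video_hours, piazza_hours) pairs; grades follow an assumed rule.
hours = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
grades = [50 + 2 * video + 3 * piazza for video, piazza in hours]

intercept, b_video, b_piazza = fit_multiple_regression(hours, grades)
print(f"grade = {intercept:.2f} + {b_video:.2f} * video + {b_piazza:.2f} * piazza")
```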

In this lesson, we make our first (and last?!) major jump in the course. We move from the simple linear regression model with one predictor to the multiple linear regression model with two or more predictors. Lesson 5: Multiple Linear Regression. (n.d.). Retrieved April 29, 2017, from https://onlinecourses.science.psu.edu/stat501/node/283

Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Every value of the independent variable x is associated with a value of the dependent variable y. Multiple Linear Regression (n.d.). Retrieved April 30, 2017, from http://www.stat.yale.edu/Courses/1997-98/101/linmult.htm

This lesson considers some of the more important multiple regression formulas in matrix form. Lesson 5: Multiple Linear Regression. (n.d.). Retrieved April 30, 2017, from https://onlinecourses.science.psu.edu/stat501/node/283

### Non-Linear Regression

You might speculate that the relationship between study time and class performance is non-linear. After all, is the difference between 100 hours of studying and 101 going to be as significant as the difference between 0 hours and 1? Non-linear regression generalizes linear regression to apply not just to straight lines, but to other functions, such as exponential and logarithmic functions. The goal is still the same, finding the curve that best fits the data, but the fit usually has to be found iteratively by software, and the added flexibility can be useful. Here are some sources on non-linear regression.
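The diminishing-returns intuition can be illustrated with a logarithmic curve, score = a + b * ln(1 + hours). This is a deliberately simple case: the model is still linear in its parameters a and b, so it can be fit with ordinary least squares after transforming the predictor. Models with parameters inside an exponent or similar need the iterative fitting that the sources below describe, which this sketch does not attempt. The data are synthetic, generated from assumed values a = 40 and b = 10, and `fit_log_curve` is a hypothetical helper name.

```python
import math


def fit_log_curve(hours, scores):
    """Fit score = a + b * ln(1 + hours) by least squares; return (a, b)."""
    # Transform the predictor, then fit a straight line in the new variable.
    zs = [math.log1p(h) for h in hours]
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(scores) / n
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, scores))
    szz = sum((z - mean_z) ** 2 for z in zs)
    b = szy / szz
    a = mean_y - b * mean_z
    return a, b


# Synthetic data generated from assumed parameters a = 40, b = 10.
study_hours = [0, 1, 5, 20, 100]
scores = [40 + 10 * math.log1p(h) for h in study_hours]

a, b = fit_log_curve(study_hours, scores)

# Predicted gain from one extra hour at the start versus after 100 hours.
gain_first_hour = b * (math.log1p(1) - math.log1p(0))
gain_hour_101 = b * (math.log1p(101) - math.log1p(100))
print(f"a = {a:.2f}, b = {b:.2f}")
print(f"gain from hour 0 to 1: {gain_first_hour:.2f}")
print(f"gain from hour 100 to 101: {gain_hour_101:.2f}")
```

Under this fitted curve the first hour of study is worth far more than the 101st, which a straight line cannot express.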

Fitting curves to data using nonlinear regression: a practical and nonmathematical review. Many types of data are best analyzed by fitting a curve using nonlinear regression, and computer programs that perform these calculations are readily available. Like every scientific technique, however, a nonlinear regression program can produce misleading results when used inappropriately. This article reviews the use of nonlinear regression in a practical and nonmathematical manner to answer the following questions: Why is nonlinear regression superior to linear regression of transformed data? How does nonlinear regression differ from polynomial regression and cubic spline? How do nonlinear regression programs work? What choices must an investigator make before performing nonlinear regression? What do the final results mean? How can two sets of data or two fits to one set of data be compared? What problems can cause the results to be wrong? This review is designed to demystify nonlinear regression so that both its power and its limitations will be appreciated. Motulsky, H. J., & Ransnas, L. A. (1987). Fitting curves to data using nonlinear regression: a practical and nonmathematical review. The FASEB journal, 1(5), 365-374.

Before choosing nonlinear regression, make sure you don't really need another kind of regression. Also read about how nonlinear regression differs from linear regression. Data Driven Fitting. (n.d.). Retrieved April 30, 2017, from http://blogs.mathworks.com/loren/2011/01/13/data-driven-fitting/

Getting started with nonlinear regression: Distinguishing nonlinear regression from other kinds of regression. Getting started with nonlinear regression (n.d.). Retrieved April 29, 2017, from http://www.graphpad.com/guides/prism/6/curve-fitting/index.htm?stat_other_kinds_of_regression.htm

### Beginner Courses

Inferential statistics are concerned with making inferences based on relations found in the sample, to relations in the population. Inferential statistics help us decide, for example, whether the differences between groups that we see in our data are strong enough to provide support for our hypothesis that group differences exist in general, in the entire population.

Intro to Descriptive Statistics. Statistics is an important field of math that is used to analyze, interpret, and predict outcomes from data. Descriptive statistics will teach you the basic concepts used to describe data. This is a great beginner course for those interested in Data Science, Economics, Psychology, Machine Learning, Sports analytics and just about any other field.

Intro to Inferential Statistics. Inferential statistics allows us to draw conclusions from data that might not be immediately obvious. This course focuses on enhancing your ability to develop hypotheses and use common tests such as t-tests, ANOVA tests, and regression to validate your claims.

Statistics is about extracting meaning from data. In this class, we will introduce techniques for visualizing relationships in data and systematic techniques for understanding the relationships using mathematics.

### Advanced Courses

Learn basic statistics in a practical, experimental way, through statistical programming with R, using examples from the health sciences.

Learn the R statistical programming language, the lingua franca of data science, in this hands-on course.

Quantitative Research Methods: Multivariate. This course is the second semester in the statistics sequence for political science and public policy offered in the Political Science Department at MIT. The intellectual thrust of the course is a presentation of statistical models for estimating causal effects of variables. The model of an effect is a conditional mean (though we might imagine other effects). The notion of causality is the effect of one variable on another holding all else constant.

### Beyond Stats: Data Science & ML

Online and software-based learning tools have been used increasingly in education. This movement has resulted in an explosion of data, which can now be used to improve educational effectiveness and support basic research on learning.

We built this program with expert analysts and scientists at leading technology companies to ensure you master the exact skills necessary to build a career in data science.

This course is part of the Microsoft Professional Program Certificate in Data Science. Demand for data science talent is exploding. Develop your career as a data scientist, as you explore essential skills and principles with experts from Duke University and Microsoft.

The class will focus on breadth and present the topics briefly instead of focusing on a single topic in depth. This will give you the opportunity to sample and apply the basic techniques of data science.

This is a class that will teach you the end-to-end process of investigating data through a machine learning lens. It will teach you how to extract and identify useful features that best represent your data, a few of the most important machine learning algorithms, and how to evaluate the performance of your machine learning algorithms.

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you'll learn about not only the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you'll learn about some of Silicon Valley's best practices in innovation as it pertains to machine learning and AI.

This class is offered as CS7641 at Georgia Tech where it is a part of the Online Masters Degree (OMS). Taking this course here will not earn credit towards the OMS degree. |