T-Test: Independent Samples

A t-test is used to determine if two sets of data are different enough to conclude that the underlying populations from which the data were drawn are also different. In a controlled experiment, a t-test could be used to determine if control and experimental conditions differed after the application of some treatment, such as evaluating the effectiveness of some learning intervention. In survey research, a t-test could be used to determine if responses drawn from different populations differ, such as evaluating whether domestic and international students view the Georgia Tech OMS program equally favorably. The following sources explain simple t-tests.

The T-Test
William M.K. Trochim, Oct 20, 2006
The t-test assesses whether the means of two groups are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups, and especially appropriate as the analysis for the posttest-only two-group randomized experimental design.
Trochim, W. M. K. (2006) The T-Test. Retrieved from: http://www.socialresearchmethods.net/kb/stat_t.php

What Is a t-test? And Why Is It Like Telling a Kid to Clean Up that Mess in the Kitchen?
Patrick Runkel, Jun 10, 2013
A t-test is one of the most frequently used procedures in statistics. But even people who frequently use t-tests often don't know exactly what happens when their data are wheeled away and operated upon behind the curtain using statistical software like Minitab.
Runkel, P. (2013) What Is a t-test? And Why Is It Like Telling a Kid to Clean Up that Mess in the Kitchen? Retrieved from: http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-is-a-t-test-and-why-is-it-like-telling-a-kid-to-clean-up-that-mess-in-the-kitchen

T-Test (Independent Samples)
Statwing
A t-test helps you compare whether two groups have different average values (for example, whether men and women have different average heights).
T-Test (Independent Samples) (n.d.) Retrieved from: http://docs.statwing.com/examples-and-definitions/t-test/
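As a rough sketch of the arithmetic behind an independent-samples t-test, the snippet below computes the t statistic by hand using only Python's standard library. It uses Welch's variant, which does not assume that the two groups have equal variances. The scores and the function name welch_t are made up for illustration, and turning the statistic into a p-value still requires a t-distribution table or statistical software.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's independent-samples t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Squared standard error contributed by each group.
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical post-test scores for a control and a treatment group.
control = [72, 75, 68, 80, 77, 74]
treatment = [78, 85, 82, 88, 79, 84]

t_stat, deg_freedom = welch_t(treatment, control)
print(f"t = {t_stat:.3f}, df = {deg_freedom:.2f}")
```

A large absolute t value relative to the t distribution with the reported degrees of freedom suggests that the group means differ by more than chance alone would explain.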

T-Test: Matched Pairs

Simple t-tests will take care of many analyses, but there are slightly more complicated versions for more complex analyses. For example, the independent-samples t-test assumes that the two groups are unrelated: no observation in one group is tied to an observation in the other. But what if you wanted to test the effectiveness of a learning tool without doing a controlled experiment? What if you simply wanted to evaluate whether the students knew more after using the tool than they knew before? Here each student contributes two scores, so the two sets of data are not independent. For that, you would use a matched (or paired) t-test, where you pair up connected values. The following sources explain matched t-tests.

Inferences from Matched Pairs
Mario F. Triola, Jan 1, 2010
From Chapter 8 of Elementary Statistics
Triola, M. F. (2010). Elementary statistics. Boston: Addison-Wesley.

Dependent T-Test for Paired Samples
Laerd Statistics
The dependent t-test is testing the null hypothesis that there are no differences between the means of the two related groups. If we get a statistically significant result, we can reject the null hypothesis that there are no differences between the means in the population and accept the alternative hypothesis that there are differences between the means in the population.
Dependent T-Test for Paired Samples (n.d.) Retrieved from: https://statistics.laerd.com/statistical-guides/dependent-t-test-statistical-guide-3.php

Hypothesis Test: Difference Between Paired Means
Stat Trek
This lesson explains how to conduct a hypothesis test for the difference between paired means.
Hypothesis Test: Difference Between Paired Means. (n.d.). Retrieved April 30, 2017, from http://stattrek.com/hypothesis-test/paired-means.aspx?Tutorial=AP

Paired t-test
Duke University
This tutorial will walk you through the use of the paired t-test.
Paired t-test (n.d.) Retrieved from: http://sites.nicholas.duke.edu/statsreview/means/paired/
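To make the idea of pairing concrete, here is a minimal sketch of a matched-pairs t statistic: it reduces each student's two scores to a single difference and then tests whether the mean difference is far from zero. The pre-test and post-test scores and the function name paired_t are hypothetical, and a p-value would still have to be looked up from a t distribution with the returned degrees of freedom.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, degrees_of_freedom) for a matched-pairs t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # One difference per subject: after minus before.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference.
    se_diff = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se_diff, n - 1


# Hypothetical scores for the same five students before and after using a tool.
pre_scores = [60, 72, 55, 80, 68]
post_scores = [65, 75, 61, 82, 74]

paired_t_stat, paired_df = paired_t(pre_scores, post_scores)
print(f"t = {paired_t_stat:.3f}, df = {paired_df}")
```

Because the test works on per-student differences, stable differences between students (some start high, some start low) drop out of the calculation.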

T-Test: One-Sample

A third kind of t-test, the one-sample t-test, can be used when we know the population mean and want to evaluate whether a particular sample is consistent with that population mean. For example, we may want to evaluate whether incoming Georgia Tech OMS students have average GRE scores that match the overall GRE average. The overall GRE average is known, so we can take one sample of incoming OMS students and compare their mean GRE score to it. The following sources explain one-sample t-tests.

One Sample t Test
Kent State, Apr 12, 2017
The One Sample t Test determines whether the sample mean is statistically different from a known or hypothesized population mean.
LibGuides: SPSS Tutorials: One Sample t Test. (n.d.). Retrieved April 30, 2017, from http://libguides.library.kent.edu/SPSS/OneSampletTest

One-Sample t-Test
Emory University
The one-sample t-test is used when we want to know whether our sample comes from a particular population but we do not have full population information available to us.
One-Sample t-Test. (n.d.). Retrieved April 29, 2017, from http://www.psychology.emory.edu/clinical/bliwise/Tutorials/TOM/meanstests/tone.htm

Independent One-Sample T-Test
Explorable
An independent one-sample t-test is used to test whether the average of a sample differs significantly from a population mean, a specified value μ0.
Independent One-Sample T-Test (n.d.) Retrieved from: https://explorable.com/independent-one-sample-t-test
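The one-sample case can be sketched in a few lines: the t statistic is the gap between the sample mean and the known population mean, measured in standard errors. The scores, the population mean of 150, and the function name one_sample_t below are all hypothetical values chosen for illustration, not real GRE figures.

```python
import math
import statistics


def one_sample_t(sample, population_mean):
    """Return (t, degrees_of_freedom) for a one-sample t-test."""
    n = len(sample)
    # Standard error of the sample mean.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - population_mean) / standard_error
    return t, n - 1


# Hypothetical scores for eight incoming students and a hypothetical
# known population mean.
sample_scores = [158, 162, 155, 160, 165, 159, 161, 157]
known_mean = 150

one_t_stat, one_df = one_sample_t(sample_scores, known_mean)
print(f"t = {one_t_stat:.3f}, df = {one_df}")
```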

T-Test: Calculators

Generally, though, you won't do the math for t-tests by hand. You might use advanced statistical software like SPSS or R, but you can also take advantage of simple online calculators like the ones below.

One sample t test
GraphPad Software

T-Test Calculator for 2 Dependent Means
Social Science Statistics
From SocialScienceStatistics.com

t-test Calculator
GraphPad Software

ANOVA

t-tests test for differences between two groups. However, if you read the sources above carefully, you may notice something problematic. t-tests usually use a confidence level of 95% (a significance level of 0.05), which means that when no real difference exists, roughly one test in twenty will still report one. What if we want to compare five different groups for differences between any pair of groups? We would need ten pairwise comparisons, raising the odds of getting at least one false positive to roughly 40% if the tests were independent. An Analysis of Variance, or ANOVA, test helps prevent this by giving one test that checks for differences among multiple groups at once.
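The multiple-comparisons concern can be checked with a little arithmetic. The sketch below counts the pairwise comparisons among k groups and estimates the chance of at least one false positive across them, under the simplifying assumption that the tests are independent (pairwise tests on shared groups are not exactly independent, so treat the result as an approximation). The function names are made up for this example.

```python
import math


def pairwise_comparisons(num_groups):
    """Number of distinct pairs among num_groups groups: C(k, 2)."""
    return math.comb(num_groups, 2)


def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


num_pairs = pairwise_comparisons(5)
error_rate = familywise_error_rate(num_pairs)
print(f"{num_pairs} comparisons, family-wise error rate = {error_rate:.3f}")
```

With five groups there are ten distinct pairs, and the chance of at least one spurious result at a 0.05 significance level is far higher than 5%.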

Oneway Analysis of Variance
Rosie Cornish, Jan 1, 2006
Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments when treatments, processes, materials or products are being compared.
Cornish, R. (2006) Oneway Analysis of Variance. Retrieved from: http://www.statstutor.ac.uk/resources/uploaded/onewayanova.pdf

ANOVA: ANalysis Of VAriance between groups
St. John's University
ANOVA: ANalysis Of VAriance between groups (n.d.) Retrieved from: http://www.physics.csbsju.edu/stats/anova.html

ANOVA Calculator
St. John's University
Online ANOVA calculator

ANOVA
Explorable
The Analysis Of Variance, popularly known as the ANOVA, can be used in cases where there are more than two groups.
ANOVA (n.d.) Retrieved from: https://explorable.com/anova

ANOVA
Eric Weisstein, Wolfram MathWorld
Analysis of Variance. A statistical test for heterogeneity of means by analysis of group variances. ANOVA is implemented as ANOVA[data] in the Wolfram Language package ANOVA.
Weisstein, E. W. (n.d.) ANOVA. From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/ANOVA.html

Introduction to Analysis of Variance
David M. Lane
Learning Objectives 1) What null hypothesis is tested by ANOVA 2) Describe the uses of ANOVA
Lane, D. M. (n.d.) Introduction to Analysis of Variance. Retrieved from: http://onlinestatbook.com/2/analysis_of_variance/intro.html

One-Way Analysis of Variance for Independent or Correlated Samples
Vassar Stats
Online ANOVA calculator
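For readers who want to see what a one-way ANOVA actually computes, the following sketch derives the F statistic as the ratio of between-group variance to within-group variance. The three groups of scores and the function name one_way_anova_f are invented for illustration, and the F value would still need to be compared against an F distribution (via a table or software) to obtain a p-value.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    num_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [statistics.mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = num_groups - 1
    df_within = n_total - num_groups
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three groups.
score_groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]

f_value, df_b, df_w = one_way_anova_f(score_groups)
print(f"F = {f_value:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A single F test replaces the whole family of pairwise t-tests; if it is significant, follow-up (post hoc) comparisons can identify which groups differ.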

MANOVA

A MANOVA, or Multivariate Analysis of Variance, extends ANOVA to several outcome variables at once: it evaluates whether one or more categorical variables predict variance across a set of dependent variables considered jointly. For example, a MANOVA could tell us whether gender and international status, including any interaction between them, predict students' ratings of several aspects of the OMS program. Here are some sources on MANOVA.

A Primer on Multivariate Analysis of Variance (MANOVA) for Behavioral Scientists
Russell Warne, Nov 18, 2014
Reviews of statistical procedures (e.g., Bangert & Baumberger, 2005; Kieffer, Reese, & Thompson, 2001; Warne, Lazo, Ramos, & Ritter, 2012) show that one of the most common multivariate statistical methods in psychological research is multivariate analysis of variance (MANOVA). However, MANOVA and its associated procedures are often not properly understood, as demonstrated by the fact that few of the MANOVAs published in the scientific literature were accompanied by the correct post hoc procedure, descriptive discriminant analysis (DDA). The purpose of this article is to explain the theory behind and meaning of MANOVA and DDA. I also provide an example of a simple MANOVA with real mental health data from 4,384 adolescents to show how to interpret MANOVA results.
Warne, R. T. (2014). A primer on multivariate analysis of variance (MANOVA) for behavioral scientists. Practical Assessment, Research & Evaluation, 19(17).

Multivariate Analysis of Variance (MANOVA)
Penn State University
The Multivariate Analysis of Variance (MANOVA) is the multivariate analog of the Analysis of Variance (ANOVA) procedure used for univariate data. We will introduce the Multivariate Analysis of Variance with the Romano-British Pottery data example.
Lesson 8: Multivariate Analysis of Variance (MANOVA). (n.d.). Retrieved April 29, 2017, from https://onlinecourses.science.psu.edu/stat505/node/159

MANOVA in R
Robert Kabacoff
If you have been analyzing ANOVA designs in traditional statistical packages, you are likely to find R's approach less coherent and user-friendly. A good online presentation on ANOVA in R can be found in ANOVA section of the Personality Project.
Kabacoff, R. (n.d.). ANOVA. Retrieved April 30, 2017, from http://www.statmethods.net/stats/anova.html

Linear Regression

t-tests and ANOVA both aim to determine whether groups differ. Thus, they rely on data that can be split based on some discrete categories, such as whether the data comes from a pre-test or a post-test, or whether the data comes from a first-year student or a second-year student. However, what if our explanatory variable is continuous instead of discrete? What if, for example, we wanted to see if class performance varied as a function of number of forum contributions or time spent watching videos? In this case, we would be looking at regression. The simplest form of regression is linear regression, where one variable is assumed to be a linear function of another.

Simple Linear Regression Using Excel (watch on YouTube)
Joseph Snider, Jun 15, 2010
YouTube video tutorial
Snider, J. (n.d.) Simple Linear Regression Using Excel. Retrieved from: https://www.youtube.com/watch?t=128&v=IO8RQ-V3Xmw

Introduction to Linear Regression
David M. Lane, Aug 16, 2015
Learning Objectives 1) Define linear regression 2) Identify errors of prediction in a scatter plot with a regression line
Lane, D. M. (2015) Introduction to Linear Regression. Retrieved from: http://onlinestatbook.com/2/regression/intro.html

Linear Regression Calculator
GraphPad

Linear Regression
Yale University
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model.
Linear Regression (n.d.). Retrieved April 30, 2017, from http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm

Simple Linear Regression
Penn State University
Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables. This lesson introduces the concept and basic procedures of simple linear regression. We will also learn two measures that describe the strength of the linear association that we find in data.
Lesson 1: Simple Linear Regression. (n.d.). Retrieved April 30, 2017, from https://onlinecourses.science.psu.edu/stat501/node/250
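As an illustration of what a simple linear regression fits, the sketch below computes the ordinary least squares slope and intercept directly from their textbook formulas. The forum-post counts, grades, and the function name fit_line are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sum of cross-products and sum of squared deviations in x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: forum contributions and final grades for five students.
forum_posts = [0, 2, 4, 6, 8]
grades = [65, 70, 74, 81, 85]

slope, intercept = fit_line(forum_posts, grades)
print(f"grade = {intercept:.2f} + {slope:.2f} * posts")
```

The slope estimates how much the outcome changes for each additional unit of the explanatory variable; it describes an association in the data, not necessarily a causal effect.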

Multiple Linear Regression

Linear regression attempts to find a linear relationship between one explanatory variable and one outcome variable. Multiple linear regression allows the same type of analysis, but with multiple explanatory variables. For example, perhaps class performance is a function of both time spent watching class videos and time spent interacting on Piazza. Multiple linear regression would allow us to evaluate both of these together. Here are some sources on multiple linear regression.

Multiple Linear Regression
Penn State University
In this lesson, we make our first (and last?!) major jump in the course. We move from the simple linear regression model with one predictor to the multiple linear regression model with two or more predictors.
Lesson 5: Multiple Linear Regression. (n.d.). Retrieved April 29, 2017, from https://onlinecourses.science.psu.edu/stat501/node/283

Multiple Linear Regression
Yale University
Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Every value of the independent variable x is associated with a value of the dependent variable y.
Multiple Linear Regression (n.d.). Retrieved April 30, 2017, from http://www.stat.yale.edu/Courses/1997-98/101/linmult.htm

Multiple Linear Regression
Penn State University
This lesson considers some of the more important multiple regression formulas in matrix form.
Lesson 5: Multiple Linear Regression. (n.d.). Retrieved April 30, 2017, from https://onlinecourses.science.psu.edu/stat501/node/283
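A minimal sketch of multiple linear regression follows, assuming the standard least squares approach of solving the normal equations (X'X)b = X'y. It uses plain Python with a small Gaussian elimination routine rather than a statistics package; in practice you would more likely rely on R, SPSS, or a numerical library. The data are invented and deliberately constructed so that grade = 50 + 2 * video hours + 3 * Piazza hours exactly, and both function names are made up for this example.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_multiple_regression(predictor_rows, outcomes):
    """Return [intercept, coef_1, coef_2, ...] from the normal equations."""
    # Prepend a constant 1.0 to each row so the first coefficient is the intercept.
    design = [[1.0] + list(row) for row in predictor_rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, outcomes)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: (video hours, Piazza hours) per student and their grades.
study_habits = [(1, 0), (2, 1), (3, 1), (4, 3), (5, 2)]
final_grades = [52, 57, 59, 67, 66]

coefficients = fit_multiple_regression(study_habits, final_grades)
print([round(c, 3) for c in coefficients])
```

Each coefficient estimates the change in the outcome for a one-unit change in that predictor while the other predictors are held fixed.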

Non-Linear Regression

You might speculate that the relationship between study time and class performance is non-linear. After all, is the difference between 100 hours of studying and 101 going to matter as much as the difference between 0 hours and 1? Non-linear regression generalizes linear regression to apply not just to straight lines, but to other functions, such as exponential and logarithmic functions. The underlying idea is the same, choosing the parameters that minimize the error between the fitted curve and the data, but the added flexibility can be useful. Here are some sources on non-linear regression.
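To make the diminishing-returns idea in this section concrete, the sketch below fits a logarithmic curve, score = a + b * ln(1 + hours), by transforming the explanatory variable and then running an ordinary least squares line fit. This is a linearizing shortcut that works because the model is linear in a and b; it is not a general non-linear least squares routine. The study hours and scores are invented (generated from a = 50, b = 10), and the function names are hypothetical.

```python
import math
import statistics


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def fit_log_curve(hours, scores):
    """Fit score = a + b * ln(1 + hours); return (a, b)."""
    # log1p(h) = ln(1 + h), which is defined at zero hours of study.
    transformed = [math.log1p(h) for h in hours]
    b, a = fit_line(transformed, scores)
    return a, b


# Hypothetical data generated from score = 50 + 10 * ln(1 + hours).
study_hours = [0, 1, 3, 7, 15]
test_scores = [50 + 10 * math.log1p(h) for h in study_hours]

a, b = fit_log_curve(study_hours, test_scores)


def predict(h):
    return a + b * math.log1p(h)


gain_first_hour = predict(1) - predict(0)
gain_hour_101 = predict(101) - predict(100)
print(f"a = {a:.2f}, b = {b:.2f}")
print(f"gain from hour 0 to 1: {gain_first_hour:.2f}")
print(f"gain from hour 100 to 101: {gain_hour_101:.2f}")
```

Under this fitted curve, the first hour of study is worth far more than the hundred-and-first, which is the pattern a straight line cannot capture.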

Beginner Courses

Inferential Statistics
Annemarie Zand Scholten, Emiel van Loon, Coursera
Inferential statistics are concerned with making inferences based on relations found in the sample, to relations in the population. Inferential statistics help us decide, for example, whether the differences between groups that we see in our data are strong enough to provide support for our hypothesis that group differences exist in general, in the entire population.

Intro to Descriptive Statistics
Udacity
Statistics is an important field of math that is used to analyze, interpret, and predict outcomes from data. Descriptive statistics will teach you the basic concepts used to describe data. This is a great beginner course for those interested in Data Science, Economics, Psychology, Machine Learning, Sports analytics and just about any other field.

Intro to Inferential Statistics
Udacity
Inferential statistics allows us to draw conclusions from data that might not be immediately obvious. This course focuses on enhancing your ability to develop hypotheses and use common tests such as t-tests, ANOVA tests, and regression to validate your claims.

Intro to Statistics
Udacity
Statistics is about extracting meaning from data. In this class, we will introduce techniques for visualizing relationships in data and systematic techniques for understanding the relationships using mathematics.

Explore Statistics with R
edX
Learn basic statistics in a practical, experimental way, through statistical programming with R, using examples from the health sciences.

Introduction to R Programming
edX
Learn the R statistical programming language, the lingua franca of data science in this hands-on course.

Quantitative Research Methods: Multivariate
MIT OpenCourseWare
This course is the second semester in the statistics sequence for political science and public policy offered in the Political Science Department at MIT. The intellectual thrust of the course is a presentation of statistical models for estimating causal effects of variables. The model of an effect is a conditional mean (though we might imagine other effect). The notion of causality is the effect of one variable on another holding all else constant.

Beyond Stats: Data Science & ML

Big Data in Education
University of Pennsylvania
Online and software-based learning tools have been used increasingly in education. This movement has resulted in an explosion of data, which can now be used to improve educational effectiveness and support basic research on learning.

Data Analyst Nanodegree
Udacity
We built this program with expert analysts and scientists at leading technology companies to ensure you master the exact skills necessary to build a career in data science.

Data Science Essentials
edX, Microsoft
This course is part of the Microsoft Professional Program Certificate in Data Science. Demand for data science talent is exploding. Develop your career as a data scientist, as you explore essential skills and principles with experts from Duke University and Microsoft.

Intro to Data Science
Udacity
The class will focus on breadth and present the topics briefly instead of focusing on a single topic in depth. This will give you the opportunity to sample and apply the basic techniques of data science.

Intro to Machine Learning
Udacity
This is a class that will teach you the end-to-end process of investigating data through a machine learning lens. It will teach you how to extract and identify useful features that best represent your data, a few of the most important machine learning algorithms, and how to evaluate the performance of your machine learning algorithms.

Machine Learning
Udacity, Stanford
Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you'll learn about not only the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you'll learn about some of Silicon Valley's best practices in innovation as it pertains to machine learning and AI.

Machine Learning
Udacity, Georgia Tech
This class is offered as CS7641 at Georgia Tech where it is a part of the Online Masters Degree (OMS). Taking this course here will not earn credit towards the OMS degree.