Introductory Videos


Introduction to Controlled Experiments (watch on YouTube)

David Joyner introduces Controlled Experiments as part of Research Principles and Methodologies.

Joyner, D. & Udacity. (2016, June 6). Research Principles and Methodologies: Controlled Experiments Introductory Video. Retrieved from https://www.youtube.com/watch?v=HL2cXBeqb6U

Introductory Resources

Experimental Design

This paper discusses a simple experimental design: the two-group, program-versus-comparison-group design.

Trochim, W. M. K. (2006). Experimental Design. Retrieved April 30, 2017, from http://www.socialresearchmethods.net/kb/desexper.php
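A two-group design of the kind Trochim describes is typically analyzed by comparing the groups' outcome means. As a rough illustration (the score scale and simulated data below are invented, not from the paper), a Welch's t statistic can be computed with only the Python standard library:

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

random.seed(0)
# Hypothetical post-test scores: program group vs. comparison group
program    = [random.gauss(75, 10) for _ in range(50)]
comparison = [random.gauss(70, 10) for _ in range(50)]

t = welch_t(program, comparison)
print(f"difference in means: {statistics.mean(program) - statistics.mean(comparison):.2f}")
print(f"Welch t statistic:   {t:.2f}")
```

In practice the statistic would be compared against a t distribution to get a p-value; the sketch stops at the statistic to stay library-free.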

Experimental designs for research

This paper discusses experimental designs and their use in research in detail.

Saint-Germain, M. (1998, February 14). Experimental designs for research. Retrieved April 30, 2017, from http://web.csulb.edu/~msaintg/ppa696/

Experimental Method

This article discusses the different types of experiments.

McLeod, S. A. (2012). Experimental Method. Retrieved from www.simplypsychology.org/experimental-method.html

Experimental research

This article discusses experimental research in psychology for describing, explaining, predicting and controlling behavior and mental processes. It also discusses experimental design.

Boundless. (2016, September 20). Experimental research. Retrieved April 30, 2017, from https://www.boundless.com/psychology/textbooks/boundless-psychology-textbook/researching-psychology-2/types-of-research-studies-27/experimental-research-126-12661/

Experimental and Quasi-Experimental Designs for Research

This book is a survey of the different experimental and quasi-experimental designs. The survey draws not just from educational research but from the social sciences in general, and its methodological recommendations are broadly applicable.

Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin.

Scholarly Readings

What is Evidence-Based Education?

This paper argues that education should become more evidence-based. A distinction is made between using existing research and establishing high-quality educational research, and the need for high-quality systematic reviews and appraisals of educational research is made clear. Evidence-based education is not a panacea, but a set of principles and practices for enhancing educational policy and practice.

Davies, P. (1999). What is evidence-based education? British Journal of Educational Studies, 47(2), 108-121.

Practical guide to controlled experiments on the web: Listen to your customers not to the hippo

We provide several examples of controlled experiments with surprising results. We review the important ingredients of running controlled experiments, and discuss their limitations (both technical and organizational). We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses.

Kohavi, R., Henne, R. M., & Sommerfield, D. (2007, August). Practical guide to controlled experiments on the web: listen to your customers not to the hippo. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 959-967). ACM.
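The randomization and hashing techniques the authors discuss can be sketched roughly as follows. The bucket count, experiment name, and choice of SHA-256 here are illustrative assumptions, not the paper's implementation:

```python
import hashlib

def assign(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to treatment or control by hashing
    the (experiment, user_id) pair into one of 10,000 buckets.
    Including the experiment name in the hash keeps a user's bucket in one
    experiment uncorrelated with their bucket in another."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000
    return "treatment" if bucket < treatment_share * 10_000 else "control"

# Assignment is stable: the same user always lands in the same arm.
assert assign("user-42", "new-checkout") == assign("user-42", "new-checkout")
```

Stable, per-experiment assignment is one of the properties the paper argues is harder to get right in practice than it looks.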

Controlled experiments on the web: survey and practical guide

We focus on several areas that are critical to experimentation, including statistical power, sample size, and techniques for variance reduction. We describe common architectures for experimentation systems and analyze their advantages and disadvantages. We evaluate randomization and hashing techniques, which we show are not as simple in practice as is often assumed. Controlled experiments typically generate large amounts of data, which can be analyzed using data mining techniques to gain deeper understanding of the factors influencing the outcome of interest, leading to new hypotheses and creating a virtuous cycle of improvements. Organizations that embrace controlled experiments with clear evaluation criteria can evolve their systems with automated optimizations and real-time analyses.

Kohavi, R., Longbotham, R., Sommerfield, D., & Henne, R. M. (2009). Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, 18(1), 140-181.
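The power and sample-size considerations both Kohavi et al. papers stress can be made concrete with the standard two-sample normal approximation, n per arm ≈ 2 * (z_alpha/2 + z_beta)^2 * sigma^2 / delta^2. The sketch below is a textbook formula, not code from the paper:

```python
import math
from statistics import NormalDist

def n_per_arm(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate sample size per arm for a two-sided test to detect a mean
    difference `delta` when the outcome's standard deviation is `sigma`
    (normal approximation; assumes equal arms and known variance)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Detecting a 1-point lift on a metric with a 10-point standard deviation:
print(n_per_arm(delta=1.0, sigma=10.0))  # roughly 1,570 users per arm
```

The quadratic dependence on sigma/delta is why variance-reduction techniques, discussed in both papers, matter: halving the standard deviation cuts the required sample size by a factor of four.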

Seven Pitfalls to Avoid when Running Controlled Experiments on the Web

In this follow-on paper, we focus on pitfalls we have seen after running numerous experiments at Microsoft. The pitfalls include a wide range of topics, such as assuming that common statistical formulas used to calculate standard deviation and statistical power can be applied, and ignoring robots in analysis (a problem unique to online settings). Online experiments allow for techniques like gradual ramp-up of treatments to avoid the possibility of exposing many customers to a bad (e.g., buggy) Treatment. With that ability, we discovered that it’s easy to incorrectly identify the winning Treatment because of Simpson’s paradox.

Crook, T., Frasca, B., Kohavi, R., & Longbotham, R. (2009, June). Seven pitfalls to avoid when running controlled experiments on the web. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1105-1114). ACM.
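Simpson's paradox, one of the pitfalls described above, is easy to reproduce with made-up numbers: a treatment can win in every daily segment yet lose in the pooled totals when its traffic share changes mid-experiment, as during a ramp-up. The figures below are invented purely for illustration:

```python
# Treatment wins in every segment, yet loses in the pooled totals, because
# the treatment's traffic share changed between days (a ramp-up).
data = {
    # segment: (treatment conversions, treatment n, control conversions, control n)
    "day 1": (6,   50,   500, 5000),   # 12.0% vs 10.0% -> treatment wins
    "day 2": (125, 2500, 100, 2500),   #  5.0% vs  4.0% -> treatment wins
}

for day, (t_conv, t_n, c_conv, c_n) in data.items():
    print(f"{day}: treatment {t_conv/t_n:.1%} vs control {c_conv/c_n:.1%}")

tc = sum(v[0] for v in data.values()); tn = sum(v[1] for v in data.values())
cc = sum(v[2] for v in data.values()); cn = sum(v[3] for v in data.values())
print(f"pooled: treatment {tc/tn:.1%} vs control {cc/cn:.1%}")  # treatment now loses
```

The reversal happens because most treatment traffic arrived on the low-converting day; segment-level comparisons, not the pooled totals, give the honest answer.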

Lab Experiments Are a Major Source of Knowledge in the Social Sciences

Laboratory experiments are a widely used methodology for advancing causal knowledge in the physical and life sciences. With the exception of psychology, the adoption of laboratory experiments has been much slower in the social sciences, although during the last two decades, the use of lab experiments has accelerated. Nonetheless, there remains considerable resistance among social scientists who argue that lab experiments lack ‘realism’ and ‘generalizability’. In this article we discuss the advantages and limitations of laboratory social science experiments by comparing them to research based on non-experimental data and to field experiments. We argue that many recent objections against lab experiments are misguided and that even more lab experiments should be conducted.

Falk, A., & Heckman, J. J. (2009). Lab experiments are a major source of knowledge in the social sciences. Science, 326(5952), 535-538.

Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained

Online controlled experiments are often utilized to make data-driven decisions at Amazon, Microsoft, eBay, Facebook, Google, Yahoo, Zynga, and at many other companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and mining of online controlled experiments at scale—thousands of experiments now—has taught us many lessons. These exemplify the proverb that the difference between theory and practice is greater in practice than in theory. We present our learnings as they happened: puzzling outcomes of controlled experiments that we analyzed deeply to understand and explain. Each of these took multiple-person weeks to months to properly analyze and get to the often surprising root cause. The root causes behind these puzzling results are not isolated incidents; these issues generalized to multiple experiments.

Kohavi, R., Deng, A., Frasca, B., Longbotham, R., Walker, T., & Xu, Y. (2012, August). Trustworthy online controlled experiments: Five puzzling outcomes explained. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 786-794). ACM.

Controlled Experiments in Education

Identifying and Implementing Educational Practices Supported by Rigorous Evidence

This Guide seeks to provide educational practitioners with user-friendly tools to distinguish practices supported by rigorous evidence from those that are not.

Coalition for Evidence-based Policy. (2003). Identifying and implementing educational practices supported by rigorous evidence: A user-friendly guide. US Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance.

Randomized Trials and Quasi-Experiments in Education Research

The 2001 No Child Left Behind (NCLB) Act promises a series of significant reforms. The hope is that these reforms will jump-start under-performing American schools. Most public discussion of the Act has focused on the mandate for test-based school accountability and the federal endorsements of charter schools and other forms of school choice. Other important provisions include changes in funding rules for states and a new emphasis on reading instruction. The NCLB Act also repeatedly calls for education policy to rely on a foundation of scientifically based research. Although this appears to be a bland technical statement, it strikes me as potentially at least as significant as other components of the Act.

Angrist, J. D. (2003). Randomized trials and quasi-experiments in education research. NBER Reporter Online, (Summer 2003), 11-14.

Controlled experiment replication in evaluation of e-learning system's educational influence

We believe that every effectiveness evaluation should be replicated at least once in order to verify the original results and to indicate the evaluated e-learning system's advantages or disadvantages. This paper presents the methodology for conducting a controlled experiment replication, as well as results of a controlled experiment and an internal replication that investigated the effectiveness of the intelligent authoring shell eXtended Tutor-Expert System (xTEx-Sys). The initial and the replicated experiment were based on our approach that combines a classical two-group experimental design with a factorial design. A trait that distinguishes this approach from others is the existence of an arbitrary number of checkpoint-tests to determine effectiveness in intermediate states. We call it a pre- and post-test control group experimental design with checkpoint-tests. The gained results revealed small or even negative effect sizes, which could be explained by the fact that the xTEx-Sys's domain knowledge presentation is rather novel for students and therefore difficult to grasp and apply in the earlier phases of the experiment. In order to develop and improve xTEx-Sys, further experiments must be conducted.

Grubišić, A., Stankov, S., Rosić, M., & Žitko, B. (2009). Controlled experiment replication in evaluation of e-learning system's educational influence. Computers & Education, 53(3), 591-602.

Randomized Controlled Experiments in Education

Randomized controlled trials (RCTs) are becoming an important tool for the evaluation of social policies. They borrow from medical science, where it has been a standard since the Second World War, the principle of comparing a treated group and a control group chosen by random assignment. They provide a robust and transparent way of eliciting the causal impact of an intervention. The method is not new in educational science, especially in relation to cognitive science, but it is mostly used in very controlled, quasi-laboratory environments, and on rather small samples.

Bouguen, A., & Gurgand, M. (2012). Randomized controlled experiments in education.

Quasi-Experiments in Schools: The Case for Historical Cohort Control Groups

There is increased emphasis on using experimental and quasi-experimental methods to evaluate educational programs; however, educational evaluators and school leaders are often faced with challenges when implementing such designs in educational settings. Use of a historical cohort control group design provides a viable option for conducting quasi-experiments in school-based outcome evaluation. A cohort is a successive group that goes through some experience together, such as a grade level or a training program. A historical cohort comparison group is a cohort group selected from pre-treatment archival data and matched to a subsequent cohort currently receiving a treatment. Although prone to the same threats to study validity as any quasi-experiment, issues related to selection, history, and maturation can be particularly challenging. However, use of a historical cohort control group can reduce noncomparability of treatment and control conditions through local, focal matching. In addition, a historical cohort control group design can alleviate concerns about denying program access to students in order to form a control group, minimize resource requirements and disruption to school routines, and make use of archival data schools and school districts collect and find meaningful.

Walser, T. M. (2014). Quasi-experiments in schools: the case for historical cohort control groups. Practical Assessment, Research & Evaluation, 19(6), 2.

Exemplary Controlled Experiments

What Level of Tutor Interaction is Best?

Abstract. Razzaq and Heffernan (2006) showed that scaffolding compared to hints on demand in an intelligent tutoring system could lead to higher averages on a middle school mathematics post-test. There were significant differences in performance by condition on individual items. For an item that proved to be difficult for all of the students on the pretest, an ANOVA showed that scaffolding helped significantly (p < 0.01). We speculated that the scaffolding had a greater positive effect on learning for this item because it was much more difficult for the students than the other items. We thought that this result warranted a closer look at the link between the difficulty of an item and the effectiveness of scaffolding. In this paper, we report on an experiment that examines the effect of math proficiency and the level of interaction on learning. We found an interesting interaction between the level of interaction and math proficiency, where less-proficient students benefited from more interaction.

Razzaq, L., Heffernan, N. T., & Lindeman, R. W. (2007). What level of tutor interaction is best? Frontiers in Artificial Intelligence and Applications, 158, 222.

Comparing Pedagogical Approaches for Teaching the Control of Variables Strategy

Abstract In this study, an extension of Klahr and Nigam (2004), we tested 177 middle school students on their acquisition of the control of variables strategy (CVS) using an interactive virtual ramp environment. We compared the effectiveness of three pedagogical approaches, namely direct instruction with reification, direct instruction without reification, and discovery learning, all of which were authored using the ASSISTment system. MANCOVAs showed that all conditions performed equally on a CVS multiple-choice post-test, but that the two direct learning conditions (with and without reification) significantly outperformed the discovery learning condition for constructing unconfounded experiments starting from an initially multiply confounded experimental setup. Keywords: scientific inquiry learning; web-based interactive environment; learning with microworlds; direct vs. discovery learning; control of variables strategy.

Sao Pedro, M., Gobert, J., Heffernan, N., & Beck, J. (2009). Comparing pedagogical approaches for teaching the control of variables strategy. In N. A. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st Annual Meeting of the Cognitive Science Society (pp. 1294-1299).

Hints: Is it Better to Give or Wait to be Asked?

Abstract. Many tutoring systems allow students to ask for hints when they need help solving problems, and this has been shown to be helpful. However, many students have trouble knowing when to ask for help or they prefer to guess rather than ask for and read a hint. Is it better to give a hint when a student makes an error or wait until the student asks for a hint? This paper describes a study that compares giving hints proactively when students make errors to requiring students to ask for a hint when they want one. We found that students learned reliably more with hints-on-demand than proactive hints. This effect was especially evident for students who tend to ask for a high number of hints. There was not a significant difference between the two conditions for students who did not ask for many hints.

Razzaq, L., & Heffernan, N. (2010). Hints: Is it better to give or wait to be asked? In Intelligent Tutoring Systems (pp. 349-358). Springer Berlin/Heidelberg.

Does Immediate Feedback While Doing Homework Improve Learning?

Abstract Much of the literature surrounding the effectiveness of intelligent tutoring systems has focused on the type of feedback students receive. Current research suggests that the timing of feedback also plays a role in improved learning. Some researchers have shown that delaying feedback might lead to a "desirable difficulty", where students' performance while practicing is lower, but they in fact learn more. Others using Cognitive Tutors have suggested delaying feedback is bad, but those students were using a system that gave detailed assistance. Many web-based homework systems give only correctness feedback (e.g., WebAssign). Should such systems give immediate feedback, or might it be better for that feedback to be delayed? It is hypothesized that immediate feedback will lead to better learning than delayed feedback. In a randomized controlled crossover, within-subjects design, 61 seventh grade math students participated. In one condition students received correctness feedback immediately, while doing their homework, while in the other condition the exact same feedback was delayed to when they checked their homework the next day in class. The results show that when given feedback immediately, students learned more than when receiving the same feedback delayed.

Kehrer, P., Kelly, K. M., & Heffernan, N. T. (2013, May). Does immediate feedback while doing homework improve learning? In FLAIRS Conference.
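The crossover, within-subjects design in this study, in which every student experiences both conditions in counterbalanced order, can be sketched as follows. The condition labels and seeded random split below are illustrative, not the authors' actual procedure:

```python
import random

def crossover_assign(student_ids, seed=0):
    """Counterbalanced two-period crossover: half the students get
    (immediate, delayed) feedback in periods 1 and 2, the other half
    (delayed, immediate), so each student serves as their own control
    and order effects cancel across the two sequences."""
    rng = random.Random(seed)
    ids = list(student_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    schedule = {s: ("immediate", "delayed") for s in ids[:half]}
    schedule.update({s: ("delayed", "immediate") for s in ids[half:]})
    return schedule

schedule = crossover_assign(range(1, 62))  # 61 students, as in the study
```

Because every student contributes data under both conditions, between-student variability drops out of the comparison, which is the main statistical appeal of within-subjects designs.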

Choice in Feedback Mediums

This was a pilot study to examine adding student preference to the ASSISTments platform. The purpose of this study was to see if providing students a choice in feedback style would alter performance and learning gains. Can students gauge whether video or text is best for their learning? Should students be given more control of their education? This study used a 2x2 crossover design, with two feedback mediums (video or text) crossed with two conditions (choice or no choice). Those in the experimental condition were asked to choose their feedback medium, while those in the control were randomly assigned to one of the two types of feedback.

Choice in Feedback Mediums - ASSISTments: As a researcher's tool. (n.d.). Retrieved from: https://sites.google.com/site/assistmentsstudies/studiesinprogress/choice-in-feedback-mediums