Assignment: You should analyze one of the student-created datasets described below. As usual, your task is to use Rmarkdown to write a short report, readable by a technically literate person. The code you used should not be visible in the final report (unless you have a good reason to show it). You will have time to discuss this in groups (the same groups as before), but you should write up the report yourself (in your own words).
You can use the following function to find out which dataset to analyze, where g is your group number:
f <- function (g) {1 + (g %% 6)}
For instance, if you are in group 2, then f(2) = 3, so you should analyze the third dataset below.
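As a quick check of the rule above, the following sketch applies f to a few group numbers and looks up the matching file name. The `datasets` vector is simply the file names in the order they are listed below; the range of group numbers shown (1 to 6) is only illustrative.

```r
# The assignment rule, copied from above.
f <- function (g) {1 + (g %% 6)}

# File names in the order the datasets appear in this document.
datasets <- c(
  "coyote_roadrunner_data.csv",
  "Dunder_Mifflin_Sales_Revenue.csv",
  "group3-mortality.csv",
  "memer-memes.csv",
  "honey.csv",
  "ufos.csv"
)

# Illustrative group numbers; substitute your own.
groups <- 1:6
assignment <- data.frame(
  group = groups,
  dataset_number = f(groups),
  file = datasets[f(groups)]
)
print(assignment)
```

Note that the rule wraps around: any group number that is a multiple of 6 is sent to the first dataset.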
Due: Submit your work via Canvas by the end of the day (midnight) on Tuesday, February 9th. Please submit both the Rmd file and the resulting html or pdf file. You will also give a brief presentation on the results in class on Thursday, February 11th.
Dataset: coyote_roadrunner_data.csv
A recent surge in accidental coyote deaths has been attributed to roadrunner hijinks in Arizona. Researchers are concerned about the possible impact this unprecedented decline in coyote population will have on the local ecosystem, and have turned to an unconventional source for insights: the 1949 cartoon “Roadrunner and Coyote”.
We are interested in modeling the survival time of Coyote in the cartoon “Roadrunner and Coyote”. We sampled 500 episodes and tracked the time it took for Coyote to die. We also took note of the number of cliffs present, the average Roadrunner speed, the number of traps set by Coyote, the presence of other Looney Tunes characters, and the presence of anvils. Each row corresponds to one observed episode. We want to know which factors are significantly correlated with Coyote’s survival time.
Dataset: Dunder_Mifflin_Sales_Revenue.csv
Dunder Mifflin is a paper company with 7 branches in Scranton, Pennsylvania. David Wallace, the company’s CFO, is interested in evaluating each branch’s success by looking at its monthly revenue over the last five years. This information was collated by the Head of the Accounting Department, Angela, after the Scranton branches had undergone a number of management changes. In particular, David wants to figure out which branch manager had the highest total revenue for the last five years. In addition, David knows that all of the branches enjoy holding copious numbers of meetings with the entire branch, and he questions how much these meetings help the company increase profits. David also remembers that Dunder Mifflin has tried various sales initiatives to increase profit, but he can’t recall whether these were successful. Finally, David knows that certain months of the year may bring in more funds, particularly during the back-to-school rush at the end of summer.
This report should help inform David Wallace of the effect of different managers, sales schemes, months of the year, and hours spent in branch meetings on monthly revenue. The report should analyze the 420 observations that Angela gathered and determine which parameter combinations lead to the highest revenue possible. Each observation includes:
Dataset: group3-mortality.csv
We are investigating the influence of different risk factors on the binary outcome of living or dying for a group of 500 patients who have received a positive COVID-19 test. The data were collected by volunteers hired to evaluate the patients. Out of desperation for willing volunteers, we had to accept an overly enthusiastic volunteer who decided to ask for the patients’ horoscope signs. We recorded the following variables for each patient:
We are interested in identifying which variables most significantly influence the likelihood of death for these patients.
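One common starting point for a binary outcome like this is logistic regression. The sketch below is only an illustration on simulated data: the column names `age`, `horoscope`, and `died` are hypothetical stand-ins (they are not taken from the real file), and the simulated effect sizes are made up. It fits a model with and without the horoscope term and compares the two.

```r
set.seed(1)
n <- 500

# Simulated stand-in data; column names and effects are assumptions.
zodiac <- c("Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
            "Libra", "Scorpio", "Sagittarius", "Capricorn",
            "Aquarius", "Pisces")
patients <- data.frame(
  age = round(runif(n, 20, 90)),
  horoscope = factor(sample(zodiac, n, replace = TRUE), levels = zodiac)
)
# In this simulation, death depends on age only, not on horoscope.
patients$died <- rbinom(n, size = 1,
                        prob = plogis(-5 + 0.06 * patients$age))

# Logistic regression with and without the horoscope term.
fit_age <- glm(died ~ age, family = binomial, data = patients)
fit_full <- glm(died ~ age + horoscope, family = binomial, data = patients)

# Likelihood-ratio comparison: does horoscope improve the fit?
cmp <- anova(fit_age, fit_full, test = "Chisq")
print(cmp)
```

With the real data, the same comparison would be made using the recorded variables in place of the simulated ones.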
Dataset: memer-memes.csv
Memes are a well-loved part of our internet culture, and in 2020 the number of memes created online exploded. But what factors contribute to a person creating more memes? We collected data from 1000 people in Eugene on how many memes they created in 2020, along with some other factors that are thought to contribute to this number. Please build a model that fits the observed data, and use it to determine which factors significantly contribute to the number of memes a person creates.
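Since the response here is a count, a Poisson regression is one reasonable first model. The sketch below uses simulated data: the predictor `hours_online` and the response name `memes` are hypothetical, not columns from the real file. It also computes a rough overdispersion statistic, since count data often vary more than a Poisson model allows.

```r
set.seed(2)
n <- 1000

# Simulated stand-in data; names and effect sizes are assumptions.
memers <- data.frame(hours_online = runif(n, 0, 10))
memers$memes <- rpois(n, lambda = exp(0.5 + 0.2 * memers$hours_online))

# Poisson GLM with the default log link.
fit <- glm(memes ~ hours_online, family = poisson, data = memers)

# Exponentiated coefficients are multiplicative effects on the mean count.
rate_ratios <- exp(coef(fit))
print(rate_ratios)

# Rough overdispersion check: values well above 1 suggest trying
# a quasi-Poisson or negative binomial model instead.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)
```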
Variables:
Dataset: honey.csv
Dr. C. Robin and his undergraduate assistant K. Roo were wandering through a forest located near their lab when they observed a bear climbing a tree to eat honey. After receiving a walloping number of stings to its face, the bear slid back to the ground and ran away. As an ethologist (an animal behavior scientist), Dr. Robin became curious as to what parameters determine the volume of honey that a bear can eat in a single snacking session. To conduct their study, Dr. Robin and K. Roo tagged and followed a bear designated WTP001 over the course of five years. For each attempt they recorded the bear’s hunger (high, medium, or low, based on the loudness of tummy rumbles), the distance of the hive from the ground in meters, and the number of stings received by the bear during the attempt. After 1000 observations, they sat down to analyze the data to determine which parameters were related to the amount of honey consumed by the bear (in liters). Of these four parameters, which did they find are significant in determining how much honey the bear consumes? And what effect do they have on the consumption (sign and magnitude)?
Dataset: ufos.csv
Researchers noticed there was a small town in New Mexico that had a surprising number of UFO sightings per year relative to other towns of a similar size in the United States. This prompted the researchers to conduct a survey over a 100-day period in this small town. The researchers were able to survey every person that lived full-time in this town, every day for the 100-day period. They collected six metrics for each day. One row in the data represents a single day of observation, with the following six variables:
The researchers would like to know if any of the variables (AQI, BAC, airplanes, clouds, xfiles) have an impact on the number of UFOs seen by the townspeople on any given day. The researchers are also interested in whether there are any interactions between the variables.
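One way to probe an interaction is to fit a model with and without the interaction term and compare the two. The sketch below is an illustration on simulated data: the predictor names come from the list above, but their types and ranges, the response name `ufos`, and the choice of a Poisson model and of the `airplanes:xfiles` interaction are all assumptions made for the example.

```r
set.seed(3)
n <- 100  # one row per day

# Simulated stand-in data; types and ranges are assumptions.
ufo <- data.frame(
  AQI = runif(n, 10, 150),
  BAC = runif(n, 0, 0.2),
  airplanes = rpois(n, 5),
  clouds = runif(n, 0, 1),
  xfiles = rbinom(n, 1, 0.3)
)
# Made-up truth: airplanes matter only on days when xfiles is 1.
ufo$ufos <- rpois(n, exp(0.5 + 5 * ufo$BAC +
                         0.1 * ufo$airplanes * ufo$xfiles))

# Additive model, then the same model plus one interaction term.
additive <- glm(ufos ~ AQI + BAC + airplanes + clouds + xfiles,
                family = poisson, data = ufo)
with_int <- update(additive, . ~ . + airplanes:xfiles)

# Likelihood-ratio test for the added interaction.
cmp <- anova(additive, with_int, test = "Chisq")
print(cmp)
```

With five predictors there are ten possible pairwise interactions, so it is usually better to test a few that are plausible in advance than to screen all of them.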