
Homework number 8: Simulation challenge

Assignment: You should analyze one of the student-created datasets described below. As usual, your task is to use R Markdown to write a short report that a technically literate person can read. The code you used should not be visible in the final report (unless you have a good reason to show it). You will have time to discuss this in groups (the same groups as before), but you should write up the report yourself (in your own words).

You can use the following function to find out which dataset to analyze, where g is your group number:

f <- function (g) {
    groups <- c(1, 3, 4, 5, 8)
    c(groups, groups)[match(g, groups) + 3]
}

For instance, if you are in group 3, then f(3) = 8, so you should analyze the “group 8” dataset below.
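To see every group's assignment at once, you can apply the function to each group number. This is only a usage sketch of `f` as defined above; the name `assignments` is illustrative and not part of the assignment.

```r
# Same f as above: each group is mapped to the group three positions
# further along in the (wrapped-around) list of group numbers.
f <- function (g) {
    groups <- c(1, 3, 4, 5, 8)
    c(groups, groups)[match(g, groups) + 3]
}

# Apply f to every group; names are the analyzing groups,
# values are the datasets they should analyze.
groups <- c(1, 3, 4, 5, 8)
assignments <- setNames(sapply(groups, f), groups)
print(assignments)
```

This maps 1 to 5, 3 to 8, 4 to 1, 5 to 3, and 8 to 4, so each dataset is analyzed by exactly one other group and no group analyzes its own. A number that is not a group (for example `f(2)`) returns `NA`, because `match` finds no position.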

Due: Submit your work via Canvas by the end of the day (midnight) on Tuesday, December 3rd.

Group 1: UFOs

The X-Files unit of the FBI is interested in the number of recent UFO sightings in the United States. The agents suspect that the government is testing alien technology at military bases. To help assess this hypothesis, they surveyed 150 towns in America. Previous “research” has indicated that UFO sightings are more common in rural areas, so they chose 150 towns at random from the smallest 25% of towns (by population) in America. For each town, they recorded the population size (in 1000s) and the distance to the nearest military base (in miles). Their tech-savvy crew of three nerds scoured the Internet to count the number of UFO sightings posted about each town during 2020 (though they did not check whether posts referred to the same UFO or came from the same username). Since one of the agents thinks UFO sightings are made up by conspiracy theorists, they also recorded the 2020 voter turnout rate in each town.

The questions that the agents are interested in answering are:

  1. Do towns that are closer to military bases have more UFO sightings?
  2. Do towns with lower voter turnout rates have more UFO sightings?
  3. Do these effects depend on population size?

Dataset: CONFIDENTIAL_ufo_sightings.csv

Group 3: Social Media

We conducted a survey of 150 students from 3 different majors (sociology, biology, and fine arts) and asked them to record their daily social media usage. The participants voluntarily installed an app on their phones that tracked their daily social media usage, and at the end of the month their average daily usage was recorded. The survey also recorded each participant's gender and age (ages ranged from 18 to 30).

They then took an electronic test that measures attention span and strength of focus through various games and questions. Possible scores range from 40 (worst attention span) to 100 (best able to focus without distraction).

None of the participants had been diagnosed with a mood or attention disorder before this test, and to the best of their knowledge none were taking any medication that would affect their attention span or ability to focus.

Before starting the test, participants were asked again whether they felt ready and comfortable to take it, and they could opt out at any time. They were also instructed not to do any heavy physical or intellectual activity on the day of the test.

We want to know to what degree the demographic variables (age, gender and major) and the number of hours spent on social media are correlated with attention span (as measured by the test score).

Dataset: social_media.csv

Group 4:

The North Pole surveys cities every year to ask children if they have seen reindeer flying on Christmas Eve (not really, but let’s be festive and pretend!). The dataset contains survey data collected over 50 years (from 1972 to 2021) in 5 cities: Nuuk, Montreal, Longyearbyen, Moscow, and Reykjavik. The “surveyed” column reports the number of children in each city who were surveyed that year. The “yes_reindeer” column records the number of surveyed children who reported reindeer sightings on Christmas Eve. Note that the cities differ in their distance from the North Pole; you can look at a map to see how far each one is. Using the data provided, please answer the following questions:

  1. How, if at all, is the number of children who reported reindeer sightings influenced by location (city and distance from the North Pole) and by year? Hint: you should account for the fact that the number of children surveyed (and thus the number of “yeses” recorded) varies by both city and year.

  2. Based on your results, which cities tend to have a higher or lower prevalence of reindeer sightings?
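One common way to follow the hint is a binomial GLM in which each city-year row contributes `yes_reindeer` successes out of `surveyed` trials, so the varying number of children surveyed is built into the likelihood. The description does not prescribe this model; the sketch below is only an illustration under assumptions. The `city` and `year` column names are assumed (only `surveyed` and `yes_reindeer` are named above), and the data frame is simulated placeholder data, not reindeer.csv.

```r
# Placeholder data standing in for reindeer.csv (simulated, NOT the real dataset).
# Assumed columns: city, year, surveyed, yes_reindeer.
set.seed(1)
cities <- c("Nuuk", "Montreal", "Longyearbyen", "Moscow", "Reykjavik")
reindeer <- expand.grid(city = cities, year = 1972:2021)  # one row per city-year
reindeer$surveyed <- rpois(nrow(reindeer), lambda = 200)
reindeer$yes_reindeer <- rbinom(nrow(reindeer),
                                size = reindeer$surveyed, prob = 0.3)

# Binomial GLM: the response is (successes, failures), so cities and years
# with more children surveyed carry proportionally more information.
fit <- glm(cbind(yes_reindeer, surveyed - yes_reindeer) ~ city + year,
           family = binomial, data = reindeer)
summary(fit)
```

Because distance from the North Pole is constant within each city, a distance term would be collinear with the `city` factor in this model, so the two would need to be fitted in separate models (or compared) rather than together.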

Dataset: reindeer.csv

Group 5: Alcohol Use in a Clinical Sample of Adolescents

Researchers used a cross-sectional design to examine alcohol use in a clinical sample of adolescents. The sample consisted of 400 adolescents receiving substance use treatment from a university-affiliated hospital system. Data were collected by trained research assistants who conducted clinical assessment interviews with the participants during the treatment admission/intake process. Only those who reported at least one binge drinking or high-intensity drinking episode within the seven days prior to intake were eligible to participate in the study. Variables include:

Question: What characteristics differentiate adolescents who reported a high-intensity drinking episode from those who reported a binge drinking episode?

Dataset: alcohol.csv

Group 8:

The goal of this study was to explore the effects of multiple factors on participants’ blood sugar. For this study, 100 participants ate pancakes and drank juice as their first meal of the day. Participants were provided with juice boxes by the study, but were given the option of receiving pancakes from the study or bringing homemade pancakes from home. The participants were then asked to record the amount of pancake (in grams) and juice (in mL) they consumed. In addition, they were asked to note whether the pancakes were homemade or provided by the study. Finally, the participants were asked to record how much time they had spent exercising (in minutes) earlier that day. Fifteen minutes after the meal, each participant received a blood sugar test to measure their blood sugar in mg/dL.

The goal of the analysis is to determine which (if any) of the following variables significantly predict the participant's blood sugar (mg/dL): pancake consumed (g), juice consumed (mL), whether the pancakes were homemade, and time spent exercising (min).

Dataset: blood_sugar.csv