Assignment: You should analyze one of the student-created datasets described below. As usual, your task is to use Rmarkdown to write a short report, readable by a technically literate person. The code you used should not be visible in the final report (unless you have a good reason to show it). You will have time to discuss this in groups (the same groups as before), but you should write up the report yourself (in your own words).
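One common way to keep code out of the knitted report is to switch off code echoing globally in a setup chunk. This is only a sketch of one option, not a required approach; it assumes the knitr package, which R Markdown uses to run chunks.

```r
# Placed in a setup chunk near the top of the .Rmd file:
# hide the source code of all later chunks in the knitted report.
# Results (tables, figures, printed output) are still shown.
knitr::opts_chunk$set(echo = FALSE)
```

An individual chunk can still show its code by setting `echo = TRUE` in that chunk's header, which is useful when there is a good reason to display a particular piece of code.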
You can use the following function to find out which dataset to analyze, where g is your group number:
f <- function(g) {
    groups <- c(1, 3, 4, 5, 8)
    c(groups, groups)[match(g, groups) + 3]
}
For instance, if you are in group 3, then f(3) = 8, so you should analyze the “group 8” dataset below.
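To see the full assignment at a glance, the function can be applied to every group number. The snippet below repeats the definition of f so that it runs on its own; the variable name `assignment` is just an illustrative choice.

```r
# Same function as given in the assignment text.
f <- function(g) {
  groups <- c(1, 3, 4, 5, 8)
  c(groups, groups)[match(g, groups) + 3]
}

# Apply f to each group number and label the result by group.
groups <- c(1, 3, 4, 5, 8)
assignment <- sapply(groups, f)
names(assignment) <- groups
assignment
#> 1 3 4 5 8
#> 5 8 1 3 4
```

The top row is your group number and the bottom row is the group whose dataset you analyze; for example, group 4 analyzes the “group 1” dataset.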
Due: Submit your work via Canvas by the end of the day (midnight) on Tuesday, December 3rd.
The X-files unit of the FBI is interested in the number of recent UFO sightings in the United States. The agents suspect that the government is testing alien technology at military bases. To help assess this hypothesis, they have surveyed 150 towns in America. Previous “research” has indicated that UFO sightings are more common in rural areas, so they chose a random sample of 150 towns from the smallest 25% of towns in America. For each town, they recorded the population size (in 1000s) and the distance to the nearest military base (in miles). Their tech-savvy crew of three nerds scoured the Internet to count the number of UFO sightings in each town that were posted online during 2020 (though they did not control for unique UFOs or usernames). Since one of the agents thinks UFO sightings are made up by conspiracy theorists, they have also recorded the 2020 voter turnout rate in each town.
The questions that the agents are interested in answering are:
Dataset: CONFIDENTIAL_ufo_sightings.csv
We surveyed 150 students from three different majors (sociology, biology, and fine arts) and asked them to record their daily social media usage. The applicants voluntarily installed an app on their phones to track their daily social media usage, and at the end of the month their average daily usage was recorded. The survey also recorded each applicant's gender and age (ages ranged from 18 to 30).
They then took an electronic test that measures attention span and strength of focus through various games and questions. Possible scores ranged from 40 (worst attention span) to 100 (best able to focus without distraction).
None of the applicants had been diagnosed with any mood or attention disorders prior to this test, and to the best of their knowledge they were not using any medication that would affect their attention span or ability to focus.
Before starting the test, they were asked again whether they felt ready and comfortable to take it, and they had the chance to opt out at any time. The applicants were instructed not to do any heavy physical or intellectual activity on the day of the test.
We want to know to what degree the demographic variables (age, gender and major) and the number of hours spent on social media are correlated with attention span (as measured by the test score).
Dataset: social_media.csv
The North Pole surveys cities every year to ask children if they have seen reindeer flying on Christmas Eve (not really, but let’s be festive and pretend!). The dataset contains survey results collected over 50 years (from 1972 to 2021) in five different cities: Nuuk, Montreal, Longyearbyen, Moscow, and Reykjavik. The “surveyed” column reports the number of children in each city who were surveyed that year. The “yes_reindeer” column records the number of children surveyed who reported reindeer sightings on Christmas Eve. Note that the cities differ in their distance from the North Pole; you can look at a map to see how far away they are. With the data provided, please answer the following questions:
How, if at all, is the number of children who reported reindeer sightings influenced by location (city, and distance from the North Pole), and year? Hint: you should account for the fact that the number of individuals surveyed (and thus the number of “yeses” recorded) varies by both city and year.
Based on your results, which cities tend to have a higher or lower prevalence of reindeer sightings?
Dataset: reindeer.csv
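One possible way to act on the hint about varying survey sizes is a binomial GLM, which models the proportion of “yes” answers while weighting each row by the number of children surveyed. The sketch below runs on simulated stand-in data, not the real reindeer.csv; the column names `city` and `year` are assumptions (only `surveyed` and `yes_reindeer` are named in the description), and other modeling choices may be equally reasonable.

```r
# Simulated stand-in data: NOT the real reindeer.csv.
# `city` and `year` are assumed column names.
set.seed(1)
sim <- expand.grid(
  city = c("Nuuk", "Montreal", "Longyearbyen", "Moscow", "Reykjavik"),
  year = 1972:2021
)
sim$surveyed <- rpois(nrow(sim), lambda = 200) + 1
sim$yes_reindeer <- rbinom(nrow(sim), size = sim$surveyed, prob = 0.3)

# Binomial GLM with a two-column response (successes, failures),
# so rows with more children surveyed carry more information.
fit <- glm(
  cbind(yes_reindeer, surveyed - yes_reindeer) ~ city + year,
  family = binomial,
  data = sim
)
summary(fit)
```

Because the response is given as counts of “yes” and “no” answers, the fitted coefficients describe changes in the log-odds of a reported sighting rather than changes in the raw count.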
Researchers used a cross-sectional design to examine alcohol use in a clinical sample of adolescents. The sample consisted of 400 adolescents receiving substance use treatment from a university-affiliated hospital system. Data were collected by trained research assistants who conducted clinical assessment interviews with the participants during the treatment admission/intake process. Only those who reported at least one binge drinking or high-intensity drinking episode within the seven days prior to intake were eligible to participate in the study. Variables include:
Drinking episode level: Binge drinking episode (consuming 5-9 standard drinks in a row) or high-intensity drinking episode (consuming 10+ standard drinks in a row) in the past seven days. If an adolescent reported drinking episodes at both levels, the level with greater risk (i.e., high-intensity drinking) was recorded.
Age: In years.
Gender: Self-reported gender identity (girl, boy, non-binary).
Revised Adverse Childhood Experiences Checklist score: Number of adverse childhood experiences (ACEs; e.g., abuse, maltreatment, stressful life events, environmental stressors), with a higher number indicating a greater number of adverse experiences. The revised checklist consists of 14 different adverse experiences.
Past 7-day cannabis use: Number of days of cannabis use over the past seven days.
Beck Depression Inventory score: A validated measure of depression severity, with a total score ranging from 0-63. A higher score indicates more severe depression. The recognized ranges are: minimal depression (0-13), mild depression (14-19), moderate depression (20-28), and severe depression (29-63).
Beck Anxiety Inventory score: A validated measure of anxiety severity, with a total score ranging from 0-63. A higher score indicates more severe anxiety. The recognized ranges are: minimal anxiety (0-7), mild anxiety (8-15), moderate anxiety (16-25), and severe anxiety (26-63).
Coping motives for alcohol use score: A subscale from the Drinking Motives Questionnaire measuring how often alcohol use is motivated by decreasing stress, avoidant behavior, and alleviating negative affect. The total score ranges from 0-20, with a higher score indicating drinking is more often motivated by coping.
Social motives for alcohol use score: A subscale from the Drinking Motives Questionnaire measuring how often alcohol use is motivated by drinking to improve social situations and interactions. The total score ranges from 0-20, with a higher score indicating drinking is more often socially motivated.
Question: What characteristics differentiate adolescents who reported a high-intensity drinking episode from those who reported a binge drinking episode?
Dataset: alcohol.csv
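Since the outcome has two levels (binge versus high-intensity), logistic regression is one natural starting point. The following is a minimal sketch on simulated stand-in data, not the real alcohol.csv; the column names `episode`, `age`, and `aces`, the simulated age range, and the choice of predictors are all hypothetical.

```r
# Simulated stand-in data: NOT the real alcohol.csv.
# All column names and simulated values here are hypothetical.
set.seed(2)
n <- 400
sim <- data.frame(
  age  = sample(13:18, n, replace = TRUE),
  aces = rbinom(n, size = 14, prob = 0.3)
)
sim$episode <- factor(
  ifelse(runif(n) < plogis(-1 + 0.2 * sim$aces), "high_intensity", "binge"),
  levels = c("binge", "high_intensity")
)

# With a two-level factor response, glm() treats the first level ("binge")
# as the reference and models the probability of the second level.
fit <- glm(episode ~ age + aces, family = binomial, data = sim)
exp(coef(fit))  # exponentiated coefficients, interpretable as odds ratios
```

Setting the factor levels explicitly makes it clear which outcome the model treats as the “success”, which matters when interpreting the sign of each coefficient.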
The goal of this study was to explore the effects of multiple factors on patients’ blood sugar. For this study, 100 participants ate pancakes and drank juice as their first meal of the day. Participants were provided with juice boxes by the study, but were given the option of receiving pancakes from the study or bringing homemade pancakes. The participants were then asked to record the amount of pancakes (in grams) and the amount of juice (in mL) consumed. In addition, they were asked to note whether the pancakes were homemade or provided by the study. Finally, the participants were asked to record how much time they had spent exercising (in minutes) earlier that day. Fifteen minutes after the meal, the participants received a blood sugar test to determine their blood sugar in mg/dL.
The goal of the analysis is to determine which (if any) of the variables (pancakes consumed (g), juice consumed (mL), whether the pancakes were homemade, and time spent exercising (min)) significantly predict the blood sugar (mg/dL) of the participant.
Dataset: blood_sugar.csv
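As a possible starting point for this kind of question, a multiple linear regression puts all four predictors in one model. The sketch below uses simulated stand-in data, not the real blood_sugar.csv; every column name and every number used to generate the data is hypothetical.

```r
# Simulated stand-in data: NOT the real blood_sugar.csv.
# Column names and generating values are hypothetical.
set.seed(3)
n <- 100
sim <- data.frame(
  pancakes_g   = runif(n, 50, 300),
  juice_ml     = runif(n, 0, 400),
  homemade     = factor(sample(c("no", "yes"), n, replace = TRUE)),
  exercise_min = runif(n, 0, 60)
)
sim$blood_sugar <- 90 + 0.1 * sim$pancakes_g + 0.05 * sim$juice_ml -
  0.2 * sim$exercise_min + rnorm(n, sd = 10)

# Multiple linear regression with all four predictors.
fit <- lm(blood_sugar ~ pancakes_g + juice_ml + homemade + exercise_min,
          data = sim)
summary(fit)$coefficients  # estimates, standard errors, t and p values
```

Checking residuals against fitted values is a sensible follow-up before interpreting the p-values.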