
Homework, week 13: Make up some data.

Assignment: I would like your group to make up some data: that is, come up with a model and a story, simulate data from it, and provide the story, the data, and the question. You should submit the following things:

  1. A short document (1-2 paragraphs) describing how the data were (hypothetically) collected, and posing the problem that you would like another group to solve.
  2. The dataset, as a csv file, with informative column names.
  3. An R script that exactly recreates the dataset (call set.seed() at the top so that the random draws are the same on every run) and fits the model you have in mind (to verify that the question is answerable from the data provided).
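
As a rough illustration of item 3, here is a minimal sketch of what such a script might look like. Everything in it is a made-up assumption for the example: the pumpkin story, the variable names (sunlight_hours, fertilizer_g, weight_kg), the parameter values, the file name, and the choice of a linear model fit with lm(). Your own script should use your own story and whichever model you have in mind.

```r
# Hypothetical example: pumpkin weights as a function of sunlight and fertilizer.
set.seed(13)  # fix the random number stream so every run gives the same dataset

n <- 120  # number of pumpkins (the assignment asks for at least 100 observations)
sunlight_hours <- runif(n, min = 4, max = 10)  # daily hours of sun
fertilizer_g <- runif(n, min = 0, max = 50)    # grams of fertilizer per week

# Assumed "true" model: weight is linear in both predictors, plus Normal noise.
weight_kg <- 2 + 0.8 * sunlight_hours + 0.05 * fertilizer_g + rnorm(n, sd = 1)

pumpkins <- data.frame(sunlight_hours, fertilizer_g, weight_kg)

# Written to a temporary directory here; in practice use a path in your project.
out_file <- file.path(tempdir(), "pumpkins.csv")
write.csv(pumpkins, out_file, row.names = FALSE)

# Fit the model you have in mind and check that it recovers what you put in.
fit <- lm(weight_kg ~ sunlight_hours + fertilizer_g, data = pumpkins)
print(summary(fit))
print(confint(fit))
```

If the estimates and intervals are close to the values used in the simulation, that suggests the question is answerable from the data; if not, consider more observations or less noise.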

You do not need to simulate the data from a model we have used in class, but you should check that the model you have in mind to analyse the data does what you expect.

Next week, others (the “analysis” team) will get (1) and (2), but (3) is only for the instructors.

Here are some further requirements.

  1. The question you pose should be solvable by fitting a model that we have learned in this class. There should be at least three variables in the dataset, and at least 100 observations.
  2. Try to make it engaging and/or fun - silly situations are ok!
  3. However, it should still be realistic. The “analysis” team will be encouraged to look for impossibilities, such as negative weights or other unrealistic measurements.
  4. Also, try to keep it simple: the question should be answerable with a single analysis, not a chain of dependent steps.
  5. To ensure that the question is answerable using the data provided (that is, that there is enough data and it is not too noisy), you should actually fit the model you have in mind.
  6. The description should not include any statistical details (e.g., don’t say that the response is Poisson distributed or that it is a linear function of the explanatory variables).
  7. Please include at least one red herring, such as a potentially explanatory variable that doesn’t affect the response at all or a few “extreme outlier” values (as from measurement error).
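
Requirements 1, 3, and 7 can be checked or built in directly in the simulation script. The sketch below is only one possible approach, and every name and number in it is hypothetical: a hamster story, a cage_color column that is simulated independently of the response (the red herring), and three wheel_km values multiplied by 10 to mimic a misplaced decimal point.

```r
# Hypothetical example: nightly distance run by hamsters on their wheels.
set.seed(2021)

n <- 150
seeds_eaten <- rpois(n, lambda = 12)  # sunflower seeds eaten per day

# Red herring: drawn independently, and never used to generate the response.
cage_color <- sample(c("red", "blue", "green"), n, replace = TRUE)

# Log-normal response, so simulated distances are always positive.
wheel_km <- rlnorm(n, meanlog = log(1 + 0.1 * seeds_eaten), sdlog = 0.2)

# A few extreme outliers, as if the decimal point had been misplaced.
outlier_rows <- sample(n, 3)
wheel_km[outlier_rows] <- 10 * wheel_km[outlier_rows]

hamsters <- data.frame(seeds_eaten, cage_color, wheel_km)

# Sanity checks: enough data, no missing values, no impossible measurements.
stopifnot(
  nrow(hamsters) >= 100,
  ncol(hamsters) >= 3,
  !anyNA(hamsters),
  all(hamsters$wheel_km > 0)
)
print(summary(hamsters))
```

Choosing a distribution that cannot produce impossible values (here, log-normal rather than Normal for a distance) is usually simpler than simulating and then discarding bad rows.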

Due: Submit your work via Canvas by the end of the day (midnight) on Wednesday, January 27th. (Note: not on Thursday as usual!)