STA304 Assignment 2

Instructions
A starter Rmd file (called Assignment2.Rmd) is available for you to use as the starting point for your code.
Your submission will consist of three components:
1. .Rmd file (submitted as a Group)
2. .pdf file (submitted as a Group)
3. Completion of Assignment 2 – Group Work Survey (completed as an individual – even if you worked alone)
Group Work Submission
If you do not submit both the .pdf AND .Rmd files, you will receive a 0 on this assignment.
Please only submit your final work to the actual submission page. Additional attempts will not be granted for submissions made to the wrong page or for incomplete submissions. So please be careful and mindful of what you are submitting, and use the draft page to practice the submission process.
Individual Submission
Individual Item to Submit
This survey must be completed by ALL STUDENTS (even if you are working on Assignment 2 as an individual).
Assignment grading
There is one part to this assignment: producing a report on a data analysis. The report focuses on theory/methodology, data analysis and communication/writing. We recommend you spellcheck and proofread your written work. We will be marking the pdf files directly, so please ensure that your final submission looks as you want it to look before submitting it.
If you do NOT submit the pdf, you will receive a grade of 0.
Group Work
All group members will receive the same grade on Assignment 2.
Report
Objective
To predict the overall popular vote of the next Canadian federal election (tentatively 2025) using a regression model with post-stratification.
Description:
In this assignment you will create “Introduction”, “Data”, “Model” (or “Methods”), “Results” and “Conclusions” sections of a report, based on a post-stratification analysis. It is recommended that you use the General Social Survey (GSS) as the “census” data, and data from the CES2019 package as “survey” data.
Working as a small team (of size 1-4), you will complete the following steps:
1. Load in the sample/survey data (CES data).
2. Build a model on the sample data. Any model is acceptable, but some justification (either practical or statistical) should be given (some options: meaningful variables, p-values, LRTs, AIC, BIC, etc.).
3. Load in the census data (GSS data).
4. Calculate $\hat{y}^{PS}$.
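The four steps above can be sketched in R as follows. This is a minimal illustration on simulated data, not on the CES or GSS files themselves: the variable names (`age_group`, `province`, `vote_liberal`), the cell counts `N`, and the choice of a logistic regression selected by AIC are all assumptions made for the example.

```r
# Minimal regression + poststratification sketch on SIMULATED data.
# Variable names and cell counts are hypothetical stand-ins for CES/GSS data.
set.seed(304)

levels_age  <- c("18-34", "35-54", "55+")
levels_prov <- c("ON", "QC", "BC")

# Step 1 (stand-in for the CES survey): one row per respondent
survey <- data.frame(
  age_group = factor(sample(levels_age, 500, replace = TRUE), levels = levels_age),
  province  = factor(sample(levels_prov, 500, replace = TRUE), levels = levels_prov)
)
# Simulated binary outcome: 1 = intends to vote for a given party
p_true <- plogis(-0.5 + 0.4 * (survey$age_group == "55+") +
                   0.3 * (survey$province == "QC"))
survey$vote_liberal <- rbinom(nrow(survey), 1, p_true)

# Step 2: fit two candidate logistic regressions and keep the lower-AIC one
m_small <- glm(vote_liberal ~ age_group, family = binomial, data = survey)
m_full  <- glm(vote_liberal ~ age_group + province, family = binomial, data = survey)
aics    <- c(small = AIC(m_small), full = AIC(m_full))
model   <- if (aics["full"] < aics["small"]) m_full else m_small

# Step 3 (stand-in for the GSS census): population count N in each cell
census_cells <- expand.grid(age_group = levels_age, province = levels_prov)
census_cells$age_group <- factor(census_cells$age_group, levels = levels_age)
census_cells$province  <- factor(census_cells$province, levels = levels_prov)
census_cells$N <- c(900, 1100, 1000, 700, 800, 750, 400, 500, 450)

# Step 4: predict each cell's probability, then take the N-weighted average
census_cells$p_hat <- predict(model, newdata = census_cells, type = "response")
y_ps <- sum(census_cells$N * census_cells$p_hat) / sum(census_cells$N)
y_ps
```

In a real analysis the cell counts would come from tabulating the census data by exactly the same variables (and categories) used in the model; the weighted average in Step 4 is what makes the estimate reflect the population rather than the sample.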
General Social Survey (GSS) – Census Data
Canadian Election Study (CES) – Survey Data
Report Components
Introduction
The goal of the Introduction section is to introduce the overall “problem” to the reader.
Your Introduction section should include the following:
• Describe the data and the problem in 2-3 clear sentences.
• Introduce the importance of the analysis.
• Get the reader interested/excited about the analysis.
• Provide some background/context explaining the overall relevance of the problem/data/analysis.
• Introduce terminology and prepare the reader for the following sections. For example, here you should explain any political terms that are specialized or uncommon.
• Introduce the research question.
• Introduce any hypotheses (hypotheses should be decided on prior to performing your analysis and should have some mild justification).
• Inline referencing.
Data
The goal of the Data section is to introduce the reader to the data set, showcase some meaningful aspects of the data, and get them thinking about potential hypotheses/findings. Your Data section should include the following:
• A description of the data collection process.
• A summary of the cleaning process (if you cleaned the data). Someone (who is NOT necessarily familiar with Tidyverse functions) should be able to read this section and reproduce your cleaning process based on your description.
• A description of the important variables.
• Some text (and perhaps graphical summaries) of the variables you will use in your model. This should help the reader understand why the subsequent analysis is important/interesting and whether it is appropriate.
• At least 1 aesthetically pleasing plot/graph/figure (no more than 4 plots).
• Text explaining/highlighting each table or figure.
• Inline referencing, if needed.
• Reference the programming language/software used to complete this section.
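A cleaning step that poststratification usually depends on is coding each shared variable identically in the survey and census data, so that every respondent and every census record falls into the same set of cells. The sketch below uses base R (so it reads without Tidyverse knowledge); the column name `age`, the break points, and the labels are hypothetical choices for illustration, not the actual CES or GSS coding.

```r
# Recode age into identical groups in both data sets (hypothetical coding).
age_breaks <- c(17, 34, 54, Inf)   # intervals (17,34], (34,54], (54,Inf]
age_labels <- c("18-34", "35-54", "55+")

recode_age <- function(age) {
  # cut() returns a factor; ages outside the breaks (e.g. under 18) become NA
  cut(age, breaks = age_breaks, labels = age_labels)
}

survey_raw <- data.frame(age = c(19, 34, 35, 54, 55, 80, 16, NA))
census_raw <- data.frame(age = c(18, 40, 67, 90))

survey_raw$age_group <- recode_age(survey_raw$age)
census_raw$age_group <- recode_age(census_raw$age)

# Drop respondents who cannot be placed in a cell (under 18 or missing age)
survey_clean <- survey_raw[!is.na(survey_raw$age_group), ]
table(survey_clean$age_group)
```

Because both data sets go through the same `recode_age()` function, their factor levels match exactly, which is what allows model predictions to be attached to census cells later.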
Methods
The goal of the Methods section is to introduce the reader to the statistical methods that you will be using to analyze the data.
Your Methods section should include the following:
• A complete explanation of each methodology you are using, i.e., a thorough explanation of both the regression model and poststratification.
• Here you will describe the chosen model (e.g., if you decide to perform linear regression you must write out the mathematical model, with symbols (not numbers) and describe the parameters and variables included).
• Give some justification for why this model was selected.
• Here you will also give an explanation of the poststratification process, i.e., explaining $\hat{y}^{PS}$.
• This should include a description of what poststratification is (in non-statistical language) and of why it is useful.
• As part of the poststratification technique you should also describe the cell/bin splits that you will display/implement in the Results, based on the sample data. Here you should briefly recall the variables that you are using to create the cells (again, the full description of these should be in the Data section). You can briefly justify the choice to include or exclude certain variables when creating the cells/bins. (For example, choosing “province” because it is likely to influence voter outcome because of…, or not including “eye colour” because it is not available in the census data).
• Explain any/all assumptions.
• An explanation of the parameters of interest.
• An explanation of the method for a general science reader (i.e., not a statistician).
• A description of why the method is appropriate (based on assumptions, variable types and practical rationale).
• Inline referencing.
• Inline R code (if needed).
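As one illustration of what writing out the model and $\hat{y}^{PS}$ could look like, assuming a binary vote outcome modelled with logistic regression (a choice made for this example; any justified model is acceptable), with $p_i$ the probability that respondent $i$ votes for a given party, $x_{1i}, \dots, x_{ki}$ that respondent's predictors, $N_j$ the census count in cell $j$, and $\hat{y}_j$ the model's estimate for cell $j$ out of $J$ cells:

```latex
\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}

\hat{y}^{PS} = \frac{\sum_{j=1}^{J} N_j \, \hat{y}_j}{\sum_{j=1}^{J} N_j}
```

The second line is a population-weighted average: each cell's estimate counts in proportion to how many people in the census fall into that cell.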
Results
The goal of the Results section is to present the results of the statistical analyses to the reader. Your Results section should include the following:
• The results of the methodologies included in the report.
• An explanation/interpretation of the results.
• Some commentary on whether or not the results seem reasonable.
• Text explaining/highlighting each table or figure.
• Inline referencing.
• Inline R code to produce output in the text (e.g., The mean is `r mean(x)`.).
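One way to keep inline results readable is to compute them in a hidden chunk (for example, one with `echo = FALSE`) and format them with a small helper before referencing them in a sentence such as: The estimated vote share is `r fmt_pct(y_ps)`. The helper name `fmt_pct` and the value of `y_ps` below are hypothetical placeholders, not real results.

```r
# Format a proportion as a percentage string for inline reporting.
fmt_pct <- function(x, digits = 1) {
  # formatC with format = "f" gives a fixed number of decimal places
  paste0(formatC(100 * x, format = "f", digits = digits), "%")
}

y_ps <- 0.3427   # placeholder value for illustration only
fmt_pct(y_ps)    # "34.3%"
```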
Conclusions
The goal of the Conclusions section is to present the story of your analysis to the reader. Your Conclusions section should include the following:
• A brief recap of the hypotheses, methods, and results.
• State (or re-iterate) your key results.
• State any reasonable conclusions drawn from the results.
• An explanation/interpretation of the results.
• Some commentary on any drawbacks/limitations.
• Recommendations for Next Steps for future analyses/reports.
Bibliography
A well-formatted bibliography, with your references presented as a consistently formatted list. Every entry should be cited in the text above.
General Notes:
• All tables/figures should be well labelled and clean.
• Everything should be written in full sentences/paragraphs.
• There should be no evidence that this is a class assignment. I should be able to take a copy of this report and paste it into a newspaper/blog without needing to make any edits.
• There should be no raw code in the pdf. All output should be nicely formatted/presentable.
• You will also need a reference/bibliography section. You should reference the data, any outside code/documentation and any ideas/concepts that are taken from outside the course.
• Note, we are not marking grammar, but we are looking for clarity. If you need help with writing there are resources posted on the Course Info>Resources page of Quercus. It is important that you communicate in a clear and professional manner. I.e., no slang or emojis should appear.
• Remember to end each section with a concluding sentence. This means reiterating the key points from your writing.
• You are more than welcome to perform a prediction of a different election (e.g., predict the 2024 U.S.A. election or the outcome of the next British Columbia provincial election) in lieu of the next Canadian federal election, just be sure to still perform a regression and poststratification (i.e., create a model on sample data and poststratify on some census data).