Exploratory Data Analysis
In this assignment, you will use R to perform EDA on a dataset. The objective is for you to practice the steps of formulating and answering questions through visual analysis.
For the write-up (3-5 pages, APA formatted), walk me through your understanding of EDA, both systematically from an R perspective and in how you apply each visualization to your analysis (example: the histogram showed a normal distribution, which tells me that the mean is ...). That is what I mean by formulating and answering questions. A normal distribution is something you look for; so are outliers, missing values, and so on. You look for these things and make decisions based on what you see, a.k.a. EDA! 🙂
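As a rough illustration of that "look, then decide" loop, here is a minimal sketch in base R. It assumes the built-in airquality dataset and its Ozone column purely as stand-ins; neither is required by the assignment, and your own dataset will raise different questions.

```r
# Sketch only: airquality ships with base R; the Ozone column is an arbitrary choice.
data(airquality)
ozone <- airquality$Ozone

# Missing values: count them before computing anything else.
n_missing <- sum(is.na(ozone))

# Shape: a histogram shows whether the distribution looks symmetric or skewed.
hist(ozone, breaks = 20,
     main = "Daily ozone readings", xlab = "Ozone (ppb)")

# Center: when the distribution is right-skewed, the mean sits above the median,
# so the median is usually the safer summary to report.
ozone_mean   <- mean(ozone, na.rm = TRUE)
ozone_median <- median(ozone, na.rm = TRUE)

# Outliers: boxplot.stats() flags points beyond 1.5 * IQR from the hinges.
ozone_outliers <- boxplot.stats(ozone)$out

cat("missing:", n_missing,
    "| mean:", round(ozone_mean, 1),
    "| median:", ozone_median,
    "| flagged outliers:", length(ozone_outliers), "\n")
```

Each number or plot here should prompt a decision in your write-up: whether to drop or keep rows with missing values, whether to report the mean or the median, and whether flagged points are errors or interesting cases.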
Choose a dataset: https://vincentarelbundock.github.io/Rdatasets/dat…
- Profile and examine the data (if you have no a priori knowledge, you will need to get to know your data).
- Pose questions: lather, rinse, and repeat as necessary.
- Create visualizations: Interact with the data, and if necessary, refine your questions.
- For your paper, you will want to keep a record of your analysis and prepare at least one final graphic and caption to answer an interesting question.
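One possible way to start the profiling step is sketched below. It assumes the built-in mtcars dataset as a placeholder for whichever dataset you choose, and the name `cars` is just an illustrative working copy.

```r
# Sketch only: mtcars is a placeholder for the dataset you pick.
data(mtcars)

# Size and structure: how many rows and columns, and what type is each column?
dim(mtcars)
str(mtcars)

# A first look at actual values and per-column summaries.
head(mtcars)
summary(mtcars)

# Perceived quality: how many missing values does each column have?
missing_per_column <- colSums(is.na(mtcars))
print(missing_per_column)

# Numeric codes often hide categories. In mtcars, am is coded 0 = automatic,
# 1 = manual, so a labelled factor is easier to read in tables and plots.
cars <- mtcars  # hypothetical working copy, so the original stays untouched
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))
table(cars$am)
```

Saving these commands in a script or R Markdown file is also an easy way to keep the record of your analysis.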
During your exploration of the data, create a record of various types of views: bar charts, scatterplots, time series, maps, etc., and question what you are exploring. Are there any noticeable differences in the views to support different questions? Do they reveal areas for further questions or exploration?
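To make the idea of different views concrete, the sketch below draws three of them with base R graphics. The datasets (mtcars and AirPassengers) are built-in stand-ins chosen only for illustration; maps are left out because they generally need add-on packages.

```r
# Sketch only: built-in datasets stand in for your own.
data(mtcars)
data(AirPassengers)

# Bar chart: suits "how many in each category?" questions.
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts,
        main = "Cars by number of cylinders",
        xlab = "Cylinders", ylab = "Number of cars")

# Scatterplot: suits "how are two numeric variables related?" questions.
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel economy vs. weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Time series: suits trend and seasonality questions.
# AirPassengers is already a ts object, so plot() draws it as a line over time.
plot(AirPassengers,
     main = "Monthly airline passengers",
     xlab = "Year", ylab = "Passengers (thousands)")
```

Note how each view answers a different kind of question: the bar chart cannot show a trend, and the time series says nothing about relationships between variables.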
- After assessing the data and/or its description, write down a few initial questions that you think the data may answer (comparative/correlation questions such as relationships between pairs of variables, geographically oriented questions, or time-related or trend questions).
- Visually examine the data for answers to your initial questions.
- If needed, refine your initial questions based on what you find. For example, did you uncover a subset of something interesting with the data? If so, isolate the subset of the data that contains an interesting feature.
- Consider transforming your data (compute averages and medians, run the five-number summary, convert numbers to percentages, etc.).
- In your paper, focus on what you found, both the expected and the unexpected.
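The subsetting and transformation steps above might look something like the following sketch. It again assumes mtcars as a stand-in, and the 3.5 weight cutoff is an arbitrary, hypothetical threshold rather than anything the assignment prescribes.

```r
# Sketch only: mtcars and the 3.5 cutoff are illustrative assumptions.
data(mtcars)

# Isolate a subset that looked interesting during exploration.
heavy <- subset(mtcars, wt > 3.5)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum.
mpg_five <- fivenum(mtcars$mpg)

# Group-wise medians: does typical fuel economy differ by cylinder count?
median_mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, median)

# Convert counts to percentages so groups of different sizes are comparable.
cyl_percent <- round(100 * prop.table(table(mtcars$cyl)), 1)

print(mpg_five)
print(median_mpg_by_cyl)
print(cyl_percent)

# A candidate final graphic: one plot that answers one question.
boxplot(mpg ~ cyl, data = mtcars,
        main = "Fuel economy by number of cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")
```

Whatever you find here, expected or not, belongs in the paper along with the question that led you to it.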
Assignments will be graded based on the following:
- Basic description of dataset contents, size, and perceived quality.
- The description of your visual exploration process, including the view types used (bar charts, scatterplots, maps, and time series).
- The depth of your analysis and the design of your visualizations.
- Comments on and evaluation of the visualization tool, including any improvements you might make.
Don’t forget that formatting always matters.