A key part of Pokémon Go is using evolutions to get stronger Pokémon, and a deeper understanding of evolutions is key to being the greatest Pokémon Go player of all time. The data set you will be working with for this assignment covers 75 Pokémon evolutions spread across four species. A wide set of variables are provided, allowing a deeper dive into what characteristics are important in predicting a Pokémon’s final combat power (CP).

Getting started

Here are the steps for getting started:

  • Start with an assignment link that creates a repo on GitHub with starter documents: https://classroom.github.com/a/kMWzaJVr
  • Clone this repo in RStudio
  • Make any changes needed as outlined by the tasks you need to complete for the assignment
  • Make sure all your code chunks are informatively named, and these labels re not repeated
  • Periodically commit changes (the more often the better, for example, once per each new task)
  • Push all your changes back to your GitHub repo

and voila, you’re done! Once you push your changes back you do not need to do anything else to “submit” your work. And you can of course push multiple times throughout the assignment. At the time of the deadline we will take whatever is in your repo and consider it your final submission, and grade the state of your work at that time (which means even if you made mistakes before then, you wouldn’t be penalized for them as long as the final state of your work is correct).

Data

Go to the codebook and find out which variabkes are included in this dataset, and what they represent.

Questions

For each exercise, write down an interpretation of your visualization (and what it reveals about the data) in 1-2 sentences max.

  1. Calculate the diference in heights pre and post evolution and save this as a new variable. Calculate the percentage of pokemon that grew during evolution. Also visualize the distribution of change in height by species and provide a discussion of how change in height varies across species.

  2. Recreate the following plot.

recreate

  1. Calculate the relative frequency of attack_strong_new becased on levels of attack_strong. That is, the relative frequency of post-evolution stronger attack based on the pre-evolution stronger attack. Do Pokemon tend to change their strong attack post-evolution?

  2. Pick two categorical variables and make a bar plot that depicts the relationship between them. These can be variables from the original data or ones that you create based on the given data.

  3. Pick a numerical and a categorical variable, and construct side-by-side box plots depicting the relationship between them.

  4. Learn something new: violin plots! Read about them at http://ggplot2.tidyverse.org/reference/geom_violin.html, and convert your side-by-side box plots from the previous task to violin plots. What, do the violin plots reveal that box plots do not? What features are apparent in the box plots but not in the violin plots?

  5. What characteristics correspond to an evolved Pokémon with a high combat power? You do not need to come up with an exhaustive list, but you should walk us through your reasoning for answering this question and include all relevant summary statistics and visualizations.

When you’re done, review the .md document on GitHub to make sure you’re happy with the final state of your work. Then go get some rest!

Getting help

Use the #questions channels on Slack to ask questions or posting on the GitHub Communiy. If your question is about an error you’re getting, make sure to clearly explain what generated the error as well as what the error says.

You are also welcomed to discuss the homework with each other broadly (no sharing code!) as well as ask questions at office hours.

Continuous integration and badges

Something new on this homework, and going forward on class assignments. Each time you push to GitHub a continuous integration tool (called wercker) will check to make sure your Rmd knits successfully. If it does, you’ll be able to see a badge on the README of your repo that says “build passing” (in green). If it fails, it’ll say “build failing” (in orange). If the build is failing you will need to fix your Rmd so that it knits without any errors and then push again. Some tips:

  • You should avoid leaving your repo in a failing stage.
  • You can click on the “build failing” badge to get more information.
  • A “build passing” badge is not guarantee that your work is correct, but instead just that your Rmd file knits.

Academic integrity

This is an individual assignment. You are welcomed to exchange ideas with classmates and ask questions on the getting help channels discussed above however you may not share your text or code answers directly with classmates.

The Duke Community Standard applies and course academic integrity policies apply. Please review them here. Specifically, the note on sharing / reusing code.

Grading

Total 100 pts
Questions 1-6 60 pts
Question 7 20 pts
Informatively named code chunks 5 pts
Commit after each question (at a minimum, more commits ok) 2 pts
Informative commit messages 3 pts
Document organization 4 pts
Quality of writing and grammar 6 pts