You can use the rep() function in R to do this.
Here is a small example. Let's say I have the following contingency table:
 | married | not married | total
------------- | ------- | ----------- | -----
mature mom | 25 | 107 | 132
younger mom | 361 | 506 | 867
total | 386 | 613 | 999
In order to recreate a data set where each row represents one respondent, use the following code:
mature = c(rep("mature mom", 132), rep("younger mom", 867))
married = c(rep("married", 25), rep("not married", 107), rep("married", 361), rep("not married", 506))
You can make a table using the following code to double check that the data looks like the original contingency table.
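The check itself is not shown here. A minimal sketch, assuming the two vectors mature and married defined above, is to cross-tabulate them with table(). The call to addmargins() is an addition for illustration: it appends the row and column totals (labelled "Sum") so the result can be compared directly with the original contingency table.

```r
# Rebuild the two vectors from the contingency table counts
mature = c(rep("mature mom", 132), rep("younger mom", 867))
married = c(rep("married", 25), rep("not married", 107),
            rep("married", 361), rep("not married", 506))

# Cross-tabulate the two variables
mom_table = table(mature, married)

# Add row and column totals (labelled "Sum") for comparison with the original
addmargins(mom_table)
```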
You can also create a new data set in which these variables make up the columns of a data frame.
momdata = as.data.frame(cbind(mature,married))
Once you have your data you can write it out to a .csv file using the following.
write.csv(momdata, file = "moms.csv", quote = FALSE, row.names = FALSE)
You should now see a file called moms.csv in the Files window. You can select that file and export it (check the box next to the file, click on More, and then Export...) so that you have a copy of this file on your computer and can submit it with your project. For more information on the write.csv() function, run ?write.csv in the Console.
My dataset is in a .csv file on my computer. How can I get it into RStudio?
Under the Files tab in the bottom right corner of RStudio you should see a button called Upload (with a yellow up arrow). Click on that, and then click on Choose File and find your data file and hit OK. You should then see this file listed in the Files window.
This means that you have successfully uploaded your file to RStudio, but it's not yet in your Workspace. In order to get it in your Workspace, click on Import Dataset (under the Workspace tab on the top right corner of RStudio), then click on From Text File... and choose your data file from the list. Make sure the radio button for Heading is selected for Yes (assuming that the first row of your dataset is the header row).
In order to use this dataset as a part of your write up, you need to include a piece of code in your .Rmd file to read the data in. Suppose your data file's name is "d_prj1.csv", and you want to call your dataset "d" then use the following:
```{r}
d = read.csv("d_prj1.csv")
```
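Once the data are read in, it is worth confirming that they look as expected. The sketch below is self-contained, so it first writes a tiny made-up file to a temporary location. In your own report you would read your own file (such as "d_prj1.csv") instead. The column names and values here are invented purely for illustration.

```r
# Write a small made-up CSV so the example runs on its own (hypothetical data)
csv_path = file.path(tempdir(), "d_prj1.csv")
writeLines(c("age,income", "25,30000", "40,52000", "31,41000"), csv_path)

# Read the file; header = TRUE (the default) treats the first row as column names
d = read.csv(csv_path)

dim(d)   # number of rows and columns
str(d)   # type of each variable
head(d)  # first few rows
```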
Where can I find a list of R commands we've learned so far?
Click here for a PDF with a list of useful R commands. The list will be updated as we progress through the semester.
How can I export my .Rmd file so that I can submit it on Sakai?
Locate the .Rmd file you want to export in the Files pane (lower right corner), check the box next to it, then click on More -> Export, and then click on Download in the pop-up window.
How can I take a random sample of cases from my dataset?
Let's assume you want a random sample of 1000 observations, and you want to sample without replacement. This is a two-step process:
- Randomly draw 1000 row numbers between 1 and the number of rows in your original dataset, and store these.
rows_to_sample = sample(1:nrow(original_data), 1000, replace = FALSE)
- Grab the rows corresponding to the random numbers from the previous step, and store them in a new data set.
samp_data = original_data[rows_to_sample, ]
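Put together, the two steps can be sketched as below. The data frame original_data is a made-up stand-in for your own data, and the call to set.seed() is an optional addition (not part of the steps above) that makes the same sample come up every time the code is run, which is handy when knitting a report.

```r
set.seed(2014)  # optional: fixes the random numbers so results are reproducible

# Stand-in for your own data: 5000 rows of made-up values (hypothetical)
original_data = data.frame(id = 1:5000, x = rnorm(5000))

# Step 1: pick 1000 distinct row numbers
rows_to_sample = sample(1:nrow(original_data), 1000, replace = FALSE)

# Step 2: keep only those rows
samp_data = original_data[rows_to_sample, ]

nrow(samp_data)                # 1000
anyDuplicated(rows_to_sample)  # 0 means no row was drawn twice
```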
How can I make a plot visualizing the relationships between all of the variables in my dataset?
The simplest approach is to use the plot function on the entire dataset. The second approach is to use a function from a contributed R package to get a much fancier plot. The two downsides of the second option are that (1) it doesn't handle NAs automatically (this may not be an issue with your second project since there aren't many NAs), and (2) it takes a while to generate the plot, so you'll need to be patient. The examples below use the ACS data from the multiple regression lab.
- plot function:
plot(acs)
In this output you'll see that the lower triangle of the plot matrix repeats the information in the upper triangle (same plots, with axes reversed). Also, depending on the number of variables you have, the plots may be small. If R complains about the plotting window being too small, just increase the size of your plotting window by dragging the margins in RStudio. You can use this to quickly determine which variables are related, then make single plots for those relationships that you'd like to view more closely. If you want to plot only certain variables, you can first make a subset, and then use the plot function.
- Subsetting based on column number: Only plot relationships between variables in columns 1 through 5.
plot(acs[, 1:5])
- Subsetting based on variable names: First subset the data, selecting variables that are numerical, and then plot the relationships between them.
acs_num = subset(acs,select = c("income","hrs_work","age","time_to_work"))
plot(acs_num)
- ggpairs function from the GGally package:
install.packages("GGally") # install package
library(GGally) # load package
acs_noNA = na.omit(acs) # omit rows with NAs
ggpairs(acs_noNA)
If you only want to plot certain columns of the dataset (say, 1 through 5), use
ggpairs(acs_noNA, columns = c(1:5))
This might be very useful since otherwise the plot gets very busy. Another parameter you might want to change in the ggpairs function is the font size of the correlation coefficients (they're pretty small by default).
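# Recent versions of GGally no longer accept params; wrap() sets the text size instead
ggpairs(acs_noNA, columns = 1:5, upper = list(continuous = wrap("cor", size = 10)))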
How can I calculate confidence intervals for the slopes in linear regression using R?
You can calculate confidence intervals for slopes manually (finding the appropriate t* for the degrees of freedom and confidence level you need), or you can use the confint function in R. The example below uses the ACS dataset from the multiple regression lab. You can either get confidence intervals for all slopes using:
m = lm(income ~ gender + hrs_work, data = acs)
confint(m)
or for one parameter at a time using:
confint(m, parm = "hrs_work")
Use the help file for the function to figure out how to change the confidence level.
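As a sketch of both routes: confint() takes a level argument (0.95 by default), and the manual calculation is the estimate plus or minus t* times its standard error, using the residual degrees of freedom. The example uses R's built-in mtcars data as a stand-in for the ACS data, and the model is only illustrative.

```r
# Illustrative model on a built-in dataset (stand-in for the ACS data)
m = lm(mpg ~ wt + hp, data = mtcars)

# 90% confidence intervals for all coefficients
confint(m, level = 0.90)

# The same 90% interval for the wt slope, computed by hand
est = coef(summary(m))["wt", "Estimate"]
se = coef(summary(m))["wt", "Std. Error"]
t_star = qt(0.95, df = df.residual(m))  # 5% in each tail for a 90% interval
manual_ci = est + c(-1, 1) * t_star * se
manual_ci
```

The two intervals for wt should agree, which is a quick way to check that you picked the right t* and degrees of freedom.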
How can I add a table to my report produced using knitr?
You can do this in one of two ways: (1) create the table elsewhere (for example, in Excel), save it as an image, and embed the image in your report; or (2) create the table in knitr.
- Create the table elsewhere (like Excel), save table as an image, embed image in report: Once you create your table you can either save it as an image file, or take a screenshot. Save this image file in the same directory as your .Rmd file (by uploading it onto RStudio). Let's assume your image file is called "table_screenshot.png". In order to embed this in your report use the following code:
![table of blahs](table_screenshot.png)
where "table of blahs" is just a short description of your table.
- Create the table in knitr: Tables in knitr have the following structure
First Header | Second Header
------------- | -------------
Content Cell | Content Cell
Content Cell | Content Cell
Just replace the text with your content, and extend the table as needed.
Here is a quick example: I have this table that I created in Excel and saved as an image file by taking a screenshot on my computer and cropping around the table:
I upload this file into RStudio, in the same directory as the .Rmd file for my project. Then using the command
![table of blahs](table_screenshot.png)
I embed the file in my report. Alternatively, I can create the same table using the following code:
| col name 1 | col name 2 | col name 3
------------- | ----------- | ----------- | -----------
row name 1 | [some text] | [some text] | [some text]
row name 2 | [some text] | [some text] | [some text]
row name 3 | [some text] | [some text] | [some text]
The input and output look something like this:
How can I change the size of R code printed in my knitr document?
Add this to the top of your knitr document, and change 12px to a size you prefer. Do not use a size lower than 9px.
<style type="text/css">
code.r, code {
font-size: 12px; }
</style>
General
- Will tables be provided with the exams, or should we incorporate them into our cheat sheets?
Tables will be provided with the exam, so you don't need to worry about bringing one with you.
- Can the cheat sheets be typed?
Sure, but they need to be prepared by you.
- Can I turn in a handwritten problem set?
No, you must type up the answers to the problem set in the space provided (under Assignments) in Sakai.
- I forgot my clicker today, can I write my responses on a piece of paper and get credit?
No, you need to have your clicker to be able to get credit for the day. Note that up to three unexcused late arrivals or absences will not affect your clicker grade.
- I have a class on East Campus before this class and I might be late to class. Will this affect my grade?
Yes and no. If there is a readiness assessment that day and you walk in late, you won't be given additional time and you may not be able to perform as well as you would have had you had more time. If there is no readiness assessment and you walk in just a few minutes late, you'll at most miss one or two clicker questions for the day. This shouldn't affect your score since answering at least 75% of the questions gets you a full score for the day.
- Is the final cumulative?
Yes, but it will be weighted more heavily towards material you haven't yet been tested on.