Project 2 - Data FAQ

How do I get my dataset into RStudio?

Under the Files tab in the bottom right corner of RStudio, you should see a button called Upload (with a yellow up arrow). Click it, then click Choose File, find your data file, and click OK. The file should then be listed in the Files tab.

My dataset is in a .csv or .txt file. How can I read it into RStudio?

Once you have uploaded your file to RStudio, you need to load it explicitly to get it into your Workspace. To do so, click on Import Dataset (under the Workspace tab in the top right corner of RStudio), then click on From Text File… and choose your data file from the list. In the new window, make sure that the appropriate options are selected for heading, separator, etc.

After you click Import, the window will close and your console will show the command that was used to import your data. To use this dataset in your write-up, you will need to include this command in your .Rmd file. For example, if your data file’s name is “d_prj1.csv”, the command will look something like the following (the arguments and function name may be slightly different):

d_prj1 = read.csv("d_prj1.csv")
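For a .txt file, or to set the heading and separator options by hand, the same import can be sketched with read.table(). This is only an illustration: the file contents are made up, and a temporary file stands in for your uploaded data file.

```r
# Hypothetical example: create a small tab-separated text file to stand in
# for an uploaded data file (the name and contents are made up).
txt_file = file.path(tempdir(), "d_prj1.txt")
writeLines(c("id\tgroup\tscore",
             "1\ta\t3.5",
             "2\tb\t4.0",
             "3\ta\t2.5"), txt_file)

# header = TRUE: the first line holds the variable names.
# sep = "\t": the columns are separated by tabs.
d_prj1 = read.table(txt_file, header = TRUE, sep = "\t")

# Quick checks that the import worked as intended.
str(d_prj1)
head(d_prj1)
```

If the variable names show up as the first row of data, or everything lands in a single column, the header or sep option most likely does not match the file.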

What should I do if my data is in a more exotic format (SAS, SPSS, Stata, etc.)?

R provides a package for reading data from other statistical software and other common data formats. There are too many options to list here, but the first place to look for a function that can read your data is the foreign package. To get a list of its helper functions, run the following command:

library(help=foreign)
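As a rough sketch of how one of these helper functions is used, the example below writes a small made-up data frame to a Stata file and reads it back with read.dta(). The data and file names are hypothetical; with real data you would skip the writing step and point read.dta() at your own file.

```r
library(foreign)

# Hypothetical data, written out in Stata format so there is a file to read.
example = data.frame(id = 1:3, score = c(3.5, 4.0, 2.5))
dta_file = file.path(tempdir(), "example.dta")
write.dta(example, dta_file)

# read.dta() reads a Stata file into a data frame.
from_stata = read.dta(dta_file)
str(from_stata)

# For SPSS files the analogous call is read.spss(); to.data.frame = TRUE
# asks for a data frame instead of a list. "survey.sav" is a placeholder name.
# from_spss = read.spss("survey.sav", to.data.frame = TRUE)
```

Note that read.dta() is documented as supporting only older Stata file versions, so a file saved by a recent version of Stata may need to be re-saved in an older format first.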

If you run into any issues, please let us know as soon as possible so we can help you in a timely fashion.

If I have counts for categorical data in table form, how do I recreate the data in R?

You can make use of the rep() function in R to do this.

Here is a small example. Let’s say I have the following contingency table:

              married   not married   total
mature mom         25           107     132
younger mom       361           506     867
total             386           613     999

In order to recreate a data set where each row represents one respondent, use the following code:

mature = c(rep("mature mom", 132), rep("younger mom", 867))
married = c(rep("married", 25), rep("not married", 107), rep("married", 361), rep("not married", 506))

You can make a table using the following code to double-check that the data match the original contingency table.

table(mature, married)
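To compare the row and column totals as well, one option is to wrap the table in addmargins(). The sketch below repeats the two vectors so that it runs on its own; the name tab is made up for this example.

```r
mature = c(rep("mature mom", 132), rep("younger mom", 867))
married = c(rep("married", 25), rep("not married", 107),
            rep("married", 361), rep("not married", 506))

# Cross-tabulate the two variables, then add a "Sum" row and column
# that should match the totals in the original contingency table.
tab = table(mature, married)
addmargins(tab)
```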

You can also create a data frame in which these two variables make up the columns.

momdata = data.frame(mature, married)
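Typing out every rep() call gets tedious for larger tables. As a sketch of an alternative (not a required approach), the counts can be entered once as a matrix and then expanded to one row per respondent. The names counts, long, and momdata2 are made up for this example.

```r
# Enter the cell counts (without the totals). matrix() fills column by
# column, so the first two numbers are the "married" column.
counts = matrix(c(25, 361, 107, 506), nrow = 2,
                dimnames = list(mature = c("mature mom", "younger mom"),
                                married = c("married", "not married")))

# One row per cell of the table, with the count in a column called Freq.
long = as.data.frame(as.table(counts))

# Repeat each row as many times as its count, then drop the Freq column.
momdata2 = long[rep(seq_len(nrow(long)), long$Freq), c("mature", "married")]
rownames(momdata2) = NULL

# Check against the original contingency table.
table(momdata2$mature, momdata2$married)
```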

Once you have your data, you can write it out to a .csv file using the following.

write.csv(momdata, file = "moms.csv", quote = FALSE, row.names = FALSE)

You should now see a file called moms.csv in the Files window. To keep a copy of this file on your computer and submit it with your project, export it: check the box next to the file, click on More, and then Export… For more information on the write.csv() function, use the following.

?write.csv
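Before submitting, it can be worth reading the file back in to confirm that nothing was lost when it was written. The sketch below rebuilds the data and writes it to a temporary location (an assumption made so the example runs on its own); with your own file you would only need the read.csv() and table() lines.

```r
# Rebuild the example data so this block is self-contained.
momdata = data.frame(
  mature = c(rep("mature mom", 132), rep("younger mom", 867)),
  married = c(rep("married", 25), rep("not married", 107),
              rep("married", 361), rep("not married", 506))
)

# Write to a temporary folder here; in your project this would be "moms.csv".
csv_file = file.path(tempdir(), "moms.csv")
write.csv(momdata, file = csv_file, quote = FALSE, row.names = FALSE)

# Read the file back and tabulate it; the counts should match the
# original contingency table.
check = read.csv(csv_file)
table(check$mature, check$married)
```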