Independent vs. dependent

Independent variable:

Dependent variable:

Which variable we treat as independent and which as dependent may depend on the context of the study.

Difference between association and causation

Sampling

population: a whole set of things, people, etc. that we want to describe or discover something about

sample: a subset of the population that we are able to observe, experiment on, etc.; we can use it to infer facts about the population. We use a sample because it would be difficult or impossible to make observations about each member of the population.

A simple random sample is a special kind of sample. Each member of the population has an equal chance to be part of the sample; whether or not a member becomes a part of the sample is determined randomly. Once I have chosen one or more members to be part of the sample, that doesn't affect the probability of choosing other members of the population (except that no member of the population can be chosen twice). See text, pg. 340.
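A minimal sketch of drawing a simple random sample, assuming the population can be listed in full. The population of 100 numbered members, the sample size of 10, and the helper name `simple_random_sample` are all hypothetical choices made for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every member is equally likely to be chosen.

    random.Random.sample selects without replacement, so no member
    can appear in the sample twice.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: 100 members labeled 1..100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

The seed only makes the draw repeatable; omit it to get a different sample each time.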

Descriptive vs. inferential statistics

descriptive statistics: Methods to summarize the information that we know about a population or sample.

inferential statistics: Methods that allow us to generalize what we know about the sample to the whole population. We use the rules of probability to help us make sound inferences.

How and why we summarize data

When you have a large amount of data, it's important to be able to get a "general picture" of the population quickly and easily.

Bar graphs

One way to represent nominal or ordinal data is to use a bar graph.
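As a rough illustration, a bar graph comes down to counting how often each category occurs and drawing one bar per category. The sketch below prints text bars; the survey responses and the helper name `text_bar_graph` are made up for the example.

```python
from collections import Counter


def text_bar_graph(values, order=None):
    """Return one text bar per category, with the bar length equal to the count.

    For ordinal data, pass `order` so the bars follow the natural ordering
    of the categories; otherwise the categories are sorted alphabetically.
    """
    counts = Counter(values)
    categories = order if order is not None else sorted(counts)
    return [f"{c:<10}{'#' * counts[c]} ({counts[c]})" for c in categories]


# Hypothetical ordinal data: responses to a survey question.
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]
bars = text_bar_graph(responses, order=["disagree", "neutral", "agree"])
for line in bars:
    print(line)
```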

Histograms

A histogram is often a good way to represent quantitative data.
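A histogram groups quantitative values into adjacent intervals (bins) of equal width and shows how many values fall in each. The sketch below does the counting step, assuming bins of the form [left, left + width). The height data, the bin width, and the helper name `histogram_counts` are hypothetical.

```python
def histogram_counts(data, bin_width, start):
    """Count the values in each bin [start + k*w, start + (k+1)*w).

    Returns a dict mapping each bin's left edge to its count.
    Bins that contain no values are omitted.
    """
    counts = {}
    for x in data:
        k = int((x - start) // bin_width)  # index of the bin containing x
        left = start + k * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical quantitative data: heights in centimeters.
heights_cm = [152, 158, 161, 163, 165, 166, 168, 170, 171, 174, 179, 185]
counts = histogram_counts(heights_cm, bin_width=10, start=150)
for left, count in counts.items():
    print(f"{left}-{left + 10}: {'#' * count}")
```

The choice of bin width is a judgment call: bins that are too wide hide the shape of the data, and bins that are too narrow make it look noisy.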

Numerical summaries of data

Three characteristics to summarize:

If we understand these 3 characteristics, we can get a good understanding of how the data are distributed.

Ways to measure central tendency

Three commonly used statistics are:

Mean

Median

Mode
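As a quick illustration using the standard definitions (the mean is the sum of the values divided by how many there are, the median is the middle value of the ordered data, and the mode is the most frequent value), here is a small sketch with Python's `statistics` module. The scores are made-up data.

```python
import statistics

# Hypothetical data set of six scores.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # sum 30 divided by 6 values
median = statistics.median(scores)  # average of the two middle values, 3 and 5
mode = statistics.mode(scores)      # 3 is the only value that occurs twice

print(mean, median, mode)
```

With an even number of values there is no single middle value, so the median is taken as the average of the two middle ones.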

Measuring dispersion

How much variation is there in the data? How are the data spread out across the different possible values?

Range

Inter-quartile range (IQR)

Quartiles are the data points that divide an ordered data set into quarters. So, the 1st quartile (Q1) is the data value that separates the bottom fourth of the data from the remainder; the 3rd quartile (Q3) separates the top fourth of the data from the remainder.
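A small sketch of the range and the inter-quartile range, using made-up data. Several conventions exist for locating quartiles when they fall between data points; this example assumes the default ("exclusive") method of Python's `statistics.quantiles`, which may differ slightly from the convention used in the text.

```python
import statistics

# Hypothetical ordered data set with 11 values.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 18, 20]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

# IQR: the spread of the middle half of the data.
iqr = q3 - q1

print(f"range={data_range}, Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```

Unlike the range, the IQR is not affected by a single unusually large or small value.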

Standard deviation (SD)

If a distribution is mound-shaped and approximately symmetric, then we can use the following approximations:
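The list of approximations is not reproduced in these notes. The approximations usually stated for mound-shaped, symmetric distributions are the empirical rule: about 68% of the data lie within 1 SD of the mean, about 95% within 2 SDs, and nearly all (about 99.7%) within 3 SDs. Assuming that is what is meant here, the sketch below checks the rule on simulated data. The mean of 100, the SD of 15, and the helper name `fraction_within` are arbitrary choices for the example.

```python
import random
import statistics

# Simulated mound-shaped, symmetric data (normal distribution; parameters are arbitrary).
rng = random.Random(0)
data = [rng.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation


def fraction_within(values, center, spread, k):
    """Fraction of values lying within k * spread of center."""
    return sum(abs(x - center) <= k * spread for x in values) / len(values)


within_1 = fraction_within(data, mean, sd, 1)
within_2 = fraction_within(data, mean, sd, 2)
within_3 = fraction_within(data, mean, sd, 3)

print(f"within 1 SD: {within_1:.3f}")
print(f"within 2 SD: {within_2:.3f}")
print(f"within 3 SD: {within_3:.3f}")
```

For data that are strongly skewed or have more than one peak, these fractions can differ noticeably from 68%, 95%, and 99.7%.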

