Data Science Ethics

# Data Science Ethics
### Dr. Maria Tackett
### 12.03.19

---

<div class="my-footer">

<a href="http://datasciencebox.org" target="_blank">datasciencebox.org</a>

</div>

---

### [Click for PDF of slides](12-data-science-ethics.pdf)

---

### Announcements

- Project Data Analysis **due TODAY at 11:59p**

- HW 06 - due **Friday 12/6 at 11:59p**

- Project final write-up & presentation - due **Friday, 12/13 at 11:59p**

- Presentations Saturday, 12/14
    - Lab 01: 7p - 8p
    - Lab 02: 8p - 9p
    - Lab 03: 9p - 10p
    
- Exam 02 Extra Credit
  + 90% response rate on course eval: +3 pts on Exam 02 grades

---

### Agenda

1. Misrepresenting data

2. Misusing p-values

3. Privacy

4. Algorithmic bias

---

## Misrepresenting data

---

.question[
What is wrong with this graph? How would you fix it?
]

---

## Misusing p-values

---

### What is a p-value?

.center[
<iframe src="https://fivethirtyeight.abcnews.go.com/video/embed/56150342" width="640" height="360" scrolling="no" style="border:none;" allowfullscreen></iframe>

Source: ["Not Even Scientists Can Easily Explain p-values"](https://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/)
]

---

### Statistically significant?

.pull-right[
![](img/12/green_beans_2.png)
]

---

### *Let’s repeat it: P-values don’t necessarily tell you if an experiment “worked” or not*

from [800 scientists say it’s time to abandon “statistical significance”](https://www.vox.com/latest-news/2019/3/22/18275913/statistical-significance-p-values-explained)

---

### Alternate ways to evaluate evidence

- Concentrate on effect sizes
    - How big of a difference does an intervention make?
    - Is it practically meaningful?

- Use confidence intervals to estimate effect size

- Ask whether the result is from a novel study or a replication (give more weight to a theory that many labs have looked into)

- Ask whether underlying data is freely accessible (so anyone can check the math)

Source: [800 scientists say it’s time to abandon “statistical significance”](https://www.vox.com/latest-news/2019/3/22/18275913/statistical-significance-p-values-explained)
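
The first two suggestions above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended analysis: the scores are made up, the function names are hypothetical, and the interval uses a normal approximation (z = 1.96) that is rough for samples this small.

```python
import math
import statistics


def mean_difference_ci(treatment, control, z=1.96):
    """Return (difference in means, lower, upper) using a normal approximation."""
    diff = statistics.mean(treatment) - statistics.mean(control)
    # Standard error of the difference between two independent sample means
    se = math.sqrt(
        statistics.variance(treatment) / len(treatment)
        + statistics.variance(control) / len(control)
    )
    return diff, diff - z * se, diff + z * se


def cohens_d(treatment, control):
    """Standardized effect size: mean difference divided by the pooled SD."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * statistics.variance(treatment)
        + (n2 - 1) * statistics.variance(control)
    ) / (n1 + n2 - 2)
    return (statistics.mean(treatment) - statistics.mean(control)) / math.sqrt(pooled_var)


# Made-up scores for illustration only
treatment = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.2]
control = [11.9, 11.7, 12.2, 12.0, 11.6, 12.1, 11.8, 12.0]

diff, lower, upper = mean_difference_ci(treatment, control)
d = cohens_d(treatment, control)
print(f"difference = {diff:.2f}, 95% CI = ({lower:.2f}, {upper:.2f}), Cohen's d = {d:.2f}")
```

Reporting the difference with its interval shows both how large the effect might be and how uncertain the estimate is, which a p-value alone does not.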

---

### Alternate ways to evaluate evidence

- Use alternative statistical techniques like likelihood ratios and Bayes factors
    - P-values ask the question “how rare are my results?”
    - Likelihood ratios and Bayes factors ask the question “what is the probability my hypothesis is the best explanation for the results we found?”

- You'll learn more about these in future statistics classes 😄

Source: [800 scientists say it’s time to abandon “statistical significance”](https://www.vox.com/latest-news/2019/3/22/18275913/statistical-significance-p-values-explained)
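
As a rough sketch of the idea, the Python below compares two simple hypotheses about a success probability. Everything here is an assumption for illustration: the data (14 successes in 20 trials), the two candidate values (0.5 and 0.7), the equal prior odds, and the function names. When each hypothesis names a single value, as here, the Bayes factor equals the likelihood ratio.

```python
from math import comb


def binomial_likelihood(p, successes, trials):
    """Probability of the observed count if the success probability is p."""
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


def likelihood_ratio(p_alt, p_null, successes, trials):
    """How many times more likely the data are under p_alt than under p_null."""
    return binomial_likelihood(p_alt, successes, trials) / binomial_likelihood(
        p_null, successes, trials
    )


# Hypothetical data: 14 successes in 20 trials
lr = likelihood_ratio(p_alt=0.7, p_null=0.5, successes=14, trials=20)

# Assumed prior: both hypotheses equally plausible before seeing the data
prior_odds = 1.0
posterior_odds = lr * prior_odds
posterior_prob_alt = posterior_odds / (1 + posterior_odds)

print(f"likelihood ratio = {lr:.2f}")
print(f"posterior probability of p = 0.7: {posterior_prob_alt:.2f}")
```

Unlike a p-value, the result is a direct comparison between two stated hypotheses, and it depends on which alternatives and prior odds are chosen.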

---

## Privacy

---

### OkCupid Data Breach

- In 2016, researchers published data of 70,000 OkCupid users—including usernames, political leanings, drug usage, and intimate sexual details.

> "Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form."

> Researchers Emil Kirkegaard and Julius Daugbjerg Bjerrekær

- Although the researchers did not release the real names and pictures of the OkCupid users, critics noted that their identities could easily be uncovered from the details provided—such as from the usernames.

[*OkCupid Study Reveals the Perils of Big-Data Science*](https://www.wired.com/2016/05/okcupid-study-reveals-perils-big-data-science/)

---

.question[
When collecting and analyzing social media data, how do you make sure you don't violate reasonable expectations of privacy?
]

![](img/12/okcupid-tweet.png)


---

### Facebook & Cambridge Analytica

[How Cambridge Analytica turned Facebook 'likes' into a lucrative political tool](https://www.theguardian.com/technology/2018/mar/17/facebook-cambridge-analytica-kogan-data-algorithm)

---

## Algorithmic bias

---

### The Hathaway Effect

---
.center[

![](img/12/hathaway.png)

]

["Does Anne Hathaway News Drive Berkshire Hathaway's Stock?"](https://www.theatlantic.com/technology/archive/2011/03/does-anne-hathaway-news-drive-berkshire-hathaways-stock/72661/)

---

### The Hathaway Effect

- **Oct. 3, 2008** - *Rachel Getting Married* opens: BRK.A up 0.44%

- **Jan. 5, 2009** - *Bride Wars* opens: BRK.A up 2.61%

- **Feb. 8, 2010** - *Valentine’s Day* opens: BRK.A up 1.01%

- **March 5, 2010** - *Alice in Wonderland* opens: BRK.A up 0.74%

- **Nov. 24, 2010** - *Love and Other Drugs* opens: BRK.A up 1.62%

- **Nov. 29, 2010** - Anne announced as co-host of the Oscars: BRK.A up 0.25%

[The Hathaway Effect: How Anne Gives Warren Buffett a Rise](https://www.huffpost.com/entry/the-hathaway-effect-how-a_b_830041)

---

### Algorithms in Criminal Justice

---

### Machine Bias

There’s software used across the country to predict future criminals. And it's biased...

[ProPublica, May 23, 2016](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)

---

>“Although these measures were crafted with the best of intentions, I am concerned that they inadvertently undermine our efforts to ensure individualized and equal justice,” he said, adding, “they may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society.”

> Then-U.S. Attorney General Eric Holder (2014)

---

### ProPublica analysis

**Data:** Risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014 + whether they were charged with new crimes over the next two years

--
  
**Results:**

- 20% of those predicted to commit violent crimes actually did

- The algorithm was more accurate (61%) when the full range of crimes was taken into account (e.g., misdemeanors)

![](img/12/propublica-results.png)

- The algorithm was more likely to falsely flag African American defendants as future criminals, at almost twice the rate of Caucasian defendants
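
The "falsely flag" comparison is a false positive rate computed separately for each group: among people who were *not* charged with a new crime, what share had been flagged as high risk? A minimal sketch of that calculation follows. The records are made-up toy values, not the ProPublica data, and the group labels, record layout, and function name are hypothetical.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Compute the false positive rate for each group.

    records: iterable of (group, flagged_high_risk, reoffended) tuples.
    The rate is the share flagged high risk among those who did not reoffend.
    """
    flagged = defaultdict(int)
    did_not_reoffend = defaultdict(int)
    for group, flagged_high_risk, reoffended in records:
        if not reoffended:
            did_not_reoffend[group] += 1
            if flagged_high_risk:
                flagged[group] += 1
    return {group: flagged[group] / n for group, n in did_not_reoffend.items()}


# Made-up toy records for illustration only (NOT the ProPublica data)
records = (
    [("A", True, False)] * 4      # group A: flagged, did not reoffend
    + [("A", False, False)] * 6   # group A: not flagged, did not reoffend
    + [("A", True, True)] * 5     # group A: flagged, reoffended
    + [("B", True, False)] * 2    # group B: flagged, did not reoffend
    + [("B", False, False)] * 8   # group B: not flagged, did not reoffend
    + [("B", True, True)] * 5     # group B: flagged, reoffended
)

rates = false_positive_rates(records)
for group, rate in sorted(rates.items()):
    print(f"group {group}: false positive rate = {rate:.0%}")
```

An overall accuracy figure can hide this kind of gap, because errors may be distributed unevenly across groups even when the total error count looks similar.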

---