Data Science Ethics 🔍

# Data Science Ethics <br> 🔍

---

layout: true
  
<div class="my-footer">
<span>
Dr. Mine Çetinkaya-Rundel -
<a href="http://www2.stat.duke.edu/courses/Fall18/sta112.01/schedule" target="_blank">stat.duke.edu/courses/Fall18/sta112.01
</a>
</span>
</div>

---

## Announcements

- Second draft of proposals due Friday
- Office hours this week: Thursday 2:30 - 4:30pm

---

## Outline

- Misrepresentation
- P-hacking
- Privacy
- Algorithmic bias

---

# Misrepresentation

---

![](img/cost_of_gas.png)

---

# P-hacking

---

## Significant

---

# Privacy

---

### OkCupid

---

## OkCupid data breach

- In 2016, researchers published data on 70,000 OkCupid users, including usernames, political leanings, drug usage, and intimate sexual details.

>Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.  
>Researchers Emil Kirkegaard and Julius Daugbjerg Bjerrekær

- Although the researchers did not release the users' real names and pictures, critics noted that their identities could easily be uncovered from the details provided, such as the usernames.

---

## Social media data best practices

.question[
In analysis of social media data, how do you make sure you don't violate reasonable expectations of privacy?
]

![](img/okcupid-tweet.png)

---

### Cambridge Analytica

---

## Facebook & Cambridge Analytica

![](img/facebook-cambridge-analytica-scandal-explained-the-guardian-graphic.jpg)

---

# Algorithmic bias

---

### Criminal sentencing

---

## Machine Bias

![](img/propublica-criminal-sentencing.png)

There’s software used across the country to predict future criminals. And it’s biased against blacks.
.small[
[propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing), May 23, 2016
]

---

## A tale of two convicts

.pull-left[
![](img/propublica-prater-broden-1.png)
]
--
.pull-right[
![](img/propublica-prater-broden-2.png)
]

---

>“Although these measures were crafted with the best of intentions, I am concerned that they inadvertently undermine our efforts to ensure individualized and equal justice,” he said, adding, “they may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society.”
>  
>Then U.S. Attorney General Eric Holder (2014)

---

## ProPublica analysis

**Data:** Risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014, along with whether they were charged with new crimes over the next two years

--
  
**Results:**

- Only 20% of those predicted to commit violent crimes actually did
- The algorithm was somewhat more accurate when the full range of crimes (e.g., misdemeanors) was taken into account: 61% of those deemed likely to reoffend were arrested for a subsequent crime within two years
![](img/propublica-results.png)
- The algorithm falsely flagged black defendants as future criminals at almost twice the rate of white defendants
- White defendants were mislabeled as low risk more often than black defendants
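---

The ProPublica findings on the previous slide are statements about group-specific error rates: the false positive rate (people who did not reoffend but were flagged high risk), the false negative rate (people who reoffended but were labeled low risk), and the share of flagged people who actually reoffended (the kind of figure behind the "20%" result). Below is a minimal sketch of how these rates fall out of a per-group confusion matrix. The records, the group labels `A`/`B`, and the names `RECORDS` and `group_rates` are hypothetical, invented for illustration; this is not ProPublica's data or code.

```python
from dataclasses import dataclass

# Hypothetical toy records: (group, flagged_high_risk, reoffended).
# Invented for illustration only; these are NOT ProPublica's Broward County data.
RECORDS = [
    ("A", True, False), ("A", True, True), ("A", False, False),
    ("A", True, False), ("A", False, True), ("A", False, False),
    ("B", True, True), ("B", False, False), ("B", False, True),
    ("B", True, False), ("B", False, False), ("B", False, True),
]


def _ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


@dataclass
class GroupRates:
    accuracy: float  # share of all predictions that were correct
    fpr: float       # false positive rate: non-reoffenders flagged as high risk
    fnr: float       # false negative rate: reoffenders labeled low risk
    ppv: float       # positive predictive value: flagged people who did reoffend


def group_rates(records, group: str) -> GroupRates:
    """Build one group's confusion matrix and derive its error rates."""
    tp = fp = tn = fn = 0
    for g, flagged, reoffended in records:
        if g != group:
            continue
        if flagged and reoffended:
            tp += 1
        elif flagged:
            fp += 1
        elif reoffended:
            fn += 1
        else:
            tn += 1
    return GroupRates(
        accuracy=_ratio(tp + tn, tp + fp + tn + fn),
        fpr=_ratio(fp, fp + tn),
        fnr=_ratio(fn, fn + tp),
        ppv=_ratio(tp, tp + fp),
    )


for g in ("A", "B"):
    r = group_rates(RECORDS, g)
    print(f"{g}: accuracy={r.accuracy:.2f} FPR={r.fpr:.2f} FNR={r.fnr:.2f} PPV={r.ppv:.2f}")
```

In this toy example both groups have the same overall accuracy (0.50), yet group A's false positive rate is 0.50 versus 0.33 for group B, while group B's false negative rate is 0.67 versus 0.50 for group A. A single accuracy number can hide exactly the kind of disparity in error types that ProPublica reported.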

---

Read more at [propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing).

---

### Building a racist AI

---

## How to make a racist AI without trying

.pull-left[
![](img/racist_ai_python.png)
.center[
[Link to post](https://blog.conceptnet.io/posts/2017/how-to-make-a-racist-ai-without-really-trying/)
]
]
.pull-right[
![](img/racist_ai_r.png)
.center[
[Link to post](https://notstatschat.rbind.io/2018/09/27/how-to-write-a-racist-ai-in-r-without-really-trying/)
]
]

---

![](img/hathaway.png)

[Source](https://www.theatlantic.com/technology/archive/2011/03/does-anne-hathaway-news-drive-berkshire-hathaways-stock/72661/)

---

### Sexist AI

---

## Amazon's experimental hiring algorithm

- Amazon used AI to give job candidates scores ranging from one to five stars, much like shoppers rate products on Amazon, according to people familiar with the project
- The company realized its new system was not rating candidates for software developer jobs and other technical posts in a gender-neutral way
- Amazon’s system taught itself that male candidates were preferable: its models were trained on résumés submitted to the company over a 10-year period, most of which came from men

>Gender bias was not the only issue. Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs, the people said.
>[Source](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G)
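---

How can a system "teach itself" a gender preference without ever seeing a gender field? Reuters reported that Amazon's tool penalized résumés containing the word "women's". Below is a minimal sketch of the general mechanism, not Amazon's model (whose internals we don't know): score each résumé token by its smoothed log-odds of appearing in past hires versus past rejections. The résumés, outcomes, and the function name `token_log_odds` are hypothetical, invented for illustration.

```python
import math
from collections import Counter

# Hypothetical historical resumes (sets of tokens) with past outcomes (1 = hired, 0 = rejected).
# Invented for illustration; NOT Amazon's data, features, or model.
HISTORY = [
    ({"python", "chess", "captain"}, 1),
    ({"java", "captain"}, 1),
    ({"python", "java"}, 1),
    ({"python", "women's", "chess", "captain"}, 0),
    ({"java", "women's"}, 0),
    ({"python", "chess"}, 0),
]


def token_log_odds(history, smoothing: float = 1.0) -> dict:
    """Smoothed log-odds of being hired among past resumes that contain each token."""
    hired, rejected = Counter(), Counter()
    for tokens, label in history:
        # Count each token once per resume, in the hired or rejected tally.
        (hired if label == 1 else rejected).update(tokens)
    vocab = set(hired) | set(rejected)
    return {
        t: math.log((hired[t] + smoothing) / (rejected[t] + smoothing))
        for t in vocab
    }


scores = token_log_odds(HISTORY)
# Sort by score (then alphabetically, to break ties deterministically).
for token, score in sorted(scores.items(), key=lambda kv: (kv[1], kv[0])):
    print(f"{token:10s} {score:+.2f}")
```

No résumé here records gender, yet "women's" receives the most negative score (about -1.10) because it appeared only in past rejections. Any model fit to historically skewed outcomes can pick up the same imbalance through proxy features like this one.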

---

### Algorithmic justice

---

## Review

.question[
A company uses a machine learning algorithm to determine which job advertisements to display to users searching for technology jobs. Based on past results, the algorithm tends to display lower-paying jobs to women than to men (after controlling for characteristics other than gender).

What ethical considerations should be taken into account when reviewing this algorithm?
]

---

# Continuing your education on data science ethics

---

## Further reading

.pull-left[
![](img/ethics-data-science.jpg)
]
.pull-right[
[Ethics and Data Science](https://www.amazon.com/Ethics-Data-Science-Mike-Loukides-ebook/dp/B07GTC8ZN7)  
by Mike Loukides, Hilary Mason, DJ Patil  
(Free Kindle download)
]

---

## Further reading

.pull-left[
![](img/weapons-of-math-destruction.jpg)
]
.pull-right[
Weapons of Math Destruction:  
How Big Data Increases Inequality and Threatens Democracy  
by Cathy O'Neil
]

---

## Further watching

.center[
<iframe width="560" height="315" src="https://www.youtube.com/embed/MfThopD7L1Y" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>  
Predictive Policing: Bias In, Bias Out  
by Kristian Lum
]

---

## Parting thoughts

- At some point during your data science learning journey, you will learn tools that can be used unethically
- You might also be tempted to use your knowledge in ethically questionable ways, whether because of business goals, in pursuit of further knowledge, or because your boss told you to

.question[
How do you train yourself to make the right decisions (or reduce the likelihood of accidentally making the wrong decisions) at those points?
]

---

## Do good with data

- Data for Democracy: https://www.datafordemocracy.org/
- Data Science for Social Good: https://dssg.uchicago.edu/
- DataKind: https://www.datakind.org/
- Sign the Manifesto for Data Practices: https://datapractices.org/manifesto/