Duke DataFest 2019

April 5-7, 2019 at Penn Pavilion

ASA DataFest 2019 at Duke

ASA DataFest 2019 will take place at Duke over the April 5-7 weekend. More details on the event are below, and you can read about last year's challenge and winners here.

Event details

When: Friday, April 5 at 6pm - Sunday, April 7 at 5pm
Where: Penn Pavilion, Duke University

On Friday we will start with a reception where your surprise client will give a brief introduction to the data you will be working with over the weekend and tell you a bit about what they would like to get out of it. The data will likely be much more complex than what you are used to seeing in your classes, and you will be given free rein to analyze it however you like. In other words, you will come up with a research question that interests you and conduct the appropriate analysis to answer it. But you are welcome, and encouraged, to take cues from the client's introduction when shaping your research question(s).

Presentations and judging will begin around 2pm on Sunday. Each team will give a brief (5-minute) presentation of their findings to a panel of judges composed of faculty and professionals from a variety of fields. There will be prizes in many categories, such as best visualization, best use of external data, and best findings. A finalized list of categories will be announced at the beginning of the competition.

What is DataFest?

ASA DataFest™ is a data analysis competition where teams of up to five students attack a large, complex, and surprise dataset over a weekend. Your job is to represent your school by finding and communicating insights into these data. The teams that impress the judges will win prizes as well as glory for their school. Everyone will have a great experience, lots of food, and fun!

ASA DataFest™ is also a great opportunity to gain experience that employers are looking for. Having worked on a data analysis problem at this scale will certainly help make you a good candidate for any position that involves analysis and critical thinking, and it will provide a concrete example to demonstrate your experience during interviews.

ASA DataFest™ at Duke is organized by the Department of Statistical Science at Duke University, and co-hosted by the Departments of Statistics and Operations Research at UNC and Statistics at NCSU.


While ASA DataFest™ is a competition, the main goal of the event is to promote collaboration. Here are some testimonials from past participants:

It was a great experience, with a fun and interesting challenge. One of my favorite parts is how varied the presentations and projects from each team are. I love learning about ways in which others looked at and analyzed the same problem/data.

DataFest was an awesome experience. To me, the best part was working in a team of friends that I usually hung out with, but had not had a chance to work together intensively on a project. We enjoyed analyzing the situations and solving problems together for our client. At the end of the day, we just got to know each other better. It was also fun to interact with other teams to explore other approaches while keeping in mind that we were in competition. The fact that we were given a huge amount of data really challenges us to come up with creative and practical approaches. Another important part was the presentation. Every team had to explain well to the judges their objectives and solutions. Our team won the Best Visualization award which is really awesome. Lastly, the food was fantastic.

Past DataFests at Duke

DataFest 2018 - Data source: Indeed

Goal: What advice would you give a new high school student about what major to choose in college? How does Indeed's data compare to official government data on the labor market? Can it be used to provide good economic indicators?

Find out more about the challenge and the winners here.

DataFest 2017 - Data source: Expedia

Goal: How do visitors' searches relate to the choices of hotels booked or not booked? What role do external factors play in hotel choice?

Expedia provided DataFesters with data from search results from millions of visitors around the world who were interested in traveling to destinations all over the world. The data were in two files, one of which included data collected on search results from visitors' sessions, and another which contained detailed information about the destinations that visitors searched for.

DataFest 2016 - Data source: Ticketmaster

Goal: How can site visits be converted to ticket sales, and how can Ticketmaster identify "true fans" of an artist or band?

Data consisted of three sets. One included events from the last 12 months that tracked customer travel through the website. Another provided information about advertising campaigns on Google, and the third included data on the events themselves.

DataFest 2015 - Data source: Edmunds.com

Goal: Detect insights into the process of car shopping that can help make the process easier for customers.

Data consisted of visitor 'pathways' through a website that helps customers configure car features and shop for cars. Five data files were linked by a customer key and included data about the customer, about his or her visits to the webpage, and, when applicable, about the car purchased and the dealership where the car was purchased.

DataFest 2014 - Data source: GridPoint

Goal: Help understand how customers can best save money and energy.

Data consisted of a random sample of customers, with five-minute aggregates over a year of energy consumption that was then aggregated across important features of the commercial properties, as well as supporting climate and location data.

DataFest 2013 - Data source: eHarmony

Goal: Help understand what qualities people look for in prospective dates.

The DataFest students worked with a large sample of prospective matches. For each customer, data were provided on his or her preferences, as well as four matches, their preferences, and information about whether parties contacted one another.

DataFest 2012 - Data source: Kiva.com

Goal: Help understand what motivates people to lend money to developing-nation entrepreneurs and what factors are associated with repaying these loans.

Several data sets were provided, including characteristics of lenders and borrowers and loan pay-back data.