Texata is a big data competition held yearly in Texas. 2015 is the second year that the event is being held.
There are 3 rounds, each about 4 hours long.
- The first round was answering multiple choice questions that tested the theory and business of Data Science.
- The second round was analyzing the NYC Taxi Trips data.
The data scientist Chris Whong was able to get NYC 2013 taxi fare and usage data using the Freedom of Information Law. For the competition, the data was in Google BigQuery and also available in Amazon S3 buckets. During the 4-hour window, we were asked to come up with a story/problem and share our work.
This year, I did not advance past round 2. Only the 12 best contestants in round 2 advance to the finals.
According to my results, I was in the top 10% for round 1 and top 30% for round 2.
Things I need to improve upon:
- Familiarity with BigQuery / Amazon Redshift: During the competition, I used the BigQuery UI. I wish I had familiarized myself with Python's BigQuery interface so I could easily feed the data to graphing libraries like matplotlib.
- Familiarity with geospatial data: I wish I were more familiar with graphing geospatial data. I had a 60 MB CSV file of latitudes and longitudes of taxi pickup locations and wanted to see how, or if, they changed over time, but I was not familiar with tools for mapping this data.
- Phrasing the business problem: Given an interesting data set, there are questions I am interested in answering. But I haven't yet learned how to mine insights that may be useful for a business.
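For the geospatial point, one rough way to check whether pickup locations shift over time, without any mapping tool, might be to snap each coordinate to a coarse grid and count pickups per cell per month. The sketch below uses only the Python standard library. The column names (`pickup_datetime`, `pickup_latitude`, `pickup_longitude`), the 0.01 degree cell size, and the sample rows are assumptions made for illustration, not the actual file I had during the competition.

```python
import csv
import io
import math
from collections import Counter, defaultdict

# Made-up sample rows; the column names are assumed, not taken from the real file.
SAMPLE_CSV = (
    "pickup_datetime,pickup_latitude,pickup_longitude\n"
    "2013-01-05 08:15:00,40.7512,-73.9876\n"
    "2013-01-09 18:40:00,40.7518,-73.9871\n"
    "2013-02-11 09:05:00,40.6413,-73.7781\n"
    "2013-02-12 10:00:00,0,0\n"
)


def bin_pickups_by_month(rows, cell_size=0.01):
    """Count pickups per (lat, lon) grid cell for each 'YYYY-MM' month.

    rows is any iterable of dicts, e.g. a csv.DictReader.
    cell_size is the grid resolution in degrees (0.01 is roughly 1 km).
    """
    counts = defaultdict(Counter)
    for row in rows:
        try:
            month = row["pickup_datetime"][:7]  # 'YYYY-MM'
            lat = float(row["pickup_latitude"])
            lon = float(row["pickup_longitude"])
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed rows
        if lat == 0.0 and lon == 0.0:
            continue  # skip rows with no usable coordinates
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[month][cell] += 1
    return counts


counts = bin_pickups_by_month(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for month in sorted(counts):
    cell, n = counts[month].most_common(1)[0]
    print(month, "busiest cell:", cell, "pickups:", n)
```

For a real file, `io.StringIO(SAMPLE_CSV)` would be replaced with an open file handle. The per-month counters can then be compared directly, or the cell indices can be passed to a plotting library such as matplotlib to draw one heatmap per month.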
Hopefully next year I will do better.
Here is a markdown of the questions I was interested in answering.
Did you take part in this year’s competition? If so, please feel free to share your approaches and questions.