We are excited to host another Presto Meetup at our Facebook campus! Throughout 2014, we have seen increasing community interest, adoption, and participation. This event is a great opportunity for members of the Presto community to connect with each other, share experiences, and discuss challenges and the roadmap with Facebook Presto engineers.

 

Agenda

6:30pm - 7:00pm

Registration/Happy Hour

 

7:00pm - 7:30pm

Presto @ Facebook: Past, Present and Future, Dain Sundstrom, Facebook

 

7:30pm - 8:00pm

Presto @ Netflix: Interactive Queries at Petabyte Scale, Nezih Yigitbasi, Zhenxiao Luo, Netflix

 

8:00pm - 8:30pm

Providing Presto as a service at Treasure Data, Taro L Saito, Treasure Data

 

8:30pm - 9:00pm

Airpal, a UI Query Tool, Andy Kramolisch, James Mayfield, Airbnb

 

9:00pm - 9:30pm

Interactive SQL Querying of Amazon S3 through Presto on Amazon EMR, Steve McPherson, Amazon

 

9:30pm

Closing Remarks 


Speaker & Talk Details

Presto @ Facebook: Past, Present and Future, Dain Sundstrom, Facebook

Speaker Bio

Dain Sundstrom is a software engineer at Facebook and a founding member of Presto, an open source distributed SQL query engine for running interactive analytic queries. In addition to Presto, Dain develops and maintains the Airlift platform, the Airline CLI library, and the Java ports of Snappy and LevelDB. Before joining Facebook, Dain was a founder of Apache Geronimo, a Java EE server, and one of the original authors of the JBoss Application Server.

 

Abstract

Presto has made a lot of progress in the 9 months since being open sourced. In this talk we will detail the changes made over that period, the new features currently in development, and the future roadmap.

 
Presto @ Netflix: Interactive Queries at Petabyte Scale, Nezih Yigitbasi, Zhenxiao Luo, Netflix

Speaker Bios

Nezih Yigitbasi is a senior software engineer on Netflix's Big Data Platform team, working on and contributing to the Presto and Parquet projects. Previously he contributed to the design and implementation of various distributed systems in both academia and industry, including at Intel, Carnegie Mellon University, Delft University of Technology (TUDelft), Argela, and Telenity. He holds a PhD in computer science from TUDelft, and an MSc and a BSc in computer engineering from Istanbul Technical University.

 

Zhenxiao Luo is a senior software engineer at Netflix currently working on Presto and Parquet. Before joining Netflix, he worked at Facebook, Cloudera, and Vertica on HDFS, Hive, and relational databases. Zhenxiao holds a master's degree from the University of Wisconsin-Madison and a bachelor's degree from Fudan University.

 

Abstract

At Netflix we have been using Presto in production for around a year to address our interactive data processing use cases. Our production Presto deployment now serves around 2.5K queries/day against our 10+ PB Hive data warehouse sitting on Amazon S3, our “source of truth”, and it’s growing. In this talk we will start with a brief overview of our production Presto deployment running on the Amazon cloud and show how things fit together. Then we will talk about our contributions, including our work on the Presto S3 file system, Parquet file format support, complex type support, new scalar/aggregation functions, and various other bug fixes and enhancements. We will also share our experiences with integrating Presto with our BI tools, namely Tableau and MicroStrategy. Finally, we will talk about our recent work on bringing vectorization support to the Parquet file format, which we believe will noticeably improve performance. We will conclude the talk with our future roadmap.

 
 

Providing Presto as a service at Treasure Data, Taro L Saito, Treasure Data

Speaker Bio

Taro L. Saito is a software engineer at Treasure Data, Inc. He received a Ph.D. in computer science from the University of Tokyo. Before joining Treasure Data, he worked on genome sciences, database management systems, and distributed computing as an assistant professor at the University of Tokyo.

 

Abstract

Treasure Data, Inc. provides Presto as part of its cloud service. We implemented monitoring of Presto clusters, blue-green deployment, ODBC/JDBC connectivity (named Prestogres), a custom connector, and some optimizers. This session will cover how we use and configure Presto as a cloud service provider.

 
 

Airpal, a UI Query Tool, Andy Kramolisch, James Mayfield, Airbnb

Speaker Bio

James Mayfield is a Product Lead at Airbnb, focused on making the company more data-informed. He spent seven years at Facebook from 2006 to 2014, working primarily as a data analyst and product manager, before making the transition to Airbnb. Building infrastructure and tools to help fellow employees explore, query, analyze, aggregate, and visualize data is his professional passion. In his personal time, he enjoys hanging out with family, swimming, and fixing up his old house.

 

Abstract

Airpal is a web-based query execution tool that leverages Facebook's PrestoDB to make authoring queries and retrieving results simple for users. Airpal lets users find tables, see metadata, browse sample rows, write and edit queries, and submit them, all in a web interface. Once queries are running, users can track their progress and, when they finish, get the results back through the browser as a CSV (download it or share it with friends). The results of a query can be used to generate a new Hive table for subsequent analysis, and Airpal maintains a searchable history of all queries run within the tool. This presentation will describe the basic user features as well as the architecture and technological components of the tool.


Interactive SQL Querying of Amazon S3 through Presto on Amazon EMR, Steve McPherson, Amazon

Speaker Bio

Steve McPherson is the Senior Manager of Amazon Elastic MapReduce, a managed Hadoop web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. Before joining Amazon Elastic MapReduce, Steve was the CTO of Fabric Worldwide and the Director of Technology for Vivaki.

 

Abstract

Presto provides low-latency, interactive SQL querying of data on Amazon S3. This talk will show you how to get up and running with a production-quality Presto cluster in Amazon EMR in less than five minutes. The Amazon EMR team will demonstrate how to provision and configure Presto clusters in Amazon EMR, and will show some of the common customer use cases that are driving adoption of Presto in AWS.