Title: Data Processing on the Google Cloud Platform: from MapReduce to Dremel

Speaker: Oleg Golubitsky, Software Engineer, Google

Date/Time: Tuesday, February 24, 17:00 to 18:00

Location: DMB-2A301 Conestoga College, Kitchener

Abstract: This seminar will cover two data processing services on the Google Cloud Platform: Hadoop and BigQuery. Hadoop is based on an older technology called MapReduce, designed for batch processing of large data sets. MapReduce provides maximum flexibility and scalability: programmers can express complex logic in a programming language such as Java or C++ and fine-tune performance by adjusting parameters such as the number of mappers/reducers and the disk/CPU resources. This flexibility, however, also makes it harder to use. BigQuery is based on a more recent technology called Dremel, which supports a SQL-like query language. Writing SQL queries is much easier than implementing mappers/reducers. Dremel scales well with respect to the input size, and most queries run faster than equivalent MapReduce jobs. Dremel’s language may not appear as expressive as Java, yet we will see that it can express all the basic computations commonly done in MapReduce, including filtering, aggregation, and joins. The seminar will include a few tips and tricks on how to optimize Dremel queries.
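
As a rough illustration of the contrast drawn in the abstract, the sketch below runs a filter-and-aggregate job in the MapReduce style in plain Python (not Hadoop itself) and shows, in a comment, roughly how the same computation might read in a SQL-like language. The record layout, the table name `logs`, and the helper names `mapper`, `reducer`, and `run_job` are all hypothetical and are not taken from the seminar.

```python
from collections import defaultdict

# Hypothetical input records: (country, status, bytes_served).
RECORDS = [
    ("CA", 200, 120),
    ("US", 200, 300),
    ("CA", 404, 15),
    ("CA", 200, 80),
]


def mapper(record):
    """Filter step: emit (key, value) pairs only for successful requests."""
    country, status, size = record
    if status == 200:
        yield country, size


def reducer(key, values):
    """Aggregation step: sum all values that share a key."""
    return key, sum(values)


def run_job(records):
    """Single-process stand-in for a MapReduce job (map, shuffle, reduce)."""
    # Shuffle phase: group mapper output by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: one reducer call per key.
    return dict(reducer(key, values) for key, values in groups.items())


# Roughly the same computation in a SQL-like language
# (table and column names are assumptions):
#   SELECT country, SUM(bytes_served)
#   FROM logs
#   WHERE status = 200
#   GROUP BY country;
result = run_job(RECORDS)
```

The mapper carries the filtering logic and the reducer carries the aggregation, whereas the SQL-like form states both in a single declarative query.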

Bio: Oleg Golubitsky received PhD degrees in Mathematics from Moscow State University and in Computer Science from the University of New Brunswick. He worked as an Assistant Professor of Computer Science at the University of New Brunswick and as a Postdoctoral Fellow at the University of Pisa, Queen’s University, and the University of Western Ontario. Oleg’s publications are in the areas of computer algebra, differential algebra, handwriting recognition, quantum computing, and combinatorial design. He has competed in and coached for the ACM International Collegiate Programming Contest. Oleg has been a Software Engineer at Google since 2009.

