TodoBI - Business Intelligence, Big Data, ML y AI

Business Intelligence for Hadoop Benchmark


This benchmark, which you can download from AtScale, is well worth a look: it offers insights into Business Intelligence on Hadoop.
If you are interested, also check out our posts:
OLAP for Big Data. Is it possible?
List of Open Source Business Intelligence tools
Big Data OLAP analysis on Hadoop with Apache Kylin   (post in Spanish)
Real-time Apache Kafka use case, Big Data   (post in Spanish)
About the Benchmark:
Key Findings:
  • SQL-on-Hadoop engines are well suited for Business Intelligence (BI) : All tested engines – Hive, Impala, Presto, and Spark SQL – successfully executed all of the queries in our benchmark suite and are stable enough to support business intelligence workloads.

  • There is no single “best engine” : We continue to see the different engines shine in different areas. Depending on raw data size, query complexity, and the target number of end users, enterprises will find that each engine has its own ‘sweet spot’.

  • Version-to-version improvements are significant : The open source community continues to drive significant and rapid improvements across the board. All engines tested showed performance gains of between 2x and 4x in the six months between the first and second editions of the benchmark. This is great news for enterprises deploying BI workloads to Hadoop.

  • Small vs. Big Data : Impala and Spark SQL continue to shine for small data queries (queries against the AtScale Adaptive Cache). New in this edition, the latest release of Hive LLAP (Live Long and Process) shows suitable “small data” query response times. Presto also shows promise on small, interactive queries.

  • Few vs. Many Users : While Impala continues to shine in terms of concurrent query performance, Hive and Spark SQL showed improvements in this category. Presto, new to this edition of the benchmarks, showed the best results in our user concurrency testing.