Comparison of Open Source OLAP Systems for Big Data

Admin, Oct. 11, 2019

We have already written a lot on this blog about our preferred Open Source OLAP solution for Big Data, Apache Kylin:

- x50 faster 'near real time' Big Data OLAP Analytics Architecture
- Use Case “Dashboard with Kylin (OLAP Hadoop) & Power BI”
- Dashboards with Tableau and Apache Kylin (OLAP with Big Data)
- BI meet Big Data, a Happy Story
- 7 Examples and Practical Applications of Big Data
- Big Data OLAP Analysis on Hadoop with Apache Kylin
- Real Time Analytics, concepts and tools
- Hadoop Hive and Pentaho: Business Intelligence with Big Data (Case Study)

Today we are going to tell you about other alternatives, thanks to Roman Leventov:
I want to compare ClickHouse, Druid and Pinot, the three open source data stores that run analytical queries over big volumes of data with interactive latencies.

ClickHouse, Druid and Pinot have a fundamentally similar architecture and occupy their own niche between general-purpose Big Data processing frameworks such as Impala, Presto and Spark, and columnar databases with proper support for unique primary keys, point updates and deletes, such as InfluxDB.

Due to their architectural similarity, ClickHouse, Druid and Pinot have approximately the same “optimization limit”. But as of now, all three systems are immature and very far from that limit. Substantial efficiency improvements to any of these systems (when applied to a specific use case) are possible in a matter of a few engineer-months of work. I don’t recommend comparing the performance of the subject systems at all; choose the one whose source code you are able to understand and modify, or in which you want to invest.

Among those three systems, ClickHouse stands a little apart from Druid and Pinot, while the latter two are almost identical: they are pretty much two independently developed implementations of exactly the same system.

ClickHouse more closely resembles “traditional” databases like PostgreSQL. A single-node installation of ClickHouse is possible. At small scale (less than 1 TB of memory, less than 100 CPU cores), ClickHouse is much more interesting than Druid or Pinot, if you still want to compare with them, because ClickHouse is simpler and has fewer moving parts and services. I would say that it competes with InfluxDB or Prometheus at this scale, rather than with Druid or Pinot.

Druid and Pinot more closely resemble other Big Data systems in the Hadoop ecosystem. They retain “self-driving” properties even at very large scale (more than 500 nodes), while ClickHouse requires a lot of attention from professional SREs. Also, Druid and Pinot are in a better position than ClickHouse to optimize for the infrastructure costs of large clusters, and are better suited to cloud environments.
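As a rough illustration only, the scale thresholds quoted above can be written down as a rule of thumb. The `ClusterSize` type and the `suggest_family` helper are hypothetical names introduced for this sketch; the numbers come straight from the text, and the post gives no rule for clusters that fall between the two ranges.

```python
from dataclasses import dataclass


@dataclass
class ClusterSize:
    memory_tb: float  # total memory across the cluster, in terabytes
    cpu_cores: int    # total CPU cores across the cluster
    nodes: int        # number of machines


def suggest_family(size: ClusterSize) -> str:
    """Hypothetical helper mapping the thresholds quoted in the post to a hint."""
    # "Small scale" in the post: less than 1 TB of memory, less than 100 CPU cores.
    if size.memory_tb < 1 and size.cpu_cores < 100:
        return "ClickHouse"
    # "Very large scale" in the post: more than 500 nodes.
    if size.nodes > 500:
        return "Druid or Pinot"
    # The post states no rule in between; it advises choosing the system
    # whose source code you can understand and modify.
    return "no clear preference"


print(suggest_family(ClusterSize(memory_tb=0.25, cpu_cores=32, nodes=1)))
print(suggest_family(ClusterSize(memory_tb=200, cpu_cores=20000, nodes=600)))
```

This is a reading aid for the argument, not a sizing tool: real choices also depend on the operational factors discussed in the text.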

The only lasting difference between Druid and Pinot is that Pinot depends on the Helix framework and is going to continue to depend on ZooKeeper, while Druid could move away from its dependency on ZooKeeper. On the other hand, Druid installations are going to continue to depend on the presence of some SQL database.
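The dependency picture described above can be summarized in a small sketch. The `DEPENDENCIES` mapping and the `only_in` helper are illustrative names, and the entries reflect only what this post states, not a complete or current list of what either system requires.

```python
# External dependencies as described in the post (illustrative, not exhaustive).
DEPENDENCIES = {
    "Druid": {"ZooKeeper", "SQL database"},  # the post says ZooKeeper could be dropped later
    "Pinot": {"ZooKeeper", "Helix"},
}


def only_in(system: str, other: str) -> list[str]:
    """Return dependencies that `system` has and `other` does not."""
    return sorted(DEPENDENCIES[system] - DEPENDENCIES[other])


shared = sorted(DEPENDENCIES["Druid"] & DEPENDENCIES["Pinot"])

print("Shared:", shared)                            # ['ZooKeeper']
print("Druid only:", only_in("Druid", "Pinot"))     # ['SQL database']
print("Pinot only:", only_in("Pinot", "Druid"))     # ['Helix']
```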

Currently Pinot is better optimized than Druid. (But please reread the caveat above against comparing the performance of these systems at all, and the corresponding sections in the post.)
