16 Jul 2019

Cloudera changes strategy and goes Open Source

For anyone who thought Cloudera's acquisition of Hortonworks would endanger the open source model, the opposite is true. Cloudera will be 100% Open Source, as the company has just stated (read the linked announcement carefully).

Cloudera has just announced that it will focus on a services and support model.

Great news for everyone working with Big Data stacks based on Open Source, such as LinceBI.

9 Jul 2019

Glossary of Business Intelligence Terms

For everyone who is getting started in the world of Business Intelligence, here is a glossary of the main Business Intelligence terms.

If you want to play with an open, open source demo to explore and try out these concepts, it is the best way to get familiar with them.

Business Intelligence Glossary:

  • Automated Analysis: Automatic analysis of data to find hidden insights in the data and show users the answers to questions they have not even thought of yet.
  • BI Analyst: A professional, also called a data analyst, who is in charge of analyzing and mining data to identify patterns and correlations, mapping and tracing data from system to system in order to solve a problem, using BI and data discovery tools to help business executives in their decision making, and performing statistical analysis of business data, among other things.
  • BI Governance: According to Boris Evelson, from Forrester Research, BI governance is a key part of data governance, but it focuses on a BI system and governs who uses the data, when, and how.
  • Big Data: Enormous and complex data sets that traditional data processing tools cannot deal with.
  • Bottlenecks: Points of congestion or blockage that hinder the efficiency of the BI system.
  • Business Intelligence: According to Gartner, “Business Intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”
  • Centralized Business Intelligence: A BI model that enables users to work connected and share insights while seeing a single version of the truth. IT governs data permissions to ensure data security.
  • Collaborative BI: An approach to Business Intelligence where the BI tool empowers users to collaborate with colleagues, share insights, and drive collective knowledge to improve decision making.
  • Collective Knowledge: Knowledge that benefits the whole enterprise as it comes from the sharing of insights and data findings across groups and departments to enrich analysis.
  • Dark Data: According to Gartner, the definition for Dark Data is “information assets that organizations collect, process and store in the course of their regular business activity, but generally fail to use for other purposes”. 90% of companies’ data is dark data.
  • Dashboards: A data visualization tool that displays the current health of the enterprise, the status of metrics and KPIs, and the current data analysis and insights.
  • Data Analyst: A professional who is in charge of analyzing and mining data to identify patterns and correlations, mapping and tracing data from system to system in order to solve a problem, using BI and data discovery tools to help business executives in their decision making, and performing statistical analysis of business data, among other things.
  • Data Analytics: According to TechTarget, “data analytics is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software.”
  • Data Governance: According to Boris Evelson, from Forrester Research, data governance “deals with the entire spectrum (creation, transformation, ownership, etc.) of people, processes, policies, and technologies that manage and govern an enterprise’s use of its data assets (such as data governance stewardship applications, master data management, metadata management, and data quality).”
  • Data Mashup: The integration of multiple data sets into a unified analytical and visual representation.
  • Data Silos: According to TechTarget, a data silo is “data that is under the control of one department or person and is isolated from the rest of the organization.” Data silos are a bottleneck for effective business operations.
  • Data Sources: The source where the data to be analyzed comes from. It can be a file, a database, a dataset, etc. Modern BI solutions like Necto can mash up data from multiple data sources.
  • Data Visualization: The graphic visualization of data. Can include traditional forms like graphs and charts, and modern forms like infographics.
  • Data Warehouse: A relational database that integrates data from multiple sources within a company.
  • Embedded Analytics: The integration of reporting and data analytic capabilities in a BI solution. Users can access full data analysis capabilities without having to leave their BI platform.
  • Excel Hell: A situation where the enterprise is full of unnecessary copies of data, thousands of spreadsheets get shared, and no one knows with certainty which is the most updated and real version of the data.
  • Federated Business Intelligence: A BI model where users work in separate desktops, creating data silos and unnecessary copies of data, leading to multiple versions of the truth.
  • Geo-analytic capabilities: The ability that a BI or data discovery tool has to analyze data by geographical area and reflect such analysis on maps on the user’s dashboard.
  • Infographics: Visual representations of data that are easily understandable and drive engagement.
  • Insights: According to Forrester Research, insights are “actionable knowledge in the context of a process or decision.”
  • KPI: Key Performance Indicator. A quantifiable measure that a business uses to determine how well it meets its operational and strategic goals. KPIs give managers insight into what is happening at any specific moment and allow them to see in what direction things are going.
  • Modern BI: An approach to BI using state of the art technology, providing a centralized and secure platform where business users can enjoy self-service capabilities and IT can govern over data security.
  • OLAP: Stands for Online Analytical Processing, a technology for multidimensional data analysis and discovery. Panorama Software developed an OLAP technology that it sold to Microsoft in 1996. OLAP has many capabilities, such as complex analytics, predictive “what if” scenario planning, and limitless report viewing.
  • Scalability: The ability of a BI solution to be used by a larger number of users as time passes.
  • Self-Service BI: An approach that allows business users to access and work with data sources even though they do not have an analyst or computer science background. They can access, profile, prepare, integrate, curate, model, and enrich data for analysis and consumption by BI platforms. In order to have successful self-service BI, the BI tool must be centralized and governed by IT.
  • Smart Data: Smaller data sets from Big Data that are valuable to the enterprise and can be turned into actionable data.
  • Smart Data Discovery: The processing and analysis of Smart Data to discover insights that can be turned into actions to make data-driven decisions in an organization.
  • Social BI: An approach where social media capabilities, such as social networking, crowdsourcing, and thread-based discussions are embedded into Business Intelligence so that users can communicate and share insights.
  • Social Enterprise: An enterprise that has a new level of corporate connectivity, leveraging the social grid to share and collaborate on information and ideas. It drives a more efficient operation where problems are uncovered and fixed before they can affect the revenue streams.
  • SQL: Stands for Structured Query Language. It is a language for querying, manipulating, and managing data in relational databases.
  • State of the Art BI: The highest level of technology, the most up-to-date features, and the best analysis capabilities in a Business Intelligence solution.
  • Suggestive Discovery Engine: An engine behind the program that recommends to the users the most relevant insights to focus on, based on personal preferences and behavior.
  • Systems of Insight: This is a term coined by Boris Evelson, VP of Forrester Research. It is a Business Intelligence system that combines data availability with business agility, where both IT and business users work together to achieve their goals.
  • Workboards: An interactive data visualization tool. It is like a dashboard that displays the current status of KPIs and other data analysis, with the possibility to work directly on it and do further analysis.
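Several of these terms (SQL, KPI, Data Sources) fit together in a very small example. The sketch below is only an illustration: the `sales` table, the region names, the amounts, and the per-region targets are all invented, and it uses Python's built-in `sqlite3` module rather than any particular BI product. It computes a revenue KPI per region with a SQL aggregate and compares it with a target.

```python
import sqlite3

# Hypothetical sales table; names and figures are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("North", 80.0), ("South", 50.0), ("South", 25.0)],
)

# Assumed per-region revenue targets that the KPI is measured against.
targets = {"North": 180.0, "South": 100.0}


def revenue_kpi(connection, targets):
    """Return {region: (actual, target, attainment)} using a SQL aggregate."""
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return {
        region: (actual, targets[region], actual / targets[region])
        for region, actual in rows
    }


kpis = revenue_kpi(conn, targets)
for region, (actual, target, attainment) in kpis.items():
    print(f"{region}: {actual:.0f} of {target:.0f} ({attainment:.0%})")
```

A dashboard would typically show the attainment figure as the KPI status, with the actual and target values available on drill down.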

Seen on the Panorama blog

26 Jun 2019

Truths and Lies about Free Software

BBVA has produced a very interesting study (about 120 pages) on Open Source: history, technologies, business models, and more.

Don't hesitate to download it.

It gets off to an interesting start...

25 Jun 2019

Tutorial: Creating Dashboards with Open Source solutions

Dashboards are in ever greater demand, and the good news is that most of them can be built with Open Source solutions: Pentaho, CDE, dc.js...

New: you can also create them with STDashboard: How to create your own Dashboards in Pentaho

Here are the main keys to building powerful dashboards, taken from the Open Source Dashboard creation course:

If this has caught your interest, you can also:

- See working examples of Open Source dashboards
- See the dashboard gallery and the Open Source dashboards video tutorial
- See the syllabus and the on-site and 'in company' courses for building dashboards hands-on
- See dashboards built with 'Real Time' Big Data technologies

You can also watch this very practical video tutorial:

21 Jun 2019

Project Management with Redmine Analytics

Redmine Analytics is the companion solution to the Redmine project management tool. It uses LinceBI's open source Business Intelligence, based on Pentaho, and ships with all models prepared and ready to use. It is also integrated with PowerBI.

Ask our colleagues at Stratebi

Productivity Analysis Model:

Project Analysis Model:

For an organization it is vitally important to structure its projects correctly and with agility, for its own benefit.

Alongside project execution, it is equally important to know whether the team working on the projects is productive, and whether the forecasts for cost, schedule, and effort are holding.


Productivity alert:

Goal: Monitor the hours logged by each employee against the planned hours.
Communication: By e-mail. Each employee receives a report of their hours; managers receive a per-employee summary of the difference in hours.
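As a rough sketch of the logic behind this alert (not Redmine Analytics' actual implementation), the per-employee difference and the manager summary could be computed as follows. The `HoursRecord` type, the `tolerance` parameter, and the sample employees are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HoursRecord:
    employee: str
    planned_hours: float
    logged_hours: float


def hours_differences(records):
    """Return {employee: logged - planned}; negative means fewer hours than planned."""
    return {r.employee: r.logged_hours - r.planned_hours for r in records}


def manager_summary(records, tolerance=0.0):
    """List (employee, difference) pairs whose deviation exceeds the tolerance."""
    diffs = hours_differences(records)
    return sorted(
        (name, diff) for name, diff in diffs.items() if abs(diff) > tolerance
    )


# Invented sample data: one week of planned versus logged hours.
records = [
    HoursRecord("ana", planned_hours=40, logged_hours=36),
    HoursRecord("luis", planned_hours=40, logged_hours=40),
    HoursRecord("marta", planned_hours=32, logged_hours=35),
]
summary = manager_summary(records, tolerance=2)
print(summary)
```

Each employee's own report would use their entry from `hours_differences`, while the manager e-mail would carry the filtered summary.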

Alert on consumption of estimated hours:

Goal: Across the projects and services offered, identify those that have exceeded 50%, 75%, 85%, and 100% of the estimate.
Communication: By e-mail
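A minimal sketch of the threshold check, assuming the alert compares hours spent with hours estimated and reports the highest level reached. The function name is hypothetical, and treating "reached" as "greater than or equal to" is an assumption; the product may use a strict comparison.

```python
# Alert levels taken from the description above, as fractions of the estimate.
THRESHOLDS = (0.50, 0.75, 0.85, 1.00)


def crossed_threshold(estimated_hours, spent_hours, thresholds=THRESHOLDS):
    """Return the highest threshold reached, or None if none has been reached."""
    if estimated_hours <= 0:
        raise ValueError("estimated_hours must be positive")
    ratio = spent_hours / estimated_hours
    reached = [t for t in thresholds if ratio >= t]
    return max(reached) if reached else None


# Invented examples: a project estimated at 100 hours at different stages.
for spent in (40, 50, 90, 120):
    print(spent, crossed_threshold(100, spent))
```

Returning only the highest level keeps the e-mail to one alert per project; a real system would also need to remember which level was already notified to avoid repeating it.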

Alert on project parametrization:

Goal: Monitor how each project is configured in Redmine: defined phases, reference time estimate, profiles, and cost per profile.

Communication: By e-mail

With the right variables, decision making becomes meaningful and corrective action can be taken in good time, and Redmine Analytics provides everything needed for this.
What's more, it does so in a fully automated way.

Types of roles in Analytics (Business Intelligence, Big Data)

As the Analytics industry grows, it becomes harder to know the description of each role and position. What is more, the titles are often used incorrectly, mixing up tasks, job descriptions, and so on.

This confuses both the specialists themselves and the people who are training and studying to do these jobs. In such a fast-changing industry, new and more specialized positions appear frequently. Here we describe each of them:

Business Analyst:

Data Analyst:

Data and Analytics Manager:

Data Architect:

Data Engineer:

Data Scientist:

Database Administrator:


You may also be interested in:

How to pass an interview with Pentaho BI Open Source?
Data Analyst skills and how they differ
Start learning Big Data in 2 hours?

Seen on KDnuggets

17 Jun 2019

Tutorial and Demo: working with Grafana

We now have a Grafana demo built on public occupancy data from the Málaga City Council, collected through an API.

The purpose of this document is to describe the process of creating a dashboard that monitors the status of Málaga's public car parks in real time using Grafana.

Grafana is a free software tool for building dashboards and charts from multiple data sources. It is commonly used to visualize and monitor data in real time.

In this practical example the data source is the Málaga City Council open data portal, specifically the dataset on the occupancy of the municipal public car parks. The data is published in CSV format and updated every minute.
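To give an idea of the preprocessing a feed like this needs before it reaches a Grafana panel, here is a minimal sketch that turns CSV rows into an occupancy ratio per car park. The column names (`name`, `capacity`, `free_spaces`) and the sample rows are assumptions made up for the example; the real dataset's layout may differ and should be checked on the portal.

```python
import csv
import io

# Made-up sample in an assumed layout; the real dataset's columns may differ.
SAMPLE_CSV = """name,capacity,free_spaces
Central,400,100
Harbour,250,0
Station,120,120
"""


def occupancy_by_parking(csv_text):
    """Return {car park name: occupied fraction between 0 and 1} from CSV text."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        capacity = int(row["capacity"])
        if capacity <= 0:
            continue  # skip rows without a usable capacity
        free = int(row["free_spaces"])
        result[row["name"]] = (capacity - free) / capacity
    return result


occupancy = occupancy_by_parking(SAMPLE_CSV)
for name, ratio in occupancy.items():
    print(f"{name}: {ratio:.0%} occupied")
```

In a setup like the one described, a small job would run this every minute and write the ratios, with a timestamp, to a store that Grafana can query.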

Demo access:
User: demo
Password: tKPnruDeN4YJWiTa

7 Practical Examples and Applications of Big Data

In the following applications, dashboards, and examples you can see Big Data working in practice across different use cases and technologies: Kafka, Spark, Apache Kylin, Neo4J...

Go to the examples

If you want to learn more about Big Data, these links may interest you:

OLAP for Big Data. Is it possible?
How to start learning Big Data in 2 hours
List of Open Source Business Intelligence tools
Big Data OLAP analysis on Hadoop with Apache Kylin (in Spanish)
Real-time Apache Kafka use case, Big Data

14 Jun 2019

STCard Videotutorials (Open Source based Scorecard solution)

The improvements in this version of STCard, an open source based solution, focus on the user interface for panels and dashboards, along with some performance enhancements and fixes for old bugs:

- Import with ETL
- Fixed the bug where new KPIs always appeared in red
- Tooltip and character issues solved
- Export to PDF
- Modify colors of a new scorecard
- Some other minor bugs...

It works with Pentaho and can be embedded in web applications.

You can manage your organization with powerful KPI control and a Balanced Scorecard using STCard.

You can see it in action in this online demo and as part of the LinceBI suite.

STCard doesn't require an annual license, supports unlimited users, and is open source based.


- STCard 01 Global View
- STCard 02 Create a new scorecard and security
- STCard 03 Configuration
- STCard 04 Planning and write back data
- STCard 05 Scorecard Analysis and dashboard

STCard includes professional services (training, support and maintenance, docs, and bug resolution), so an enterprise-grade level of service is guaranteed.

Interested? Contact Stratebi or LinceBI

See a Video Demo:

About main functionalities:

STCard works on top of Pentaho and is the best tool for managing your KPIs (Key Performance Indicators) and targets and for keeping track of your Balanced Scorecard strategy.

Fully integrated with Pentaho CE, you can leverage all the power of this Open Source BI Suite

STCard is an open source tool developed by StrateBI for the creation, management and analysis of Scorecards.
A Scorecard is a global management system within an organization that allows you to have a view of it based on a number of perspectives. All these as a whole define the vision and strategy of the organization.
To define a Scorecard you have to define a clear strategy:
  • Strategic Objectives for the units of the organization.
  • Indicators (KPI’s) that mark the fulfillment of the strategic objectives.
The main features of STCard are:
  • Flexibility: A scorecard normally refers to an organization as a whole, but with STCard we can create a scorecard for a specific area of the organization. For example: Treasury Financial Area, Consolidation, Suppliers, etc. Flexibility also applies to the number of strategic perspectives and objectives in a scorecard: as many as you like. The philosophy of Kaplan and Norton is not limited to 4 perspectives (customer, financial, internal business processes, and learning and growth); you can create as many as you need.
  • Flexibility does not break with the original philosophy. A scorecard in STCard consists of a weighted hierarchical structure of 3 levels:
    • Perspective: from what point of view we will see our system. For example, financial, quality, customers, IT, etc.
    • Strategic Objective: what is our goal. For example, increase profitability, customer loyalty, incentive and motivation HR, etc.
    • Indicator (KPI): the measure or metric. Indicators can be quantitative or qualitative (confirmation / domain values), and these always have a real value and a target value.
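The weighted three-level hierarchy can be sketched as a small tree whose score rolls up from indicators to objectives to perspectives. This is an illustration only, not STCard's actual data model: the class names, the weights, the sample figures, and the scoring rule (real value divided by target, capped at 100%, for quantitative "higher is better" indicators with a non-zero target) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Indicator:
    name: str
    weight: float
    real: float
    target: float

    def score(self):
        # Assumed rule: attainment of the target, capped at 100%.
        return min(self.real / self.target, 1.0)


@dataclass
class Node:
    """A perspective, a strategic objective, or the scorecard root."""
    name: str
    weight: float
    children: list = field(default_factory=list)

    def score(self):
        # Weighted average of the children's scores.
        total = sum(child.weight for child in self.children)
        return sum(c.weight * c.score() for c in self.children) / total


# Invented example: two perspectives, one objective each.
financial = Node("Financial", 0.6, [
    Node("Increase profitability", 1.0, [
        Indicator("Margin", 0.5, real=18, target=20),
        Indicator("Revenue", 0.5, real=120, target=100),
    ]),
])
customers = Node("Customers", 0.4, [
    Node("Customer loyalty", 1.0, [
        Indicator("Retention", 1.0, real=45, target=90),
    ]),
])
scorecard = Node("Scorecard", 1.0, [financial, customers])
print(f"{scorecard.score():.2f}")
```

Because every level uses the same weighted average, adding a perspective or objective changes only the tree, not the calculation, which matches the flexibility described above.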
For the launch of the ScoreCard we can consider three scenarios:
  • The organization already has a system / repository of indicators:
    This scenario has a rapid implementation and only requires defining the load processes that obtain the organization's indicator information and adapt it to STCard.
  • The organization lacks a system / repository of indicators:
    This variant requires more consulting work, because a pure BI project must first be carried out in the organization to obtain the indicators that STCard will later work with.
    For example: data sources; ETL processes; system / repository of indicators; load processes in STCard.
  • Immediate start-up:
    This is the fastest alternative; it only requires installation / configuration and training. Data management is done through Excel templates. No additional consulting work is required.
    Users enter values in Excel templates. These values are loaded into STCard, and from then on users interact with STCard directly.

These are the main features of STCard:

More info:

STReport (Web Reporting Open Source based tool) Video Tutorials

In this series of video tutorials you can see the main features of STReport (the best open source based web reporting tool, with no licenses and professional support included) and how it works. STReport is part of the LinceBI Open Analytics solution.
1. STReport (creating simple report using rows, groups, filters)
2. STReport (Models, exploring categories and glossary)
3. STReport (Work area, hidden sections, limit results, info options...)
4. STReport...

STAgile Videotutorials (easy and fast web Dashboards from excel), open source based

STAgile is a quick and simple dashboard generator that lets users create their own dashboards from Excel and CSV files, with save, share, filter, and export features...
What does STAgile offer?
  • Simple design for intuitive operation
  • You don't have to write a single line of code
  • Generation of charts from Excel or CSV
  • Navigate through hierarchies using drill down  ...

STPivot (Web Analytics open source based) complete Videotutorials

In this series of video tutorials you can see the main features of STPivot (the best open source based web analysis tool, with no licenses and professional support included) and how it works. You can also embed, customize, and modify it to fit your needs. STPivot is part of the LinceBI Open Analytics solution.
1. LinceBI OLAP interactive analysis
2. STPivot OLAP Analytics for Big Data
3. Powerful Forecasts in STPivot
4. STPivot...

STDashboard (Web Dashboard Editor open source based), Video Tutorials

In this series of video tutorials you can see the main features of STDashboard (the best open source based web dashboarding tool, with no licenses and professional support included) and how it works. STDashboard is part of the LinceBI Open Analytics solution.
0. STDashboard (Dashboard for end users in minutes)
1. STDashboard (LinceBI Open Source BI/BigData Solution)
2. STDashboard (LinceBI Vertical Dashboarding Solution)
3. STDashboard...

Dashboards and Business Intelligence for Smart Cities

More and more cities are implementing Smart City solutions, which span a wide range of aspects: technologies, devices, data analytics, and so on.

What they all have in common is that they must integrate diverse information and indicators from all kinds of data sources: traditional relational databases, social networks, mobile applications, sensors... It is essential that there are no silos or closed technologies, which is why Open Source is fundamental: it can be adapted to all kinds of solutions.

Based on our experience in some of these smart city projects, we want to share a few technologies, resources, and demos that may help you:

1. List of Open Source solutions for Smart Cities - Internet of Things projects

2. List of Open Source Business Intelligence tools for Smart Cities

3. 35 Open Source Tools for the Internet of Things (IoT)


Big Data Technologies

Business Intelligence Demos

Near real time traffic monitoring for the Madrid City Council (Access)

Geopositioning of dynamic routes (Access/Video)

Route recommendation (graphs) (Access/Video)