28 feb. 2017

Machine Learning: Choosing the right estimator

Often the hardest part of solving a machine learning problem can be finding the right estimator for the job.
Different estimators are better suited for different types of data and different problems.

The scikit-learn "Choosing the right estimator" flowchart is a rough guide to how to approach a problem and which estimators to try on your data.
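The first suggestion on each branch of that flowchart can be paraphrased as a small decision function. This is only a simplified sketch: the function name, the task labels, and the keyword arguments are made up for illustration, the sample-size thresholds are the ones recalled from the flowchart, and the later "if that does not work, try..." steps are left out.

```python
def first_estimator_to_try(task, n_samples, *,
                           few_important_features=False,
                           n_clusters_known=True):
    """Return the first suggestion on the matching flowchart branch.

    Simplified paraphrase for illustration, not part of scikit-learn.
    """
    # The flowchart starts by asking for more than 50 samples.
    if n_samples <= 50:
        return "get more data"

    if task == "classification":      # predicting a category, labeled data
        return "LinearSVC" if n_samples < 100_000 else "SGDClassifier"

    if task == "regression":          # predicting a quantity
        if n_samples >= 100_000:
            return "SGDRegressor"
        if few_important_features:
            return "Lasso / ElasticNet"
        return "Ridge / SVR(kernel='linear')"

    if task == "clustering":          # predicting a category, no labels
        if n_clusters_known:
            return "KMeans" if n_samples < 10_000 else "MiniBatchKMeans"
        return "MeanShift" if n_samples < 10_000 else "tough luck"

    if task == "dimensionality_reduction":   # just looking at the data
        return "Randomized PCA"

    raise ValueError(f"unknown task: {task!r}")


print(first_estimator_to_try("classification", 20_000))
print(first_estimator_to_try("regression", 5_000, few_important_features=True))
```

Treat the result as a starting point only: the flowchart itself goes on to suggest alternatives when the first estimator does not perform well.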

24 feb. 2017

Leaflet and R

Leaflet 1.1.0 is now available on CRAN! The Leaflet package is a tidy wrapper for the Leaflet.js mapping library, and makes it incredibly easy to generate interactive maps based on spatial data you have in R.

Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. It’s used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB.

This release was nearly a year in the making, and includes many important new features.
  • Easily add textual labels on markers, polygons, etc., either on hover or statically
  • Highlight polygons, lines, circles, and rectangles on hover
  • Markers can now be configured with a variety of colors and icons, via integration with Leaflet.awesome-markers
  • Built-in support for many types of objects from sf, a new way of representing spatial data in R (all basic sf/sfc/sfg types except MULTIPOINT and GEOMETRYCOLLECTION are directly supported)
  • Projections other than Web Mercator are now supported via Proj4Leaflet
  • Color palette functions now natively support viridis palettes; use "viridis", "magma", "inferno", or "plasma" as the palette argument
  • Discrete color palette functions (colorBin, colorQuantile, and colorFactor) work much better with color brewer palettes
  • Integration with several Leaflet.js utility plugins
  • Data with NA points or zero rows no longer causes errors
  • Support for linked brushing and filtering, via Crosstalk (more about this to come in another blog post)
Seen on blog.rstudio

23 feb. 2017

Citus 6.1 Released: scale your PostgreSQL database

Interesting news from Citusdata; see the Community Edition.

Citus is a distributed database that scales out PostgreSQL (one of our favorite databases), letting you keep all of PostgreSQL's functionality while gaining the benefits of scaling out.

Microservices and NoSQL get a lot of hype, but in many cases what you really want is a relational database that simply works, and can easily scale as your application data grows. Microservices can help you split up areas of concern, but also introduce complexity and often heavy engineering work to migrate to them. Yet, there are a lot of monolithic apps out there that do need to scale.

If you don’t want the added complexity of microservices, but do need to continue scaling your relational database then you can with Citus. With Citus 6.1 we’re continuing to make scaling out your database even easier with all the benefits of Postgres (SQL, JSONB, PostGIS, indexes, etc.) still packed in there.

With this new release customers like Heap and Convertflow are able to scale from single node Postgres to horizontal linear scale. Citus 6.1 brings several improvements, making scaling your multi-tenant app even easier. These include:
  • Integrated reference table support
  • Tenant Isolation
  • View support on distributed tables
  • Distributed VACUUM / ANALYZE

All of this with the same language bindings, clients, drivers, and libraries (like ActiveRecord) that Postgres already works with.
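To give a feel for what scaling out a multi-tenant app means, here is a minimal sketch of hash-based placement by tenant. It is not Citus code and does not reproduce Citus's actual hash function or API: the shard count, the function names, and the sample rows are all assumptions made for illustration. The only idea it shows is that every row belonging to a tenant lands on the same shard, so queries for one tenant can be served from one place.

```python
import hashlib

SHARD_COUNT = 8  # hypothetical number of shards, chosen for the example


def shard_for_tenant(tenant_id, shard_count=SHARD_COUNT):
    """Map a tenant id to a shard index with a stable hash (illustrative)."""
    digest = hashlib.sha256(str(tenant_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def place_rows(rows, shard_count=SHARD_COUNT):
    """Group rows by the shard that their tenant_id hashes to."""
    shards = {i: [] for i in range(shard_count)}
    for row in rows:
        shards[shard_for_tenant(row["tenant_id"], shard_count)].append(row)
    return shards


# Made-up sample data: events belonging to three tenants.
rows = [
    {"tenant_id": "acme", "event": "signup"},
    {"tenant_id": "globex", "event": "login"},
    {"tenant_id": "acme", "event": "purchase"},
    {"tenant_id": "initech", "event": "login"},
    {"tenant_id": "globex", "event": "logout"},
]

placement = place_rows(rows)
for shard, shard_rows in placement.items():
    if shard_rows:
        print(shard, [r["tenant_id"] for r in shard_rows])
```

In a real deployment the database does this routing for you; the application keeps issuing ordinary SQL through its usual Postgres driver.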

15 feb. 2017

Glossary of Business Intelligence Terms

For everyone who is getting started in the world of Business Intelligence, here is a glossary of the main Business Intelligence terms. Seen on the Panorama blog.

If you want to play with an open, open-source demo to learn and try out these concepts, that is the best way to get familiar with them.

Glossary of Business Intelligence Terms:

  • Automated Analysis: Automatic analysis of data to find hidden insights in the data and show users the answers to questions they have not even thought of yet.
  • BI Analyst: A professional (also called a data analyst) who is in charge of analyzing and mining data to identify patterns and correlations, mapping and tracing data from system to system in order to solve a problem, using BI and data discovery tools to help business executives in their decision making, and performing statistical analysis of business data, among other things.
  • BI Governance: According to Boris Evelson, from Forrester Research, BI governance is a key part of data governance, but it focuses on a BI system and governs who uses the data, when, and how.
  • Big Data: Enormous and complex data sets that traditional data processing tools cannot deal with.
  • Bottlenecks: Points of congestion or blockage that hinder the efficiency of the BI system.
  • Business Intelligence: According to Gartner, “Business Intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”
  • Centralized Business Intelligence: A BI model that enables users to work connected and share insights, while seeing the same and only version of the truth. IT governs over data permissions to ensure data security.
  • Collaborative BI: An approach to Business Intelligence where the BI tool empowers users to collaborate with colleagues, share insights, and drive collective knowledge to improve decision making.
  • Collective Knowledge: Knowledge that benefits the whole enterprise as it comes from the sharing of insights and data findings across groups and departments to enrich analysis.
  • Dark Data: According to Gartner, the definition for Dark Data is “information assets that organizations collect, process and store in the course of their regular business activity, but generally fail to use for other purposes”. 90% of companies’ data is dark data.
  • Dashboards: A data visualization tool that displays the current enterprise health, the status of metrics and KPIs, and the current data analysis and insights.
  • Data Analyst: A professional who is in charge of analyzing and mining data to identify patterns and correlations, mapping and tracing data from system to system in order to solve a problem, using BI and data discovery tools to help business executives in their decision making, and performing statistical analysis of business data, among other things.
  • Data Analytics: According to TechTarget, “data analytics is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software.”
  • Data Governance: According to Boris Evelson, from Forrester Research, data governance “deals with the entire spectrum (creation, transformation, ownership, etc.) of people, processes, policies, and technologies that manage and govern an enterprise’s use of its data assets (such as data governance stewardship applications, master data management, metadata management, and data quality).”
  • Data Mashup: The integration of multiple data sets into a unified analytical and visual representation.
  • Data Silos: According to TechTarget, a data silo is “data that is under the control of one department or person and is isolated from the rest of the organization.” Data silos are a bottleneck for effective business operations.
  • Data Sources: The source where the data to be analyzed comes from. It can be a file, a database, a dataset, etc. Modern BI solutions like Necto can mash up data from multiple data sources.
  • Data Visualization: The graphic visualization of data. Can include traditional forms like graphs and charts, and modern forms like infographics.
  • Data Warehouse: A relational database that integrates data from multiple sources within a company.
  • Embedded Analytics: The integration of reporting and data analytic capabilities in a BI solution. Users can access full data analysis capabilities without having to leave their BI platform.
  • Excel Hell: A situation where the enterprise is full of unnecessary copies of data, thousands of spreadsheets get shared, and no one knows with certainty which is the most updated and real version of the data.
  • Federated Business Intelligence: A BI model where users work in separate desktops, creating data silos and unnecessary copies of data, leading to multiple versions of the truth.
  • Geo-analytic capabilities: The ability that a BI or data discovery tool has to analyze data by geographical area and reflect such analysis on maps on the user’s dashboard.
  • Infographics: Visual representations of data that are easily understandable and drive engagement.
  • Insights: According to Forrester Research, insights are “actionable knowledge in the context of a process or decision.”
  • KPI: Key Performance Indicator. A quantifiable measure that a business uses to determine how well it is meeting its operational and strategic goals. KPIs give managers insight into what is happening at any specific moment and allow them to see in what direction things are going.
  • Modern BI: An approach to BI using state of the art technology, providing a centralized and secure platform where business users can enjoy self-service capabilities and IT can govern over data security.
  • OLAP: Stands for Online Analytical Processing, a technology for multidimensional data analysis and discovery. Panorama Software developed an OLAP engine that it sold to Microsoft in 1996. OLAP offers many capabilities, such as complex analytics, predictive “what if” scenario planning, and limitless report viewing.
  • Scalability: The ability of a BI solution to be used by a larger number of users as time passes.
  • Self-Service BI: An approach that allows business users to access and work with data sources even though they do not have an analyst or computer science background. They can access, profile, prepare, integrate, curate, model, and enrich data for analysis and consumption by BI platforms. In order to have successful self-service BI, the BI tool must be centralized and governed by IT.
  • Smart Data: Smaller data sets from Big Data that are valuable to the enterprise and can be turned into actionable data.
  • Smart Data Discovery: The processing and analysis of Smart Data to discover insights that can be turned into actions to make data-driven decisions in an organization.
  • Social BI: An approach where social media capabilities, such as social networking, crowdsourcing, and thread-based discussions are embedded into Business Intelligence so that users can communicate and share insights.
  • Social Enterprise: An enterprise that has a new level of corporate connectivity, leveraging the social grid to share and collaborate on information and ideas. It drives a more efficient operation where problems are uncovered and fixed before they can affect the revenue streams.
  • SQL: Stands for Structured Query Language. It is a language for querying, manipulating, and managing data in relational databases.
  • State of the Art BI: The highest level of technology, the most up-to-date features, and the best analysis capabilities in a Business Intelligence solution.
  • Suggestive Discovery Engine: An engine behind the program that recommends to the users the most relevant insights to focus on, based on personal preferences and behavior.
  • Systems of Insight: This is a term coined by Boris Evelson, VP of Forrester Research. It is a Business Intelligence system that combines data availability with business agility, where both IT and business users work together to achieve their goals.
  • Workboards: An interactive data visualization tool. It is like a dashboard that displays the current status of KPIs and other data analysis, with the possibility to work directly on it and do further analysis.