9 Dec. 2017

The Visual Reference for Dashboards

A very interesting infographic on the best way to use charts in PowerBI dashboards; the advice clearly applies to any kind of dashboard.



4 Dec. 2017

Free ebook: consulting with humor


You can now download the book 'La Gacela de Wirayut' for free, in PDF format, to read on your favourite tablet.

If you have worked or currently work in consulting, or have dealt with consultants, much of it will surely sound familiar.

It takes a look at the pointlessness of many work meetings, relationships with bosses, the use of e-mail and the internet, and the hypocrisy found in many companies.

An exciting journey into the depths of companies, places where we spend a large part of our lives without really understanding what we are doing there. We hope you enjoy it, and it is free!

Do you get along with your boss, or do you just pretend? Do you use the internet for work or for leisure? Have you ever worked abroad while barely speaking English?


Contents

0. Introduction. La Gacela de Wirayut
1. Of carpets and offices
2. You've got mail
3. Making friends (security, cleaning, maintenance)
4. How nice... a meeting
5. Working abroad (like Tarzan in Sarajevo)
6. In a strange land (arriving at a 'client' company)
7. The Alien 'ated'
8. Annual meetings, kick-offs
9. A coffee and we talk?
10. The internet... you can't be without it
11. Rocky Balboa moment



1 Dec. 2017

Open Source for Analytics White Paper


A very interesting recent white paper (not a very long one, to be honest), whose most notable feature is that it is sponsored by SAS. Yes, by SAS.

As Microsoft did a few years ago, traditional vendors are being overtaken by the rise of open source (R, to name just one) and are trying to find their place.
One of the items in the table of contents covers the interesting question of 'how commercial software and open source can coexist', which could just as well have been 'how commercial software can survive'.


  • NUMBER ONE Understand open source analytics tools 
  • NUMBER TWO Consider open source analytics opportunities 
  • NUMBER THREE Be mindful of open source analytics challenges 
  • NUMBER FOUR Think about the business use case 
  • NUMBER FIVE Consider using open source and commercial products together 
  • NUMBER SIX Learn the analytics techniques

28 Nov. 2017

Analyzing Ashley Madison Files with Pentaho


You can now explore the leaked Ashley Madison files on the website ohmydata.org, using open source Business Intelligence tools such as Pentaho, Mondrian, STPivot, Saiku, d3.js...

You can find predefined reports, analyses and dashboards, or create your own, slicing by sex, age, country, city, sexual orientation, ethnicity, drinking and smoking habits, height, weight and more.

Hope you like it!





25 Nov. 2017

Radio.garden, impressive!



Want to feel like 'Big Brother' and keep tabs on everything playing on radio stations around the world? Radio.Garden


How does it work?


It's quite simple. This technology is called streaming audio. For example, “SHOUTCAST” is an internet broadcasting tool, used for audio streaming. It can be used by radio stations to bring audio to an internet audience. It uses the very popular mp3 technology for audio delivery. The station broadcasts through the internet instead of broadcasting through radio waves. It is powerful and stable. There is no need for any complicated or expensive software or hardware at the receiving end either - a simple desktop PC and broadband connection will work fine.
RADIO GARDEN is the brainchild of Jonathan Puckey, who is based in the Netherlands and is the main person behind this project. The main idea is to help radio makers and listeners connect with distant cultures and/or re-connect with home thousands of miles away. Radio stations that want to be heard on RADIO GARDEN simply register with the webmaster, who then pinpoints the location of the station on the world map and provides a hyperlink to the URL of the station. The DJ sends audio data from the station's computer to a central SHOUTcast server that then turns around and streams it out to all connected listeners. I believe more than 10,000 stations have registered so far. Radio Garden is funded with public money from the Netherlands Institute of Sound and Vision; there is no commercial aspect to the project right now.
Listening to big radio stations is passé. Talk about remoteness. Great music, too. Do give it a try - you will be pleasantly surprised!

Is AI the future?





17 Nov. 2017

Pentaho Community 2017: summary of the 10th edition in Mainz, Germany




In 2017 we reached the 10th edition of the most important event for Pentaho developers and the Pentaho community and, as every year, here is our summary of the highlights.

Thanks to the IT-Novum team and the Pentaho community (Pedro Alves) for their great work organizing it.

For our part, our small contribution to the community is STPivot4; the pull request to make it available in the Pentaho Marketplace has already been submitted.

Keynotes:

Stefan Müller, the main organizer, opening the event


  • 10:00 AM - 10:30 AM: All about Pentaho 8.0, Pedro Alves | Senior Vice President Community at Pentaho
    As always, the most anticipated presentation
  • 10:30 AM - 11:00 AM: What’s new in PDI 8.0?, Jens Bleuel | Sr. Product Manager Data Integration at Pentaho
  • 11:00 AM - 11:30 AM: What’s brewing in the Pentaho Labs?, Matt Casters | Chief Architect, PDI Kettle Project Founder at Pentaho
  • 11:30 AM - 11:50 AM: Introducing Pentaho on the Hitachi Vantara Community, Jill Ross | Enterprise Community Manager Hitachi Vantara
  • 11:50 AM - 12:20 PM: CERN's Business Computing Accelerated by Pentaho, Jan Janke | Deputy Group Leader of Administrative Information Systems at CERN

Technical sessions:



Presentation by Hiromu Hota: SpoonGit (Git client integration with Spoon)



Slawo, an expert in PDI and ETL, on how to test PDI solutions



From Latvia, eazyBI provides a BI environment for JIRA



Nelson Sousa, one of the most entertaining Pentaho experts


Understanding the Pentaho CDE NewMapComponent, by Kleyson Rios



Example of Pentaho in use in healthcare in Africa (Mozambique)



Pentaho PDI and the Jare Ruleengine, by Uwe Geercken


Code for the exercises in Francesco Corti's book: Pentaho 8 Reporting for Java Developers



Matt Casters, the brilliant creator of PDI-Kettle and a good friend who has visited our offices in Spain on occasion

Time for dinner:



More detail, in English, on the IT-Novum blog and the Hitachi Vantara blog

30 Oct. 2017

Free Pentaho Open Source BI workshop in Lima, Peru (21 November)


We present a very interesting free Pentaho workshop in Lima, Peru. It will be delivered by the specialists at Stratebi.

TARGET AUDIENCE

For anyone who wants to work in Business Intelligence: IT professionals, IT managers, Business Intelligence consultants, business analysts, systems analysts, Java architects, systems developers, database administrators, and developers and professionals in technology, marketing, business and finance.
To register, or to ask about 'complete online' training tailored to your needs, contact info@stratebi.com

OBJECTIVE

The aim is to show attendees how to build a Business Intelligence (BI) solution for analysing data from a variety of sources and systems, using free software tools such as Pentaho, a leading tool in the open source market.
Other open source BI environments such as Saiku, Ctools and Talend, along with other community-developed solutions, will also be covered.

29 Oct. 2017

Project Maestro: ETL for Tableau




As we noted a few days ago in our comparison of Tableau and PowerBI, one of the shortcomings most often attributed to Tableau is the lack of an ETL tool.

In fact, Tableau has been announcing its own ETL tool, Project Maestro, for more than a year, although from what we know it looks more like a Data Preparation module aimed at end users than a full ETL tool.

In our opinion and day-to-day experience, many companies that use Tableau or PowerBI and have significant ETL needs choose Pentaho Data Integration or Talend Open Studio to orchestrate all their processes.

In any case, Tableau's initiative is interesting for companies and users that do not have heavy ETL needs and want to handle them directly themselves.



The catch... is that we have to keep waiting... for now


26 Oct. 2017

New features in STDashboard for Pentaho



The improvements in this version of STDashboard focus on the user interface for panels and dashboards, along with some performance enhancements and fixes for some old bugs. It works with Pentaho versions 5, 6 and 7.

You can see it in action in this online Pentaho demo





About UI improvements:

 - New set of predefined dashboard templates. We have designed a new way to manage dashboard panels that lets you shape the dashboard with almost any combination of panel sizes, proportions and number of panels. To support this, we have created a set of layouts for the most common cases.



 - Self-managed panels. In STDashboard you can now add or remove panels easily using the button inside each panel header.



 - New layout management. An STDashboard layout is now composed of a list of panel containers, stacked vertically on the page. There are two types of container, horizontal and vertical; each one stores a list of real panels (the ones where the graphs are drawn) in a horizontal or vertical flow. By combining them you can achieve almost any layout you can imagine.
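
The layout model described above can be sketched as a small data structure. This is only an illustration in Python, not STDashboard's actual GWT code: the class names `Panel`, `Container` and `Layout`, and the `rows` helper, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Panel:
    """A real panel, i.e. one where a graph is drawn (hypothetical name)."""
    title: str


@dataclass
class Container:
    """A panel container with a horizontal or vertical flow (hypothetical name)."""
    orientation: str  # "horizontal" or "vertical"
    panels: List[Panel] = field(default_factory=list)


@dataclass
class Layout:
    """A layout is a list of containers stacked vertically on the page."""
    containers: List[Container] = field(default_factory=list)

    def rows(self) -> List[List[str]]:
        """Flatten the layout into rows of panel titles, top to bottom."""
        result: List[List[str]] = []
        for container in self.containers:
            titles = [panel.title for panel in container.panels]
            if container.orientation == "horizontal":
                result.append(titles)  # panels share one row, side by side
            else:
                result.extend([title] for title in titles)  # one row per panel
        return result


layout = Layout([
    Container("horizontal", [Panel("Sales"), Panel("Costs")]),
    Container("vertical", [Panel("Trend"), Panel("Map")]),
])
print(layout.rows())
```

Here the first container places two panels side by side, and the second stacks two panels below them.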



 - Resizable panels. Panels can now be resized horizontally or vertically, keeping the graph inside in proportion with horizontally adjacent panels and without introducing a horizontal scroll on the page. That is, if you shrink a panel horizontally and there are other panels in the same row, the other panels are also resized proportionally so that all panels in the row fit the horizontal size of the window.

It is worth noting that we implemented this functionality using the pure GWT API, to avoid external dependencies and ensure portability between browsers.
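
One possible reading of the proportional resize rule, as plain arithmetic. This is a sketch under assumptions, not the GWT implementation: the function `resize_row` and its parameters are hypothetical, and it assumes the row must always add up to the available window width.

```python
def resize_row(widths, index, new_width, row_width):
    """Give panel `index` the width `new_width` and rescale the other panels
    in the same row proportionally, so the row still totals `row_width`."""
    others_total = sum(w for i, w in enumerate(widths) if i != index)
    remaining = row_width - new_width
    if remaining < 0 or others_total == 0:
        raise ValueError("the new width does not fit in the row")
    return [
        new_width if i == index else w * remaining / others_total
        for i, w in enumerate(widths)
    ]


# Three panels in a 1000 px row; the first one is resized to 600 px.
print(resize_row([400, 300, 300], 0, 600, 1000))
```

The other two panels keep their 1:1 ratio and share the remaining 400 px, so no horizontal scroll is needed.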

 - Draggable panels. Each panel in the dashboard can be dragged to any parent container. The header of each panel has a handle that allows dragging it to any panel container in the dashboard.




 - Responsive dashboard. Panels and graphs are now resized dynamically when the window's dimensions change or when the user zooms the page. On most phones the dashboard is also displayed proportionally, keeping the original layout.

 - Persistent layout state. When you save a dashboard to a file, its visual state is stored in the file. When you open the dashboard again, all the details of the visual interface are restored and you see the dashboard exactly as it was when saved, including panel sizes and locations.


About performance:

 - At some points in the application a specific query was causing performance problems. To find out whether a member has children in a multilevel hierarchy, the previous code issued a query listing all the children of that member and checked whether the size was greater than 0. Our solution was simply to check the level of the current member and answer that boolean question from it.
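
The difference between the two checks can be sketched as follows. The names `Member`, `fetch_children` and `CHILDREN` are hypothetical stand-ins for the OLAP calls. The level-based shortcut also assumes a balanced hierarchy in which every member above the deepest level has children; in a ragged hierarchy it could give a wrong answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    level_depth: int  # 0 = top level of the hierarchy


# Tiny in-memory hierarchy (Year > Quarter > Month) standing in for the cube.
CHILDREN = {"2017": ["Q4"], "Q4": ["December"], "December": []}


def fetch_children(member):
    """Stand-in for the expensive query that lists all children of a member."""
    return CHILDREN[member.name]


def has_children_by_query(member):
    # Old approach: fetch every child only to test whether the list is empty.
    return len(fetch_children(member)) > 0


def has_children_by_level(member, hierarchy_depth):
    # New approach: a member has children unless it sits on the deepest level.
    return member.level_depth < hierarchy_depth - 1


year, month = Member("2017", 0), Member("December", 2)
print(has_children_by_level(year, 3), has_children_by_level(month, 3))
```

Both functions agree on this sample hierarchy, but the second one never touches the data source.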

 - Connection to cubes using the new MondrianOlap4jDriver Java class. This improves connection performance and stability because it is designed for Mondrian connections; the previous code used a standard JDBC connection.


About new enhancements:

 - Date configuration for filters. Date dimensions are special: almost every cube defines at least one, and they are heavily used for range queries over the fact table. To allow dynamic filters in panels, we had to add a .property file that lets users define their date dimension and configure how they want to use it in queries.


 - Added the Pentaho File Explorer, which lets users navigate the files stored in Pentaho (reports, documents, etc.) and embed it inside a panel in the dashboard.







See a Video Demo:

24 Oct. 2017

Fintech radar in Spain: a new bubble?



Without a doubt, the emergence of technology initiatives around finance (Fintech) in Spain can be considered good news.

However, given past experience with the 2.0 bubbles and the earlier ones around the year 2000, we should be cautious and separate the wheat from the chaff. There can be a long way from a nice idea, logo, offices, etc. to a practical application with a return on investment.

In any case, the list is available :-)

More than 2600 Open Data Portals around the world


The table of contents will give you a summary of all countries represented on this list. Simply click on a country’s name and the page will bring you to the correct section.

If you are curious about how we created this list, there is an article about it, thanks to OpenDataSoft

18 Oct. 2017

Human Resources Analytics


The LinceBI Human Resources Analytics solution is based on open source and includes KPIs, Reports, OLAP Analysis, Dashboards, Scorecards, Big Data and Machine Learning, with predefined templates, dashboards and KPIs/ratios and a fully customizable environment.

Manage budgets efficiently and optimize revenues and costs in favour of collective benefit.



Do more with less! Use innovative Data Mining and Social Intelligence techniques to maximize objectives, identifying trends in worker behavior and satisfaction in order to answer their demands efficiently and improve engagement.










15 Oct. 2017

Cost comparison: Tableau vs PowerBI



Here is a document, ready to download, with a very thorough cost comparison between Tableau and PowerBI (note that the report was commissioned by Tableau, so it may be somewhat biased).

For example, regarding the effort involved in this kind of project, and bearing in mind that both are Data Discovery (end-user) tools, it does not give enough weight to the most important part: modelling, ETL, Data Quality, etc.

In practice, these tools also need ETL, metadata, MDM and Data Quality tools to ensure a correct implementation in production environments.

For a comparison of technical features, take a look at the Comparison of Business Intelligence tools

See also: How to set up a Big Data OLAP environment with Tableau and PowerBI








Open Source Business Intelligence tips in October 2017

10 Oct. 2017

Pentaho 8 Reporting for Java Developers


Thanks to Packt for sending us a copy of 'Pentaho 8 Reporting for Java Developers' to review, as we have done on other occasions; we will publish the review soon.

The book is written by a good friend we have met at quite a few Pentaho developer events, Francesco Corti. Take a look at his website; he is a great expert in Alfresco and its integration with Pentaho.

More than 400 useful pages in this book, with code for the exercises.

You can also see the free Pentaho tutorial

5 Oct. 2017

What's new in MySQL 8.0?



MySQL, the popular open-source database that’s a standard element in many web application stacks, has unveiled the first release candidate for version 8.0.
Features to be rolled out in MySQL 8.0 include:
  • First-class support for Unicode 9.0 out of the box.
  • Window functions and recursive SQL syntax, for queries that previously weren’t possible or would have been difficult to write.
  • Expanded support for native JSON data and document-store functionality.
With version 8.0, MySQL is jumping several versions in its numbering (from 5.7), due to 6.0 being nixed and 7.0 being reserved for the clustering version of MySQL.
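
To give an idea of what window functions and recursive queries look like, here is a minimal sketch. It runs against an in-memory SQLite database as a stand-in (an assumption made so the example needs no server; SQLite 3.25 or later is required for window functions). The `sales` table and its data are invented for illustration; the two SELECT statements use standard SQL syntax that MySQL 8.0 also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", 1, 100), ("north", 2, 150), ("south", 1, 80), ("south", 2, 120)],
)

# Window function: running total per region, without collapsing the rows.
running = conn.execute("""
    SELECT region, month,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales
    ORDER BY region, month
""").fetchall()

# Recursive common table expression: generate the numbers 1 to 5.
numbers = [n for (n,) in conn.execute("""
    WITH RECURSIVE seq(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM seq WHERE n < 5
    )
    SELECT n FROM seq
""")]

print(running)
print(numbers)
```

Before window functions, a running total like this typically needed a self-join or a correlated subquery.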

MySQL 8.0’s expected release date

MySQL hasn’t committed to a release date for MySQL 8.0, but MySQL’s policy is “a new [general] release every 18-24 months.” The last general release was October 21, 2015, for MySQL 5.7, so MySQL 8.0’s production version is likely to come in October 2017.
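
The date arithmetic behind that estimate can be checked with a few lines. The helper `add_months` is hypothetical and deliberately minimal: it assumes the day of the month exists in the target month, which holds for the 21st.

```python
from datetime import date


def add_months(d, months):
    """Return the same day of the month, `months` months later."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, d.day)


last_release = date(2015, 10, 21)  # MySQL 5.7 general release
earliest = add_months(last_release, 18)
latest = add_months(last_release, 24)
print(earliest, latest)  # 2017-04-21 2017-10-21
```

October 2017 is therefore the upper end of the 18-24 month window.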

Where to download MySQL 8.0
You can download the beta versions of MySQL 8.0 now for Windows, MacOS, several versions of Linux, FreeBSD, and Solaris; the source code is also available. Scroll down the downloads page and go to the Development Releases tab to get them.

Seen on InfoWorld

1 Oct. 2017

Google launches Cloud Dataprep in public beta



A very interesting cloud initiative from Google, Cloud Dataprep, aimed at making ETL processes easier. The details are below, but in our opinion there are two important points to consider:

- 'Data preparation' is a euphemism meant to suggest that ETL processes can be simple and suited to end users, which anyone who works in analytics knows is not the case (in fact, it is the most complex and important part, like the hidden part of an iceberg). And it is being pushed because there is a market, and revenue, to be had in this area. See the next point:

- It has a pricing model

Google Cloud Dataprep is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis. Cloud Dataprep is serverless and works at any scale. There is no infrastructure to deploy or manage. Easy data preparation with clicks and no code.

The stories behind the data



Don't miss this initiative from Bill Gates: The stories behind the data

"We are launching this report this year and will publish it every year until 2030 because we want to accelerate progress in the fight against poverty by helping to diagnose urgent problems, identify promising solutions, measure and interpret key results, and spread best practices.
As it happens, this report comes out at a time when there is more doubt than usual about the world’s commitment to development."

Microsoft launches new Machine Learning tools



Microsoft, just like many of its competitors, has gone all in on machine learning. That emphasis is on full display at the company’s Ignite conference this week, where the company today announced a number of new tools for developers who want to build new A.I. models and users who simply want to make use of these pre-existing models, either from their own teams or from Microsoft.

For developers, the company launched three major new tools today: the Azure Machine Learning Experimentation service, the Azure Machine Learning Workbench and the Azure Machine Learning Model Management service.


In addition, Microsoft also launched a new set of tools for developers who want to use its Visual Studio Code IDE for building models with CNTK, TensorFlow, Theano, Keras and Caffe2. And for non-developers, Microsoft is also bringing Azure-based machine learning models to Excel users, who will now be able to call up the AI functions that their company’s data scientists have created right from their spreadsheets.

Seen on TechCrunch

You may also like: The 53 keys to understanding Machine Learning

15 Sept. 2017

In Technology and Consulting: #StopBodyShopping


Let's stand up for quality work done well. Learning takes a long time. Nobody can know everything.

"Wisdom is the daughter of experience"


Leonardo da Vinci (1452-1519), painter, sculptor and inventor

Big Data Analytics for Financial Services


A great event: Big Data Analytics for Financial Services

"Due to the sheer volume of data the financial services sector generates from customers, transactions, global trading, and many other sources, it is currently one of the most risk laden sectors.

This has put the FS sector under increased scrutiny from regulatory bodies to remain compliant, resulting in the on-going pressure for effective information governance.

But this has also created an opportunity to improve competitiveness and drive business growth. The sector has continued to use data to detect and manage the increase in fraud and financial crime, develop competitive pricing, manage risk & compliance as well as make strategic business decisions. But now, the shift has also moved towards innovation, and data is being leveraged to develop new and personalised products and services via better customer segmentation and analysis"

Download document

 

1 Sept. 2017

Free book: Ultimate Guide To Data Science Interviews


What’s inside? 
90 pages of original research, interviews with real data scientists and hiring managers at some of the best data science teams on earth, as well as recruiters and successful candidates who are now data scientists, and actionable checklists. We’ll walk you, step-by-step through everything you need to know to ace the data science interview. 
  • You’ll start by understanding the different roles and industries within data science so you can apply for jobs that are the best fit for you.
  • Next, you’ll learn how to apply for these jobs to maximize your chances of getting an interview.
  • Then, you’ll go over every step of the data science interview process so that you can prepare for what’s coming.
  • Next, you’ll get free sample questions that cover the categories of questions you can expect to receive, which you can use to practice how you approach the data science interview.
  • Then, you’ll get advice on what to do after the interview to move the process forward.
  • Finally, you’ll know what to do if you’re juggling between different offers.

Table of Contents:
Introduction
What is Data Science?
Different Roles within Data Science
How Different Companies Think About Data Science:
  1. Early-stage startups (200 employees or fewer) looking to build a data product
  2. Early-stage startups (200 employees or fewer) looking to take advantage of their data
  3. Mid-size and large Fortune 500 companies who are looking to take advantage of their data
  4. Large technology companies with well-established data teams
Industries that employ Data Scientists
Getting a Data Science Interview
Nine Paths to a Data Science Interview
Traditional Paths to Job Interviews:
  1. Data Science Job Boards and Standard Job Applications
  2. Work with a Recruiter
  3. Go to Job Fairs
Proactive Paths to Job Interviews:
  1. Attend or Organize a Data Science Event
  2. Freelance and Build a Portfolio
  3. Get Involved in Open Data and Open Source
  4. Participate in Data Science Competitions
  5. Ask for Coffees, do Informational Interviews
  6. Attend Data Hackathons
Working with Recruiters
  1. How to Apply
  2. CV vs LinkedIn
  3. Cover Letter vs Email
  4. How to get References and Your Network to Work for You
  5. Preparing for the Interview
What to Expect:
  1. The Phone Screen
  2. Take-home Assignment
  3. Phone Call with a Hiring Manager
  4. On-site Interview with a Hiring Manager
  5. Technical Challenge
  6. Interview with an Executive
What a data scientist is being evaluated on
  1. The Categories of Data Science Questions
  2. Behavioral Questions
  3. Mathematics Questions
  4. Statistics Questions
  5. Scenario Questions
  6. Tackling the Interview
  7. Conclusion
What Hiring Managers are Looking For:
  1. Interview with Will Kurt (Quick Sprout)
  2. Interview with Matt Fornito (OpsVision Solutions)
  3. Interview with Andrew Maguire (PMC/Google/Accenture)
  4. Interview with Hristo Gyoshev (MasterClass)
  5. Conclusion
How Successful Interviewees Made It:
  1. Sara Weinstein
  2. Niraj Sheth
  3. Srdjan Santic
  4. Conclusion
7 Things to Do After The Interview:
  1. Send a follow-up thank you note
  2. Send them thoughts on something they brought up in the interview
  3. Send relevant work/homework to the employer
  4. Keep in touch, the right way
  5. Leverage connections
  6. Accept any rejection with professionalism
  7. Keep up hope
The Offer Process
  1. Handling Offers
  2. Company Culture
  3. Team
  4. Location
  5. Negotiating Your Salary
  6. Facts and Figures
  7. Taking the Offer to the Best First Day
Templates
  1. Reaching out to get a referral
  2. Following up after an interview