Material Big Data

Informative slide decks on Big Data technologies now available: Hadoop, HBase, Hive, ZooKeeper...

Sign up for the PowerBI course. Fully hands-on: learn the main tricks from the best specialists

Essential for today's job market. Certificate of completion included!

Pentaho Analytics. A big leap

Pentaho 8 has been released, with big surprises. Discover with us the improvements in the best open BI suite

Learn OLAP Analytics on Pentaho for free

The open source solution for business intelligence and Big Data on Pentaho. Don't miss it!

31 Jan 2018

A Wikipedia for data visualization

If you are ever unsure which type of chart is best for a given situation, take a look at the Data Viz Project, which explains more than 150 charts and the best way to use each one.

One of the best parts of the site shows real examples of each chart applied in practice.

29 Jan 2018

PowerBI working together with the best open source solutions

Here is a nice example that combines PowerBI with open-source-based Business Intelligence solutions, such as LinceBI, to provide the most complete BI solution at an affordable cost:

- Predefined Dashboards
- Adhoc Reporting
- OLAP Analysis
- Adhoc Dashboarding
- Scorecards

More info:
- PowerBI functionalities
- PowerBI training

25 Jan 2018

The 7 people you need on your data team

A great, funny piece on data teams from Lies, Damned Lies

1. The Handyman

The Handyman can take a couple of battered, three-year-old servers, a copy of MySQL, a bunch of Excel sheets and a roll of duct tape and whip up a basic BI system in a couple of weeks. His work isn’t always the prettiest, and you should expect to replace it as you build out more production-ready systems, but the Handyman is an invaluable help as you explore datasets and look to deliver value quickly (the key to successful data projects).
Just make sure you don’t accidentally end up with a thousand people accessing the database he’s hosting under his desk every month for your month-end financial reporting (ahem).

Really good handymen are pretty hard to find, but you may find them lurking in the corporate IT department (look for the person everybody else mentions when you make random requests for stuff), or in unlikely-seeming places like Finance. He’ll be the person with the really messy cubicle with half a dozen servers stuffed under his desk.
The talents of the Handyman will only take you so far, however. If you want to run a quick and dirty analysis of the relationship between website usage, marketing campaign exposure, and product activations over the last couple of months, he’s your guy. But for the big stuff you’ll need the Open Source Guru.

2. The Open Source Guru

I was tempted to call this person “The Hadoop Guru”. Or “The Storm Guru”, or “The Cassandra Guru”, or “The Spark Guru”, or… well, you get the idea. As you build out infrastructure to manage the large-scale datasets you’re going to need to deliver your insights, you need someone to help you navigate the bewildering array of technologies that has sprung up in this space, and integrate them.

Open Source Gurus share many characteristics in common with that most beloved urban stereotype, the Hipster. They profess to be free of corrupting commercial influence and pride themselves on plowing their own furrow, but in fact they are subject to the whims of fashion just as much as anyone else. Exhibit A: The enormous fuss over the world-changing effects of Hadoop, followed by the enormous fuss over the world-changing effects of Spark. Exhibit B: Beards (on the men, anyway).

So be wary of Gurus who ascribe magical properties to a particular technology one day (“Impala’s, like, totally amazing”), only to drop it like ombre hair the next (“Impala? Don’t even talk to me about Impala. Sooooo embarrassing.”) Tell your Guru that she’ll need to live with her recommendations for at least two years. That’s the blink of an eye in traditional IT project timescales, but a lifetime in Internet/Open Source time, so it will focus her mind on whether she really thinks a technology has legs (vs. just wanting to play around with it to burnish her resumé).

3. The Data Modeler 

While your Open Source Guru can identify the right technologies for you to use to manage your data, and hopefully manage a group of developers to build out the systems you need, deciding what to put in those shiny distributed databases is another matter. This is where the Data Modeler comes in.
The Data Modeler can take an understanding of the dynamics of a particular business, product, or process (such as marketing execution) and turn that into a set of data structures that can be used effectively to reflect and understand those dynamics.

Data modeling is one of the core skills of a Data Architect, which is a more identifiable job description (searching for “Data Architect” on LinkedIn generates about 20,000 results; “Data Modeler” only generates around 10,000). And indeed your Data Modeler may have other Data Architecture skills, such as database design or systems development (they may even be a bit of an Open Source Guru). 
But if you do hire a Data Architect, make sure you don’t get one with just those more technical skills, because you need datasets which are genuinely useful and descriptive more than you need datasets which are beautifully designed and have subsecond query response times (ideally, of course, you’d have both). And in my experience, the data modeling skills are the rarer skills; so when you’re interviewing candidates, be sure to give them a couple of real-world tests to see how they would actually structure the data that you’re working with.

4. The Deep Diver

Between the Handyman, the Open Source Guru, and the Data Modeler, you should have the skills on your team to build out some useful, scalable datasets and systems that you can start to interrogate for insights. But who is going to generate the insights? Enter the Deep Diver.
Deep Divers (often known as Data Scientists) love to spend time wallowing in data to uncover interesting patterns and relationships. A good one has the technical skills to be able to pull data from source systems, the analytical skills to use something like R to manipulate and transform the data, and the statistical skills to ensure that his conclusions are statistically valid (i.e. he doesn’t mix up correlation with causation, or make pronouncements on tiny sample sizes). As your team becomes more sophisticated, you may also look to your Deep Diver to provide Machine Learning (ML) capabilities, to help you build out predictive models and optimization algorithms.

If your Deep Diver is good at these aspects of his job, then he may not turn out to be terribly good at taking direction, or communicating his findings. For the first of these, you need to find someone that your Deep Diver respects (this could be you), and use them to nudge his work in the right direction without being overly directive (because one of the magical properties of a really good Deep Diver is that he may take his analysis in an unexpected but valuable direction that no one had thought of before).
For the second problem – getting the Deep Diver’s insights out of his head – pair him with a Storyteller (see below).

5. The Storyteller

The Storyteller is the yin to the Deep Diver’s yang. Storytellers love explaining stuff to people. You could have built a great set of data systems, and be performing some really cutting-edge analysis, but without a Storyteller, you won’t be able to get these insights out to a broad audience.
Finding a good Storyteller is pretty challenging. You do want someone who understands data quite well, so that she can grasp the complexities and limitations of the material she’s working with; but it’s a rare person indeed who can be really deep in data skills and also have good instincts around communications.

The thing your Storyteller should prize above all else is clarity. It takes significant effort and talent to take a complex set of statistical conclusions and distil them into a simple message that people can take action on. Your Storyteller will need to balance the inherent uncertainty of the data with the ability to make concrete recommendations.
Another good skill for a Storyteller to have is data visualization. Some of the most light bulb-lighting moments I have seen with data have been where just the right visualization has been employed to bring the data to life. If your Storyteller can balance this skill (possibly even with some light visualization development capability, like using D3.js; at the very least, being a dab hand with Excel and PowerPoint or equivalent tools) with her narrative capabilities, you’ll have a really valuable player.

There’s no one place you need to go to find Storytellers – they can be lurking in all sorts of fields. You might find that one of your developers is actually really good at putting together presentations, or one of your marketing people is really into data. You may also find that there are people in places like Finance or Market Research who can spin a good yarn about a set of numbers – poach them.

6. The Snoop 

These next two people – The Snoop and The Privacy Wonk – come as a pair. Let’s start with the Snoop. Many analysis projects are hampered by a lack of primary data – the product, or website, or marketing campaign isn’t instrumented, or you aren’t capturing certain information about your customers (such as age, or gender), or you don’t know what other products your customers are using, or what they think about them.

The Snoop hates this. He cannot understand why every last piece of data about your customers, their interests, opinions and behaviors, is not available for analysis, and he will push relentlessly to get this data. He doesn’t care about the privacy implications of all this – that’s the Privacy Wonk’s job.
If the Snoop sounds like an exhausting pain in the ass, then you’re right – this person is the one who has the team rolling their eyes as he outlines his latest plan to remotely activate people’s webcams so you can perform facial recognition and get a better Unique User metric. But he performs an invaluable service by constantly challenging the rest of the team (and other parts of the company that might supply data, such as product engineering) to be thinking about instrumentation and data collection, and getting better data to work with.

The good news is that you may not have to hire a dedicated Snoop – you may already have one hanging around. For example, your manager may be the perfect Snoop (though you should probably not tell him or her that this is how you refer to them). Or one of your major stakeholders can act in this capacity; or perhaps one of your Deep Divers. The important thing is not to shut the Snoop down out of hand, because it takes relentless determination to get better quality data, and the Snoop can quarterback that effort. And so long as you have a good Privacy Wonk for him to work with, things shouldn’t get too out of hand.

7. The Privacy Wonk 
The Privacy Wonk is unlikely to be the most popular member of your team, either. It’s her job to constantly get on everyone’s nerves by identifying privacy issues related to the work you’re doing.
You need the Privacy Wonk, of course, to keep you out of trouble – with the authorities, but also with your customers. There’s a large gap between what is technically legal (which itself varies by jurisdiction) and what users will find acceptable, so it pays to have someone whose job it is to figure out what the right balance between these two is. 

But while you may dread the idea of having such a buzz-killing person around, I’ve actually found that people tend to make more conservative decisions around data use when they don’t have access to high-quality advice about what they can do, because they’re afraid of accidentally breaking some law or other. So the Wonk (much like Sadness) turns out to be a pretty essential member of the team, and even regarded with some affection.

Of course, if you do as I suggest, and make sure you have a Privacy Wonk and a Snoop on your team, then you are condemning both to an eternal feud in the style of the Corleones and Tattaglias (though hopefully without the actual bloodshed). But this is, as they euphemistically say, a “healthy tension” – with these two pulling against one another you will end up with the best compromise between maximizing your data-driven capabilities and respecting your users’ privacy.

Bonus eighth member: The Cat Herder (you!)

The one person we haven’t really covered is the person who needs to keep all of the other seven working effectively together: to stop the Open Source Guru from sneering at the Handyman’s handiwork; to ensure the Data Modeler and Deep Diver work together so that the right measures and dimensionality are exposed in the datasets you publish; and to referee the debates between the Snoop and the Privacy Wonk.

This is you, of course – The Cat Herder. If you can assemble a team with at least one of the above people, plus probably a few developers for the Open Source Guru to boss about, you’ll be well on the way to unlocking a ton of value from the data in your organization.

Seen at: Lies, Damned Lies

23 Jan 2018

Open-source-based web reporting tool

Here are some new features of one of our favourite analytics tools, which you can use for ad hoc web reporting for end users.

You can use it standalone, alongside solutions such as Pentaho (check the online demo), SuiteCRM or Odoo, or as part of predefined solutions such as LinceBI.

This video shows STReport's main new features, including:

- Graph support
- Identify cardinality of elements
- Parameter filter for end-user access
- Cancel execution of long queries
- Upgraded to new Pentaho versions
- Many other minor enhancements and bug fixes

Contact info

19 Jan 2018

Marketing Analytics, Open Source based Solution

As powerful as an enterprise version, with the advantages of being open source based. Discover LinceBI, the most complete Business Intelligence platform, including all the functionality you need for marketing

  • User friendly, templates and wizard
  • Technical skills are not mandatory
  • Link to external content
  • Browse and navigate on cascade dependency graphs

Analytic Reporting
  • PC, Tablet, Smartphone compatibility
  • Sync your analysis with other users
  • Download information on your device
  • Make better decisions anywhere and anytime

  • Different output formats (CSV, Excel, PDF, HTML)
  • Task scheduling for automatic execution
  • Mailing
Balanced Scorecard
  • Assign customized weights to your KPIs
  • Edit your data on the fly or upload an Excel template
  • Follow your key performance indicators
  • Visual KPIs with traffic-light colours
  • Assign color coding to your thresholds
  • Define your own key performance indicators
  • Make calculated fields on the fly
  • Explore your data on charts
  • Drill down and roll up capabilities
  • What if analysis and mailing
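
The scorecard ideas listed above (custom KPI weights, thresholds and traffic-light colours) can be illustrated with a minimal Python sketch. This is not LinceBI code: the `Kpi` class, the function names and the 60% / 90% thresholds are all assumptions made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    """Hypothetical KPI record; field names are illustrative only."""
    name: str
    value: float   # current measurement
    target: float  # value that counts as 100% achievement
    weight: float  # relative importance assigned by the user


def achievement(kpi):
    """Fraction of the target reached, capped at 1.0."""
    return min(kpi.value / kpi.target, 1.0)


def traffic_light(ratio, red_below=0.6, green_from=0.9):
    """Map an achievement ratio to a colour using assumed thresholds."""
    if ratio >= green_from:
        return "green"
    if ratio < red_below:
        return "red"
    return "amber"


def scorecard_score(kpis):
    """Weighted average of KPI achievements (weights need not sum to 1)."""
    total_weight = sum(k.weight for k in kpis)
    return sum(achievement(k) * k.weight for k in kpis) / total_weight


kpis = [
    Kpi("leads", value=80, target=100, weight=2),
    Kpi("conversion rate", value=0.05, target=0.05, weight=1),
]
overall = scorecard_score(kpis)
for k in kpis:
    print(k.name, traffic_light(achievement(k)))
print("overall", round(overall, 3), traffic_light(overall))
```

Dividing by the total weight keeps the overall score between 0 and 1 however the weights are scaled, so the same thresholds can colour both individual KPIs and the overall result.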

Adhoc Reporting
  • Build your reports easily, drag and drop
  • Models and language designed for business users
  • Corporate templates for your company
  • Advanced filters
  • Configure your thresholds
  • Map alerts and business rules
  • Plan actions to take when an event happens

Marketing KPIs:

Check FAQs section for any question

17 Jan 2018

Apple releases 'Turi Create' machine learning framework on GitHub

Apple says Turi Create is easy to use, has a visual focus, is fast and scalable, and is flexible. Turi Create is designed to export models to Core ML for use in iOS, macOS, watchOS, and tvOS apps. From the Turi Create GitHub repository:
  • Easy-to-use: Focus on tasks instead of algorithms
  • Visual: Built-in, streaming visualizations to explore your data
  • Flexible: Supports text, images, audio, video and sensor data
  • Fast and Scalable: Work with large datasets on a single machine
  • Ready To Deploy: Export models to Core ML for use in iOS, macOS, watchOS, and tvOS apps
With Turi Create, for example, developers can quickly build a feature that allows their app to recognize specific objects in images. Doing so takes just a few lines of code.

With Turi Create, you can tackle a number of common scenarios, and you can also work with essential machine learning models, organized into algorithm-based toolkits.

Supported Platforms

Turi Create supports:
  • macOS 10.12+
  • Linux (with glibc 2.12+)
  • Windows 10 (via WSL)

System Requirements

  • Python 2.7 (Python 3.5+ support coming soon)
  • x86_64 architecture
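
As a rough illustration, the requirements listed above can be turned into a quick pre-installation check. The helper below is hypothetical and not part of Turi Create; it only encodes the platform, architecture and Python version stated here, and it does not verify the glibc 2.12+ condition on Linux.

```python
import platform
import sys


def meets_turi_create_requirements(system, machine, python_version, macos_version=None):
    """Check an environment against the requirements listed above.

    Hypothetical helper for illustration. The glibc version is not checked.
    """
    if machine != "x86_64":
        return False
    if tuple(python_version[:2]) != (2, 7):
        return False
    if system == "Linux":   # also what WSL on Windows 10 reports
        return True
    if system == "Darwin":  # macOS
        return macos_version is not None and tuple(macos_version[:2]) >= (10, 12)
    return False


def current_environment_supported():
    """Apply the check to the interpreter running this code."""
    mac_ver = platform.mac_ver()[0]  # empty string on non-macOS systems
    macos_version = tuple(int(p) for p in mac_ver.split(".")[:2]) if mac_ver else None
    return meets_turi_create_requirements(
        platform.system(), platform.machine(), sys.version_info, macos_version
    )


print(current_environment_supported())
```

Because the list above requires Python 2.7, running this check under a Python 3 interpreter reports the environment as unsupported, which matches the "Python 3.5+ support coming soon" note.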

6 reasons to move your BI to the cloud and 1 reason not to

It is well known that one of the main trends of recent years is that many Business Intelligence systems are moving to the cloud. It is also true, however, that the systems being moved are not always those that serve critical business areas or handle sensitive data.

This limits your options for moving your Business Intelligence to the cloud, or for having a corporate BI with the advantages of both models.

Here are 6 reasons why you can move your BI to the cloud, and one why not:

1. Cost reduction
If you adapt to the characteristics of the BI solutions that many vendors offer, with their limits, premium tiers and so on, you can get very good monthly or annual prices per user.

2. Scaling resources as needed
In businesses and sectors where usage concurrency can vary widely, data volumes grow in certain periods, or occasional real-time analysis is needed, being able to allocate more resources flexibly is a great advantage.

3. More agile development
Not having to depend on the systems department, or on hardware being available at the right time and with the right capacity, means that project development and implementation can be sped up considerably.

4. Compatible environments
Moving your BI to the cloud can give you natively deployed components and connectors for social networks, machine learning, databases and so on.

5. Security and availability
Although not keeping the data on your own machines may give you pause, the truth is that many companies, for a large share of their information, feel safer against data loss or breaches when they rely on a cloud environment than when they rely on their own internal organization, which they trust less.

6. Access from anywhere, at any time
Ever since Business Intelligence was democratized, more and more users and analysts work with BI systems from all kinds of networks, access points and mobile devices; cloud environments can guarantee common, consistent access for all of them.

And why not move to the cloud?

All of those advantages come with a drawback: if you want to deploy your BI on premise, in your own environments, keeping control of sensitive data and access, you find that the cost of the corresponding licences shoots up to incredible levels.
That is the 'toll' you pay for leaving the path the big providers have laid out in the cloud.

Unless, that is, you use the open source or licence-free model, which is becoming more widespread. Its great advantage is that you can deploy it on premise or in a private cloud that you control, with all of the advantages above plus the cost savings.

Trust BI solutions that are open source or free of licence costs, so that having your BI in the cloud or on premise is never a problem.

14 Jan 2018

The 30 best open source Machine Learning projects

As you know, Machine Learning is one of the topics that interests us most at the Portal, all the more so because most of the technologies are open source. In this post we list the 30 most interesting projects of this year.

We also leave you the material we published on the keys to Machine Learning, plus an introduction

See also: video tutorial

No 1

FastText: Library for fast text representation and classification [11786 stars on GitHub]. Courtesy of Facebook Research

Related: Muse: Multilingual Unsupervised or Supervised word Embeddings, based on FastText [695 stars on GitHub]

No 2

Deep-photo-styletransfer: Code and data for paper “Deep Photo Style Transfer” [9747 stars on GitHub]. Courtesy of Fujun Luan, Ph.D. at Cornell University

No 3

The world’s simplest facial recognition API for Python and the command line [8672 stars on GitHub]. Courtesy of Adam Geitgey

No 4

Magenta: Music and Art Generation with Machine Intelligence [8113 stars on GitHub].

No 5

Sonnet: TensorFlow-based neural network library [5731 stars on GitHub]. Courtesy of Malcolm Reynolds at DeepMind

No 6

deeplearn.js: A hardware-accelerated machine intelligence library for the web [5462 stars on GitHub]. Courtesy of Nikhil Thorat at Google Brain

No 7

Fast Style Transfer in TensorFlow [4843 stars on GitHub]. Courtesy of Logan Engstrom at MIT

No 8

Pysc2: StarCraft II Learning Environment [3683 stars on GitHub]. Courtesy of Timo Ewalds at DeepMind

No 9

AirSim: Open source simulator based on Unreal Engine for autonomous vehicles from Microsoft AI & Research [3861 stars on GitHub]. Courtesy of Shital Shah at Microsoft

No 10

Facets: Visualizations for machine learning datasets [3371 stars on GitHub]. Courtesy of Google Brain

No 11

Style2Paints: AI colorization of images [3310 stars on GitHub].

No 12

Tensor2Tensor: A library for generalized sequence to sequence models, from Google Research [3087 stars on GitHub]. Courtesy of Ryan Sepassi at Google Brain

No 13

Image-to-image translation in PyTorch (e.g. horse2zebra, edges2cats, and more) [2847 stars on GitHub]. Courtesy of Jun-Yan Zhu, Ph.D. at Berkeley

No 14

Faiss: A library for efficient similarity search and clustering of dense vectors [2629 stars on GitHub]. Courtesy of Facebook Research

No 15

Fashion-mnist: A MNIST-like fashion product database [2780 stars on GitHub]. Courtesy of Han Xiao, Research Scientist at Zalando Tech

No 16

ParlAI: A framework for training and evaluating AI models on a variety of openly available dialog datasets [2578 stars on GitHub]. Courtesy of Alexander Miller at Facebook Research

No 17

Fairseq: Facebook AI Research Sequence-to-Sequence Toolkit [2571 stars on GitHub].

No 18

Pyro: Deep universal probabilistic programming with Python and PyTorch [2387 stars on GitHub]. Courtesy of Uber AI Labs

No 19

iGAN: Interactive Image Generation powered by GAN [2369 stars on GitHub].

No 20

Deep-image-prior: Image restoration with neural networks but without learning [2188 stars on GitHub]. Courtesy of Dmitry Ulyanov, Ph.D. at Skoltech

No 21

Face_classification: Real-time face detection and emotion/gender classification using fer2013/imdb datasets with a Keras CNN model and OpenCV [1967 stars on GitHub].

No 22

Speech-to-Text-WaveNet: End-to-end sentence level English speech recognition using DeepMind’s WaveNet and TensorFlow [1961 stars on GitHub]. Courtesy of Namju Kim at Kakao Brain

No 23

StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation [1954 stars on GitHub]. Courtesy of Yunjey Choi at Korea University

No 24

Ml-agents: Unity Machine Learning Agents [1658 stars on GitHub]. Courtesy of Arthur Juliani, Deep Learning at Unity3D

No 25

DeepVideoAnalytics: A distributed visual search and visual data analytics platform [1494 stars on GitHub]. Courtesy of Akshay Bhat, Ph.D. at Cornell University

No 26

OpenNMT: Open-Source Neural Machine Translation in Torch [1490 stars on GitHub].

No 27

Pix2pixHD: Synthesizing and manipulating 2048x1024 images with conditional GANs [1283 stars on GitHub]. Courtesy of Ming-Yu Liu, AI Research Scientist at Nvidia

No 28

Horovod: Distributed training framework for TensorFlow [1188 stars on GitHub]. Courtesy of Uber Engineering

No 29

AI-Blocks: A powerful and intuitive WYSIWYG interface that allows anyone to create Machine Learning models [899 stars on GitHub].

No 30

Deep neural networks for voice conversion (voice style transfer) in TensorFlow [845 stars on GitHub]. Courtesy of Dabi Ahn, AI Research at Kakao Brain

Seen at: