20 Aug. 2012

ETL Validations in Pentaho

We present a framework for validation. In our case it is focused on validating that an ETL load has worked correctly.

The intended purpose of STValidations (free to download from the link above) is to automate routine validations, which we sometimes fail to carry out precisely because they are routine.

It is quite simple to use. It is at version 0.1, so it can still evolve a lot, and we have had few use cases so far to confirm that it is robust and covers all needs. We hope this code from Stratebi helps you.

The validation ETL is a process that reads, from a table, a list of queries to be executed together with the expected result of each one. It then runs each query and compares the outcome with the expected result: if they match, the execution is written to a log of correct executions, and if they do not, it is written to a log of failed executions. At the end it sends an email with both logs to the user, who can then decide whether the load needs to be reviewed.
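The flow described above can be sketched outside Pentaho in a few lines of Python. This is only an illustration under stated assumptions, not the STValidations job itself: the table name `validations`, its columns `name`, `query` and `expected`, the use of SQLite, and the recipient address are all hypothetical. The email is built but not sent.

```python
import sqlite3
from email.message import EmailMessage


def run_validations(conn):
    """Run each stored query and sort the outcome into a passed or failed log."""
    passed, failed = [], []
    # Hypothetical layout: one row per check, with the expected value as text.
    checks = conn.execute(
        "SELECT name, query, expected FROM validations ORDER BY name"
    ).fetchall()
    for name, query, expected in checks:
        try:
            row = conn.execute(query).fetchone()
        except sqlite3.Error as exc:
            # A query that cannot run counts as a failed validation.
            failed.append(f"{name}: query error ({exc})")
            continue
        # Compare as text so a single TEXT column can hold any expected value.
        actual = None if row is None else str(row[0])
        if actual == expected:
            passed.append(f"{name}: OK, got {actual}")
        else:
            failed.append(f"{name}: expected {expected}, got {actual}")
    return passed, failed


def build_report(passed, failed, recipient):
    """Build (but do not send) an email that carries both logs."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"ETL validation: {len(passed)} passed, {len(failed)} failed"
    msg.set_content("\n".join(["Passed:", *passed, "", "Failed:", *failed]))
    return msg


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (amount INTEGER);
        INSERT INTO sales VALUES (10), (20), (30);
        CREATE TABLE validations (name TEXT, query TEXT, expected TEXT);
        INSERT INTO validations VALUES
            ('sales_row_count', 'SELECT COUNT(*) FROM sales', '3'),
            ('sales_total', 'SELECT SUM(amount) FROM sales', '100');
    """)
    passed, failed = run_validations(conn)
    print(build_report(passed, failed, "etl-owner@example.com"))
```

Comparing values as text keeps the sketch short, but it means the expected value must be written exactly as the database returns it (for example `60` versus `60.0`).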

Here is an overview of the job as a whole:

And the detail of the validation:

Simple ... powerful and versatile. It covers everything from simple queries like "select count (*) from table", to validate that records exist, to complex queries that compare values in different tables. In fact, as long as what I want to check is data in a table, I think most validations can be performed with this method.
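To make that range concrete, here is a small sketch of how both a simple row-count check and a cross-table comparison can be reduced to a query that returns a single value. The table names `stg_sales` and `fact_sales`, the check names, and the use of SQLite are assumptions made for the example; they do not come from STValidations.

```python
import sqlite3

# Hypothetical checks: each one is (query returning one value, expected value).
CHECKS = {
    # Simple: the target table has at least one record.
    "fact_has_rows": ("SELECT COUNT(*) > 0 FROM fact_sales", 1),
    # Cross-table: staging and target hold the same number of rows.
    "row_counts_match": (
        "SELECT (SELECT COUNT(*) FROM stg_sales)"
        " - (SELECT COUNT(*) FROM fact_sales)",
        0,
    ),
    # Cross-table: the summed amounts agree between staging and target.
    "totals_match": (
        "SELECT (SELECT SUM(amount) FROM stg_sales)"
        " = (SELECT SUM(amount) FROM fact_sales)",
        1,
    ),
}


def evaluate(conn, checks=CHECKS):
    """Return {check name: True/False} for every check."""
    return {
        name: conn.execute(sql).fetchone()[0] == expected
        for name, (sql, expected) in checks.items()
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stg_sales (amount INTEGER);
        CREATE TABLE fact_sales (amount INTEGER);
        INSERT INTO stg_sales VALUES (10), (20);
        INSERT INTO fact_sales VALUES (10), (20);
    """)
    print(evaluate(conn))
```

Writing each comparison so that it collapses to one scalar is what lets a single "query plus expected result" row describe it.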

You can follow the instructions in this video tutorial:


You can find more videos on our YouTube channel.

In its settings step, the process reads the configuration data and checks that a validation table exists; if it does not, the process creates it and inserts a sample validation. The process then reads all the queries in the table, runs them and compares each result with the expected result.
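The "create the table if it is missing and seed it with a sample" step might look roughly like this. It is a sketch under assumptions: the table name `validations`, its three columns, the sample row, and SQLite's `sqlite_master` catalog are all stand-ins for whatever the real job and database use.

```python
import sqlite3


def ensure_validation_table(conn):
    """Create the validation table and a sample row if the table is missing.

    Returns True when the table had to be created, False when it already existed.
    """
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'validations'"
    ).fetchone()
    if exists:
        return False
    # Hypothetical layout: a name, the query to run, and the expected result.
    conn.execute(
        "CREATE TABLE validations ("
        " name TEXT PRIMARY KEY,"
        " query TEXT NOT NULL,"
        " expected TEXT NOT NULL)"
    )
    # A trivial sample so that the first run has something to execute.
    conn.execute(
        "INSERT INTO validations VALUES (?, ?, ?)",
        ("sample_check", "SELECT 1", "1"),
    )
    conn.commit()
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(ensure_validation_table(conn))  # first call creates and seeds the table
    print(ensure_validation_table(conn))  # second call finds it and does nothing
```

Because the function is safe to call on every run, it can sit at the start of the validation process without any separate installation step.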

The lookup table looks like this and is filled in manually (we are at version 0.1, remember):

We built this because inconsistency problems in some projects made it clear that loads must be validated, and that doing so is tedious, repetitive work.
  • Loads must always be validated. We have detected errors caused by inconsistent or unexpected input data formats, and these cannot be caught unless the loads are validated.
  • Loads must be validated especially after a change to the ETL, and running the same validation queries over and over is pretty boring.
  • We must run a series of routine queries to ensure that the loaded data matches the source.
  • If it can be automated ... I'd rather be doing other things.

Strengths and weaknesses:
  • It is version 0.1; we have tested it with a couple of clients and so far so good. But we know that, for example, it still cannot compare data from different data sources.
  • Queries can become heavy, which makes this potentially VERY expensive in terms of resource consumption. That is why the validation is a process that can be run independently: either attached to the ETL load or at another time when the system load is low. The latter is what we do: the process does not run right after the ETL but at 6:00 a.m. as a separate process.
  • It does not depend on the BI server. Simple and useful.

Well, I think it can be very useful in projects and we hope you find it useful too!!

By the way ... this is the error log that gets sent:

Comments:

Anonymous said...

Thanks for contribution, very useful

Battle is compared in heart of a China in, liu Sha Sha with Chen Xue of 9-4 beat easily, she will encounter Ceng Gengong in battle of 4 strong contention child. And in another 1/8 final, ai Li dark - Fei Xue lags behind with 5-7 for a time weed of Tan He of [url=]cheap kids nike shoes[/url] China Taipei famous general, after this she turns score make the same score for 8-8. In bureau of decide the issue of the battle, the Tan He Yun with more stable state of mind infiltrates get the upper hand of 9 balls, thereby with 9-8 win by a narrow margin Ai Li dark - Fei Xue, she will with triumphant Li - Fei Xue contends for one Zhang Sijiang ticket. (My147 Ge Xin is happy) (manuscript origin: My stage net)