Apache Pig is a high-level data analysis language capable of handling very large data volumes. Ease of programming, parallelization, extensibility, and optimization opportunities are some of the key features of this platform.
The Pig Script Executor job entry can be used to execute a Pig Latin script on a Hadoop cluster.
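For readers new to Pig Latin, the sketch below shows the kind of dataflow a script expresses: load lines, split them into words, group by word, and count. The Pig Latin in the comments is a hypothetical word-count script (the file name input.txt is an assumption), and the Python function only mimics that dataflow in memory for illustration. It is not how PDI or Pig execute the script, and str.split() is a simplification of Pig's TOKENIZE.

```python
from collections import Counter

# Hypothetical Pig Latin script this sketch mirrors:
#   lines  = LOAD 'input.txt' AS (line:chararray);
#   words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
#   grp    = GROUP words BY word;
#   counts = FOREACH grp GENERATE group, COUNT(words);
#   DUMP counts;

def word_count(lines):
    """Mimic the LOAD -> TOKENIZE -> GROUP -> COUNT dataflow on an in-memory list."""
    # FOREACH ... FLATTEN(TOKENIZE(line)): one record per whitespace-separated word
    words = (word for line in lines for word in line.split())
    # GROUP words BY word, then COUNT each group
    counts = Counter(words)
    # Sort so the output order is deterministic
    return sorted(counts.items())
```

For example, word_count(["a b a", "b c"]) returns [("a", 2), ("b", 2), ("c", 1)]. On a real cluster, Pig compiles the same logical steps into parallel Hadoop jobs.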
A step-by-step illustration of how to configure the PDI Pig Script Executor is given below.
Tuesday, February 18, 2014
Thursday, January 30, 2014
Pentaho Big Data : Hadoop File Input
The Hadoop File Input step can be used to extract data from a Hadoop cluster. This step can read comma-separated, tab-delimited, fixed-width, and other common types of text files.
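To make the difference between these file layouts concrete, here is a minimal sketch of how delimited and fixed-width records split into fields. It assumes the file contents have already been read into a string (no HDFS access is shown), and the function names and field widths are hypothetical; it is not PDI's implementation.

```python
import csv
import io

def read_delimited(text, delimiter=","):
    """Split delimited text (comma, tab, ...) into rows of string fields."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

def read_fixed_width(text, widths):
    """Slice each line into fields of the given character widths, trimming padding."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields, pos = [], 0
        for width in widths:
            fields.append(line[pos:pos + width].strip())
            pos += width
        rows.append(fields)
    return rows
```

A delimited file needs only the separator character, while a fixed-width file needs the width of every field, which mirrors the extra field-length settings a fixed-width layout requires.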
A step-by-step illustration of how to configure the Pentaho Hadoop File Input step is given below.
The Cloudera QuickStart VM is used for demo purposes. Refer to the link below for more information.
http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
Friday, January 24, 2014
Pentaho Data Integration : JSON Input Step
JSON (JavaScript Object Notation) is a text-based, lightweight data interchange format.
Implementations are widely available, and the format is platform independent.
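The core job of a JSON input step is to turn nested JSON documents into flat rows, one column per extracted field. The sketch below shows that idea with Python's standard json module. The PDI step itself is configured with JSONPath-style expressions; this sketch substitutes a simple dotted path (a hypothetical helper, not JSONPath), and the sample field names are assumptions.

```python
import json

def get_path(obj, path):
    """Follow a dotted path such as 'address.city'; return None if any key is missing."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj

def json_to_rows(text, fields):
    """Turn a JSON array of objects into rows, one column per (name, path) pair."""
    records = json.loads(text)
    return [tuple(get_path(record, path) for _, path in fields) for record in records]
```

For example, json_to_rows('[{"name": "Ann", "address": {"city": "Leeds"}}, {"name": "Bob"}]', [("name", "name"), ("city", "address.city")]) yields [("Ann", "Leeds"), ("Bob", None)], so a missing value becomes an empty (None) field rather than an error.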
A step-by-step illustration of how to use the Pentaho JSON Input step is given below.
Step 1
Wednesday, January 22, 2014
Pentaho Data Integration : HTTP Client
The HTTP Client step calls a base URL with parameter values and returns the result as a string. A sample transformation is given below.
The free Yahoo Finance API for downloading stock quotes is used here for demo purposes.
Current stock prices, with a 15-minute delay, can be retrieved using this API.
The service returns data in CSV format.
Base URL : http://finance.yahoo.com/d/quotes.csv
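The sketch below shows, outside PDI, what the step does with its inputs: append parameter values to the base URL as a query string, then treat the CSV response as text to be parsed. The parameter names s (symbols) and f (format codes), and the format value "sl1" (symbol, last price), are assumptions about this API rather than something confirmed here. No network request is made, and the symbols in the usage note are placeholders.

```python
import csv
import io
from urllib.parse import urlencode

BASE_URL = "http://finance.yahoo.com/d/quotes.csv"

def build_quote_url(symbols, fmt="sl1"):
    """Append query parameters to the base URL.

    The names 's' (symbols) and 'f' (format codes) are assumed, not verified.
    """
    query = urlencode({"s": ",".join(symbols), "f": fmt})
    return BASE_URL + "?" + query

def parse_quotes(csv_text):
    """Parse a CSV result string of 'symbol,price' lines into (symbol, price) tuples."""
    return [(row[0], float(row[1])) for row in csv.reader(io.StringIO(csv_text)) if row]
```

For placeholder symbols, build_quote_url(["AAA", "BBB"]) produces http://finance.yahoo.com/d/quotes.csv?s=AAA%2CBBB&f=sl1; note that urlencode percent-encodes the comma. In the transformation, the string returned by the HTTP Client step would be split into fields downstream in much the same way parse_quotes does here.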
Monday, January 20, 2014
Pentaho Data Integration 5.0.2 - Configure DI server (Linux)
The basic configuration steps for the Pentaho Data Integration server are given below. A PDI installation performed with the installation wizard on Linux is used for demo purposes. The server was installed on the bundled Apache Tomcat server.
Step 1: Start the DI Server
The script "ctlscript.sh" can be used to manage the DI server. Here are the available script arguments.
Sunday, January 5, 2014
Pentaho Business Analytics Enterprise Edition 5.0.2 - Installation for Linux
Pentaho Business Analytics (BA) Suite can be installed in several ways. The Install All Components method is chosen for demo purposes, so both the Business Analytics and Data Integration components will be installed.
Here is a list of commonly used BA and DI components.
Friday, January 3, 2014
Pentaho Data Integration : Microsoft Excel Input
The Microsoft Excel Input step can be used to integrate data from various spreadsheet sources, including OpenOffice workbooks. This step can extract data from Excel 97-2003 (xls) or Excel 2007 (xlsx) files. A step-by-step illustration of how to configure the Pentaho Excel Input step is given below.
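Although all three are "spreadsheets", xls, xlsx, and OpenOffice workbooks are quite different file formats under the hood, which is why the reader has to know which one it is dealing with. As a minimal, heuristic sketch (not PDI's code), the function below tells them apart from their leading bytes: .xls is an OLE2 compound file, while .xlsx and .ods are both ZIP archives, distinguished by the entries they contain.

```python
import io
import zipfile

# Leading bytes of an OLE2 compound file, the container used by Excel 97-2003 .xls files
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header: .xlsx and .ods are ZIP archives

def spreadsheet_kind(data):
    """Guess the workbook format from raw file bytes (a heuristic, not a validator)."""
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = set(zf.namelist())
                if "[Content_Types].xml" in names:  # Office Open XML package (.xlsx)
                    return "xlsx"
                if "mimetype" in names and zf.read("mimetype").startswith(
                        b"application/vnd.oasis.opendocument.spreadsheet"):
                    return "ods"  # OpenOffice / OpenDocument spreadsheet
        except zipfile.BadZipFile:
            pass  # starts like a ZIP but is damaged; fall through
    return "unknown"
```

To check a file on disk, read it in binary mode and pass the bytes, e.g. spreadsheet_kind(open(path, "rb").read()), where path is whatever workbook you want to inspect.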
Step 1
Drag and drop the Microsoft Excel Input step onto the transformation design canvas.
