Thursday, January 30, 2014

Pentaho Big Data : Hadoop File Input

The Hadoop File Input step can be used to extract data from a Hadoop cluster. This step can read comma-separated, tab-delimited, fixed-width and other common types of text files.

A stepwise illustration of how to configure the Pentaho Hadoop File Input step is given below.

The Cloudera QuickStart VM is used for demo purposes. Refer to the link for more info.
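To make the text layouts mentioned above a bit more concrete, here is a minimal Python sketch of parsing comma-separated, tab-delimited and fixed-width text. It is only an illustration of the formats, not how the Hadoop File Input step works internally. The function names, sample data and field widths are made up for this example.

```python
import csv
import io


def read_delimited(text, delimiter):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def read_fixed_width(text, widths):
    """Slice every line into fields of the given widths and strip the padding."""
    rows = []
    for line in text.splitlines():
        fields = []
        start = 0
        for width in widths:
            fields.append(line[start:start + width].strip())
            start += width
        rows.append(fields)
    return rows


# Hypothetical sample data: the same two rows in three layouts.
comma_sample = "id,name\n1,alice\n"
tab_sample = "id\tname\n1\talice\n"
fixed_sample = "id  name \n1   alice\n"  # field widths assumed to be 4 and 5

comma_rows = read_delimited(comma_sample, ",")
tab_rows = read_delimited(tab_sample, "\t")
fixed_rows = read_fixed_width(fixed_sample, [4, 5])
```

All three calls should yield the same rows, which is roughly what a text file input step does: turn differently laid out text into a uniform stream of fields.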

Friday, January 24, 2014

Pentaho Data Integration : JSON input Step

JSON (JavaScript Object Notation) is a text-based, lightweight data-interchange format.
It is platform independent, and implementations are widely available.
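As a small illustration of the format, the Python sketch below parses a JSON document with the standard `json` module and flattens it into rows, which is roughly the kind of output a JSON input step produces. The document content and field names (`symbol`, `quotes`, `day`, `close`) are hypothetical and only serve the example.

```python
import json

# Hypothetical JSON document: one object holding a list of nested records.
document = (
    '{"symbol": "ABC", "quotes": ['
    '{"day": "Mon", "close": 10.5}, '
    '{"day": "Tue", "close": 11.0}]}'
)

# json.loads turns the text into plain Python dicts, lists, strings and numbers.
data = json.loads(document)

# Flatten the nested records into one row per quote.
rows = [(data["symbol"], quote["day"], quote["close"]) for quote in data["quotes"]]

for row in rows:
    print(row)
```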

A stepwise illustration of the usage of the Pentaho JSON Input step is given below.

Step 1

Wednesday, January 22, 2014

Pentaho Data Integration : HTTP Client

The HTTP Client step calls a base URL with parameter values and returns the result as a string. A sample transformation is given below.

The free Yahoo Finance API for downloading stock quotes is used here for demo purposes.
Current stock prices with a 15-minute delay can be retrieved using this API.
The service returns data in CSV format.
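The general pattern (base URL plus parameters, response returned as a string, CSV body split into fields) can be sketched in Python as follows. This is not the HTTP Client step itself, and the base URL and parameter names are placeholders invented for the example, not the actual Yahoo Finance endpoint or its parameters.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base_url, params):
    """Append URL-encoded query parameters to a base URL."""
    return f"{base_url}?{urlencode(params)}"


def fetch_text(url, timeout=10):
    """Perform an HTTP GET and return the response body as a string."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_csv(body):
    """Split a CSV response body into rows of string fields."""
    return list(csv.reader(io.StringIO(body)))


# Placeholder base URL and parameter names (assumptions for illustration only).
url = build_url("http://example.com/quotes", {"symbol": "ABC", "format": "csv"})

# A made-up response body; with a real service it would come from fetch_text(url).
sample_body = '"ABC","Example Corp",10.5\n'
rows = parse_csv(sample_body)
```

Keeping the URL building and CSV parsing separate from the network call makes each part easy to check without hitting a live service.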

Base URL :

Monday, January 20, 2014

Pentaho Data Integration 5.0.2 - Configure DI server ( Linux )

Basic configuration steps for the Pentaho Data Integration server are given below. A PDI installation done with the installation wizard on Linux is used for demo purposes. The server was installed on the included Apache Tomcat server.

Step 1 : Start DI Server

Script "" can be used to manage the DI server. Here are the available script arguments.

Sunday, January 5, 2014

Pentaho Business Analytics Enterprise Edition 5.0.2 - Installation for Linux

The Pentaho Business Analytics (BA) Suite can be installed in several ways. The Install All Components method is chosen for demo purposes, so both the Business Analytics and Data Integration components will be installed.
Here is a list of commonly used BA and DI components.

Friday, January 3, 2014

Pentaho Data Integration : Microsoft Excel Input

The Microsoft Excel Input step can be used to integrate data from various Excel sources, including OpenOffice workbooks. This step can extract data from Excel 97-2003 (xls) files or Excel 2007 (xlsx) files. A stepwise illustration of how to configure the Pentaho Excel Input step is given below.

Step 1

Drag and drop the Microsoft Excel Input step onto the transformation design canvas.