The Hadoop File Input step can be used to extract data from a Hadoop cluster. This step can read comma-separated, tab-delimited, fixed-width and other common types of text files.
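To make the three text layouts mentioned above concrete, here is a minimal Python sketch that parses comma-separated, tab-delimited and fixed-width text into rows. It is only an illustration of the file formats, not of how the Pentaho step is implemented; the helper names, the column widths and the sample records are all made up for this example.

```python
import csv
import io


def parse_delimited(text, delimiter):
    """Parse delimiter-separated text into a list of rows (lists of strings)."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]


def parse_fixed_width(text, widths):
    """Cut each line into fields of the given widths and trim the padding."""
    rows = []
    for line in text.splitlines():
        fields, start = [], 0
        for width in widths:
            fields.append(line[start:start + width].strip())
            start += width
        rows.append(fields)
    return rows


# Made-up sample records, for illustration only.
comma_rows = parse_delimited("1,alice\n2,bob\n", ",")
tab_rows = parse_delimited("1\talice\n2\tbob\n", "\t")
fixed_rows = parse_fixed_width("1    alice\n2    bob  \n", [5, 5])
```

All three calls produce the same two rows, which is the point: the layouts differ only in how field boundaries are marked.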
A stepwise illustration of how to configure the Pentaho Hadoop File Input step is given below.
The Cloudera QuickStart VM is used for demo purposes. Refer to the link below for more information.
http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
Pentaho Kettle Data Integration Installation for Windows, Pentaho Data Integration ( PDI ) - Generic Design Guidelines
Thursday, January 30, 2014
Friday, January 24, 2014
Pentaho Data Integration : JSON input Step
JSON (JavaScript Object Notation) is a text-based, lightweight data-interchange format.
This format enjoys a wide availability of implementations and is platform independent.
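As a small illustration of the format, the following Python sketch parses a JSON document and flattens a list of objects into rows of fields, which is the general kind of result a JSON input produces for downstream processing. It does not show how the Pentaho step works internally; the sample document, the `json_to_rows` helper and the field names are hypothetical.

```python
import json

# Made-up sample document, for illustration only.
sample = '{"quotes": [{"symbol": "AAA", "price": 10.5}, {"symbol": "BBB", "price": 20.0}]}'


def json_to_rows(text, list_key, fields):
    """Parse JSON text and return one row per object found under list_key."""
    doc = json.loads(text)
    # A field missing from an object becomes None in that row.
    return [[item.get(field) for field in fields] for item in doc[list_key]]


rows = json_to_rows(sample, "quotes", ["symbol", "price"])
```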
A stepwise illustration of the usage of the Pentaho JSON Input step is given below.
Step 1
Wednesday, January 22, 2014
Pentaho Data Integration : HTTP Client
The HTTP Client step provides the ability to call a base URL with parameter values and return the result as a string. A sample transformation is given below.
The free Yahoo finance API for downloading stock quotes is used here for demo purposes.
Current stock prices with a 15-minute delay can be retrieved using this API.
The service returns data in CSV format.
Base URL : http://finance.yahoo.com/d/quotes.csv
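The request/response pattern described above can be sketched in Python as follows: parameter values are URL-encoded and appended to the base URL, and the CSV string that comes back is split into rows. This is a minimal sketch under assumptions, not the Pentaho implementation. The parameter names `s` and `f`, their values, and the sample response body are illustrative assumptions and are not taken from the post; no network call is made.

```python
import csv
import io
from urllib.parse import urlencode

BASE_URL = "http://finance.yahoo.com/d/quotes.csv"


def build_request_url(base_url, params):
    """Append URL-encoded query parameters to the base URL."""
    return f"{base_url}?{urlencode(params)}"


def parse_csv_result(body):
    """Split a CSV result string into rows of fields, skipping empty lines."""
    return [row for row in csv.reader(io.StringIO(body)) if row]


# Hypothetical parameter names and values, for illustration only.
url = build_request_url(BASE_URL, {"s": "AAPL,GOOG", "f": "sl1"})

# Made-up response body standing in for the string the service would return.
sample_body = '"AAPL",100.00\n"GOOG",200.00\n'
rows = parse_csv_result(sample_body)
```

Because the result arrives as a single string, some parsing step like `parse_csv_result` is needed before the individual values can be used as separate fields.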
Monday, January 20, 2014
Pentaho Data Integration 5.0.2 - Configure DI server ( Linux )
Basic configuration steps for the Pentaho Data Integration server are given below. A PDI installation performed with the installation wizard on Linux is used for demo purposes. The server was installed on the included Apache Tomcat server.
Step 1 : Start DI Server
The script "ctlscript.sh" can be used to manage the DI server. Here are the available script arguments.
Sunday, January 5, 2014
Pentaho Business Analytics Enterprise Edition 5.0.2 - Installation for Linux
The Pentaho Business Analytics (BA) Suite can be installed in several ways. The Install All Components method is used here for demo purposes, so both the Business Analytics and Data Integration components will be installed.
Here is a list of commonly used BA and DI components.
Friday, January 3, 2014
Pentaho Data Integration : Microsoft Excel Input
The Microsoft Excel Input step can be used to integrate data from various Excel sources, including OpenOffice workbooks. This step can extract data from Excel 97-2003 (xls) files or Excel 2007 (xlsx) files. A stepwise illustration of how to configure the Pentaho Excel Input step is given below.
Step 1
Drag and drop the Microsoft Excel Input step onto the transformation design canvas.
