Pentaho Data Integration : Google Analytics

The Google Analytics service provides details about a website's traffic. It tracks various statistics and can be integrated with AdWords to review online campaigns.

The Pentaho Google Analytics step extracts Google Analytics data.
A stepwise illustration is given below.

Step 1

Enable Google Analytics and generate an API key.


Pentaho Common Errors : Driver class 'org.gjt.mm.mysql.Driver' could not be found

Error Message
Error connecting to database [MySQLDev] : org.pentaho.di.core.exception.KettleDatabaseException:
Error occured while trying to connect to the database

Driver class 'org.gjt.mm.mysql.Driver' could not be found, make sure the 'MySQL' driver (jar file) is installed.
org.gjt.mm.mysql.Driver
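
The message above says that the class 'org.gjt.mm.mysql.Driver' is not on the classpath, which usually means no MySQL driver jar has been placed where PDI looks for libraries. A jar file is an ordinary zip archive, so a quick way to check is to scan a folder of jars for that class. The sketch below is only an illustration: the function names are made up for this example, and the folder to scan depends on the installation.

```python
import zipfile
from pathlib import Path

# Class name taken from the error message above.
DRIVER_CLASS = "org.gjt.mm.mysql.Driver"


def class_entry(class_name):
    """Convert a dotted Java class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(lib_dir, class_name=DRIVER_CLASS):
    """Return the jar files directly under lib_dir that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            # Not a readable archive; skip it rather than fail the scan.
            continue
    return hits
```

Calling jars_containing() with the path of the PDI library folder (an assumption, adjust it to the actual installation) returns an empty list when no jar in that folder provides the driver class.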



Pentaho Big Data : Pig Script Executor

Apache Pig is a high-level data analysis language capable of handling very large data volumes. Ease of programming, parallelization, extensibility and optimization opportunities are some of the key features of this platform.

The Pig Script Executor job entry can be used to execute a "Pig Latin" script on a Hadoop cluster.
A stepwise illustration of how to configure the PDI "Pig Script Executor" is given below.


Pentaho Big Data : Hadoop File Input

The Hadoop File Input step can be used to extract data from a Hadoop cluster. This step can read comma-separated, tab-delimited, fixed-width and other common types of text files.
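
To make the difference between these text layouts concrete, here is a small sketch that parses delimited and fixed-width lines. It is not how the step itself is implemented; the function names, sample rows and column widths are invented for illustration.

```python
import csv
import io


def parse_delimited(text, delimiter=","):
    """Split each line on a delimiter (comma by default, tab for TSV)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def parse_fixed_width(text, widths):
    """Cut each line into columns of the given character widths."""
    rows = []
    for line in text.splitlines():
        row, pos = [], 0
        for width in widths:
            # Padding spaces are part of the layout, not of the value.
            row.append(line[pos:pos + width].strip())
            pos += width
        rows.append(row)
    return rows
```

For example, parse_delimited("1\talice\n", "\t") and parse_fixed_width("1  alice\n", [3, 5]) both yield one row with the values "1" and "alice".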

A stepwise illustration of how to configure the Pentaho Hadoop File Input step is given below.

The Cloudera QuickStart VM is used for demo purposes. Refer to the following link for more information.
http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html


Pentaho Data Integration : JSON input Step

JSON (JavaScript Object Notation) is a text-based, lightweight data interchange format.
This format enjoys a wide availability of implementations and is platform independent.
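
As a quick illustration of the format, the sketch below parses a small JSON document and flattens it into rows, which is roughly the kind of output an input step hands to the rest of a transformation. The sample document and its field names (store, items, sku, qty) are invented for this example.

```python
import json

# Hypothetical sample document; the field names are assumptions.
SAMPLE = '{"store": "demo", "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 5}]}'


def to_rows(text):
    """Parse JSON text and emit one (store, sku, qty) row per item."""
    doc = json.loads(text)
    return [(doc["store"], item["sku"], item["qty"]) for item in doc["items"]]
```

Here to_rows(SAMPLE) produces two rows, one for each entry in the "items" array.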

A stepwise illustration of the usage of the Pentaho JSON input step is given below.

Step 1



Pentaho Data Integration : HTTP Client

The HTTP Client step can call a base URL with parameter values and return the result as a string. A sample transformation is given below.

The free Yahoo Finance API for downloading stock quotes is used here for demo purposes.
Current stock prices with a 15-minute delay can be retrieved using this API.
The service returns data in CSV format.

Base URL : http://finance.yahoo.com/d/quotes.csv
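
The request that the step makes can be sketched outside PDI as follows: parameter values are appended to the base URL as a query string, and the CSV text that comes back is split into rows. The parameter names "s" (ticker symbols) and "f" (field codes) are assumptions for this sketch and are not taken from the text above. The code only builds the URL and parses a response body; it does not send the request.

```python
import csv
import io
from urllib.parse import urlencode

# Base URL as given above.
BASE_URL = "http://finance.yahoo.com/d/quotes.csv"


def build_url(base_url, params):
    """Append URL-encoded parameter values to the base URL."""
    return base_url + "?" + urlencode(params)


def parse_csv_result(body):
    """Split a CSV response body (one string) into rows of fields."""
    return list(csv.reader(io.StringIO(body)))


# Assumed parameter names: "s" for the symbol, "f" for the field codes.
url = build_url(BASE_URL, {"s": "GOOG", "f": "sl1"})
```

In the transformation, the step plays the role of build_url plus the HTTP call, and a later step splits the returned string into fields, much like parse_csv_result.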



Pentaho Data Integration 5.0.2 - Configure DI server ( Linux )

Basic configuration steps for the Pentaho Data Integration server are given below. A PDI installation made with the installation wizard on Linux is used for demo purposes. The server was installed on the included Apache Tomcat server.

Step 1 : Start DI Server

The script "ctlscript.sh" can be used to manage the DI server. Here are the available script arguments.


Pentaho Business Analytics Enterprise Edition 5.0.2 - Installation for Linux

The Pentaho Business Analytics (BA) Suite can be installed in several ways. The Install All Components method is chosen here for demo purposes. Both Business Analytics and Data Integration components will be installed.
 
Here is a list of commonly used BA and DI components.
 


Pentaho Data Integration : Microsoft Excel Input

The Microsoft Excel Input step can be used to integrate data from various Excel sources, including OpenOffice workbooks. This step can extract data from Excel 97-2003 (xls) files or Excel 2007 (xlsx) files. A stepwise illustration of how to configure the Pentaho Excel Input step is given below.
 

Step 1

Drag and drop the Microsoft Excel Input step onto the transformation design canvas.
