Pentaho Big Data: Hadoop File Input

The Hadoop File Input step extracts data from a Hadoop cluster. It can read comma-separated, tab-delimited, fixed-width and other common types of text files.

The steps below show how to configure the Pentaho Hadoop File Input step.

This demo uses the Cloudera QuickStart VM. See the link below for more information.
http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html

Step 1

Open Spoon and create a new transformation.
Expand the Big Data section of the design palette and drag Hadoop File Input onto the canvas.


Step 2

Configure the input file name and location.

This demo uses the file "hdfs://${USER}:${PASSWORD}@localhost:8020/user/training/input/purchases.txt". Make sure that this file is available on HDFS.

[training@localhost conf]$ hadoop fs -ls /user/training/input
Found 1 items
-rw-r--r--   1 training supergroup  211312924 2014-01-26 13:33 /user/training/input/purchases.txt
[training@localhost conf]$


[training@localhost ~]$ hadoop fs -cat /user/training/input/purchases.txt | head -10
2012-01-01 09:00 San Jose Men's Clothing 214.05 Amex
2012-01-01 09:00 Fort Worth Women's Clothing 153.57 Visa
2012-01-01 09:00 San Diego Music 66.08 Cash
2012-01-01 09:00 Pittsburgh Pet Supplies 493.51 Discover
2012-01-01 09:00 Omaha Children's Clothing 235.63 MasterCard
2012-01-01 09:00 Stockton Men's Clothing 247.18 MasterCard
2012-01-01 09:00 Austin Cameras 379.6 Visa


Step 3

Configure the input file type and format.
This demo uses a tab-separated file.


Step 4

Configure the field names.
Use the "Get Fields" button to populate them from the file if needed.
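
As a rough illustration of what Steps 3 and 4 amount to, the Python sketch below splits tab-separated rows into named fields. It is not how the step is implemented. The field names (date, time, store, category, amount, payment) and the parse_purchases helper are assumptions made for this example, and the two sample rows are taken from the purchases.txt listing in Step 2 on the assumption that the columns are separated by tabs.

```python
import csv
import io

# Hypothetical field names; purchases.txt has no header row, so these are
# illustrative labels, not names taken from the Pentaho dialog.
FIELD_NAMES = ["date", "time", "store", "category", "amount", "payment"]


def parse_purchases(text):
    """Split tab-separated text into dicts keyed by FIELD_NAMES."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for values in reader:
        if not values:
            continue  # skip blank lines
        row = dict(zip(FIELD_NAMES, values))
        row["amount"] = float(row["amount"])  # treat the amount as a number
        rows.append(row)
    return rows


# Two rows from the sample listing, assuming tab delimiters.
SAMPLE = (
    "2012-01-01\t09:00\tSan Jose\tMen's Clothing\t214.05\tAmex\n"
    "2012-01-01\t09:00\tAustin\tCameras\t379.6\tVisa\n"
)

rows = parse_purchases(SAMPLE)
print(rows[0]["store"], rows[0]["amount"])
```

In the step itself, the equivalent of FIELD_NAMES and the numeric conversion is entered in the field list of the dialog (name and type per column).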


Step 5

Use the "Preview rows" option to examine the source data.

Step 6

Pass the user name and password as parameters. They fill in ${USER} and ${PASSWORD} in the file URL from Step 2.
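
To make the substitution concrete, here is a small Python sketch that mimics it outside of Pentaho. Spoon performs its own variable replacement; this only imitates the effect using the standard string.Template class, which happens to use the same ${NAME} syntax. The resolve_url and mask_password helpers are hypothetical names, and the password "secret" is a placeholder, not a real credential.

```python
from string import Template
from urllib.parse import urlsplit, urlunsplit

# The file URL from Step 2, with ${USER} and ${PASSWORD} left as placeholders.
URL_TEMPLATE = Template(
    "hdfs://${USER}:${PASSWORD}@localhost:8020/user/training/input/purchases.txt"
)


def resolve_url(params):
    """Fill in the placeholders; raises KeyError if a parameter is missing."""
    return URL_TEMPLATE.substitute(params)


def mask_password(url):
    """Replace the password with *** so the URL is safe to print."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    host = parts.hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=f"{parts.username}:***@{host}"))


# "cloudera" is the user seen in the execution log; "secret" is a placeholder.
url = resolve_url({"USER": "cloudera", "PASSWORD": "secret"})
print(mask_password(url))
```

The masked form is the same shape as the URL that appears in the execution log, where the password is shown as ***.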

Step 7

Run the transformation and check the execution log and results.




2014/01/29 15:22:53 - Spoon - Transformation opened.
2014/01/29 15:22:53 - Spoon - Launching transformation [tr_testar1]...
2014/01/29 15:22:53 - Spoon - Started the transformation execution.
2014/01/29 15:22:53 - tr_testar1 - Dispatching started for transformation [tr_testar1]
2014/01/29 15:22:54 - Hadoop File Input.0 - Opening file: hdfs://cloudera:***@localhost:8020/user/training/input/purchases.txt
2014/01/29 15:23:35 - Hadoop File Input.0 - Finished processing (I=4138476, O=0, R=0, W=4138476, U=1, E=0)
2014/01/29 15:23:35 - Dummy (do nothing).0 - Finished processing (I=0, O=0, R=4138476, W=4138476, U=0, E=0)
2014/01/29 15:23:35 - Spoon - The transformation has finished!!
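
The "Finished processing" lines carry the row counters for each step: I (rows input from a file or database), O (rows output to a file or database), R (rows read from a previous step), W (rows written to the next step), U (rows updated) and E (errors). The Python sketch below pulls those counters out of a log line, which can be handy for checking that every row read was passed on. The regular expression and the parse_metrics helper are assumptions written against the log format shown above, not part of Pentaho.

```python
import re

# Matches lines such as:
# 2014/01/29 15:23:35 - Hadoop File Input.0 - Finished processing (I=..., ...)
METRICS = re.compile(
    r"- (?P<step>.+?)\.(?P<copy>\d+) - Finished processing "
    r"\(I=(?P<I>\d+), O=(?P<O>\d+), R=(?P<R>\d+), "
    r"W=(?P<W>\d+), U=(?P<U>\d+), E=(?P<E>\d+)\)"
)


def parse_metrics(line):
    """Return the step name and integer counters, or None if no match."""
    match = METRICS.search(line)
    if match is None:
        return None
    result = {"step": match.group("step"), "copy": int(match.group("copy"))}
    for key in ("I", "O", "R", "W", "U", "E"):
        result[key] = int(match.group(key))
    return result


LOG_LINES = [
    "2014/01/29 15:23:35 - Hadoop File Input.0 - Finished processing "
    "(I=4138476, O=0, R=0, W=4138476, U=1, E=0)",
    "2014/01/29 15:23:35 - Dummy (do nothing).0 - Finished processing "
    "(I=0, O=0, R=4138476, W=4138476, U=0, E=0)",
]

metrics = [parse_metrics(line) for line in LOG_LINES]
for m in metrics:
    print(m["step"], "read/input:", m["I"] + m["R"], "written:", m["W"], "errors:", m["E"])
```

In this run the input step reports I=4138476 and W=4138476 with E=0, and the Dummy step reports R=4138476, so all rows read from HDFS reached the next step.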

Comments:

test said...

Hi,

I am trying to connect PDI 7.1 Community with Cloudera 5.4.
I have followed the steps in https://help.pentaho.com/Documentation/7.0/0H0/Set_Up_Pentaho_to_Connect_to_a_Cloudera_Cluster

I want to read an HDFS file.
I set up the connection to Cloudera under 'Hadoop clusters'.

Now I am using Hadoop File Input. The step can locate the file, but when it tries to read it, it gives an error:

"Couldn't open file #0 :hdfs://cloudera:***@192.168.132.128:8020/user/cloudera/pru.txt --> java.nio.channels.UnresolvedAddressException"

The file and its directory on Cloudera have 777 permissions.

Can you help me?

Thanks
