Pentaho Big Data: Hadoop File Input

The Hadoop File Input step extracts data from files stored on a Hadoop cluster (HDFS). It can read comma-separated, tab-delimited, fixed-width, and other common types of text files.
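To show what those formats mean for a reader, here is a minimal Python sketch (not Pentaho's implementation) that parses one purchase record stored as comma-separated, tab-delimited, and fixed-width text. The column widths in `WIDTHS` are hypothetical; a real fixed-width file defines its own layout.

```python
import csv
import io

# One sample record, taken from the purchases.txt data used later in this post
VALUES = ["2012-01-01", "09:00", "Austin", "Cameras", "379.6", "Visa"]
# Hypothetical fixed-width layout: one character width per column
WIDTHS = [10, 6, 12, 10, 8, 6]


def parse_delimited(text, delimiter):
    """Split delimited text into rows; the csv module also handles quoted fields."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def parse_fixed_width(text, widths):
    """Cut each line into fields of the given widths and strip the padding."""
    rows = []
    for line in text.splitlines():
        fields, pos = [], 0
        for width in widths:
            fields.append(line[pos:pos + width].strip())
            pos += width
        rows.append(fields)
    return rows


csv_text = ",".join(VALUES) + "\n"
tsv_text = "\t".join(VALUES) + "\n"
# Pad every value to its column width to build the fixed-width variant
fixed_text = "".join(v.ljust(w) for v, w in zip(VALUES, WIDTHS)) + "\n"

if __name__ == "__main__":
    for rows in (parse_delimited(csv_text, ","),
                 parse_delimited(tsv_text, "\t"),
                 parse_fixed_width(fixed_text, WIDTHS)):
        print(rows[0])
```

All three variants yield the same six values; only the rule for finding field boundaries changes, which is what the file type and format settings of the step describe.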

The steps below show how to configure the Pentaho Hadoop File Input step.

The Cloudera QuickStart VM is used for this demo. For more information, see:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html

Step 1

Open Spoon and create a new transformation.
Expand the Big Data section of the Design palette and drag the Hadoop File Input step onto the canvas.


Step 2

Configure the Input file name and location.

The file "hdfs://${USER}:${PASSWORD}@localhost:8020/user/training/input/purchases.txt" is used for this demo; ${USER} and ${PASSWORD} are parameters whose values are supplied in Step 6. Ensure that this file is available on HDFS:

[training@localhost conf]$ hadoop fs -ls /user/training/input
Found 1 items
-rw-r--r--   1 training supergroup  211312924 2014-01-26 13:33 /user/training/input/purchases.txt
[training@localhost conf]$


[training@localhost ~]$ hadoop fs -cat /user/training/input/purchases.txt | head -10
2012-01-01 09:00 San Jose Men's Clothing 214.05 Amex
2012-01-01 09:00 Fort Worth Women's Clothing 153.57 Visa
2012-01-01 09:00 San Diego Music 66.08 Cash
2012-01-01 09:00 Pittsburgh Pet Supplies 493.51 Discover
2012-01-01 09:00 Omaha Children's Clothing 235.63 MasterCard
2012-01-01 09:00 Stockton Men's Clothing 247.18 MasterCard
2012-01-01 09:00 Austin Cameras 379.6 Visa
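The path in Step 2 carries the credentials as ${USER} and ${PASSWORD} placeholders, and the execution log later prints the resolved URL with the password masked. The Python sketch below mimics both behaviours with `string.Template` and `urllib.parse`; it illustrates the idea only and is not how PDI resolves variables internally. The sample password `secret` is made up.

```python
from string import Template
from urllib.parse import urlsplit

PATH_TEMPLATE = "hdfs://${USER}:${PASSWORD}@localhost:8020/user/training/input/purchases.txt"


def resolve_path(template, params):
    """Replace ${NAME} placeholders; raises KeyError if a parameter is missing."""
    return Template(template).substitute(params)


def mask_password(url):
    """Return the URL with the password replaced by *** (as in the Spoon log)."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    netloc = f"{parts.username}:***@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return parts._replace(netloc=netloc).geturl()


if __name__ == "__main__":
    url = resolve_path(PATH_TEMPLATE, {"USER": "cloudera", "PASSWORD": "secret"})
    print(mask_password(url))
```

A password containing characters such as `@`, `:` or `/` would need to be URL-encoded before substitution; otherwise the resulting URL no longer splits into user, password and host as intended.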


Step 3

Configure the input file type and format.
A tab-separated file is used for this demo.
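The `hadoop fs -cat` output in Step 2 looks space-separated, but values such as "San Jose" and "Men's Clothing" contain spaces themselves, which is why the tab character has to be the delimiter. A small Python check, assuming the six columns are separated by single tabs as configured here:

```python
import csv

# Two sample lines from purchases.txt, with the tabs written out explicitly
SAMPLE = [
    "2012-01-01\t09:00\tSan Jose\tMen's Clothing\t214.05\tAmex",
    "2012-01-01\t09:00\tFort Worth\tWomen's Clothing\t153.57\tVisa",
]

# Splitting on any whitespace breaks multi-word values apart
whitespace_split = [line.split() for line in SAMPLE]

# Splitting on tabs keeps each value intact; QUOTE_NONE treats quote characters as data
tab_split = list(csv.reader(SAMPLE, delimiter="\t", quoting=csv.QUOTE_NONE))

if __name__ == "__main__":
    print([len(row) for row in whitespace_split])  # [8, 8]
    print([len(row) for row in tab_split])         # [6, 6]
```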


Step 4

Configure the field names.
You can use the "Get Fields" button to populate them from the file.
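Each field defined here pairs a column name with a type. As a rough Python analogue, the sketch below maps one tab-separated line to named, typed values; the names `date`, `time`, `store`, `item`, `cost` and `payment` are hypothetical, so substitute whatever names you configure in the step.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical field definitions: (name, converter) in column order
FIELDS = [
    ("date", date.fromisoformat),
    ("time", lambda s: datetime.strptime(s, "%H:%M").time()),
    ("store", str),
    ("item", str),
    ("cost", Decimal),  # Decimal avoids binary floating-point rounding for money
    ("payment", str),
]


def to_record(line):
    """Split a tab-separated line and convert each value with its field's converter."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return {name: convert(value) for (name, convert), value in zip(FIELDS, values)}


if __name__ == "__main__":
    print(to_record("2012-01-01\t09:00\tAustin\tCameras\t379.6\tVisa\n"))
```

Rejecting lines with the wrong number of columns up front makes a misconfigured delimiter show up immediately instead of as shifted values.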


Step 5

Use the "Preview rows" option to examine the source data.
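Previewing reads only the first rows of the file rather than the whole file of about 211 MB. The same idea in Python is a lazy read capped with `itertools.islice`; this is an analogy, and the `limit` default below is arbitrary rather than Spoon's preview size.

```python
from itertools import count, islice


def preview_rows(lines, limit=5):
    """Parse at most `limit` tab-separated lines, leaving the rest of `lines` unread."""
    return [line.rstrip("\n").split("\t") for line in islice(lines, limit)]


if __name__ == "__main__":
    # An endless generator stands in for a very large file: only 3 lines are ever read
    endless = (f"row{i}\tvalue" for i in count())
    for row in preview_rows(endless, limit=3):
        print(row)
```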

Step 6

Pass the user name and password as values for the USER and PASSWORD parameters used in the file path.
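In Spoon these values are entered when launching the transformation. Outside Spoon, the same transformation could be run with PDI's command-line tool Pan, which accepts named parameters as `-param:NAME=VALUE`. The Python sketch below only assembles such a command; the script and `.ktr` paths are hypothetical, and the option syntax should be checked against the Pan documentation for your PDI version.

```python
import shlex

# Hypothetical locations; adjust to your PDI install and saved transformation
PAN = "./pan.sh"
KTR = "/home/training/tr_testar1.ktr"


def pan_command(ktr_path, params):
    """Build a Pan argument list that passes each named parameter as -param:NAME=VALUE."""
    args = [PAN, f"-file={ktr_path}"]
    args += [f"-param:{name}={value}" for name, value in sorted(params.items())]
    return args


if __name__ == "__main__":
    cmd = pan_command(KTR, {"USER": "cloudera", "PASSWORD": "secret"})
    print(shlex.join(cmd))  # the list itself could be passed to subprocess.run(cmd)
```

Passing a password on the command line makes it visible in process listings and shell history, which is worth keeping in mind on shared machines.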

Step 7

Run the transformation and review the execution log and results.




2014/01/29 15:22:53 - Spoon - Transformation opened.
2014/01/29 15:22:53 - Spoon - Launching transformation [tr_testar1]...
2014/01/29 15:22:53 - Spoon - Started the transformation execution.
2014/01/29 15:22:53 - tr_testar1 - Dispatching started for transformation [tr_testar1]
2014/01/29 15:22:54 - Hadoop File Input.0 - Opening file: hdfs://cloudera:***@localhost:8020/user/training/input/purchases.txt
2014/01/29 15:23:35 - Hadoop File Input.0 - Finished processing (I=4138476, O=0, R=0, W=4138476, U=1, E=0)
2014/01/29 15:23:35 - Dummy (do nothing).0 - Finished processing (I=0, O=0, R=4138476, W=4138476, U=0, E=0)
2014/01/29 15:23:35 - Spoon - The transformation has finished!!
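Each "Finished processing" line reports the step counters I (lines input), O (lines output), R (lines read from the previous step), W (lines written to the next step), U (lines updated) and E (errors). The Python parser below is written only for the log format shown above and is not a PDI API; it checks that every row the Hadoop File Input step wrote was read by the Dummy step, and estimates throughput from the timestamps.

```python
import re
from datetime import datetime

FINISHED = re.compile(r"^(\S+ \S+) - (.+?) - Finished processing \((.*)\)$")
TS_FORMAT = "%Y/%m/%d %H:%M:%S"


def timestamp(line):
    """Parse the leading 'YYYY/MM/DD HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:19], TS_FORMAT)


def parse_finished(line):
    """Return (timestamp, step name, counters dict), or None for other log lines."""
    m = FINISHED.match(line)
    if m is None:
        return None
    counters = {}
    for pair in m.group(3).split(", "):
        key, value = pair.split("=")
        counters[key] = int(value)
    return datetime.strptime(m.group(1), TS_FORMAT), m.group(2), counters


# Lines copied from the execution log above
OPENED = "2014/01/29 15:22:54 - Hadoop File Input.0 - Opening file: hdfs://cloudera:***@localhost:8020/user/training/input/purchases.txt"
INPUT_DONE = "2014/01/29 15:23:35 - Hadoop File Input.0 - Finished processing (I=4138476, O=0, R=0, W=4138476, U=1, E=0)"
DUMMY_DONE = "2014/01/29 15:23:35 - Dummy (do nothing).0 - Finished processing (I=0, O=0, R=4138476, W=4138476, U=0, E=0)"

if __name__ == "__main__":
    finished_at, step, counters = parse_finished(INPUT_DONE)
    _, _, dummy = parse_finished(DUMMY_DONE)
    seconds = (finished_at - timestamp(OPENED)).total_seconds()
    print(step, "wrote", counters["W"], "rows; Dummy read", dummy["R"])
    print(f"about {counters['W'] / seconds:,.0f} rows/s over {seconds:.0f} s")
```

Because the log timestamps have one-second resolution, the rate (roughly 100,000 rows per second for this run) is only an estimate.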

Comments :

Datademy Formacion said...

Hi,

I am trying to connect PDI 7.1 Community Edition to Cloudera 5.4.
I have followed the steps in https://help.pentaho.com/Documentation/7.0/0H0/Set_Up_Pentaho_to_Connect_to_a_Cloudera_Cluster

I want to read an HDFS file.
I created the connection to Cloudera under 'Hadoop clusters'.

Now I am using Hadoop File Input. The step can locate the file, but when it tries to read it, it gives an error:

"Couldn't open file #0 :hdfs://cloudera:***@192.168.132.128:8020/user/cloudera/pru.txt --> java.nio.channels.UnresolvedAddressException"

The file and the directory on Cloudera have 777 permissions.

Can you help me?

Thanks
