Thursday, January 30, 2014

Pentaho Big Data : Hadoop File Input

The Hadoop File Input step extracts data from a Hadoop cluster. It can read comma-separated, tab-delimited, fixed-width, and other common types of text files.

The steps below show how to configure the Pentaho Hadoop File Input step.

The Cloudera QuickStart VM is used for this demo. Refer to the link below for more information.
http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html

Step 1

Open Spoon and create a new transformation.
Expand the Big Data section of the design palette and drag Hadoop File Input onto the canvas.


Step 2

Configure the Input file name and location.

The file "hdfs://${USER}:${PASSWORD}@localhost:8020/user/training/input/purchases.txt" is used for this demo. Ensure that this file is available on HDFS.

[training@localhost conf]$ hadoop fs -ls /user/training/input
Found 1 items
-rw-r--r--   1 training supergroup  211312924 2014-01-26 13:33 /user/training/input/purchases.txt
[training@localhost conf]$


[training@localhost ~]$ hadoop fs -cat /user/training/input/purchases.txt | head -10
2012-01-01 09:00 San Jose Men's Clothing 214.05 Amex
2012-01-01 09:00 Fort Worth Women's Clothing 153.57 Visa
2012-01-01 09:00 San Diego Music 66.08 Cash
2012-01-01 09:00 Pittsburgh Pet Supplies 493.51 Discover
2012-01-01 09:00 Omaha Children's Clothing 235.63 MasterCard
2012-01-01 09:00 Stockton Men's Clothing 247.18 MasterCard
2012-01-01 09:00 Austin Cameras 379.6 Visa
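
The file location used in this step packs several settings into one string. The sketch below is only an illustration in Python (it is not part of Pentaho) and uses the standard urllib.parse module to split such a URL into its parts. The credential values "cloudera" and "secret" are placeholders standing in for ${USER} and ${PASSWORD}, and the function name describe_hdfs_url is hypothetical.

```python
from urllib.parse import urlsplit


def describe_hdfs_url(url):
    """Split an hdfs:// URL into the pieces the step dialog asks for."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # "hdfs"
        "user": parts.username,      # value of ${USER}
        "password": parts.password,  # value of ${PASSWORD}
        "host": parts.hostname,      # NameNode host
        "port": parts.port,          # NameNode port (8020 in this demo)
        "path": parts.path,          # file path inside HDFS
    }


# Placeholder credentials; substitute your own values.
demo_url = "hdfs://cloudera:secret@localhost:8020/user/training/input/purchases.txt"
for name, value in describe_hdfs_url(demo_url).items():
    print(f"{name}: {value}")
```

The path component is the same path passed to "hadoop fs -ls" above, so the two can be compared directly when a file is not found.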


Step 3

Configure the Input file type and format.
A tab-separated file is used for this demo.


Step 4

Configure field names.
Use the "Get Fields" button if needed.
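
To make the file format and field configuration concrete, here is a minimal Python sketch that reads tab-separated lines shaped like the purchases.txt sample and attaches field names to them. The field names (date, time, store, item, cost, payment) are assumptions made for illustration; the post does not list the names used in the transformation.

```python
import csv
import io

# Hypothetical field names; the original post does not list them.
FIELD_NAMES = ["date", "time", "store", "item", "cost", "payment"]


def parse_purchases(text):
    """Parse tab-separated purchase lines into a list of dicts."""
    # QUOTE_NONE keeps apostrophes such as "Men's" as plain characters.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for values in reader:
        if not values:  # skip blank lines
            continue
        row = dict(zip(FIELD_NAMES, values))
        row["cost"] = float(row["cost"])  # treat the amount as a number
        rows.append(row)
    return rows


# Two lines from the sample, with the tab delimiters written explicitly.
sample = (
    "2012-01-01\t09:00\tSan Jose\tMen's Clothing\t214.05\tAmex\n"
    "2012-01-01\t09:00\tAustin\tCameras\t379.6\tVisa\n"
)
for row in parse_purchases(sample):
    print(row["store"], row["item"], row["cost"])
```

Multi-word values such as "San Jose" stay in one field because only the tab character separates fields.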


Step 5

Use "Preview rows" option to examine source data.

Step 6

Pass the user name and password as parameters. They supply the ${USER} and ${PASSWORD} variables used in the file location from Step 2.
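
The effect of the parameters can be pictured as plain text substitution into the file location. The Python sketch below imitates that with string.Template, which happens to use the same ${NAME} syntax. It is an analogy under stated assumptions, not Pentaho's own substitution code; the function names and the "secret" password are hypothetical.

```python
from string import Template

# The file location from Step 2, with its two variables.
URL_TEMPLATE = Template(
    "hdfs://${USER}:${PASSWORD}@localhost:8020/user/training/input/purchases.txt"
)


def resolve_url(params):
    """Fill in the variables; raises KeyError if a parameter is missing."""
    return URL_TEMPLATE.substitute(params)


def mask_password(url, password):
    """Hide the password, similar to how the execution log shows ***."""
    return url.replace(f":{password}@", ":***@")


# Placeholder values for illustration only.
params = {"USER": "cloudera", "PASSWORD": "secret"}
url = resolve_url(params)
print(mask_password(url, params["PASSWORD"]))
```

Using substitute rather than safe_substitute makes a forgotten parameter fail loudly instead of leaving a literal ${PASSWORD} in the URL.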

Step 7

Execution log and results.

2014/01/29 15:22:53 - Spoon - Transformation opened.
2014/01/29 15:22:53 - Spoon - Launching transformation [tr_testar1]...
2014/01/29 15:22:53 - Spoon - Started the transformation execution.
2014/01/29 15:22:53 - tr_testar1 - Dispatching started for transformation [tr_testar1]
2014/01/29 15:22:54 - Hadoop File Input.0 - Opening file: hdfs://cloudera:***@localhost:8020/user/training/input/purchases.txt
2014/01/29 15:23:35 - Hadoop File Input.0 - Finished processing (I=4138476, O=0, R=0, W=4138476, U=1, E=0)
2014/01/29 15:23:35 - Dummy (do nothing).0 - Finished processing (I=0, O=0, R=4138476, W=4138476, U=0, E=0)
2014/01/29 15:23:35 - Spoon - The transformation has finished!!
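
The counters in the "Finished processing" lines can be read programmatically. The Python sketch below parses them with a regular expression and estimates throughput from the two Hadoop File Input timestamps in the log above. The counter meanings in the comments follow the usual PDI step-metrics convention and should be treated as an assumption here; the function name is hypothetical.

```python
import re
from datetime import datetime

# I = rows read from an external input, O = rows written to an external output,
# R = rows read from the previous step, W = rows written to the next step,
# U = rows updated, E = errors (assumed PDI convention).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) - (?P<step>.+?) - Finished processing "
    r"\(I=(?P<I>\d+), O=(?P<O>\d+), R=(?P<R>\d+), W=(?P<W>\d+), U=(?P<U>\d+), E=(?P<E>\d+)\)$"
)
TS_FORMAT = "%Y/%m/%d %H:%M:%S"


def parse_finished_line(line):
    """Return the step name, timestamp and counters, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    result = {key: int(match.group(key)) for key in "IORWUE"}
    result["step"] = match.group("step")
    result["timestamp"] = datetime.strptime(match.group("ts"), TS_FORMAT)
    return result


finished = parse_finished_line(
    "2014/01/29 15:23:35 - Hadoop File Input.0 - Finished processing "
    "(I=4138476, O=0, R=0, W=4138476, U=1, E=0)"
)
# Time at which the log reports "Opening file" for the same step.
opened = datetime.strptime("2014/01/29 15:22:54", TS_FORMAT)
elapsed = (finished["timestamp"] - opened).total_seconds()
print(f"{finished['I']} rows in {elapsed:.0f} s, about {finished['I'] / elapsed:.0f} rows/s")
```

In this run the input step's I and W counters match the Dummy step's R counter, and E=0 on both lines indicates that no rows were rejected.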

30 comments:

  3. Hi,

    I am trying to connect PDI 7.1 Community with Cloudera 5.4.
    I have followed the steps in https://help.pentaho.com/Documentation/7.0/0H0/Set_Up_Pentaho_to_Connect_to_a_Cloudera_Cluster

    I want to read an HDFS file.
    I set up the connection to Cloudera in 'Hadoop clusters'.

    Now I am using Hadoop File Input. This step can locate the file, but when it tries to read it, it gives an error:

    "Couldn't open file #0 :hdfs://cloudera:***@192.168.132.128:8020/user/cloudera/pru.txt --> java.nio.channels.UnresolvedAddressException"

    The file and the directory on Cloudera have 777 permissions.

    Can you help me?

    Thanks