Pentaho Data Integration : Aggregation using Group By step

This step can be used to perform various types of aggregations such as sum, average, min, max e.t.c. Input data always need to be sorted for this step to work properly.

This step support following aggregation methods.
  1. Sum
  2. Average or Mean
  3. Median
  4. Minimum
  5. Maximum
  6. Standard deviation
  7. Percentile
  8. Number of values
  9. Number of distinct values 
  10. Number of rows
  11. First non-null value
  12. Last non-null value
  13. First value null inclusive
  14. Last value null inclusive
  15. Cumulative sum for all rows
  16. Cumulative average for all rows
  17. Concatenate strings separated by comma
  18. Concatenate strings separated by a value.

Here are the commonly used options for this step.

  • Group fields - The fields for Group by. 
  • Aggregates fields - Specify aggregate field names and aggregate methods.  
  • Include all rows - Output will contain all rows instead of just the aggregate rows 
  • Add a line number, Restart in each group -  Option to be used along with "Include all rows" 
  • Always give back a result row - To make sure there is an output row in case of no input rows ( for count of 0 ). 

Step wise illustration on how to use "Group By" step given below.


Step 1 

Create a demo transformation.

This demo transformation will read from a CSV input file which contain product information. "Group By" step will be used to aggregate product price based on product category. The input records need to be sorted to use Group BY. A sort step is used for this.

Here are the configurations for the CSV input and Sort step.

Step 2

Drag the "Group By" step into the canvas.















Step 3

Set the group by field and aggregate type.

Category is used as the group by field here for demo purposes. Sum, maximum, minimum, first value and last value selected as aggregate types.


















Step 4

Drag the "Dummy" step into the canvas to complete the flow.



Execution Log 


trf_Demo_Aggregates - Transformation is pre-loaded.
Spoon - Transformation opened.
Spoon - Launching transformation [trf_Demo_Aggregates]...
Spoon - Started the transformation execution.
trf_Demo_Aggregates - Dispatching started for transformation [trf_Demo_Aggregates]
trf_Demo_Aggregates - Allocating Steps & StepData...
trf_Demo_Aggregates -  Transformation is about to allocate step [CSV file input] of type [CsvInput]
CSV file input.0 - distribution activated
trf_Demo_Aggregates -  Transformation is about to allocate step [Sort rows] of type [SortRows]
Sort rows.0 - distribution activated
trf_Demo_Aggregates -  Transformation is about to allocate step [Group by] of type [GroupBy]
Group by.0 - distribution activated
Group by.0 - Starting allocation of buffers & new threads...
Sort rows.0 - The number of binary string to data type conversions done in this sort block is 0
Sort rows.0 - Finished processing (I=0, O=0, R=10000, W=10000, U=0, E=0)
Group by.0 - Finished processing (I=0, O=0, R=10000, W=26, U=0, E=0)
Dummy (do nothing).0 - Finished processing (I=0, O=0, R=26, W=26, U=0, E=0)
trf_Demo_Aggregates - Handling extension point for plugin with id 'DataRefineryTransFinishListener' and extension point id 'TransformationFinish'
Spoon - The transformation has finished!!




Include All rows - Disabled

Results preview where "Include all rows" option is disabled. Only aggregated rows will be there in the output, in this case one row per category since group by was done based on category.


 

Include All rows - Enabled

Results preview where "Include all rows" option is enabled. All input rows will be there in the output, aggregated values such as sum_price, max_price e.t.c will be repeated for the same category in the output.



 

Include All rows - Enabled with Line number

Results preview where "Include all rows" option is enabled and "Add line number" is enabled. All rows will be there in the output with a line number for each row.



 

Always give back a result row

Add a filter step to the dummy transformation to filter out all input records.
There will be no results rows if the "Always give back a result row" is disabled. Need to enable this to get output result row with Counts as 0 and other aggregate values as Null.










 

26 comments :

Jef Gielen said...

How does Percentile aggregation work?

Dhiya L said...

Wow amazing i saw the article with execution models you had posted. It was such informative. Really its a wonderful article. Thank you for sharing and please keep update like this type of article because i want to learn more relevant to this topic.

Hadoop Training in Chennai

Abiya Carol said...

Great post! I am actually getting ready to across this information, It's very helpful for this blog.Also great with all of the valuable information you have Keep up the good work you are doing well.

Informatica training in chennai

Sam Maron said...

Great article. It helps to explain complicated things to people who are not competent and explicate why this is important.
dataroom software

Arjun Rishi said...

Good post. I like your post, Thanks for sharing.
web design institute in chennai

Sathya G said...

this is really too useful and have more ideas from yours. keep sharing many techniques. eagerly waiting for your new blog and useful information. keep doing more.
Digital Marketing Course in Chennai

Unknown said...

Dude, there are too many ads. I can't even search. It anonying...

venkatweetz said...

Very Nice Blog I like the way you explained these things. I’ve been looking for ways to improve my website and overall rankings.I hope your future article will help me further.Take SEO Training in Chennai to mould yourself.

mary Brown said...

Thank You for sharing your article, This is an interesting & informative blog. It is very useful for the developer like me. Kindly keep blogging. Besides that Wisen has established as Best Corporate Training in Chennai .

Nowadays JavaScript has tons of job opportunities on various vertical industry. Know more about JavaScript Framework Training visit Corporate Training Companies in India.

This post gives me detailed information about the technology. corporate training companies in india

srihariparu said...


Its interesting and informative. Thanks for your great information
Embedded Project Center in Chennai | Best Embedded Project Center in Velachery

Mary Adlena said...

Quite a useful post, I learned some new points here. Thanks admin please keep posting updates regularly to enlighten our knowledge.
ME/M.Tech Project Center in Chennai | ME/M.Tech Project Center in Velachery

Ancy merina said...

Thank you for your post. This is excellent information. It is amazing and wonderful to visit your site
Website Design and Development Company in Bangalore
ECommerce Web Design Company in bangalore

David Jones said...

Hi, Great.. Tutorial is just awesome..It is really helpful for a newbie like me.. I am a regular follower of your blog.
Really very informative post you shared here. Kindly keep blogging. If anyone wants to become a Front end developer learn from Javascript Online Training from India .
or learn thru JavaScript Online Training from India. Nowadays JavaScript has tons of job opportunities on various vertical industry. ES6 Training in Chennai

john jersy said...

I simply wanted to write down a quick word to say thanks to you for those wonderful tips and hints you are showing on this site.

python training in chennai
python course in chennai
python training in bangalore

sudha P said...

I would assume that we use more than the eyes to gauge a person's feelings. Mouth. Body language. Even voice. You could at least have given us a face in this test.
Selenium training in Chennai | Selenium training institute in Chennai | Selenium course in Chennai

Selenium training in Bangalore | Selenium training institute in Bangalore | Selenium course in Bangalore
Java training in Chennai | Java training institute in Chennai | Java course in Chennai

Java training in Bangalore | Java training institute in Bangalore | Java course in Bangalore

john brito said...

Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.
rpa training in Chennai | rpa training in bangalore | best rpa training in bangalore | rpa course in bangalore | rpa training institute in bangalore | rpa training in bangalore | rpa online training

gowsalya said...

This is a nice article here with some useful tips for those who are not used-to comment that frequently. Thanks for this helpful information I agree with all points you have given to us. I will follow all of them.
Best Devops training in sholinganallur
Devops training in velachery
Devops training in annanagar
Devops training in tambaram

dhanya sridhar said...

Thank you for allowing me to read it, welcome to the next in a recent article. And thanks for sharing the nice article, keep posting or updating news article.
Data Science training in rajaji nagar | Data Science Training in Bangalore
Data Science with Python training in chennai
Data Science training in electronic city
Data Science training in USA
Data science training in pune

SANDY said...

Whoa! I’m enjoying the template/theme of this website. It’s simple, yet effective. A lot of times it’s very hard to get that “perfect balance” between superb usability and visual appeal. I must say you’ve done a very good job with this.

Best AWS Training in Chennai | Amazon Web Services Training in Chennai
Learn Amazon Web Services Tutorial |Best AWS Tutorials For Beginners
Best AWS Interview Questions And Answers
Best AWS Training in Chennai | No.1 AWS Training Institute in Chennai Velachery, Tambaram, OMR

cathryn leno said...

hank you for benefiting from time to focus on this kind of, I feel firmly about it and also really like comprehending far more with this particular subject matter. In case doable, when you get know-how, is it possible to thoughts modernizing your site together with far more details? It’s extremely useful to me 
angularjs Training in chennai

angularjs Training in chennai

angularjs-Training in tambaram

angularjs-Training in sholinganallur

angularjs-Training in velachery

angularjs-Training in pune

priya said...

Thanks for posting this info. I just want to let you know that I just check out your site and I find it very interesting and informative. I can't wait to read lots of your posts.
Data science course in bangalore | Data Science training with placement in Bangalore

siva said...

It's interesting that many of the bloggers to helped clarify a few things for me as well as giving.
Most of ideas can be nice content.The people to give them a good shake to get your point and across the command.

Java training in Bangalore|best Java training in Bangalore



WUGI said...

Do not think! Just come with us to BGAOC and play with us. best casino games come to us and win soon.

Saro said...

All are saying the same thing repeatedly, but in your blog I had a chance to get some useful and unique information, I love your writing style very much, I would like to suggest your blog in my dude circle, so keep on updates.
microsoft azure training in bangalore
rpa training in bangalore
best rpa training in bangalore
rpa online training

Belago said...

Супер отличная гибкая світлодіодна стрічка на любой вкус и цвет, обычно покупаю в интернет магазине.

Aditi Ray said...

Home Mart is a site about Home Improvement, Furniture, Home Appliances and many more.
Check out the best
bedroom furniture nz
entertainment unit
shoe rack

Post a Comment

UA-46724997-1