I recently had a chat with a friend who works for an SMS aggregator company and he told me about the challenges he has with analyzing some SMS data they log and that it takes him weeks to put together with Excel. I showed him how to do this in minutes with R and start taking coffee at work :).
The Problem: He frequently wants to to count(SUM) and aggregate things in data frames. For example, he wants to know how much revenue the different musicians made from the ringtones and under what categories etc
A snap shot of the raw data frame is below. You might want to download the zipped file here and load into your R session with below.
The Problem: He frequently wants to to count(SUM) and aggregate things in data frames. For example, he wants to know how much revenue the different musicians made from the ringtones and under what categories etc
A snap shot of the raw data frame is below. You might want to download the zipped file here and load into your R session with below.
> sms <- read.csv(“sms.csv”, sep=”,”, header=T )
> dim(sms)
> dim(sms)
[1] 624 10
You must have 10 columns and 624 rows to make sure you loaded the right data and well :).
customer_name report_date content_name cms_provider artist mrp
1 AIRTELUG 11/22/2012 AYAKA AA_SMSDONE NANJEGO 30
2 AIRTELUG 11/26/2012 CHOICE YANGE AA_SMSDONE GOEFREY LUTAYA 30
3 AIRTELUG 11/5/2012 OJANGA NOSABA AA_SMSDONE BUGEMBE AND BOBI WINE 30
4 AIRTELUG 11/14/2012 ENSI EKUBA AA_SMSDONE CHRIS EVANS 30
5 AIRTELUG 11/15/2012 MR KATAALA AA_SMSDONE BOBI WINE 30
6 AIRTELUG 11/3/2012 ABAKUBI BA PULAANI AA_SMSDONE GEOFREY LUTAAYA 30
cms_clip_promo_id trans_desc revenue count
1 2579072 Song Download 120 4
2 2579000 Song Renewal 10140 338
3 2579196 Song Renewal 960 32
4 2517666 Song Renewal 120 4
5 2517527 Song Renewal 330 11
6 2517665 Song Renewal 360 12
1 AIRTELUG 11/22/2012 AYAKA AA_SMSDONE NANJEGO 30
2 AIRTELUG 11/26/2012 CHOICE YANGE AA_SMSDONE GOEFREY LUTAYA 30
3 AIRTELUG 11/5/2012 OJANGA NOSABA AA_SMSDONE BUGEMBE AND BOBI WINE 30
4 AIRTELUG 11/14/2012 ENSI EKUBA AA_SMSDONE CHRIS EVANS 30
5 AIRTELUG 11/15/2012 MR KATAALA AA_SMSDONE BOBI WINE 30
6 AIRTELUG 11/3/2012 ABAKUBI BA PULAANI AA_SMSDONE GEOFREY LUTAAYA 30
cms_clip_promo_id trans_desc revenue count
1 2579072 Song Download 120 4
2 2579000 Song Renewal 10140 338
3 2579196 Song Renewal 960 32
4 2517666 Song Renewal 120 4
5 2517527 Song Renewal 330 11
6 2517665 Song Renewal 360 12
How many times does each artist name occur in our data? There is a guy called Hadley Wickham who came up with an excellent plyr package and that’s what we shall use through out this blog post.
> library(plyr)
> ?count
> count(sms, "artist")
> ?count
> count(sms, "artist")
artist freq
1 2 STARS├â┬§├é┬┐├é┬╜ 3
2 AK47 4
3 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 2
4 ALICIOS FT JULIANA KANYOMOZI 1
5 AMBASSADORS OF CHRIST 2
6 ANGELLA KALULE 1
1 2 STARS├â┬§├é┬┐├é┬╜ 3
2 AK47 4
3 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 2
4 ALICIOS FT JULIANA KANYOMOZI 1
5 AMBASSADORS OF CHRIST 2
6 ANGELLA KALULE 1
How many times did each artist appear on specific promo clip? In case you want to tally things up by more than one column use the c function to combine things into a vector:
> count(sms, c("artist", "cms_clip_promo_id"))
artist cms_clip_promo_id freq
1 2 STARS├â┬§├é┬┐├é┬╜ 2517483 2
2 2 STARS├â┬§├é┬┐├é┬╜ 2517589 1
3 AK47 2579164 2
4 AK47 2579166 2
5 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 2517454 1
6 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 2517590 1
7 ALICIOS FT JULIANA KANYOMOZI 2599168 1
8 AMBASSADORS OF CHRIST 2517492 2
9 ANGELLA KALULE 2517507 1
10 BAINE VIOLLA 2585894 1
artist cms_clip_promo_id freq
1 2 STARS├â┬§├é┬┐├é┬╜ 2517483 2
2 2 STARS├â┬§├é┬┐├é┬╜ 2517589 1
3 AK47 2579164 2
4 AK47 2579166 2
5 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 2517454 1
6 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 2517590 1
7 ALICIOS FT JULIANA KANYOMOZI 2599168 1
8 AMBASSADORS OF CHRIST 2517492 2
9 ANGELLA KALULE 2517507 1
10 BAINE VIOLLA 2585894 1
R just makes life easy. All i did was just tell count which data frame i was using, then which columns i want to tally by, and it does the counting very quickly and efficiently even on millions of rows.
How much did each artist song gain in revenue total? How much revenue did I it gain in relation to promo clip ? Now i came across the awesome part of aggregate that will do the job for this kind of figuring.
> aggregate(revenue ~ artist + cms_clip_promo_id, data = sms, sum)
artist cms_clip_promo_id revenue
1 BOBI WINE PREV PHILLY LUTAAYA 2517453 90
2 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 2517454 990
3 JULIUS MUHOOZI 2517455 750
4 FRED SEBATTA├â┬§├é┬┐├é┬╜ 2517456 1020
5 RONALD MAYINJA 2517457 1560
6 BOBI WINE 2517458 150
7 LADY DIANA 2517460 30
8 SOPHIE NANTONGO 2517462 30
9 WILSON BUGEMBE 2517464 2610
artist cms_clip_promo_id revenue
1 BOBI WINE PREV PHILLY LUTAAYA 2517453 90
2 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 2517454 990
3 JULIUS MUHOOZI 2517455 750
4 FRED SEBATTA├â┬§├é┬┐├é┬╜ 2517456 1020
5 RONALD MAYINJA 2517457 1560
6 BOBI WINE 2517458 150
7 LADY DIANA 2517460 30
8 SOPHIE NANTONGO 2517462 30
9 WILSON BUGEMBE 2517464 2610
To a lay man this command can be interpreted as "I want to apply the sum function to the revenue column while aggregating rows based on unique values in the artist and cmc_clip_promo_id columns." #damn this is so easy and cool right?
How much did each artist gain in revenue total? Forget about aggregating by clip_promo_id, and just aggregate by artist name:
> aggregate(revenue ~ artist, data = sms, sum)
artist revenue
1 2 STARS├â┬§├é┬┐├é┬╜ 720
2 AK47 16320
3 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 1080
4 ALICIOS FT JULIANA KANYOMOZI 810
5 AMBASSADORS OF CHRIST 1170
6 ANGELLA KALULE 30
7 BAINE VIOLLA 30
8 BEBE COOL FT ALPHA 120
9 BOBI WINE 3360
10 BOBI WINE PREV PHILLY LUTAAYA 90
1 2 STARS├â┬§├é┬┐├é┬╜ 720
2 AK47 16320
3 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 1080
4 ALICIOS FT JULIANA KANYOMOZI 810
5 AMBASSADORS OF CHRIST 1170
6 ANGELLA KALULE 30
7 BAINE VIOLLA 30
8 BEBE COOL FT ALPHA 120
9 BOBI WINE 3360
10 BOBI WINE PREV PHILLY LUTAAYA 90
What was the mean revenue that the artists gained ? Change sum to mean in the formula:
> aggregate(revenue ~ artist, data = sms, mean)
artist revenue
1 2 STARS├â┬§├é┬┐├é┬╜ 240.00000
2 AK47 4080.00000
3 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 540.00000
4 ALICIOS FT JULIANA KANYOMOZI 810.00000
5 AMBASSADORS OF CHRIST 585.00000
6 ANGELLA KALULE 30.00000
7 BAINE VIOLLA 30.00000
8 BEBE COOL FT ALPHA 40.00000
9 BOBI WINE 240.00000
10 BOBI WINE PREV PHILLY LUTAAYA 90.00000
11 BODDO S S 30.00000
artist revenue
1 2 STARS├â┬§├é┬┐├é┬╜ 240.00000
2 AK47 4080.00000
3 AKAYO CHOIR├â┬§├é┬┐├é┬╜ 540.00000
4 ALICIOS FT JULIANA KANYOMOZI 810.00000
5 AMBASSADORS OF CHRIST 585.00000
6 ANGELLA KALULE 30.00000
7 BAINE VIOLLA 30.00000
8 BEBE COOL FT ALPHA 40.00000
9 BOBI WINE 240.00000
10 BOBI WINE PREV PHILLY LUTAAYA 90.00000
11 BODDO S S 30.00000
No comments:
Post a Comment
Add any comments if it helped :)