
Getting Started with Twitter Analysis in R (RではじめるTwitter解析)


Slides presented at the R Users Meeting 2011 (Rユーザ会 2011).


  1. Getting Started with Twitter Analysis in R. R Users Meeting 2011 (2011/11/26), @a_bicky
  2. Takeshi Arabiki. Twitter & Hatena: @a_bicky & id:a_bicky. http://d.hatena.ne.jp/a_bicky/
  3. Earlier R talks:
      Osaka.R #4: http://www.slideshare.net/abicky/twitterr
      Tokyo.R #16: http://www.slideshare.net/abicky/r-9034336
      Tsukuba.R #9: http://www.slideshare.net/abicky/r-10128090
  5. Twitter
  6. Mentionmapp
  9. http://twilog.org/, http://twitraq.userlocal.jp/, http://whotwi.com/, http://tweetstats.com/
  11. Outline: Twitter, reshape2, ggplot2
  13. twitteR
      > library(twitteR)  # load twitteR
      > # Set the time locale to C (as of twitteR 0.99.15)
      > Sys.setlocale("LC_TIME", "C")
      [1] "C"
      > # Fetch up to 3,200 of @a_bicky's tweets (RTs excluded)
      > statuses <- userTimeline("a_bicky", n = 3200)
  14. status
      > # status objects are R5 (reference class) objects
      > ls.str(statuses[[1]])
      created : POSIXct[1:1], format: "2011-11-23 22:16:24"    # UTC
      favorited : logi FALSE
      id : chr "139467359571296256"
      initFields : Formal class refMethodDef [package "methods"] with 5 slots
      initialize : Formal class refMethodDef [package "methods"] with 5 slots
      replyToSID : chr(0)
      replyToSN : chr(0)
      replyToUID : chr(0)
      screenName : chr "a_bicky"
      statusSource : chr "<a href=\"http://sites.google.com/site/yorufukurou/\" rel=\"nofollow\">YoruFukurou</a>"    # Twitter client
      text : chr " "
      truncated : logi FALSE
  15. > statusDF <- twListToDF(statuses)
      > str(statusDF, vec.len = 1)
      'data.frame': 3159 obs. of 10 variables:
       $ text        : chr " " ...
       $ favorited   : logi FALSE ...
       $ replyToSN   : logi NA ...
       $ created     : POSIXct, format: "2011-11-23 22:16:24" ...    # UTC
       $ truncated   : logi FALSE ...
       $ replyToSID  : logi NA ...
       $ id          : chr "139467359571296256" ...
       $ replyToUID  : logi NA ...
       $ statusSource: chr "<a href=\"http://sites.google.com/site/yorufukurou/\" rel=\"nofollow\">YoruFukurou</a>" ...
       $ screenName  : chr "a_bicky" ...
  16. > wday.abb <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
      > statusDF <- within(statusDF, {
      +   attr(created, "tzone") <- "Asia/Tokyo"  # convert to JST
      +   statusSource <- factor(gsub("<a .*?>(.*?)</a>", "\\1", statusSource))  # strip the HTML tag
      +   date <- factor(format(created, "%Y-%m-%d"))  # date
      +   hour <- NULL; month <- NULL; year <- NULL; wday <- NULL
      +   with(as.POSIXlt(created), {
      +     hour <<- factor(hour)  # hour
      +     month <<- factor(mon + 1)  # month
      +     year <<- factor(year + 1900)  # year
      +     wday <<- factor((wday + 6) %% 7, labels = wday.abb)  # day of week, Monday first
      +   })
      +   textLength <- nchar(text)  # tweet length in characters
      +   # strip URLs and other special strings (removeSpecialStr is not defined in these slides)
      +   cleanText <- removeSpecialStr(text)
      +   cleanTextLength <- nchar(cleanText)  # length without URLs etc.
      + })
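The two less obvious steps of the preprocessing above, the time zone shift with the Monday-first weekday index and the removal of the anchor tag around the client name, can be checked on a single value. This is a minimal base R sketch, not the author's code: it reuses the timestamp and `statusSource` string shown on slide 14, and the variable names `jst_hour`, `wday_label`, and `client` are made up for illustration.

```r
# Timestamp taken from slide 14; twitteR returns it in UTC
created <- as.POSIXct("2011-11-23 22:16:24", tz = "UTC")

# Changing the "tzone" attribute changes how the instant is displayed,
# not the instant itself
attr(created, "tzone") <- "Asia/Tokyo"
lt <- as.POSIXlt(created)
jst_hour <- lt$hour                      # hour of day in JST

# POSIXlt counts weekdays from Sunday = 0; (wday + 6) %% 7 shifts that
# to Monday = 0, ..., Sunday = 6
wday.abb <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
wday_label <- wday.abb[(lt$wday + 6) %% 7 + 1]

# Keep only the text inside the <a ...>...</a> tag; "\\1" is the first
# capture group
src <- "<a href=\"http://sites.google.com/site/yorufukurou/\" rel=\"nofollow\">YoruFukurou</a>"
client <- gsub("<a .*?>(.*?)</a>", "\\1", src)

print(c(hour = jst_hour, wday = wday_label, client = client))
```

A source string without an anchor tag, such as `"web"`, passes through `gsub` unchanged because the pattern does not match.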
  17. > # Keep the five most used Twitter clients and lump the rest into "other"
      > topSources <- names(head(sort(table(statusDF$statusSource), decreasing = TRUE), 5))
      > statusDF <- within(statusDF, {
      +   statusSource <- as.character(statusSource)
      +   statusSource[!statusSource %in% topSources] <- "other"
      +   # order the factor levels by frequency
      +   statusSource <- factor(statusSource, levels = names(sort(table(statusSource), decreasing = TRUE)))
      + })
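The lumping step can be tried on a small vector. The sketch below is base R on made-up data (the vector `src` and the choice of keeping two clients instead of five are assumptions for illustration); it follows the same pattern of picking the top names from a sorted `table` and then ordering the factor levels by frequency.

```r
# Made-up client names: two frequent clients and two rare ones
src <- c(rep("YoruFukurou", 4), rep("web", 3), "Hatena", "Tween")

# Names of the two most frequent clients
keep <- names(head(sort(table(src), decreasing = TRUE), 2))

# Replace everything else with "other"
lumped <- ifelse(src %in% keep, src, "other")

# Order the levels by frequency so plots and tables list the
# busiest client first
lumped <- factor(lumped, levels = names(sort(table(lumped), decreasing = TRUE)))

print(table(lumped))
```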
  19. reshape2
  20. Excel: months 9 to 11, clients "Twitter for iPhone" and "YoruFukurou", Sat to Mon, hours 12 to 23
  21. reshape2
      > library(reshape2)
      > # .() used in `subset` comes from plyr; attach it with library(plyr) if it is not already loaded
      > acast(melt(statusDF, id.vars = c("statusSource", "wday", "month", "hour"),
      +            measure.vars = c("textLength")),
      +       month + statusSource ~ wday, mean,
      +       subset = .(statusSource %in% c("YoruFukurou", "Twitter for iPhone")
      +                  & month %in% 9:11 & hour %in% 12:23
      +                  & wday %in% c("Mon", "Sat", "Sun")))
                            Mon      Sat      Sun
      9_YoruFukurou          43 42.13333 54.76471
      9_Twitter for iPhone   16 27.70000 20.50000
      10_YoruFukurou         61 41.70175 56.98333
      10_Twitter for iPhone NaN 27.00000 24.50000
      11_YoruFukurou         35 41.08197 57.32609
      11_Twitter for iPhone NaN      NaN 32.00000
  24. reshape2: melt, then cast
      > mstatus <- melt(statusDF,
      +                 id.vars = c("statusSource", "wday", "year", "month", "hour", "date"),
      +                 measure.vars = c("textLength", "cleanTextLength"))
      > mstatus[3157:3162, ]
           statusSource wday year month hour       date        variable value
      3157          web  Sun 2011     3   20 2011-03-13      textLength    72
      3158          web  Sun 2011     3   16 2011-03-13      textLength    24
      3159          web  Sun 2011     3   14 2011-03-13      textLength    82
      3160  YoruFukurou  Wed 2011    11    1 2011-11-23 cleanTextLength    87
      3161  YoruFukurou  Wed 2011    11    1 2011-11-23 cleanTextLength    14
      3162  YoruFukurou  Wed 2011    11    1 2011-11-23 cleanTextLength    21
      The columns statusSource through date are the id variables; variable and value hold the measured variables.
  25. reshape2: cast. The cast functions take a formula and fun.aggregate.
      > args(acast)  # acast returns an array
      function (data, formula, fun.aggregate = NULL, ..., margins = NULL,
          subset = NULL, fill = NULL, drop = TRUE, value_var = guess_value(data))
      NULL
      > args(dcast)  # dcast returns a data.frame
      function (data, formula, fun.aggregate = NULL, ..., margins = NULL,
          subset = NULL, fill = NULL, drop = TRUE, value_var = guess_value(data))
      NULL
      formula: acast accepts more than one ~, as in hoge ~ fuga ~ piyo.
      Note: dcast accepts only one ~, as in hoge ~ fuga + piyo.
  26. > # Number of tweets by day of week
      > # (without the subset, the cleanTextLength rows would be counted as well)
      > acast(mstatus, . ~ wday, length, subset = .(variable == "textLength"))
           Mon Tue Wed Thu Fri Sat Sun
      [1,] 408 360 258 294 334 801 704
      > # Number of tweets by hour and day of week
      > acast(mstatus, hour ~ wday, length, subset = .(variable == "textLength"))
        Mon Tue Wed Thu Fri Sat Sun
      0  65  69  26  46  46  49  40
      1  48  19  11  15  27  44  37
      2  31  24   6  16  17  23  17
      3  27  19   4  11  14  17  10
      4   4  15   1   7   4   5   7
      5   5  11   1   4   3   4   5
      6   4  14   3   6   9   8   1
  31. > # Number of tweets by hour, with day of week and month combined in the columns
      > acast(mstatus, hour ~ wday + month, length, subset = .(variable == "textLength"))
        Mon_3 Mon_4 Mon_5 Mon_6 Mon_7 Mon_8 Mon_9 Mon_10 Mon_11 Tue_3 Tue_4
      0     3     4    13     3     1    10     7     15      9     4     2
      1     0     0     1     0     1     9    16     12      9     1     0
      2     2     0     0     0     2     7     6      7      7     2     0
      > # The same counts as a 3-dimensional array
      > acast(mstatus, hour ~ wday ~ month, length, subset = .(variable == "textLength"))
      , , 3    (month 3)

        Mon Tue Wed Thu Fri Sat Sun
      0   3   4   1   0   1   6   4
      1   0   1   3   0   0   0   1
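For plain counts, the shapes produced by the `acast` calls above (a vector by weekday, an hour by weekday matrix, and an hour by weekday by month array) can also be obtained in base R with `table` and `xtabs`. The sketch below swaps in those base functions on a made-up data frame `toy`; it is a cross-check of what the formulas mean, not a replacement for reshape2, which also handles other aggregation functions.

```r
# Made-up tweets: one row per tweet
toy <- data.frame(
  hour  = c(0, 0, 1, 1, 1, 2),
  wday  = factor(c("Mon", "Tue", "Mon", "Mon", "Tue", "Mon"),
                 levels = c("Mon", "Tue")),
  month = c(3, 3, 3, 4, 4, 4)
)

# Counts by weekday (compare: . ~ wday with length)
by_wday <- table(toy$wday)

# Counts by hour and weekday (compare: hour ~ wday)
by_hour_wday <- xtabs(~ hour + wday, data = toy)

# 3-dimensional counts (compare: hour ~ wday ~ month)
by_hour_wday_month <- xtabs(~ hour + wday + month, data = toy)

print(by_hour_wday)
print(dim(by_hour_wday_month))
```

Slicing the array, for example `by_hour_wday_month[, , "3"]`, gives the hour by weekday table for one month, which is the `, , 3` panel printed by `acast`.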
  36. reshape2
      > # Mean and standard deviation of tweet length for each Twitter client
      > dcast(mstatus, statusSource ~ .,
      +       function(x) list(c(mean = mean(x), sd = sd(x))),
      +       fill = list(c(mean = NaN, sd = NA)),
      +       subset = .(variable == "textLength"))
               statusSource                 NA
      1         YoruFukurou 47.51462, 32.57973
      2                 web 57.02720, 36.33534
      3  Twitter for iPhone 33.42342, 23.06466
      4 Twitter for Android 28.49048, 20.08457
      5              Hatena 80.00000, 25.94212
      6               other 52.58621, 33.12180
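Returning two statistics per group is the part that needs the `list(c(...))` wrapper in `dcast`. As a hedged point of comparison, base R's `aggregate` accepts a function that returns a named vector and stores the result as a matrix column. The data frame `toy` and its values are made up for illustration.

```r
# Made-up tweet lengths for two groups of clients
toy <- data.frame(
  src = c("pc", "pc", "pc", "sp", "sp"),
  len = c(10, 20, 30, 5, 15)
)

# One row per group; the len column is a matrix with columns "mean" and "sd"
stats <- aggregate(len ~ src, data = toy,
                   FUN = function(x) c(mean = mean(x), sd = sd(x)))

print(stats)
```

Individual statistics are then read with matrix indexing, for example `stats$len[, "mean"]`.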
  39. > # Welch's t-test: tweet length on smartphone clients (sp) versus PC clients (pc)
      > pc <- unlist(subset(statusDF,
      +                     statusSource %in% c("YoruFukurou", "web"),
      +                     textLength))
      > sp <- unlist(subset(statusDF,
      +                     grepl("(iPhone|Android)", statusSource),
      +                     textLength))
      > t.test(sp, pc, var.equal = FALSE)

              Welch Two Sample t-test

      data:  sp and pc
      t = -15.7921, df = 1588.246, p-value < 2.2e-16
      alternative hypothesis: true difference in means is not equal to 0
      95 percent confidence interval:
       -19.85334 -15.46645
      sample estimates:
      mean of x mean of y
       31.83945  49.49935

      The difference is significant (p-value < 2.2e-16).
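To make the `t` and `df` lines of that output less opaque, the sketch below recomputes both by hand and compares them with `t.test`. The samples are synthetic draws from `rnorm` (the means and standard deviations are loosely modelled on the numbers in these slides and are otherwise arbitrary), so the printed values will not match the slide.

```r
set.seed(1)
# Synthetic tweet lengths, not the author's data
sp <- rnorm(50, mean = 32, sd = 23)   # "smartphone" sample
pc <- rnorm(80, mean = 50, sd = 34)   # "PC" sample

# Squared standard error of each sample mean
se2_sp <- var(sp) / length(sp)
se2_pc <- var(pc) / length(pc)

# Welch statistic: difference in means over the combined standard error
t_manual <- (mean(sp) - mean(pc)) / sqrt(se2_sp + se2_pc)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual <- (se2_sp + se2_pc)^2 /
  (se2_sp^2 / (length(sp) - 1) + se2_pc^2 / (length(pc) - 1))

res <- t.test(sp, pc, var.equal = FALSE)
print(c(t = t_manual, df = df_manual))
print(res$p.value)
```

Because the variances are not pooled, the degrees of freedom are generally not an integer, which is why the slide reports `df = 1588.246`.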
  41. > extractScreenNames <- function(text, strict = TRUE) {
      +   if (strict) {
      +     # follows Twitter's screen_name rules
      +     regex <- "(?:(?<!\\w)([@＠])((?>\\w+))(?![@＠])|[\\s\\S])"
      +   } else {
      +     # also matches strings such as hoge@example.com
      +     regex <- "(?:([@＠])(\\w+)|[\\s\\S])"
      +   }
      +   screenNames <- gsub(regex, "\\1\\2", text, perl = TRUE)
      +   unique(unlist(strsplit(substring(screenNames, 2), "[@＠]")))
      + }
      > screenNames <- unlist(lapply(statusDF$text, extractScreenNames))
      > head(sort(table(screenNames), decreasing = TRUE), 10)  # Top 10
      screenNames
             naopr      __gfx__ hirota_inoue     mandy_44    ask_a_lie
               105           85           51           47           40
         ken_nishi       nokuno      yokkuns   JinJin0613  kanon19_rie
                39           39           33           20           20
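The effect of the strict mode, which skips an `@` that follows a word character, is easiest to see on one string. The sketch below is a simplified variant, not the author's function: it uses `gregexpr` with `regmatches` instead of the `gsub` approach, handles only the ASCII `@`, and omits the trailing lookahead. The helper name `mentions` and the sample tweet are made up.

```r
# Simplified mention extractor: an "@" that is not preceded by a word
# character, followed by one or more word characters
mentions <- function(text) {
  m <- regmatches(text, gregexpr("(?<!\\w)@\\w+", text, perl = TRUE))[[1]]
  unique(sub("^@", "", m))
}

tweet <- "@a_bicky thanks! cc @yokkuns, mail hoge@example.com @a_bicky"
found <- mentions(tweet)
print(found)
```

The e-mail address is skipped because its `@` follows the word character `e`, and the repeated mention is collapsed by `unique`.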
  43. ggplot2
  44. ggplot2
      Base graphics:
        plot(statusDF$wday, col = "blue")
      ggplot2:
        qplot(wday, data = statusDF, fill = I("blue"), alpha = I(0.7), xlab = "", ylab = "")
  45. ggplot2
      qplot(wday, data = statusDF, fill = statusSource, xlab = "", ylab = "")
  46. ggplot2
      qplot(wday, data = statusDF, facets = ~ statusSource, fill = I("blue"), alpha = I(0.7), xlab = "", ylab = "")
      qplot(wday, data = statusDF, fill = statusSource, xlab = "", ylab = "")
  48. qplot (ggplot2)
      > args(qplot)
      function (x, y = NULL, z = NULL, ..., data, facets = . ~ ., margins = FALSE,
          geom = "auto", stat = list(NULL), position = list(NULL), xlim = c(NA, NA),
          ylim = c(NA, NA), log = "", main = NULL, xlab = deparse(substitute(x)),
          ylab = deparse(substitute(y)), asp = NA)
      NULL
  49. qplot: geom. The geom argument selects the kind of plot: area, bar, histogram, line, point.
  50. geom = "area"
      qplot(as.integer(wday), data = statusDF, geom = "area", stat = "bin", fill = statusSource, xlab = "", ylab = "", binwidth = 1)
  51. geom = "bar"
      qplot(wday, data = statusDF, geom = "bar", stat = "bin", fill = statusSource, xlab = "", ylab = "")
  52. geom = "line"
      qplot(as.integer(wday), data = statusDF, geom = "line", stat = "bin", colour = statusSource, xlab = "", ylab = "", binwidth = 1)
  53. geom = "point"
      qplot(wday, data = statusDF, geom = "point", stat = "bin", colour = statusSource, xlab = "", ylab = "")
  54. qplot: position. The position argument adjusts how geoms are placed: dodge, fill (scaled so that the total is 1), jitter, stack.
  55. position = "dodge"
      qplot(wday, data = statusDF, fill = statusSource, position = "dodge", xlab = "", ylab = "")
  56. position = "fill"
      qplot(wday, data = statusDF, fill = statusSource, position = "fill", xlab = "", ylab = "")
  57. position = "jitter"
      qplot(wday, data = statusDF, fill = statusSource, position = "jitter", xlab = "", ylab = "")
  58. position = "stack"
      qplot(wday, data = statusDF, fill = statusSource, position = "stack", xlab = "", ylab = "")
  59. qplot: facets. The facets argument splits the plot into panels. ~ var: one panel per value of var. var1 ~ var2: a grid of panels by var1 and var2. Compare reshape2 formulas such as var1 ~ var2 + var3.
  60. facets = ~ statusSource
      qplot(wday, data = statusDF, xlab = "", ylab = "", facets = ~ statusSource)
  61. facets = month ~ statusSource
      qplot(wday, data = statusDF, xlab = "", ylab = "", facets = month ~ statusSource)
  62. qplot: aesthetics. alpha, colour (color), fill, linetype, size. colour, fill, and linetype can be mapped to a variable such as statusSource. To set a constant instead, wrap the value in I() (AsIs), as in fill = I("blue").
  63. alpha
      qplot(wday, data = statusDF, xlab = "", ylab = "", alpha = as.integer(wday))
  64. colour
      qplot(wday, data = statusDF, xlab = "", ylab = "", colour = statusSource)
  65. fill
      qplot(wday, data = statusDF, xlab = "", ylab = "", fill = statusSource)
  66. linetype
      qplot(wday, data = statusDF, xlab = "", ylab = "", linetype = statusSource, colour = statusSource)
  67. whotwi http://whotwi.com/
  69. whotwi
      > # Count tweets by hour, day of week, and Twitter client
      > # (xtabs is used here instead of melt and cast)
      > cnt <- as.data.frame(xtabs(~ hour + wday + statusSource, statusDF))
      > head(cnt, 3)
        hour wday statusSource Freq
      1    0  Mon  YoruFukurou   48
      2    1  Mon  YoruFukurou   38
      3    2  Mon  YoruFukurou   25
      > freqSources <- by(cnt, cnt[c("hour", "wday")], function(df) {
      +   # the most frequently used client in this hour and day-of-week cell
      +   freqSource <- with(df, statusSource[order(Freq, decreasing = TRUE)[1]])
      +   cbind(df[1, c("hour", "wday")], freqSource)
      + })
      > freqSources <- do.call(rbind, freqSources)
      > head(freqSources, 3)
        hour wday  freqSource
      1    0  Mon YoruFukurou
      2    1  Mon YoruFukurou
      3    2  Mon YoruFukurou
  71. whotwi
      > # Total number of tweets by hour and day of week
      > cntSum <- as.data.frame(xtabs(Freq ~ hour + wday, cnt))
      > head(cntSum, 3)
        hour wday Freq
      1    0  Mon   65
      2    1  Mon   48
      3    2  Mon   31
      > # Attach the most frequent client to each cell
      > data <- merge(cntSum, freqSources)
      > # Reverse the weekday levels so that Monday is drawn at the top
      > data$wday <- factor(data$wday, levels = rev(levels(data$wday)))
      > # Put the counts on a log scale
      > data$Freq <- log2(data$Freq)
      > p <- qplot(hour, wday, data = data, xlab = "", ylab = "",
      +            geom = "point", colour = freqSource, size = Freq)
      > p  # same as print(p)
  77. whotwi
      > # A whotwi-style theme
      > # (opts(), theme_rect() and theme_segment() belong to the ggplot2 of 2011;
      > #  current releases use theme(), element_rect() and element_line())
      > theme_whotwi <- function() {
      +   opts(
      +     # transparent panel background
      +     panel.background = theme_rect(fill = NA, colour = NA),
      +     # transparent legend key background
      +     legend.key = theme_rect(fill = NA, colour = NA),
      +     # hide the axis ticks
      +     axis.ticks = theme_segment(colour = NA))
      + }
      > p2 <- p + theme_whotwi() + scale_size(legend = FALSE) + scale_colour_hue(name = "")
      > p2
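  85. TweetSentiments
  86. TweetSentiments in R
  87. Steps: (1) split the tweets into terms with RMeCab, (2) look up each term's polarity in a sentiment dictionary, (3) sum the polarities for each tweet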
  88. RMeCab: using MeCab from R
      > library(RMeCab)
      > (docDF(data.frame(" "), column = 1, type = 1))
      number of extracted terms = 5
      now making a data frame. wait a while!
        TERM POS1 POS2 Row1
      1                   1
      2                   1
      3                   1
      4                   2
      5                   2
  89. 89. http://www.lr.pi.titech.ac.jp/~takamura/pndic_ja.html
      Each dictionary line has four colon-separated fields (term:kana:pos:value):
      : : :1
      : : :0.999995
      : : :0.999979
      : : :0.999979
      : : :0.999645
      : : :0.999486
      : : :0.999314
      ...
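  90. 90.
      > # Read the polarity dictionary
      > pndic <- read.table("http://www.lr.pi.titech.ac.jp/~takamura/pubs/pn_ja.dic",
      +                     sep = ":",
      +                     col.names = c("term", "kana", "pos", "value"),
      +                     colClasses = c("character", "character", "factor", "numeric"),
      +                     fileEncoding = "Shift_JIS")
      > # The same term can appear with several readings (kana), so average
      > # the values of entries that share a term and a part of speech
      > pndic2 <- aggregate(value ~ term + pos, pndic, mean)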
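To see what the `aggregate()` call does, here is a small base-R sketch on a made-up dictionary. The terms, readings and values are invented for illustration and do not come from the real dictionary.

```r
# Hypothetical dictionary: "ame" appears twice with different readings.
dic <- data.frame(
  term  = c("ame", "ame", "hare"),
  kana  = c("reading1", "reading2", "reading3"),
  pos   = c("noun", "noun", "noun"),
  value = c(-0.5, -0.25, 0.75)
)

# Group by term and part of speech; kana is dropped and the values are averaged.
dic2 <- aggregate(value ~ term + pos, dic, mean)
print(dic2)
```

After this step each (term, pos) pair occurs exactly once, which keeps a later `merge()` from duplicating rows.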
  91. 91.
      > # Extract only the parts of speech that occur in pndic
      > pos <- unique(pndic2$pos)
      > tweetDF <- docDF(statusDF, column = "cleanText", type = 1, pos = pos)
      number of extracted terms = 7164
      now making a data frame. wait a while!
      > tweetDF[2900:2904, 1:5]
           TERM POS1 POS2 Row1 Row2
      2900                   0    0
      2901                   0    0
      2902                   0    0
      2903                   0    0
      2904                   0    0
      > # Keep only the terms that are listed in pndic
      > tweetDF <- subset(tweetDF, TERM %in% pndic2$term)
      > # Attach the polarity value of each term
      > tweetDF <- merge(tweetDF, pndic2, by.x = c("TERM", "POS1"), by.y = c("term", "pos"))
  92. 92.
      > # Score each tweet: term counts weighted by polarity, summed per tweet
      > score <- colSums(tweetDF[4:(ncol(tweetDF) - 1)] * tweetDF$value)
      > # Number of tweets with a positive score
      > sum(score > 0)
      [1] 117
      > # Number of tweets with a negative score
      > sum(score < 0)
      [1] 2765
      > # Number of tweets with a score of exactly zero
      > sum(score == 0)
      [1] 277
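The filter, merge and scoring steps can be followed on a tiny example. This is a base-R sketch with a made-up term-count table and dictionary; only the column layout (`TERM`, `POS1`, `POS2`, then one column per document) mirrors the `docDF()` output shown on the slides.

```r
# Hypothetical term counts for three tweets (Row1..Row3).
counts <- data.frame(
  TERM = c("good", "bad", "slow", "the"),
  POS1 = c("adj", "adj", "adj", "det"),
  POS2 = c("x", "x", "x", "x"),
  Row1 = c(2, 0, 0, 3),
  Row2 = c(0, 1, 1, 1),
  Row3 = c(1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical polarity dictionary, one row per (term, pos).
dic <- data.frame(
  term  = c("good", "bad", "slow"),
  pos   = c("adj", "adj", "adj"),
  value = c(0.5, -1, -0.25),
  stringsAsFactors = FALSE
)

# Drop terms without a polarity value, then attach the values.
counts <- subset(counts, TERM %in% dic$term)
scored <- merge(counts, dic, by.x = c("TERM", "POS1"), by.y = c("term", "pos"))

# After merge() the columns are: keys, POS2, Row1..Row3, value.
# Multiplying the count columns by the value vector weights each row (term),
# and colSums() then adds the weighted counts up per tweet.
score <- colSums(scored[4:(ncol(scored) - 1)] * scored$value)
print(score)
```

The index `4:(ncol(scored) - 1)` relies on `merge()` placing the key columns first and the dictionary's `value` column last, so it would need adjusting if more columns were merged in.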
  99. 99.
      > table(ifelse(pndic$value > 0, "positive",
      +              ifelse(pndic$value == 0, "neutral", "negative")))

      negative  neutral positive
         49983       20     5122
  101. 101.
      > m <- mean(score)
      > # Classify each tweet relative to the mean score instead of zero
      > tweetType <- factor(ifelse(score > m, "positive",
      +                     ifelse(score == m, "neutral", "negative")),
      +                     levels = c("positive", "neutral", "negative"))
      > table(tweetType)
      tweetType
      positive  neutral negative
          1912        0     1247
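The dictionary is heavily skewed toward negative entries (49983 negative against 5122 positive), so a zero threshold labels almost every tweet negative. Using the mean score as the threshold is one simple way to offset that. The following sketch compares the two thresholds on made-up scores.

```r
# Hypothetical per-tweet scores, mostly negative like the real ones.
score <- c(a = -3, b = -1, c = -0.5, d = 0.5)

# Zero threshold: only one tweet counts as positive.
nPositiveZero <- sum(score > 0)

# Mean threshold: tweets are labelled relative to the typical score.
m <- mean(score)
tweetType <- factor(ifelse(score > m, "positive",
                    ifelse(score == m, "neutral", "negative")),
                    levels = c("positive", "neutral", "negative"))
print(table(tweetType))
```

With this threshold "positive" means "more positive than average for this user", not positive in an absolute sense.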
  102. 102.
      > # Drop the unused "neutral" level
      > statusDF$tweetType <- droplevels(tweetType)
      > # Share of positive and negative tweets per month
      > qplot(month, data = statusDF,
      +       geom = "bar", fill = tweetType, position = "fill")
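A bar chart with `position = "fill"` shows, for each month, the proportion of each tweet type. The same numbers can be computed directly in base R, as in this sketch; `statusToy` and its values are made up for illustration.

```r
# Hypothetical tweets with a month label and a tweet type.
statusToy <- data.frame(
  month     = c("2011-10", "2011-10", "2011-11", "2011-11", "2011-11", "2011-11"),
  tweetType = factor(c("positive", "negative", "positive",
                       "positive", "positive", "negative"),
                     levels = c("positive", "negative"))
)

# Cross-tabulate, then normalise each row (month) so that it sums to 1.
share <- prop.table(table(statusToy$month, statusToy$tweetType), margin = 1)
print(share)
```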
  106. 106. twitteR• RJSONIO•• ID status ID• fav favorited TRUE• truncated TRUE• DM• status character factor
  108. 108. OAuth ” ” twitteR -
  110. 110. • twitteR• reshape2 R• ggplot2• RMeCab R• PC•
  111. 111. https://github.com/abicky/rjpusers2011_abicky
  112. 112. status
      > statuses[[1]]$text
      [1] " "
      > statuses[[1]]$getText() # same value through the getter
      [1] " "
      > # A field can be assigned directly
      > statuses[[1]]$text <- " "
      > statuses[[1]]$getText()
      [1] " "
      > statuses[[1]]$setText("ggrks") # or through the setter
      > statuses[[1]]$getText()
      [1] "ggrks"
      > # Creation time of the tweet
      > statuses[[1]]$getCreated()
      [1] "2011-11-23 22:16:24 UTC"
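The `$text`, `$getText()` and `$setText()` access pattern above is that of an R Reference Class object. The sketch below defines a toy class with the same pattern; `ToyStatus` is a hypothetical illustration and not twitteR's actual class definition.

```r
library(methods)

# Hypothetical class mimicking the field/getter/setter pattern of a status object.
ToyStatus <- setRefClass(
  "ToyStatus",
  fields = list(text = "character"),
  methods = list(
    getText = function() text,
    setText = function(value) {
      text <<- value   # <<- assigns to the field, not to a local variable
      invisible(.self)
    }
  )
)

s <- ToyStatus$new(text = "hello")
s$text <- "changed"      # direct field assignment
viaField <- s$getText()

s$setText("ggrks")       # assignment through the setter
viaSetter <- s$getText()

# Reference semantics: a second name refers to the same object, not a copy.
alias <- s
alias$setText("shared")
afterAlias <- s$getText()
```

Because such objects are modified in place, `s$copy()` is needed when an independent copy is wanted.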
  113. 113. removeSpecialStr
      removeSpecialStr <- function(text) {
        removeURL(removeHashTag(removeScreenName(text)))
      }
  114. 114. removeScreenName
      removeScreenName <- function(text, strict = TRUE) {
        # [@＠] accepts the half-width and the full-width at sign
        if (strict) {
          regex <- "(?<!\\w)[@＠](?>\\w+)(?![@＠])"
        } else {
          regex <- "[@＠]\\w+"
        }
        gsub(regex, "", text, perl = TRUE)
      }
  115. 115. removeURL
      removeURL <- function(text, strict = TRUE) {
        if (strict) {
          regex <- "(?<![-.\\w#@=!\"/])https?://(?:[^:]+:.+@)?(?:[0-9A-Za-z][-0-9A-Za-z]*(?<!-)\\.)+[A-Za-z]+(?:/[-\\w#%=+,.?!&~]*)*"
        } else {
          regex <- "https?://[-\\w#%=+,.?!&~/]+"
        }
        gsub(regex, "", text, perl = TRUE)
      }
  116. 116. removeHashTag
      removeHashTag <- function(text, strict = TRUE) {
        delimiters <- "\\s,.\u3000-\u3002\uFF01\uFF1F"
        # cf. http://nobu666.com/2011/07/13/914.html
        validJa <- "\u3041-\u3094\u3099-\u309C\u30A1-\u30FA\u30FC\u3400-\uD7A3\uFF10-\uFF19\uFF21-\uFF3A\uFF41-\uFF5A\uFF66-\uFF9E"
        # [#＃] accepts the half-width and the full-width hash sign
        if (strict) {
          regex <- sprintf("(^|[%s])(?:([#＃](?>[0-9]+)(?!\\w))|[#＃][\\w%s]+)",
                           delimiters, validJa)
        } else {
          regex <- sprintf("[#＃][^%s]+", delimiters)
        }
        gsub(regex, "\\1\\2", text, perl = TRUE)
      }
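The cleaning order used by `removeSpecialStr` (screen names, then hashtags, then URLs) can be tried with simplified patterns. The sketch below uses loose, non-strict style regular expressions in a single hypothetical helper, `cleanTweet`; the final whitespace normalisation is an addition for readability and is not part of the functions on the slides.

```r
# Hypothetical helper; \uFF20 and \uFF03 are the full-width @ and # characters.
cleanTweet <- function(text) {
  text <- gsub("[@\uFF20]\\w+", "", text, perl = TRUE)                 # screen names
  text <- gsub("[#\uFF03][^\\s,.]+", "", text, perl = TRUE)            # hashtags
  text <- gsub("https?://[-\\w#%=+,.?!&~/]+", "", text, perl = TRUE)   # URLs
  # Collapse the gaps left behind by the removals (added for this sketch).
  trimws(gsub("\\s+", " ", text, perl = TRUE))
}

cleaned <- cleanTweet("@abicky check http://example.com/a?b=1 #rstats now")
print(cleaned)
```

The loose hashtag pattern would also cut a URL at a `#` fragment, since hashtags are removed before URLs; the strict variants on the slides guard against such cases with delimiters and lookarounds.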
