graphTweets 0.4.0 has been redesigned to work hand in hand with rtweet. Let’s start by getting some tweets. If you’re unsure how to get started, head over to the rtweet website, where everything is very well explained. We’ll get 1,000 tweets on #rstats, excluding retweets.

library(rtweet)

# 1,000 tweets on #rstats, excluding retweets
tweets <- search_tweets("#rstats", n = 1000, include_rts = FALSE)

Now we can start using graphTweets.

  1. Get the edges using gt_edges.
  2. Build an igraph object using gt_graph or collect results with gt_collect.
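
To make the idea of an edge concrete, here is a minimal base R sketch of how mentions in tweet text can be turned into a source/target edge list. It is an illustration only, not graphTweets' actual implementation; the toy data frame toy_tweets and the helper extract_mention_edges are hypothetical names, and the tweets are made up.

```r
# Toy data mimicking three columns of an rtweet result (values are made up)
toy_tweets <- data.frame(
  status_id = c("1", "2", "3"),
  screen_name = c("alice", "bob", "carol"),
  text = c(
    "Great post by @bob on #rstats",
    "Thanks @alice and @carol!",
    "No mentions here"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per (author, mentioned user) pair
extract_mention_edges <- function(df) {
  # find every @handle in each tweet; tweets without mentions yield character(0)
  handles <- regmatches(df$text, gregexpr("@[A-Za-z0-9_]+", df$text))
  data.frame(
    source = rep(df$screen_name, lengths(handles)),  # repeat author once per mention
    target = sub("^@", "", unlist(handles)),         # drop the leading "@"
    stringsAsFactors = FALSE
  )
}

toy_edges <- extract_mention_edges(toy_tweets)
toy_edges
```

Tweets without any mention contribute no rows, which is one reason the number of edges can be smaller than the number of tweets.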

igraph

library(graphTweets)

tweets %>% 
  gt_edges(text, screen_name, status_id) %>% 
  gt_graph() -> graph

class(graph)
#> [1] "igraph"

List

If you do not want to return an igraph object, use gt_collect; it returns a list of two data.frames: edges and nodes.

tweets %>% 
  gt_edges(text, screen_name, status_id) %>% 
  gt_collect() -> edges

names(edges)
#> [1] "edges" "nodes"

(The list also contains nodes, but it is empty since we only ran gt_edges.)

So far we have only used gt_edges to extract the edges; we can also extract the nodes with gt_nodes.

tweets %>% 
  gt_edges(text, screen_name, status_id) %>% 
  gt_nodes() %>% 
  gt_collect() -> graph

lapply(graph, nrow) # number of edges and nodes
#> $edges
#> [1] 252
#> 
#> $nodes
#> [1] 303
lapply(graph, names) # names of data.frames returned
#> $edges
#> [1] "source" "target"
#> 
#> $nodes
#> [1] "nodes"   "n_edges"

As of graphTweets version 0.4.1, gt_nodes returns the number of edges each node is present in: n_edges. Here I used gt_collect; again, you can use gt_graph instead if you want to return an igraph object.
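
One way to picture n_edges is as a tally of how often each account appears as either a source or a target. The base R sketch below uses a made-up edge list and reflects an assumption about what the count represents; it is not the package's own code, and toy_edges and toy_nodes are hypothetical names.

```r
# Made-up edge list in the same shape as the edges data.frame above
toy_edges <- data.frame(
  source = c("alice", "bob", "bob"),
  target = c("bob", "alice", "carol"),
  stringsAsFactors = FALSE
)

# Count appearances of each account across both columns
counts <- table(c(toy_edges$source, toy_edges$target))

# Assemble a nodes table with the same column names as gt_nodes output
toy_nodes <- data.frame(
  nodes = names(counts),
  n_edges = as.integer(counts),
  stringsAsFactors = FALSE
)
toy_nodes
```

Because every edge has two endpoints, the n_edges column in this sketch always sums to twice the number of edges.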

Adding nodes has not brought much to the table so far. However, gt_nodes takes another argument, meta, which, if set to TRUE, returns meta-data on each node where available*.

tweets %>% 
  gt_edges(text, screen_name, status_id) %>% 
  gt_nodes(meta = TRUE) %>% 
  gt_collect() -> graph

# lapply(graph, names) # names of data.frames returned

Note that you can also pass meta-data to edges if needed.

tweets %>% 
  gt_edges(text, screen_name, status_id, datetime = "created_at") %>% 
  gt_nodes(meta = TRUE) %>% 
  gt_collect() -> graph

Before we plot our graph, we’re going to modify some of the meta-data, since a lot of NA values are returned (where the meta-data was not available*).

Here I use sigmajs to plot the graph.

library(dplyr)
library(sigmajs) # for plots

tweets %>% 
  gt_edges(text, screen_name, status_id, datetime = "created_at") %>% 
  gt_nodes(meta = TRUE) %>% 
  gt_collect() -> gt

nodes <- gt$nodes %>% 
  mutate(
    id = nodes,
    label = ifelse(is.na(name), nodes, name),
    size = n_edges,
    color = "#1967be"
  ) 

edges <- gt$edges %>% 
  mutate(
    id = seq_len(n())
  )

sigmajs() %>% 
  sg_force_start() %>% 
  sg_nodes(nodes, id, label, size, color) %>% 
  sg_edges(edges, id, source, target) %>% 
  sg_force_stop(10000)
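
The label fallback in the mutate call above can be checked in isolation on a toy nodes table. This is a small base R sketch with made-up values; toy_nodes is a hypothetical name, and only the nodes, name and n_edges columns used in this walkthrough are assumed to exist.

```r
# Made-up nodes table: "bob" was only mentioned, so his name is missing
toy_nodes <- data.frame(
  nodes = c("alice", "bob", "carol"),
  name = c("Alice A.", NA, "Carol C."),
  n_edges = c(2L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Fall back to the screen name wherever the display name is NA
toy_nodes$label <- ifelse(is.na(toy_nodes$name), toy_nodes$nodes, toy_nodes$name)
toy_nodes$label
```

Every node ends up with a non-missing label, which keeps the plot from showing NA next to accounts that were only mentioned.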

Let’s look at communities. We’ll return an igraph object with gt_graph so we can easily run a community-finding algorithm from the igraph package.

library(igraph)

tweets %>% 
  gt_edges(text, screen_name, status_id) %>% 
  gt_graph() -> g

# communities
wc <- cluster_walktrap(g) # current name of the deprecated walktrap.community
V(g)$color <- membership(wc)

# plot
# tons of arguments because defaults are awful
plot(g, 
     layout = igraph::layout_with_fr(g), # Fruchterman-Reingold layout
     vertex.color = V(g)$color,
     vertex.label.family = "sans",
     vertex.label.color = hsv(h = 0, s = 0, v = 0, alpha = 0.0),
     vertex.size = igraph::degree(g), 
     edge.arrow.size = 0.2, 
     edge.arrow.width = 0.3, edge.width = 1,
     edge.color = hsv(h = 1, s = .59, v = .91, alpha = 0.7),
     vertex.frame.color = "#fcfcfc")
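
To see what the community step produces without fetching any tweets, here is a small sketch on a made-up edge list. It assumes the igraph package is installed and uses cluster_walktrap, the current igraph name for walktrap.community; toy_edges, toy_g and toy_wc are hypothetical names.

```r
library(igraph)

# Made-up edge list: two triangles (a-b-c and d-e-f) joined by one edge (c-d)
toy_edges <- data.frame(
  source = c("a", "b", "c", "d", "e", "f", "c"),
  target = c("b", "c", "a", "e", "f", "d", "d"),
  stringsAsFactors = FALSE
)

toy_g <- graph_from_data_frame(toy_edges, directed = TRUE)
toy_wc <- cluster_walktrap(toy_g)

# membership() gives one community id per vertex, in the order of V(toy_g);
# group the vertex names by that id to inspect the communities
split(V(toy_g)$name, as.vector(membership(toy_wc)))
```

The community ids are arbitrary integers, which is why they can be assigned directly to V(g)$color and mapped onto the plotting palette.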

* Some nodes are only mentioned in tweets and therefore have no associated meta-data.