graphTweets 4.0 has been redesigned to work hand-in-hand with rtweet. Let’s start by getting some tweets. If you’re unsure how to get started, head over to the rtweet website, where everything is very well explained. We’ll get 1,000 tweets on #rstats, excluding retweets.

library(rtweet)

# 1,000 tweets on #rstats, excluding retweets
tweets <- search_tweets("#rstats", n = 1000, include_rts = FALSE)

Now we can start using graphTweets.

  1. Get the edges using gt_edges.
  2. Build an igraph object using gt_graph or collect results with gt_collect.
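To give a feel for the first step, here is a minimal base R sketch of what extracting edges amounts to: one row per author and mentioned user, with repeated pairs counted. It is an illustration only, not graphTweets' actual implementation; `tweets_toy` and `edges_toy` are made-up names and the data is invented.

```r
# Toy stand-in for an rtweet result (hypothetical data, not real tweets).
tweets_toy <- data.frame(
  screen_name = c("alice", "bob", "alice"),
  stringsAsFactors = FALSE
)
# rtweet stores mentions as a list column; NA means the tweet mentions nobody.
tweets_toy$mentions_screen_name <- list(c("bob", "carol"), NA_character_, "bob")

# One row per (author, mentioned user) pair.
pairs <- do.call(rbind, lapply(seq_len(nrow(tweets_toy)), function(i) {
  data.frame(
    source = tweets_toy$screen_name[i],
    target = tweets_toy$mentions_screen_name[[i]],
    stringsAsFactors = FALSE
  )
}))

# Drop tweets without mentions.
pairs <- pairs[!is.na(pairs$target), ]

# Count repeated pairs to obtain a source / target / n layout.
pairs$n <- 1L
edges_toy <- aggregate(n ~ source + target, data = pairs, FUN = sum)
```

The resulting `edges_toy` has the same three columns as the edges returned later in this document: source, target and n.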

igraph

library(graphTweets)

tweets %>% 
  gt_edges(screen_name, mentions_screen_name) %>% 
  gt_graph() -> graph

class(graph)
#> [1] "igraph"

List

If you do not want to return an igraph object, use gt_collect instead. It returns a list of two data.frames: edges and nodes.

tweets %>% 
  gt_edges(screen_name, mentions_screen_name) %>% 
  gt_collect() -> edges

names(edges)
#> [1] "edges" "nodes"

(The list also contains nodes, but it is empty since we only ran gt_edges.)

So far we have only used gt_edges to extract the edges. We can also extract the nodes.

tweets %>% 
  gt_edges(screen_name, mentions_screen_name) %>% 
  gt_nodes() %>% 
  gt_collect() -> graph

lapply(graph, nrow) # number of edges and nodes
#> $edges
#> [1] 248
#> 
#> $nodes
#> [1] 309
lapply(graph, names) # names of data.frames returned
#> $edges
#> [1] "source" "target" "n"     
#> 
#> $nodes
#> [1] "nodes" "type"  "n"

In graphTweets version 0.4.1, gt_nodes returns the number of edges each node is present in: n_edges. Here I used gt_collect; again, you can use gt_graph instead if you want to return an igraph object.
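As a rough picture of how nodes relate to edges, the base R sketch below derives a node table from an edge list by counting how many edges each handle appears in. This is an assumption about the idea, not the package's code; `edges_toy` and `nodes_toy` are hypothetical and the data is invented.

```r
# Hypothetical edge list in the source / target / n layout.
edges_toy <- data.frame(
  source = c("alice", "alice", "dave"),
  target = c("bob", "carol", "bob"),
  n = c(2L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Every handle that appears as a source or a target is a node;
# table() counts the number of edges each one is present in.
counts <- table(c(edges_toy$source, edges_toy$target))

nodes_toy <- data.frame(
  nodes = names(counts),
  n = as.integer(counts),
  stringsAsFactors = FALSE
)
```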

Adding nodes has not brought much to the table so far. However, gt_nodes takes another argument, meta, which, if set to TRUE, returns meta-data on each node where available*. More information on passing meta-data to nodes can be found further down the document.

tweets %>% 
  gt_edges(screen_name, mentions_screen_name) %>% 
  gt_nodes(meta = TRUE) %>% 
  gt_collect() -> graph

# lapply(graph, names) # names of data.frames returned
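Conceptually, attaching meta-data to nodes resembles a left join between the nodes and what is known about each user. The base R sketch below shows why some nodes end up with NA: they were only mentioned, so nothing is known about them. The tables and the `followers_count` column are hypothetical here, and this is not how graphTweets does it internally.

```r
# Hypothetical nodes, as they might come out of the edge list.
nodes_toy <- data.frame(
  nodes = c("alice", "bob", "carol"),
  n = c(2L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical user meta-data: only users who tweeted are known.
users_toy <- data.frame(
  screen_name = c("alice", "bob"),
  followers_count = c(120, 45),
  stringsAsFactors = FALSE
)

# Left join: keep every node, fill meta-data where available.
nodes_meta <- merge(
  nodes_toy, users_toy,
  by.x = "nodes", by.y = "screen_name",
  all.x = TRUE
)
```

In this toy data, carol was mentioned but never tweeted, so her followers_count is NA.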

Note that you can also pass meta-data to edges if needed.

tweets %>% 
  gt_edges(screen_name, mentions_screen_name, created_at) %>% 
  gt_nodes(meta = TRUE) %>% 
  gt_collect() -> graph

Before we plot our graph, we’re going to modify some of the meta-data, as a lot of NA values are returned (where the meta-data was not available*).
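One simple way to deal with those NA values, sketched in base R, is to replace them with a neutral default before using the column for something like node size. The `followers_count` column and the choice of 0 as the default are assumptions for illustration only.

```r
# Hypothetical node table with missing meta-data for one node.
nodes_toy <- data.frame(
  nodes = c("alice", "bob", "carol"),
  followers_count = c(120, NA, 45),  # hypothetical meta-data column
  stringsAsFactors = FALSE
)

# Replace missing values with 0 so every node has a usable value.
missing <- is.na(nodes_toy$followers_count)
nodes_toy$followers_count[missing] <- 0
```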

Here I use sigmajs to plot the graph.

Let’s look at communities. We’ll return an igraph object with gt_graph so we can easily run a community-finding algorithm from the igraph package.

tweets %>% 
  gt_edges(screen_name, mentions_screen_name) %>% 
  gt_graph() -> g

class(g)
#> [1] "igraph"
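With an igraph object in hand, community detection could look like the sketch below. It uses a small invented graph rather than the tweets, and walktrap is just one of several algorithms igraph offers; the choice here is mine, not necessarily the one originally used.

```r
library(igraph)

# Hypothetical edge list: two tight groups joined by a single edge.
edges_toy <- data.frame(
  source = c("a", "b", "a", "d", "e", "d", "c"),
  target = c("b", "c", "c", "e", "f", "f", "d"),
  stringsAsFactors = FALSE
)

g_toy <- graph_from_data_frame(edges_toy, directed = TRUE)

# Run the walktrap community-finding algorithm.
comm <- cluster_walktrap(g_toy)

# One community id per node, stored as a vertex attribute.
memb <- membership(comm)
V(g_toy)$community <- as.vector(memb)
```

The same calls should apply to `g` above, since gt_graph returns a regular igraph object.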

Users to Hashtags

library(rtweet)
tweets <- search_tweets("#rstats OR #python", n = 1000, include_rts = FALSE, token = token, lang = "en")

The same principles apply: we simply use gt_edges and pass the hashtags column as returned by rtweet. This creates a tibble of edges from screen_name to the hashtags used in each tweet.

net <- tweets %>% 
  gt_edges(screen_name, hashtags) %>% 
  gt_nodes() %>% 
  gt_collect()

We’ll visualise the graph with sigmajs. Let’s prepare the data to meet the library’s requirements.

  • We add id to both nodes and edges
  • We rename a few columns to meet sigmajs’ conventions
  • We color the nodes by type (hashtag or user)

Apologies for not going into details here, but sigmajs is very well documented; check the website if you want to understand it all.
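For orientation, the three preparation steps listed above might look like this in base R. The toy tables stand in for the collected nodes and edges, and the type values ("user", "hashtag") and colours are assumptions for illustration.

```r
# Hypothetical stand-ins for the collected nodes and edges.
nodes <- data.frame(
  nodes = c("alice", "#rstats"),
  type = c("user", "hashtag"),  # assumed type labels
  n = c(3L, 5L),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  source = "alice", target = "#rstats", n = 3L,
  stringsAsFactors = FALSE
)

# 1. Add ids to both edges and nodes.
edges$id <- seq_len(nrow(edges))
nodes$id <- nodes$nodes

# 2. Copy columns to the names sigmajs expects.
nodes$label <- nodes$nodes
nodes$size <- nodes$n

# 3. Colour the nodes by type.
nodes$color <- ifelse(nodes$type == "user", "#0C46A0", "#41A5F4")
```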

Let’s visualise it.

We use sg_layout to lay out the graph and sg_neighbours to highlight nodes on click.

Retweets

You can also build networks of retweets.

tweets <- search_tweets("#rstats filter:retweets", n = 500, include_rts = TRUE, token = token, lang = "en")
#> Searching for tweets...
#> Finished collecting tweets!
net <- tweets %>% 
  gt_edges(screen_name, retweet_screen_name) %>% 
  gt_nodes() %>% 
  gt_collect()

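library(sigmajs)
library(zeallot) # provides the %<-% unpacking operator

# unpack the list into two data.frames
c(edges, nodes) %<-% net

edges$id <- seq_len(nrow(edges))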
edges$size <- edges$n

nodes$id <- nodes$nodes
nodes$label <- nodes$nodes
nodes$size <- nodes$n

sigmajs() %>% 
  sg_nodes(nodes, id, size, label) %>% 
  sg_edges(edges, id, source, target) %>% 
  sg_layout() %>% 
  sg_cluster(colors = c("#0C46A0FF", "#41A5F4FF")) %>% 
  sg_settings(
    edgeColor = "default",
    defaultEdgeColor = "#d3d3d3"
  ) %>% 
  sg_neighbours()

Retweets & Quotes

We can bind quoted tweets (surely they should be considered as retweets) using gt_edges_bind.

net <- tweets %>% 
  gt_edges(screen_name, retweet_screen_name) %>% 
  gt_edges_bind(screen_name, quoted_screen_name) %>% 
  gt_nodes() %>% 
  gt_collect()

c(edges, nodes) %<-% net

edges$id <- seq_len(nrow(edges))
edges$size <- edges$n

nodes$id <- nodes$nodes
nodes$label <- nodes$nodes
nodes$size <- nodes$n

sigmajs() %>% 
  sg_nodes(nodes, id, size, label) %>% 
  sg_edges(edges, id, source, target) %>% 
  sg_layout() %>% 
  sg_cluster(colors = c("#0C46A0FF", "#41A5F4FF")) %>% 
  sg_settings(
    edgeColor = "default",
    defaultEdgeColor = "#d3d3d3"
  ) %>% 
  sg_neighbours()
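The binding step in this section can be pictured as stacking two edge lists and counting shared pairs together. The base R sketch below shows that idea on invented data; whether graphTweets sums the counts in exactly this way is an assumption on my part.

```r
# Hypothetical retweet and quote edge lists.
retweets <- data.frame(
  source = c("alice", "bob"),
  target = c("carol", "carol"),
  n = c(2L, 1L),
  stringsAsFactors = FALSE
)
quotes <- data.frame(
  source = c("alice", "dave"),
  target = c("carol", "bob"),
  n = c(1L, 1L),
  stringsAsFactors = FALSE
)

# Stack both edge lists, then sum n for pairs present in both.
bound <- aggregate(
  n ~ source + target,
  data = rbind(retweets, quotes),
  FUN = sum
)
```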

Meta data

You can pass meta data to the edges and subsequently to the nodes using gt_add_meta.

Preprocess edges

You may also pre-process edges before computing the nodes.

library(dplyr)

# Collapse duplicate source-target pairs and summarise the tweet text
prep <- function(df){
  df %>% 
    group_by(source, target) %>% 
    summarise(
      n = sum(n), # number of tweets
      nchar = sum(nchar(text)) / n # characters per tweet
    ) %>% 
    dplyr::ungroup()
}

gt <- tweets %>% 
  gt_edges(screen_name, retweet_screen_name, text) %>% 
  gt_preproc_edges(prep) %>% 
  gt_nodes()

gt$edges$id <- seq_len(nrow(gt$edges))
gt$nodes$id <- gt$nodes$nodes
gt$nodes$label <- gt$nodes$nodes
gt$nodes$size <- gt$nodes$n
gt$edges$color <- scales::col_numeric(c("blue", "red"), NULL)(gt$edges$nchar)

sigmajs() %>% 
  sg_nodes(gt$nodes, id, size, label) %>% 
  sg_edges(gt$edges, id, source, target, color) %>% 
  sg_layout() 
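The edge colouring above maps a numeric value onto a blue-to-red gradient. For readers who want to see the mechanics, here is a rough base R equivalent using grDevices. It is only a sketch: scales::col_numeric interpolates differently, so intermediate colours will not match exactly, and `vals` is invented.

```r
# Hypothetical characters-per-tweet values.
vals <- c(40, 140, 280)

# Rescale to the 0-1 range expected by the colour ramp.
scaled <- (vals - min(vals)) / (max(vals) - min(vals))

# Interpolate between blue (lowest) and red (highest).
ramp <- grDevices::colorRamp(c("blue", "red"))
cols <- grDevices::rgb(ramp(scaled), maxColorValue = 255)
```

The lowest value comes out pure blue and the highest pure red, with everything else somewhere in between.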

* Some nodes are mentioned in tweets only and therefore have no meta-data associated.