Social media collected over time using keywords, hashtags and accounts associated with a particular geographic community might reflect that community’s main events, topics of discussion, and social interactions. We are interested in evidence for the support of community involvement that the aggregated Web pages and social media might help to create. We collected and analyzed Twitter data related to a geographic area over a two-year period to identify and characterize relevant topics and social interactions, and to evaluate the support for community involvement that such Twitter use might indicate. This kind of data collection has built-in biases, of course, just as local print media or online newsgroups do. We analyzed our data using the open source tool NodeXL to identify topics and their changes over time, and to create social graphs based on retweets and @ mentions that suggest interactions around topics. Our findings show: 1) distinct topics; 2) large and small clusters of social interactions around a variety of topics; 3) patterns suggesting what are called ‘community clusters’ and ‘tight crowd’ types of conversations; and 4) evidence that Twitter supports local community involvement among users. Modeling topics over time and displaying visualizations of social interactions around different topics in a community can offer insights into the important events and issues during a given period. Such visualizations also reveal topics that are hidden (or obscure) because they involve a smaller number of participants, whether government representatives, voluntary associations, or citizens. There is clear evidence that Twitter supports social interaction and informal discussion or exchange around local topics among users, thereby facilitating community involvement.
Introduction
Theoretical framework and prior research
Twitter data from local content aggregator
Methods
Results
Topic modeling with LDA
Social interactions
Social interactions and Twitter conversations
Interactions around the April 16th Memorial
Discussion and conclusion
Web pages and social media generated by and related to a geographic community may reflect many of the local events and social interactions of that place. When aggregated and analyzed over time, these data can capture trends and ongoing exchanges among users, similar to an extended set of local newspaper stories, letters to the editor, opinion editorials or the public comments kept by the town or city hall records. Given the lack of local newspapers and other local media coverage of small towns and city neighborhoods in the U.S., the Internet is becoming the source for local news, information, and citizen exchange. The aggregation and display of such online content and social media exchange among individuals and local organizations, including government, can facilitate users’ awareness of and involvement in community events, interests, and issues (Ahuja, et al., 2009; Cohill and Kavanaugh, 1997; Kavanaugh, et al., 2005; Tauro, et al., 2008).
We have identified and collected syndicated (RSS) feeds of news and information of interest to the general public that have been posted by local organizations, including government, local voluntary associations, and community groups, on their own Web sites. In addition, we identified and collected local-oriented Twitter data, as well as blogs and Facebook content, posted by local organizations and individuals. We collected these data not only to archive and analyze topics and interactions over time, but also to display them through an online content aggregator, that is, an online community Web site that, through RSS subscription, automatically scrapes Web content and displays the daily news, stories, and information for the region and the social commentary around that information. We call this community Web site the Virtual Town Square (VTS) (Kavanaugh, et al., 2010; Kavanaugh, Ahuja, et al., 2014; Kavanaugh, Krishnan, et al., 2014). VTS is no longer active, but was developed and evaluated as a prototype between 2011 and 2015 with a set of fewer than a hundred testers and beta users in a geographic area known as the New River Valley in southwest Virginia.
Local government, like other contributors, not only provided content to VTS through its RSS feed that VTS was subscribed to, but government could also see what community members were interested in, complaining about, or appreciating by reading content, ‘likes’, tags and comments on VTS. Through VTS, local government could ‘hear’ from community members that might not be as vocal or outspoken as others, especially if it was a minority group or opinion, or a topic of more focused interest.
In this paper, we present results from analyses of Twitter data comprising tweets that relate to the New River Valley (NRV). We collected tweets based on the associated account holder, keywords, or the terms or hashtags pertaining to the local area. We analyzed the Twitter data to model topics, build social graphs, measure interactions, and characterize conversations over a two-year period between September 2012 and October 2014.
Our goal in this paper is to present an analysis of the aggregated Twitter data we collected from VTS with a view to identifying the topics, social interactions, and related metadata of users (individuals and organizations). For example, what kinds of topics predominate? What topics are obscure? Are small or obscure conversations and social interactions more discoverable due to visualizations of multi-year Twitter collections? Is there much ‘talk’ about government or civic affairs? Which Twitter users, among organizations and individuals, are the most active?
Our primary research question is: Does Twitter use help support local community involvement? That is, what evidence is there of community-related discussions and information sharing among local Twitter users? Such evidence would support the claim that Twitter use at the local level can enable and support local community involvement.
We draw on the idea of ‘political talk’ (which comes from political participation theory) to guide our analysis of multi-year Twitter data (Kim, et al., 1999). Political talk is one form of political (and civic) participation, specifically, political expression, discussion, and interaction, that occurs typically with members of a person’s social network, but also with local groups, and the larger public, whether off-line or online. Specifically, we are examining not only narrowly ‘political’ talk, but more broadly ‘community’ talk, that is, topics, issues, and interests that are related to the local geographic community. Community talk would encompass many topics, from everyday life, such as weather and recreation, to time-specific events, such as sports competitions or music performances, to long-term planning, such as infrastructure or land use. Community talk can be considered a form of community involvement, just as political talk is a form of political participation. The act of “getting together with other people who know what’s going on in the local community” is one of four measures in Rothenbuhler’s index of community involvement (Rothenbuhler, 1991).
The concept of community involvement has roots in sociology, and more recently, in the field of community informatics (CI). Community informatics is the study of online interaction, content creation, system use, and social impact at the local level (Gurstein, 2000). Most studies of community informatics have shown that the use of these locally focused systems, platforms, and content strengthens democratic processes through information sharing and discussion, raising local awareness and engagement among users (Cohill and Kavanaugh, 1997; Gurstein, 2000; Kavanaugh, et al., 2003; Kavanaugh, et al., 2005; Purcell, 2006; Schuler, 1996). We build on community informatics to examine specifically the use of Twitter, as displayed and archived by our VTS local content aggregator site (discussed below). Online interactions among community members using Twitter may constitute an informal discussion and show some underlying consistency that reflects the involvement of organizations and individuals in the local interests of the geographic area.
Within digital government, community involvement is part of the research on online citizen participation (or e-participation) at the local level. Online citizen participation is part of the larger concept of online (and off-line) civic engagement which has a rich literature. Most studies agree that Internet use has contributed to information sharing and social interaction (Cohill and Kavanaugh, 1997; Hanrahan, et al., 2011; Kavanaugh, et al., 2005; Kavanaugh, Krishnan, et al., 2014; Purcell, 2006; Tauro, et al., 2008). These studies have not addressed, however, how Twitter use might affect local community involvement.
Related to civic engagement research and the work presented in this paper is the concept of weak social ties in geographic communities (Granovetter, 1973; Kavanaugh, et al., 2003; Putnam, 2000). Compared with strong social ties (such as close friends and family), weak social ties (acquaintances) are relations between individuals that are less intimate, frequent, or mutually demanding. Individuals can also act as weak ties when they belong to two distinct groups, such as a church and a sports club. These individuals tend to share information of interest to both groups, thereby helping to disseminate information broadly throughout a community. As a result, communities that have many weak ties across groups are able to raise awareness, stimulate discussion, and solve collective problems more quickly than communities that do not have many weak ties across diverse groups (Putnam, 2000).
In relevant prior studies of social media and local communities, the Livehoods Project (Cranshaw, et al., 2012) has represented the dynamics, structure, and character of areas within the city of Pittsburgh based on social media data (specifically, FourSquare check-ins). Researchers developed an algorithm that maps the geographic areas of the city to suggest their distinguishing social, cultural, and economic characteristics. Unlike the Livehoods Project, we do not seek to characterize the economic, social, and cultural nature of our community. Rather, we explore the social interactions and topics of discussion among local residents, government, and other organizations, as reflected in their tweets. Other arenas in which interaction and discussion topics appear include the Facebook pages of local organizations, and anonymous platforms, such as Yik Yak (now extinct, once popular among colleges and universities in the U.S.). We focused on Twitter data because we sought a data collection based on keyword terms and specific user accounts related to a geographic area, as well as a dataset of user-identified posts that would show social interaction.
In Twitter, people connect to others by following accounts, replying to tweets, retweeting tweets, using the @ symbol to refer to (mention) others by their account name, or using the hashtag symbol (#) to connect with people interested in the same terms or topics. boyd and colleagues (boyd, et al., 2010) have claimed that users who retweet and/or @mention others create a kind of conversation, especially when those behaviors are more than just a one-off action. Retweeting has been characterized as a diffuse conversation as well as a form of information diffusion, similar to link-based blogging (Ahuja, et al., 2009; boyd, et al., 2010; Tauro, et al., 2008). Of the various incentives for using Twitter (Java, et al., 2007), according to boyd, et al. (2010), those more likely to retweet others are users who are trying to engage in conversations or share information.
In a Pew Internet & American Life study of Twitter topic networks, researchers have identified at least six types of conversations in Twitter: Polarized Crowd, Tight Crowd, Brand Clusters, Community Clusters, Broadcast Network, and Support Network (Smith, et al., 2014). We consider these different conversation types in this paper, based on the patterns of our data and the implications of the conversation types for future research on community computing. The Polarized Crowd type has two large groups that have little connection between them. These are common in (separate) political discussions on the same topic. The Tight Crowd type of conversations is characterized by highly interconnected people and few isolated participants. This type is common with professional topics, conferences, and hobby groups or other subjects that attract participants who make up an interest community (for example, the community of shared interest could be around “growing tomatoes” or “artificial intelligence” or it could be anything to do with the geographic community of residents and organizations that make up a city neighborhood, a small town, or rural county). The pattern of communication in this type of ‘conversation’ represents networked learning communities with sharing and mutual support facilitated by social media. The Community Cluster pattern of conversation appears when there are popular topics that attract multiple groups, often forming around a few hubs with their own audiences. This type can represent diverse angles on a subject and a diversity of opinions.
In our prior work related to this research we designed, developed, and evaluated a prototype local aggregator Web site we called the Virtual Town Square (VTS) (Hanrahan, et al., 2011; Kavanaugh, et al., 2010; Kavanaugh, Ahuja, et al., 2014; Kavanaugh, Krishnan, et al., 2014). VTS existed as a prototype between 2012 and 2015 (when National Science Foundation funding ended after 2016, we had to shut down the site due to security breaches and hacks, because we were no longer able to monitor it on a daily basis). Although the Web site is no longer live, all content was archived (Web pages, photos, tweets, and Facebook posts), representing its three-year operational period. This paper is an analysis of Twitter data from that archive.
We gave VTS a list of RSS feeds from local organization Web sites and social media, such as local government sites, community organizations, public Facebook groups, and Twitter accounts of known local users and organizations that post content and information of general public interest. We did not subscribe to any commercial sites or information. VTS refreshed the collection twice daily and displayed the first few sentences of news items, thereby directing users to source pages for further reading (see Figure 1 for a screenshot of VTS).
Anyone could create an account in order to comment and post content directly on VTS. When users were logged in, they could comment on news items and post photos or blog entries directly on the site itself. If they chose to reply to a tweet, the site not only showed the reply on VTS but also reflected it on the original tweet as shown via Twitter.com. In this way, comments and posts appeared not only in the aggregator, but also in the original source page, including Web pages, tweets, and Facebook groups.
The geographic location encompassed by Virtual Town Square is known as the New River Valley (NRV), so called due to the presence of the New River. The NRV spans four rural counties, two towns, and one small city in southwest Virginia. Christiansburg, Blacksburg, and the city of Radford are the three principal municipalities of the Metropolitan Statistical Area (MSA) that encompasses those municipalities and the rural counties of Montgomery, Floyd, Pulaski, and Giles for statistical purposes. The MSA has an estimated population of 159,587 and is currently one of the faster-growing MSAs in Virginia. The total population of this mixed rural-suburban area potentially served by VTS is about 180,000, according to the 2010 U.S. Census (http://www.census.gov/quickfacts). The population that would have been most closely served by VTS is that of Montgomery County and the town of Blacksburg within it, which is home to the land grant university Virginia Polytechnic Institute and State University (Virginia Tech). The 2010 census showed the county population was 94,392 (about 47,000 of whom lived in Blacksburg). The county population was predominantly Caucasian (87.6 percent); 5.4 percent were Asian, 3.9 percent Black or African American, 0.2 percent Native American, and 2.7 percent Hispanic or Latino (U.S. Census, n.d.).
Within Montgomery County, Blacksburg is the most active user population of the VTS due to the presence of Virginia Tech. The town population was 42,620 according to the 2010 Census, of which 23,895, or 60 percent, were college students. The median income for a family in 2010 was US$51,810. By contrast with Blacksburg, the median income for a family in Montgomery County within which the town is located was US$47,239 in 2010. About nine percent of families or 23 percent of the population were below the poverty line, including 14.6 percent of those under age 18 and 8.8 percent of the population age 65 or over.
In identifying Web pages, Facebook groups and Twitter accounts and hashtags for VTS to collect, we used our local knowledge and that of our Virginia Tech collaborators and local partner organizations (Blacksburg town government, Montgomery County government, Citizens First for Blacksburg, Literacy Volunteers of the NRV, and Christiansburg Civic League). Because the Twitter collection was based on local terms, hashtags, and accounts, it encompassed many of the multiple, diverse, interconnected communities that make up the geographic area called the New River Valley.
As noted earlier, modeling the topics over time in a geographic area can provide us with insights into important community events and civic affairs during this period, and local commentary around these events and issues. It also reveals hidden (or obscure) topics and conversations that were emerging in the community, thereby making it easier to discover those conversations, become more familiar with them, and possibly join them.
We used Ruby on Rails to collect tweets by keyword, hashtag, or account name from the New River Valley. The first-year data collection ran from 13 September 2012 to 6 October 2013; the collection was interrupted for three months in the summer (between 11 June 2013 and 14 September 2013) due to server problems. The second-year data collection ran from 28 October 2013 to 31 October 2014. Although we collected tweets in the second year during the three summer months (between 11 June and 14 September 2014), we omitted them in the analyses reported here in order to make the two collections more similar.
We primarily used the open source tool NodeXL to analyze the Twitter data collection (Hansen, et al., 2012; Hansen, et al., 2011), and cross-checked the resulting topics using Latent Dirichlet Allocation (LDA) (Blei and Lafferty, 2009; Blei, et al., 2003). The analysis of Twitter data by topic distinguishes groups based on topical interest from groups based on social affinity (e.g., friends, followers) (Bhattacharya, et al., 2014). We focused in this paper on topical interest groups, not on social affinity as measured by friends and followers. We generated a social graph of user interactions using NodeXL based on retweets and @mentions.
NodeXL is a social media analysis tool that facilitates the exploration and visualization of social network data. Using NodeXL we measured the following: 1) frequencies of total tweet counts over time; 2) changes in tweet counts by hashtag and account over time; 3) centrality (i.e., popularity, influence) of tweets by hashtag and account to indicate popular topics or influential users; and, 4) social interactions among all users during an extended period based on re-tweets and @mentions by accounts. We wrote a Python script to count the number of tweets that contain @ symbols, hashtags (#), retweet symbols (RT), and hyperlinks. In some analyses reported here, we separated Twitter accounts into two types, organizations vs. individuals. While some researchers distinguish a third group, ‘journalists/media bloggers’ we did not, because we did not seek to test differences in this paper based on the ‘authority’ that journalists’ tweets might have offered (de Choudhury, et al., 2012).
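The counting step described above can be illustrated with a minimal sketch. The original script is not reproduced in this paper, so the regular expressions, the function name `count_features`, and the sample tweets below are all assumptions made for illustration only.

```python
import re

# Hypothetical patterns; these are illustrative assumptions, not the original script.
MENTION = re.compile(r"(?<!\w)@\w+")        # @account, but not e-mail addresses
HASHTAG = re.compile(r"(?<!\w)#\w+")        # #topic
RETWEET = re.compile(r"(?:^|\s)RT @\w+")    # classic "RT @account" convention
URL = re.compile(r"https?://\S+")           # embedded hyperlink

def count_features(tweets):
    """Count how many tweets contain each feature at least once."""
    counts = {"mention": 0, "hashtag": 0, "retweet": 0, "url": 0}
    for text in tweets:
        if MENTION.search(text):
            counts["mention"] += 1
        if HASHTAG.search(text):
            counts["hashtag"] += 1
        if RETWEET.search(text):
            counts["retweet"] += 1
        if URL.search(text):
            counts["url"] += 1
    return counts

# Made-up sample tweets, for illustration only.
sample = [
    "RT @BlacksburgStuff: Farmers market opens Saturday #Blacksburg",
    "Go #Hokies! http://example.com/game",
    "Snow expected tonight in the NRV",
    "@vt_fan see you at the game",
]
feature_counts = count_features(sample)
```

A tweet is counted once per feature no matter how many times the feature occurs in it, and a retweet of the "RT @account" form also counts as containing an @ symbol.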
We manually evaluated the topics derived from NodeXL that seemed to predominate within each of the distinct clusters of the social graphs resulting from re-tweets and @mentions. To check our manual examination of topics using NodeXL, we ran probabilistic analyses using a slightly modified version of the classic unsupervised algorithm Latent Dirichlet Allocation (LDA) to perform topic modeling. LDA is one of various algorithms used for topic modeling.
Topic models are powerful tools to identify latent text patterns in content (Blei and Lafferty, 2009). Probabilistic topic models are based on statistical algorithms for discovering latent semantic structures of an extensive text body (Blei, et al., 2003). In a topic model, including LDA, a topic is typically considered a distribution over words and a document is in turn modeled as a distribution over topics. It assumes that documents are generated in two stages: (i) specify a distribution over topics, (ii) to generate words for a given document, sample a topic, and then sample words from the chosen topic’s distribution of words, repeating as necessary (Blei, et al., 2003).
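The two-stage generative process can be made concrete with a small sketch. This is only an illustration of the generative story, not the inference procedure and not the implementation used in this study; the toy topics, their word probabilities, and the function names are hypothetical.

```python
import random

# Hypothetical toy topics: each topic is a distribution over words.
TOPICS = {
    "football": {"hokies": 0.5, "game": 0.3, "stadium": 0.2},
    "elections": {"pulaski": 0.4, "vote": 0.4, "council": 0.2},
}

def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric k-dimensional Dirichlet by normalizing gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, n_words, alpha, rng):
    """Generate one document: (i) draw topic proportions, (ii) draw each word."""
    names = list(topics)
    # Stage (i): the document's own distribution over topics.
    theta = sample_dirichlet(alpha, len(names), rng)
    words = []
    for _ in range(n_words):
        # Stage (ii): pick a topic for this word, then a word from that topic.
        topic = rng.choices(names, weights=theta, k=1)[0]
        vocab = list(topics[topic])
        weights = [topics[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=weights, k=1)[0])
    return theta, words

theta, words = generate_document(TOPICS, n_words=10, alpha=1.0, rng=random.Random(0))
```

Fitting LDA runs this story in reverse: given only the observed words, it estimates the topic-word distributions and the per-document topic proportions that most plausibly generated them.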
The standard modification of the LDA algorithm we applied gave the learning algorithm a pre-conceived notion or a priori knowledge of what obvious topics would appear in the data (e.g., ‘football’) and essentially looked beyond that to find more latent topics (e.g., ‘Pulaski elections’). This ‘weighting’ was executed by forcing the algorithm to reduce the assigned weight it would give in the default implementation to known topics. This gives the algorithm the opportunity to discover “hidden” or not-so-obvious topics. Through this algorithm, we sought to identify diverse topics that were smaller and/or more obscure. We used the entire tweet data for the two-year period of our study (minus the missing data during summer months) as the corpus for analyses using both NodeXL and LDA techniques.
Our first year of Twitter data has a total of 289,760 tweets, published by 125,881 unique Twitter accounts (i.e., users); our second year has a total of 300,147 tweets, published by 146,206 unique users.
Among the 125,881 unique Twitter accounts in the first–year data set, the overwhelming majority (94.6 percent or 119,071 accounts) sent only 1–5 tweets during the nine–month period of the collection; 4.9 percent (N=6,121) posted 6–30 tweets (Figure 2, left side). Less than one percent (0.5 percent) of all accounts (N=689) posted more than 30 tweets in the first year. Very similar proportions emerged from the data in the second year, as well (shown in Figure 2, right side). Specifically, 93.6 percent posted 1–5 tweets total, 5.6 percent posted 6–30 tweets, and only 0.8 percent posted more than 30 tweets during the nine–month period of the second–year. Among the less than one percent that were active Twitter users during both years, many user accounts were the same.
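The grouping of accounts by activity level can be sketched as follows. The function names and the toy author list are assumptions for illustration; the Year 1 bin counts passed to `shares` are the ones reported above.

```python
from collections import Counter

def activity_bins(tweet_authors):
    """Group accounts by how many tweets each one posted (1-5, 6-30, more than 30)."""
    per_account = Counter(tweet_authors)
    bins = {"1-5": 0, "6-30": 0, "31+": 0}
    for n_tweets in per_account.values():
        if n_tweets <= 5:
            bins["1-5"] += 1
        elif n_tweets <= 30:
            bins["6-30"] += 1
        else:
            bins["31+"] += 1
    return bins

def shares(bins):
    """Convert bin counts to percentages of all accounts, rounded to one decimal."""
    total = sum(bins.values())
    if total == 0:
        return {label: 0.0 for label in bins}
    return {label: round(100 * count / total, 1) for label, count in bins.items()}

# Toy input: one author name per tweet (hypothetical accounts).
toy_authors = ["a"] * 3 + ["b"] * 7 + ["c"] * 31 + ["d"]
toy_bins = activity_bins(toy_authors)

# Year 1 bin counts as reported in the text.
year1_shares = shares({"1-5": 119071, "6-30": 6121, "31+": 689})
```

Applied to the reported Year 1 counts, the percentages come out at 94.6, 4.9, and 0.5, which matches the figures given in the text, and the three bins sum to the 125,881 unique accounts.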
The top 20 Twitter accounts ranked by the frequency of tweets range from “Blacksburg Stuff”, the most prolific in both years (6,650 tweets in Year 1 and 7,236 in Year 2), to “Hokie nation” in 20th place (436 tweets in Year 1) and “Radford University” (1,213 tweets in Year 2), as shown in Figure 3. These were all organization accounts, rather than individuals, suggesting that these Twitter accounts were used for public announcements, news, and regular updates.
We examined all the hashtags and used NodeXL to rank them by degree, that is, the number of edges connected to a vertex. The degree of a given hashtag (#) indicates the popularity of that hashtag. The hashtag “#Virginia Tech” (including other common forms) has the highest degree, followed by “#Hokie” and “#Blacksburg” (Figure 4).
Among the 289,760 total tweets in Year 1, 119,542 tweets contain @ (mention) symbol, 93,148 tweets contain hashtags (#), and 94,251 tweets were retweeted (RT) from other Twitter accounts; 124,386 tweets contain a hyperlink or URL. Figure 5 shows the proportion of @, #, RT, and URL in the first year versus the second year data set.
In the second year of Twitter data (Figure 5, on the right), collected from many of the same accounts, terms, and hashtags, as well as new accounts, the proportion of each of these features rose: @ mentions rose from about 30 percent to 55 percent; hashtags (#) rose from about 22 percent to 36 percent; re-tweets (RT) increased from about 22 percent to 45 percent; and, the proportion of URLs embedded in tweets rose from about 35 percent to 55 percent.
We mapped the distribution of tweets by date during the 10 months of our first year collection (Figure 6). It shows a distinct peak in mid-April 2013. When we examined this peak of Twitter activity carefully, we found that the frequency of @ mentions for the Twitter account “MMFlint” was very high and was driving up the peak, with 1,627 @ mentions by other Twitter users. “MMFlint” is an account associated with Michael Moore (filmmaker, activist, living in Flint, Michigan), who posted the tweet: “This is the anniversary week of the Columbine massacre, the Oklahoma City bombing, the Virginia Tech massacre, the Bay of Pigs, Boston Marathon bombing, and Waco.”
Given Michael Moore’s celebrity status and high visibility, he has many Twitter followers. Many of them re-tweeted his tweet to their own followers, leading to a spike in mid-April 2013. In addition, his listing of multiple tragic events in a single tweet seemed to have resonated with diverse audiences, who identified perhaps with only one of the multiple tragic events over that time period (mid-April in multiple years). This made the tweet relevant to a large set of users. In the second year data, there was a peak in mid-April, but nowhere near as striking as in 2013, since there was not a similar tweet (e.g., by a popular celebrity) that was re-tweeted by many followers.
To test our results from NodeXL we also used LDA to model topics over the two-year combined Twitter data set. The topics we obtained using LDA were very similar to the NodeXL results. There was a dominance of Virginia Tech football and other sports, Virginia Tech news, and (Blacksburg) information (Roble, et al., 2014). Using weighted LDA to discount the presence of the most common topics related to football and other sports, we were able to discover latent topics, such as the Pulaski County local elections. Even with weighting, the distribution of top topics was fairly even. Most of the LDA-derived topics were still related to Virginia Tech, Blacksburg, and athletics, so the weighting did not distort the overall distribution but rather made the smaller, hidden topics more discoverable. In addition to the Pulaski elections, there were topics centered around fairly minor shooting incidents (local and non-local specific events). These LDA results helped confirm topics derived from NodeXL analyses.
Another finding from the LDA analysis over the two-year period was that topics varied over time in a “seasonal” pattern. That is, in the autumn, there were more football related topics, and in spring there were more posts around the memorial events of 16 April 2007.
After eliminating outliers in the data, we clustered the whole data set using NodeXL based on relationships among accounts indicated by @mentions and re-tweets (RT) in order to represent social interactions among users. We eliminated organizational accounts to focus on individual account activity, that is, on social interactions among citizens rather than organizations, given the differences in motivation, audiences, and resources. Based on the @mentions and/or re-tweets, Twitter accounts in our data set formed a huge social network (Figure 7). The most active accounts (in terms of interaction) appeared as three big clusters with many smaller clusters. The gray edges between accounts represent a re-tweet and/or @mention among accounts (individuals). Since individuals still referred to organizations with @mentions and re-tweets, organizational accounts appeared in the groups shown in Figure 7, as well as Table 1.
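How such an interaction graph is assembled can be sketched in a few lines. This is an assumption-laden illustration rather than the NodeXL procedure: the function names and sample tweets are hypothetical, and plain connected components stand in for NodeXL's clustering, which uses its own grouping algorithms.

```python
import re
from collections import defaultdict

# "RT @account" retweets are matched too, since they contain an @mention.
MENTION = re.compile(r"(?<!\w)@(\w+)")

def build_edges(tweets, excluded=frozenset()):
    """Return undirected edges between an author and each account they mention."""
    skip = {name.lower() for name in excluded}
    edges = set()
    for author, text in tweets:
        author = author.lower()
        if author in skip:
            continue  # drop tweets authored by excluded (organizational) accounts
        for target in MENTION.findall(text):
            target = target.lower()
            if target != author:
                edges.add(frozenset((author, target)))
    return edges

def degrees(edges):
    """Degree of each account: the number of distinct edges touching it."""
    degree = defaultdict(int)
    for edge in edges:
        for node in edge:
            degree[node] += 1
    return dict(degree)

def components(edges):
    """Connected components, largest first (a simple stand-in for clustering)."""
    adjacency = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(group)
    return sorted(groups, key=len, reverse=True)

# Made-up (author, text) pairs, for illustration only.
sample = [
    ("alice", "RT @BlacksburgStuff: Market opens Saturday"),
    ("bob", "@alice see you there"),
    ("carol", "@dave great game"),
    ("TownNews", "@alice meeting tonight"),
]
edges = build_edges(sample, excluded={"TownNews"})
groups = components(edges)
```

In this sketch an excluded organizational account contributes no edges as an author, but it can still appear in the graph when an individual mentions or retweets it, which mirrors the behavior described above.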
As shown in Figure 7, there were three very large groups in the local Twitter community and several medium-sized and smaller groups intensively connected with the three large groups. We show in Table 1 the top accounts in the two largest groups ranked by the number of edges (@mentions and/or re-tweets). We also show the interests of the account holder based on profile data and/or a manual inspection of the content of their tweets.
In the largest group shown in Figure 7 the top 10 most active users are shown in Table 1, along with the general topics appearing in their tweets over the two-year period of the data collection. These users also appeared in some of the other groups as active twitterers. The general topics of interest, based on a manual examination, relate to sports, particularly football, at Virginia Tech.
Group 2, also one of the three largest in our data, shows some of the same users as Group 1, and some of the same topics (notably football), as indicated in Table 1. But this group also had more diversity of topics, such as Blacksburg, the city of Radford, and NRV area related messages, as well as Virginia Tech news. There was also a Blacksburg town council member in the group who posted tweets related to the council, its agenda items, upcoming meetings, highlights from committees, and other aspects of its work.
Group 3 is also large (Table 2). However, its topics were more diverse and extended beyond sports, including weather, student affairs, relationship problems (girlfriend-boyfriend issues), Pulaski County, Pulaski High School football, and local social activities and entertainment.
Table 2 shows Group 4 and the other groups shown in Figure 7 that are medium-sized and smaller clusters with even more diverse topics, such as church events and activities, elections in neighboring Pulaski County, references to the New River Valley, and other local area information and news.
A relatively small number of users interacting around a hashtag (i.e., specific topic) can add up to a lot of interactions among users, as shown in Table 3. Column A shows the name of the hashtag; column B shows the number of tweets using that hashtag; column C is the number of people who sent tweets with that hashtag; and, column D shows the number of accounts interacting with others using the same hashtag (measured by @ mentions and retweets among these users). Column E shows how many times these users interacted with each other using this hashtag. To clarify whether accounts were held by individuals or organizations, we show in column F the number of hashtags that remained when we deleted organizational accounts; this allowed us to better understand the extent to which individuals were interacting around a given hashtag. Column G is the number of interactions remaining after we deleted organizational accounts. Overall, Table 3 illustrates that there was a great deal of interaction among users around these hashtags or topics.
Based on the idea of six types of Twitter conversations, we suggest that the community communication shown in Figure 7 contains two types: Tight Crowd and Community Cluster. Groups 1, 2, and 3 (G1, G2, G3) clearly show the character of Tight Crowd conversations. The Tight Crowd structure typically contains two to six medium-sized groups. In this structure, the groups are highly interconnected and there are very few isolated groups.
Our three large groups (G1, G2, G3) matched the Tight Crowd conversation pattern, with intensive connections among them. However, on further exploring the group clusters, we found that the community also contained several small groups, which departs from the Tight Crowd pattern. The many small groups at the bottom right of Figure 7 (Groups 4, 5, etc.) were barely connected to other groups, except for a few that were connected with the three large groups.
The conversation type called Community Cluster usually contains very small groups with very few connections between them, as well as many isolated groups. We think the communication among the small groups at the bottom right of Figure 7 forms a Community Cluster pattern of conversation.
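One way to make the contrast between the two patterns concrete is to measure how many edges cross group boundaries and how many groups have no outside connections at all. The sketch below is a hypothetical illustration. The function name, its inputs, and the choice of these two summary numbers are assumptions, not the classification procedure of the six-type framework or of NodeXL.

```python
def group_connectivity(edges, group_of):
    """Return (share of edges crossing groups, groups with no cross-group edge).

    edges: iterable of (source, target) account pairs (retweets or @mentions).
    group_of: dict mapping each account to a group label such as 'G1'.
    """
    edges = list(edges)
    connected = set()   # groups that take part in at least one crossing edge
    crossing = 0
    for a, b in edges:
        ga, gb = group_of[a], group_of[b]
        if ga != gb:
            crossing += 1
            connected.update((ga, gb))
    share = crossing / len(edges) if edges else 0.0
    isolated = sorted(set(group_of.values()) - connected)
    return share, isolated
```

A high crossing share with few isolated groups is closer to the Tight Crowd description, while a low share with many isolated groups is closer to the Community Cluster description. No numeric threshold is implied here.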
For a more focused look at social interactions around a key topic, we examined the hashtag #neVerforgeT, used in honor of the April 16 Memorial held annually to commemorate the tragic shooting at Virginia Tech on that day in 2007. Figure 8 shows the social graph of @mentions and retweets among all accounts in our collection using this hashtag just before 16 April 2014 (Figure 8, left side) and just after 16 April 2014 (Figure 8, right side).
At the center of the graph in Figure 8 (left side, before April 16) is the account @BlacksburgStuff, the self-declared ‘all about Blacksburg’ account established in 2010 whose stated mission is “to connect residents and visitors with the greater Blacksburg community.” It had over 61,000 tweets and 12,000 followers. This ‘average day’ type of pattern, with a few retweets and @mentions among followers, was in stark contrast with the social interactions on the day of 16 April (Figure 9).
There was clearly a large cluster with a high number of interactions on the day of the April 16 Memorial. At the center of the cluster was an account @ears_of_JMU that had been retweeted and @mentioned by a lot of its followers and by followers of its followers. A closer look at the account and the tweet content on that day is shown in Figure 10.
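The shift in which account sits at the center of the graph from one day to the next can be approximated by counting, per day, how often each account is retweeted or @mentioned by others. The sketch below is a minimal illustration under stated assumptions: the field names `date`, `user`, and `text` are hypothetical, retweets are assumed to appear as "RT @account" in the text, and in-degree is used as a simple stand-in for the visual centrality seen in the NodeXL graphs.

```python
import re
from collections import Counter, defaultdict

MENTION = re.compile(r"@(\w+)")

def central_account_by_day(tweets):
    """Map each day to its most retweeted/@mentioned account and that count.

    tweets: iterable of dicts with hypothetical keys 'date', 'user', 'text'.
    Days on which no account is mentioned by another are omitted.
    """
    per_day = defaultdict(Counter)
    for t in tweets:
        author = t["user"].lower()
        for name in MENTION.findall(t["text"]):
            if name.lower() != author:          # ignore self-mentions
                per_day[t["date"]][name.lower()] += 1
    return {day: counts.most_common(1)[0] for day, counts in per_day.items()}
```

Applied to tweets carrying a single hashtag, the result would show whether the most referenced account on a given day differs from the one that is central on an ordinary day.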
The Twitter account @ears_of_JMU at the center of the social graph seems to belong to a college student at James Madison University (JMU), a public Virginia university about a two-hour drive from Virginia Tech. The content of the tweet from @ears_of_JMU on 16 April 2013 read: “New avatar for this week in honor of VT 4/16/07 #neVerforgeT ...” The twitterer added “... S/O to @got_wilk for the design!” The ‘new avatar’ is the JMU logo re-done in the colors of Virginia Tech: orange and maroon. The bulldog logo normally appears in purple and gold. Note that the account @BlacksburgStuff, which was central in the days before the 16 April Memorial, was at the far right, bottom corner of the social graph on 16 April. The logo artwork of the @ears_of_JMU tweet (i.e., modified JMU bulldog in VT colors), combined with its sentiment honoring Virginia Tech, may account for much of its popularity among followers who retweeted and/or @mentioned the original tweet.
We have presented results from analyses of Twitter data collected over a two-year period through a local aggregator site (Virtual Town Square or VTS) developed as a prototype to serve geographic communities that make up the New River Valley in southwest Virginia. The display of this local content provided news and discussion updates, as well as an archive for analysis of longer-term trends, changes over time and social interaction patterns. VTS acted as a sensor for a geographic area, capturing key events, location-specific information, and exchange and informal discussion among users, both individuals and organizations, such as government, voluntary associations, and citizens.
Our analyses of Twitter hashtags, keywords, and local accounts over the two-year period of the study show some underlying consistency in certain topics and reflect some of the key interests and concerns of residents and local organizations, including government agencies. There were clear changes in topics over time based on seasonal activities as well as critical events.
Since the data set was dominated by a few specific topics, such as Virginia Tech football, it was potentially difficult for users of VTS to discover less popular topics. Therefore, in addition to a manual examination of tweet contents among each cluster of users visualized in our social graph, we used a weighted implementation of the LDA algorithm over the entire data set to identify less popular topics. Using weighted LDA, we discovered the same latent topics, such as the Pulaski County local elections, that also showed up in a more time-consuming manual examination of topics within the social graph in NodeXL. The LDA analysis served as a confirmation of topics identified manually.
The social graph in NodeXL shows three very large groups in the Twitter data and several other groups intensively connected with them. We looked closely at the three large groups of users and identified the top accounts in each, that is, those with the most edges. Most of the top accounts in the same group shared the same interests. Several small groups also appeared in the social graph. Some of them were connected to the three large groups, but most were separate from the large groups and had distinct topics. The connections that do exist across the clusters suggest weak social ties in the user community.
The clusters in the social graphs showed patterns consistent with Twitter conversation types called Tight Crowd and Community Clusters. These types have distinct clusters with interconnections among them, as well as medium-sized and smaller clusters with much interaction among participants, even though they are fewer in number. Given that the pattern of communication in the ‘tight crowd’ type of conversation represents networked learning communities with sharing and mutual support facilitated by social media, these conversations at the local level represent a form of community involvement. The ‘community cluster’ pattern of conversation appears when there are popular topics that attract multiple groups, often forming around a few hubs with their own audiences. This type can represent diverse angles on a subject and diversity of opinion, with weak social ties across groups. Given that a community with many weak social ties across groups is more capable of collective problem-solving than a community with few weak ties, a platform such as VTS and social media that support weak ties, such as Twitter, should help to contribute to the collective problem-solving capability of these communities.
Among our key findings is that the number of interactions among Twitter users around a single hashtag can be quite large. This clearly indicates that these users are actively engaged with others in relation to a given subject. These findings support prior work by boyd and colleagues (boyd, et al., 2010) showing that the retweet and @mention functions and interactions in Twitter constitute at least informal discussions, especially when those behaviors are repeated communications (i.e., more than one or two retweets and @mentions of the same account). As boyd argues, those more likely to retweet others are users who are trying to engage in conversations or share information. Engagement in conversations and local information sharing is clearly evident in our data, which points to the impact of Twitter use at the local level to enable and facilitate community involvement through discussion and information sharing.
We expect the predominant discussions and exchanges in a community would be about typical events of everyday life, such as weather, sports activities, and school (as is seen in our data). But the familiar practice of communicating with others routinely on social media about local events and interests makes it easy for users to switch to more ‘civic-oriented’ topics when necessary, such as local elections, zoning controversies, or weather crises. Our data show a lot of exchange around university football and other sports, which are very popular topics in a college town, but there is also clear evidence of interactions about civic concerns, such as county elections and the VT memorial. The greater ease of switching from brief posts and comments about football to brief exchanges about more civic-minded issues, as they arise, is an additional basis for increasing local awareness, collective response, and problem-solving in a community.
Ultimately, these kinds of content aggregators, computational analyses, and visualization tools, when available to local communities, could help increase social interaction and community involvement. With fewer traditional local newspapers, the analytic techniques and visualizations we describe in this paper should make it easier for interested users to find and participate in more localized information sharing and exchange of ideas and opinions, whether about a local sports team or revisions to a comprehensive town plan. When visualized and displayed online, as they are through an aggregator like VTS, the topics and social interactions are easier for users to discover and, as a result, to become aware of and possibly more involved in. Civic technology, such as VTS, and related visualization techniques and interaction tools contribute to community informatics research and to the “e-participation” area of digital government research because they advance our understanding of the use and impact of platforms that are designed to enable and facilitate topic discovery and informal discussion or exchange among government, local community groups, and citizens.
In future work, we are interested in a couple of follow-on studies. We will test some of the analytical tools that have emerged more recently (such as social media integration and participant-based approaches to event summarization, among others) and compare their results with those we obtained using NodeXL and LDA for topic modeling and social graphing of user populations and their local community involvement. We will also investigate the role of opinion leaders among the individual users in our Twitter data, since it is well known that these persons are highly influential and that a predominance of Twitter users are opinion leaders.
Andrea L. Kavanaugh, Ph.D., is a senior research scientist and the associate director of the Center for Human-Computer Interaction at Virginia Tech. E-mail: kavan [at] vt [dot] edu
Ziqian Song is a Ph.D. candidate in the Department of Computer Science at Virginia Tech. E-mail: ziqian [at] vt [dot] edu
We thank the U.S. National Science Foundation (NSF) for supporting the VTS project (SES-1111239) of which this work is part. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF. We would also like to thank our colleagues Manuel Pérez-Quiñones, John Tedesco, Naren Ramakrishnan, and Siddharth Krishnan for their collaboration on the VTS project.
S. Ahuja, M. Pérez-Quiñones, and A. Kavanaugh, 2009. “Rethinking local conversations on the Web,” In: T. Davies and S.P. Gangadharan (editors). Online deliberation: Design, research, and practice. Stanford, Calif.: Center for the Study of Language and Information, pp. 123–129.
P. Bhattacharya, S. Ghosh, J. Kulshrestha, M. Mondal, M.B. Zafar, N. Ganguly, and K.P. Gummadi, 2014. “Deep Twitter diving: Exploring topical groups in microblogs at scale,” CSCW ’14: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, pp. 197–210. doi: https://doi.org/10.1145/2531602.2531636, accessed 7 March 2018.
D. Blei and J. Lafferty, 2009. “Topic models,” In: A. Srivastava and M. Sahami (editors). Text mining: Classification, clustering and applications. London: Chapman & Hall, pp. 71–94; version at http://www.cs.columbia.edu/~blei/papers/BleiLafferty2009.pdf, accessed 7 March 2018.
d. boyd, S. Golder, and G. Lotan, 2010. “Tweet, tweet, retweet: Conversational aspects of retweeting on Twitter,” HICSS ’10: Proceedings of the 2010 43rd Hawaii International Conference on System Sciences, pp. 1–10. doi: https://doi.org/10.1109/HICSS.2010.412, accessed 7 March 2018.
J. Cranshaw, R. Schwartz, J. Hong, and N. Sadeh, 2012. “The Livehoods Project: Utilizing social media to understand the dynamics of a city,” Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media, at https://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4682, accessed 7 March 2018.
M. de Choudhury, N. Diakopoulos, and M. Naaman, 2012. “Unfolding the event landscape on Twitter: Classification and exploration of user categories,” CSCW ’12: Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work, pp. 241–244. doi: https://doi.org/10.1145/2145204.2145242, accessed 7 March 2018.
M. Granovetter, 1973. “The strength of weak ties,” American Journal of Sociology, volume 78, number 6, pp. 1,360–1,380. doi: https://doi.org/10.1086/225469, accessed 7 March 2018.
B. Hanrahan, S. Ahuja, M. Pérez-Quiñones, and A. Kavanaugh, 2011. “Evaluating software for communities using social affordances,” CHI EA ’11: CHI ’11 Extended Abstracts on Human Factors in Computing Systems, pp. 1,621–1,626. doi: https://doi.org/10.1145/1979742.1979818, accessed 7 March 2018.
D.L. Hansen, D. Rotman, E. Bonsignore, N. Milic-Frayling, E.M. Rodrigues, M. Smith, B. Shneiderman, and T. Capone, 2012. “Do you know the way to SNA? A process model for analyzing and visualizing social media data,” SOCIALINFORMATICS ’12: Proceedings of the 2012 International Conference on Social Informatics, pp. 304–313. doi: https://doi.org/10.1109/SocialInformatics.2012.26, accessed 7 March 2018.
D. Hansen, B. Shneiderman, and M. Smith, 2011. Analyzing social media networks with NodeXL: Insights from a connected world. Burlington, Mass.: Morgan Kaufmann.
A. Java, T. Finin, X. Song, and B. Tseng, 2007. “Why we Twitter: Understanding microblogging usage and communities,” WebKDD/SNA-KDD ’07: Proceedings of the Ninth WebKDD and First SNA-KDD 2007 workshop on Web Mining and Social Network Analysis, pp. 56–65. doi: https://doi.org/10.1145/1348549.1348556, accessed 7 March 2018.
A. Kavanaugh, D. Reese, J.M. Carroll, and M.B. Rosson, 2003. “Weak ties in networked communities,” In: M. Huysman, E. Wenger, and V. Wulf (editors). Communities and technologies: Proceedings of the First International Conference on Communities and Technologies, C & T 2003. Boston, Mass.: Kluwer Academic, pp. 265–286.
A. Kavanaugh, J.M. Carroll, M.B. Rosson, T.T. Zin, and D.D. Reese, 2005. “Community networks: Where offline communities meet online,” Journal of Computer-Mediated Communication, volume 10, number 4. doi: https://doi.org/10.1111/j.1083-6101.2005.tb00266.x, accessed 7 March 2018.
A. Kavanaugh, M. Pérez-Quiñones, J.C. Tedesco, and W. Sanders, 2010. “Toward a virtual town square in the Era of Web 2.0,” In: J. Hunsinger, L. Klastrup and M. Allen (editors). International handbook of Internet Research. New York: Springer, pp. 279–294. doi: https://doi.org/10.1007/978-1-4020-9789-8_17, accessed 7 March 2018.
A. Kavanaugh, A. Ahuja, S. Gad, S. Neidig, M.A. Pérez-Quiñones, N. Ramakrishnan, and J. Tedesco, 2014. “(Hyper) local news aggregation: Designing for social affordances,” Government Information Quarterly, volume 31, number 1, pp. 30–41. doi: https://doi.org/10.1016/j.giq.2013.04.004, accessed 7 March 2018.
J. Kim, R.O. Wyatt, and E. Katz, 1999. “News, talk, opinion, participation: The part played by conversation in deliberative democracy,” Political Communication, volume 16, number 4, pp. 361–385. doi: https://doi.org/10.1080/105846099198541, accessed 7 March 2018.
R.D. Putnam, 2000. Bowling alone: The collapse and revival of American community. New York: Simon & Schuster.
B. Roble, J. Cheng, and M. Sbitani, 2014. “Tweets region X,” CS 4624: Hypertext and Multimedia Capstone Course Report, Computer Science, Virginia Tech.
M.A. Smith, L. Rainie, B. Shneiderman, and I. Himelboim, 2014. “Mapping Twitter topic networks: From polarized crowds to community clusters,” Pew Research Center (20 February), at http://www.pewinternet.org/2014/02/20/mapping-twitter-topic-networks-from-polarized-crowds-to-community-clusters/, accessed 7 March 2018.
C. Tauro, S. Ahuja, M. Pérez-Quiñones, A. Kavanaugh, and P. Isenhour, 2008. “Deliberation in the wild: A visualization tool for blog discovery and citizen-to-citizen participation,” dg.o ’08: Proceedings of the 2008 International Conference on Digital Government Research, pp. 143–152.
U.S. Census Bureau, n.d. “QuickFacts,” at http://www.census.gov/quickfacts, accessed 7 March 2018.
Engaging a community through social media-based topics and interactions by Andrea L. Kavanaugh and Ziqian Song. First Monday, Volume 23, Number 4 - 2 April 2018 http://journals.uic.edu/ojs/index.php/fm/article/view/8146/6660 doi: http://dx.doi.org/10.5210/fm.v23i4.8146