SNAP/twitter7

SNAP network: twitter follower network
Name twitter7
Group SNAP
Matrix ID 2796
Num Rows 41,652,230
Num Cols 41,652,230
Nonzeros 1,468,365,182
Pattern Entries 1,468,365,182
Kind Directed Graph
Symmetric No
Date 2010
Author H. Kwak, C. Lee, H. Park, S. Moon
Editor J. Leskovec
Structural Rank
Structural Rank Full
Num Dmperm Blocks
Strongly Connect Components 8,044,728
Num Explicit Zeros 0
Pattern Symmetry 36.2%
Numeric Symmetry 36.2%
Cholesky Candidate no
Positive Definite no
Type binary
Notes
SNAP (Stanford Network Analysis Platform) Large Network Dataset Collection,
Jure Leskovec and Andrej Krevl, http://snap.stanford.edu/data, June 2014.
email: jure at cs.stanford.edu                                             
                                                                           
476 million Twitter tweets                                                 
                                                                           
Dataset information                                                        
                                                                           
476 million Twitter posts from 20 million users covering a 7-month period
from June 1 2009 to December 31 2009. We estimate this is about 20-30% of  
all public tweets published on Twitter during the particular time frame.   
                                                                           
For each public tweet the following information was available:             
                                                                           
    Author                                                                 
    Time                                                                   
    Content                                                                
                                                                           
We have no Twitter social graph (who-follows-whom graph) available. You can
find a copy of the graph at http://an.kaist.ac.kr/traces/WWW2010.html      
(thanks to Haewoon Kwak, et al.).                                          
                                                                           
Dataset statistics                                                         
Number of users      17,069,982                                            
Number of tweets    476,553,560                                            
Number of URLs      181,611,080                                            
Number of Hashtags   49,293,684                                            
Number of re-tweets  71,835,017                                            
                                                                           
Source (citation)                                                          
J. Yang, J. Leskovec. Temporal Variation in Online Media. ACM Intl.        
Conf. on Web Search and Data Mining (WSDM '11), 2011.                      
                                                                           
As per request from Twitter the data is no longer available.               
                                                                           
-------------------------------------------------------------------------  
http://an.kaist.ac.kr/traces/WWW2010.html :                                
-------------------------------------------------------------------------  
                                                                           
What is Twitter, a Social Network or a News Media?                         
                                                                           
Haewoon Kwak (http://an.kaist.ac.kr/~haewoon),                             
Changhyun Lee (http://an.kaist.ac.kr/~chlee),                              
Hosung Park (http://an.kaist.ac.kr/~hosung),                               
and Sue Moon (http://an.kaist.ac.kr/~sbmoon)                               
                                                                           
Proceedings of the 19th International World Wide Web (WWW) Conference,     
April 26-30, 2010, Raleigh NC (USA)                                        
                                                                           
Twitter, a microblogging service less than three years old, commands more  
than 41 million users as of July 2009 and is growing fast. Twitter users   
tweet about any topic within the 140-character limit and follow others to  
receive their tweets. The goal of this paper is to study the topological   
characteristics of Twitter and its power as a new medium of information    
sharing.                                                                   
                                                                           
We have crawled the entire Twitter site and obtained 41.7 million user     
profiles, 1.47 billion social relations, 4,262 trending topics, and 106    
million tweets. In its follower-following topology analysis we have found a
non-power-law follower distribution, a short effective diameter, and low   
reciprocity, which all mark a deviation from known characteristics of human
social networks [Newman03]. In order to identify influentials on
Twitter, we have ranked users by the number of followers and by PageRank   
and found two rankings to be similar. Ranking by retweets differs from the 
previous two rankings, indicating a gap in influence inferred from the     
number of followers and that from the popularity of one's tweets. We have  
analyzed the tweets of top trending topics and reported on their temporal  
behavior and user participation. We have classified the trending topics    
based on the active period and the tweets and show that the majority (over 
85%) of topics are headline news or persistent news in nature. A closer    
look at retweets reveals that any retweeted tweet is to reach an average of
1,000 users no matter what the number of followers is of the original      
tweet. Once retweeted, a tweet gets retweeted almost instantly on next     
hops, signifying fast diffusion of information after the 1st retweet.      
                                                                           
To the best of our knowledge this work is the first quantitative study on  
the entire Twittersphere and information diffusion on it.                  
                                                                           
http://an.kaist.ac.kr/~haewoon/papers/2010-www-twitter.pdf                 
                                                                           
@inproceedings{Kwak10www,                                                  
author = {Kwak, Haewoon and Lee, Changhyun and Park, Hosung and            
    Moon, Sue},                                                            
title = "{W}hat is {T}witter, a social network or a news media?",          
booktitle = {WWW '10: Proc. the 19th Intl. Conf. on World Wide Web},       
year = {2010},                                                             
isbn = {978-1-60558-799-8},                                                
pages = {591--600},                                                        
location = {Raleigh, North Carolina, USA},                                 
doi = {http://doi.acm.org/10.1145/1772690.1772751},                        
publisher = {ACM},                                                         
address = {New York, NY, USA},                                             
}                                                                          
                                                                           
Slides:                                                                    
https://www.slideshare.net/haewoon/what-is-twitter-a-social-network-or-a-news-media-3922095
                                                                           
Due to Twitter's new Terms of Service, we can no longer share data
containing tweets.  (For more information, read RWW's article "How Recent
Changes to Twitter's Terms of Service Might Hurt Academic Research".)
                                                                           
Social graph                                                               
                                                                           
Format                                                                     
USER \t FOLLOWER \n                                                        
* USER and FOLLOWER are represented by numeric ID (integer).               
* These numeric IDs are the same as numeric IDs Twitter managed.           
* For details, see Twitter API Page                                        
                                                                           
Example                                                                    
12      13                                                                 
12      14                                                                 
12      15                                                                 
16      17                                                                 
* Users 13, 14 and 15 are followers of user 12.                            
* User 17 is a follower of user 16.                                        
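
As an illustration of the format above, the following is a minimal Python
sketch that reads USER/FOLLOWER lines and groups followers by user.  The
function names parse_edges and followers_by_user are hypothetical and not
part of the original data release; the sample lines are the four edges from
the example above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_edges(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse 'USER<tab>FOLLOWER' lines into (user, follower) integer pairs."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, follower = line.split()  # splits on the tab separator
        edges.append((int(user), int(follower)))
    return edges


def followers_by_user(edges: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group follower IDs under the user they follow."""
    grouped = defaultdict(list)
    for user, follower in edges:
        grouped[user].append(follower)
    return dict(grouped)


# The four example edges from the format description.
sample = ["12\t13", "12\t14", "12\t15", "16\t17"]
edges = parse_edges(sample)
followers = followers_by_user(edges)
print(followers)  # {12: [13, 14, 15], 16: [17]}
```

For the full graph (about 1.47 billion edges), holding Python tuples in
memory is unlikely to be practical; the sketch is only meant to show how
each line is interpreted.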
                                                                           
Restricted user profiles (> 10,000 followers):                             
celebrities_profiles.txt                                                   
                                                                           
Format                                                                     
numeric_id \t verified \t profile_sidebar_fill_color \t profile_text_color 
\t followers_count \t protected \t location \t profile_background_color \t 
utc_offset \t statuses_count \t description \t friends_count \t            
profile_link_color \t profile_image_url \t notifications \t                
profile_background_image_url \t screen_name \t profile_background_tile \t  
favourites_count \t name \t url \t created_at \t time_zone \t              
profile_sidebar_border_color \t following \t gender (inferred by name) \n
                                                                           
* All fields except gender are returned by the user method of the Twitter API

* For a description of each field, see the Returns Values page in the
    Twitter API Wiki
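
One way to read a record in this format is sketched below in Python.  The
field order is copied from the format description above; the function name
parse_profile and the sample record are hypothetical.  The sketch assumes
that no field value itself contains a tab, which the original description
does not state, and it keeps every value as a raw string because the
encoding of the individual fields is not documented here.

```python
from typing import Dict

# Field order copied from the format description above.
FIELDS = [
    "numeric_id", "verified", "profile_sidebar_fill_color",
    "profile_text_color", "followers_count", "protected", "location",
    "profile_background_color", "utc_offset", "statuses_count",
    "description", "friends_count", "profile_link_color",
    "profile_image_url", "notifications", "profile_background_image_url",
    "screen_name", "profile_background_tile", "favourites_count", "name",
    "url", "created_at", "time_zone", "profile_sidebar_border_color",
    "following", "gender",
]


def parse_profile(line: str) -> Dict[str, str]:
    """Split one tab-separated record into a field-name -> raw-string map."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# Hypothetical record: placeholder values only, not real profile data.
sample_line = "\t".join(f"value{i}" for i in range(len(FIELDS))) + "\n"
profile = parse_profile(sample_line)
print(profile["numeric_id"], profile["gender"])  # value0 value25
```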
                                                                           
---------------------------------------------------------------------------
Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:      
---------------------------------------------------------------------------
                                                                           
The SNAP data set is 1-based.  In the SuiteSparse Matrix Collection, the
celebrity profiles are stored in Problem.aux.*, using the same names as    
above, except for the first ("numeric_id" is replaced with "celebrity_id").
The gender is excluded from the SuiteSparse Matrix Collection since it is  
only inferred from first names, and not part of the original Twitter data. 
Inferring gender from first names is very error prone.                     
                                                                           
The userids range from 12 to 61,578,414, but 19,926,184 of those userids
have no followers and follow no one.  The graph Problem.A only holds
the Twitter follower graph for non-empty nodes.  A(i,j)=1 if the user with
nodeid(i) is followed by the user with nodeid(j).  For example, the first
two non-empty userids are nodeid(1)=12 and nodeid(2)=13, and the first edge
is (12,13).  Thus A(1,2)=1 since nodeid(1)=12 is followed by nodeid(2)=13.
The userids of the nodes in the graph are held in Problem.aux.nodeid.
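
The renumbering described above can be sketched in Python as follows.  This
is an illustration only, not the script used to build the collection: the
function name build_compact_graph is hypothetical, and the sketch assumes
that nodeid lists the non-empty userids in ascending order, which is
consistent with nodeid(1)=12 and nodeid(2)=13 but is not stated explicitly.
The sample edges are the four (USER, FOLLOWER) pairs from the SNAP format
example.

```python
from typing import List, Tuple


def build_compact_graph(
    edges: List[Tuple[int, int]],
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Renumber userids that appear in at least one edge to 1..n.

    Returns (nodeid, entries), where nodeid[i - 1] is the original userid
    of node i, and each (i, j) in entries means A(i,j)=1, i.e. the user
    nodeid(i) is followed by the user nodeid(j).
    """
    # Non-empty nodes are those appearing in any edge (assumed sorted).
    nodeid = sorted({userid for edge in edges for userid in edge})
    # 1-based index, matching the A(i,j) notation used above.
    index = {userid: i for i, userid in enumerate(nodeid, start=1)}
    entries = [(index[user], index[follower]) for user, follower in edges]
    return nodeid, entries


edges = [(12, 13), (12, 14), (12, 15), (16, 17)]
nodeid, entries = build_compact_graph(edges)
print(nodeid)   # [12, 13, 14, 15, 16, 17]
print(entries)  # [(1, 2), (1, 3), (1, 4), (5, 6)]
```

The first entry (1, 2) corresponds to A(1,2)=1 in the example above.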