REVEAL Results Vol. 5: Topic modelling and more
Vol. 5 in our series about REVEAL results is out. Here, the University of Koblenz outlines what it has been working on over the past few years, ranging from topic modelling to user role analysis.
Promoss – a Topic Modelling Toolbox
Promoss is a topic modelling toolbox able to handle a variety of social media contexts, such as geographical locations, timestamps or ordinal variables. It is free software developed by the Institute for Web Science and Technologies at the University of Koblenz-Landau and GESIS, the Leibniz Institute for the Social Sciences in Cologne. Promoss implements Latent Dirichlet Allocation (LDA) with an efficient online stochastic variational inference scheme, meaning that memory consumption is lower than for standard implementations and inference is significantly sped up.
Get the code and further information on
http://topicmodels.west.uni-koblenz.de/#toolbox and
https://github.com/ckling/promoss.
Topic Model Tutorial
If you are interested in topic modelling in general, here you can find a basic introduction to topic modelling for web scientists. The tutorial explains the fundamental mechanisms and ideas behind topic modelling, avoiding formal notation unless necessary, and teaches the intuition and assumptions behind topic models. Topic models explain co-occurrences of words in documents with sets of semantically related words, called topics. These topics are semantically coherent and can be interpreted by humans.
Starting with the most popular topic model, Latent Dirichlet Allocation (LDA), the fundamental concepts of probabilistic topic modelling are explained. The tutorial is organised as follows: after a general introduction, participants develop an intuition for the underlying concepts of probabilistic topic models. Building on this intuition, the technical foundations of topic models are covered, including graphical models and Gibbs sampling. The tutorial concludes with an overview of the most relevant adaptations and extensions of LDA.
Find the tutorial at http://topicmodels.west.uni-koblenz.de/#tutorial
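To make the Gibbs-sampling idea from the tutorial concrete, below is a minimal collapsed Gibbs sampler for LDA in Python. This is an illustrative sketch only, not the tutorial's or Promoss's code (Promoss uses variational inference); the function name, hyperparameter defaults and the representation of documents as lists of word ids are all choices made for this example:

```python
import random

def lda_gibbs(docs, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA.
    docs: list of documents, each a list of word ids 0..V-1.
    Returns (phi, ndk): topic-word distributions and doc-topic counts."""
    rng = random.Random(seed)
    V = max(w for d in docs for w in d) + 1
    ndk = [[0] * K for _ in docs]        # doc-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts
    nk = [0] * K                         # tokens per topic
    z = []                               # topic assignment per token
    # random initialisation of topic assignments
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(K)
            zs.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # remove the token's current assignment from the counts
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional p(z = k | all other assignments)
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                           / (nk[j] + V * beta) for j in range(K)]
                r = rng.random() * sum(weights)
                for j, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = j
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    # smoothed topic-word distributions
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return phi, ndk
```

On a toy corpus such as `[[0, 0, 1, 1], [2, 2, 3, 3], [0, 1, 0, 1], [2, 3, 2, 3]]` with `K=2`, the sampler tends to separate the `{0, 1}` and `{2, 3}` vocabulary into two topics, which is exactly the "co-occurring words form topics" intuition the tutorial builds.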
*******************
In the area of user role analysis, the Institute for Web Science and Technologies has produced the following:
Role Analysis Toolboxes
Work here was guided by the question of how people's social roles can be recognised in a completely unlabelled social network. Useful toolboxes for user-role analysis in social networks are provided. They can extract features from the networks and analyse user roles dynamically.
Code on https://github.com/gottron/community-interaction and
https://github.com/Institute-Web-Science-and-Technologies/reveal-role-dynamic
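As a hypothetical illustration of the kind of per-user features such a toolbox extracts before detecting roles (the actual feature sets of the linked repositories may differ), the sketch below computes in-degree, out-degree and reciprocity from a directed edge list:

```python
from collections import defaultdict

def role_features(edges):
    """Compute simple per-user features from a directed interaction
    network given as (source, target) pairs. Degrees and reciprocity
    are typical inputs to unsupervised role detection."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    reciprocal = defaultdict(int)
    edge_set = set(edges)
    for a, b in edges:
        out_deg[a] += 1
        in_deg[b] += 1
        if (b, a) in edge_set:          # b also wrote to a
            reciprocal[a] += 1
    users = set(out_deg) | set(in_deg)
    return {u: {"in": in_deg[u],
                "out": out_deg[u],
                "reciprocity": reciprocal[u] / out_deg[u] if out_deg[u] else 0.0}
            for u in users}
```

Given such feature vectors, users with similar values can then be grouped into candidate roles (e.g. by clustering), without any labels on the network.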
Wiki-Talk Dataset
Here, user interaction networks were extracted from the 28 most popular language editions of Wikipedia. Nodes (original Wikipedia user IDs) represent Wikipedia users, and an edge from user A to user B denotes that user A wrote a message on the talk page of user B at a certain timestamp. Users are assigned different “roles” to allow further analysis. Furthermore, open-source software to extract user interaction networks from arbitrary Wikipedia data is provided.
Code and more on http://doi.org/10.5281/zenodo.49561 and
https://github.com/yfiua/wiki-talk-parser
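Assuming the parsed interactions come as (writer, talk-page owner, timestamp) triples (an assumption made for illustration; see the parser repository for its actual output format), a timestamped interaction network of this kind can be assembled as follows:

```python
from collections import defaultdict

def build_talk_network(records):
    """Build a directed, timestamped interaction network from
    (writer, owner, timestamp) records. An edge A -> B means user A
    wrote on the talk page of user B; each edge keeps the sorted
    timestamps of all messages between the pair."""
    edges = defaultdict(list)            # (A, B) -> timestamps
    for writer, owner, ts in records:
        edges[(writer, owner)].append(ts)
    for timestamps in edges.values():
        timestamps.sort()                # chronological order per edge
    return edges
```

Keeping all timestamps per edge, rather than collapsing them, is what makes dynamic analyses (e.g. how a user's role changes over time) possible on this data.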
*******************
Possibly also of interest in this context:
KONECT – The Koblenz Network Collection
KONECT (the Koblenz Network Collection) is a project to collect large network datasets of all types in order to perform research in network science and related fields. Data is collected by the Institute of Web Science and Technologies at the University of Koblenz–Landau.
KONECT contains several hundred network datasets of various types, including directed, undirected, bipartite, weighted, unweighted, signed and rating networks. The networks of KONECT cover many diverse areas such as social networks, interaction networks, and communication networks. The KONECT project has developed free network analysis software used to compute network statistics, to draw plots and to implement various link prediction algorithms. The results of these analyses are presented on the pages linked below. Whenever possible, a download of the respective network is provided.
More at http://konect.uni-koblenz.de/
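As an example of the link-prediction baselines such software implements, the following sketch scores non-adjacent node pairs of an undirected graph by their number of common neighbours, one of the classic baselines (the KONECT software implements a range of further algorithms; the graph representation here is chosen for the example):

```python
from itertools import combinations

def common_neighbour_scores(adj):
    """Rank non-adjacent node pairs of an undirected graph by the
    common-neighbours link-prediction score.
    adj: dict mapping each node to its set of neighbours."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:                  # only predict missing edges
            s = len(adj[u] & adj[v])         # shared neighbours
            if s:
                scores[(u, v)] = s
    # highest-scoring pairs are the most likely future edges
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

The intuition: two nodes that already share many neighbours are likely to become connected themselves, so the top-ranked pairs serve as predictions.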
KONECT Software
The KONECT Toolbox is a toolbox for the Matlab programming language for analysing large networks. You can use the KONECT Toolbox to compute statistics of graphs such as the clustering coefficient, the diameter, the assortativity, sizes of connected components, power laws, etc. You can also use it for generating degree distribution plots, spectral plots, clustering coefficient distribution plots, etc. All statistics and plots on the website were generated with the KONECT Toolbox.
More at https://github.com/kunegis/konect-toolbox
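The toolbox itself is written in Matlab; purely as an illustration of one of the statistics it computes, here is a Python sketch of the local clustering coefficient (the graph representation is chosen for this example, not taken from the toolbox):

```python
def local_clustering(adj):
    """Local clustering coefficient per node of an undirected graph:
    the fraction of a node's neighbour pairs that are themselves
    connected. adj: dict mapping each node to its set of neighbours."""
    coeff = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeff[v] = 0.0               # undefined; report 0 by convention
            continue
        # count edges among the neighbours (each pair once, via u < w)
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        coeff[v] = 2.0 * links / (k * (k - 1))
    return coeff
```

For instance, in a triangle with one pendant node attached, the triangle's two degree-2 nodes get coefficient 1.0, while the node carrying the pendant gets 1/3, since only one of its three neighbour pairs is connected.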
KONECT-Analysis uses the above-mentioned toolbox and allows analyses of KONECT networks to be run in parallel.
Get the code at https://github.com/kunegis/konect-analysis
Contact
If you would like to provide feedback on the work presented above, it is most welcome. We appreciate your comments: please get in touch with Jérôme Kunegis, Jun Sun and/or Steffen Staab at the Institute of Web Science and Technologies at the University of Koblenz–Landau.
Notice
All services and demos linked above are operated by the Institute of Web Science and Technologies at the University of Koblenz–Landau. They provide the services, as well as access to code and data, “as is” and for demonstration purposes only, without assuming any responsibility or liability of any nature. Neither the REVEAL consortium nor the hosts of the REVEAL website are liable for anything resulting from use of the above demos or code.