
REVEAL | March 24, 2017

REVEAL Results Vol. 2: research outcomes of NCSR’D

Jochen Spangenberg

Here’s the next article (Vol. 2) in our series about REVEAL results. In what follows, NCSR’D-IIT (the National Center for Scientific Research “Demokritos” – Institute of Informatics and Telecommunications, Athens) provide information about what they have achieved in the project. The focus of NCSR’D’s work was on community issues, stylometry and relation extraction.

Detecting Communities in Complex Networks

NCSR-D Community Detection


In social networks, people form implicit communities by virtue of their behaviour. To determine those communities, user interactions are often considered: people who interact often are placed in the same community. We extended this notion of community by taking into account further similarity criteria related to the topics being discussed. In particular, we used tags and named entities (i.e. names of places, people and organisations) as well as people’s interactions to discover communities.

The code for this work can be downloaded from https://github.com/iit-Demokritos/hypergraph-community-density. Documentation is provided in Deliverable D2.3 “Contributor modeling modules” (available on the project web site once it is approved for publication, expected in late Q1/2017).
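To make the idea of combining interactions with topical similarity more concrete, here is a minimal sketch. It is not the method implemented in the repository above: the weights, the threshold and the use of plain connected components are assumptions made purely for illustration, as are the user names and data.

```python
from collections import defaultdict
from itertools import combinations


def similarity(a, b, interactions, w_interact=1.0, w_tags=0.5, w_entities=0.5):
    """Combine interaction counts with tag and entity overlap (hypothetical weights)."""
    pair = frozenset((a["name"], b["name"]))
    shared_tags = len(a["tags"] & b["tags"])
    shared_entities = len(a["entities"] & b["entities"])
    return (w_interact * interactions.get(pair, 0)
            + w_tags * shared_tags
            + w_entities * shared_entities)


def detect_communities(users, interactions, threshold=1.0):
    """Link users whose similarity reaches the threshold, then return connected components."""
    neighbours = defaultdict(set)
    for a, b in combinations(users, 2):
        if similarity(a, b, interactions) >= threshold:
            neighbours[a["name"]].add(b["name"])
            neighbours[b["name"]].add(a["name"])
    seen, communities = set(), []
    for user in users:
        if user["name"] in seen:
            continue
        stack, component = [user["name"]], set()
        while stack:  # depth-first walk over the similarity links
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        communities.append(component)
    return communities


# Made-up example data: ann and bob never interact but share a tag and an entity.
users = [
    {"name": "ann", "tags": {"#syria"}, "entities": {"Aleppo"}},
    {"name": "bob", "tags": {"#syria"}, "entities": {"Aleppo"}},
    {"name": "cat", "tags": {"#football"}, "entities": {"FIFA"}},
    {"name": "dan", "tags": {"#tennis"}, "entities": set()},
]
interactions = {frozenset(("cat", "dan")): 3}
communities = detect_communities(users, interactions)
```

In this toy data, ann and bob end up in one community because of what they talk about, while cat and dan are grouped because of how often they interact.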

Topic sensitive influence estimation

This module estimates the influence of users with respect to the topics of discussion. In particular, a structural (PageRank-like) measure and topic detection techniques are combined with the aid of machine learning models to produce an influence estimate for each user. Each user is thus assigned a score from 0 to 1 (the higher the number, the more influential the user), as well as some keywords that represent the topic in which he/she is influential. All this information can be visualized so that a researcher or a journalist can filter it according to multiple criteria.

Documentation is provided in Deliverable D2.3 “Contributor modeling modules” on the project web site (available once it is approved for publication, expected in late Q1/2017) and in: Katsimpras, G., Vogiatzis, D. and Paliouras, G., 2015. Determining influential users with supervised random walks. In: Proceedings of the 24th International Conference on World Wide Web (pp. 787-792). ACM.
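As a rough illustration of the structural part of the influence score described in this section, the sketch below runs a plain PageRank by power iteration and rescales the result so that the top user scores 1. It leaves out the supervised machine learning step entirely, and the hashtag counting is only a stand-in for real topic detection. The function names, the edge direction convention and the sample data are all assumptions.

```python
from collections import Counter


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [t for s, t in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node in nodes:
            targets = out_links[node] or nodes  # dangling nodes spread rank evenly
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def influence_scores(edges):
    """Rescale PageRank so that the most influential user gets a score of 1."""
    rank = pagerank(edges)
    top = max(rank.values())
    return {user: value / top for user, value in rank.items()}


def topic_keywords(posts, k=2):
    """Stand-in for topic detection: the k most frequent hashtags in a user's posts."""
    tags = Counter(w.lower() for post in posts for w in post.split() if w.startswith("#"))
    return [tag for tag, _ in tags.most_common(k)]


# Assumed convention: an edge (a, b) means "a retweets or mentions b".
edges = [("ann", "cat"), ("bob", "cat"), ("dan", "cat"), ("cat", "ann")]
scores = influence_scores(edges)
keywords = topic_keywords(["#Syria update from Aleppo", "More on #syria and #aid"])
```

Here cat, who is mentioned by everyone else, receives the top score, and ann ranks above bob and dan because cat mentions her in turn.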

Community Evolution Prediction

As users continuously interact in online social networks, highly dynamic communities are formed which evolve over time. Community evolution prediction concerns the application of machine learning models to forecast the evolution of a community in the near future, based on the current characteristics exhibited by the community. The community evolution prediction tool allows users to see whether communities appearing in Twitter streams will grow or shrink, continue as they are, or completely disappear. Such information can help end users make informed decisions on whether the topic discussed by a community is likely to attract more attention in the future, or whether interest in it will decline.

Documentation is provided in Deliverable D2.3 “Contributor modeling modules” on the project web site (available once it is approved for publication, expected in late Q1/2017) and in: Karna, D., Diakidis, G., Fassarakis-Hilliard, D., Vogiatzis, D. and Paliouras, G., “Predicting the Evolution of Communities in Social Networks”. In: Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics. Limassol, Cyprus, 2015.
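The four outcomes named in this section (grow, shrink, continue, disappear) can be made concrete by labelling what happened to a community between two consecutive time windows. The sketch below only produces such labels, which are the kind of target a trained model would then predict from a community's current characteristics. It is not the prediction tool itself, and the `tolerance` band is a hypothetical parameter.

```python
def evolution_event(before, after, tolerance=0.1):
    """Label the change of a community between two time windows.

    `before` and `after` are sets of member ids. A relative size change
    within +/- `tolerance` (an assumed value) counts as "continue".
    """
    if not before:
        raise ValueError("the community must have members in the first window")
    if not after:
        return "dissolve"
    change = (len(after) - len(before)) / len(before)
    if change > tolerance:
        return "grow"
    if change < -tolerance:
        return "shrink"
    return "continue"


# Made-up membership of one community over five time windows.
snapshots = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "d", "e", "f"},
    {"a", "b", "c", "d", "e", "f"},
    {"a", "b"},
    set(),
]
events = [evolution_event(s, t) for s, t in zip(snapshots, snapshots[1:])]
```

Pairing each window's community features with the label of the following transition would give the training examples for a classifier.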

Trustworthiness estimation of Twitter posts

Trustworthiness of Twitter posts is estimated by considering the social position of the user who made the post, as well as various measures related to the content of the post. Trustworthiness can be low, medium or high. In particular, three factors contribute to the trustworthiness of a post. First, the influence of the user is extracted by considering structural measures. Second, the topic of the post is extracted with computational linguistic techniques. Third, the Twitter lists in which the user participates are retrieved. The general idea is that a post from an influential user that is closely related to the lists he/she participates in is deemed trustworthy.

Documentation is provided in Deliverable D4.3 “Context Analysis Toolbox” on the project web site (available once it is approved for publication, expected in late Q1/2017).
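One simple way to picture how the three factors from this section might interact is shown below. This is a sketch under stated assumptions, not the module's actual scoring: multiplying influence by topic overlap, and the `low` and `high` cut-offs, are hypothetical choices, and topics are reduced to plain keyword sets.

```python
def topic_list_overlap(post_topics, list_topics):
    """Fraction of the post's topic keywords that also describe the user's lists."""
    if not post_topics:
        return 0.0
    return len(post_topics & list_topics) / len(post_topics)


def trustworthiness(influence, post_topics, list_topics, low=0.2, high=0.5):
    """Map influence (0 to 1) and topic/list agreement to low, medium or high.

    The product and both thresholds are assumptions made for illustration.
    """
    score = influence * topic_list_overlap(post_topics, list_topics)
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


# Made-up cases: same influential user, on-topic and partly off-topic posts.
on_topic = trustworthiness(0.9, {"syria", "aid"}, {"syria", "aid", "news"})
mixed = trustworthiness(0.9, {"syria", "football"}, {"syria"})
unknown_user = trustworthiness(0.1, {"syria"}, {"syria"})
```

With these assumed thresholds, the influential user's on-topic post is rated high, the partly off-topic post medium, and the post from a user with little influence low.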

Relation Extraction

NCSR-D Relation Extraction Module


This module has a web-based user interface that allows for fast and advanced querying of relations extracted from tweets. Given a collection of tweets, this module creates a knowledge base of the important relations found in the collection. Specifically, it extracts the named entities found (i.e. people, places and organisations) and the relations between them. Moreover, it performs a semantic grouping of the named entities, thus clustering contextually similar words and relations (e.g. Obama and USA). This allows for semantically richer query results that exploit the inherent connections between the named entities, leading to more efficient knowledge mining and information extraction.

Documentation is provided in Deliverable D3.3 “Multimedia Forensic Analysis” on the project web site (available there once it is approved for publication, expected in late Q1/2017). For a demonstration see: http://relex.iit.demokritos.gr/merge/syria/USA%20invades%20Iraq/collection/gsubs
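To illustrate why semantic grouping gives richer query results, the following sketch stores relations as (subject, relation, object) triples and optionally expands a query to every entity in the same group. The `RelationStore` class, its methods, the group label and the triples are all hypothetical. The real module extracts entities, relations and groups from tweets automatically, which this sketch does not attempt.

```python
class RelationStore:
    """Tiny in-memory knowledge base of (subject, relation, object) triples."""

    def __init__(self, groups=None):
        # Map each entity to the label of its semantic group, if it has one.
        self.group_of = {}
        for label, members in (groups or {}).items():
            for member in members:
                self.group_of[member] = label
        self.triples = []

    def add(self, subject, relation, obj):
        self.triples.append((subject, relation, obj))

    def query(self, entity, expand=True):
        """Return triples mentioning the entity or, if expand is set, its whole group."""
        wanted = {entity}
        if expand and entity in self.group_of:
            label = self.group_of[entity]
            wanted = {e for e, g in self.group_of.items() if g == label}
        return [t for t in self.triples if t[0] in wanted or t[2] in wanted]


# Made-up triples; the grouping of Obama and USA follows the example in the text.
store = RelationStore(groups={"us_politics": {"Obama", "USA"}})
store.add("Obama", "visits", "Berlin")
store.add("USA", "hosts", "United Nations")
store.add("Merkel", "meets", "Putin")

expanded = store.query("Obama")
exact = store.query("Obama", expand=False)
```

The expanded query also returns the triple about the USA, whereas the exact query returns only the triple that literally mentions Obama.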

Stylometry Profiling

This module predicts the age and gender of authors based on their posts. In the context of REVEAL, it is applied to streaming batches of tweets. It relies heavily on analysis of the lexical and contextual content of tweets to find discriminating features that serve as indicators of the different age classes and genders.

The underlying algorithm has been tested in practice as part of the Author Profiling Challenge for PAN16, where it ranked first on average for the English language, placing it among the state-of-the-art models for this task.
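As a small illustration of what lexical features of a batch of tweets can look like, the sketch below computes four simple measures. These particular features are assumptions chosen for readability and are not claimed to be the ones used by the module or in the PAN16 submission. A classifier for age and gender would be trained on feature vectors of this general kind.

```python
import re


def stylometric_features(tweets):
    """Compute a few simple lexical features for one author's batch of tweets."""
    tokens = [tok for tweet in tweets for tok in tweet.split()]
    # Keep ordinary words only: drop hashtags, mentions and links, strip punctuation.
    words = [re.sub(r"\W", "", tok).lower()
             for tok in tokens if not tok.startswith(("#", "@", "http"))]
    words = [w for w in words if w]
    n_tokens = max(len(tokens), 1)
    n_words = max(len(words), 1)
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(set(words)) / n_words,  # vocabulary richness
        "hashtag_rate": sum(t.startswith("#") for t in tokens) / n_tokens,
        "mention_rate": sum(t.startswith("@") for t in tokens) / n_tokens,
    }


# Made-up batch of two tweets from one author.
features = stylometric_features(["Great game tonight @bob #win", "great day"])
```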

Contact

We would very much welcome any feedback on the work described above. If you want to get in touch about any related issues, please do not hesitate to contact NCSR-D’s Dimitrios Vogiatzis and/or Anastasia Krithara.

Notice

All services and demos to which links are provided above are operated by NCSR-D. NCSR-D provide the services as well as access to code and data “as is” and for demonstration purposes only, not assuming any responsibility / liability of whatever nature. Neither the REVEAL consortium nor the hosts of the REVEAL website are liable for anything resulting from usage of the above demos / code either.