oxford.berlin Thesis Collaboration

The oxford.berlin social data science collaboration aims to improve knowledge exchange and collaboration between students and researchers interested in data science applications in the social sciences. For this reason, we have initiated a Thesis Exchange Collaboration, which encourages students to write their Master’s thesis in cooperation with one supervisor from TU Berlin and one from the University of Oxford, both engaged in the oxford.berlin Social Data Science collaboration. Data science Master’s students from the Oxford Internet Institute or TU Berlin can express their interest in conducting data-driven research on timely and exciting topics. During their thesis, they will have the opportunity to visit the partner university as a guest researcher for one or two weeks, where they will be provided with a workspace and in-depth mentoring.

Interested?

Students are accepted throughout the year; however, places are limited. Please submit your proposal for the oxford.berlin thesis exchange collaboration either to

including a letter of motivation, your CV and an overview of your grades.

Some Topics

Measuring sustainable tourism with online platform data

In times of climate change, environmental sustainability is becoming increasingly important in all parts of the economy. Travel and tourism in particular have been criticized for their negative environmental impact. Accordingly, the United Nations have included sustainable tourism among the targets of the Sustainable Development Goals (SDGs). To reduce the environmental footprint of the tourism industry, it is important to have an up-to-date and accurate measure of the sector’s environmental sustainability. This is captured by SDG indicator 8.9.2, “Proportion of jobs in sustainable tourism industries out of total tourism jobs”.

Currently, there is no established methodology for measuring the indicator, and it is therefore classified as a Tier 3 indicator (https://datapopalliance.org/measuring-the-unmeasured/).

To develop a prototype measure of the indicator, we worked with the UNDP country office in Albania (Braesemann, 2018). Using a list of environmentally sustainable hotels in Albania and data from TripAdvisor and Booking.com, we trained a statistical model that identified the ‘green’ hotels with high accuracy and allowed us to provide a first measure of the indicator. While this exercise offered a first proof that freely available data from online platforms can be used to measure a Sustainable Development Goal indicator, its scope was limited to one country.

The aim of this project is to extend the methodology used in the Albanian case to provide a global measure of sustainable tourism based on data from online platforms. TripAdvisor’s GreenLeader award, which is available in more than 60 countries, could serve as a measure of sustainability. After obtaining hotel-specific data from the online platforms, a predictive model should be trained to identify the features that are associated with a hotel’s sustainability.

The model should then be used to extrapolate to those countries for which no sustainability data are available. The resulting global indicator could be presented as an online dashboard, and the results of the project can be presented at a UNDP conference in 2020.
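The train-then-extrapolate workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not the project’s method: it uses synthetic hotels with two hypothetical standardized features (standing in for eco-related review mentions and average rating), a plain logistic regression fitted by gradient descent in place of whichever model the thesis ends up using, and a hotel’s number of employees (`jobs`, a hypothetical proxy) to turn predictions into the jobs-based share that indicator 8.9.2 asks for.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # w[:-1] are feature weights, w[-1] is the intercept
    return sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by full-batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def make_hotels(n, rng):
    """Synthetic hotels: (hypothetical features, green label, jobs)."""
    hotels = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]  # eco mentions, rating
        green = 1 if 1.5 * x[0] + x[1] - 0.5 + rng.gauss(0, 0.5) > 0 else 0
        jobs = rng.randint(5, 200)  # hypothetical employment proxy
        hotels.append((x, green, jobs))
    return hotels


def green_job_share(w, hotels, threshold=0.5):
    """Share of jobs in hotels predicted to be green (8.9.2-style ratio)."""
    green = sum(j for x, _, j in hotels if predict(w, x) >= threshold)
    return green / sum(j for _, _, j in hotels)


rng = random.Random(0)
labelled = make_hotels(400, rng)    # countries with GreenLeader labels
unlabelled = make_hotels(300, rng)  # target country: its labels are ignored

w = train_logistic([x for x, _, _ in labelled], [g for _, g, _ in labelled])
correct = sum((predict(w, x) >= 0.5) == bool(g) for x, g, _ in labelled)
accuracy = correct / len(labelled)
share = green_job_share(w, unlabelled)
print(f"training accuracy {accuracy:.2f}, estimated green-job share {share:.2f}")
```

In a real application, accuracy would be judged on held-out countries rather than on the training data, and the employment proxy would need to be justified against official tourism statistics.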

In summary, the project is policy-relevant. Moreover, it contributes to a better understanding of the global distribution of sustainable tourism and of the potential of nowcasting for measuring progress on the Sustainable Development Goals (Fatehkia et al., 2018).

References

Braesemann, F. (2018). Estimating the Number of Jobs in Sustainable Tourism in Albania using Big Social Data from TripAdvisor and Booking.com. Project Report.

Fatehkia, M., Kashyap, R., Weber, I. (2018). Using Facebook ad data to track the global digital gender gap. World Development, 107, 189-209.

Reputation Portability and Inequality

Social media and online platforms make it easier for users to find and compare products and services on the web. While such platforms allow users to make more informed decisions, they can also channel demand towards the suppliers that appear at the top of search results or rankings. Customer reviews play a crucial role in this process: suppliers with more online reviews are ranked higher, which in turn drives more customers to them. The resulting winner-takes-all dynamics could be exacerbated if businesses can transfer their reputation between platforms, leading to so-called superstar markets (Scheffer et al., 2017).

Consider two platforms (or markets), A and B, on which users/businesses collect ratings (and hence build up reputation), and where a significant fraction of users is active on both platforms (“multi-homing”). An example would be TripAdvisor and Google Maps, both of which are used by restaurants. A common observation on such rating sites is that the number of ratings per business (e.g., per restaurant) follows a power-law distribution (Taeuscher, 2019). Roughly speaking, this means that very few restaurants have very many ratings, while the great majority have few or even zero ratings. Among various other factors, a restaurant’s ability to attract additional demand is likely to depend on the number of ratings it already has (inter alia, through the mechanism of social proof).
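The rich-get-richer mechanism described above can be illustrated with a small simulation. This sketch uses Simon’s classic growth model as an assumed stand-in for real rating dynamics: each new rating goes to a newly listed restaurant with probability `p_new` (a hypothetical parameter) and otherwise to an existing restaurant chosen in proportion to the ratings it already has. Processes of this kind are known to produce heavy-tailed, approximately power-law distributions of rating counts.

```python
import random
from collections import Counter


def simulate_ratings(n_ratings, p_new=0.1, seed=0):
    """Simon-style rich-get-richer process; returns rating counts, largest first."""
    rng = random.Random(seed)
    log = [0]  # one entry per rating: the id of the restaurant that received it
    n_restaurants = 1
    for _ in range(n_ratings - 1):
        if rng.random() < p_new:
            log.append(n_restaurants)  # a new restaurant receives its first rating
            n_restaurants += 1
        else:
            # Drawing uniformly from past ratings picks a restaurant with
            # probability proportional to the number of ratings it already has
            log.append(rng.choice(log))
    return sorted(Counter(log).values(), reverse=True)


def top_share(counts, fraction=0.01):
    """Share of all ratings held by the top `fraction` of restaurants."""
    k = max(1, int(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)


counts = simulate_ratings(50_000, p_new=0.1, seed=42)
print(f"{len(counts)} restaurants; top 1% hold {top_share(counts):.0%} of ratings")
```

Unlike real platforms, this simplified model contains no restaurants with zero ratings, since every restaurant enters the simulation with its first rating.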

So far, platforms have acted as silos of reputation with virtually no cross-platform spillover. However, given the right to data portability under the EU’s General Data Protection Regulation (GDPR, Article 20), hitherto platform-bound data are likely to become increasingly available across platform boundaries. Initial research on reputation portability has shown that ratings do in fact exert positive effects on consumers across platform boundaries as well (Teubner et al., 2020, in press). Allowing the superstars of one domain (e.g., TripAdvisor) to transfer their reputation (and hence market power) seamlessly to other domains may thus further increase the concentration of demand.

In this project, a model of reputation portability between platforms and its effects on demand allocation will be developed and tested against actual platform data from Google Maps and TripAdvisor. The project will be jointly supervised by Prof. Teubner from TU Berlin and Dr. Braesemann from the University of Oxford.
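One possible starting point for such a model is a two-platform extension of preferential attachment. In the hypothetical sketch below, a restaurant’s attractiveness on platform B is 1 + (its ratings on B) + `portability` × (its ratings on A), where `portability` is an assumed parameter between 0 (siloed reputation) and 1 (full transfer). The Zipf-like rating counts on platform A are invented for illustration; the thesis model and its calibration against Google Maps and TripAdvisor data may look quite different.

```python
import random


def allocate_ratings_b(ratings_a, n_new, portability, seed=0):
    """Allocate n_new ratings on platform B by preferential attachment.

    Attractiveness of restaurant i on B = 1 + b_i + portability * a_i.
    """
    rng = random.Random(seed)
    b = [0] * len(ratings_a)
    ids = range(len(ratings_a))
    for _ in range(n_new):
        weights = [1 + bi + portability * ai for ai, bi in zip(ratings_a, b)]
        winner = rng.choices(ids, weights=weights)[0]
        b[winner] += 1
    return b


def top_share(counts, fraction=0.05):
    """Share of ratings held by the top `fraction` of restaurants."""
    ranked = sorted(counts, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical Zipf-like reputation on platform A (200 restaurants)
ratings_a = [1000 // (rank + 1) for rank in range(200)]

siloed = allocate_ratings_b(ratings_a, n_new=2000, portability=0.0)
portable = allocate_ratings_b(ratings_a, n_new=2000, portability=1.0)
print(f"top-5% share on B: siloed {top_share(siloed):.0%}, "
      f"portable {top_share(portable):.0%}")
```

Because both runs are seeded, they are reproducible; intermediate `portability` values would allow exploring partial reputation transfer.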

The resulting thesis will help to understand the potentially adverse implications of a policy measure that is currently under discussion and is intended to help alleviate inequalities in the platform economy.

References

Scheffer, M., et al. (2017). Inequality in nature and society. Proceedings of the National Academy of Sciences, 114(50), 13154-13157.

Taeuscher, K. (2019). Uncertainty kills the long tail: demand concentration in peer-to-peer marketplaces. Electronic Markets, 1-12.

Politicians’ popularity and Twitter activity

We are witnessing a steady process of political change in an era in which traditional media increasingly compete with social media as a source of information for many people, creating a high-choice media environment. This wider media transformation raises important questions for democracies and political campaigning.

Within this political social media landscape, Twitter is considered the most important and preferred outlet for targeted political campaigning (Darius and Stephany, 2019). While politicians increasingly choose to present political content on Twitter, it remains questionable whether the attention individual politicians receive is actually influenced by their social media activity (Stier et al., 2018).

The project aims to better understand how the extent and style of communication on Twitter affect the amount of attention that politicians from different parties receive. This research objective is to be achieved by measuring politicians’ activity on Twitter and comparing it with their coverage in major media outlets and with the traffic to their Wikipedia pages.

After identifying the Twitter accounts and Wikipedia pages of politicians in a given country (e.g., all members of the German Bundestag), time series data should be collected from the platforms via their respective APIs and linked to historical news data about the politicians from data providers such as Spinn3r, Reuters or Event Registry. Based on the combined dataset, the project aims to determine whether politicians’ Twitter activity has a positive effect on the online attention they receive, controlling for events covered in the media.
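Why the media control matters can be shown with a minimal regression sketch on synthetic data. The data-generating process is entirely hypothetical: daily news coverage drives both a politician’s tweeting and their Wikipedia page views, and the true effect of one extra tweet on page views is set to 0.5. Regressing page views on tweets alone then overstates the Twitter effect, whereas adding news coverage as a control recovers it. Ordinary least squares via the normal equations is used here only for illustration; real time series of this kind would also require handling trends and autocorrelation.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


rng = random.Random(1)
news, tweets, views = [], [], []
for _ in range(365):  # one year of hypothetical daily observations
    n = rng.expovariate(1.0)                 # media coverage of the politician
    t = 2.0 * n + rng.gauss(0, 1)            # more tweets on busy news days
    v = 0.5 * t + 3.0 * n + rng.gauss(0, 1)  # true tweet effect: 0.5
    news.append(n)
    tweets.append(t)
    views.append(v)

naive = ols([[1.0, t] for t in tweets], views)
controlled = ols([[1.0, t, n] for t, n in zip(tweets, news)], views)
print(f"tweet effect, naive: {naive[1]:.2f}; "
      f"controlling for news: {controlled[1]:.2f}")
```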

The project will assess how effective political campaigning on Twitter is at gaining public attention. It will thus help to understand how political communication is changing in the age of social media.

References

Darius, P., Stephany, F. (2019). Twitter ‘hashjacked’: Online Polarisation Strategies of Germany’s Political Far-right. SocArXiv. osf.io/preprints/socarxiv/6gbc9

Stier, S., et al. (2018). Systematically Monitoring Social Media: The Case of the German Federal Election 2017. SocArXiv preprint. https://doi.org/10.31235/osf.io/5zpm9