In recent years, social media platforms have increasingly restricted researchers' access to the application programming interfaces (APIs) that allow data on user interactions, content, and trends to be retrieved in a structured and controlled manner. In an opinion piece for University World News, Dr Douglas Parry from the Department of Information Science explains the impact of these restrictions on social media research.
- Read the original article below.
Douglas Parry*
Social media platforms like Facebook, Instagram, X (formerly Twitter), and TikTok now permeate our lives, structuring not only our personal interactions but also our professional and civic engagements. For many people, including researchers, these platforms have become an integral part of their daily routines, consuming a significant amount of their time and attention. Social media platforms have not only changed how we communicate; as the digital town square, they have also reshaped our public sphere, influencing how we engage with others and with the world around us, shaping public discourse, and influencing societal norms.
When we use social media — whether we're scrolling through our newsfeeds on X or LinkedIn, posting an Instagram Reel, or swiping through TikTok — we leave behind digital footprints as by-products of our platform interactions. These digital traces offer researchers an unprecedented opportunity to study individual and collective behaviour. Through these digital traces, social media platforms provide researchers with a wealth of data that can shed light on human behaviour, enabling insights into key 21st century dynamics — from the spread of misinformation and election interference to the potential impacts of our online interactions on our mental health and well-being.
Over the past decade researchers have been able to access these data via platform-provided application programming interfaces (APIs). These APIs allow researchers to retrieve data on user interactions, content, and trends in a structured and controlled manner that aligns with the platform's terms of service and privacy policies.
In recent years, however, social media platforms have increasingly restricted researchers' access to their APIs, citing concerns such as user privacy, data scandals, and worries over the use of platform data for training large language models, alongside a general reluctance to submit to outside scrutiny. In 2023, for example, Twitter put its formerly free API behind a paywall, with the kind of access researchers previously enjoyed now starting at $42,000 per month. In a similar move that prompted widespread backlash, Reddit also placed its API behind a substantial paywall, effectively halting all research using data from the platform. More recently, Meta announced the shutdown of CrowdTangle, its tool that previously provided researchers with limited access to data across various platforms. These API-access restrictions have significantly impeded social media research, hindering efforts to understand online behaviour and its broader societal implications.
While researchers can collect social media data through other methods, such as self-reports or web scraping, these come with substantial limitations: they rarely provide the same level of detail as direct access to platform APIs, and web scraping falls into a legal grey area. Alternatively, some researchers have been able to work directly with social media companies to access data, but this opportunity is usually limited to a select few based in wealthier countries. Without fair and direct access to these data, important questions about individuals and society will remain unanswered.
In response to concerns over the power of online platforms, the European Commission has introduced new regulations to govern the digital sphere in the European Union (EU). Among these, the Digital Services Act (DSA) aims to create a safer online environment by requiring platforms to remove posts containing illegal content, limit certain types of targeted advertising, and be more transparent about how their algorithms work. For “very large online platforms” (VLOPs) with at least 45 million monthly users in the EU (i.e., most social media platforms, but also other services like Booking.com or Google Maps), the DSA further requires that platforms let users opt out of recommendation systems, undergo external audits, and share data with researchers and watchdog organisations.
Since the launch of the DSA, the European Commission has opened formal proceedings against X (Twitter) on suspicion that the platform breached requirements relating to risk management, content moderation, dark patterns, advertising transparency, and data access for researchers. More recently, TikTok has also come under investigation over possible breaches of the DSA.
For researchers, Article 40 of the DSA is particularly important as it requires very large online platforms to provide vetted researchers with access to data for research purposes. At present, however, the DSA only mandates access for researchers in the EU or those studying systemic risks within the EU.
But the potential systemic risks posed by social media platforms are not restricted to the EU; there are numerous examples of systemic and individual risks around the world. In India, for example, misinformation poses a major risk during general elections, while through the Global Kids Online and Disrupting Harm projects, UNICEF runs an extensive programme investigating the variety of risks that children face when going online.
However, due to restrictions on platform APIs, researchers outside of the EU are currently unable to access the data necessary to investigate these potential individual and systemic risks at the scale needed to draw meaningful conclusions. Without this access, research on topics like misinformation and digital well-being is hindered, and insights into online behaviour and its risks are obstructed. To conduct unbiased research on social media use and its potential risks, we need reliable access to platform data.
To secure such access, I believe we should attempt to leverage the “Brussels effect”, wherein EU regulations influence global standards, by advocating for similar regulations in our own countries. This could involve lobbying relevant government agencies or working with EU stakeholders to ensure broader access. Many other countries are now developing legislation in this area, such as the UK Online Safety Bill, which contains provisions similar to the DSA, and we risk being left behind if we do not consider the online information environment.
While the EU has shown us a path to increased transparency and data access, developing and implementing similar legislation in other countries will not be an easy task. Alongside the necessary political and civil lobbying, it will require expertise from various fields and the establishment of data intermediaries to manage access to platform data. We will also need to address key region-specific ethical and privacy risks associated with increased data access. Despite these challenges, I am optimistic that there is sufficient appetite to make this a reality. For example, engagements have begun to develop an African Alliance for Access to Data to push for greater access to platform data on the continent.
The DSA has set a precedent for other countries seeking to ensure equitable access to critical data for research. Without proactive efforts, we risk being left behind without any robust means of studying the potential systemic risks at play.
*Dr Douglas Parry is a Senior Lecturer in Socio Informatics & Organisational Informatics in the Department of Information Science at Stellenbosch University, South Africa. This is an adapted version of his Commentary “Without access to social media platform data, we risk being left in the dark”, published recently in the South African Journal of Science.