RAPID intelligence to analyse large datasets

Over half of the world’s population now use social media, generating enormous and diverse datasets. These datasets are made up of individual posts from a huge number of sources and locations, on a boundless range of topics.

These datasets are a rich source for researchers, particularly when studying fast-changing phenomena or events – but it is a huge challenge to turn this vast trove produced at speed into relevant information in a timely way.

The Real-time Analytics Platform for Interactive Data-mining (RAPID) was developed by scientists at the University of Melbourne in collaboration with Australia’s Defence Science and Technology Group (DSTG).

RAPID absorbs and analyses exceptionally large datasets, producing high-quality targeted information for researchers from a range of real-time, fast-moving data streams, including social media.

“RAPID is a big-data processing platform that uses certain techniques, paradigms and algorithms to analyse a large amount of data in real time,” says Professor Shanika Karunasekera, of the School of Computing and Information Systems in the University’s Faculty of Engineering and IT.

Professor Karunasekera says that artificial intelligence lies at the heart of RAPID, and when the platform is trained on a data stream, it gets better at delivering the required information over time.

“Twitter is a popular source of data because it is so widely used,” she says. “It’s publicly and freely available, and the same applies to Reddit.

“RAPID can extract data from any data source once it has access, such as data from Google News, for example, so where an organisation has access to any external subscriber-pays database, RAPID can be used to run keyword and source analyses on that data to deliver extensive insights.”

How RAPID works

RAPID can track topics by extracting keywords and hashtags from millions of user posts and discussions across a social media site.

RAPID operators can track topic discussions by choosing keywords, setting time windows for the discussions they want to follow, and even choosing how often the data is sampled (for example, for 10 minutes each hour over three days).
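To make the sampling example concrete, here is a minimal Python sketch of how a schedule of "10 minutes each hour over three days" could be expressed. It is an illustration only, not RAPID's actual configuration: the function names, the start date and the choice to sample the first minutes of each hour are all assumptions.

```python
from datetime import datetime, timedelta


def sampling_windows(start, days, minutes_per_hour):
    """Yield (window_start, window_end) pairs covering the first
    `minutes_per_hour` minutes of every hour, for `days` days from `start`.

    Hypothetical helper: sampling at the top of each hour is an assumption.
    """
    if not 0 < minutes_per_hour <= 60:
        raise ValueError("minutes_per_hour must be between 1 and 60")
    for hour in range(days * 24):
        window_start = start + timedelta(hours=hour)
        yield window_start, window_start + timedelta(minutes=minutes_per_hour)


def in_any_window(timestamp, windows):
    """Return True if a post's timestamp falls inside a sampling window."""
    return any(begin <= timestamp < end for begin, end in windows)


# 10 minutes each hour over three days (the start date is arbitrary).
windows = list(sampling_windows(datetime(2023, 1, 30), days=3, minutes_per_hour=10))
total_sampled = sum((end - begin for begin, end in windows), timedelta())
```

Under these assumptions the schedule produces 72 windows covering 12 hours of the three-day period, so an operator trades completeness for a much smaller volume of data to process.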

As RAPID follows a keyword, it can produce visual maps and graphs that can identify keyword clusters, show networks of users who are connected to each other and display the different interactions between social media post authors and followers.

The system automatically spots new keywords that emerge by using a co-occurrence keyword expansion algorithm, which then helps to extract more useful information from the data.
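The article does not describe the details of RAPID's keyword expansion algorithm, so the following Python sketch shows only the general idea of co-occurrence: terms that often appear in the same posts as the tracked keywords become candidates for tracking. The function names, the stop-word list and the thresholds are illustrative assumptions.

```python
import re
from collections import Counter


def tokens(post):
    """Lower-cased words and hashtags in a post, as a set (one count per post)."""
    return set(re.findall(r"#?\w+", post.lower()))


def expand_keywords(posts, seeds, stopwords=frozenset(), min_count=2, top_n=3):
    """Suggest new keywords that co-occur with the seed keywords.

    Hypothetical sketch: counts how many seed-matching posts contain each
    other term, then returns the most frequent terms above `min_count`.
    """
    seeds = {s.lower() for s in seeds}
    counts = Counter()
    for post in posts:
        terms = tokens(post)
        if terms & seeds:  # only posts that mention a tracked keyword
            counts.update(terms - seeds - stopwords)
    candidates = [(term, n) for term, n in counts.items() if n >= min_count]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))  # frequency, then name
    return [term for term, _ in candidates[:top_n]]


# Made-up posts for illustration only.
posts = [
    "Got my vaccine today at the clinic #booster",
    "Is the vaccine #booster free at the clinic?",
    "The vaccine rollout is slow",
    "Great weather at the beach today",
]
stopwords = {"the", "at", "my", "is", "got", "today"}
suggested = expand_keywords(posts, {"vaccine"}, stopwords)
```

With this toy data, the hashtag "#booster" and the word "clinic" each appear in two posts that mention "vaccine", so they are the suggested additions. A production system would need better tokenisation and a statistical measure of association rather than raw counts.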

The platform can carry out a sentiment analysis, which gives a general indication of prevailing moods and attitudes towards a topic displayed by different groups. Discussion trees can also be generated for a specific time period, showing who was engaged in particular discussions (potentially adding further depth to real-time data).
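As a rough illustration of the discussion trees mentioned above, the Python sketch below groups posts by the post they reply to and keeps only those inside a chosen time period. The post fields (id, author, reply_to, time, text) are hypothetical, times are simplified to minutes, and treating a reply as a new root when its parent falls outside the period is an assumption, not a description of how RAPID behaves.

```python
from collections import defaultdict


def build_tree(posts, start, end):
    """Map each parent post id (None for roots) to its replies in [start, end)."""
    in_window = {p["id"]: p for p in posts if start <= p["time"] < end}
    children = defaultdict(list)
    for post in in_window.values():
        # Assumption: a reply whose parent is outside the window becomes a root.
        parent = post["reply_to"] if post["reply_to"] in in_window else None
        children[parent].append(post)
    return children


def render(children, parent=None, depth=0):
    """Return indented 'author: text' lines, oldest first at each level."""
    lines = []
    for post in sorted(children.get(parent, []), key=lambda p: p["time"]):
        lines.append("  " * depth + f'{post["author"]}: {post["text"]}')
        lines.extend(render(children, post["id"], depth + 1))
    return lines


# Made-up posts; "time" is minutes since the start of tracking.
posts = [
    {"id": 1, "author": "ana", "reply_to": None, "time": 0, "text": "Clinic opens at 9"},
    {"id": 2, "author": "ben", "reply_to": 1, "time": 5, "text": "Is booking needed?"},
    {"id": 3, "author": "ana", "reply_to": 2, "time": 7, "text": "Walk-ins are fine"},
    {"id": 4, "author": "cy", "reply_to": 1, "time": 90, "text": "Thanks"},
]
tree_lines = render(build_tree(posts, start=0, end=60))
```

For the first hour of this toy thread, the rendered tree shows ana's post with ben's question nested beneath it and ana's answer nested one level further, while cy's later reply is left out because it falls outside the period.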

Unlike various Twitter-specific topic and keyword analysis programs that target a limited domain, RAPID can be applied to any application domain, topic or event.

RAPID was developed using two widely used open-source software projects, Apache Storm and Apache Kafka. These are distributed, real-time processing systems that handle large-scale data pipelines by spreading computing power across a cluster of machines.

A wide range of applications

RAPID was developed to assist Australia’s Defence Science and Technology Group (DSTG), which had a range of applications for the platform, including a broader understanding of misinformation campaigns and of the use of social media for terrorism propaganda and recruitment.

The platform has since been trialled in a range of different scenarios, including by DSTG and by researchers from the US Army Research Laboratory.

The NSW government has used RAPID to better understand the public response of residents to COVID-19 vaccinations, by creating and defining keywords and then tracking these keywords, refining regions, and using the platform’s topic-tracking capability to see a relevant tweet live-stream.

RAPID has an excellent track record in countering disinformation, and is able to extract messages and identify users who are actively spreading disinformation, in real time, says Professor Karunasekera.

In a similar vein, a Fake-News detection project currently under way is exploring the unsupervised detection of fake news production.

RAPID has also been trialled in a public health communication scenario, analysing relevant public needs and beliefs, and crafting effective real-time messaging and communication responses with the potential to encourage healthy behaviours.

The RAPID team have been adding integrations, including off-the-shelf automated analyses, which could add value and make the RAPID platform easier to customise.

The platform’s ability to track data intelligently and provide sentiment analysis makes it attractive to a wide variety of stakeholders looking for timely insights from social media, Professor Karunasekera says.

These stakeholders could include academic scholars, commercial companies, government organisations, advertising and marketing researchers, and professionals in crisis detection and disaster management.

First published on 30 January 2023.
