I never thought I'd spend my summer writing blogs, but here we are.
This blog summarises the first major step in my journey: a research proposal I submitted to SARC 2025 (Scientific and Academic Research Competition), which took place during my summer break. It's my first research project, and it combines two of my biggest interests: artificial intelligence and politics.
Introducing the Issue
Political misinformation, commonly known as fake news, is more than some fancy term. It refers to content that looks like real news but has no evidence or legitimate source. It spreads rapidly during elections or breaking news events, especially on social media, where it can go viral in minutes because of huge userbases and the ease of information sharing.
But why does this matter?
Misinformation undermines democracy. It distorts election outcomes, weakens social unity, and damages public trust in elected governments. As more countries become globally connected, misinformation now spreads beyond major languages like English and into low-resource languages, like Urdu, that lack the resources for detection.
This is what got me thinking: how can I use AI to solve this?
The research question
To what extent can fine-tuned large language models (LLMs) outperform traditional machine learning classifiers in detecting multilingual political misinformation on social media platforms?
In simpler terms: can newer LLMs (like BERT and mBERT), which are trained on multiple datasets and tend to have better language comprehension abilities, do a better job than older machine learning classifiers (like Support Vector Machines and Naive Bayes) across multiple languages like Spanish, French, and Urdu?
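To make the comparison concrete, here is a minimal Python sketch of the two model families I'd be pitting against each other, using scikit-learn for the traditional side and Hugging Face's transformers library for the LLM side. The two example sentences and their labels are invented purely for illustration, not real training data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Tiny illustrative dataset: 0 = reliable, 1 = misinformation.
texts = [
    "Official results confirm the final vote count.",
    "BREAKING: secret memo proves the election was rigged!",
]
labels = [0, 1]

# Traditional classifier: TF-IDF word features fed into a
# Support Vector Machine (Naive Bayes would slot in the same way).
svm_baseline = make_pipeline(TfidfVectorizer(), LinearSVC())
svm_baseline.fit(texts, labels)

# LLM side: multilingual BERT (mBERT) with a binary classification
# head attached, ready to be fine-tuned on the same labelled data.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)
```

The key difference this sketch highlights is that the SVM only sees word-count statistics, while mBERT arrives already pretrained on over a hundred languages, which is exactly why it might transfer better to languages like Urdu.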
The Research Plan
I intend to conduct this experiment in four steps:
Collect data, clean it so machines can understand it, train both types of models, and then compare them using standard performance metrics, as sketched below.
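As a rough illustration of that final comparison step, the snippet below scores two models on a held-out test set using standard metrics (accuracy, precision, recall, and F1). The labels and predictions here are made-up placeholders, not results from the actual experiment:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder ground-truth labels for a held-out test set.
y_true = [1, 0, 1, 1, 0, 0]

# Placeholder predictions, e.g. from the SVM baseline and fine-tuned mBERT.
predictions = {
    "SVM": [1, 0, 0, 1, 0, 1],
    "mBERT": [1, 0, 1, 1, 0, 0],
}

for name, y_pred in predictions.items():
    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary"
    )
    print(f"{name}: accuracy={accuracy:.2f} precision={precision:.2f} "
          f"recall={recall:.2f} F1={f1:.2f}")
```

Reporting F1 alongside accuracy matters here because misinformation datasets are often imbalanced, and a model that labels everything "reliable" can still score a high accuracy.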
Ultimately, this research experiment aims to show that LLMs are more effective at detecting political misinformation on social media across multiple languages, while highlighting the urgent need for an effective solution to this growing problem.
Note: The full research, including citations and supporting statistics, can be found in my research proposal, which I hope to develop further and publish in a major journal. Although this is a hypothetical experiment based on existing research, I plan to actually build the AI model as I continue learning to code.