Reducing Algorithmic Bias in Generative Artificial Intelligence-Based Cyberbullying Detection Systems
Abstract
The explosive growth of social media has intensified concern about cyberbullying and its effect on user well-being. Artificial intelligence-based detection systems are now widely used to moderate harmful content, but growing evidence suggests that they tend to be algorithmically biased, misclassifying at disproportionate rates the linguistic variations associated with particular demographic or cultural groups. This undermines fairness, trust, and psychological safety online. This paper proposes a generative artificial intelligence framework that improves cyberbullying detection while proactively reducing bias. The model combines contextual language modelling based on transformer-based generative representations with fairness-aware optimization. Bias is reduced during model training through balanced data sampling, counterfactual data augmentation, and loss functions with fairness constraints. The framework is evaluated experimentally on a multi-source cyberbullying dataset covering diverse linguistic expressions. Classification performance is measured with accuracy, precision, recall, and F1 score, and fairness is measured with demographic parity difference and equal opportunity difference. The results show that the proposed approach achieves competitive classification performance while exhibiting lower inter-group bias than baseline deep learning models. These findings underscore the importance of ethical and fairness considerations in generative artificial intelligence systems for content moderation. The proposed framework can help create inclusive, responsible, and psychologically safe online spaces.
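The two fairness metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = cyberbullying), a single group attribute per example, and the commonly used "largest gap between groups" definitions; the abstract does not state the exact formulation used in the paper, so the function names and the toy data below are hypothetical.

```python
from collections import defaultdict


def _rate_by_group(flags, groups, mask=None):
    """Mean of binary `flags` per group, optionally restricted by `mask`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for i, (flag, group) in enumerate(zip(flags, groups)):
        if mask is not None and not mask[i]:
            continue
        totals[group] += 1
        hits[group] += flag
    return {g: hits[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred, groups):
    """Largest gap between groups in the rate of positive predictions."""
    rates = _rate_by_group(y_pred, groups)
    return max(rates.values()) - min(rates.values())


def equal_opportunity_difference(y_true, y_pred, groups):
    """Largest gap between groups in true positive rate (recall).

    Groups with no positive examples in `y_true` are skipped; the call
    raises ValueError if no group has any positive example.
    """
    positives = [t == 1 for t in y_true]
    tpr = _rate_by_group(y_pred, groups, positives)
    return max(tpr.values()) - min(tpr.values())


# Toy data (assumed, not from the paper): two groups of four posts each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dpd = demographic_parity_difference(y_pred, groups)
eod = equal_opportunity_difference(y_true, y_pred, groups)
print(f"demographic parity difference: {dpd:.2f}")
print(f"equal opportunity difference:  {eod:.2f}")
```

Under these definitions a value of 0 means the groups are treated identically on that criterion, and larger values indicate a larger inter-group disparity.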