Detecting AI-Generated Text in Educational Content: Leveraging Machine Learning and Explainable AI for Academic Integrity
Authors:
Ayat A. Najjar,
Huthaifa I. Ashqar,
Omar A. Darwish,
Eman Hammad
Abstract:
This study seeks to enhance academic integrity by providing tools to detect AI-generated content in student work using machine learning and explainable AI. The findings promote transparency and accountability, helping educators maintain ethical standards and supporting the responsible integration of AI in education. A key contribution of this work is the CyberHumanAI dataset, which contains 1000 observations: 500 written by humans and 500 produced by ChatGPT. We evaluate various machine learning (ML) and deep learning (DL) algorithms on the CyberHumanAI dataset, comparing human-written content with AI-generated content from a Large Language Model (LLM), namely ChatGPT. Results demonstrate that traditional ML algorithms, specifically XGBoost and Random Forest, achieve high performance (83% and 81% accuracy, respectively). Results also show that classifying shorter content seems to be more challenging than classifying longer content. Further, using Explainable Artificial Intelligence (XAI), we identify discriminative features influencing the ML model's predictions: human-written content tends to use practical language (e.g., "use" and "allow"), whereas AI-generated text is characterized by more abstract and formal terms (e.g., "realm" and "employ"). Finally, a comparative analysis with GPTZero shows that our narrowly focused, simple, and fine-tuned model can outperform generalized systems like GPTZero. The proposed model achieved approximately 77.5% accuracy compared to GPTZero's 48.5% accuracy when tasked with classifying text into Pure AI, Pure Human, and Mixed classes. GPTZero showed a tendency to classify challenging and short cases as either mixed or unrecognized, while our proposed model showed a more balanced performance across the three classes.
Submitted 6 January, 2025;
originally announced January 2025.
Towards Refactoring of DMARF and GIPSY Case Studies -- A Team 5 SOEN6471-S14 Project Report
Authors:
Pavan Kumar Polu,
Amjad Al Najjar,
Biswajit Banik,
Ajay Sujit Kumar,
Gustavo Pereira,
Prince Japhlet,
Bhanu Prakash R.,
Sabari Krishna Raparla
Abstract:
This paper presents an analysis of the architectural design of two distributed open source systems (OSS) developed in Java: Distributed Modular Audio Recognition Framework (DMARF) and General Intensional Programming System (GIPSY). The research starts with a background study of these frameworks to determine their overall architectures. Afterwards, we identify the actors and stakeholders and draft a domain model for each framework. Next, we evaluate and propose a fused DMARF over GIPSY Run-time Architecture (DoGRTA) as a domain concept. We then extract and study the actual class diagrams and determine the classes of interest, and we identify design patterns present in the code of each framework. Finally, code smells in the source code were detected using popular tools, and a selected number of the identified smells were refactored using established techniques, with the changes implemented in the final source code. Tests were written and run before and after the refactoring to check for any behavioral changes.
Submitted 23 December, 2014;
originally announced December 2014.