Defining Metrics for Evaluating General Artificial Intelligence Across Diverse Problem Domains
Abstract
Defining robust, universal metrics for evaluating General Artificial Intelligence (GAI) is essential for guiding its development and deployment across diverse problem domains. This paper examines the theoretical and practical aspects of GAI evaluation, focusing on frameworks that assess adaptability, problem-solving capability, and generalization. Existing benchmarks typically emphasize narrow, domain-specific tasks and therefore fail to capture the broader spectrum of characteristics associated with general intelligence. Drawing on insights from psychology, neuroscience, and computational theory, this work proposes a multidimensional evaluation model built on metrics such as knowledge transferability, reasoning depth, and robustness to novel challenges. The paper also argues for domain-agnostic evaluation standards to ensure that assessments remain fair and comprehensive across domains.
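To make the multidimensional model concrete, the following is a minimal sketch of how per-dimension scores (knowledge transferability, reasoning depth, robustness to novel challenges) might be recorded and aggregated into a single value. All names, score ranges, and the equal-weight aggregation here are illustrative assumptions, not the evaluation procedure specified by the paper.

```python
from dataclasses import dataclass

# Hypothetical per-dimension scores, each normalized to [0, 1].
# The dimension names mirror the abstract; the scoring scheme is assumed.
@dataclass
class GAIEvaluationProfile:
    knowledge_transferability: float  # e.g., performance retained on unseen domains
    reasoning_depth: float            # e.g., success rate on multi-step inference tasks
    robustness: float                 # e.g., performance under novel or perturbed inputs

    def validate(self) -> None:
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")

def aggregate_score(profile: GAIEvaluationProfile,
                    weights: dict[str, float] | None = None) -> float:
    """Combine per-dimension scores into one scalar via a weighted mean.

    The equal-weight default is an assumption for illustration; a real
    framework would need to justify its weighting scheme empirically.
    """
    profile.validate()
    scores = vars(profile)
    if weights is None:
        weights = {name: 1.0 for name in scores}  # equal weighting by default
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * value for name, value in scores.items()) / total_weight

if __name__ == "__main__":
    # Illustrative numbers only; no real system is being scored here.
    profile = GAIEvaluationProfile(
        knowledge_transferability=0.72,
        reasoning_depth=0.64,
        robustness=0.58,
    )
    print(f"Aggregate GAI score: {aggregate_score(profile):.3f}")
```

A scalar aggregate like this trades interpretability for comparability; the multidimensional profile itself is what carries the domain-agnostic information the abstract calls for.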