Paper Title

Increasing Data Discovery in Data Lake Platforms Through AI Integration

Keywords

  • artificial intelligence
  • big data management
  • data discovery
  • data lakes
  • machine learning

Publication Info

Volume: 15 | Issue: 3 | Pages: 45-76

Published On

May, 2025

Abstract

In the current era of big data, business organizations adopt digital platforms and collect enormous volumes of structured and unstructured data. Data lakes are centralized repositories established to store big data in raw form without a predetermined schema. They provide an effective solution for massive data, accommodating the influx of information and supporting scalability and flexibility. However, data lakes face fundamental challenges: ineffective metadata management, data silos, and inadequate query processing techniques. Likewise, the lack of a predetermined schema can cause significant difficulties in data discovery, retrieval, understanding, and application. This paper explores how integrating specific Artificial Intelligence (AI) techniques into data lakes supports efficiency, applicability, and big data analytical capabilities. AI techniques such as machine learning, automated metadata tagging, and natural language processing are promising means of counteracting and managing the current challenges of data discovery in data lakes. AI-based automation is integral to discovering data in data lakes, supporting semantic search and improved predictive data analysis for insightful management of vast datasets. The main focus of this study is to develop a comprehensive framework for increasing data discovery through AI integration in data lakes, enhancing data retrieval, management, and insightful applications of AI-based automation for business organizations. Importantly, the framework addresses significant shortcomings in data discovery in data lakes by providing automatic data categorization and reducing the time spent on manual data management.
Moreover, the study examines machine learning models that enhance data discovery in data lakes through improved semantic search, supporting context-aware queries and increasing user engagement, and thereby making the location and naming of data more effective, efficient, and relevant. Unlike traditional methods, which consume time in identifying patterns and managing irregularities in enormous datasets, AI integration into data discovery supports data analysis and anomaly detection while avoiding the human error those methods introduce. Integrating AI automation systems into predictive analytics is integral for business organizations seeking to identify patterns and trends in customers and sales, forecast outcomes, and detect anomalies effectively and accurately. The study's experiments indicate that AI integration into data lakes improves data discovery performance. Notably, data retrieval time fell by 50% and anomaly detection improved by 40% relative to traditional data discovery methods, showing how AI integration can optimize data management processes. The study's implications bear directly on data science and data engineering, with applications across many industries for big data analytics and data-driven decision-making. In healthcare, AI integration in data lakes can support quality care by enhancing patient data retrieval and enabling predictive analysis for speedier diagnosis. Similarly, in finance, enhanced data discovery through AI supports early fraud detection, risk analysis, and transaction compliance and regulatory learning, improving performance in the sector. In digital marketing and smart management, AI-based data lakes and data discovery ensure
