Abstract
Financial organizations handle large volumes of transaction, market and risk data that must be processed quickly and accurately. This research examines how Teradata, Hive SQL and PySpark can be managed and tuned to meet the requirements of large-scale workloads. The research gathered and compared evidence from white papers, research papers and documented studies on each platform, published up to 2020. The findings suggest that Teradata is strong at complex reporting but becomes very costly when systems need to scale. Hive SQL is best suited to overnight batch analytics and is not suitable for real-time queries, whereas PySpark balances fast streaming analytics with ease of ETL. These results suggest that combining Teradata for reporting, Hive SQL for bulk processing and PySpark for fast analytics can deliver strong results while keeping costs under control for modern finance companies.