The concept of federated queries isn’t new. Facebook PrestoDB popularized the idea of distributed structured query language (SQL) query engines in 2013.
Over the years, AWS, Google, Microsoft, and many others in the industry have accelerated the adoption of a distributed query engine model within their products. For example, AWS developed Amazon Athena on top of the Presto code base, while Google’s BigQuery is based on Cloud SQL.
Organizations store their data in several database and storage systems, such as relational databases (MySQL, SQL Server, Postgres) and object storage systems (S3, HDFS) for storing large amounts of data at a cheaper rate. Organizations often faced the challenge of being able to quickly access the right data at the right time.
Presto simplified this by enabling the querying of relational and non-relational databases and object stores - or disparate sources as you can call it collectively - via SQL, allowing for easier access to data from BI tools and even organizations’ own code.
Facilitating access to data from multiple sources in a single query, and that too fast is what is so revolutionary about Query Federation. It’s because consolidating data from different sources used to be a long, tedious process. You need Extract, Transform, Load (ETL) processes to bring data together into a shared format.
But ETL tools typically aren’t considered suitable by the community experts if you are looking for near-real-time or on-demand data access. They were designed for a batch mode of working where commands are read and acted upon as a batch without user intervention. ETL was more suited for established, slow-changing data. Also, ETL is not accessible to data analytics and business users directly.
The benefits of federated queries are immense compared to the traditional querying approaches of other database solutions. Here are a few:
There is no need for users to remember credentials or log into individual databases as everything is centralized within the federated query service. This enables unified access to data across all source types and IT environments.
Federated queries make it easier for data scientists and analysts to analyze data as the traditional ETL tools were geared more toward developers and coders who understood database language.
Federated queries are usually optimized before execution, enabling hundreds of user queries to be load-balanced and de-duplicated in real-time. This leads to higher throughput and lowers costs when using advanced analytics or business intelligence tools, promoting data-driven decision-making.
The biggest advantage is that the users don’t need to know the specific query or data language for each database. Automated Data Definition Language (DDL) conversion in federated queries allows anyone to perform queries on all data sources.
Our extensive study on digital transformations finds that over 90% of IT and data leaders are planning to use either multi-cloud or hybrid cloud architectures. Applications and data are therefore highly unlikely to be consolidated on a single cloud platform – be it AWS, Azure, GCP, IBM or your private virtualized environment.
This means that data will increasingly get distributed across data sources and become difficult to manage. While data lakes are one desirable solution, consolidating all enterprise data into a lake and continuously updating it can be expensive.
Looking for the fastest way to analyze data stored in Amazon S3?
Users simply point Athena at data stored in their S3 bucket, identify their fields, run the queries, and get results back in seconds.
Amazon’s new Athena platform uses federated queries that enable quick and easy SQL queries across data stored in several relational, non-relational, object, and custom data sources. Using data source connectors that run on AWS Lambda, data scientists, engineers, and analysts can analyze data from multiple sources running on-prem or the cloud with a single SQL query.
Trianz has built Athena Federated Query (AFQ) extensions on top of the Athena platform to simplify BI and facilitate cross-data-source analytics. These extensions typically scan data from S3 and execute the Lambda-based connectors to read data from on-prem Teradata, Amazon Redshift, Google BigQuery, and SAP HANA.
With a complete ecosystem of Trianz and AWS AFQ connectors, you can produce hybrid/multi-cloud analytics and visualizations without migrating or consolidating your data. The combined library of AFQ connectors can pull data from literally any source in your enterprise or other cloud platforms such as Azure or GCP.
Our AFQ connectors have been tested and tried by Amazon and large, complex data organizations in Fortune 1000 companies. The power of Athena is to allow technical and non-technical users to generate rich, powerful visualizations with simple queries, and saves time and resources in the long run.
Connecting more people to data has become imperative for organizations worldwide. In Top Trends in Data & Analytics for 2022, Gartner stated, “Connections between diverse and distributed data and people create truly impactful insight and innovation. These connections are critical to assisting humans and machines in making quicker, more accurate, trustworthy, and contextualized decisions while considering an increasing number of factors, stakeholders, and data sources.”Explore
Since the dawn of business, users have looked for three main components when it comes to data: Search | Secure| Share. Now let's talk about the evolution of data over the years. It's a story in itself if one pays attention. Back then, applications were created to handle a set of processes/tasks. These processes/tasks, when grouped logically, became a sub-function, a set of sub-functions constituted a function, and a set of functions made up an enterprise. Phase 1 – Data-AwareExplore
Practitioners in the data realm have gone through various acronyms over the years. It all started with "Decision Support Systems" followed by "Data Warehouse", "Data Marts", "Data Lakes", "Data Fabric", and "Data Mesh", amongst storage formats of RDBMS, MPP, Big Data, Blob, Parquet, Iceberg, etc., and data collection, consolidation, and consumption patterns that have evolved with technology.Explore
Enterprises have, over time, invested in a variety of tools, technologies, and methodologies to solve the critical problem of managing enterprise data assets, be it data catalogs, security policies associated with data access, or encryption/decryption of data (in motion and at rest) or identification of PII, PHI, PCI data. As technology has evolved, so have the tools and methodologies to implement the same. However, the issue continues to persist. There are a variety of reasons for the same:Explore
What Is an SQL Query Engine? SQL query engine architecture was designed to allow users to query a variety of data sources within a single query. While early SQL-based query engines such as Apache Hive allowed analysts to cut through the clutter of analytical data, they found running SQL analytics on multi-petabyte data warehouses to be a time-intensive process that was difficult to visualize and hard to scale.Explore