SQL Query Engines: The Fastest Way to Analyze Data

What Is an SQL Query Engine?

SQL query engine architecture was designed to allow users to query a variety of data sources within a single query. While early SQL-based query engines such as Apache Hive allowed analysts to cut through the clutter of analytical data, they found running SQL analytics on multi-petabyte data warehouses to be a time-intensive process that was difficult to visualize and hard to scale.

With the introduction of powerful interactive query engines like Athena Federated Query, users can quickly query multiple sources of data within a serverless, easy-to-use fully managed SQL interface. This process eliminates coding complexity by allowing users to perform cross-data source analytics using familiar SQL constructs for quick analysis, or by using scheduled SQL queries to extract and store results in Amazon S3 for subsequent analysis.


What is a Federated Query?


With a federated query, what is being “federated” or brought together are the sources of information that can answer the query. For instance, think of the advantage of being able to look in all the rooms in your house at once for the car keys that have gone missing? Now imagine being able to instantaneously look in every book in a library for an answer — no matter what language the book was written in.

This is akin to how Athena Federated Query connectors have improved database queries and the delivery of business intelligence insights. The connectors provide the ability to connect and query databases across multiple on-prem and public cloud environments at a much faster rate than traditional database management systems (DBMS).

Athena Rapid Analytics

What makes Amazon Athena Federated Query unique is that the tool allows non-technical users to scan multiple sources of data with simple SQL commands. Users can also begin analyzing and reporting on their structured and unstructured data within a single query using the built-in query editor.

In addition, with Athena’s Federated Query connectors, users no longer need to deal with setting up servers, frameworks, or clusters. Instead, users simply select the stored data in their data lake (Amazon S3), define the query, and the AFQ connectors automatically retrieve and aggregate data from specified repositories.


How do Athena Federated Query Connectors Work?


Athena Federated Query Connectors work by using data source connectors that run on AWS Lambda to run federated queries. A data source connector is a piece of code that can translate between the target data source and Athena.

An SQL connector can be thought of as an extension of Athena's query engine. Prebuilt Athena data source connectors allow Amazon data sources like Amazon DynamoDB and JDBC-compliant relational data sources such as MySQL and PostgreSQL to work in unison.

Users can also employ the Athena Query Federation SDK to write custom connectors. This allows them to choose, configure, and deploy a wide range of data source connectors to their account.

After the user deploys the data source connectors, the connector is associated with a catalog that the user can specify in SQL queries. From there, they can combine SQL statements from several catalogs, spanning multiple data sources with a single query.


How Does Athena Simplify and Unify Data Analytics?


When the query is submitted against a data source, Athena invokes the corresponding connector to identify parts of the specific tables that need to be read. Athena then manages data parallelism and pushes down filter predicates. Based on the user submitting the query, connectors can provide or restrict access to specific data elements to strengthen user-level security.

The Connectors use Apache Arrow as the format for returning data requested in a query. This enables connectors to be implemented in different programming languages and data formats, such as CSV, TSV, and JSON. It also supports open-source columnar formats and compressed data formats such as Zlib, LZO, and gzip.

An image showing how Amazon Athena connects data sources.

Since connectors are processed in an event-driven, serverless computing platform (Lambda), they can be used to access data from any data source that is accessible from Lambda in the cloud or on-prem infrastructure.

The result of using Athena Query Federation is that users no longer need multiple languages to build complex pipelines to extract, transform, and load into a data warehouse before users can query the data.


Federated SQL Query Engines in Action


Imagine an eCommerce store has noticed a spike in customers reporting their orders are not being processed. With key operational metrics and logs residing in multiple places, it would be next to impossible to quickly assess the scope of this problem — or even begin to do root cause analysis before having to refund the order.

By using Athena, the business can use federated queries to quickly find relevant details — even if the analyst is not experienced with each of the individual systems that make up their data architecture. Athena allows the user to run a query and learn how many customer orders are problematic. They can then observe commonalities between the orders to locate the root cause or any relevant information that helps them to investigate further.

In addition, the business can extract all the various warnings and errors from the specified connectors and AWS resources. By running a query search across multiple silos and data stacks, Athena can pull all the relevant criteria to determine what orders need to be processed.

After creating the tables in a matter of seconds, that information can then be easily viewed and used to dive deeper to troubleshoot the problem. What may have taken days without the use of a federated query now takes far less time and technical wrangling.

To learn more about how Athena Federated Query Connectors work under the hood, check out the following video:


Do I Need an SQL Query Engine?


As with any data analytics technology, the answer to the SQL query engine question depends on the size and scope of data that needs to be visualized. Some organizations have optimized their database management systems so that SQL query engines are not necessary to pull data across multiple sources. Other organizations have overwhelmed their data warehouses and are seeking new ways to simplify BI and facilitate cross data-source analytics.

This is where the scalability of SQL query engines become a viable alternative to traditional DBMSs. However, it is important to note that while database technologies have been around for decades, SQL query engines are a relatively new technology. If the query engine is not implemented correctly, it may disrupt workloads and lead to suboptimal cost reduction.

While query engines are a great way to visualize analytics for large sets of data, new users should consult an analytics professional who can help them better understand their use cases and business requirements before implementing a big data analytics solution.

If you are interested in learning more about an analytics solution that does not require a complex technology ecosystem, Trianz would be happy to demonstrate various querying and reporting solutions that can improve upon your current development time, accuracy, and performance.

Experience the Trianz Difference

Trianz enables digital transformation through effective strategies and excellence in execution. Collaborating with business and technology leaders, we help formulate and execute operational strategies to achieve intended business outcomes by bringing the best of consulting, technology experiences and execution models.

Powered by knowledge, research, and perspectives, we enable clients to transform their business ecosystems and achieve superior performance by leveraging infrastructure, cloud, analytics, digital and security paradigms. Reach out to get in touch or learn more.

×

You might also like...

Get in Touch

Let us help you
transform and grow


Let’s Talk

x

Status message

We're eager to assist you! Please leave a message and we'll get back to you shortly.

By submitting your information, you agree to our revised  Privacy Policy.