Athena JDBC Driver

Are you a Java developer looking to connect to Amazon Athena from your applications? The Athena JDBC driver makes it easy to query data in Amazon S3 using standard SQL.

This article explains what the Athena JDBC driver is, shows how to use it with example code, and covers its security features. By the end, you’ll have a solid foundation for using Athena with Java.

What is the Athena JDBC Driver?

The Athena JDBC driver is a type 4 driver that Java applications use, through the standard JDBC API, to connect to Athena as a data source. It converts JDBC method calls into HTTP requests that Athena can understand.

Using the JDBC API to query Athena has several advantages:

  • It abstracts away the details of the underlying HTTP communication
  • It allows using familiar JDBC code and SQL to work with Athena
  • It enables integrating Athena into any Java application or tool that supports JDBC

AWS provides the driver as a standalone JAR file. To use it, simply include the JAR in your application’s classpath.

Connecting to Athena

To connect to Athena using the JDBC driver, you need to construct a JDBC connection string with the following format:

jdbc:awsathena://AwsRegion=[Region];S3OutputLocation=[Output];[Property1]=[Value1];[Property2]=[Value2];...

The key components are:

  • AwsRegion – The AWS region in which Athena runs your queries
  • S3OutputLocation – The S3 location where Athena should store query results
  • Additional optional properties for configuration

Here is an example JDBC URL that connects to Athena in the us-west-2 region and stores query results in the my-athena-results S3 bucket:

jdbc:awsathena://AwsRegion=us-west-2;S3OutputLocation=s3://my-athena-results/output;
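The URL format above can also be assembled in code. The following is a minimal sketch rather than anything shipped with the driver: the AthenaJdbcUrl class and its build method are hypothetical helpers, and the only property names it assumes are AwsRegion and S3OutputLocation from the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the Athena driver): assembles a JDBC URL
// in the jdbc:awsathena://Key=Value;Key=Value; format described above.
final class AthenaJdbcUrl {
    private static final String PREFIX = "jdbc:awsathena://";

    private AthenaJdbcUrl() {}

    static String build(String region, String s3OutputLocation,
                        Map<String, String> extraProperties) {
        if (region == null || region.isEmpty()) {
            throw new IllegalArgumentException("region is required");
        }
        if (s3OutputLocation == null || !s3OutputLocation.startsWith("s3://")) {
            throw new IllegalArgumentException("S3OutputLocation must be an s3:// URI");
        }

        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("AwsRegion", region);
        props.put("S3OutputLocation", s3OutputLocation);
        props.putAll(extraProperties);

        StringBuilder url = new StringBuilder(PREFIX);
        for (Map.Entry<String, String> entry : props.entrySet()) {
            url.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return url.toString();
    }
}
```

Calling AthenaJdbcUrl.build("us-west-2", "s3://my-athena-results/output", Collections.emptyMap()) yields the same URL as the example above. Validating the two required values up front surfaces a misconfiguration before any connection attempt is made.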

To actually establish the connection in Java code, use the DriverManager class:

String url = "jdbc:awsathena://AwsRegion=us-west-2;S3OutputLocation=s3://my-athena-results/output";
Connection conn = DriverManager.getConnection(url);

This creates a Connection object that you can then use to execute queries.
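A Connection holds resources until it is closed, so it is usually wrapped in try-with-resources. The sketch below shows one way to do that with plain JDBC calls; the AthenaConnections class, the withConnection method, and the ConnectionAction interface are hypothetical names introduced for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: opens a connection, runs an action, and always closes
// the connection afterwards via try-with-resources.
final class AthenaConnections {
    private AthenaConnections() {}

    // A callback that may throw SQLException, unlike java.util.function.Function.
    interface ConnectionAction<T> {
        T apply(Connection conn) throws SQLException;
    }

    static <T> T withConnection(String url, ConnectionAction<T> action) throws SQLException {
        // getConnection throws SQLException if no registered driver accepts the URL,
        // for example when the driver JAR is missing from the classpath.
        try (Connection conn = DriverManager.getConnection(url)) {
            return action.apply(conn);
        }
    }
}
```

For example, AthenaConnections.withConnection(url, conn -> conn.getMetaData().getDriverName()) would return the driver name and close the connection even if the call throws.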

Athena Authentication

In order to connect, the JDBC driver needs AWS credentials to authenticate with Athena. There are a few ways to provide credentials:

  1. Environment variables – Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
  2. Java system properties – Set the aws.accessKeyId and aws.secretKey system properties.
  3. AWS credentials file – Put the access key and secret in the ~/.aws/credentials file.
  4. AWS instance profile – If running on EC2, assign an IAM role to the instance. The driver will automatically retrieve temporary credentials.

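The JDBC driver looks for credentials in that order when it relies on the AWS SDK's default credentials provider chain. The exact order and the system property names can differ between driver and SDK versions, so check the documentation for the version you use. Prefer an instance profile or the credentials file so that sensitive keys are never hard-coded.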
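When a connection fails to authenticate, it can help to see which of these sources is present on the machine. The following diagnostic sketch mirrors the lookup order listed above using only JDK calls. It does not call the driver or the AWS SDK, the CredentialSourceCheck class is a hypothetical name, and it treats the instance profile as a fallback because that source cannot be detected locally.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;

// Hypothetical diagnostic: reports the first credential source that appears to
// be configured, following the order described in the article. It only checks
// for presence and never reads or prints the secret values.
final class CredentialSourceCheck {
    private CredentialSourceCheck() {}

    static String firstAvailableSource(Map<String, String> env, Properties sysProps,
                                       Path credentialsFile) {
        if (isSet(env.get("AWS_ACCESS_KEY_ID")) && isSet(env.get("AWS_SECRET_ACCESS_KEY"))) {
            return "environment variables";
        }
        if (isSet(sysProps.getProperty("aws.accessKeyId"))
                && isSet(sysProps.getProperty("aws.secretKey"))) {
            return "Java system properties";
        }
        if (Files.isReadable(credentialsFile)) {
            return "credentials file";
        }
        // Nothing found locally: only an instance profile could still apply.
        return "instance profile (if running on EC2 with an IAM role)";
    }

    private static boolean isSet(String value) {
        return value != null && !value.isEmpty();
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("user.home"), ".aws", "credentials");
        System.out.println(firstAvailableSource(System.getenv(), System.getProperties(), file));
    }
}
```

Passing the environment, system properties, and file path as arguments keeps the check easy to exercise with made-up values.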

Basic Querying

Once you have a connection, you can execute SQL queries on it. Use the createStatement method to create a Statement object, then call executeQuery with your SQL:

Statement stmt = conn.createStatement();
String sql = "SELECT * FROM my_table LIMIT 10";
ResultSet rs = stmt.executeQuery(sql);

This sends the query to Athena, waits for it to finish, and returns the results as a ResultSet. You can then iterate over the rows and access column values:

while (rs.next()) {
    String col1 = rs.getString(1);
    int col2 = rs.getInt(2);
    // ...
}
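Putting the query steps together, the sketch below runs the query and copies the rows into a list, closing the Statement and ResultSet with try-with-resources. The RowReader and Row classes are hypothetical, and the column types assume the two-column my_table used in this article.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reads my_table rows into plain Java objects.
final class RowReader {
    private RowReader() {}

    // Simple value holder for one row of my_table.
    static final class Row {
        final String col1;
        final int col2;

        Row(String col1, int col2) {
            this.col1 = col1;
            this.col2 = col2;
        }
    }

    // Copies every remaining row of the result set into a list.
    // Columns are read by position, so this relies on the table's column order.
    static List<Row> readAll(ResultSet rs) throws SQLException {
        List<Row> rows = new ArrayList<>();
        while (rs.next()) {
            rows.add(new Row(rs.getString(1), rs.getInt(2)));
        }
        return rows;
    }

    // Runs the example query and closes the statement and result set afterwards.
    static List<Row> firstTen(Connection conn) throws SQLException {
        String sql = "SELECT * FROM my_table LIMIT 10";
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            return readAll(rs);
        }
    }
}
```

Keeping readAll separate from the query call means the row-mapping logic can be exercised against any ResultSet, not only one produced by Athena.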

Remember that Athena queries data stored in S3. Before querying, you need to create an external table that maps to the S3 data:

CREATE EXTERNAL TABLE my_table (
    col1 STRING,
    col2 INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-data-bucket/input/';

This creates a table my_table with two columns, backed by the comma-delimited files stored under the s3://my-data-bucket/input/ prefix. With the table in place, you can query it via JDBC as shown above.
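The DDL can be submitted through the same JDBC connection. The following is a sketch under a few assumptions: the CreateTableSketch class is hypothetical, IF NOT EXISTS is added here so the call can be repeated safely, and the trailing semicolon is omitted, as is common when sending a single statement through JDBC.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper that creates the example table through JDBC.
final class CreateTableSketch {
    // Same table definition as above; IF NOT EXISTS is an addition for repeatability.
    static final String DDL =
            "CREATE EXTERNAL TABLE IF NOT EXISTS my_table (\n"
          + "    col1 STRING,\n"
          + "    col2 INT\n"
          + ")\n"
          + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
          + "LOCATION 's3://my-data-bucket/input/'";

    private CreateTableSketch() {}

    static void createTable(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // execute() is used because DDL does not return a result set.
            stmt.execute(DDL);
        }
    }
}
```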

Parameterized Queries

For queries that accept parameters, use a PreparedStatement instead of a regular Statement. Construct the SQL with ? placeholders, then bind values to them:

String sql = "SELECT * FROM my_table WHERE col1 = ?";
PreparedStatement pstmt = conn.prepareStatement(sql);
pstmt.setString(1, "foo");
ResultSet rs = pstmt.executeQuery();

This binds the value “foo” to the first ? placeholder. Using a PreparedStatement has a few benefits:

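  • It helps prevent SQL injection because values are bound as parameters instead of being concatenated into the SQL text.
  • It keeps the SQL text fixed, so the query structure is defined once and only the values change.
  • You can execute the same query multiple times with different parameter values.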
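To illustrate the benefits above, the sketch below reuses one PreparedStatement for several values and, for contrast, shows what string concatenation produces when the input contains a quote. The PreparedQuerySketch class and its methods are hypothetical, and the COUNT(*) query is an assumption chosen to keep the example short.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example of reusing a parameterized query.
final class PreparedQuerySketch {
    static final String SQL = "SELECT COUNT(*) FROM my_table WHERE col1 = ?";

    private PreparedQuerySketch() {}

    // Unsafe pattern, shown only for contrast: the input becomes part of the SQL text.
    static String concatenated(String userInput) {
        return "SELECT COUNT(*) FROM my_table WHERE col1 = '" + userInput + "'";
    }

    // Runs the same prepared query once per value and collects the counts.
    static Map<String, Long> countByValue(Connection conn, List<String> values)
            throws SQLException {
        Map<String, Long> counts = new LinkedHashMap<>();
        try (PreparedStatement pstmt = conn.prepareStatement(SQL)) {
            for (String value : values) {
                pstmt.setString(1, value); // rebind the single placeholder
                try (ResultSet rs = pstmt.executeQuery()) {
                    counts.put(value, rs.next() ? rs.getLong(1) : 0L);
                }
            }
        }
        return counts;
    }
}
```

With the input x' OR '1'='1, the concatenated method produces a WHERE clause that matches every row, while the parameterized SQL text never changes regardless of the input.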

Security Features

The Athena JDBC driver supports several security-related features and configurations:

Encryption

By default, the driver connects to Athena using HTTPS for encryption in transit. All data sent between the application and Athena is encrypted using TLS.

Access Control

Athena respects the IAM permissions attached to the AWS credentials used by the JDBC driver. You can restrict what data a user can access by limiting those permissions to specific Athena workgroups and to specific databases and tables in the AWS Glue Data Catalog.

For example, this policy allows querying tables in the ‘my_database’ database only:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:StopQueryExecution"
            ],
            "Resource": [
                "arn:aws:athena:us-west-2:123456789012:workgroup/primary"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetTable",
                "glue:GetPartitions",
                "glue:GetPartition"
            ],
            "Resource": [
                "arn:aws:glue:us-west-2:123456789012:catalog",
                "arn:aws:glue:us-west-2:123456789012:database/my_database",
                "arn:aws:glue:us-west-2:123456789012:table/my_database/*"
            ]
        }
    ]
}

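Attach this policy to the IAM user or role used by the JDBC connection to enforce access control. The same identity also needs S3 permissions to read the underlying data and to write to the query results location.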

S3 Encryption

Athena stores query results in the S3 location specified in the JDBC URL. To protect this data at rest, you can configure the S3 bucket to use encryption.

The Athena JDBC driver transparently supports reading from and writing to encrypted S3 buckets.

Conclusion

The Athena JDBC driver helps Java apps run SQL queries on data in Amazon S3. It supports a variety of authentication methods and security features to protect data.

To learn more, consult the official Athena JDBC driver documentation.

About DataSunrise

If you need additional security, monitoring, auditing, and compliance features for Athena, consider DataSunrise Database Security. DataSunrise offers tools for data masking, real-time query auditing, activity monitoring, and regulatory compliance.

To experience DataSunrise live and get a free trial license, contact our team to schedule an online demo. We’ll show you how DataSunrise can enhance Athena’s security.
