How to Configure DataSunrise to Mask Data for Amazon Athena
DataSunrise now supports Dynamic Masking for Amazon Athena. You can obfuscate sensitive data returned by Athena on the fly to protect it from unauthorized users.
Client applications connect to Athena over an encrypted connection, and DataSunrise can mask Athena data only in Proxy mode. In this mode, the single secure connection is split into two: one from the client application to DataSunrise and one from DataSunrise to Athena. When connecting, the client application verifies the certificate presented by the server, which in this case is the DataSunrise proxy. If this certificate is self-signed, some applications may consider the connection insecure and reject it.
By default, DataSunrise uses a self-signed SSL certificate to establish an encrypted connection between a client application and a DataSunrise proxy.
There are two options for a connection to Athena via a DataSunrise proxy:
- Use DataSunrise’s self-signed certificate;
- Use a properly signed SSL certificate issued by a Certification Authority.
If the client application trusts the certificate, the connection is established. Otherwise, the client application treats the connection as a possible man-in-the-middle attack and refuses it unless it has been configured to trust the certificate. This article walks through that configuration using DBeaver as an example.
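To illustrate why a self-signed certificate is rejected until the client is told to trust it, here is a minimal Python sketch based on the standard ssl module. It is not what DBeaver runs (DBeaver relies on a Java truststore, as described below); the function name build_client_context and the ca_file parameter are assumptions made for this example.

```python
import ssl
from typing import Optional


def build_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Return a TLS client context that verifies the server certificate.

    ca_file is a hypothetical path to a PEM file, for example a certificate
    exported from the proxy. When it is None, only the system CAs are
    trusted, so a self-signed proxy certificate would fail verification.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if ca_file is not None:
        # Trust the extra certificate in addition to the system defaults.
        context.load_verify_locations(cafile=ca_file)
    return context


context = build_client_context()
# Both checks are enabled by default for client contexts.
print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("hostname checked:", context.check_hostname)
```

Passing the exported certificate as ca_file plays the same role as importing it into the Java keystore in the steps that follow.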
Creating Instance for Amazon Athena
First, create an Instance for Athena in DataSunrise; this is required to create a database profile. As with any other database, start by entering the connection details for Athena. Note that Athena stores query results in S3 buckets, so you need to specify an S3 bucket in the Query Result Location field.
Below, in the Capture Mode section, select Proxy. In the Proxy Keys field, select Create New to generate a new SSL Key Group and attach it to your proxy.
After that, click Save.
Import Certificate into Keystore
Do the following:
- After attaching a new SSL Key Group to your proxy in DataSunrise, navigate to Configuration → SSL Key Groups. Locate your SSL Key Group in the list, open it, and copy the contents of the Certificate field into a text file. For example, create a file named dsca.crt, paste the certificate into it, and put the file in the C:\athena folder.
- Run the command line as administrator and navigate to the bin folder of the JRE bundled with DBeaver, for example C:\Program Files\DBeaver\jre\bin\.
- Add the certificate you saved (dsca.crt) to the cacerts keystore, which is located in C:\Program Files\DBeaver\jre\lib\security. For example:
keytool.exe -importcert -trustcacerts -alias dsca -v -keystore "C:/Program Files/DBeaver/jre/lib/security/cacerts" -file "C:/athena/dsca.crt" -storepass changeit
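If you need to repeat the import on several machines, the same keytool call can be assembled in a script. The sketch below only builds the argument list and prints it; the helper name keytool_import_command is hypothetical, and the paths are the example paths used above. The -noprompt flag is a standard keytool option that skips the interactive "Trust this certificate?" question, which matters when the command runs unattended.

```python
from typing import List


def keytool_import_command(keytool: str, keystore: str, cert_file: str,
                           alias: str = "dsca",
                           storepass: str = "changeit",
                           noprompt: bool = False) -> List[str]:
    """Build the argument list for 'keytool -importcert'."""
    command = [
        keytool, "-importcert", "-trustcacerts",
        "-alias", alias, "-v",
        "-keystore", keystore,
        "-file", cert_file,
        "-storepass", storepass,
    ]
    if noprompt:
        # Skip keytool's interactive confirmation.
        command.append("-noprompt")
    return command


command = keytool_import_command(
    r"C:\Program Files\DBeaver\jre\bin\keytool.exe",
    "C:/Program Files/DBeaver/jre/lib/security/cacerts",
    "C:/athena/dsca.crt",
    noprompt=True,
)
print(command)
# To actually run the import (needs administrator rights on Windows):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids quoting problems with the space in "Program Files".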
Setting up and Running DBeaver
Because DBeaver is used here to send queries to Athena via the DataSunrise proxy, locate the dbeaver.ini file in your DBeaver installation folder and open it with a text editor. Add the following lines to the end of the file, each option on its own line:
-Djavax.net.ssl.trustStore=<jks_file_path>
-Djavax.net.ssl.trustStorePassword=<jks_file_password>
For example, pointing at the keystore into which the certificate was imported:
-Djavax.net.ssl.trustStore=C:/Program Files/DBeaver/jre/lib/security/cacerts
-Djavax.net.ssl.trustStorePassword=changeit
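The dbeaver.ini change can also be scripted. The following sketch works on the file contents as a string and assumes that each JVM option goes on its own line after -vmargs, which is how Eclipse-style launcher .ini files are read. The helper names and the sample file contents are hypothetical.

```python
from typing import List


def truststore_options(truststore_path: str, password: str) -> List[str]:
    """Return the two JVM options that select a custom truststore."""
    return [
        f"-Djavax.net.ssl.trustStore={truststore_path}",
        f"-Djavax.net.ssl.trustStorePassword={password}",
    ]


def append_options(ini_text: str, options: List[str]) -> str:
    """Append options to the .ini text, one per line, skipping duplicates."""
    lines = ini_text.splitlines()
    if "-vmargs" not in lines:
        # JVM options are only honored after the -vmargs marker.
        lines.append("-vmargs")
    for option in options:
        if option not in lines:
            lines.append(option)
    return "\n".join(lines) + "\n"


# Hypothetical file contents; a real dbeaver.ini has more entries.
sample_ini = "-startup\nplugins/launcher.jar\n-vmargs\n-Xmx1024m\n"
options = truststore_options(
    "C:/Program Files/DBeaver/jre/lib/security/cacerts", "changeit")
print(append_options(sample_ini, options))
```

Because duplicates are skipped, running the helper twice leaves the file unchanged.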
Run DBeaver with the following parameters (example):
dbeaver.exe -vm "C:\Program Files\DBeaver\jre\bin" -vmargs -Djavax.net.ssl.trustStore="C:\Program Files\DBeaver\jre\lib\security\cacerts" -Djavax.net.ssl.trustStorePassword=changeit
Connecting Through the Proxy with DBeaver
Configure a connection to Athena via the DataSunrise proxy. On the Driver properties tab, set ProxyHost and ProxyPort to the host and port of the DataSunrise proxy you created for Athena.
Then test the connection. You should now be able to connect to Athena via the DataSunrise proxy.
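For clients other than DBeaver, the same driver properties are usually passed in the JDBC URL. The sketch below assembles such a URL under the assumption that the Simba-based Athena JDBC driver is used, with its semicolon-separated key=value syntax. The host name, port, region, and bucket are placeholders, not values taken from a real setup, and the helper name athena_jdbc_url is hypothetical.

```python
from typing import Dict


def athena_jdbc_url(region: str, properties: Dict[str, str]) -> str:
    """Build a JDBC URL of the form jdbc:awsathena://Key=Value;Key=Value;"""
    parts = [f"AwsRegion={region}"]
    parts += [f"{key}={value}" for key, value in properties.items()]
    return "jdbc:awsathena://" + ";".join(parts) + ";"


url = athena_jdbc_url(
    "us-east-1",  # placeholder region
    {
        "S3OutputLocation": "s3://example-bucket/results/",  # placeholder
        "ProxyHost": "datasunrise.example.com",              # placeholder
        "ProxyPort": "1026",                                 # placeholder
    },
)
print(url)
```

Check the property names against the documentation of the driver version you use before relying on this URL format.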
Configuring Dynamic Masking
With the proxy configured, you can create a Dynamic Masking Rule for Athena. The procedure is straightforward and is the same as for any other database.
Below is an example of a table with real data and masked email addresses respectively:
DataSunrise also supports unstructured data masking for Athena, based on NLP (Natural Language Processing). Unstructured masking lets you mask sensitive data regardless of its form and format, for example in a Word document or a PDF file. Sensitive data is masked with asterisks (*).
Dynamic Masking for Athena is based on modifying query result sets. This means that the database receives the original query and returns the original data. Masking is performed as the data passes through the DataSunrise proxy, so the client receives the sensitive data in obfuscated form.
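To give a feel for asterisk masking in free text, here is a deliberately simplified sketch. It uses a regular expression for email addresses instead of NLP, so it is a stand-in for the idea and not DataSunrise's detection logic; the pattern and the function name mask_with_asterisks are assumptions made for this example.

```python
import re

# Simplified email pattern; real detection would cover many more cases.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_with_asterisks(text: str) -> str:
    """Replace every email-like token with asterisks of the same length."""
    return EMAIL_PATTERN.sub(lambda match: "*" * len(match.group(0)), text)


print(mask_with_asterisks("Contact bob@example.com today"))
```

Keeping the replacement the same length as the original token preserves the layout of the surrounding text.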
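The result-set approach can be pictured with a small sketch: the rows come back from the database unchanged, and a masking step rewrites selected columns before they reach the client. This illustrates the idea only and is not DataSunrise's implementation; the function names, the column name, and the masking format are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Set


def mask_email(value: str) -> str:
    """Keep the first character of the local part, mask the rest."""
    local, separator, domain = value.partition("@")
    if not separator or not local:
        return value  # not an email address, leave untouched
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def mask_rows(rows: Iterable[Dict[str, Any]],
              masked_columns: Set[str]) -> Iterator[Dict[str, Any]]:
    """Yield copies of the rows with the selected columns masked."""
    for row in rows:
        yield {
            column: mask_email(value)
            if column in masked_columns and isinstance(value, str)
            else value
            for column, value in row.items()
        }


# Rows as the database would return them (made-up sample data).
original_rows = [{"id": 1, "email": "alice@example.com"}]
for row in mask_rows(original_rows, {"email"}):
    print(row)
```

The original rows are never modified, which mirrors the point above: the database itself still stores and returns the real data.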