Masking Unstructured Text on AWS S3
Data in Clouds
We live in a world where data is one of the most valuable assets, and the IT industry is constantly developing more convenient ways to store it.
Cloud storage is one of the most popular options. Most of us have heard of, or already use, platforms such as Amazon Web Services, Alibaba OSS and MinIO.
However, wherever data is stored, attackers follow, and cloud storage is no exception. Database owners may believe that their sensitive data is completely safe there. Let's discuss whether that is true.
In the cloud, security is a shared responsibility between the provider and the customer: AWS manages security of the cloud, while customers are responsible for security in the cloud.
Some types of documents are especially hard to protect because their contents are plain text: unstructured text, CSV, XML and JSON files. DataSunrise allows you to control access to these files and mask their contents when necessary.
Masking Possibilities
XML
XML is widely used by programs and devices to structure, store, transmit and display data online. Because XML files are plain text, any sensitive values they contain are exposed as soon as a file leaks or is read by someone who should not have access to it.
Below you can see what an XML file protected by DataSunrise looks like.
<people_test>
  <record>
    <id>1</id>
    <first_name>********</first_name>
    <last_name>*****</last_name>
    <email>[email protected]</email>
    <gender>Male</gender>
    <ip_address>181.236.58.217</ip_address>
  </record>
  <record>
    <id>2</id>
    <first_name>*******</first_name>
    <last_name>******</last_name>
    <email>[email protected]</email>
    <gender>Male</gender>
    <ip_address>201.187.144.70</ip_address>
  </record>
  <record>
    <id>3</id>
    <first_name>*******</first_name>
    <last_name>****</last_name>
    <email>[email protected]</email>
    <gender>Female</gender>
    <ip_address>113.21.227.26</ip_address>
  </record>
</people_test>
As you can see, the sensitive first_name and last_name values are hidden. In the XmlPath field of the DataSunrise tabular form, you can specify the XML tags to be masked; to mask all data, leave the XmlPath field empty. Then choose the masking method and the mask value.
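To make the XmlPath behaviour more concrete, here is a minimal sketch of tag-based XML masking using Python's standard xml.etree.ElementTree module. It is not how DataSunrise implements masking; it assumes that selectors are ElementTree path expressions such as ".//first_name" and that the mask is one asterisk per original character, which matches the look of the example above. As with an empty XmlPath field, an empty selector list masks every element that carries text.

```python
import xml.etree.ElementTree as ET


def mask_xml(xml_text, paths, mask_char="*"):
    """Mask the text of elements selected by ElementTree path expressions.

    `paths` is an assumption standing in for the XmlPath field: a list of
    expressions such as ".//first_name". An empty list masks every element
    that has non-whitespace text.
    """
    root = ET.fromstring(xml_text)
    if paths:
        targets = [el for path in paths for el in root.findall(path)]
    else:
        targets = [el for el in root.iter() if el.text and el.text.strip()]
    for el in targets:
        if el.text:
            # Length-preserving mask: one mask character per original character.
            el.text = mask_char * len(el.text)
    return ET.tostring(root, encoding="unicode")


sample = ("<people_test><record><id>1</id><first_name>Alice</first_name>"
          "<last_name>Smith</last_name></record></people_test>")
print(mask_xml(sample, [".//first_name", ".//last_name"]))
```

Keeping the mask length equal to the original value preserves the shape of the document, at the cost of revealing how long each hidden value is; a fixed-length mask avoids that leak.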
JSON
JSON stands for JavaScript Object Notation. Today it is a very popular format for exchanging data between a browser and a server. JSON is a text format, so when it is used for storage, the data is also kept as plain text. When masking JSON files with DataSunrise, you specify the attributes whose values should be hidden in the jsonPath field of the tabular form. If you leave the jsonPath field blank, all values are masked. In the example below, we mask the "first_name" and "last_name" values.
[
  {
    "id": 1,
    "first_name": "masked",
    "last_name": "masked",
    "email": "[email protected]",
    "gender": "Male",
    "ip_address": "252.132.213.37",
    "date": "2019-08-24"
  },
  {
    "id": 2,
    "first_name": "masked",
    "last_name": "masked",
    "email": "[email protected]",
    "gender": "Female",
    "ip_address": "184.85.69.129",
    "date": "2019-07-23"
  },
  {
    "id": 3,
    "first_name": "masked",
    "last_name": "masked",
    "email": "[email protected]",
    "gender": "Female",
    "ip_address": "16.195.117.101",
    "date": "2020-03-13"
  }
]
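The JSON example above replaces every selected value with the fixed string "masked". The following minimal sketch reproduces that effect with Python's standard json module. It is a simplification, not DataSunrise's implementation: instead of real jsonPath expressions it accepts plain attribute names and masks every scalar value stored under those names at any depth, and an empty name list masks all scalar values, mirroring the blank jsonPath behaviour.

```python
import json


def mask_json(json_text, keys, mask_value="masked"):
    """Replace the values of the given attribute names anywhere in a JSON document.

    Simplification (assumption): `keys` are plain attribute names rather than
    full jsonPath expressions. An empty `keys` list masks every scalar value.
    """
    keys = set(keys)

    def walk(node, key=None):
        if isinstance(node, dict):
            return {k: walk(v, k) for k, v in node.items()}
        if isinstance(node, list):
            # List items inherit the attribute name of the list that holds them.
            return [walk(item, key) for item in node]
        # Scalar leaf: mask it if everything is targeted or its key is listed.
        if not keys or key in keys:
            return mask_value
        return node

    return json.dumps(walk(json.loads(json_text)))


sample = '[{"id": 1, "first_name": "Alice", "last_name": "Smith", "gender": "Female"}]'
print(mask_json(sample, ["first_name", "last_name"]))
```

Walking the parsed structure, rather than running text substitutions on the raw file, guarantees that the output is still valid JSON and that only values, never attribute names, are changed.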
CSV
CSV (comma-separated values) is a file format that stores data in tabular form, one record per line. Like XML and JSON files, CSV files are plain text. Below you can see how data looks in a masked CSV file: a lot of sensitive data has been masked, namely IDs, last names, e-mails and IP addresses. To mask a CSV file with DataSunrise, specify the column numbers, then choose the masking method and mask value. In the example below, we mask column 1 (IDs), column 3 (last names), column 4 (e-mails) and column 6 (IP addresses).
id,first_name,last_name,email,gender,ip_address
*,Gilfoyle,*********,*****,Female,**********
*,Chilcotte,*********,*****,Male,**********
*,Terrell,*********,*****,Male,**********
*,Pearle,*********,*****,Female,**********
*,Kits,*********,*****,Male,**********
*,McAlpine,*********,*****,Male,**********
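Column-number masking, as described for CSV files, can be sketched with Python's standard csv module. This is only an illustration of the idea, not DataSunrise code: it assumes 1-based column numbers (as in the text, where column 1 holds IDs), keeps the header row readable, and replaces each selected field with one fixed mask value, the way each masked column above shows the same number of asterisks in every row.

```python
import csv
import io


def mask_csv(csv_text, columns, mask_value="*****", has_header=True):
    """Replace fields in the given 1-based column numbers with mask_value."""
    targets = {c - 1 for c in columns}  # convert 1-based numbers to 0-based indexes
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row_number, row in enumerate(reader):
        if has_header and row_number == 0:
            writer.writerow(row)  # leave the header row unmasked
            continue
        writer.writerow(
            [mask_value if i in targets else field for i, field in enumerate(row)]
        )
    return out.getvalue()


sample = ("id,first_name,last_name,email,gender,ip_address\n"
          "1,Alice,Smith,alice@example.com,Female,10.0.0.1\n")
print(mask_csv(sample, [1, 3, 4, 6]))
```

Using the csv module instead of splitting lines on commas means quoted fields that themselves contain commas are still treated as a single column.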
Unstructured Text
Unstructured text (data) has no predefined data model and is not organized in a predefined manner. It is usually text-heavy but may contain dates, numbers and other sensitive data. Unstructured data lacks metadata and cannot readily be indexed or mapped. Below is an example of how DataSunrise can mask unstructured text. As you can see, the sensitive data is masked; the values to be masked are taken from the DataSunrise built-in dictionaries (Lexicon).
Procedure Findings. The patient, **************, is a ** year old male born on October *, ****. He has a * mm sessile polyp that was found in the ascending colon and removed by snare, no cautery. *******'s address is ** *********. ************ *****. His SSN is **********. He experienced the polyp after getting out of his blue ************ with a license number of WDR-***. We were able to control the bleeding. Moderate diverticulosis and hemorrhoids were incidentally noted. Recurrent GI bleed of unknown etiology; hypotension perhaps secondary to this but as likely secondary to polypharmacy. He reports first experiencing hypotension while eating queso ***********.
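Dictionary-driven masking of free text can be sketched with Python's re module. This is a simplified illustration, not the DataSunrise Lexicon engine: the lexicon entries and the SSN-style pattern in the usage line are hypothetical, matches are case-insensitive and bounded by word boundaries (so terms should start and end with letters or digits), and each match is replaced by asterisks of the same length, as in the medical note above.

```python
import re


def mask_unstructured(text, lexicon, patterns=(), mask_char="*"):
    """Mask dictionary terms and regex matches in free text.

    `lexicon` is a hypothetical collection of sensitive terms; `patterns` are
    extra regular expressions (for example, an SSN-like format).
    """
    # Longest terms first, so "John Doe" is matched before "John".
    alternatives = [re.escape(term) for term in sorted(lexicon, key=len, reverse=True)]
    alternatives.extend(patterns)
    if not alternatives:
        return text
    combined = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    # Length-preserving replacement: one mask character per matched character.
    return combined.sub(lambda m: mask_char * len(m.group(0)), text)


note = "The patient, John Doe, has SSN 123-45-6789. John called back."
print(mask_unstructured(note, {"John Doe", "John"}, [r"\d{3}-\d{2}-\d{4}"]))
```

A dictionary only catches values it already knows about, which is why the sketch also accepts patterns for data that follows a recognizable format, such as ID numbers.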
DataSunrise Masking Rule for AWS S3
To mask data dynamically with DataSunrise, you first need to create a database instance, that is, specify which database you want to protect. The picture below shows a list of database instances, including an AWS S3 instance. Click Add New to create a new database instance.
To set up a masking rule, go to the Masking section of the UI and click Add Rule.
In the window that opens, specify the required information about the new rule and scroll down to the bottom of the page.
In the Masking Settings section, choose the type of document you want to mask: CSV, XML, JSON or unstructured text.
Depending on your needs, tick the type of document in your S3 bucket that you want to protect. The sections below walk through all four document types.
XML
In the picture below, we want to protect an XML file, so we tick this file type. Then specify the full file name in the S3 bucket in the format shown below.
CSV
In the picture below, we want to protect a CSV file, so we tick this file type. Then click "Add File" and specify the CSV file in the S3 bucket that we want to protect.
Now scroll down and specify the masking method and masking value (an asterisk in the picture). Then click Save Rule to save and activate the new rule.
JSON
To protect a JSON file, choose this option and specify the full file name in the format shown below. Click Save Rule to activate the rule.
Unstructured Text
To mask an unstructured text file, choose this option, enter the full file name in the format shown in the picture below, and click Save Rule to save and activate the rule.
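Across all four document types, a masking rule collects the same few settings: the document type, the file in the S3 bucket, the masking method, the mask value and, for structured formats, what to mask (column numbers, XmlPath or jsonPath). The dataclass below is a purely hypothetical model of that settings form, written only to summarize the walkthrough; it is not the DataSunrise API, and the field names and validation rules are assumptions.

```python
from dataclasses import dataclass

# Hypothetical model of the rule settings from the UI walkthrough.
# This is NOT a DataSunrise API; it only shows which settings belong together.
DOCUMENT_TYPES = {"CSV", "XML", "JSON", "Unstructured text"}


@dataclass
class MaskingRuleSketch:
    document_type: str
    file_name: str           # full file name in the S3 bucket, in the format the UI expects
    masking_method: str      # e.g. a fixed-string mask (hypothetical label)
    mask_value: str = "*"
    selectors: tuple = ()    # CSV column numbers, XmlPath or jsonPath entries

    def __post_init__(self):
        if self.document_type not in DOCUMENT_TYPES:
            raise ValueError(f"unsupported document type: {self.document_type}")
        if not self.file_name:
            raise ValueError("file_name must not be empty")
        if self.document_type == "CSV" and not all(
            isinstance(c, int) and c >= 1 for c in self.selectors
        ):
            raise ValueError("CSV selectors must be 1-based column numbers")
```

Validating the combination up front, for example rejecting non-numeric CSV columns, reflects the fact that each document type is configured differently even though all of them share the same rule form.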
Conclusion
DataSunrise Database Security Suite is a powerful tool for protecting data both on-premises and in the cloud. Download a trial version of DataSunrise to see how it can protect sensitive data inside XML, JSON and CSV files and unstructured text.