Lua script discovers sensitive data in JSON files
When working with the Data Discovery feature, DataSunrise enables you to use a variety of prebuilt search filters for various types of sensitive data. But is there a way of searching for unique data? Yes, to accomplish this task you can use Lua. The use of Lua scripts for Data Discovery enables searching for literally any text-type values not covered by the existing templates.
This article describes how you can use Lua to locate the database columns that hold JSON data containing the keys you are interested in. A dedicated script is used to do that, so you can base your own search on the algorithm described below. Note that this can be done not only with JSON but with any type of file: you just need to create an appropriate Lua script.
You can copy the script used in this article here:
-- Specify values you want to discover in sensitive_from_json list
-- e.g. {"data","id","name"}
sensitive_from_json = {"id", "data"}

-- valStr will contain JSON as text
local valStr = tostring(columnValue)
local valStrLen = string.len(valStr)

-- Function to get the length of a table
local function tablelength(T)
  local count = 0
  for _ in pairs(T) do
    count = count + 1
  end
  return count
end

-- Get the count of elements in sensitive_from_json list
local count = tablelength(sensitive_from_json)

-- Identify if the column contains JSON formatted data
if string.sub(valStr, 1, 1) == '{' and string.sub(valStr, valStrLen, valStrLen) == '}' then
  for i = 1, count do
    -- If JSON does contain at least 1 desired value, return 1, else 0
    if string.find(valStr, '"' .. tostring(sensitive_from_json[i]) .. '":') then
      return 1
    end
  end
  return 0
else
  return 0
end
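To see how the script behaves on sample values before using it in DataSunrise, the same logic can be wrapped in a function and run in a standalone Lua interpreter. This is only a minimal sketch: the function name check_column_value and the sample strings are hypothetical, and it assumes (as the script above suggests) that DataSunrise supplies the cell content as columnValue and treats a return value of 1 as a match. It also differs slightly from the script above: it uses the # length operator instead of a counting helper, and it passes the plain-text flag to string.find so that key names are matched literally.

```lua
-- Local test harness for the discovery logic (hypothetical wrapper;
-- in DataSunrise the value arrives as the variable columnValue).
local sensitive_from_json = {"id", "data"}

local function check_column_value(columnValue)
  local valStr = tostring(columnValue)
  local valStrLen = string.len(valStr)

  -- Treat the value as JSON only if it starts with '{' and ends with '}'
  if string.sub(valStr, 1, 1) == '{'
     and string.sub(valStr, valStrLen, valStrLen) == '}' then
    for i = 1, #sensitive_from_json do
      -- Look for the quoted key followed by a colon, e.g. "id":
      -- The 4th argument (true) disables Lua pattern matching.
      local needle = '"' .. tostring(sensitive_from_json[i]) .. '":'
      if string.find(valStr, needle, 1, true) then
        return 1
      end
    end
  end
  return 0
end

print(check_column_value('{"id":42,"name":"Ann"}'))  -- contains the key "id"
print(check_column_value('{"name":"Ann"}'))          -- no key from the list
print(check_column_value('id=42'))                   -- not JSON-shaped
```

Note that the check is purely textual: a key written with whitespace before the colon (for example "id" : 42) is not matched, and neither is a JSON value with leading or trailing whitespace.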
First, create a Lua script that searches for your own data of interest. Note that, among other things, the script we created for this article checks whether the processed value is formatted like JSON, that is, whether it starts with { and ends with }. For other file types, you should use other validation algorithms. Fill in the required values in the sensitive_from_json list of the script. For your convenience, we left some comments there.
Now our script is ready for processing and we can go to the DataSunrise Web Console.
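As an illustration of a different validation algorithm, the sketch below adapts the same idea to XML-like text. It is not part of the original script: the list sensitive_from_xml, the function name check_xml_value and the element names are hypothetical, and the check is a simple textual heuristic rather than a real XML parser.

```lua
-- Hypothetical list of element names to look for in XML-like values
local sensitive_from_xml = {"ssn", "email"}

local function check_xml_value(columnValue)
  -- Convert to text and trim surrounding whitespace
  local valStr = tostring(columnValue):match("^%s*(.-)%s*$")

  -- Crude format check: XML-like text starts with '<' and ends with '>'
  if valStr:sub(1, 1) ~= "<" or valStr:sub(-1) ~= ">" then
    return 0
  end

  for _, name in ipairs(sensitive_from_xml) do
    -- Match an opening tag without attributes ("<name>")
    -- or with attributes ("<name "), using plain-text search
    if valStr:find("<" .. name .. ">", 1, true)
       or valStr:find("<" .. name .. " ", 1, true) then
      return 1
    end
  end
  return 0
end

print(check_xml_value('<user><email>a@b.c</email></user>'))
print(check_xml_value('<user><name>Ann</name></user>'))
```

To use logic like this as a discovery script, the body of the function would be pasted without the wrapper, reading the value from columnValue as in the JSON script above.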
We navigate to Data Discovery -> Information Types and create a new Information Type.
We Add a new Attribute and in the attribute’s settings, select Column Data. In the Column Data Type, we select Strings Only. In the Search Method, we select Lua Script.
Then we click Edit Lua Script for the script’s code. We paste our script into the Script field and save it.
Now we can create a new Data Discovery task. In the Search Filters subsection, we select Information Types and select our Information Type to use for discovery.
To run the task, we select the Manual Startup Frequency, press the Apply button to save the changes, and press Start Now.
If the task completes successfully, we can view the results by clicking the Show button. This displays the database objects that contain sensitive data in JSON format.
In conclusion, leveraging Lua scripts within DataSunrise’s Data Discovery feature significantly enhances your ability to identify unique and sensitive data beyond the scope of prebuilt filters. By following the outlined steps, you can create custom scripts tailored to search for specific text-type values in various file formats, including JSON. This method not only broadens the range of searchable data but also provides a flexible and powerful tool for database administrators seeking to protect sensitive information. The ease of integrating these scripts through DataSunrise’s Web Console and the subsequent ability to automate and view discovery tasks streamline the process, making it an efficient solution for comprehensive data security management.