
Static Data Masking for Scylla

Introduction
As organizations increasingly rely on distributed databases like ScyllaDB, ensuring data security becomes a top priority. Sensitive information such as personal identifiers, credit card details, and contact information must be protected from unauthorized access. One of the most effective ways to secure such data is through data masking.
Static Data Masking (SDM) involves creating a sanitized, non-reversible version of sensitive data for use in non-production environments. This approach allows developers, analysts, and testers to work with realistic datasets without exposing actual sensitive information. In this article, we explore how to implement ScyllaDB data masking using both native methods and advanced automated solutions like DataSunrise, a leading provider of security and compliance tools.
Why Data Masking for ScyllaDB is Essential
ScyllaDB is a high-performance NoSQL database known for its scalability and efficiency. However, it lacks built-in data masking capabilities. Without proper masking for ScyllaDB, organizations risk non-compliance with industry regulations such as:
- GDPR – Requires appropriate safeguards for personal data and names pseudonymization as one such measure.
- HIPAA – Mandates securing protected health information (PHI).
- PCI DSS – Enforces encryption and masking of payment card data.
By implementing data masking for ScyllaDB, organizations can mitigate risks associated with accidental data leaks and unauthorized access while ensuring compliance with these regulations.
Creating Sample Data in ScyllaDB
Before applying ScyllaDB data masking, we need sample data for testing. Below is a Python script that inserts mock customer records into ScyllaDB using the Faker library.
Generating Sample Data
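import faker
from cassandra.cluster import Cluster

fake = faker.Faker()

def generate_data(n=10):
    # Build n tuples in the column order used by the INSERT below.
    # fake.uuid4() returns a string, so customer_id is assumed to be a text column.
    return [(fake.uuid4(), fake.name(), fake.email(), fake.phone_number(),
             fake.credit_card_number(card_type="visa"), fake.address()) for _ in range(n)]

def connect_to_scylla():
    # Assumes a node on localhost and an existing keyspace named test_keyspace.
    session = Cluster(["127.0.0.1"]).connect("test_keyspace")
    return session

def insert_data(session, data):
    # "?" placeholders are only valid in prepared statements, so prepare the query once.
    query = session.prepare("INSERT INTO mock_data (customer_id, name, email, phone, credit_card, address) VALUES (?, ?, ?, ?, ?, ?)")
    for entry in data:
        session.execute(query, entry)

if __name__ == "__main__":
    session = connect_to_scylla()
    insert_data(session, generate_data(100))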
How It Works
- Generates 100 records containing fake names, emails, phone numbers, credit card details, and addresses.
- Establishes a connection to a ScyllaDB instance running locally.
- Inserts the generated data into a mock_data table.
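The script above assumes that the test_keyspace keyspace and the mock_data table already exist. A minimal sketch of creating them is shown below. The helper name create_schema, the all-text column types (chosen because Faker returns strings), and the single-node replication settings are assumptions for a local test setup, not details from the original walkthrough.

```python
# Hypothetical setup helper. Column types and replication settings are
# assumptions for a single-node local test cluster.
KEYSPACE_CQL = (
    "CREATE KEYSPACE IF NOT EXISTS test_keyspace "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)

TABLE_CQL = (
    "CREATE TABLE IF NOT EXISTS test_keyspace.mock_data ("
    "customer_id text PRIMARY KEY, "
    "name text, email text, phone text, credit_card text, address text)"
)


def create_schema(session):
    """Run both DDL statements through an already-connected driver session."""
    for statement in (KEYSPACE_CQL, TABLE_CQL):
        session.execute(statement)
```

It would be called once with a session that is not yet bound to a keyspace, for example create_schema(Cluster(["127.0.0.1"]).connect()), before running the insert script.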
Implementing Static Data Masking in ScyllaDB
To mask sensitive customer data, we can create a sanitized copy of the dataset in a separate table.
CQL-Based Data Masking for ScyllaDB
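-- CQL has no CREATE TABLE ... AS SELECT and no string functions such as
-- substr() or ||, so the copy cannot be expressed as a single query.
-- Create the target table first (column types are assumed here), then have
-- a client application read mock_data, mask each value, and insert the result.
CREATE TABLE test_keyspace.mock_data_masked (
    customer_id text PRIMARY KEY,
    address text,      -- copied unchanged
    credit_card text,  -- 'XXXX-XXXX-XXXX-' followed by the last four digits
    email text,        -- 'XXX@' followed by the original domain
    name text,         -- first letter followed by '***'
    phone text         -- 'XXX-XXX-' followed by the last four digits
);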
Key Masking Techniques
- Credit card numbers retain only the last four digits.
- Emails display only the domain with an obfuscated username.
- Names reveal just the first letter.
- Phone numbers keep only the last four digits.
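The four rules above can be sketched as small client-side Python functions. The function names are hypothetical, and the choice to count only digit characters (ignoring separators such as dashes or spaces) is an assumption made for this sketch.

```python
def mask_credit_card(number: str) -> str:
    """Keep only the last four digits of a card number."""
    digits = [c for c in number if c.isdigit()]
    return "XXXX-XXXX-XXXX-" + "".join(digits[-4:])


def mask_email(email: str) -> str:
    """Replace the user part with 'XXX' and keep the domain."""
    _, _, domain = email.partition("@")
    return "XXX@" + domain


def mask_name(name: str) -> str:
    """Keep only the first letter."""
    return name[:1] + "***"


def mask_phone(phone: str) -> str:
    """Keep only the last four digits of a phone number."""
    digits = [c for c in phone if c.isdigit()]
    return "XXX-XXX-" + "".join(digits[-4:])
```

For example, mask_email("jane.doe@example.com") returns "XXX@example.com" and mask_phone("555-867-5309") returns "XXX-XXX-5309".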
Although this approach is simple, it must be run manually, and the masked table does not pick up later changes to the source data.
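One way to make the copy repeatable is to wrap it in a function that a scheduler such as cron could call. The sketch below is an illustration under stated assumptions: the names refresh_masked_table and mask_row are hypothetical, the table test_keyspace.mock_data_masked is assumed to exist with the same six columns, and rows are assumed to come back as named tuples, which is the Python driver's default row format.

```python
from typing import Callable, Dict

# Hypothetical mapping of column name to masking function.
# Columns that are not listed are copied unchanged.
Maskers = Dict[str, Callable[[str], str]]

COLUMNS = ("customer_id", "name", "email", "phone", "credit_card", "address")


def mask_row(row: dict, maskers: Maskers) -> tuple:
    """Return the row's values in COLUMNS order with the maskers applied."""
    return tuple(
        maskers[col](row[col]) if col in maskers else row[col]
        for col in COLUMNS
    )


def refresh_masked_table(session, maskers: Maskers) -> int:
    """Copy every row of mock_data into mock_data_masked, masking on the way."""
    insert = session.prepare(
        "INSERT INTO test_keyspace.mock_data_masked "
        "(customer_id, name, email, phone, credit_card, address) "
        "VALUES (?, ?, ?, ?, ?, ?)"
    )
    select = (
        "SELECT customer_id, name, email, phone, credit_card, address "
        "FROM test_keyspace.mock_data"
    )
    count = 0
    for row in session.execute(select):
        # Assumes named-tuple rows, so _asdict() is available.
        session.execute(insert, mask_row(row._asdict(), maskers))
        count += 1
    return count
```

Because inserts in ScyllaDB overwrite rows with the same primary key, re-running the function refreshes existing masked rows. Rows deleted from the source would remain in the masked table unless it is truncated first.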
Advanced Data Masking for ScyllaDB with DataSunrise
While creating duplicate masking tables can be effective for small projects, maintaining a reliable setup using only database queries can become difficult. This is where third-party solutions like DataSunrise offer a more efficient and scalable alternative.
Steps to Implement Data Masking for ScyllaDB with DataSunrise
Step 1: Add ScyllaDB to DataSunrise
First, add your ScyllaDB instance to DataSunrise using its web UI.

Step 2: Create an Object Group
Define an object group to identify and mask the necessary columns.

Step 3: Schedule Periodic Masking Tasks
Set up a scheduled task to scan for sensitive data based on the rules defined earlier. Running the scan on a schedule helps maintain compliance with regulations such as GDPR and HIPAA.

Step 4: Define Static Masking Rules
Create a static masking rule that automatically sanitizes sensitive data. Select your database as both the source and the target to perform in-place masking.

Advantages of Using DataSunrise for Data Masking in ScyllaDB
- Ease of Use – The DataSunrise web GUI simplifies configuration.
- Ready-to-Go Solution – It offers comprehensive security features beyond data masking.
- Scalability – Designed to support distributed databases like ScyllaDB, making it a reliable tool for complex environments.
In addition to data masking for ScyllaDB, DataSunrise provides compliance management and enhanced security. If you want a personalized review of its features, book an online demo. You can also download a trial version to explore its capabilities firsthand.
Conclusion
Data masking is crucial for safeguarding sensitive data while maintaining usability in non-production environments. While manual CQL-based masking provides a quick fix, DataSunrise offers a scalable, automated approach with advanced security, compliance, and auditing features.
By leveraging DataSunrise for data masking in ScyllaDB, organizations can ensure:
- Continuous data protection against unauthorized access.
- Automated compliance with industry regulations.
- Reduced operational burden through seamless integration and automation.
Investing in a reliable data masking solution for ScyllaDB enhances both security and regulatory compliance, making it an essential strategy for modern enterprises.