Database Audit for Impala

Introduction: The Importance of Advanced Auditing Tools

Before exploring the specifics of database audit for Impala, it helps to understand the broader landscape of data breaches and cybersecurity risks, which continues to evolve rapidly. In 2024 alone, cybersecurity challenges escalated, with the global cost of cybercrime projected to exceed $10.5 trillion by 2025. Furthermore, according to 2024 research by Ponemon, 55% of data security threats are caused by careless or negligent employees, underscoring the critical need for robust automated auditing and security tools to mitigate such risks.

Apache Impala and Data Integrity

As organizations continue to collect, store, and analyze massive amounts of data, securing this data becomes paramount. Apache Impala, one of the leading distributed SQL engines, plays a central role in running large-scale queries and analytics in real time. However, the sheer scale and complexity of these operations make Impala deployments particularly vulnerable to security risks, especially when it comes to ensuring data integrity and meeting compliance requirements.

Overview of Impala Logging

Impala provides various logging mechanisms to track system events and user activities, supporting both operational monitoring and auditing needs. This article explores Impala’s built-in logging features, with a focus on impalad logs and audit logs, which are most useful for the purposes of audit and compliance.

Primary Daemons and Their Logs

Impala’s architecture includes multiple daemons, each responsible for specific functionalities, and they produce corresponding logs:

  • impalad Logs: Generated by the core daemon responsible for query execution. These logs include query-related system events, making them critical for operational monitoring and troubleshooting.
  • catalogd Logs: Capture metadata management activities such as loading and updates. Useful for debugging metadata-related performance issues.
  • statestored Logs: Document cluster coordination activities like membership changes and heartbeat messages. These logs help monitor cluster health and resolve communication or failover problems.

More information about these logs and their log levels can be found in the official Impala documentation.

File System Logs

Impala can operate on various storage solutions, such as HDFS or Kudu. These systems generate their own logs that capture storage and access patterns, errors, and performance metrics. While these logs can provide additional insights, configuring and analyzing them requires platform-specific setup.

Database Audit for Impala with Built-In impalad Logs

For the purposes of this article, we’ll concentrate on Impala logs most relevant to auditing:

  1. impalad System Logs: Automatically generated by the core query execution daemon.
  2. impalad Audit Logs: Require explicit configuration at startup, but offer more information about query execution details.

impalad System Logs

Impalad System Logs View on Web Interface

By default, these logs are already enabled, with the level usually set to ALL, so the system records system status, connections, and SQL queries.

(Other log levels include ERROR, DEBUG, INFO, and OFF; for more information about them, refer to the documentation.)

impalad System Logs Constraints

However, while Impala’s system logs capture SQL queries by default, they do not provide much useful information for auditing purposes. The logs primarily focus on recording the execution of queries, without detailed insights into user activity or security-related events that could be valuable for auditing.

Example of Impalad System Logs on Web Interface
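To illustrate how little structure the system logs offer, here is a minimal Python sketch that splits a single log line into fields. It assumes the daemons write the common glog line prefix (severity letter, month and day, time, thread id, source file and line, then a free-text message); the sample line and the names `GLOG_LINE`, `LogRecord`, and `parse_glog_line` are hypothetical and not taken from a real deployment.

```python
import re
from typing import NamedTuple, Optional

# Assumed glog prefix: <severity><mmdd> <hh:mm:ss.uuuuuu> <thread id> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<message>.*)$"
)

SEVERITY_NAMES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


class LogRecord(NamedTuple):
    severity: str
    date: str      # month and day only; this prefix carries no year
    time: str
    source: str    # source file and line that emitted the message
    message: str   # free text; any user or query detail must be dug out of here


def parse_glog_line(line: str) -> Optional[LogRecord]:
    """Return the parsed fields, or None for lines without the assumed prefix."""
    match = GLOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    return LogRecord(
        severity=SEVERITY_NAMES[match["severity"]],
        date=match["date"],
        time=match["time"],
        source=match["source"],
        message=match["message"],
    )


# Hypothetical sample line, for illustration only.
sample = "I0107 14:03:22.123456 12345 impala-server.cc:1000] Registered query query_id=abc"
record = parse_glog_line(sample)
print(record)
```

Everything beyond severity, timestamp, and source location ends up in one unstructured message string, which is why these logs are awkward to use as an audit trail.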

Impala Auditing Facility

Impala’s system logs offer basic information about query executions, connections, and system events, but they lack the detailed audit trails necessary for compliance and security monitoring. To obtain these, you need to configure separate audit-specific logs. For more detailed guidance on configuring audit logs, refer to the official Impala auditing documentation.

Modifying Impala Startup Flags

Audit logging is turned on through impalad startup flags, so these must be set before the daemon starts. Specifically, you need to set the following flags:


--audit_event_log_dir=/var/lib/impala/audit
--max_audit_event_log_file_size=5000
--max_audit_event_log_files=10

Once impalad is started with these flags, the system will generate audit logs for queries at the specified location.
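As a rough sizing aid, the two rotation flags bound how much audit history stays on disk. The Python sketch below assumes that `--max_audit_event_log_file_size` counts lines (one audit event per line) rather than bytes, as the Impala documentation describes; the function names and the daily query volume are hypothetical.

```python
# Values copied from the startup flags shown above.
MAX_EVENTS_PER_FILE = 5000  # --max_audit_event_log_file_size (assumed: lines, one event per line)
MAX_FILES = 10              # --max_audit_event_log_files


def retained_events(events_per_file: int = MAX_EVENTS_PER_FILE,
                    files: int = MAX_FILES) -> int:
    """Upper bound on audit events kept on disk before the oldest file is dropped."""
    return events_per_file * files


def retention_days(events_per_day: int) -> float:
    """Approximate days of history retained at a given (hypothetical) daily volume."""
    if events_per_day <= 0:
        raise ValueError("events_per_day must be positive")
    return retained_events() / events_per_day


print(retained_events())       # 50000
print(retention_days(12_500))  # 4.0
```

If the resulting window is shorter than your compliance retention period, the files need to be shipped elsewhere before they rotate out.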

Verifying the Configuration

You can check that the audit log configuration was successfully applied by navigating to the specified directory:


ls -la /var/lib/impala/audit
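The same check can be scripted, for example as part of a deployment test. This is a minimal Python sketch that assumes the directory is the one passed to `--audit_event_log_dir` and that file names start with `impala_audit_event_log_`, the prefix used by the jq example later in this article; `list_audit_files` is a hypothetical helper name.

```python
from pathlib import Path


def list_audit_files(audit_dir):
    """Return audit log files in audit_dir, oldest first; empty list if the directory is missing."""
    directory = Path(audit_dir)
    if not directory.is_dir():
        return []
    files = [p for p in directory.glob("impala_audit_event_log_*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Directory taken from the --audit_event_log_dir flag above.
    for path in list_audit_files("/var/lib/impala/audit"):
        print(path.name, path.stat().st_size, "bytes")
```

An empty result right after startup is not necessarily an error, since files may only appear once the first query has been audited.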

Testing Audit Logs

To ensure the logs are functioning as expected, run some test queries, for example:


CREATE DATABASE sales;
CREATE TABLE sales.customers (customer_id INT, name STRING, email STRING);
INSERT INTO sales.customers VALUES (1, 'John Smith', '[email protected]');
INSERT INTO sales.customers VALUES (2, 'Alice Johnson', '[email protected]');
SELECT * FROM sales.customers;
Testing SQL Queries Execution in Impala

Viewing the Logs

Unlike system logs, Impala audit logs are generated in JSON format, making them easier to read and process. You can use jq to filter logs based on specific criteria, such as queries executed on a particular table:


jq '.[] | select(.sql_statement | test("sales\\.customers"))' /var/lib/impala/audit/impala_audit_event_log_1.0*
Audit Logs Resulting Output in Impala
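Where jq is not available, the same filter can be sketched in Python. The sketch assumes each line of the audit file is a JSON object whose single key wraps the event (which is what the `.[]` in the jq command implies) and that the event carries a `sql_statement` field; the `user` field, the sample lines, and the function names are assumptions made for illustration.

```python
import json


def iter_audit_events(lines):
    """Yield (key, event) pairs from audit log lines, skipping anything unparsable."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # partial or corrupt line, e.g. a file still being written
        if not isinstance(record, dict):
            continue
        for key, event in record.items():
            if isinstance(event, dict):
                yield key, event


def events_touching(lines, table_name):
    """Return events whose SQL text mentions table_name (case-insensitive substring match)."""
    needle = table_name.lower()
    return [
        event
        for _, event in iter_audit_events(lines)
        if needle in (event.get("sql_statement") or "").lower()
    ]


# Hypothetical sample lines; real files would be opened and passed in line by line.
sample_lines = [
    '{"1700000000000": {"user": "alice", "sql_statement": "SELECT * FROM sales.customers"}}',
    '{"1700000000001": {"user": "bob", "sql_statement": "SHOW DATABASES"}}',
    "not json",
]

for event in events_touching(sample_lines, "sales.customers"):
    print(event["user"], "->", event["sql_statement"])
```

A substring match is deliberately simple: it will also match the table name inside comments or string literals, so treat the result as a first-pass filter rather than proof of access.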

Summary: Impala System Logs vs. Audit Logs

For audit purposes, Impala primarily provides two types of logs: system logs (impalad logs) and audit logs, each with a distinct purpose.

| System Logs | Audit Logs |
|---|---|
| Capture basic system information, such as connections and executed SQL queries. | Capture detailed user actions, including SQL statements, user info, timestamps, and session details. |
| Lack detailed user activity or security data, not designed for auditing or compliance. | Geared toward security auditing and compliance, stored in JSON format. |
| Stored in plain text. | Stored in structured JSON format, easier to process for audit purposes. |
| Focused on system operations and troubleshooting. | Focused on user actions, security, and compliance. |

Limitations

Both system and audit logs have limitations:

  • System Logs: Track query execution and basic system events, but lack security details and user context. Their plain text format complicates analysis.
  • Audit Logs: Capture detailed user activity, but only for successfully parsed SQL operations, missing system events and non-SQL activities. Although stored in JSON, they still require additional tools for efficient filtering and analysis.
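To give a sense of the "additional tools" that audit logs call for, the following Python sketch aggregates events by user and statement type and collects authorization failures. The field names `user`, `statement_type`, and `authorization_failure` are assumptions about the audit event layout and should be checked against your own files; the sample lines and the `summarize` helper are hypothetical.

```python
import json
from collections import Counter


def summarize(lines):
    """Count events per (user, statement type) and collect authorization failures."""
    counts = Counter()
    failures = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, partial, or corrupt lines
        if not isinstance(record, dict):
            continue
        for event in record.values():
            if not isinstance(event, dict):
                continue
            key = (event.get("user", "unknown"), event.get("statement_type", "unknown"))
            counts[key] += 1
            if event.get("authorization_failure"):
                failures.append(event)
    return counts, failures


# Hypothetical sample lines, for illustration only.
audit_lines = [
    '{"1": {"user": "alice", "statement_type": "QUERY", "authorization_failure": false}}',
    '{"2": {"user": "alice", "statement_type": "QUERY", "authorization_failure": false}}',
    '{"3": {"user": "bob", "statement_type": "DDL", "authorization_failure": true}}',
    "",
]

counts, failures = summarize(audit_lines)
for (user, statement_type), total in sorted(counts.items()):
    print(user, statement_type, total)
print("authorization failures:", len(failures))
```

Even this small report has to be written, scheduled, and maintained by hand, which is the gap that dedicated auditing tools aim to close.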

Enhancing Database Audit for Impala: DataSunrise Solutions

Creating Impala Audit Rules in DataSunrise

Unlike Impala's built-in logging and auditing capabilities, DataSunrise offers a sophisticated, scalable solution tailored for modern compliance, real-time monitoring, and advanced security needs. By adopting DataSunrise, organizations can elevate their auditing strategies while maintaining optimal performance and meeting stringent regulatory requirements.

Key Features of DataSunrise

  • Real-Time Monitoring: Track database activities, user interactions, and system events in real time. Administrators can proactively detect anomalies and respond to potential threats as they occur, ensuring better security outcomes.

  • Comprehensive Audit Logging: Record detailed logs of user activities, including SQL queries, session details, and system events. Each entry captures critical information like timestamps, user identities, query text, and affected database objects for a complete audit trail.

Viewing Impala Transactional Trails in DataSunrise
  • Advanced Threat Detection: Leverage machine learning and user behavior analytics to identify suspicious patterns, unauthorized actions, or potential breaches. These insights empower organizations to fortify their database security effectively.

  • Automated Compliance Reporting: Simplify compliance by generating reports for standards like GDPR, HIPAA, and PCI DSS. With scheduled assessments and template reporting, regulatory adherence becomes more efficient and less resource-intensive.

Generating Impala Reports in DataSunrise
  • Customizable Audit Rules: Define precise audit rules tailored to organizational needs. DataSunrise enables tracking specific user activities or sensitive data access with flexible conditions and alerts, streamlining compliance and security practices.

  • Cross-Platform Database Support: Supporting over 40 platforms, including Impala, DataSunrise provides a consistent auditing and database security framework across diverse environments, making it a robust and versatile choice for enterprises.

Multiple Database Instances Connected in DataSunrise

Conclusion: Elevate Your Database Audit for Impala with DataSunrise

Upgrading to DataSunrise strengthens database audit for Impala by integrating advanced tools for monitoring, security, and compliance. With its cross-platform support, rich feature set, and flexible deployment options, DataSunrise empowers organizations to stay ahead in an evolving regulatory landscape while safeguarding their databases.

Experience the difference by scheduling an online demo today, and discover how DataSunrise can redefine auditing and security for your Impala environment.
