
UDF: Enhanced Encryption Beyond Built-in Functions

  • faikdahbul
  • Mar 26

Security is an essential part of today's business environment. Implemented incorrectly, it exposes a business to significant risks such as data breaches, financial losses, reputational damage, and legal liabilities.


In one of our Hadoop projects using Hive and Impala UDFs, we implemented an extra layer of security for encrypting and decrypting data stored in Hadoop.






In this use case, we created a service that supports the encryption and decryption process:

  1. Hive / Impala accesses HDFS.

  2. The Hive / Impala UDF uses our function to encrypt the HDFS data.

  3. The Hive / Impala UDF requests the secret key + salt from our service.

  4. Our Service API authenticates the request from the Hive / Impala UDF and fetches the corresponding secret key + salt.

  5. Our Service API sends the secret key + salt back to the Hive / Impala UDF.

  6. The Hive / Impala UDF decrypts the HDFS data using our function and the secret key + salt, then displays it in HUE.

  7. External BI tools request access to Hive / Impala through JDBC / ODBC.

  8. Hive / Impala sends the decrypted data to the BI tools for display.
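The encrypt and decrypt steps in this flow can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the actual UDF: the post does not name the cipher mode or the key-derivation scheme, so PBKDF2 and AES-GCM are assumed here, and `UdfCryptoSketch`, `KeyService`, and the key id `customer_pii` are hypothetical names. The key service is replaced by an in-memory stand-in.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

final class UdfCryptoSketch {
    private static final int NONCE_BYTES = 12; // conventional GCM nonce size
    private static final int TAG_BITS = 128;   // GCM authentication tag length

    /** Hypothetical stand-in for the external key service. */
    interface KeyService {
        SecretKeySpec keyFor(String keyId);
    }

    /** Derives an AES key from secret + salt (PBKDF2 is an assumption). */
    static SecretKeySpec deriveKey(char[] secret, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(secret, salt, 65_536, 128);
            byte[] raw = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
            return new SecretKeySpec(raw, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key derivation failed", e);
        }
    }

    /** Encrypts one value; output is Base64(nonce || ciphertext || tag). */
    static String encrypt(String plain, SecretKeySpec key) {
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            new SecureRandom().nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
            byte[] sealed = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(nonce.length + sealed.length)
                    .put(nonce).put(sealed).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reverses encrypt(); fails if the key is wrong or the data was altered. */
    static String decrypt(String encoded, SecretKeySpec key) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, in, 0, NONCE_BYTES));
            byte[] plain = cipher.doFinal(in, NONCE_BYTES, in.length - NONCE_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in; a real implementation would call the key API.
        KeyService service = keyId -> deriveKey(
                "demo-secret".toCharArray(),
                "demo-salt".getBytes(StandardCharsets.UTF_8));
        SecretKeySpec key = service.keyFor("customer_pii");
        String stored = encrypt("4111-1111-1111-1111", key);
        System.out.println(decrypt(stored, key));
    }
}
```

The key is derived once and reused, because running PBKDF2 for every row would dominate query time. A real UDF would likely cache the derived key for the lifetime of the query in a similar way.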


This approach adds a layer of security for the data owner, ensuring that the data remains accessible only to its authorized owner.


Why our UDF Encryption solution?


Traditional encryption solutions in Hadoop ecosystems, such as Hive’s built-in aes_encrypt function or Data-at-Rest encryption using KMS/KTS, often fail to provide a true separation of duties. In these models, platform administrators or users with elevated privileges can still access encrypted data by retrieving keys or decrypting data directly in SQL queries.

Our Impala/Hive AES Encryption UDF eliminates this risk by enforcing external key management through an API-based model. This means:

  • Even platform administrators (superusers) cannot decrypt the data without API access.

  • Keys are never stored within Hive, Impala, or HDFS, ensuring better security against insider threats.

  • Key retrieval and encryption processes are tightly controlled by business entities, rather than platform providers.

This UDF is designed for enterprises that need stronger security guarantees while maintaining flexibility and performance across both Impala and Hive environments.


Key Advantages of Impala/Hive Encryption UDF


1. Compatibility with Both Impala and Hive

  • Fully compatible with both Apache Impala and Apache Hive, making it a unified encryption solution.

  • Ensures consistent encryption and decryption across different query engines, unlike native functions that may behave differently.

2. Enhanced Security with External Key Management

  • Keys and initialization vectors (IVs) are never stored in Impala, Hive, or HDFS.

  • Key retrieval happens securely via a REST API, supporting both HTTP and HTTPS with Basic Authentication.

  • Allows multiple API endpoints for better key distribution and security segmentation.
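As an illustration of key retrieval over REST with Basic Authentication and several endpoints, here is a rough sketch using the JDK's built-in `java.net.http.HttpClient` (Java 11 or later). The class name `KeyApiClientSketch`, the endpoint URLs, and the try-each-endpoint-in-order policy are assumptions made for this example; the post does not describe the real client or the response format.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class KeyApiClientSketch {
    /** Builds the value of the Authorization header for Basic Authentication. */
    static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    /** Tries each endpoint in order and returns the first response obtained. */
    static Optional<String> fetchFromAny(List<String> endpoints,
                                         Function<String, Optional<String>> fetcher) {
        for (String endpoint : endpoints) {
            Optional<String> body = fetcher.apply(endpoint);
            if (body.isPresent()) {
                return body;
            }
        }
        return Optional.empty();
    }

    /** One GET with Basic auth; empty on any error or non-200 status. */
    static Optional<String> httpGet(String url, String user, String password) {
        try {
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(5)).build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(5))
                    .header("Authorization", basicAuthHeader(user, password))
                    .GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200
                    ? Optional.of(response.body()) : Optional.<String>empty();
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Hypothetical endpoints; the fake fetcher avoids real network calls.
        List<String> endpoints = List.of(
                "http://primary.invalid/key", "https://backup.invalid/key");
        Function<String, Optional<String>> fake = url ->
                url.startsWith("https://backup")
                        ? Optional.of("demo-response") : Optional.<String>empty();
        System.out.println(fetchFromAny(endpoints, fake).orElse("no endpoint answered"));
    }
}
```

In real use, the fetcher would be something like `url -> httpGet(url, user, password)`. Basic credentials are only Base64-encoded, not encrypted, so HTTPS endpoints are the safer choice wherever they are available.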

3. True Separation of Duties (Superusers Cannot Decrypt)

  • Platform administrators (Hadoop superusers like hdfs, impala, or hive) cannot decrypt data, since keys are managed externally.

  • The udf.properties file contains only encrypted credentials for key retrieval, not the actual encryption key.

  • Unlike Hive’s built-in aes_encrypt, this function ensures no sensitive keys are exposed in SQL logs or query history.
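To make the role of `udf.properties` more concrete, the sketch below loads a properties file that holds only connection details for the key API and no key material. The property names (`key.api.endpoints`, `key.api.user`, `key.api.password.enc`) and the class `UdfPropertiesSketch` are invented for illustration; the real file layout and the way its credentials are encrypted are not described in this post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

final class UdfPropertiesSketch {
    /** Parses properties text; a real UDF would read the file from disk. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits the comma-separated endpoint list (hypothetical property name). */
    static List<String> endpoints(Properties props) {
        return Arrays.stream(props.getProperty("key.api.endpoints", "").split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Example content: API location and credentials only, never the AES key.
        String sample = String.join("\n",
                "key.api.endpoints = https://keys-a.invalid/key, https://keys-b.invalid/key",
                "key.api.user = udf_client",
                "key.api.password.enc = BASE64-OPAQUE-VALUE");
        Properties props = parse(sample);
        System.out.println(endpoints(props));
        System.out.println(props.getProperty("key.api.user"));
    }
}
```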

4. Seamless Integration with Cloudera Impala and Hive

  • Designed for high performance in Impala and Hive, with minimal overhead.

  • Works natively with Cloudera Data Platform (CDP), without requiring complex setup.

  • Encryption and decryption can be easily integrated into SQL queries.


Why EXPECC?


With over eight years of experience in implementing Hadoop and its components, including Hive UDF and Impala UDF, we have consistently provided clients with enhanced security for their daily operations. 

Our deep expertise and proven best practices ensure that we deliver reliable and effective solutions, tailored to meet the unique needs of our clients.





© 2024
PT. Expecomputindo. 

​"No animals were harmed in the making of this site"
