Data Transformation
This section provides an overview of the data transformation process. It covers the following information:
- The Data Transformation Process 
- Data Transformation Techniques 
- Protection Policies 
The Data Transformation Process
Data transformation is the process of transforming GuardPoint data from:
- Plaintext (clear) to encrypted with a key 
- Encrypted with a key to plaintext 
- Encrypted with one (old) key to encrypted with another (new) key 
Tip
Refer to CTE Agent Data Transformation for detailed instructions on how to use CTE on clients to transform GuardPoint data from clear text to encrypted text or from encrypted text to clear text.
This section covers the following information:
- Uses of Data Transformation 
- How CTE Protects Files 
- Components of the CTE Solution 
- Properties of Data Transformation 
Uses of Data Transformation
- Initial Data transformation: Encrypting GuardPoint data for the first time. 
- Rekeying: Changing the encryption key for GuardPoint data, also called key rotation. 
- Reverse Transformation: Decrypting GuardPoint data to clear text (not a common procedure). 
Note
Data transformation is complex and disruptive to data center operations. It is strongly recommended that you read and understand this section before you proceed to the initial data transformation and rekey.
How CTE Protects Files
The CTE Agent encrypts the data within a file one block at a time. It does not encrypt file metadata such as a file’s name or size. This enables administrators to manage files without being able to view or modify their contents. Whether initially encrypting files, rekeying them, or decrypting them, the CTE Agent must therefore:
- Read each block of file data to be transformed. 
- Transform the block by encrypting, decrypting, or rekeying it. 
- Write the transformed block, either to its original location, or to an alternate one. 
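The read, transform, and write steps above can be sketched in a few lines of Python. This is only an illustration, not how the CTE Agent is implemented: the 4096-byte block size, the function names, and the XOR "cipher" (a toy stand-in that provides no real protection) are all assumptions made for the example.

```python
import io

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def xor_block(block: bytes, key: bytes) -> bytes:
    """Toy stand-in for a block cipher. NOT real encryption."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(block))


def transform_in_place(f, transform, block_size: int = BLOCK_SIZE) -> int:
    """Read, transform, and rewrite each block of a seekable binary file.

    Each transformed block is written back to its original location.
    Returns the number of blocks processed.
    """
    blocks = 0
    offset = 0
    while True:
        f.seek(offset)
        block = f.read(block_size)      # 1. read a block
        if not block:
            break
        changed = transform(block)      # 2. transform it
        f.seek(offset)
        f.write(changed)                # 3. write it back in place
        offset += len(block)
        blocks += 1
    return blocks


if __name__ == "__main__":
    data = io.BytesIO(b"example file contents " * 500)
    key = b"demo-key"
    count = transform_in_place(data, lambda blk: xor_block(blk, key))
    print(f"transformed {count} blocks")
```

Only the file contents pass through the transform. The name and size of the file are untouched, which mirrors the point that file metadata is not encrypted.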
Components of the CTE Solution
CTE protects data either at the file level or at the storage device level. CTE file-level protection consists of two main components:
- CipherTrust Manager - An appliance that manages a database of the file sets protected by CTE GuardPoints, the encryption keys that protect them, and the policies that specify the access rights and encryption protections that can be applied to GuardPoints. The CipherTrust Manager is also a central point for logging events related to accessing protected files. 
- CTE Agents - Software components that run on clients with file sets to be protected. A CTE Agent manages the files behind a GuardPoint by enforcing the policy associated with it, and communicates data access events to the CipherTrust Manager for logging. 
A GuardPoint is usually associated with a Linux mount point or a Windows volume, but may also be associated with a directory subtree. The CTE Agent sits between applications and the file system that holds the files within the GuardPoint. It intercepts every file access request and enforces the access and encryption rules defined in the GuardPoint’s policy.
Properties of Data Transformation
For large file sets (hundreds of GBs or more), bulk transformation is time-consuming. Managing transformation time is important, because file set content must be frozen (inaccessible to applications) throughout the transformation process. Once transformation starts, it must continue until complete. So transformation time determines the window of data unavailability. Two major components contribute to transformation time:
- Number of Blocks of File Data - Because CTE must read, transform, and rewrite each block of file data, this component can be estimated by multiplying the number of file blocks to be transformed by the average read, transformation, and write time for a block. 
- Number of Files - Because the CTE Agent transforms data file by file, each file must be "looked up," opened, and closed during transformation, using underlying file system mechanisms. This typically requires multiple disk accesses. Therefore, for file sets that consist of many small files, per-file overhead can actually exceed file block transformation time. 
Other factors, such as file system fragmentation and load from concurrent applications, may also affect transformation time. However, the number of blocks and the number of files to be transformed are fundamental because they cannot be reduced or eliminated.
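As a rough illustration of how the two components combine, the sketch below adds a per-block term and a per-file term. The function name and every number in the example (block size, per-block time, per-file overhead, file count) are hypothetical. Measure real values on the target system before planning a transformation window.

```python
def estimate_transformation_seconds(num_blocks, per_block_us, num_files, per_file_us):
    """Estimate transformation time from the two fundamental components.

    per_block_us: assumed average read + transform + write time per block (microseconds)
    per_file_us:  assumed average lookup + open + close overhead per file (microseconds)
    Returns (block_seconds, file_seconds, total_seconds).
    """
    block_seconds = num_blocks * per_block_us / 1_000_000
    file_seconds = num_files * per_file_us / 1_000_000
    return block_seconds, file_seconds, block_seconds + file_seconds


if __name__ == "__main__":
    # Hypothetical file set: 500 GiB stored in 4 KiB blocks across 2 million small files.
    num_blocks = 500 * 2**30 // 4096
    block_s, file_s, total_s = estimate_transformation_seconds(
        num_blocks, per_block_us=20, num_files=2_000_000, per_file_us=5000
    )
    print(f"blocks: {block_s:.0f} s, files: {file_s:.0f} s, total: {total_s:.0f} s")
```

With these made-up inputs the per-file term is several times larger than the per-block term, which is the situation described above for file sets made up of many small files.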
Data Transformation Techniques
Two methods to initially encrypt and rekey files are:
- The Copy/Restore Method: Using the operating system file copy utility, the client administrator can copy unprotected files to a location protected by a CTE GuardPoint with a standard/production policy. 
- The CTE Dataxform Utility Method: Every CTE Agent includes a utility program that can encrypt or transform protected files. The dataxform utility encrypts, rekeys, or decrypts data in place. Refer to the CTE Data Transformation Guide for details.
These methods have advantages and limitations that make them suitable in different scenarios. These are discussed in the subsequent sections.
Apart from encrypting data, you can reverse the transformation, that is, decrypt the data to plaintext. To decrypt protected files, copy them to an unprotected location.
Note
CTE can also be configured to protect data at the disk level. For data protected in this way, only the copy transformation technique is available for encryption.
The Copy Method
Properties of the Copy Method
The copy method performs initial encryption, rekeying, and decryption by copying data from one directory, or GuardPoint, to another directory or GuardPoint.
- Initial Encryption - The client administrator encrypts a file set by copying it to a directory protected by a CTE GuardPoint with a standard policy. Encryption is transparent to the copy utility.  
- Rekeying Protected Data - Encrypted files protected by a CTE GuardPoint are rekeyed by copying them to a directory protected by another GuardPoint with a different encryption key. Both decryption and re-encryption are transparent to copy utilities.  
- Decrypting Data by Copying - Decrypt a protected file set by copying files from their protected location to unprotected directories. The CTE Agent decrypts file blocks before delivery to the copy utility for rewriting. 
Caution
If the governing policy does not authorize the copy utility user to access data, CTE delivers encrypted file blocks to it.
Advantages of the Copy Method
- Simplicity - After an Agent is installed and GuardPoints are activated on a client, the client’s administrator can encrypt, decrypt, or rekey file sets simply by copying them from one location to another. There are no procedures to learn, and no requirements to coordinate with the CipherTrust Manager Security Administrator. Data transformation is simply another routine administrative task. 
- Recoverability - If a copy-based transformation is interrupted, for example, by a power failure or a system crash, the transformation can be restarted at or prior to the point of interruption. This is because the source files remain available and can be recopied, overwriting files at the destination that may have been only partially re-encrypted. 
Limitations of the Copy Method
- Storage Resource Consumption - Copying a file set requires that both source and destination files exist simultaneously. Storage capacity sufficient for both must be available during initial encryption. For very large protected data sets, "extra" temporary storage may be a significant expense. However, a greater concern is likely to be the impact of moving production file sets as they are transformed. File data is unprotected while in the copy utility’s buffers. 
- Impact on Operating Procedures - Original and copied file sets have different path names and/or network addresses. After transformation, either both file sets must be renamed (the old path to a new name, and the new path to the old name), or applications must be adapted to process the transformed data set at the new directory. For a small data center with a few protected file sets, some combination of these options is usually practical. For data centers with hundreds of protected file sets, the administrative complexity and consequent chance of error make copying a less practical option. 
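Because the copy method needs room for both the source and the destination file sets, it can help to check free space before starting. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the check compares logical file sizes, so the space actually consumed on the destination file system may differ.

```python
import os
import shutil
import tempfile


def file_set_size(root: str) -> int:
    """Sum the logical sizes (in bytes) of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symbolic links
                total += os.path.getsize(path)
    return total


def has_room_for_copy(source_root: str, destination_root: str) -> bool:
    """Return True if the destination file system has at least as much
    free space as the source file set occupies."""
    needed = file_set_size(source_root)
    free = shutil.disk_usage(destination_root).free
    return free >= needed


if __name__ == "__main__":
    # Self-contained demo using temporary directories in place of real paths.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        with open(os.path.join(src, "sample.dat"), "wb") as f:
            f.write(b"x" * 1024)
        print(file_set_size(src), has_room_for_copy(src, dst))
```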
The Restore Method
A variation of the copy method is to make a backup of the files for transformation and restore the backup to the destination location. This works because:
- Backing up data causes it to be read and decrypted. 
- Restoring data causes it to be written (re-encrypting it with an alternative key). 
- CTE protection is transparent to backup programs. 
This technique also creates a backup of the data set. However, a disadvantage is the time required to copy data twice (once from the source location to backup, and once from backup to destination location).
These considerations suggest that copying data to transform it is more suitable for initial encryption (and final decryption), and less so for rekeying. Additionally, the simplicity of recovering an interrupted transformation makes the copy/restore method useful in situations where the probability of interruption during transformation is significant.
The Dataxform Utility Method
The dataxform utility transforms data in place and has two components:
- User mode that controls the overall operation 
- Kernel mode that transforms files block-by-block 

Advantages of the Dataxform Utility
Transforming data in place has two advantages:
- Minimal Storage Requirements - Because dataxform transforms files in place, where they reside, it does not require temporary file storage. However, the utility does need storage in which to create a list of files for transformation.
- Security - The period of time that the data transformed by the dataxform utility appears in memory, outside the GuardPoint and therefore unprotected, is shorter than with copying. This is significant for rekeying, because copying holds clear file data in memory between reading and rewriting. Moreover, dataxform requires coordination between the client and CipherTrust Manager Security Administrators, so that no one individual can subvert security during transformation.
Limitations of the Dataxform Utility
Offsetting the advantages of the dataxform utility method is the complexity of recovering from an interrupted dataxform run. Because dataxform transforms files in place, data in a file undergoing transformation at the time of a failure may be only partly transformed. There is no way to determine which blocks have been transformed and which have not. These files must be recreated from a backup copy after the dataxform utility runs. The client administrator must:
- Determine (by examining the dataxform logs) which files may have been incompletely transformed.
- Delete them from the transformed file set. 
- Recreate them by selective copying from a backup. 
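The delete and recreate steps can be sketched as follows. This is a hypothetical helper, not part of CTE. It assumes the administrator has already compiled the list of suspect files by reading the dataxform logs (the log format is not parsed here) and that the backup mirrors the directory layout of the transformed file set. Which policy must be in effect while restoring is outside the scope of this sketch.

```python
import shutil
import tempfile
from pathlib import Path


def restore_suspect_files(suspect_files, file_set_root, backup_root):
    """Delete each possibly half-transformed file and recopy it from backup.

    suspect_files: paths relative to file_set_root (compiled by hand from the logs)
    Returns the list of restored paths.
    """
    file_set_root = Path(file_set_root)
    backup_root = Path(backup_root)
    restored = []
    for rel in suspect_files:
        source = backup_root / rel
        target = file_set_root / rel
        # Refuse to delete anything for which no backup copy exists.
        if not source.is_file():
            raise FileNotFoundError(f"no backup copy for {rel}")
        target.unlink(missing_ok=True)                    # delete from the file set
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)                      # recreate from backup
        restored.append(target)
    return restored


if __name__ == "__main__":
    # Self-contained demo with temporary directories standing in for real locations.
    with tempfile.TemporaryDirectory() as live, tempfile.TemporaryDirectory() as bak:
        (Path(bak) / "db").mkdir()
        (Path(bak) / "db" / "table.dat").write_bytes(b"good copy")
        (Path(live) / "db").mkdir()
        (Path(live) / "db" / "table.dat").write_bytes(b"partly transformed")
        print(restore_suspect_files(["db/table.dat"], live, bak))
```

Checking for the backup copy before deleting is a deliberate choice in this sketch: a missing backup stops the run before any data is removed.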
Summary
The following table summarizes the strengths and weaknesses of the two file set transformation methods.
| Factor | Copy Method | The dataxform Utility Method | 
|---|---|---|
| Temporary storage required | Equal to size of file set. | Sufficient to hold a list of path names of files in file set. | 
| Security | File data is unprotected while in copy utility’s buffers. | File data is never outside the CTE GuardPoint. | 
| Initial encryption | Files can be copied directly from source directory to a CTE-protected directory. | Files must be in a protected location before transformation. | 
| Operational impact | No access to files during transformation. Path names or operating procedures must be adjusted after transformation. | No access to files during transformation. | 
| Recoverability | Restart copy operation at, or prior to, point of failure. | Files undergoing transformation at the point of failure must be discovered from the dataxform logs and restored from backup. | 
Protection Policies
The basic unit of CTE data protection policy application is the GuardPoint. GuardPoints are typically associated with file system mount points, but may also be associated with directory sub-trees.
Note
Nested mount points within a directory, or mount points protected by a GuardPoint, are also protected in Linux environments.

All files in the directory hierarchy, below a GuardPoint, are subject to the GuardPoint’s policy, which consists of rules that specify:
- Protected files: Filenames or filename patterns (example: *.dat) to which the policy applies. 
- Authorized users: User(s), group(s), and application(s) permitted to access the protected files. 
- Permissions: Actions permitted to users (example: create/delete, read/write, rename, decrypt). 
Policies also specify the name of an encryption algorithm and a key for encrypting protected files. For example, a policy might specify that all Excel workbooks protected by a GuardPoint be encrypted using an AES256 key called EXCEL-KEY, and that only users in group 128 have access to those files. All other files are not encrypted and are freely accessible to all users.
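To make the relationship between these rule elements concrete, the Excel example can be modeled as a small data structure. This is a conceptual sketch only: the class, field names, and permission strings are invented for illustration and do not reflect how CipherTrust Manager stores or evaluates policies.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    """Hypothetical model of a GuardPoint policy (not the CTE policy format)."""
    file_patterns: tuple          # protected files, e.g. ("*.xlsx",)
    authorized_groups: frozenset  # groups permitted to access protected files
    permissions: frozenset        # actions permitted to authorized groups
    algorithm: str                # encryption algorithm name
    key_name: str                 # encryption key name

    def protects(self, filename: str) -> bool:
        """True if the filename matches one of the protected patterns."""
        return any(fnmatchcase(filename, p) for p in self.file_patterns)

    def allows(self, filename: str, group: int, action: str) -> bool:
        """Files outside the protected patterns are freely accessible;
        protected files require an authorized group and a permitted action."""
        if not self.protects(filename):
            return True
        return group in self.authorized_groups and action in self.permissions


# The example from the text: Excel workbooks, key EXCEL-KEY, group 128.
excel_policy = Policy(
    file_patterns=("*.xls", "*.xlsx"),
    authorized_groups=frozenset({128}),
    permissions=frozenset({"read", "write"}),
    algorithm="AES256",
    key_name="EXCEL-KEY",
)

if __name__ == "__main__":
    print(excel_policy.allows("budget.xlsx", 128, "read"))
    print(excel_policy.allows("budget.xlsx", 200, "read"))
    print(excel_policy.allows("notes.txt", 200, "read"))
```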
Types of Policies
CTE Agents use two types of policies:
- Initial Data Transformation - Dataxform policies contain the elements listed above, plus a data transformation key, used by the dataxform utility to rekey file data. Transformation policies contain strict access control rules that prevent application and user access to files during transformation. CTE only uses dataxform policies for the initial transformation. Afterwards, you replace the dataxform policy with a production policy.
  The dataxform utility operates on a per-GuardPoint basis. For initial encryption, the dataxform policy specifies a clear production key (meaning that the utility does not decrypt data because the data is unencrypted) and a new data transformation key to encrypt the data.
- Production/Standard - Production policies contain the elements listed above. They protect data within GuardPoint(s) during day-to-day IT operations. 
For decryption, the dataxform policy specifies a clear data transformation key (that is, the utility does not re-encrypt files as it rewrites them) and the current production key.
A rekeying transformation policy specifies both a current production ("old") key and a transformation ("new") key.
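The key pairings for the three kinds of transformation can be summarized in a short lookup. This is a sketch for orientation only: the function, the operation names, and the `CLEAR` marker are hypothetical and are not dataxform options or policy syntax.

```python
from typing import NamedTuple, Optional

CLEAR = "clear"  # illustrative marker meaning "no encryption key"


class KeyPair(NamedTuple):
    production_key: str       # key the data is currently protected with
    transformation_key: str   # key the data is rewritten with


def dataxform_key_pair(operation: str,
                       current_key: Optional[str] = None,
                       new_key: Optional[str] = None) -> KeyPair:
    """Return the (production, transformation) keys a dataxform policy
    would specify for each kind of transformation described above."""
    if operation == "initial":
        if new_key is None:
            raise ValueError("initial encryption needs a new key")
        return KeyPair(CLEAR, new_key)        # data starts unencrypted
    if operation == "rekey":
        if current_key is None or new_key is None:
            raise ValueError("rekeying needs both the old and the new key")
        return KeyPair(current_key, new_key)  # old key to new key
    if operation == "decrypt":
        if current_key is None:
            raise ValueError("decryption needs the current key")
        return KeyPair(current_key, CLEAR)    # data is rewritten unencrypted
    raise ValueError(f"unknown operation: {operation}")


if __name__ == "__main__":
    for op, cur, new in [("initial", None, "KEY-2"),
                         ("rekey", "KEY-1", "KEY-2"),
                         ("decrypt", "KEY-2", None)]:
        print(op, dataxform_key_pair(op, cur, new))
```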