Big Data Data Stores
DDC supports two types of Big Data data stores:
- Hadoop Cluster - Apache Hadoop provides a software framework for distributed storage and processing of big data by means of MapReduce.
- Teradata (Teradata 14.10.00.02 and above)
Hadoop Cluster Considerations and Requirements
Nodes where data blocks distributed by Hadoop Distributed File System (HDFS) are stored are called DataNodes. DataNodes are treated as “slaves” in a Hadoop cluster.
A node that maintains the index of directories and files and manages data blocks stored on DataNodes is called a NameNode. A NameNode is treated as “master” in a Hadoop cluster.
To be able to scan a Hadoop cluster with HDFS, you must have:
A Target NameNode running Apache Hadoop 2.7.3, Cloudera Distribution for Hadoop (CDH), or similar.
A Proxy host running the Linux 3 Agent with database runtime components for Linux systems.
A valid Kerberos ticket if Kerberos authentication is enabled. Refer to Generating Kerberos Authentication Ticket.
Teradata data stores require Teradata Tools and Utilities 16.10.xx to be installed on the Agent. The following utility is also mandatory:
- ODBC Driver for Teradata
You may have to restart the Agent after the installation.
A scan of a Teradata data store may create temporary tables named erecon_fexp_<YYYYMMDDHHMMSS><PID><RANDOM>. Do not remove these tables while the scan is in progress. They are automatically removed when a scan completes. If a scan fails or is interrupted by an error, the temporary tables may remain in the database. In this case, it is safe to delete the temporary tables.
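As an illustration, leftover temporary tables can be recognized by their name prefix. The helper function below is a sketch, not part of DDC; the assumption that the name continues with the 14-digit timestamp is inferred from the <YYYYMMDDHHMMSS> placeholder above.

```shell
# Sketch: recognize scan temp tables by the erecon_fexp_ prefix followed
# by a digit (the start of the <YYYYMMDDHHMMSS> timestamp).
is_scan_temp_table() {
  case "$1" in
    erecon_fexp_[0-9]*) echo yes ;;
    *) echo no ;;
  esac
}
is_scan_temp_table "erecon_fexp_202401151230450001"   # hypothetical leftover table
is_scan_temp_table "customer_data"
```

Only tables matching this pattern that remain after a failed or interrupted scan are safe to delete.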
Scanning of large binary objects is now supported for Teradata. For a list of binary file types supported for Teradata scans, refer to the table in Binary Large Objects.
Adding Big Data Stores
Use the Add Data Store wizard to add a big data type data store. Adding a Big Data data store involves the following steps:
1. Select Store Type
In the Select Store Type screen of the wizard select Big Data in the Select Data Store Category.
From the Select Database Type drop-down list select Hadoop Cluster or Teradata.
Click Next to go on to the Configure Connection screen.
2. Configure Connection
In the Configure Connection screen of the wizard, provide the following configuration details for your data store:
Hostname/IP - Specify the Hostname/IP of the Hadoop cluster's active NameNode. Specify a valid hostname, IP address, or Uniform Resource Identifier (URI). The hostname must be longer than two characters. For example, if your HDFS share path is hdfs://hadoop-server-name/share-name, the hostname of the NameNode is hadoop-server-name. This is a mandatory field.
Port - The port on which the NameNode is accessed. Default is 8020. This is a mandatory field.
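To illustrate how the NameNode host relates to an HDFS share path, the host portion can be separated from the URI with standard shell parameter expansion (the URI below is the example from the text):

```shell
# Extract the NameNode hostname from an hdfs:// URI.
HDFS_URI="hdfs://hadoop-server-name/share-name"
host="${HDFS_URI#hdfs://}"   # drop the scheme  -> hadoop-server-name/share-name
host="${host%%/*}"           # drop the path    -> hadoop-server-name
host="${host%%:*}"           # drop an optional :port, if one is present
echo "$host"                 # prints hadoop-server-name
```

The value printed here is what belongs in the Hostname/IP field; the share path itself is not part of the connection details.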
Hostname - Specify a valid Hostname of the Teradata server. The hostname must be longer than two characters. This is a mandatory field.
Port - The port on which the Teradata server is accessed. Default is 1025. This is a mandatory field.
User - The name of the Teradata user.
Due to known Teradata limitations, DDC cannot use the following internal Teradata users to scan:
DBC, tdwm, LockLogShredder, External_AP, TDPUSER, SysAdmin, SystemFe, TDMaps, Crashdumps, Sys_Calendar, viewpoint, console.
Password - The password of the Teradata user.
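A pre-flight check against the restricted user list above could be sketched in shell; the function name is illustrative and the comparison assumes the user name is entered exactly as listed:

```shell
# Reject internal Teradata users that DDC cannot use for scanning.
RESTRICTED="DBC tdwm LockLogShredder External_AP TDPUSER SysAdmin SystemFe TDMaps Crashdumps Sys_Calendar viewpoint console"
is_restricted_user() {
  case " $RESTRICTED " in
    *" $1 "*) return 0 ;;   # user is on the restricted list
    *)        return 1 ;;
  esac
}
is_restricted_user "DBC" && echo "DBC cannot be used for scans"
is_restricted_user "scan_svc" || echo "scan_svc is allowed"
```

In practice, a dedicated Teradata user created for scanning avoids these limitations.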
Scroll down to the Agent Selection section. In the Add Label field, add an agent label by entering a new label or removing an existing one. Agent labels represent the agent's capabilities.
Click Next to go to the General Info screen.
3. General Info
Configure the General Info part per the information in General Info.
Click Next to go to the Add Tags & Access Control screen.
4. Add Tags & Access Control
Configure the Tags & Access Control part per the information in Tags & Access Control.
Click Save. The newly created data store appears on the Data Stores page. By default, data stores are displayed in alphabetic order by name. Depending on the number of entries per page, you might need to navigate to other pages to view the newly created data store.
Generating Kerberos Authentication Ticket
To generate a Kerberos authentication ticket for your HDFS cluster, run these commands in a terminal on the designated Proxy Agent host.
To check whether a valid Kerberos ticket has been issued for the principal user:

# klist

To generate a Kerberos ticket as a principal user:

# kinit <username>@<domain>

For example:

# kinit DDCuser@example.com
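The ticket check can also be scripted as a pre-scan guard. This is a minimal sketch that assumes the MIT Kerberos client tools (klist) may or may not be present on the host, so it degrades gracefully either way:

```shell
# Report whether a valid Kerberos ticket is currently available.
# klist -s exits 0 only if the credential cache holds a valid ticket.
if command -v klist >/dev/null 2>&1 && klist -s 2>/dev/null; then
  TICKET_STATUS="valid"
else
  TICKET_STATUS="missing"   # obtain one with: kinit <username>@<domain>
fi
echo "Kerberos ticket: $TICKET_STATUS"
```

Running such a check before starting a scan avoids scan failures caused by an expired ticket.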