The Teradata Connector for Hadoop (TDCH) provides scalable, high-performance, bidirectional data movement between the Teradata database system and the Hadoop system. New features in this release include:

- An access lock option for importing data from Teradata, to improve concurrency. With lock-for-access, the import job is not blocked by other concurrent accesses against the same table.
- Support for importing data into an existing Hive partitioned table.
- Support for specifying a Hive configuration file path with the -hiveconf parameter, so the connector can access it in either HDFS or the local file system. This enables users to run Hive import or export jobs on any node of a Hadoop cluster (see section 8.5 of the README file for more information).
- Support for the new split.by.amp import method with Teradata Database Release 14.10 (see section 7.1(d) of the README file for more information). A combined command-line sketch of these options follows this list.
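To make these options concrete, here is a minimal, hypothetical Hive import sketch in the style of the TDCH (Command Line Edition) tutorial. The jar name, host, database, and table names are placeholders, and the -accesslock parameter name is an assumption based on the access lock feature described above; check the README of your TDCH version for the exact class and option names.

    # Hypothetical Hive import combining the new options: split.by.amp,
    # lock-for-access, and a hive-site.xml read directly from HDFS.
    # The target may be an existing partitioned Hive table.
    hadoop jar teradata-connector-<version>.jar com.teradata.hadoop.tool.TeradataImportTool \
      -url jdbc:teradata://tdhost/database=testdb \
      -username tduser \
      -password tdpassword \
      -jobtype hive \
      -fileformat textfile \
      -method split.by.amp \
      -accesslock true \
      -hiveconf hdfs://namenode:8020/user/hive/conf/hive-site.xml \
      -sourcetable sales \
      -targettable sales_by_day

Because -hiveconf can point at a hive-site.xml in HDFS, a job like this should be runnable from any node of the cluster, and split.by.amp parallelizes the read across AMPs, which is why it requires Teradata Database 14.10 (per section 7.1(d) of the README).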
Problems fixed in this release include:

- Inappropriate exceptions reported from a query-based import job. Only the split.by.partition method supports a query as an import source; a proper exception is now thrown if a non-split.by.partition import job is issued with the 'sourcequery' parameter.
- An error when the user account used to start Templeton is different from the user account Templeton uses to run a connector job.
- A time-out issue for large data import jobs. For a large import, the Teradata database may need a long time to produce the results in a spool table before the subsequent data transfer.
If this exceeds the mapper's time-out before the data transfer starts, the mapper would be killed. With this fix, the mapper is kept alive instead.
- A time-out issue for export jobs using internal.fastload (illustrated in the sketch after this list). The internal.fastload export method requires all mappers to synchronize at the end of their execution, so a mapper that finishes its data transfer earlier than the others has to wait for them to complete their work.
If the wait exceeds the idle-task time-out, the mapper would be killed by its task tracker. With this fix, that mapper is kept alive instead.
- The limitation that the user must have authorization to create a local directory when executing a Hive job on a node without a Hive configuration (hive-site.xml) file.
Previously, TDCH had to copy the file from HDFS to the local file system, which required that permission.
- Case-sensitivity problems with the '-jobtype', '-fileformat', and '-method' parameters. With this fix, the values of these parameters are no longer case-sensitive.
- Incorrect delimiters used by an export job for Hive tables in RCFileFormat.
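For completeness, a similarly hedged export sketch touching the internal.fastload method mentioned above: again, the jar name, connection details, paths, and table names are placeholders, and exact option names should be confirmed against the README.

    # Hypothetical HDFS-to-Teradata export using internal.fastload; after the
    # case-sensitivity fix, values such as INTERNAL.FASTLOAD are also accepted.
    hadoop jar teradata-connector-<version>.jar com.teradata.hadoop.tool.TeradataExportTool \
      -url jdbc:teradata://tdhost/database=testdb \
      -username tduser \
      -password tdpassword \
      -jobtype hdfs \
      -fileformat textfile \
      -method internal.fastload \
      -separator ',' \
      -nummappers 8 \
      -sourcepaths /user/hduser/sales_export \
      -targettable sales

With internal.fastload, the mappers coordinate a single FastLoad job and synchronize at the end of their execution, which is the synchronization point addressed by the time-out fix above.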
For more detailed information on the Teradata Connector for Hadoop, please see the Tutorial document attached to this article as well as the README file in the appropriate TDCH download package. The Tutorial document mainly discusses the TDCH (Command Line Edition). The download packages are for use on commodity hardware; for Teradata appliance hardware, TDCH is distributed with the appliance. TDCH is supported by Teradata CS in certain situations where the user is a Teradata customer. Teradata employees can contact Hadoop Product Management (PM) for more information.
Thanks for the update, glad to see active work being done on TDCH. If you are taking suggestions for the next release:

1. Currently the table being loaded cannot have a name longer than 24 characters, because of the six characters appended to it for the ERR1 and ERR2 error tables of load jobs (Teradata object names are limited to 30 characters, so a six-character suffix leaves 24). This is a big constraint where table names longer than 24 characters already exist. Work with the JDBC team to provide an option to specify the error database name and error table names for FastLoad. (important to have)
2. Support for query banding for the entire process, and specifically for the load and export operators, since TASM regulates the number of sessions in most Teradata shops. (important to have)
3. Ability to provide a path where users can place pre- and post-load/export SQL. (nice to have)