Sqoop Introduction
1. What is Sqoop
Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
The official Sqoop website: http://sqoop.apache.org/
Sqoop has two main operations: import and export.
2. Import
RDBMS -> Hadoop (HDFS, Hive)
1). First Try
bin/sqoop import \
--connect jdbc:mysql://<hostname>:3306/<databaseName> \
--username username \
--password password \
--table tableName \
--target-dir /user/root/sqoop/.... \
--num-mappers 1 (sets the number of map tasks; the default is 4)
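After the import finishes, the output can be inspected in HDFS. A minimal check, using a hypothetical target directory imp_my_user (with one mapper, the text output lands in a single part-m-00000 file):
bin/hdfs dfs -ls /user/root/sqoop/imp_my_user
bin/hdfs dfs -cat /user/root/sqoop/imp_my_user/part-m-00000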
2). Saving the data in different file formats (see the sketch after this list)
--as-sequencefile (save the data as Hadoop SequenceFiles)
--as-avrodatafile (save the data in Avro binary format)
--as-parquetfile (save the data as Parquet files)
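Any one of these flags can be appended to the import command. A minimal sketch using Parquet; the target directory imp_my_user_parquet is a hypothetical name:
bin/sqoop import \
--connect jdbc:mysql://<hostname>:3306/<databaseName> \
--username username \
--password password \
--table tableName \
--target-dir /user/root/sqoop/imp_my_user_parquet \
--num-mappers 1 \
--as-parquetfile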
3). Importing specific columns to HDFS
bin/sqoop import \
--connect jdbc:mysql://<hostname>:3306/<databaseName> \
--username username \
--password password \
--table tableName \
--target-dir /user/root/sqoop/imp_my_user_column \
--num-mappers 1 \
--columns id,account
Alternatively, --query 'select id, account from my_user where $CONDITIONS' can be used in place of --table to filter the rows during import. The query must contain the literal $CONDITIONS token, which Sqoop substitutes with its split conditions; a sketch follows.
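A minimal sketch of a free-form query import; the target directory imp_my_user_query is a hypothetical name, and with --num-mappers greater than 1 you would also need --split-by:
bin/sqoop import \
--connect jdbc:mysql://<hostname>:3306/<databaseName> \
--username username \
--password password \
--query 'select id, account from my_user where $CONDITIONS' \
--target-dir /user/root/sqoop/imp_my_user_query \
--num-mappers 1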
4). Compress the data
bin/sqoop import \
--connect jdbc:mysql://<hostname>:3306/<databaseName> \
--username username \
--password password \
--table tableName \
--target-dir /user/root/sqoop/imp_my_Snappy \
--delete-target-dir \
--num-mappers 1 \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--fields-terminated-by '\t'
Before running the command above, you can run bin/hadoop checknative to check which compression codecs are natively supported by your Hadoop installation.
5). Direct mode
--direct is only supported for certain databases such as MySQL and PostgreSQL; it bypasses JDBC and delegates to the database's native bulk tools (e.g. mysqldump for MySQL) for faster transfers. Bottom line: "direct mode" means different things for different databases.
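A minimal sketch of direct mode added to the first import; the target directory imp_my_user_direct is a hypothetical name:
bin/sqoop import \
--connect jdbc:mysql://<hostname>:3306/<databaseName> \
--username username \
--password password \
--table tableName \
--target-dir /user/root/sqoop/imp_my_user_direct \
--num-mappers 1 \
--direct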
6). Hive import
bin/sqoop import \
--connect jdbc:mysql://<hostname>:3306/<databaseName> \
--username username \
--password password \
--table tableName \
--fields-terminated-by '\t' \
--delete-target-dir \
--num-mappers 1 \
--hive-import \
--hive-database default \
--hive-table user_hive
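Once the job completes, the imported table can be queried from Hive. A quick check, assuming the Hive client lives under bin/ like the other tools here:
bin/hive -e "select * from default.user_hive limit 10"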
3. Export
Hadoop (HDFS, Hive) -> RDBMS
1). Export a MySQL table
bin/sqoop export \
--connect jdbc:mysql://<hostname>:3306/<databaseName> \
--username username \
--password password \
--table tableName \
--export-dir /user/root/sqoop/exp/user/ \
--num-mappers 1 \
--input-fields-terminated-by '\t'
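Note that the target MySQL table must already exist before the export runs; Sqoop will not create it. A hypothetical DDL matching the two columns used earlier, issued through the mysql client (the column types are assumptions):
mysql -u username -p <databaseName> -e "create table tableName (id int, account varchar(255))"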
4. Scripting
Sqoop can also read all of its arguments from an options file:
bin/sqoop --options-file /opt/datas/sqoop-import-hdfs.txt (the path to your options file)
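In the options file, the tool name comes first and each option or value goes on its own line; lines starting with # are comments. A sketch of what /opt/datas/sqoop-import-hdfs.txt might contain, reusing the first import (imp_my_user is a hypothetical target directory):
# Import a MySQL table into HDFS
import
--connect
jdbc:mysql://<hostname>:3306/<databaseName>
--username
username
--password
password
--table
tableName
--target-dir
/user/root/sqoop/imp_my_user
--num-mappers
1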