I currently have some Hive code that creates a table where I declare all the column names and types. Then I use "load data inpath" to load a CSV into that table so I can join it to my database tables. The problem is that sometimes the CSV columns may be in a different order; I cannot control that, as the file is sent to me from a different source. I am wondering if there is a way to create the temp table I build daily without declaring the column names, and just let it read from the CSV. That would save me from manually reviewing the file every morning to check that the columns are in the right order.
Because the column order keeps changing, as a first step you could use a shell script to read the header row and generate a CREATE TABLE statement for the temp table. Then execute the generated statement and load the file into the temp table. From the temp table, you can load the data into the target table.
A sample bash script to give you an idea:
#!/bin/bash
# File check: abort if the input file is missing or empty
if [ -s "$1" ]
then
    echo "File $1 exists and is non-empty"
else
    echo "File $1 does not exist or is empty"
    exit 1
fi
create_tbl_cmd="create table tblname ("
# Get the field names from the file header (first line)
fields=$(head -1 "$1")
# Add string as the datatype for each field
fields="${fields//,/ string,} string)"
create_tbl_cmd="$create_tbl_cmd$fields"
#echo $create_tbl_cmd
# Execute $create_tbl_cmd using Beeline or the Hive command line
# Execute the load into the temp table
# Execute the insert from the temp table into the target table
Execute the bash script above, passing the CSV file as the argument:
bash scriptname.sh filename.csv
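If you also want to automate the execution steps the comments describe, a minimal, untested sketch inside the script (after $create_tbl_cmd is built) might use Beeline like this; the JDBC URL ($HIVE_JDBC_URL), HDFS staging path, and target table/column names are placeholders to replace with your own:
# Assumption: $HIVE_JDBC_URL, the staging path and target_tbl/col_a..col_c are placeholders
beeline -u "$HIVE_JDBC_URL" -e "drop table if exists tblname"
beeline -u "$HIVE_JDBC_URL" -e "$create_tbl_cmd row format delimited fields terminated by ',' stored as textfile"
beeline -u "$HIVE_JDBC_URL" -e "load data inpath '/user/hive/staging/$1' into table tblname"
beeline -u "$HIVE_JDBC_URL" -e "insert into target_tbl select col_a, col_b, col_c from tblname"
Because the final insert selects columns from the temp table by name, the order of the columns in the incoming CSV no longer matters.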
Related
The output of my SQL query has multiple columns and contains string values that include spaces. I need to write a bash script in which I read the values into a variable, use them further in the script, and also insert them into another database.
When I store the output into an array, the string values get split on spaces and stored in different array indexes. How can I handle this situation in a bash script?
CMD="SELECT * FROM upload where upload_time>='2020-11-18 00:19:48' LIMIT 1;"
output=($(mysql $DBCONNECT --database=uploads -N --execute="$CMD"))
echo ${output[9]}
Output:
version test_id upload_time parser_result 25 567 2020-11-18 00:19:48 <p1>box crashed with exit status 0</p1>
The upload time "2020-11-18 00:19:48" gets stored in two indexes.
More problematic is the 'parser_result' value, which is a string: '<p1>box crashed with exit status 0</p1>' gets stored across several indexes, split on spaces.
${output[8]} contains '<p1>box'
${output[9]} contains 'crashed'
The database is very large and I need to parse every row in it.
Since the string value can be anything, I am unable to come up with generic code. What is the best way to handle this scenario? I am using bash scripting for the first time. I have to use a bash script since it will run as a cron job inside a Docker container.
The fields are separated by TAB. Use that as your $IFS to parse the result.
IFS=$'\t' output=($(mysql $DBCONNECT --database=uploads -N --execute="$CMD"))
echo "${output[9]}"
If $DBCONNECT contains options separated with spaces, you need to do this in two steps, since it's using $IFS to split that as well.
result=$(mysql $DBCONNECT --database=uploads -N --execute="$CMD")
IFS=$'\t' output=($result)
echo "${ouptut[9]}"
I have two txt files containing JSON data on a Linux system.
I have created the corresponding tables in Oracle NoSQL for these two files.
Now, I want to load this data into the created tables in the Oracle NoSQL database.
Syntax:
put table -name <name> [-if-absent | -if-present]
[-json <string>] [-file <file>] [-exact] [-update]
Explanation:
Put a row into the named table. The table name is a dot-separated name with the format table[.childTableName]*.
where:
-if-absent
Indicates to put a row only if the row does not exist.
-if-present
Indicates to put a row only if the row already exists.
-json
Indicates that the value is a JSON string.
-file
Can be used to load JSON strings from a file.
-exact
Indicates that the input JSON string or file must contain values for all columns in the table and cannot contain extraneous fields.
-update
Can be used to partially update the existing record.
Now, I am using the command below to load the data:
kv-> put table -name tablename -file /path-to-folder/file.txt
Error handling command put table -name tablename -file /path-to-folder/file.txt: Illegal value for numeric field predicted_probability: 0.0. Expected FLOAT, is DOUBLE
kv->
I am not able to find the reason. Please help.
Thank you for helping.
I solved it. There was a conflict between the table's column data type (FLOAT) and the data type of the value in the JSON string (DOUBLE); I realized this later.
Thanks
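For anyone hitting the same error: one way to resolve such a type mismatch is to recreate the table so the column type matches what the JSON contains, then retry the load. A rough, untested sketch (the schema and column names below are made-up placeholders, not the actual table):
kv-> execute "DROP TABLE IF EXISTS tablename"
kv-> execute "CREATE TABLE tablename (id INTEGER, predicted_probability DOUBLE, PRIMARY KEY (id))"
kv-> put table -name tablename -file /path-to-folder/file.txt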
I am trying to scan a folder for new files, read each file, insert its contents into a database, and then delete the file from the folder. Up to this point it works, but the issue is that the whole content of each line gets inserted into one field in the database.
Below is the code:
inotifywait -m /home/a/b/c -e create -e moved_to |
while read path action file; do
    for filename in `ls -1 /home/a/b/c/*.txt`
    do
        while read line
        do
            echo $filename $line
            mysql -uroot -p -Bse "use datatable; INSERT INTO
                table_entries (file, data) VALUES ('$filename','$line'); "
        done <$filename
    done
    find /home/a/b/c -type f -name "*.txt" -delete
done
Basically the files contain: name,address,contact_no,email.
I want to insert the name from the file into the name field in the database, the address into the address field, and so on. In PHP we use explode to split data; what do I use in a shell script?
This would be far easier if you use LOAD DATA INFILE (see the manual for full explanation of syntax and options).
Something like this (though I have not tested it):
inotifywait -m /home/a/b/c -e create -e moved_to |
while read path action file; do
    for filename in `ls -1 /home/a/b/c/*.txt`
    do
        mysql datatable -e "LOAD DATA LOCAL INFILE '$filename'
            INTO TABLE table_entries (name, address, contact_no, email)
            SET file='$filename'"
    done
    find /home/a/b/c -type f -name "*.txt" -delete
done
edit: I specified mysql datatable which is like using USE datatable; to set the default database. This should resolve the error about "no database selected."
The columns you list as (name, address, contact_no, email) name the columns in the table, and they must match the columns in the input file.
If you have another column in your table that you want to set, but not from data in the input file, you use the extra clause SET file='$filename'.
You should also use some error checking to make sure the import was successful before you delete your *.txt files.
Note that LOAD DATA INFILE assumes lines end in newline (\n), and fields are separated by tab (\t). If your text file uses commas or some other separator, you can add syntax to the LOAD DATA INFILE statement to customize how it reads your file. The documentation shows how to do this, with many examples: https://dev.mysql.com/doc/refman/5.7/en/load-data.html I recommend you spend some time and read it. It's really not very long.
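For example, for comma-separated files like the ones described in the question, the statement inside the loop might look like this (an untested sketch; it assumes the field order in each file is name,address,contact_no,email):
mysql datatable -e "LOAD DATA LOCAL INFILE '$filename'
    INTO TABLE table_entries
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (name, address, contact_no, email)
    SET file='$filename'"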
I need to create a single MySQL table with information from multiple CSV files. These CSV files are parallel, but each contains a lot of extra columns that I don't need. Is there a way to create one MySQL table with select columns from multiple CSV files?
Do you mean something like this?
mysql_connect('localhost','root','root'); // connect your db
mysql_select_db('yourdatabase'); // select your database
$files = glob('*.csv'); // get the csv files
foreach($files as $file){ // Loop through the files and do a LOAD DATA INFILE for each
mysql_query("LOAD DATA INFILE '".$file."' INTO TABLE yourtable");
}
If you need to drop CSV columns, you could do the following:
$ cut -d, -f2-4,7-9 --complement temp.csv
1,5,6,10
1,5,6,10
1,5,6,10
1,5,6,10
1,5,6,10
1,5,6,10
1,5,6,10
The above removes all columns in the ranges 2-4 and 7-9, keeping columns 1, 5, 6, and 10.
Or you can load the .csv into Excel and manually remove the columns.
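Putting the two ideas together, a rough, untested sketch that trims each CSV with cut and then loads the trimmed copy (the column ranges, database name, and table name are placeholders):
for f in *.csv; do
    # drop the unwanted columns first, then load the trimmed copy
    cut -d, -f2-4,7-9 --complement "$f" > "trimmed_$f"
    mysql yourdatabase -e "LOAD DATA LOCAL INFILE 'trimmed_$f' INTO TABLE yourtable FIELDS TERMINATED BY ','"
done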
I have a bash script that computes the MD5 hash of every file inside a directory, recursively, and outputs the result into a text file (column 1 -- hash, column 2 -- path of the file). Now, for each row of column one of the text file, I want to query the MySQL database that contains a table of all hashes and filenames. If the hash is present, I want to delete the file. How do I query from a bash script to achieve this? The MySQL database contains 2 million entries.
#!/bin/bash
# Binaries
MD5DEEP=$(which md5deep)
OPTIONS="-rl"
WORKDIR="/scripts/test"
# Create a hash for each file (recursive, relative paths)
myhash=$($MD5DEEP $OPTIONS $WORKDIR)
echo "$myhash"
The output of the text file is as shown:
f59f1294e66775ef47a28ef0cac8229a /scripts/test/t1
44d88612fea8a8f36de82e1278abb02f /scripts/test/eicar.com.txt
Now, I want to compare the column 1 value of each row of this text file with the database of hashes; if it is present, I want to delete that file. Can somebody please provide some guidance here?
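A rough, untested sketch of what that lookup-and-delete loop could look like (the database name, table name, column name, and credentials are assumptions to replace with your own):
# Assumption: database hashdb, table hashes with an md5 column; reads the "hash path" lines from $myhash
while read -r hash filepath; do
    # ask MySQL whether this hash is already known
    count=$(mysql -N -B -uroot -pPASSWORD hashdb \
        -e "SELECT COUNT(*) FROM hashes WHERE md5 = '$hash';")
    if [ "$count" -gt 0 ]; then
        rm -f "$filepath"
    fi
done <<< "$myhash"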