I have a bash script that computes the MD5 hash of every file inside a directory recursively and
outputs this into a text file (column 1 is the hash, column 2 is the path of the file). Now, for each row of the text file, I want to take the column 1 value and query a MySQL database that contains a table of all known hashes and filenames. If the hash is present, I want to delete the file. How do I query MySQL from a bash script to achieve this? The database contains 2 million entries.
#!/bin/bash
## BINARIES
MD5DEEP=$(which md5deep)
OPTIONS="-rl"
WORKDIR="/scripts/test"
## CREATE A HASH FOR EACH FILE
myhash=$("$MD5DEEP" $OPTIONS "$WORKDIR")
echo "$myhash"
The output of the text file is as shown:
f59f1294e66775ef47a28ef0cac8229a /scripts/test/t1
44d88612fea8a8f36de82e1278abb02f /scripts/test/eicar.com.txt
Now, I want to compare the column 1 value of each row of this text file with the database of hashes, and if it is present, delete that file. Can somebody please provide some guidance here?
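A minimal sketch of the kind of loop that could do this, assuming a database named hashdb with a table hashes whose column md5 holds the known hashes (all placeholder names), and MySQL credentials supplied via ~/.my.cnf or the usual -u/-p options:
#!/bin/bash
# Sketch only: hashdb, hashes and md5 are placeholder names.
MD5DEEP=$(which md5deep)
WORKDIR="/scripts/test"

"$MD5DEEP" -rl "$WORKDIR" | while read -r hash filepath; do
    # -N suppresses the column header, -B gives bare batch output
    count=$(mysql -N -B hashdb -e "SELECT COUNT(*) FROM hashes WHERE md5='$hash';")
    if [ "$count" -gt 0 ]; then
        echo "Known hash, deleting: $filepath"
        rm -- "$filepath"
    fi
done
With 2 million rows, an index on the hash column keeps each lookup fast; alternatively, the whole hash list could be loaded into a temporary table and matched with a single JOIN instead of one query per file.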
Related
I currently have some Hive code that creates a table where I declare all the column names and types. Then I use "load data inpath" to load a CSV into that table so I can join it to my database tables. The problem is that sometimes the CSV columns may be in a different order; I cannot control that, as the file is sent to me from a different source. I am wondering if there is a way to create the temp table I build daily without declaring the column names, and just let it read from the CSV. That would let me avoid manually reviewing the file every morning to check that the columns are in the right order.
Because the column order keeps changing, as a first step you could use a shell script to read the header columns and generate a create table statement for the temp table. Then execute the generated create table string, load the file into the temp table, and finally load the data from the temp table into the target table.
A sample bash script to give you an idea.
#!/bin/bash
# File check
if [ -s "$1" ]; then
    echo "File $1 exists and is non-empty"
else
    echo "File $1 does not exist or is empty"
    exit 1
fi
create_table_cmd="create table tblname ("
# Get fields from the file header
fields=$(head -1 "$1")
# Use string as the datatype for every field
fields="${fields//,/ string,} string)"
create_table_cmd="$create_table_cmd$fields"
#echo "$create_table_cmd"
# Execute $create_table_cmd using Beeline or the Hive command line
# Execute the load into the temp table
# Execute the insert from the temp table into the target table
Execute the bash script above with the CSV file as its argument:
bash scriptname.sh filename.csv
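For the commented-out execution steps, here is a hedged sketch of what the Beeline calls might look like; the JDBC URL, HDFS staging path, temp table name, target table and its column names are all placeholders, and the header values are assumed to be valid Hive identifiers:
#!/bin/bash
JDBC_URL="jdbc:hive2://localhost:10000/default"   # placeholder connection string
csvfile="$1"
fields=$(head -1 "$csvfile")
create_table_cmd="create table tmp_tbl (${fields//,/ string,} string) row format delimited fields terminated by ',' tblproperties('skip.header.line.count'='1')"

# Create the temp table with the CSV's own column order
beeline -u "$JDBC_URL" -e "$create_table_cmd"
# Load the file (assumed to already sit in HDFS at this placeholder path)
beeline -u "$JDBC_URL" -e "load data inpath '/user/hive/staging/$(basename "$csvfile")' into table tmp_tbl"
# Insert into the target table selecting columns by name, so the CSV order no longer matters
beeline -u "$JDBC_URL" -e "insert into table target_tbl select col_a, col_b, col_c from tmp_tbl"
The insert-by-name step is what makes the varying CSV column order harmless: the temp table mirrors the file, and the final select lists the target table's columns explicitly.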
I am bulk inserting values from a CSV file into my Access table. Things were working fine until today, when I encountered a problem where Access inserts all the values except for one field called BN1. It simply leaves this column blank when the data is non-numeric. This is the batch name of products, and in the design the field type is Memo (legacy .mdb file, so I can't change it).
My sample data:
DATE,TIME,PN1,BN1,CH0,CH1,CH2
2019-02-18,16:40:05,test,prompt,0,294,0
2019-02-18,16:40:14,test,1,700,294,0
So in the above data, the first row is inserted with a blank value instead of "prompt", whereas the second row is inserted properly with BN1 as 1.
My code to insert the data:
INSERT INTO Log_143_temp ([DATE],[TIME],PN1,BN1,CH0,CH1,CH2
) SELECT [DATE],[TIME],PN1,BN1,CH0,CH1,CH2
FROM [Text;FMT=Delimited;DATABASE=C:\tmp].[SAMPLE_1.csv]
The path and file name are correct; otherwise it wouldn't have inserted any values at all.
Here is how I solved the issue.
I changed the bulk insert query to:
INSERT INTO Log_143_temp ([DATE],[TIME],PN1,BN1,CH0,CH1,CH2
) SELECT [DATE],[TIME],PN1,BN1,CH0,CH1,CH2
FROM [Text;FMT=CSVDelimited;HDR=Yes;DATABASE=C:\tmp].[SAMPLE_1.csv]
Then add a file named schema.ini to the folder containing the CSV file to be imported.
Contents of the schema.ini file:
[SAMPLE_1.csv]
ColNameHeader=True
Format=CSVDelimited
DateTimeFormat=yyyy-mm-dd
Col1="DATE" Text
Col2="TIME" Text
Col3="PN1" Text
Col4="BN1" Text
Col5="CH0" Double
Col6="CH1" Double
Col7="CH2" Double
Now the csv files get imported without any issue.
For additional info on schema.ini, see the following link:
https://learn.microsoft.com/en-us/sql/odbc/microsoft/schema-ini-file-text-file-driver?view=sql-server-2017
I have two txt files containing JSON data on a Linux system.
I have created the respective tables in Oracle NoSQL for these two files.
Now I want to load this data into the created tables in the Oracle NoSQL Database.
Syntax:
put table -name <name> [if-absent | -if-present ]
[-json <string>] [-file <file>] [-exact] [-update]
Explanation:
Put a row into the named table. The table name is a dot-separated name with the format table[.childTableName]*.
where:
-if-absent
Indicates to put a row only if the row does not exist.
-if-present
Indicates to put a row only if the row already exists.
-json
Indicates that the value is a JSON string.
-file
Can be used to load JSON strings from a file.
-exact
Indicates that the input JSON string or file must contain values for all columns in the table and cannot contain extraneous fields.
-update
Can be used to partially update the existing record.
Now, I am using the command below to load the data:
kv-> put table -name tablename -file /path-to-folder/file.txt
Error handling command put table -name tablename -file /path-to-folder/file.txt: Illegal value for numeric field predicted_probability: 0.0. Expected FLOAT, is DOUBLE
kv->
I am not able to find the reason. Learned members, please help.
Thank you for helping.
Yeah, I solved it. There was a conflict between the table's column data type and the JSON value's data type: the predicted_probability column is declared as FLOAT, while the JSON number is treated as a DOUBLE. I only realized this later.
Thanks
I want to upload CSV data into BigQuery. When the data has mixed types (like strings and integers), BigQuery can infer the column names from the header, because the header row contains only strings while the other lines contain integers.
BigQuery infers headers by comparing the first row of the file with
other rows in the data set. If the first line contains only strings,
and the other lines do not, BigQuery assumes that the first row is a
header row.
https://cloud.google.com/bigquery/docs/schema-detect
The problem is when your data is all strings ...
You can specify --skip_leading_rows, but BigQuery still does not use the first row as the names of your columns.
I know I can specify the column names manually, but I would prefer not to, as I have a lot of tables. Is there another solution?
If your data is all of "string" type and the first row of your CSV file contains the metadata, then it is easy to write a quick script that parses the first line of the CSV and generates a matching "create table" command:
bq mk --schema name:STRING,street:STRING,city:STRING... -t mydataset.myNewTable
Use that command to create a new (empty) table, and then load your CSV file into that new table (using --skip_leading_rows as you mentioned).
14/02/2018: Update thanks to Felipe's comment:
The above command can be simplified this way:
bq mk --schema `head -1 myData.csv` -t mydataset.myNewTable
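A minimal sketch combining the two steps; the dataset and table names are placeholders, and it assumes the first line of the CSV is the header, the column names are simple unquoted identifiers, and every column should be STRING:
#!/bin/bash
csvfile="myData.csv"   # placeholder file name

# Build "col1:STRING,col2:STRING,..." from the header line
schema=$(head -1 "$csvfile" | sed 's/,/:STRING,/g; s/$/:STRING/')

# Create the table with that schema, then load the file, skipping the header row
bq mk --schema "$schema" -t mydataset.myNewTable
bq load --source_format=CSV --skip_leading_rows=1 mydataset.myNewTable "$csvfile"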
It's not possible with current API. You can file a feature request in the public BigQuery tracker https://issuetracker.google.com/issues/new?component=187149&template=0.
As a workaround, you can add a single non-string value at the end of the second line in your file, and then set the allowJaggedRows option in the Load configuration. Downside is you'll get an extra column in your table. If having an extra column is not acceptable, you can use query instead of load, and select * EXCEPT the added extra column, but query is not free.
Hello, and thank you for taking the time to read my issue.
I have the following issue with Cassandra cqlsh:
When I use the COPY command to load a .csv into my table, the command never finishes executing, and if I stop it with Ctrl+C, nothing has been loaded into the table.
I'm using .csv files from: https://www.kaggle.com/daveianhickey/2000-16-traffic-flow-england-scotland-wales
specifically from ukTrafficAADF.csv.
I put the code below:
CREATE TABLE first_query ( AADFYear int, RoadCategory text,
LightGoodsVehicles text, PRIMARY KEY(AADFYear, RoadCategory));
I'm trying:
COPY first_query (AADFYear, RoadCategory, LightGoodsVehicles) FROM '..\ukTrafficAADF.csv' WITH DELIMITER=',' AND HEADER=TRUE;
This gives me the error below repeatedly:
Failed to import 5000 rows: ParseError - Invalid row length 29 should be 3, given up without retries
And it never finishes.
Note that the .csv file has more columns than I need, and running the previous COPY command with the SKIPCOLS option listing the unused columns does the same thing.
Thanks in advance.
In the cqlsh COPY command, every column in the CSV must be present in the table schema.
In your case, your csv ukTrafficAADF has 29 columns but the table first_query has only 3 columns; that's why it throws a parse error.
So in some way you have to remove all the unused columns from the csv; then you can load it into the Cassandra table with the cqlsh COPY command.
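If it helps, here is a minimal sketch of how the unused columns could be stripped out with awk before running COPY; it assumes the names AADFYear, RoadCategory and LightGoodsVehicles appear literally in the CSV header, the output filename is just an example, and the naive comma split would need a real CSV parser if any field contains quoted commas:
#!/bin/bash
# Keep only the three columns the table needs, selected by header name
awk -F',' '
NR == 1 {
    for (i = 1; i <= NF; i++) idx[$i] = i
    a = idx["AADFYear"]; r = idx["RoadCategory"]; l = idx["LightGoodsVehicles"]
}
{ print $a "," $r "," $l }
' ukTrafficAADF.csv > ukTrafficAADF_3cols.csv
Running the same COPY command against the trimmed file (FROM 'ukTrafficAADF_3cols.csv') should then see rows of length 3, matching the three columns named in the COPY.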