Print column names only once in shell script - mysql

Here is my shell script. It works fine without any errors, but I want the output formatted differently.
Script:
#!/bin/bash
DB_USER='root'
DB_PASSWD='123456'
DB_NAME='job'
Table_Name='status_table'
#sql=select job_name, date,status from $DB_NAME.$Table_Name where job_name='$f1' and date=CURDATE()
file="/root/jobs.txt"
while read -r f1
do
mysql -N -u$DB_USER -p$DB_PASSWD <<EOF
select job_name, date,status from $DB_NAME.$Table_Name where job_name='$f1' and date=CURDATE()
EOF
done <"$file"
Source Table:
mysql> select * from job.status_table
+--------+----------+------------+-----------+
| Job_id | Job Name | date       | status    |
+--------+----------+------------+-----------+
|    111 | AA       | 2016-12-01 | completed |
|    112 | BB       | 2016-12-01 | completed |
|    113 | CC       | 2016-12-02 | completed |
|    112 | BB       | 2016-12-01 | completed |
|    114 | DD       | 2016-12-02 | completed |
|    201 | X        | 2016-12-03 | completed |
|    202 | y        | 2016-12-04 | completed |
|    203 | z        | 2016-12-03 | completed |
|    111 | A        | 2016-12-04 | completed |
+--------+----------+------------+-----------+
Input text file
[rteja@server0 ~]# more jobs.txt
AA
BB
CC
DD
X
Y
Z
A
ABC
XYZ
Output - suppressed column names
(mysql -N -u$DB_USER -p$DB_PASSWD <<EOF)
[rteja@server0 ~]# ./script.sh
AA 2016-12-01 completed
BB 2016-12-01 completed
Output - without suppressing column names, the column names are printed on every loop iteration.
(mysql -u$DB_USER -p$DB_PASSWD <<EOF)
[rteja@server0 ~]# ./script.sh
job_name date status
AA 2016-12-01 completed
job_name date status
BB 2016-12-01 completed
Challenges:
1. I want to print the column names only once in the output, and store the result in a CSV file.
2. I don't want to expose the username and password in the script. Is there a way to hide them? I heard we can put them in a separate file, set its permissions so only our user can read it, and reference it from the script.

Rather than executing a select query multiple times, you can run a single query using:
job_name in ('AA','BB','CC'...)
To do that, first read the complete file into an array using mapfile:
mapfile -t arr < jobs.txt
Then format the array values into a list suited to the IN operator:
printf -v cols "'%s'," "${arr[@]}"
cols="(${cols%,})"
Display your values:
echo "$cols"
('AA','BB','CC','DD','X','Y','Z','A','ABC','XYZ')
Finally run your SQL query as:
mysql -N -u$DB_USER -p$DB_PASSWD <<EOF
select job_name, date,status from $DB_NAME.$Table_Name
where job_name IN $cols and date=CURDATE();
EOF
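With the loop gone there is only one query, so dropping -N prints the column names exactly once. As a sketch for challenge 1 (assuming the data itself contains no tabs or commas; result.csv is an arbitrary name), the tab-separated batch output can be turned into a CSV file like this:
mysql -u$DB_USER -p$DB_PASSWD <<EOF | sed 's/\t/,/g' > result.csv
select job_name, date,status from $DB_NAME.$Table_Name
where job_name IN $cols and date=CURDATE();
EOF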
To connect to MySQL securely, use login paths (.mylogin.cnf).
As per the MySQL manual:
The best way to specify server connection information is with your .mylogin.cnf file. Not only is this file encrypted, but any logging of the utility execution does not expose the connection information.
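For example, mysql_config_editor (bundled with MySQL 5.6 and later) stores the credentials encrypted in ~/.mylogin.cnf, readable only by your user, so the script never contains them (a sketch; the login-path name "local" is arbitrary):
mysql_config_editor set --login-path=local --host=localhost --user=root --password
# the script then connects without embedding credentials:
mysql --login-path=local -N <<EOF
select job_name, date,status from $DB_NAME.$Table_Name where date=CURDATE();
EOF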

Related

MySQL client issue - access denied when connecting via a Bash script (WSL2 Ubuntu to a localhost Windows MySQL 8.0 instance)

I seem to be having an issue when running a bash script in Windows WSL2 / Ubuntu, but have no issue if I run the mysql command line with the same params.
This is a fairly straightforward script, albeit I am new to bash scripting.
#!/bin/bash
PROPERTY_FILE="MySQLDB_glenn.properties"
SCRIPTS_DIR="/home/glenn/scripts/"
echo $SCRIPTS_DIR$PROPERTY_FILE
WSL_HOST_IP=$(ipconfig.exe | awk '/WSL/ {getline; getline; getline; getline; print substr($14, 1, length($14)-1)}')
ALT_WSL_HOST=$WSL_HOST_IP
function atestfunction()
{
myTest=$(echo "This is a test")
echo $myTest
}
function getProperty()
{
PROP_KEY=$1
PROP_VALUE=$(cat $SCRIPTS_DIR$PROPERTY_FILE | grep $PROP_KEY | cut -d'=' -f2)
echo $PROP_VALUE
}
echo "# Reading properties from $PROPERTY_FILE = " $SCRIPTS_DIR$PROPERTY_FILE
DB_USER=$(getProperty "db.username")
DB_PASS=$(getProperty "db.password")
DB_HOST=$(getProperty "db.hostname")
DB_DEFAULT=$(getProperty "db.defaultdb")
echo $DB_USER
echo $DB_PASS
echo $DB_HOST
echo $DB_DEFAULT
echo $ALT_WSL_HOST
echo "Selecting from DB " $DB_DEFAULT
mysql -u$DB_USER -p$DB_PASS -h$ALT_WSL_HOST $DB_DEFAULT -e "use test1; select * from products;"
I set up two versions of the account user, granting each all privileges: one able to access via localhost and the second via 172.0.0.0/255.0.0.0, to handle the fact that WSL comes up with different addresses on reboot.
The variable $ALT_WSL_HOST's value is 172.25.208.1 (as I debug it while typing this).
In the Bash script debugger (VS Code), or while running the script ./simpleDBselect.sh, I get:
ERROR 1045 (28000): Access denied for user 'wsl_root
'@'172.25.220.8' (using password: YES)
The same mysql command in the same session (I just invoked it in the Terminal tab of VS Code), or after quitting VS Code, works:
mysql -uwsl_root -pxxx-xxxxx -h172.25.208.1 test1 -e "use test1; select * from products;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------+----------+-------------+---------------+---------+-------------+-----------+-------------+
| productID | widgetid | productName | productNumber | color | inhouseCost | listPrice | productSize |
+-----------+----------+-------------+---------------+---------+-------------+-----------+-------------+
| 1 | 101 | Widget1 | s1001 | Yellow | 25.00 | 40.00 | medium |
| 2 | 102 | Widget2 | s1002 | Black | 27.00 | 45.00 | small |
| 3 | 103 | Widget3 | s1003 | Black | 28.00 | 40.00 | medium |
| 4 | 104 | Widget4 | s1004 | Red | 21.00 | 34.00 | small |
| 5 | 105 | Widget5 | s1005 | Green | 15.00 | 26.00 | large |
| 6 | 106 | Widget6 | s1006 | Magenta | 40.00 | 75.00 | large |
| 7 | 107 | Widget7 | s1007 | Orange | 50.00 | 85.00 | medium |
| 8 | 108 | Widget8 | s1008 | Blue | 39.00 | 55.00 | small |
| 9 | 109 | Widget9 | s1009 | Gold | 189.00 | 300.00 | large |
+-----------+----------+-------------+---------------+---------+-------------+-----------+-------------+
Does anyone know what I might be missing? I have read loads of documentation and think I have the DB accounts set up correctly for the variable hosts. Note, I have also tried '%', which is the default wildcard. Should I just install MySQL in the Ubuntu running in WSL as opposed to the Windows install? Also note that I am cognizant that WSL runs its own virtual IP address, hence why I'm using WSL_HOST_IP=$(ipconfig.exe | awk '/WSL/ {getline; getline; getline; getline; print substr($14, 1, length($14)-1)}').
Any help would be appreciated as I'm at my feeble wits' end :)
Best Regards,
Glenn Firester
See the above description for my attempts, but wsl_root has DBA and all grants to the tables in db test1.
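One detail worth checking (an observation from the error output, not a confirmed diagnosis): the closing quote after wsl_root is printed on the next line, which often means the value read from the properties file carries a stray carriage return from CRLF line endings. A hedged variant of getProperty that strips it:
function getProperty()
{
PROP_KEY=$1
# tr -d '\r' drops any carriage return a CRLF-encoded properties file leaves at the end of the value
PROP_VALUE=$(grep "$PROP_KEY" "$SCRIPTS_DIR$PROPERTY_FILE" | cut -d'=' -f2 | tr -d '\r')
echo "$PROP_VALUE"
}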

Freeradius 3.0 doesn't send any data through mysql (radacct table empty)

I have deployed a freeradius server version 3.0 with MySQL in Ubuntu (20.04).
There are the respective versions of the radius and MySQL:
root@server:/etc/freeradius/3.0# mysql -V
mysql Ver 8.0.23-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu))
root@server:/etc/freeradius/3.0# freeradius -v
radiusd: FreeRADIUS Version 3.0.20, for host x86_64-pc-linux-gnu, built on Jan 25 2020 at 06:11:13
FreeRADIUS Version 3.0.20
Copyright (C) 1999-2019 The FreeRADIUS server project and contributors
My situation is: from a mobile phone I need to connect to an access point and authenticate to that WLAN network (802.11i-WPA-802.1x) using a username and password specified in the radius database.
I would like to limit the number of concurrent logins per user (demouser in this case) to 1, because I only have one unique user. I have spent a lot of time searching the forums and documentation, but can't find anything to figure it out.
This is the output of freeradius in debug mode.
I have several doubts about what would be the best way to create the user in the database.
At the moment it is configured in a basic way; that is to say, I have specified the user in the radcheck table and my client (the access point) in the nas table:
mysql> SELECT * FROM nas;
+----+-------------+-----------+-------+-------+----------+--------+-----------+---------------+
| id | nasname | shortname | type | ports | secret | server | community | description |
+----+-------------+-----------+-------+-------+----------+--------+-----------+---------------+
| 1 | 172.20.1.20 | AP_1 | other | NULL | 12345678 | NULL | NULL | Client Radius |
+----+-------------+-----------+-------+-------+----------+--------+-----------+---------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM radcheck; SELECT * FROM radgroupcheck; SELECT * FROM radgroupreply; SELECT * FROM radusergroup;
+----+----------+--------------------+----+----------+
| id | username | attribute | op | value |
+----+----------+--------------------+----+----------+
| 1 | demouser | Cleartext-Password | := | demopass |
+----+----------+--------------------+----+----------+
1 row in set (0.00 sec)
+----+------------+------------------+----+-------+
| id | groupname | attribute | op | value |
+----+------------+------------------+----+-------+
| 1 | demo_group | Simultaneous-Use | := | 1 |
+----+------------+------------------+----+-------+
1 row in set (0.00 sec)
+----+------------+--------------------+----+---------------------+
| id | groupname | attribute | op | value |
+----+------------+--------------------+----+---------------------+
| 1 | demo_group | Service-Type | := | Framed-User |
| 2 | demo_group | Framed-Protocol | := | PPP |
| 3 | demo_group | Framed-Compression | := | Van-Jacobsen-TCP-IP |
+----+------------+--------------------+----+---------------------+
3 rows in set (0.00 sec)
+----+----------+------------+----------+
| id | username | groupname | priority |
+----+----------+------------+----------+
| 1 | demouser | demo_group | 0 |
+----+----------+------------+----------+
1 row in set (0.00 sec)
And the log files show that the authentications are successful:
This is the output of /var/log/freeradius/sqllog.sql
INSERT INTO radpostauth (username, pass, reply, authdate) VALUES ( 'demouser', '', 'Access-Accept', '2021-04-27 09:04:19');
This is the other output of /var/radacct/172.20.1.20/auth-detail....
Tue Apr 27 09:37:46 2021
Packet-Type = Access-Request
User-Name = "demouser"
Service-Type = Framed-User
NAS-IP-Address = 172.20.1.20
NAS-Port = 1
NAS-Port-Id = "1"
State = 0x7eea823f76e09b79e9fd1595f5ebf47a
Called-Station-Id = "EC-E5-55-FF-FB-34:HIRSCHMANN_C"
NAS-Port-Type = Wireless-802.11
WLAN-RF-Band = 2
WLAN-Pairwise-Cipher = 1027076
WLAN-Group-Cipher = 1027076
WLAN-AKM-Suite = 1027073
Calling-Station-Id = "78-B8-D6-32-9A-32"
Connect-Info = "CONNECT 72 Mbps 802.11g/n"
NAS-Identifier = "AP_1"
Framed-MTU = 1500
EAP-Message = 0x020a002e190017030300230000000000000004425534a3fe3939c900e3f3f672628ae179e4a05c48c1964a31dba1
Message-Authenticator = 0xd87613c17b7b62950f6613cbca20e109
Event-Timestamp = "Apr 27 2021 09:37:46 CEST"
Timestamp = 1619509066
And if I perform a select from the radpostauth table I also see that the user has been successfully authenticated.
mysql> select * from radpostauth;
+----+----------+------+---------------+---------------------+
| id | username | pass | reply | authdate |
+----+----------+------+---------------+---------------------+
| 55 | demouser | | Access-Accept | 2021-04-27 09:04:19 |
| 56 | demouser | | Access-Accept | 2021-04-27 09:04:19 |
+----+----------+------+---------------+---------------------+
The problem is that no data is stored in the radacct table: when I perform a select it comes back empty. I also created another table, radaacct2, with the same structure and specified it in the mysql configuration file, but no data is inserted there either.
From what I understand (correct me if I'm wrong), RADIUS stores accounting data in the radacct table. I read this document.
If no data is stored, the following query, from /etc/freeradius/3.0/main/mysql/queries.conf, will not execute correctly:
#######################################################################
# Simultaneous Use Checking Queries
#######################################################################
# simul_count_query - query for the number of current connections
# - If this is not defined, no simultaneous use checking
# - will be performed by this module instance
simul_count_query = "\
SELECT COUNT(*) \
FROM ${acct_table1} \
WHERE username = '%{SQL-User-Name}' \
AND acctstoptime IS NULL"
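For reference, a quick way to confirm whether any accounting rows are arriving at all (a diagnostic sketch assuming the stock schema, where ${acct_table1} is radacct):
mysql> SELECT acctsessionid, username, acctstarttime, acctstoptime
    -> FROM radacct ORDER BY acctstarttime DESC LIMIT 5;
If this always returns an empty set, simul_count_query can never count an open session, so Simultaneous-Use has nothing to work with.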
In short, I need to control simultaneous logins for a specific user, and I think the cause of it not working is that no data is being inserted into the mysql table. I am probably wrong because I am new to this; if that is the case, please point me to information on how to fix this problem, or tell me what the actual error is.
I have been working on this for weeks and it is the last thing I need to configure, but I can't figure it out.
If any more configuration files are needed please let me know and I will attach them.
Help please
PS: as extra information, I have to add that I have followed the configuration steps on this page and it still does not work.
Thanks

CSV output file using command line for wireshark IO graph statistics

I save the IO graph statistics as a CSV file containing the bits per second using the Wireshark GUI. Is there a way to generate this CSV file with command-line tshark? I can generate the statistics on the command line as bytes per second as follows:
tshark -nr test.pcap -q -z io,stat,1,BYTES
How do I generate bits/second and save it to a CSV file?
Any help is appreciated.
I don't know a way to do that using only tshark, but you can easily parse the output from tshark into a CSV file:
tshark -nr tmp.pcap -q -z io,stat,1,BYTES | grep -P "\d+\s+<>\s+\d+\s*\|\s+\d+" | awk -F '[ |]+' '{print $2","($5*8)}'
Explanations
grep -P "\d+\s+<>\s+\d+\s*\|\s+\d+" selects only the raw from the tshark output with the actual data (i.e., second <> second | transmitted bytes).
awk -F '[ |]+' '{print $2","($5*8)}' splits that data into 5 blocks with [ |]+ as the separator and display blocks 2 (the second at which starts the interval) and 5 (the transmitted bytes) with a comma between them.
Another thing that may be good to know: if you change the interval from 1 second to 0.5 seconds, you have to allow . in the grep part by adding \. between two digits \d; otherwise the result will be an empty *.csv file:
grep -P "\d{1,2}\.{1}\d{1,2}\s+<>\s+\d{1,2}\.{1}\d{1,2}\s*\|\s+\d+"
The answers in this thread gave me the keys to solving a similar problem with tshark io stats, and I wanted to share the results and how it works. In my case, the task was to convert multiple columns of tshark io stat records, with potential decimals in the data, to CSV. This answer converts multiple data columns to CSV, adds rudimentary headers, accounts for decimals in fields, and handles variable numbers of spaces.
Complete command string
tshark -r capture.pcapng -q -z io,stat,30,,FRAMES,BYTES,"FRAMES()ip.src == 10.10.10.10","BYTES()ip.src == 10.10.10.10","FRAMES()ip.dst == 10.10.10.10","BYTES()ip.dst == 10.10.10.10" \
| grep -P "\d+\.?\d*\s+<>\s+|Interval +\|" \
| tr -d " " | tr "|" "," | sed -E 's/<>/,/; s/(^,|,$)//g; s/Interval/Start,Stop/g' > somefile.csv
Explanation
The command string has 3 major parts.
tshark creates the report with the data in columns
Extract the desired lines with grep
Use tr and sed to convert the records grep matched into a csv delimited file.
Part 1: tshark creates the report with the data in columns
tshark is run with -z io,stat at a 30 second interval, counting frames and bytes with various filters.
tshark -r capture.pcapng -q -z io,stat,30,,FRAMES,BYTES,"FRAMES()ip.src == 10.10.10.10","BYTES()ip.src == 10.10.10.10","FRAMES()ip.dst == 10.10.10.10","BYTES()ip.dst == 10.10.10.10"
Here is the output when run against my test pcap file:
=================================================================================================
| IO Statistics |
| |
| Duration: 179.179180 secs |
| Interval: 30 secs |
| |
| Col 1: Frames and bytes |
| 2: FRAMES |
| 3: BYTES |
| 4: FRAMES()ip.src == 10.10.10.10 |
| 5: BYTES()ip.src == 10.10.10.10 |
| 6: FRAMES()ip.dst == 10.10.10.10 |
| 7: BYTES()ip.dst == 10.10.10.10 |
|-----------------------------------------------------------------------------------------------|
| |1 |2 |3 |4 |5 |6 |7 |
| Interval | Frames | Bytes | FRAMES | BYTES | FRAMES | BYTES | FRAMES | BYTES |
|-----------------------------------------------------------------------------------------------|
| 0 <> 30 | 107813 | 120111352 | 107813 | 120111352 | 26682 | 15294257 | 80994 | 104808983 |
| 30 <> 60 | 122437 | 124508575 | 122437 | 124508575 | 49331 | 17080888 | 73017 | 107422509 |
| 60 <> 90 | 138999 | 135488315 | 138999 | 135488315 | 54829 | 22130920 | 84029 | 113348686 |
| 90 <> 120 | 158241 | 217781653 | 158241 | 217781653 | 42103 | 15870237 | 115971 | 201901201 |
| 120 <> 150 | 111708 | 131890800 | 111708 | 131890800 | 43709 | 18800647 | 67871 | 113082296 |
| 150 <> Dur | 123736 | 142639416 | 123736 | 142639416 | 50754 | 22053280 | 72786 | 120574520 |
=================================================================================================
Considerations
Looking at this output, we can see several items to consider:
Rows with data have a unique sequence in the Interval column of "space<>space", which we can use for matching.
We want the header line, so we will use the word "Interval" followed by spaces and then a "|" character.
The number of spaces in a column are variable depending on the number of digits per measurement.
The Interval column gives both the start and the end of each interval. Either can be used, so we will keep both and let the user decide.
When using milliseconds there will be decimals in the Interval field
Depending on the statistic requested, there may be decimals in the data columns
The use of "|" as delimiters will require escaping in any regex statement that covers them.
Part 2: Extract the desired lines with grep
Once tshark produces output, we use grep with regex to extract the lines we want to save.
grep -P "\d+\.?\d*\s+<>\s+|Interval +\|""
grep will use the "Digit(s)Space(s)<>Space(s)" character sequence in the Interval column to match the lines with data. It also uses an OR to grab the header by matching the characters "Interval |".
grep -P # The "-P" flag turns on PCRE regex matching, which is not the same as egrep. With egrep, you will need to change the escaping.
"\d+ # Match on 1 or more Digits. This is the 1st set of numbers in the Interval column.
\.? # 0 or 1 Periods. We need this to handle possible fractional seconds.
\d* # 0 or more Digits. To handle possible fractional seconds.
\s+<>\s+ # 1 or more Spaces followed by the Characters "<>", then 1 or more Spaces.
| # Since this is not escaped, it is a regex OR
Interval\s+\|" # Match the String "Interval" followed by 1 or more Spaces and a literal "|".
From the tshark output, grep matched these lines:
| Interval | Frames | Bytes | FRAMES | BYTES | FRAMES | BYTES | FRAMES | BYTES |
| 0 <> 30 | 107813 | 120111352 | 107813 | 120111352 | 26682 | 15294257 | 80994 | 104808983 |
| 30 <> 60 | 122437 | 124508575 | 122437 | 124508575 | 49331 | 17080888 | 73017 | 107422509 |
| 60 <> 90 | 138999 | 135488315 | 138999 | 135488315 | 54829 | 22130920 | 84029 | 113348686 |
| 90 <> 120 | 158241 | 217781653 | 158241 | 217781653 | 42103 | 15870237 | 115971 | 201901201 |
| 120 <> 150 | 111708 | 131890800 | 111708 | 131890800 | 43709 | 18800647 | 67871 | 113082296 |
| 150 <> Dur | 123736 | 142639416 | 123736 | 142639416 | 50754 | 22053280 | 72786 | 120574520 |
Part 3: Use tr and sed to convert the records grep matched into a csv delimited file.
tr and sed are used for converting the lines grep matched into csv. tr does the bulk of the work, removing spaces and changing the "|" characters to ",". This is simpler and faster than using sed. However, sed is used for some cleanup work.
tr -d " " | tr "|" "," | sed -E 's/<>/,/; s/(^,|,$)//g; s/Interval/Start,Stop/g'
Here is how these commands perform the conversion. The first trick is to get rid of all of the spaces. This means we don't have to account for them in any regex sequences, making the rest of the work simpler.
| tr -d " " # Spaces are in the way, so delete them.
| tr "|" "," # Change all "|" Characters to ",".
| sed -E 's/<>/,/; # Change "<>" to "," splitting the Interval column.
s/(^,|,$)//g; # Delete leading and/or trailing "," on each line.
s/Interval/Start,Stop/g' # Each of the "Interval" columns needs a header, so change the text "Interval" into two words with a , separating them.
> somefile.csv # Pipe the output into somefile.csv
Final result
Once through this process, we have a csv output that can now be imported into your favorite csv tool, spreadsheet, or fed to a graphing program like gnuplot.
$ cat somefile.csv
Start,Stop,Frames,Bytes,FRAMES,BYTES,FRAMES,BYTES,FRAMES,BYTES
0,30,107813,120111352,107813,120111352,26682,15294257,80994,104808983
30,60,122437,124508575,122437,124508575,49331,17080888,73017,107422509
60,90,138999,135488315,138999,135488315,54829,22130920,84029,113348686
90,120,158241,217781653,158241,217781653,42103,15870237,115971,201901201
120,150,111708,131890800,111708,131890800,43709,18800647,67871,113082296
150,Dur,123736,142639416,123736,142639416,50754,22053280,72786,120574520

Sqoop incremental import and update does not work

How can I update the data in an HDFS file so it stays in sync with a MySQL table?
I checked the internet, but all the examples given use --incremental lastmodified.
In my case, my MySQL table does not contain a date or timestamp column.
How can I update the data in the HDFS file to match a MySQL table that has no date column?
I have MySQL table as below
mysql> select * from employee;
+----+--------+--------+------+-------+-----------+
| id | name   | gender | age  | state | language  |
+----+--------+--------+------+-------+-----------+
|  1 | user1  | m      |   25 | tn    | tamil     |
|  2 | user2  | m      |   41 | ka    | tamil     |
|  3 | user3  | f      |   47 | kl    | tamil     |
|  4 | user4  | f      |   52 | ap    | telugu    |
|  5 | user5  | m      |   55 | ap    | telugu    |
|  6 | user6  | f      |   43 | tn    | tamil     |
|  7 | user7  | m      |   34 | tn    | malayalam |
|  8 | user8  | f      |   33 | ap    | telugu    |
|  9 | user9  | m      |   36 | ap    | telugu    |
+----+--------+--------+------+-------+-----------+
I imported to HDFS using the below command.
[cloudera@localhost ~]$ sqoop import --connect jdbc:mysql://localhost:3306/mydatabase --username root --table employee --as-textfile --target-dir hdfs://localhost.localdomain:8020/user/cloudera/data/employee
The data is imported as expected.
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/data/employee/
Found 6 items
-rw-r--r-- 3 cloudera cloudera 0 2017-08-16 23:57 /user/cloudera/data/employee/_SUCCESS
drwxr-xr-x - cloudera cloudera 0 2017-08-16 23:56 /user/cloudera/data/employee/_logs
-rw-r--r-- 3 cloudera cloudera 112 2017-08-16 23:56 /user/cloudera/data/employee/part-m-00000
-rw-r--r-- 3 cloudera cloudera 118 2017-08-16 23:56 /user/cloudera/data/employee/part-m-00001
-rw-r--r-- 3 cloudera cloudera 132 2017-08-16 23:56 /user/cloudera/data/employee/part-m-00002
-rw-r--r-- 3 cloudera cloudera 136 2017-08-16 23:56 /user/cloudera/data/employee/part-m-00003
Now I updated some values and inserted a new row in the mysql table. But this table does not contain a date column.
mysql> update employee set language = 'marathi' where id >= 8;
mysql> insert into employee (name,gender,age,state,language) values('user11','f','25','kl','malayalam');
I know the newly inserted values can be appended to hdfs using --check-column, --incremental append and --last-value.
But how can I update the values in hdfs for rows 8 and 9 of the mysql table, which were updated to 'marathi'? Again, my employee table does not contain a date or timestamp column.
For newly inserted rows, you can always use:
--incremental append --check-column id --last-value 9
But for getting updates from a table that has no updated_at column, I don't think that's possible. If your table is very small, then probably just do a full dump every time.
Or, if you can somehow keep track of which ids got updated since the last import - say you know ids 7, 3, 4 and 8 were updated - you can use the minimum of the updated ids as --last-value. So your config will be:
--incremental append --check-column id --last-value 3 --merge-key id
where --merge-key id will tell sqoop to merge the new incremental data with old based on id column.
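Assembled into a single command (a sketch that reuses the connection string and target directory from the question, combined with the flags above; treat it as illustrative rather than tested):
sqoop import --connect jdbc:mysql://localhost:3306/mydatabase --username root --table employee \
  --incremental append --check-column id --last-value 3 --merge-key id \
  --target-dir hdfs://localhost.localdomain:8020/user/cloudera/data/employee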

Loading CSV with NULLs columns using bq load

I am trying to upload a CSV file (TSV, actually), generated in MySQL using OUTFILE, into BigQuery using the bq tool. This table has the following schema:
Here is the sample data file:
"6.02" "0000" "101" \N "Md Fiesta Chicken|1|6.69|M|300212|100|100^M Sourdough|1|0|M|51301|112|112" "6.5" \N "V03" "24270310376" "10/17/2014 3:34 PM" "6.02" "30103" "452" "302998" "2014-12-08 10:57:15" \N
And this is how I try to upload it using bq CLI tool:
$ bq load -F '\t' --quote '"' --allow_jagged_rows receipt_archive.receipts /tmp/rec.csv
BigQuery error in load operation: Error processing job
'circular-gist-812:bqjob_r8d0bbc3192b065_0000014ab097c63c_1': Too many errors encountered. Limit is: 0.
Failure details:
- File: 0 / Line:1 / Field:16: Could not parse '\N' as a timestamp.
Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]]
I think the issue is that the updated_at column is NULL and hence skipped. Any idea how I can tell it to accept null/empty columns?
CuriousMind - this isn't an answer, just an example of the problem of using floats instead of decimals...
CREATE TABLE fd (f FLOAT(5,2),d DECIMAL(5,2));
INSERT INTO fd VALUES (100.30,100.30),(100.70,100.70);
SELECT * FROM fd;
+--------+--------+
| f | d |
+--------+--------+
| 100.30 | 100.30 |
| 100.70 | 100.70 |
+--------+--------+
SELECT f/3+f/3+f/3,d/3+d/3+d/3 FROM fd;
+-------------+-------------+
| f/3+f/3+f/3 | d/3+d/3+d/3 |
+-------------+-------------+
| 100.300003 | 100.300000 |
| 100.699997 | 100.700000 |
+-------------+-------------+
SELECT (f/3)*3,(d/3)*3 FROM fd;
+------------+------------+
| (f/3)*3 | (d/3)*3 |
+------------+------------+
| 100.300003 | 100.300000 |
| 100.699997 | 100.700000 |
+------------+------------+
But why is this a problem, I hear you ask?
Well, consider the following...
SELECT * FROM fd WHERE f <= 100.699997;
+--------+--------+
| f | d |
+--------+--------+
| 100.30 | 100.30 |
| 100.70 | 100.70 |
+--------+--------+
...now surely that's not what would be expected when dealing with money?
To specify "null" in a CSV file, elide all data for the field. (It looks like you are using an unspecified escape syntax "\N".)
For example:
$ echo 2, > rows.csv
$ bq load tmp.test rows.csv a:integer,b:integer
$ bq head tmp.test
+---+------+
| a | b |
+---+------+
| 2 | NULL |
+---+------+
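If you would rather keep the real timestamps and only blank out the \N markers before loading, a preprocessing sketch (assuming \N only ever appears as a complete tab-delimited field; rec_clean.csv is an arbitrary name):
awk -F'\t' 'BEGIN{OFS=FS} {for(i=1;i<=NF;i++) if($i=="\\N") $i=""} 1' /tmp/rec.csv > /tmp/rec_clean.csv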