How can I update the data in an HDFS file to match the data in a MySQL table? I checked the internet, but all the examples use --incremental lastmodified, whereas my MySQL table does not contain a date or timestamp column. How can I pick up updates in HDFS for a MySQL table that has no date column?
My MySQL table looks like this:
mysql> select * from employee;
+----+--------+--------+------+-------+-----------+
| id | name   | gender | age  | state | language  |
+----+--------+--------+------+-------+-----------+
|  1 | user1  | m      |   25 | tn    | tamil     |
|  2 | user2  | m      |   41 | ka    | tamil     |
|  3 | user3  | f      |   47 | kl    | tamil     |
|  4 | user4  | f      |   52 | ap    | telugu    |
|  5 | user5  | m      |   55 | ap    | telugu    |
|  6 | user6  | f      |   43 | tn    | tamil     |
|  7 | user7  | m      |   34 | tn    | malayalam |
|  8 | user8  | f      |   33 | ap    | telugu    |
|  9 | user9  | m      |   36 | ap    | telugu    |
+----+--------+--------+------+-------+-----------+
I imported it to HDFS using the command below.
[cloudera@localhost ~]$ sqoop import --connect jdbc:mysql://localhost:3306/mydatabase --username root --table employee --as-textfile --target-dir hdfs://localhost.localdomain:8020/user/cloudera/data/employee
The data is imported as expected.
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/data/employee/
Found 6 items
-rw-r--r-- 3 cloudera cloudera 0 2017-08-16 23:57 /user/cloudera/data/employee/_SUCCESS
drwxr-xr-x - cloudera cloudera 0 2017-08-16 23:56 /user/cloudera/data/employee/_logs
-rw-r--r-- 3 cloudera cloudera 112 2017-08-16 23:56 /user/cloudera/data/employee/part-m-00000
-rw-r--r-- 3 cloudera cloudera 118 2017-08-16 23:56 /user/cloudera/data/employee/part-m-00001
-rw-r--r-- 3 cloudera cloudera 132 2017-08-16 23:56 /user/cloudera/data/employee/part-m-00002
-rw-r--r-- 3 cloudera cloudera 136 2017-08-16 23:56 /user/cloudera/data/employee/part-m-00003
Now I have updated some rows and inserted a new row in the MySQL table. Again, this table does not contain a date column.
mysql> update employee set language = 'marathi' where id >= 8;
mysql> insert into employee (name,gender,age,state,language) values ('user11','f',25,'kl','malayalam');
I know that newly inserted rows can be appended to HDFS using --check-column, --incremental append and --last-value.
But how can I update the values in HDFS for rows 8 and 9 of the MySQL table, which were updated to 'marathi'? Again, my employee table does not contain a date or timestamp column.
For newly inserted row, you can always use:
--incremental append --check-column id --last-value 9
But for getting updates from a table that has no updated_at column, I don't think that's possible. If your table is very small, probably just do a full dump every time.
Or, if you can somehow keep track of which ids were updated since the last import - say you know ids 7, 3, 4 and 8 changed since the last import - then you can take the minimum of the updated ids and use it as --last-value. So your config will be:
--incremental append --check-column id --last-value 3 --merge-key id
where --merge-key id tells Sqoop to merge the new incremental data with the old data based on the id column.
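Putting these flags together with the import command from the question, the whole invocation would look roughly like this. This is a sketch, not a tested command; note also that the Sqoop documentation describes --merge-key in the context of --incremental lastmodified, so you may need that mode instead of append for the merge step to run:

```
sqoop import \
  --connect jdbc:mysql://localhost:3306/mydatabase \
  --username root \
  --table employee \
  --as-textfile \
  --target-dir hdfs://localhost.localdomain:8020/user/cloudera/data/employee \
  --incremental append \
  --check-column id \
  --last-value 3 \
  --merge-key id
```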
I have deployed a FreeRADIUS server version 3.0 with MySQL on Ubuntu 20.04.
These are the respective versions of MySQL and FreeRADIUS:
root@server:/etc/freeradius/3.0# mysql -V
mysql Ver 8.0.23-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu))
root@server:/etc/freeradius/3.0# freeradius -v
radiusd: FreeRADIUS Version 3.0.20, for host x86_64-pc-linux-gnu, built on Jan 25 2020 at 06:11:13
FreeRADIUS Version 3.0.20
Copyright (C) 1999-2019 The FreeRADIUS server project and contributors
My situation: from a mobile phone I need to connect to an access point and authenticate to that WLAN network (802.11i / WPA / 802.1X) using a username and password stored in the RADIUS database.
I would like to limit the number of concurrent logins for a user (demouser in this case) to 1, because I only have one user. I have spent a lot of time searching the forums and documentation, but can't find anything that solves it.
This is the output of freeradius in debug mode.
I have several doubts about the best way to create the user in the database.
At the moment it is configured in a basic way: the user is in the radcheck table, and my client (the access point) is in the nas table:
mysql> SELECT * FROM nas;
+----+-------------+-----------+-------+-------+----------+--------+-----------+---------------+
| id | nasname | shortname | type | ports | secret | server | community | description |
+----+-------------+-----------+-------+-------+----------+--------+-----------+---------------+
| 1 | 172.20.1.20 | AP_1 | other | NULL | 12345678 | NULL | NULL | Client Radius |
+----+-------------+-----------+-------+-------+----------+--------+-----------+---------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM radcheck; SELECT * FROM radgroupcheck; SELECT * FROM radgroupreply; SELECT * FROM radusergroup;
+----+----------+--------------------+----+----------+
| id | username | attribute | op | value |
+----+----------+--------------------+----+----------+
| 1 | demouser | Cleartext-Password | := | demopass |
+----+----------+--------------------+----+----------+
1 row in set (0.00 sec)
+----+------------+------------------+----+-------+
| id | groupname | attribute | op | value |
+----+------------+------------------+----+-------+
| 1 | demo_group | Simultaneous-Use | := | 1 |
+----+------------+------------------+----+-------+
1 row in set (0.00 sec)
+----+------------+--------------------+----+---------------------+
| id | groupname | attribute | op | value |
+----+------------+--------------------+----+---------------------+
| 1 | demo_group | Service-Type | := | Framed-User |
| 2 | demo_group | Framed-Protocol | := | PPP |
| 3 | demo_group | Framed-Compression | := | Van-Jacobsen-TCP-IP |
+----+------------+--------------------+----+---------------------+
3 rows in set (0.00 sec)
+----+----------+------------+----------+
| id | username | groupname | priority |
+----+----------+------------+----------+
| 1 | demouser | demo_group | 0 |
+----+----------+------------+----------+
1 row in set (0.00 sec)
And in the log files I can see that authentication succeeds:
This is the output of /var/log/freeradius/sqllog.sql
INSERT INTO radpostauth (username, pass, reply, authdate) VALUES ( 'demouser', '', 'Access-Accept', '2021-04-27 09:04:19');
This is the other output of /var/radacct/172.20.1.20/auth-detail....
Tue Apr 27 09:37:46 2021
Packet-Type = Access-Request
User-Name = "demouser"
Service-Type = Framed-User
NAS-IP-Address = 172.20.1.20
NAS-Port = 1
NAS-Port-Id = "1"
State = 0x7eea823f76e09b79e9fd1595f5ebf47a
Called-Station-Id = "EC-E5-55-FF-FB-34:HIRSCHMANN_C"
NAS-Port-Type = Wireless-802.11
WLAN-RF-Band = 2
WLAN-Pairwise-Cipher = 1027076
WLAN-Group-Cipher = 1027076
WLAN-AKM-Suite = 1027073
Calling-Station-Id = "78-B8-D6-32-9A-32"
Connect-Info = "CONNECT 72 Mbps 802.11g/n"
NAS-Identifier = "AP_1"
Framed-MTU = 1500
EAP-Message = 0x020a002e190017030300230000000000000004425534a3fe3939c900e3f3f672628ae179e4a05c48c1964a31dba1
Message-Authenticator = 0xd87613c17b7b62950f6613cbca20e109
Event-Timestamp = "Apr 27 2021 09:37:46 CEST"
Timestamp = 1619509066
And if I perform a select from the radpostauth table I also see that the user has been successfully authenticated.
mysql> select * from radpostauth;
+----+----------+------+---------------+---------------------+
| id | username | pass | reply | authdate |
+----+----------+------+---------------+---------------------+
| 55 | demouser | | Access-Accept | 2021-04-27 09:04:19 |
| 56 | demouser | | Access-Accept | 2021-04-27 09:04:19 |
+----+----------+------+---------------+---------------------+
The problem is that no data is stored in the radacct table; when I run a select it comes back empty. I have also created another table, radacct2, with the same structure and specified it in the MySQL configuration file, but no data is inserted there either.
From what I understand (correct me if I'm wrong), RADIUS stores accounting data in the radacct table. I read this document.
If no data is stored there, the following query from /etc/freeradius/3.0/main/mysql/queries.conf cannot return correct results:
#######################################################################
# Simultaneous Use Checking Queries
#######################################################################
# simul_count_query - query for the number of current connections
# - If this is not defined, no simultaneous use checking
# - will be performed by this module instance
simul_count_query = "\
SELECT COUNT(*) \
FROM ${acct_table1} \
WHERE username = '%{SQL-User-Name}' \
AND acctstoptime IS NULL"
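For simul_count_query to count anything, FreeRADIUS must be writing accounting rows into radacct in the first place, and the Simultaneous-Use check only runs if the sql module is called in the session section. As a sketch (the file name below is the stock 3.0 layout; adjust to whichever site you have enabled), the relevant sections of the site configuration would look like:

```
# /etc/freeradius/3.0/sites-enabled/default (sketch)
accounting {
    ...
    sql    # write Accounting-Request packets to radacct
}

session {
    sql    # check Simultaneous-Use against open radacct sessions
}
```

Also check that the access point itself has RADIUS accounting enabled and pointed at the server: if the NAS never sends Accounting-Request packets, radacct stays empty no matter how the server is configured.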
In short, I need to limit concurrent logins for a specific user, and I think the reason it is not working is that no accounting data is being inserted into the MySQL table. I am probably wrong because I am new to this; if so, please point me to information on how to fix the problem, or tell me what the actual error is.
I have been working on this for weeks and it is the last thing I need to configure, but I can't figure it out. Help, please.
If any more configuration files are needed, please let me know and I will attach them.
PS: as extra information, I have followed the configuration steps on this page and it still does not work.
Thanks
Here is my shell script. It works without any errors, but I want the output in a different form.
Script:
#!/bin/bash
DB_USER='root'
DB_PASSWD='123456'
DB_NAME='job'
Table_Name='status_table'
#sql=select job_name, date,status from $DB_NAME.$Table_Name where job_name='$f1' and date=CURDATE()
file="/root/jobs.txt"
while read -r f1
do
mysql -N -u$DB_USER -p$DB_PASSWD <<EOF
select job_name, date,status from $DB_NAME.$Table_Name where job_name='$f1' and date=CURDATE()
EOF
done <"$file"
Source Table:
mysql> select * from job.status_table;
+--------+----------+------------+-----------+
| Job_id | Job Name | date       | status    |
+--------+----------+------------+-----------+
| 111    | AA       | 2016-12-01 | completed |
| 112    | BB       | 2016-12-01 | completed |
| 113    | CC       | 2016-12-02 | completed |
| 112    | BB       | 2016-12-01 | completed |
| 114    | DD       | 2016-12-02 | completed |
| 201    | X        | 2016-12-03 | completed |
| 202    | y        | 2016-12-04 | completed |
| 203    | z        | 2016-12-03 | completed |
| 111    | A        | 2016-12-04 | completed |
+--------+----------+------------+-----------+
Input text file
[rteja@server0 ~]$ more jobs.txt
AA
BB
CC
DD
X
Y
Z
A
ABC
XYZ
Output - suppressed column names
(mysql -N -u$DB_USER -p$DB_PASSWD <<EOF)
[rteja@server0 ~]$ ./script.sh
AA 2016-12-01 completed
BB 2016-12-01 completed
Output - without suppressed column names; the column names are printed on every loop iteration.
(mysql -u$DB_USER -p$DB_PASSWD <<EOF)
[rteja@server0 ~]$ ./script.sh
job_name date status
AA 2016-12-01 completed
job_name date status
BB 2016-12-01 completed
Challenges:
1. I want to print the column names only once in the output, and store the result in a CSV file.
2. I don't want to expose the username and password in the code. Is there a way to hide them? I have heard that we can put them in environment variables and reference those in the script, then set permissions so that only we can read the file that defines them.
Rather than executing the select query multiple times, you can run a single query with:
job_name in ('AA','BB','CC',...)
To do that, first read the complete file into an array using mapfile:
mapfile -t arr < jobs.txt
Then format the array values into a list of values suited for IN operator:
printf -v cols "'%s'," "${arr[@]}"
cols="(${cols%,})"
Display your values:
echo "$cols"
('AA','BB','CC','DD','X','Y','Z','A','ABC','XYZ')
Finally run your SQL query as:
mysql -N -u$DB_USER -p$DB_PASSWD <<EOF
select job_name, date, status from $DB_NAME.$Table_Name
where job_name IN $cols and date=CURDATE();
EOF
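Putting the pieces together, here is a runnable sketch of the list-building part with a sample jobs.txt; the mysql call is left commented, using the same credentials and variables as the question's script:

```shell
#!/bin/bash
# Build an SQL IN (...) list from jobs.txt, then run ONE query instead of a loop.
printf '%s\n' AA BB CC DD > jobs.txt      # sample input for illustration

mapfile -t arr < jobs.txt                 # read the whole file into an array
printf -v cols "'%s'," "${arr[@]}"        # quote each value, comma-separate
cols="(${cols%,})"                        # drop trailing comma, add parentheses

echo "$cols"                              # ('AA','BB','CC','DD')

# Single query: column names print once; tabs become commas for a CSV file.
# mysql -u"$DB_USER" -p"$DB_PASSWD" -e \
#   "select job_name, date, status from $DB_NAME.$Table_Name
#    where job_name IN $cols and date=CURDATE();" | tr '\t' ',' > result.csv
```

Since the whole file becomes one query, the header row appears only once, which solves challenge 1.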
To connect to MySQL securely, use login paths (.mylogin.cnf).
As per the MySQL manual:
The best way to specify server connection information is with your .mylogin.cnf file. Not only is this file encrypted, but any logging of the utility execution does not expose the connection information.
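For challenge 2, the login path is created once from the shell with mysql_config_editor (it prompts for the password interactively and stores it encrypted in ~/.mylogin.cnf), and the script then never mentions credentials at all. The login-path name jobstatus below is an arbitrary example:

```
mysql_config_editor set --login-path=jobstatus --host=localhost --user=root --password
mysql --login-path=jobstatus -N -e "select 1;"
```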
This is the situation: in my school, each of the 17 classes prepares an Excel sheet with the marks for every subject in the term-end test. I combine them into an Access table, export all the data back to Excel, make a CSV file, and import it into a MySQL database using phpMyAdmin. Now I have a result table as follows.
| ID | Name  | Religion | Sinhala | science | english | maths | History | Categery 1 | Categery 2 | Categery 3 | Total | Average | Rank |
|----|-------|----------|---------|---------|---------|-------|---------|------------|------------|------------|-------|---------|------|
| 1  | manoj | 45       | 65      | 78      | 98      | 67    | 67      | 63         | 76         | 64         | 654   | 62      | 12   |
The sectional head needs the number of students who got >= 75 in all subjects, and the number of students who got >= 75 in 8 subjects out of 9.
I need to retrieve the number of A's and B's (an A being marks >= 75) from this table.
For example, student names and number of A's:
Total number of A's across all 9 subjects - 45
Total number of A's across 8 subjects (any 8 subjects) - 45
Total number of A's across 7 subjects (any 7 subjects) - 45
I tried the following SQL statement:
SELECT COUNT(SELECT COUNT()
FROM result
WHERE religion >=75
AND Math >=75)
FROM result
I read about the same scenario on Stack Overflow (Access 2010); it got me part of the way, but I can't solve it for my case.
Use GROUP BY studentName and SUM(grade = 'A') AS numberOfAs.
[Quick answer because the question is quickly formatted.]
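That GROUP BY approach assumes a tall table with one row per student and subject. With the wide table shown in the question, the same count can be computed by summing comparisons, since MySQL evaluates a comparison to 0 or 1. The column names below are copied from the question's table and should be adjusted if the real schema differs:

```sql
-- Number of A's (marks >= 75) per student across the 9 subject columns
SELECT Name,
       (Religion >= 75) + (Sinhala >= 75) + (science >= 75) +
       (english >= 75)  + (maths >= 75)   + (History >= 75) +
       (`Categery 1` >= 75) + (`Categery 2` >= 75) + (`Categery 3` >= 75)
         AS num_As
FROM result;

-- Students with an A in all 9 subjects; change "= 9" to ">= 8" for any 8 of 9
SELECT COUNT(*)
FROM (SELECT (Religion >= 75) + (Sinhala >= 75) + (science >= 75) +
             (english >= 75)  + (maths >= 75)   + (History >= 75) +
             (`Categery 1` >= 75) + (`Categery 2` >= 75) + (`Categery 3` >= 75)
               AS num_As
      FROM result) AS t
WHERE num_As = 9;
```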
This question already has answers here:
SQL select only rows with max value on a column [duplicate]
(27 answers)
Closed 6 years ago.
I have a MySQL database that contains the table "message_route". This table tracks the path between hubs that a message from a device takes before it finds a modem and goes out to the internet.
"message_route" contains the following columns:
id, summary_id, device_id, hub_address, hop_count, event_time
Each row in the table represents a single "hop" between two hubs. The column "device_id" gives the id of the device the message originated from. The column "hub_address" gives the id of the hub the message hop was received by, and "hop_count" counts these hops incrementally. The full route of the message is bound together by the "summary_id" key. A snippet of the table to illustrate:
+-----+------------+-----------+-------------+-----------+---------------------+
| id | summary_id | device_id | hub_address | hop_count | event_time |
+-----+------------+-----------+-------------+-----------+---------------------+
| 180 | 158 | 1099 | 31527 | 1 | 2011-10-01 04:50:53 |
| 181 | 159 | 1676 | 51778 | 1 | 2011-10-01 00:12:04 |
| 182 | 159 | 1676 | 43567 | 2 | 2011-10-01 00:12:04 |
| 183 | 159 | 1676 | 33805 | 3 | 2011-10-01 00:12:04 |
| 184 | 160 | 2326 | 37575 | 1 | 2011-10-01 00:12:07 |
| 185 | 160 | 2326 | 48024 | 2 | 2011-10-01 00:12:07 |
| 186 | 160 | 2326 | 57652 | 3 | 2011-10-01 00:12:07 |
+-----+------------+-----------+-------------+-----------+---------------------+
There are three total messages here. The message with summary_id = 158 touched only one hub before finding a modem, so row with id = 180 is the entire record of that message. Summary_ids 159 and 160 each have 3 hops, each touching 3 different hubs. There is no upward limit of the number of hops a message can have.
I need to create a MySQL query that gives me a list of the unique "hub_address" values that constitute the last hop of a message. In other words, the hub_address associated with the maximum hop_count for each summary_id. With the database snippet above, the output should be "31527, 33805, 57652".
I have been unable to figure this out. In the meantime, I am using this code as a proxy, which only gives me the unique hub_address values for messages with a single hop, such as summary_id = 158.
SELECT DISTINCT(x.hub_address)
FROM (SELECT hub_address, COUNT(summary_id) AS freq
FROM message_route GROUP BY summary_id) AS x
WHERE x.freq = 1;
I would approach this as:
select distinct mr.hub_address
from message_route mr
where mr.hop_count = (select max(mr2.hop_count)
                      from message_route mr2
                      where mr2.summary_id = mr.summary_id
                     );
Note that the correlation is on hop_count rather than event_time: in your sample data every hop of a message carries the same event_time, so the timestamp cannot single out the last hop.
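The correlated-subquery approach can be checked end to end against the sample rows from the question; here is a sketch in Python over SQLite (the SQL itself is portable):

```python
import sqlite3

# Rebuild the sample message_route table and run the max-hop_count query.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE message_route
                (id INT, summary_id INT, device_id INT,
                 hub_address INT, hop_count INT, event_time TEXT)""")
conn.executemany("INSERT INTO message_route VALUES (?,?,?,?,?,?)", [
    (180, 158, 1099, 31527, 1, '2011-10-01 04:50:53'),
    (181, 159, 1676, 51778, 1, '2011-10-01 00:12:04'),
    (182, 159, 1676, 43567, 2, '2011-10-01 00:12:04'),
    (183, 159, 1676, 33805, 3, '2011-10-01 00:12:04'),
    (184, 160, 2326, 37575, 1, '2011-10-01 00:12:07'),
    (185, 160, 2326, 48024, 2, '2011-10-01 00:12:07'),
    (186, 160, 2326, 57652, 3, '2011-10-01 00:12:07'),
])

# Last hop of each message: the row holding the per-summary max hop_count.
last_hops = conn.execute("""
    SELECT DISTINCT mr.hub_address
    FROM message_route mr
    WHERE mr.hop_count = (SELECT MAX(mr2.hop_count)
                          FROM message_route mr2
                          WHERE mr2.summary_id = mr.summary_id)""").fetchall()
print(sorted(h for (h,) in last_hops))  # [31527, 33805, 57652]
```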
Due to a bug in my JavaScript click handling, multiple Location objects are posted in a JSON array sent to the server. I think I know how to fix that bug, but I'd also like to implement a server-side function that erases duplicates from the database. However, I'm not sure how to write this query.
The only affected table is laid out as
+----+------------+--------+
| ID | locationID | linkID |
+----+------------+--------+
| 64 | 13 | 14 |
| 65 | 14 | 13 |
| 66 | 14 | 15 |
| 67 | 15 | 14 |
| 68 | 15 | 16 |
| 69 | 16 | 17 |
| 70 | 16 | 14 |
| 71 | 17 | 16 |
| 72 | 17 | 16 |
| 73 | 17 | 16 |
| 74 | 17 | 16 |
| 75 | 17 | 16 |
| 76 | 17 | 16 |
| 77 | 17 | 16 |
+----+------------+--------+
As you can see, I have multiple pairs of (17, 16), while 14 has two pairs of (14, 13) and (14, 15). How can I delete all but one record of any duplicate entries?
Don't implement post-factum correction logic; put a unique index on the fields that need to be unique, so the database stops duplicate inserts before it's too late.
If you're using MySQL 5.1 up to 5.6, you can remove dupes and create a unique index in one command (note that ALTER IGNORE was removed in MySQL 5.7):
ALTER IGNORE TABLE `YOURTABLE`
ADD UNIQUE INDEX somefancynamefortheindex (locationID, linkID);
You can create a temporary table to store the distinct records, then empty the original table and re-insert the data from the temp table:
CREATE TEMPORARY TABLE temp_table (locationId INT, linkId INT);
INSERT INTO temp_table (locationId, linkId) SELECT DISTINCT locationId, linkId FROM table1;
DELETE FROM table1;
INSERT INTO table1 (locationId, linkId) SELECT * FROM temp_table;
DELETE FROM tbl
USING tbl, tbl AS t2
WHERE tbl.locationID = t2.locationID
  AND tbl.linkID = t2.linkID
  AND tbl.ID > t2.ID;
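The multi-table DELETE keeps, for each (locationID, linkID) pair, the row with the lowest ID. SQLite has no multi-table DELETE, so this runnable sketch expresses the same keep-the-minimum rule with NOT IN, using the sample data from the question:

```python
import sqlite3

# Rebuild the sample table, then delete every row that is not the
# lowest-ID representative of its (locationID, linkID) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ID INT, locationID INT, linkID INT)")
conn.executemany("INSERT INTO tbl VALUES (?,?,?)", [
    (64, 13, 14), (65, 14, 13), (66, 14, 15), (67, 15, 14), (68, 15, 16),
    (69, 16, 17), (70, 16, 14), (71, 17, 16), (72, 17, 16), (73, 17, 16),
    (74, 17, 16), (75, 17, 16), (76, 17, 16), (77, 17, 16),
])
conn.execute("""DELETE FROM tbl
                WHERE ID NOT IN (SELECT MIN(ID) FROM tbl
                                 GROUP BY locationID, linkID)""")
print(conn.execute("SELECT COUNT(*) FROM tbl").fetchone()[0])  # 8 unique pairs left
```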
I assume you don't mean for the clean-up, but for checking new inserts? Put a unique index on the columns if possible; if you don't have control of the DB, do an upsert and check for NULLs instead of a plain insert.