remove duplicate headers in hive-beeline - mysql

Select ACCOUNT_NUMBER, BIN AS RISK_BIN FROM test.daily_call_routing2;"
| python -c 'exec("import sys;import csv;reader = csv.reader(sys.stdin,
dialect=csv.excel_tab);writer = csv.writer(sys.stdout,
dialect=csv.excel)\nfor row in reader: writer.writerow(row)")' >
$EXPORT_DIR//home/rabbid160/test_$DATE_STR.csv;
When I execute the above command in beeline (Hive), I can see the data, but the header is repeated several times in between the rows. How can I get a single header followed by all the data?
Example output:
+-------------------+-----------+--+
| account_number | risk_bin |
+-------------------+-----------+--+
| 8498310230444304 | 2 |
| 8778104140754717 | 2 |
| 8155100513664825 | 2 |
| 8155100513664825 | 2 |
| 8155400040004812 | 2 |
| 8155200521190266 | 2 |
| 8155300210482543 | 2 |
| 8497202241094288 | 2 |
| 8155500010197049 | 2 |
+-------------------+-----------+--+
| account_number | risk_bin |
+-------------------+-----------+--+
| 8155100030718781 | 2 |
| 8495444731138751 | 2 |
| 8498320015120250 | 2 |
| 8498330360083177 | 2 |
| 8155300210487112 | 2 |
| 8777701821146336 | 2 |
| 8497202461586765 | 2 |
| 8155400310837610 | 2 |

In beeline, the number of rows after which the header is repeated is controlled by the beeline variable headerinterval.
You can set it with the beeline command !set headerinterval 100
Set headerinterval to a value larger than the number of rows in the result so that the header is printed only once.
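If the output has already been captured and headerinterval can no longer be changed, the repeated headers can also be filtered out afterwards. The following is a minimal Python sketch, assuming the table-style output shown in the question; dedupe_headers is a hypothetical helper name and not part of beeline. Note that a data row identical to the header would be dropped as well.

```python
def dedupe_headers(lines):
    """Yield the cells of each table row, keeping the header row only once."""
    header = None
    for line in lines:
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +-----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first row seen is treated as the header
            yield cells
        elif cells != header:
            yield cells  # data row; repeated header rows are dropped


# Small sample in the same shape as the output in the question.
sample = """\
+-------------------+-----------+--+
| account_number    | risk_bin  |
+-------------------+-----------+--+
| 8498310230444304  | 2         |
+-------------------+-----------+--+
| account_number    | risk_bin  |
+-------------------+-----------+--+
| 8155100030718781  | 2         |
""".splitlines()

rows = list(dedupe_headers(sample))
```

Each yielded row is a list of strings, so it can be passed straight to csv.writer(...).writerow(row), as in the original pipeline.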

Related

Update one column based on matching values of two columns from the same table

Basically I have 3 columns, like this:
+-------------+-------------+--------+
| startpoint | endpoint | number |
+-------------+-------------+--------+
| 15037232632 | 4226861620 | null |
| 4226862003 | 4226862079 | null |
| 4226862079 | 4226862111 | null |
| 4226862111 | 4226862121 | 2 |
| 4226862121 | ---------- | 1 |
| 15025374738 | 4226862003 | null |
| 4226861620 | 15025374738 | null |
| 4226801794 | 15037232632 | null |
+-------------+-------------+--------+
What I am trying to do is:
Step 1 : I assign a number '1' to any one of the IDs from the 'startpoint' column
Step 2 : Match the 'startpoint' ID to which I assigned the number in the previous step with the IDs in the 'endpoint' column
Step 3 : After the 'startpoint' ID matches with the 'endpoint' ID, I assign the number 2 in the 'number' column on the ROW where the endpoint matched
Step 4: On the row where number was assigned, I take the 'startpoint' ID and then repeat the steps 2-4 again.
I have tried playing around with the update query but it doesn't seem right. Any help would be appreciated.
EDIT:
I am also including the expected output. The table without applying any queries is given above
+-------------+-------------+--------+
| startpoint | endpoint | number |
+-------------+-------------+--------+
| 15037232632 | 4226861620 | 7 |
| 4226862003 | 4226862079 | 4 |
| 4226862079 | 4226862111 | 3 |
| 4226862111 | 4226862121 | 2 |
| 4226862121 | ---------- | 1 |
| 15025374738 | 4226862003 | 5 |
| 4226861620 | 15025374738 | 6 |
| 4226801794 | 15037232632 | 8 |
+-------------+-------------+--------+
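The four steps describe walking a chain backwards: start at one startpoint, find the row whose endpoint equals it, number that row, and continue from that row's startpoint. A minimal Python sketch of that traversal follows, assuming every endpoint appears in at most one row and that the "----------" entry means "no endpoint" (represented as None here). number_chain is a hypothetical name used only for illustration.

```python
def number_chain(rows, first_startpoint):
    """Return {startpoint: number}, walking the chain back from first_startpoint."""
    # Map each endpoint to the startpoint of its row (assumes endpoints are unique).
    start_by_endpoint = {end: start for start, end in rows if end is not None}
    numbers = {}
    current, n = first_startpoint, 1
    # Stop when no row ends at the current ID, or when a cycle is detected.
    while current is not None and current not in numbers:
        numbers[current] = n
        current = start_by_endpoint.get(current)
        n += 1
    return numbers


rows = [
    (15037232632, 4226861620),
    (4226862003, 4226862079),
    (4226862079, 4226862111),
    (4226862111, 4226862121),
    (4226862121, None),  # "----------" in the table: no endpoint
    (15025374738, 4226862003),
    (4226861620, 15025374738),
    (4226801794, 15037232632),
]
numbers = number_chain(rows, 4226862121)
```

The resulting dictionary maps each startpoint to the value for its 'number' column, which could then be written back with one UPDATE per row. In MySQL 8.0 or later the same walk can also be expressed as a recursive common table expression.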

I understand that the PIVOT command can transform a dataset. Is this the correct way to do it?

I have a dataset that looks like this:
+----+-------------+
| ID | StoreVisit |
+----+-------------+
| 1 | Home Depot |
| 2 | Lowes |
| 3 | Home Depot |
| 2 | ACE |
| 2 | Lowes |
| 1 | Home Depot |
| 4 | ACE |
| 5 | ACE |
| 4 | Lowes |
+----+-------------+
I'm new(ish) to SQL and I know I can select everything and then use either Excel (pivot table / functions / paste special) or R (tidyr) to transpose. However, with a lot of data this is not efficient. Is the query below correct? If so, how can I cover all values of StoreVisit when there are thousands of store types, without typing each one into the query?
select * from Stores
pivot (COUNT(StoreVisit) for StoreVisit in ([ACE],[Lowes],[Home Depot])) as StoreCounts
Desired output:
+----+-------+-----------+-----+
| ID | Lowes | HomeDepot | ACE |
+----+-------+-----------+-----+
| 1 | 0 | 2 | 0 |
| 2 | 2 | 0 | 1 |
| 3 | 0 | 1 | 0 |
| 4 | 1 | 0 | 1 |
| 5 | 0 | 0 | 1 |
+----+-------+-----------+-----+
Please excuse the formatting of this post! Many apologies.
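Use conditional aggregation. In MySQL a comparison evaluates to 1 or 0, so summing it counts the matching rows. The string literals must match the stored values exactly, including the space in 'Home Depot':
select id,
       sum(storevisit = 'Lowes') as Lowes,
       sum(storevisit = 'Home Depot') as HomeDepot,
       sum(storevisit = 'ACE') as ACE
from Stores
group by id;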
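For the "thousands of store types" part of the question, one common approach is to read the distinct store names first and generate the aggregation columns from them. The sketch below illustrates this in Python with an in-memory SQLite database standing in for the real server (an assumption made so the example is self-contained); the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stores (ID INTEGER, StoreVisit TEXT)")
conn.executemany(
    "INSERT INTO Stores VALUES (?, ?)",
    [(1, "Home Depot"), (2, "Lowes"), (3, "Home Depot"), (2, "ACE"), (2, "Lowes"),
     (1, "Home Depot"), (4, "ACE"), (5, "ACE"), (4, "Lowes")],
)

# Step 1: discover the distinct store names instead of typing them by hand.
stores = [row[0] for row in conn.execute(
    "SELECT DISTINCT StoreVisit FROM Stores ORDER BY StoreVisit")]

# Step 2: build one conditional-aggregation column per store.
# Values are bound as parameters; aliases are quoted as identifiers.
select_list = ["ID"] + [
    'SUM(StoreVisit = ?) AS "{}"'.format(name.replace('"', '""')) for name in stores
]
sql = "SELECT {} FROM Stores GROUP BY ID ORDER BY ID".format(", ".join(select_list))

cur = conn.execute(sql, stores)
header = [col[0] for col in cur.description]
rows = cur.fetchall()
```

With the sample data, header is ['ID', 'ACE', 'Home Depot', 'Lowes'] and each row holds the per-ID visit counts. A very wide result (thousands of columns) may still hit column or parameter limits of the database in use, so this is a starting point rather than a guaranteed solution.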

Update a table from another table through an intermediate table in MySQL

I have the following table setup: two tables related by an intermediate table, like this:
Client
| client_id | ...|field_X |
| 1 | ...|value1 |
| 2 | ...|value2 |
| 3 | ...|value3 |
Project
| project_id | ...|field_X |
| 1 | ...| |
| 2 | ...| |
| 3 | ...| |
| 4 | ...| |
| 5 | ...| |
| 6 | ...| |
| 7 | ...| |
client_project
| client_id | project_id|
| 1 | 2 |
| 1 | 3 |
| 2 | 4 |
| 2 | 5 |
| 3 | 6 |
| 3 | 7 |
The field_X column in the project table is new, and I have to fill it with the data from the client table to get something like this:
Project
| project_id | ...|field_X |
| 1 | ...| |
| 2 | ...|value1 |
| 3 | ...|value1 |
| 4 | ...|value2 |
| 5 | ...|value2 |
| 6 | ...|value3 |
| 7 | ...|value3 |
I don't know how to deal with the intermediate table. I have tried this code, but it doesn't work:
INSERT INTO project
(field_x)
(select field_x
from
client_project
inner join
client
where client_project.client_id = client.client_id
);
I have an idea of what I have to do, but I am not able to translate it into an SQL command because of the intermediate table. Could someone explain how to deal with it?
Thanks in advance.
I assume you already have all entries in the project table and they are only missing the field_X value? In that case what you need is an update, not an insert. Using the table and column names from your question:
UPDATE project p
JOIN client_project cp ON p.project_id = cp.project_id
JOIN client c ON c.client_id = cp.client_id
SET p.field_X = c.field_X;
However, be advised that storing the same data in two places is not good practice; if possible, always put one fact in only one place.
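To try the idea without touching a real database, the sketch below reproduces the sample tables in an in-memory SQLite database from Python. Note the swapped-in technique: instead of MySQL's multi-table UPDATE it uses a correlated subquery, which is portable across engines. It assumes each project is linked to at most one client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (client_id INTEGER PRIMARY KEY, field_x TEXT);
    CREATE TABLE project (project_id INTEGER PRIMARY KEY, field_x TEXT);
    CREATE TABLE client_project (client_id INTEGER, project_id INTEGER);
    INSERT INTO client VALUES (1, 'value1'), (2, 'value2'), (3, 'value3');
    INSERT INTO project (project_id) VALUES (1), (2), (3), (4), (5), (6), (7);
    INSERT INTO client_project VALUES (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7);
""")

# For every project that has a link row, copy field_x from the linked client.
conn.execute("""
    UPDATE project
    SET field_x = (
        SELECT c.field_x
        FROM client_project AS cp
        JOIN client AS c ON c.client_id = cp.client_id
        WHERE cp.project_id = project.project_id
    )
    WHERE project_id IN (SELECT project_id FROM client_project)
""")

result = conn.execute(
    "SELECT project_id, field_x FROM project ORDER BY project_id").fetchall()
```

Project 1 has no row in client_project, so its field_x stays empty, matching the expected table in the question.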

MySQL - Join tables and convert rows to columns

I have two tables similar to these (t_stamp would normally be a DATETIME, abbreviated here for clarity):
datapoints
+------+---------+----+---------+
| ndx | value | ID | t_stamp |
+------+---------+----+---------+
| 1 | 503.42 | 1 | 3/1/15 |
| 2 | 17.81 | 2 | 3/1/15 |
| 3 | 498.21 | 1 | 3/2/15 |
| 4 | 19.51 | 2 | 3/2/15 |
+------+---------+----+---------+
parameters
+------+----+---------------+-------+
| ndx | ID | description | unit |
+------+----+---------------+-------+
| 1 | 1 | wetwell level | ft |
| 2 | 2 | effluent flow | MGD |
+------+----+---------------+-------+
I'm looking to combine them so that the descriptions become column headers and list the values in order of time stamp, end result looking something like this:
new table
+---------+---------------+---------------+
| t_stamp | wetwell level | effluent flow |
+---------+---------------+---------------+
| 3/1/15 | 503.42 | 17.81 |
| 3/2/15 | 498.21 | 19.51 |
+---------+---------------+---------------+
Bearing in mind, I have considerably more rows in each table so I'm looking for something dynamic. It could be query or stored procedure based. Thank you for any help!
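One way to make this dynamic is to read the parameter descriptions first and generate one MAX(CASE ...) column per parameter. The following Python sketch shows the idea against an in-memory SQLite database (an assumption made so the example runs on its own); the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datapoints (ndx INTEGER, value REAL, ID INTEGER, t_stamp TEXT);
    CREATE TABLE parameters (ndx INTEGER, ID INTEGER, description TEXT, unit TEXT);
    INSERT INTO datapoints VALUES
        (1, 503.42, 1, '3/1/15'), (2, 17.81, 2, '3/1/15'),
        (3, 498.21, 1, '3/2/15'), (4, 19.51, 2, '3/2/15');
    INSERT INTO parameters VALUES
        (1, 1, 'wetwell level', 'ft'), (2, 2, 'effluent flow', 'MGD');
""")

# Read the parameter list, then emit one pivot column per parameter ID.
params = conn.execute(
    "SELECT ID, description FROM parameters ORDER BY ID").fetchall()
select_list = ["d.t_stamp AS t_stamp"] + [
    'MAX(CASE WHEN d.ID = ? THEN d.value END) AS "{}"'.format(desc.replace('"', '""'))
    for _, desc in params
]
sql = ("SELECT {} FROM datapoints AS d "
       "GROUP BY d.t_stamp ORDER BY d.t_stamp").format(", ".join(select_list))

cur = conn.execute(sql, [param_id for param_id, _ in params])
header = [col[0] for col in cur.description]
rows = cur.fetchall()
```

Inside MySQL itself, the same statement text could be assembled with GROUP_CONCAT over the parameters table and run with PREPARE and EXECUTE, for example from a stored procedure. With real DATETIME values the ORDER BY sorts chronologically; the abbreviated text dates here only sort correctly by coincidence.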

Subtract values from line above the current line in MySQL

I've the following table:
| id | Name | Date of Birth | Date of Death | Result |
| 1 | John | 3546565 | 3548987 | |
| 2 | Mary | 5233654 | 5265458 | |
| 3 | Lewis| 6546876 | 6548752 | |
| 4 | Mark | 6546546 | 6767767 | |
| 5 | Steve| 6546877 | 6548798 | |
And I need to do this for the whole table:
Result = 1, if( current_row(Date of Birth) - row_above_current_row(Date of Death))>X else 0
To make things easier, I guess, I created the same table above but with 2 extra id fields: id_minus_one and id_plus_one
Like this:
| id | id_minus_one | id_plus_one |Name | Date_of_Birth | Date_of_Death | Result |
| 1 | 0 | 2 |John | 3546565 | 3548987 | |
| 2 | 1 | 3 |Mary | 5233654 | 5265458 | |
| 3 | 2 | 4 |Lewis| 6546876 | 6548752 | |
| 4 | 3 | 5 |Mark | 6546546 | 6767767 | |
| 5 | 4 | 6 |Steve| 6546877 | 6548798 | |
So my approach would be something like (in pseudo code):
for id=1, ignore result. (Because there is no row above)
for id=2, Result = 1 if( (Where id=2).Date_of_Birth - (where id_minus_one=id-1).Date_of_Death )>X else 0
for id=3, Result = 1 if( (Where id=3).Date_of_Birth - (where id_minus_one=id-1).Date_of_Death)>X else 0
and so on for the whole table...
Just ignore id_plus_one if there is no need for it, I'll use it later for the same thing. So, if I manage to do this for id_minus_one I'll manage for id_plus_one as they are the same algorithm.
My question is how to translate that pseudocode into SQL; I can't find a way to relate both ids in a single select.
Thank you!
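As you describe this, it is just a self join with some logic in the select:
select t.*,
       ((t.date_of_birth - tprev.date_of_death) > x) as flag
from t left outer join
     t tprev
     on t.id_minus_one = tprev.id;
For the first row there is no previous row, so tprev.date_of_death is NULL and flag is NULL as well.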
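The self join can be checked quickly against the sample rows. The sketch below runs it from Python on an in-memory SQLite database; the threshold value x = 1000 is a hypothetical choice for illustration, since the question does not say what X is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, id_minus_one INTEGER, name TEXT,
                    date_of_birth INTEGER, date_of_death INTEGER);
    INSERT INTO t VALUES
        (1, 0, 'John',  3546565, 3548987),
        (2, 1, 'Mary',  5233654, 5265458),
        (3, 2, 'Lewis', 6546876, 6548752),
        (4, 3, 'Mark',  6546546, 6767767),
        (5, 4, 'Steve', 6546877, 6548798);
""")

x = 1000  # hypothetical threshold, not given in the question

# Join each row to the row above it and compare birth against the previous death.
flags = conn.execute("""
    SELECT t.id, (t.date_of_birth - tprev.date_of_death) > ? AS flag
    FROM t
    LEFT JOIN t AS tprev ON t.id_minus_one = tprev.id
    ORDER BY t.id
""", (x,)).fetchall()
```

Row 1 gets a NULL flag (None in Python) because there is no row above it. To store the result in the Result column, the same join can be used in an UPDATE, wrapping the comparison in COALESCE if 0 is preferred over NULL.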