I wish to port some R code to Hadoop to be used with Impala or Hive with a SQL-like query. The code I have is based on this question:
R data table: compare row value to group values, with condition
I wish to find, for each row, the number of rows with the same id in subgroup 1 with cheaper price.
Let's say I have the following data:
CREATE TABLE project
(
id int,
price int,
subgroup int
);
INSERT INTO project(id,price,subgroup)
VALUES
(1, 10, 1),
(1, 10, 1),
(1, 12, 1),
(1, 15, 1),
(1, 8, 2),
(1, 11, 2),
(2, 9, 1),
(2, 12, 1),
(2, 14, 2),
(2, 18, 2);
Here is the output I would like to have (with the new column cheaper):
id  price  subgroup  cheaper
1   10     1         0   (because no row is cheaper in id 1, subgroup 1)
1   10     1         0   (because no row is cheaper in id 1, subgroup 1)
1   12     1         2   (rows 1 and 2 are cheaper)
1   15     1         3
1   8      2         0   (no row is cheaper in id 1, subgroup 1)
1   11     2         2
2   9      1         0
2   12     1         1
2   14     2         2
2   18     2         2
Note that I always want to compare rows to the ones in subgroup 1, even when the rows are themselves in subgroup 2.
You can join the table with itself, using a LEFT JOIN:
SELECT
p.id,
p.price,
p.subgroup,
COUNT(p2.id) AS cheaper
FROM
project p LEFT JOIN project p2
ON p.id=p2.id AND p2.subgroup=1 AND p.price>p2.price
GROUP BY
p.id,
p.price,
p.subgroup
ORDER BY
p.id, p.subgroup
COUNT(p2.id) counts only the rows where the join succeeds, that is, rows in subgroup 1 with the same id and a lower price. Rows with no match contribute a NULL p2.id, which COUNT ignores, so they get 0.
The only problem is that you are expecting those two rows:
1 10 1 0
1 10 1 0
but my query will only return one, because I'm grouping by id, price, and subgroup. If your project table has another column that uniquely identifies each row, you could also group by that column to keep both rows.
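The grouping idea above can be tried out quickly. The following is a minimal sketch in Python using the standard sqlite3 module (an assumption made for easy testing; the question targets Hive or Impala). The row_id column is a hypothetical surrogate key that the original table does not have.

```python
import sqlite3

# Sample rows from the question: (id, price, subgroup).
ROWS = [
    (1, 10, 1), (1, 10, 1), (1, 12, 1), (1, 15, 1), (1, 8, 2),
    (1, 11, 2), (2, 9, 1), (2, 12, 1), (2, 14, 2), (2, 18, 2),
]


def cheaper_counts():
    """Return (id, price, subgroup, cheaper) for every row, duplicates kept."""
    conn = sqlite3.connect(":memory:")
    # row_id is a hypothetical unique key added only for this sketch.
    conn.execute(
        "CREATE TABLE project ("
        "row_id INTEGER PRIMARY KEY, id INT, price INT, subgroup INT)"
    )
    conn.executemany(
        "INSERT INTO project (id, price, subgroup) VALUES (?, ?, ?)", ROWS
    )
    result = conn.execute(
        """
        SELECT p.id, p.price, p.subgroup, COUNT(p2.row_id) AS cheaper
        FROM project p
        LEFT JOIN project p2
          ON p.id = p2.id AND p2.subgroup = 1 AND p.price > p2.price
        GROUP BY p.row_id, p.id, p.price, p.subgroup
        ORDER BY p.row_id
        """
    ).fetchall()
    conn.close()
    return result
```

Because row_id is part of the GROUP BY, the two identical (1, 10, 1) rows stay separate, and each gets its own count.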
Or you could use an inline (correlated) subquery, which keeps duplicate rows without needing a GROUP BY:
SELECT
p.id,
p.price,
p.subgroup,
(SELECT COUNT(*)
FROM project p2
WHERE p2.id=p.id AND p2.subgroup=1 AND p2.price<p.price) AS n
FROM
project p
I have a table of ports:
drop table if exists ports;
create table ports(id int, name char(20));
insert into ports (id, name ) values
(1, 'Port hedland'),
(2, 'Kwinana');
And a table of tariffs connected to those ports:
drop table if exists tariffs;
create table tariffs(id int, portId int, price decimal(12,2), expiry bigint(11));
insert into tariffs (id, portId, price, expiry ) values
(1, 2, 11.00, 1648408400),
(2, 2, 12.00, 1648508400),
(3, 2, 13.00, 1648594800),
(4, 2, 14.00, 1651273200),
(5, 2, 15.00, 2250000000 );
insert into tariffs (id, portId, price, expiry ) values
(1, 1, 21.00, 1648408400),
(2, 1, 22.00, 1648508400),
(3, 1, 23.00, 1648594800),
(4, 1, 24.00, 1651273200),
(5, 1, 25.00, 2250000000 );
Each tariff has an expiry.
I can easily write a query to find the right tariff for a specific date for each port. For example, at timestamp 1648594700 the right tariff is:
SELECT * FROM tariffs
WHERE 1648594700 < expiry AND portId = 2
ORDER BY expiry
LIMIT 1
Result:
id portId price expiry
3 2 13.00 1648594800
However, in my application I want to be able to pull in the right tariff starting from the ports record.
For one record, I can do this:
SELECT * FROM ports
LEFT JOIN tariffs on tariffs.portId = ports.id
WHERE 1648594700 < tariffs.expiry AND ports.id = 2
LIMIT 1
Result:
id name id portId price expiry
2 Kwinana 3 2 13.00 1648594800
This feels a little 'dirty', especially because I am doing a lookup on a record, and then forcing only one result using LIMIT. But, OK.
What I cannot work out is a query that returns a list of ports, with each port carrying the price that matches the constraint above (that is, for each port, the tariff with the lowest expiry that is still greater than 1648594700).
This obviously won't work:
SELECT * FROM ports
left join tariffs on tariffs.portId = ports.id
where 1648594700 < tariffs.expiry
Since the result of the query, testing with timestamp 1648594700, would be:
id name id portId price expiry
2 Kwinana 3 2 13.00 1648594800
2 Kwinana 4 2 14.00 1651273200
2 Kwinana 5 2 15.00 2250000000
1 Port he 3 1 23.00 1648594800
1 Port he 4 1 24.00 1651273200
1 Port he 5 1 25.00 2250000000
Instead, the result for all ports (before further filtering) should be:
id name id portId price expiry
2 Kwinana 3 2 13.00 1648594800
1 Port he 3 1 23.00 1648594800
Is there a clean, non-hacky way to have such a result?
As an added constraint, is it possible to do this in ONE query, without temp tables etc.?
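You can select the lowest expiry after the timestamp for each port, do your join, and keep only the rows having that minimum expiry. The subquery has to be correlated on the port; a single MIN over the whole tariffs table would drop any port whose next expiry differs from the global minimum:
SELECT p.id, p.name, t.id, t.portId, t.price, t.expiry
FROM ports p
LEFT JOIN tariffs t ON p.id = t.portId
WHERE t.expiry = (SELECT MIN(t2.expiry) FROM tariffs t2
                  WHERE t2.portId = p.id AND t2.expiry > 1648594700)
ORDER BY p.id;
This returns your desired result for the sample data.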
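One thing worth checking with this approach is whether the MIN is taken over the whole tariffs table or per port. The sketch below, in Python with the standard sqlite3 module (an assumption; the thread uses MySQL), runs both forms against the question's data plus one hypothetical third port whose only tariff expires at a different time. The third port, its tariff, and the function name are illustrative only.

```python
import sqlite3


def nearest_tariffs(timestamp):
    """Return (global_min_rows, per_port_rows) for the given timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ports (id INT, name TEXT);
        CREATE TABLE tariffs (id INT, portId INT, price REAL, expiry INT);
        INSERT INTO ports VALUES
          (1, 'Port hedland'), (2, 'Kwinana'),
          (3, 'Hypothetical port');  -- not in the original data
        INSERT INTO tariffs VALUES
          (1, 2, 11.00, 1648408400), (2, 2, 12.00, 1648508400),
          (3, 2, 13.00, 1648594800), (4, 2, 14.00, 1651273200),
          (5, 2, 15.00, 2250000000),
          (1, 1, 21.00, 1648408400), (2, 1, 22.00, 1648508400),
          (3, 1, 23.00, 1648594800), (4, 1, 24.00, 1651273200),
          (5, 1, 25.00, 2250000000),
          (1, 3, 31.00, 1700000000);  -- hypothetical tariff for port 3
        """
    )
    base = """
        SELECT p.id, p.name, t.price, t.expiry
        FROM ports p
        JOIN tariffs t ON t.portId = p.id
        WHERE t.expiry = ({subquery})
        ORDER BY p.id
    """
    # One MIN for the whole table: only ports sharing that expiry survive.
    global_min = base.format(
        subquery="SELECT MIN(expiry) FROM tariffs WHERE expiry > ?"
    )
    # MIN correlated on the port: each port gets its own nearest expiry.
    per_port = base.format(
        subquery="SELECT MIN(t2.expiry) FROM tariffs t2 "
                 "WHERE t2.portId = p.id AND t2.expiry > ?"
    )
    global_rows = conn.execute(global_min, (timestamp,)).fetchall()
    per_port_rows = conn.execute(per_port, (timestamp,)).fetchall()
    conn.close()
    return global_rows, per_port_rows
```

With timestamp 1648594700, the global form returns only ports 1 and 2, while the correlated form also returns the hypothetical port 3 with its own next tariff.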
On MySQL 8+, ROW_NUMBER should work here:
WITH cte AS (
SELECT p.id, p.name, t.price, t.expiry,
ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY t.expiry) rn
FROM ports p
LEFT JOIN tariffs t ON t.portId = p.id
WHERE t.expiry > 1648594700
)
SELECT id, name, price, expiry
FROM cte
WHERE rn = 1
ORDER BY id;
This logic would return one record for each port having the nearest expiry.
Many thanks for any help.
I have the following table consisting of two columns, service_id and artist_id. The primary key is (service_id, artist_id). I want to retrieve all the artists for a set of service ids.
Sample Table :
service_id artists_id
5 9
6 9
5 10
1 9
5 1
6 1
6 7
1 10
I tried this, but it does not work: it returns every artist who offers service id 5 or service id 6 or service id 1.
SELECT artists_id FROM `service_schedule` WHERE service_id IN (5, 6, 1)
I want the artists who offer service id 5 AND service id 6 AND service id 1. For the sample table above, only artist 9 offers all three services in the set (5, 6, 1).
So, if I want to retrieve all artists who can offer the services with ids 5 and 6 and 1, how do I write the SQL query?
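Try this once:
select distinct artist_id from service_schedule where service_id IN (5, 6) AND
artist_id IN (select artist_id from service_schedule where service_id IN (5, 6) group by artist_id having count(*) > 1);
This fetches every artist_id that has more than one of the requested services, which here means both 5 and 6. The count has to be restricted to the requested services; otherwise an artist with any two services would pass.
I hope this helps.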
EDIT
select artists_id from Sample where service_id IN (5, 6, 1)
AND
artists_id IN (select artists_id from Sample where service_id IN (5, 6, 1) group by artists_id
having count(*) > 2) group by artists_id;
The list of service ids can be 5,6 or 5,6,1 and so on: take the number of service ids in the list and use that number minus 1 in place of 2.
EDIT
select artists_id from Sample where service_id IN (5, 6, 1, 8) and is_available = 1
AND
artists_id IN (select artists_id from Sample where service_id IN (5, 6, 1, 8) and is_available = 1 group by artists_id
having count(*) > 3) group by artists_id;
You have to use a group by query:
select
artist_id
from
service_schedule
group by
artist_id
having
sum(service_id in (5,6,1))=3
service_id in (5,6,1) evaluates to 1 when the condition is true, so the sum counts how many of the three services each artist offers, and you keep the artists whose sum is 3. The pair (artist_id, service_id) is unique, so there is no need to worry about duplicate rows here.
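For a list of arbitrary length, the same idea can be written by filtering on the list first and comparing COUNT(DISTINCT service_id) with the list length. This is a minimal sketch in Python with the standard sqlite3 module (an assumption; the thread uses MySQL), using the sample table from the question with its artists_id column name. The function name is hypothetical.

```python
import sqlite3

# Sample rows from the question: (service_id, artists_id).
SCHEDULE = [(5, 9), (6, 9), (5, 10), (1, 9), (5, 1), (6, 1), (6, 7), (1, 10)]


def artists_offering_all(service_ids):
    """Return the artists that offer every service in service_ids."""
    wanted = sorted(set(service_ids))  # de-duplicate the requested ids
    placeholders = ", ".join("?" for _ in wanted)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE service_schedule (service_id INT, artists_id INT, "
        "PRIMARY KEY (service_id, artists_id))"
    )
    conn.executemany("INSERT INTO service_schedule VALUES (?, ?)", SCHEDULE)
    rows = conn.execute(
        f"""
        SELECT artists_id
        FROM service_schedule
        WHERE service_id IN ({placeholders})
        GROUP BY artists_id
        HAVING COUNT(DISTINCT service_id) = ?
        ORDER BY artists_id
        """,
        wanted + [len(wanted)],
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

Calling artists_offering_all([5, 6, 1]) keeps only artists matched by all three ids; the count on the right-hand side of HAVING follows the list length, so nothing has to be hard-coded.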
The question is not entirely clear to me, but if I understood it correctly, the following solution may work for you.
select s1.service_id, s1.artists_id
from Sample s1 inner join Sample s2
on s1.artists_id = s2.artists_id
where s2.service_id = 5 and s1.service_id = 6
union
select s1.service_id, s1.artists_id
from Sample s1 inner join Sample s2
on s1.artists_id = s2.artists_id
where s2.service_id = 6 and s1.service_id = 5
Sample Output - http://sqlfiddle.com/#!9/7b3a5/16
Try this:
SELECT DISTINCT(s1.artists_id) FROM service_schedule s1
JOIN service_schedule s2 ON s2.service_id = s1.service_id AND s2.artists_id <> s1.artists_id
WHERE s1.service_id IN (5, 6, 1)
I have a list of ids pre-generated that I need to check if exist in a table. My table has two columns, id, name, where id is an auto increment integer and name is a varchar(255).
I basically want to get a count of how many ids from my pre-generated list do not exist in table foo. So say my list has the numbers 5 and 10 in it, what's the best way to write something to the effect of:
select count(*) from foo where id does not exist in ( 5, 10 )
The idea here is that if 5 and 10 do not exist, I need the response 2, and not the number of rows in foo that do not have the id 5 or 10.
TL;DR: sample data and queries at rextester
The idea here is that if 5 and 10 do not exist, I need the response 2, and not the number of rows in foo that do not have the id 5 or 10.
You should have provided a little more information to avoid confusion.
Example
id | name
1 | tom
2 | joe
3 | mae
4 | goku
5 | vegeta
If your list contains (1, 2, 3) then your answer should be 0 (since all three are in the table).
If your list contains (1, 2, 6) then your answer should be 1 (since 1 and 2 are in the table but 6 isn't).
If your list contains (1, 6, 7) then your answer should be 2.
If your list contains (6, 7, 8) then your answer should be 3.
Assuming this was your question, and if you know the length of your list:
select 2 - count(*) as my_count from foo where id in (5, 10)
The following query tells you how many are present in foo.
select count(*) from foo where id in (5,10)
So if you want to find how many do not exist, subtract this result from the length of your list (n below is that length):
select n - count(*) as my_count from foo where id in (5, 10,....)
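The subtraction is easy to do from application code, where the list length is already known. Here is a minimal sketch in Python with the standard sqlite3 module (an assumption; any driver with parameter binding would do), using the example table from this answer. The function names are hypothetical.

```python
import sqlite3


def demo_connection():
    """Build the five-row example table foo from the answer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name VARCHAR(255))")
    conn.executemany(
        "INSERT INTO foo VALUES (?, ?)",
        [(1, "tom"), (2, "joe"), (3, "mae"), (4, "goku"), (5, "vegeta")],
    )
    return conn


def count_missing(conn, ids):
    """Return how many of the given ids have no row in foo."""
    wanted = set(ids)  # duplicates in the list must not be counted twice
    if not wanted:
        return 0
    placeholders = ", ".join("?" for _ in wanted)
    (present,) = conn.execute(
        f"SELECT COUNT(*) FROM foo WHERE id IN ({placeholders})",
        list(wanted),
    ).fetchone()
    # n - count(*): list length minus the number of ids that were found.
    return len(wanted) - present
```

De-duplicating the list first matters: with a repeated id, the plain list length would overstate n while COUNT(*) would still count the matching row once.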
You could build an on-the-fly table from your list using UNION and then LEFT JOIN foo to it, so that the count is over the ids in the list rather than the rows in foo:
select count(*)
from (
select 5 as id from dual
union
select 10 from dual ) t
left join foo as m on m.id = t.id
where m.id is null
Otherwise you can populate a temporary table with the values you need and use the same left join, keeping the rows where the joined id is null.
I have the following data structure:
a table entries with a column entry_id
a table data_int with columns entry_id, question and data
a table data_text with columns entry_id, question and data
a table questions with columns question_id
Now I would like to make a MySQL query that does the following: for a given entry_id (say 222) it should select all question_id q from that table for which there is no row with (entry_id=222 AND question_id=q) in data_int, and also no such row in data_text. Is this possible in a single query, and if so how should I do this?
A sample data set would be
entries:
1
2
data_int:
1, 1, 4
1, 2, 56
1, 6, 43
1, 7, -1
data_text:
1, 3, 'hello'
1, 5, 'world'
questions:
1
2
3
4
5
6
7
8
9
10
Then for entry_id=1, the return value should be 4, 8, 9, 10, since these don't appear in either data_ table for entry_id=1.
For entry_id=2, the return value should be 1,2,3,4,5,6,7,8,9,10 since nothing appears in any of the data_ tables.
There are a couple of ways to do this. With MySQL, the more efficient one is probably multiple outer joins with null checks (the column names below follow the table definitions in the question):
select q.*
from questions q
left join data_int di on q.question_id = di.question and di.entry_id = 1
left join data_text dt on q.question_id = dt.question and dt.entry_id = 1
where di.entry_id is null and dt.entry_id is null
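To check the double outer join against the sample data, here is a minimal sketch in Python with the standard sqlite3 module (an assumption; the question asks about MySQL). It uses the column names listed in the question (entry_id, question, question_id); the function name is hypothetical.

```python
import sqlite3


def unanswered_questions(entry_id):
    """Return the question ids with no row in data_int or data_text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entries (entry_id INT);
        CREATE TABLE questions (question_id INT);
        CREATE TABLE data_int (entry_id INT, question INT, data INT);
        CREATE TABLE data_text (entry_id INT, question INT, data TEXT);
        INSERT INTO entries VALUES (1), (2);
        INSERT INTO questions VALUES
          (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
        INSERT INTO data_int VALUES
          (1, 1, 4), (1, 2, 56), (1, 6, 43), (1, 7, -1);
        INSERT INTO data_text VALUES (1, 3, 'hello'), (1, 5, 'world');
        """
    )
    rows = conn.execute(
        """
        SELECT q.question_id
        FROM questions q
        LEFT JOIN data_int di
          ON di.question = q.question_id AND di.entry_id = ?
        LEFT JOIN data_text dt
          ON dt.question = q.question_id AND dt.entry_id = ?
        WHERE di.entry_id IS NULL AND dt.entry_id IS NULL
        ORDER BY q.question_id
        """,
        (entry_id, entry_id),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

The entry filter sits in the ON clauses rather than in WHERE; moving it to WHERE would discard the unmatched questions that the null checks are meant to find.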
I have 2 tables:
event_categories containing:
event_category_id, event_category
Sample data:
1, Tennis
2, Volleyball
3, Boxing
4, Skating
Then I have a table that joins users that might possibly be linked to any of these categories.
users_event_categories containing
user_id, event_category_id
Sample data:
1223, 2
1223, 4
5998, 2
I need a query that returns ALL event categories, and returns if a user has that category linked.
So if I query with the user_id 1223 my result would be:
1, Tennis, 0
2, Volleyball, 1
3, Boxing, 0
4, Skating, 1
Or a query with user_id 4444 would return:
1, Tennis, 0
2, Volleyball, 0
3, Boxing, 0
4, Skating, 0
This would work if you only want data about one particular user:
select ec.event_category_id, ec.event_category, if(uec.user_id is null, 0, 1)
from event_categories ec
left join users_event_categories uec
on uec.event_category_id = ec.event_category_id and uec.user_id = 1223
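The LEFT JOIN with the user filter in the ON clause can be verified against both example users. This is a minimal sketch in Python with the standard sqlite3 module (an assumption; the thread uses MySQL), with CASE standing in for MySQL's IF() so the query is portable. The function name and the linked alias are hypothetical.

```python
import sqlite3


def categories_for_user(user_id):
    """Return every category with a 0/1 flag for the given user."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE event_categories (
          event_category_id INT, event_category TEXT);
        CREATE TABLE users_event_categories (
          user_id INT, event_category_id INT);
        INSERT INTO event_categories VALUES
          (1, 'Tennis'), (2, 'Volleyball'), (3, 'Boxing'), (4, 'Skating');
        INSERT INTO users_event_categories VALUES
          (1223, 2), (1223, 4), (5998, 2);
        """
    )
    rows = conn.execute(
        """
        SELECT ec.event_category_id, ec.event_category,
               CASE WHEN uec.user_id IS NULL THEN 0 ELSE 1 END AS linked
        FROM event_categories ec
        LEFT JOIN users_event_categories uec
          ON uec.event_category_id = ec.event_category_id
         AND uec.user_id = ?
        ORDER BY ec.event_category_id
        """,
        (user_id,),
    ).fetchall()
    conn.close()
    return rows
```

A user with no links at all, such as 4444, still gets one row per category because the user condition restricts the joined side only.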
select tn2.user_id,event_category,count(event_category) as total from table_name1 tn1
inner join table_name2 tn2 on tn1.event_category_id = tn2.event_category_id
where tn2.user_id = 4444
group by event_category