max and count on joining n-to-n tables - mysql

I'm stuck on a query and need some help.
If someone could help me I'd appreciate it a lot :)
I have two tables (I'm using only one movie to show the situation):
Table Consumption
client_id movie_id name date_consumption
XXX 1 MovieA 01/Jan/2000
YYY 1 MovieA 01/Jan/2000
ZZZ 1 MovieA 02/Jan/2000
XXX 1 MovieA 02/Jan/2000
ZZZ 1 MovieA 10/Jan/2000
Table movies_owners
movie_id rightowner date_buyed*
A LucasFilm 01/Jan/2000
A Disney 02/Jan/2000
A Sony 05/Jan/2000
* date_buyed: the date on which the movie passed to a new right owner.
The idea is simple:
I have to find the number of clients who watched a movie each day, together
with the right owner that held the movie on the day it was watched.
Table Expected
movie_id date_consumption rightowner consumption(count)
MovieA 01/Jan/2000 LucasFilm 2
MovieA 02/Jan/2000 Disney 2
MovieA 10/Jan/2000 Sony 1
--
With this query I can find the correct right owner of the movie on a given day (the max of all date_buyed values up to the day in question):
SELECT A.movie_id, A.date_buyed, A.rightowner
FROM movies_owners A
WHERE A.date_buyed = (
SELECT max(date_buyed)
FROM movies_owners
WHERE TO_DATE(date_buyed) <= TO_DATE('2000-01-02') AND movie_id = 'MovieA')
AND movie_id = 'MovieA';
But my problem is joining with the consumption table.
I can't use date_consumption from the consumption table in a subquery.
I tried splitting the work into an auxiliary table for the join, but I still can't get the result. =\
Does anyone have at least an idea or a suggestion for me, please?
Thank you all in advance.
Just for info: I'm working with Hive, but the syntax is almost the same as SQL.

Hive does not support non-equijoins. Join on movie_id only and move the date condition o.date_buyed<=c.date_consumption to the WHERE clause:
select c.movie_id, c.date_consumption, o.rightowner, c.consumption_count
from
(--consumption count per movie, date
select substr(movie_id,6) movie_id, date_consumption, count(*) consumption_count
from consumption
group by substr(movie_id,6), date_consumption
)c
left join movies_owners o on c.movie_id=o.movie_id
where o.date_buyed<=c.date_consumption
This keeps every owner whose date_buyed is on or before the consumption date. To keep only the owner in effect on that day, take the row with the latest date_buyed per movie and date.
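One way to keep only the owner in effect on each consumption date is to take MAX(date_buyed) per movie and date after the equi-join, then join back to movies_owners. The following is a minimal sketch, not a tested Hive query: it runs the SQL through Python's sqlite3 module as a stand-in for Hive, and it assumes ISO-formatted dates and that both tables carry the same movie_id value ('A'), which the sample data in the question does not guarantee.

```python
import sqlite3

# SQLite stands in for Hive here; dates are stored as ISO strings (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE consumption (client_id TEXT, movie_id TEXT, date_consumption TEXT);
CREATE TABLE movies_owners (movie_id TEXT, rightowner TEXT, date_buyed TEXT);
INSERT INTO consumption VALUES
  ('XXX', 'A', '2000-01-01'), ('YYY', 'A', '2000-01-01'),
  ('ZZZ', 'A', '2000-01-02'), ('XXX', 'A', '2000-01-02'),
  ('ZZZ', 'A', '2000-01-10');
INSERT INTO movies_owners VALUES
  ('A', 'LucasFilm', '2000-01-01'),
  ('A', 'Disney',    '2000-01-02'),
  ('A', 'Sony',      '2000-01-05');
""")

query = """
SELECT l.movie_id, l.date_consumption, o.rightowner, l.consumption_count
FROM (
    -- latest purchase date on or before each consumption date
    SELECT c.movie_id, c.date_consumption, c.consumption_count,
           MAX(o.date_buyed) AS date_buyed
    FROM (
        -- consumption count per movie and date
        SELECT movie_id, date_consumption, COUNT(*) AS consumption_count
        FROM consumption
        GROUP BY movie_id, date_consumption
    ) c
    JOIN movies_owners o ON o.movie_id = c.movie_id   -- equi-join only
    WHERE o.date_buyed <= c.date_consumption          -- date filter in WHERE
    GROUP BY c.movie_id, c.date_consumption, c.consumption_count
) l
JOIN movies_owners o
  ON o.movie_id = l.movie_id AND o.date_buyed = l.date_buyed
ORDER BY l.date_consumption
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Counting before the join to movies_owners matters: if the count ran after the join, each consumption row would be counted once per matching owner.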

Related

Histogram using two data tables (SQL query)

I want to make a histogram of the number of comments per user in January 2019 (including the ones who haven't commented).
The tables I'm working with look like this:
id Name
1 Jose
2 Pedro
3 Juan
4 Sofia
user_id Comment Date
1 Hello 2018-10-02 11:00:03
3 Didn't Like it 2018-06-02 11:00:03
1 Not so bad 2018-10-22 11:00:03
2 Trash 2018-7-21 11:00:03
I think I'm overcomplicating it. But here is my try:
#Here I'm counting how many comments each person who has commented has made.
CREATE TABLE aux AS
SELECT user_id, COUNT(user_id)
FROM Undostres
GROUP BY user_id;
#With the following code, I end up with a table of the missing values (ids that haven't commented)
CREATE TABLE Test AS
SELECT DISTINCT user_id +1
FROM aux
WHERE user_id + 1 NOT IN (SELECT DISTINCT user_id FROM aux);
ALTER TABLE Test RENAME COLUMN user_id +1 TO ;
INSERT INTO Undostres (user_id)
SELECT user_id FROM Test;
It returns an error when I try to rename user_id+1 to another name, so I can't keep going.
Any suggestions would be great!
I would do it this way:
CREATE TABLE aux AS
SELECT Users.user_id, COUNT(Undostres.user_id) AS count
FROM Users
LEFT OUTER JOIN Undostres USING (user_id)
GROUP BY Users.user_id;
I am assuming you have a table Users that enumerates all your users, whether they have made any comments or not. The LEFT OUTER JOIN helps in this case, because if there are no comments for a given user, the user is still part of the result, and the COUNT is 0.
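To restrict the counts to January 2019 while still keeping users with no comments, the date filter can go in the ON clause of the LEFT JOIN rather than in WHERE. Here is a minimal sketch of that idea, run through Python's sqlite3 module rather than the asker's database. The table name Users, its id column, and the three January 2019 rows are assumptions added for illustration; the 2018 rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (id INTEGER PRIMARY KEY, Name TEXT);       -- table name assumed
CREATE TABLE Undostres (user_id INTEGER, Comment TEXT, Date TEXT);
INSERT INTO Users VALUES (1, 'Jose'), (2, 'Pedro'), (3, 'Juan'), (4, 'Sofia');
INSERT INTO Undostres VALUES
  (1, 'Hello',          '2018-10-02 11:00:03'),
  (3, 'Didn''t Like it', '2018-06-02 11:00:03'),
  (1, 'Not so bad',     '2018-10-22 11:00:03'),
  (2, 'Trash',          '2018-7-21 11:00:03'),
  -- hypothetical January 2019 rows, not in the question:
  (1, 'hypothetical A', '2019-01-05 09:00:00'),
  (1, 'hypothetical B', '2019-01-20 09:00:00'),
  (3, 'hypothetical C', '2019-01-11 09:00:00');
""")

# Comments per user in January 2019; the date filter sits in ON so that
# users without matching comments survive the LEFT JOIN with a count of 0.
per_user_sql = """
SELECT u.id, COUNT(c.user_id) AS n_comments
FROM Users u
LEFT JOIN Undostres c
  ON c.user_id = u.id
 AND c.Date >= '2019-01-01' AND c.Date < '2019-02-01'
GROUP BY u.id
"""
per_user = conn.execute(per_user_sql + " ORDER BY u.id").fetchall()

# Histogram: how many users made 0, 1, 2, ... comments.
histogram = conn.execute(
    "SELECT n_comments, COUNT(*) AS n_users FROM (" + per_user_sql + ") "
    "GROUP BY n_comments ORDER BY n_comments"
).fetchall()

print(per_user)
print(histogram)
```

Putting the same date condition in WHERE instead would discard the NULL rows the LEFT JOIN produces, and the users with zero comments would disappear again.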

How to combine the data in 2 tables and aggregate without duplicating records in a left join

I am working with SQL files and have a question:
I would like to review the effect of a promotion campaign using the data in an SQL file. The file contains 2 tables, web traffic and promotion campaign.
The web traffic table, let's call it web, is as follows:
visitor_id purchase_date traffic_source campaign_name country purchase_value
1 1/1/2018 Search promotion101 US 100
2 2/1/2018 Direct voucher02 UK 110
3 2/1/2018 Search buyme01 US 50
4 3/1/2018 Banner Example01 DE 130
.. ....... ... ... .. ...
The second table holds the campaign information, let's call it promotion:
Promotion_date campaign_name num_delivered promotion_fee
1/12/2017 promotion101 50 30
2/12/2017 promotion101 30 20
2/12/2017 voucher02 40 10
3/12/2017 Example01 70 30
... ... ... ...
I tried a left join to merge the tables first, but the records were duplicated:
Select
web.campaign_name,
sum(promotion.promotion_fee),
sum(web.purchase_value)
FROM
web LEFT JOIN promotion
ON web.campaign_name = promotion.campaign_name
GROUP BY
1
However, it doesn't work because the left join simply duplicates the records...
What I want is a table like this:
Campaign_name Traffic_source Total_Customer Total_purchase_value Total_expenditure
promotion101 Search 1000 2000 1500
Example01 Banner 2000 3750 3000
Is it possible to do so? If yes, how can I do it?
Many thanks for your help in advance!
You may perform the aggregations of each table in separate subqueries:
SELECT
w.campaign_name,
w.purchase_value AS Total_purchase_value,
COALESCE(p.promotion_fee, 0) AS Total_expenditure
FROM
(
SELECT campaign_name, SUM(purchase_value) AS purchase_value
FROM web
GROUP BY campaign_name
) w
LEFT JOIN
(
SELECT campaign_name, SUM(promotion_fee) AS promotion_fee
FROM promotion
GROUP BY campaign_name
) p
ON w.campaign_name = p.campaign_name;
A critical assumption I have made here is that the web table contains data for all campaigns. If not, then you might have to join to a third table containing all campaigns which happened. Actually, arguably such a table should already exist.
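The difference between joining first and aggregating first can be checked on the sample rows from the question. This is a minimal sketch using Python's sqlite3 module as a stand-in for the asker's database; the column name purchase_date is an assumption (the question's header is ambiguous), and the dates are kept as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE web (visitor_id INTEGER, purchase_date TEXT, traffic_source TEXT,
                  campaign_name TEXT, country TEXT, purchase_value INTEGER);
CREATE TABLE promotion (promotion_date TEXT, campaign_name TEXT,
                        num_delivered INTEGER, promotion_fee INTEGER);
INSERT INTO web VALUES
  (1, '1/1/2018', 'Search', 'promotion101', 'US', 100),
  (2, '2/1/2018', 'Direct', 'voucher02',    'UK', 110),
  (3, '2/1/2018', 'Search', 'buyme01',      'US', 50),
  (4, '3/1/2018', 'Banner', 'Example01',    'DE', 130);
INSERT INTO promotion VALUES
  ('1/12/2017', 'promotion101', 50, 30),
  ('2/12/2017', 'promotion101', 30, 20),
  ('2/12/2017', 'voucher02',    40, 10),
  ('3/12/2017', 'Example01',    70, 30);
""")

# Join first, aggregate second: each web row repeats once per promotion row.
naive = {
    name: (value, fee)
    for name, value, fee in conn.execute("""
        SELECT w.campaign_name, SUM(w.purchase_value), SUM(p.promotion_fee)
        FROM web w
        LEFT JOIN promotion p ON w.campaign_name = p.campaign_name
        GROUP BY w.campaign_name
    """)
}

# Aggregate each table first, then join one row per campaign.
fixed = {
    name: (value, fee)
    for name, value, fee in conn.execute("""
        SELECT w.campaign_name, w.purchase_value, COALESCE(p.promotion_fee, 0)
        FROM (SELECT campaign_name, SUM(purchase_value) AS purchase_value
              FROM web GROUP BY campaign_name) w
        LEFT JOIN (SELECT campaign_name, SUM(promotion_fee) AS promotion_fee
                   FROM promotion GROUP BY campaign_name) p
          ON w.campaign_name = p.campaign_name
    """)
}

print("join then aggregate:", naive["promotion101"])
print("aggregate then join:", fixed["promotion101"])
```

For promotion101 the single purchase of 100 is counted twice in the first query because the campaign has two promotion rows; the second query reports it once.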

Subquery in SQL to display the names of candidates below 40 in exactly 2 subjects

I need to display the names of all the candidates who got below 40 in exactly 2 subjects using SQL. The tables are:
degree(degcode,name,subject)
candidate(seatno,degcode,name)
marks(seatno,degcode,mark)
Your query could look like this:
SELECT cd.name
FROM degree dg
JOIN candidate cd ON cd.degcode=dg.degcode
JOIN marks mk ON mk.seatno=cd.seatno
WHERE mk.mark < 40
GROUP BY cd.seatno, cd.name
HAVING COUNT(dg.degcode)=2;
If it does not work for you, create a sqlfiddle with dummy data for more clarity so that I can modify the query to fit your requirements.
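As a rough check of the GROUP BY / HAVING pattern, here is a minimal sketch run through Python's sqlite3 module. It assumes that marks holds one row per candidate per subject, and it leaves the degree table out because the count only needs candidate and marks. All names and marks below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE candidate (seatno INTEGER PRIMARY KEY, degcode TEXT, name TEXT);
CREATE TABLE marks (seatno INTEGER, degcode TEXT, mark INTEGER);
-- hypothetical data: one marks row per candidate per subject (assumption)
INSERT INTO candidate VALUES (1, 'D1', 'Ann'), (2, 'D1', 'Bob'), (3, 'D1', 'Cy');
INSERT INTO marks VALUES
  (1, 'D1', 35), (1, 'D1', 30), (1, 'D1', 80),   -- Ann: 2 marks below 40
  (2, 'D1', 20), (2, 'D1', 25), (2, 'D1', 39),   -- Bob: 3 marks below 40
  (3, 'D1', 35), (3, 'D1', 90), (3, 'D1', 70);   -- Cy:  1 mark below 40
""")

# Keep only failing marks, group per candidate, and require exactly two.
names = [
    name
    for (name,) in conn.execute("""
        SELECT cd.name
        FROM candidate cd
        JOIN marks mk ON mk.seatno = cd.seatno
        WHERE mk.mark < 40
        GROUP BY cd.seatno, cd.name
        HAVING COUNT(*) = 2
    """)
]
print(names)
```

If degree really lists one row per subject for each degcode, joining it as well would repeat every marks row once per subject and inflate the count, so it may be safer to join it only when its columns are needed.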

Find the count of comma-separated values in MySQL

I have two tables.
Table Emp
id name
1 Ajay
2 Amol
3 Sanjay
4 Vijay
Table Sports
Sport_name Played_by
Cricket ^2^,^3^,^4^
Football ^1^,^3^
Vollyball ^4^,^1^
Now I want to write a query which will give me output like
name No_of_sports_played
Ajay 2
Amol 1
Sanjay 2
Vijay 2
So what would the MySQL query for this be?
I agree with the above answers/comments that you are not using a database for what a database is for, but here is how you could calculate your table from your current structure in case you have no control over that:
SELECT Emp.name, IF(Played_by IS NULL,0,COUNT(*)) as Num_Sports
FROM Emp
LEFT JOIN Sports
ON Sports.Played_by RLIKE CONCAT('[[:<:]]',Emp.id,'[[:>:]]')
GROUP BY Emp.name;
UPDATE: I replaced COUNT(*) with IF(Played_by IS NULL,0,COUNT(*)). This means that an employee who doesn't play anything gets 0 as their Num_Sports. (I also tested it with those ^ characters in the data and it still works.)
What it does is join the Emp table to the Sports table wherever it can find the Emp.id in the corresponding Played_by column.
For example, if we wanted to see what sports Ajay played (id=1), we could do:
SELECT *
FROM Emp, Sports
WHERE Sports.Played_by LIKE '%1%'
AND Emp.id=1;
The query I gave as my solution is basically the query above, with a GROUP BY Emp.name to perform it for each employee.
The one modification is the use of RLIKE instead of LIKE.
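I use RLIKE '[[:<:]]employeeid[[:>:]]' instead of LIKE '%employeeid%'. The [[:<:]] and [[:>:]] markers match the start and end of a word, so the employeeid only matches as a whole word. (MySQL 8.0.4 and later use the ICU regex library, which does not support these markers; \b is the word-boundary marker there.)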
This prevents (e.g.) Emp.id 1 matching the 1 in the Played_by of 3,4,11,2.
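Because every id in the sample data is wrapped in ^ characters, a plain LIKE on the delimited id is another way to get whole-id matches without a regex. This is a minimal sketch of that variant, run through Python's sqlite3 module rather than MySQL; in MySQL the pattern would be built with CONCAT('%^', Emp.id, '^%') instead of the || operator. It assumes every stored id really is wrapped in carets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Sports (Sport_name TEXT, Played_by TEXT);
INSERT INTO Emp VALUES (1, 'Ajay'), (2, 'Amol'), (3, 'Sanjay'), (4, 'Vijay');
INSERT INTO Sports VALUES
  ('Cricket',   '^2^,^3^,^4^'),
  ('Football',  '^1^,^3^'),
  ('Vollyball', '^4^,^1^');
""")

# '^1^' cannot match inside '^11^', so the carets act as word boundaries.
# COUNT(s.Sport_name) ignores the NULL row of an unmatched LEFT JOIN,
# so an employee with no sports would get 0.
counts = conn.execute("""
    SELECT e.name, COUNT(s.Sport_name) AS No_of_sports_played
    FROM Emp e
    LEFT JOIN Sports s ON s.Played_by LIKE '%^' || e.id || '^%'
    GROUP BY e.id, e.name
    ORDER BY e.id
""").fetchall()

for name, n in counts:
    print(name, n)
```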
You do not want to store your relationships in a column like that. Create this table:
CREATE TABLE player_sports (player_id INTEGER NOT NULL, sport_id INTEGER NOT NULL, PRIMARY KEY(player_id, sport_id));
This assumes you have an id column in your sports table. So now a player will have one record in player_sports for each sport they play.
Your final query will be:
SELECT p.name, COUNT(ps.player_id)
FROM players p
JOIN player_sports ps ON ps.player_id = p.id
GROUP BY p.name;
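The junction-table layout can be sketched end to end as follows, again through Python's sqlite3 module as a stand-in for MySQL. The sport ids and the player Ravi (who plays nothing) are hypothetical additions; the other rows are derived from the question's data. The sketch uses a LEFT JOIN, a small change from the query above, so that a player with no sports is listed with 0 instead of being dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sports (id INTEGER PRIMARY KEY, name TEXT);   -- ids are assumed
CREATE TABLE player_sports (
    player_id INTEGER NOT NULL,
    sport_id  INTEGER NOT NULL,
    PRIMARY KEY (player_id, sport_id)
);
INSERT INTO players VALUES
  (1, 'Ajay'), (2, 'Amol'), (3, 'Sanjay'), (4, 'Vijay'),
  (5, 'Ravi');                                   -- hypothetical, plays nothing
INSERT INTO sports VALUES (1, 'Cricket'), (2, 'Football'), (3, 'Vollyball');
INSERT INTO player_sports VALUES
  (2, 1), (3, 1), (4, 1),    -- Cricket
  (1, 2), (3, 2),            -- Football
  (4, 3), (1, 3);            -- Vollyball
""")

# One row per (player, sport); counting is a plain join and GROUP BY.
per_player = conn.execute("""
    SELECT p.name, COUNT(ps.sport_id) AS No_of_sports_played
    FROM players p
    LEFT JOIN player_sports ps ON ps.player_id = p.id
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()

print(per_player)
```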

Selecting most recent as part of group by (or other solution ...)

I've got a table where the columns that matter look like this:
username
source
description
My goal is to get the 10 most recent records where a user/source combination is unique. From the following data:
1 katie facebook loved it!
2 katie facebook it could have been better.
3 tom twitter less then 140
4 katie twitter Wowzers!
The query should return records 2,3 and 4 (assume higher IDs are more recent - the actual table uses a timestamp column).
My current solution 'works' but requires 1 select to generate the 10 records, then 1 select to get the proper description per row (so 11 selects to generate 10 records) ... I have to imagine there's a better way to go. That solution is:
SELECT max(id) as MAX_ID, username, source, topic
FROM events
GROUP BY source, username
ORDER BY MAX_ID desc;
It returns the proper ids but the wrong descriptions, so I then have to select the proper description for each record ID.
Untested, but you should be able to handle this with a join:
SELECT
fullEvent.id,
fullEvent.username,
fullEvent.source,
fullEvent.topic
FROM
events fullEvent JOIN
(
SELECT max(id) as MAX_ID, username, source
FROM events
GROUP BY source, username
) maxEvent ON maxEvent.MAX_ID = fullEvent.id
ORDER BY fullEvent.id desc
LIMIT 10;
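The join against the per-group maximum can be tried on the four sample rows from the question. This is a minimal sketch using Python's sqlite3 module as a stand-in for the asker's database; it assumes the text column is called description (the question's queries call it topic) and that a higher id means a more recent record, as the question suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, username TEXT,
                     source TEXT, description TEXT);
INSERT INTO events VALUES
  (1, 'katie', 'facebook', 'loved it!'),
  (2, 'katie', 'facebook', 'it could have been better.'),
  (3, 'tom',   'twitter',  'less then 140'),
  (4, 'katie', 'twitter',  'Wowzers!');
""")

# The subquery finds the newest id per (username, source); joining back on
# that id pulls the matching description from the same row.
rows = conn.execute("""
    SELECT e.id, e.username, e.source, e.description
    FROM events e
    JOIN (
        SELECT MAX(id) AS max_id
        FROM events
        GROUP BY source, username
    ) latest ON latest.max_id = e.id
    ORDER BY e.id DESC
    LIMIT 10
""").fetchall()

for row in rows:
    print(row)
```

This needs one statement instead of eleven, and the description always comes from the row that owns the maximum id.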