MySQL: Using Union to split multiple Select queries into respective tables - mysql

I have the following tables:
Product(maker, model, type)
PC(model, speed, ram, hd, price)
Laptop(model, speed, ram, hd, screen, price)
Printer(model, color, type, price)
I need to write a query that will return the average price of all products made by each maker, but only if that average is >= 200, in descending order by price.
I have tried 2 different methods and both get me very close but not exactly what I need:
(SELECT maker, AVG(price) as a
FROM Product NATURAL JOIN PC
WHERE price >= 200
GROUP BY maker)
UNION
(SELECT maker, AVG(price) as b
FROM Product NATURAL JOIN Laptop
WHERE price >= 200
GROUP BY maker)
UNION
(SELECT maker, AVG(price) as c
FROM Product NATURAL JOIN Printer
WHERE price >= 200
GROUP BY maker)
ORDER BY a;
The above gives me the average prices made by each maker for all the products they have made but it is all in one column so you cannot visually tell what product each average is linked to.
SELECT maker,
(SELECT AVG(price)
FROM PC
WHERE price >= 200) as 'Average Cost of PCs',
(SELECT AVG(price)
FROM Laptop
WHERE price >= 200
GROUP BY maker) as 'Average Cost of Laptops',
(SELECT AVG(price)
FROM Printer
WHERE price >= 200
GROUP BY maker) as 'Average Cost of Printers'
FROM Product
GROUP BY maker;
The above successfully gives each type of product its own column and also a column for all the makers, but it gives the average cost for all PCs, Printers, and Laptops in their respective columns instead of the average cost of each made by the maker it is parallel to.
I'm not sure which one is closer to the answer, but I've hit a wall and I'm not sure what to do. If I could get the first query to split into different columns it would be correct. If I could get the second one to average correctly it would be right.
I am very new to Stack Overflow, so I apologize if I did not ask this question in the correct format.

You can UNION the "detail" table data, and join to that, and use what is referred to as conditional aggregation (aggregate functions ignore null values) to get your averages:
SELECT p.maker
, AVG(CASE WHEN d.Type = 'PC' THEN d.price ELSE NULL END) AS pcAvg
, AVG(CASE WHEN d.Type = 'Laptop' THEN d.price ELSE NULL END) AS laptopAvg
, AVG(CASE WHEN d.Type = 'Printer' THEN d.price ELSE NULL END) AS printerAvg
FROM Product AS p
LEFT JOIN (
SELECT model, price, 'PC' AS Type FROM PC
UNION SELECT model, price, 'Laptop' AS Type FROM Laptop
UNION SELECT model, price, 'Printer' AS Type FROM Printer
) AS d ON p.model = d.model
GROUP BY p.maker
If you want to average only the prices >= 200, you can apply that filter in the unioned subqueries (WHERE price >= 200) or add an AND d.price >= 200 to each WHEN.
If you only want averages >= 200, you need to wrap the query above like so:
SELECT q.maker
, CASE WHEN q.pcAvg >= 200 THEN q.pcAvg ELSE NULL END AS pcAvg
, CASE WHEN q.laptopAvg >= 200 THEN q.laptopAvg ELSE NULL END AS laptopAvg
, CASE WHEN q.printerAvg >= 200 THEN q.printerAvg ELSE NULL END AS printerAvg
FROM (the query above) AS q
;
You cannot omit a column when its value is < 200; you can only give it a different value (here, NULL).
Sidenote: You can actually omit ELSE NULL from a CASE expression, since the lack of an ELSE implies ELSE NULL; I was just being explicit for clarity of example and intent.
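For anyone who wants to try the two steps end to end, here is a minimal sketch that runs the conditional aggregation and the >= 200 wrapper together. It uses Python's sqlite3 module as a stand-in for MySQL, and the makers, models and prices are made-up sample rows, not data from the question.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (maker TEXT, model INTEGER, type TEXT);
CREATE TABLE PC      (model INTEGER, price REAL);
CREATE TABLE Laptop  (model INTEGER, price REAL);
CREATE TABLE Printer (model INTEGER, price REAL);
INSERT INTO Product VALUES ('A', 1, 'pc'), ('A', 2, 'pc'), ('A', 3, 'laptop'),
                           ('B', 4, 'printer'), ('B', 5, 'pc');
INSERT INTO PC      VALUES (1, 100), (2, 500), (5, 150);
INSERT INTO Laptop  VALUES (3, 900);
INSERT INTO Printer VALUES (4, 250);
""")

query = """
SELECT q.maker
     , CASE WHEN q.pcAvg      >= 200 THEN q.pcAvg      END AS pcAvg
     , CASE WHEN q.laptopAvg  >= 200 THEN q.laptopAvg  END AS laptopAvg
     , CASE WHEN q.printerAvg >= 200 THEN q.printerAvg END AS printerAvg
FROM (
    -- Inner query: one row per maker, one average per product type.
    -- AVG skips the NULLs produced for rows of the other types.
    SELECT p.maker
         , AVG(CASE WHEN d.Type = 'PC'      THEN d.price END) AS pcAvg
         , AVG(CASE WHEN d.Type = 'Laptop'  THEN d.price END) AS laptopAvg
         , AVG(CASE WHEN d.Type = 'Printer' THEN d.price END) AS printerAvg
    FROM Product AS p
    LEFT JOIN (
        SELECT model, price, 'PC' AS Type FROM PC
        UNION SELECT model, price, 'Laptop'  FROM Laptop
        UNION SELECT model, price, 'Printer' FROM Printer
    ) AS d ON p.model = d.model
    GROUP BY p.maker
) AS q
ORDER BY q.maker
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Maker B's only PC costs 150, so its PC average is below the threshold and shows up as NULL (None in Python) rather than disappearing from the result.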

Related

Substract the price using Case in Mysql

I have 3 tables: training_schedules, training_discounts, and agents.
training_schedules: id, name, agent_id, price.
training_discounts: id, agent_id, schedule_id, discount.
agents: id, name
I try to subtract the price from training_schedules table with the discount column in training_discounts table like this:
SELECT ts.id, name, training_types, DATE_FORMAT(date_start ,'%d/%m/%Y') as date_start,
DATE_FORMAT(date_end ,'%d/%m/%Y') as date_end, quota, price, td.discount,
CASE price WHEN td.agent_id = 2 THEN (price - td.discount) ELSE price END as total
FROM training_schedules ts
LEFT JOIN training_discounts td on ts.id = td.schedule_id GROUP BY td.schedule_id;
But it doesn't work: the total column is still the same as price, even when agent_id matches. What can possibly be wrong with my query? Here's the SQLfiddle if needed: http://sqlfiddle.com/#!9/0cd42d/1/0
You don't need to use group by since you are not using any aggregation functions.
SELECT ts.id
, name
, training_types
, DATE_FORMAT(date_start ,'%d/%m/%Y') as date_start
, DATE_FORMAT(date_end ,'%d/%m/%Y') as date_end
, quota, price
, td.discount
, CASE WHEN td.agent_id = 2 THEN price - td.discount ELSE price END as total
FROM training_schedules ts
LEFT JOIN training_discounts td on ts.id = td.schedule_id;
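You are also using CASE in the wrong form: CASE price WHEN td.agent_id = 2 compares price with the result of the comparison td.agent_id = 2 (0, 1 or NULL) instead of testing the condition, so it almost never matches. Use the searched form, CASE WHEN ..., as above. Another option is MySQL's IF() function (agent_id is qualified because both tables have that column):
if(td.agent_id = 2, price - td.discount, price) as total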
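To see the difference between the two CASE forms side by side, here is a small sketch using Python's sqlite3 module in place of MySQL (both treat the simple form as an equality test against the WHEN value). The table layout is cut down to the columns that matter and the rows are invented for illustration.

```python
import sqlite3

# Hypothetical, trimmed-down versions of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE training_schedules (id INTEGER, price INTEGER);
CREATE TABLE training_discounts (id INTEGER, agent_id INTEGER,
                                 schedule_id INTEGER, discount INTEGER);
INSERT INTO training_schedules VALUES (1, 100), (2, 200);
INSERT INTO training_discounts VALUES (1, 2, 1, 10);
""")

rows = conn.execute("""
SELECT ts.id
     -- Simple form: tests ts.price = (td.agent_id = 2), i.e. price = 1
     , CASE ts.price WHEN td.agent_id = 2
            THEN ts.price - td.discount ELSE ts.price END AS simple_form
     -- Searched form: tests the condition itself
     , CASE WHEN td.agent_id = 2
            THEN ts.price - td.discount ELSE ts.price END AS searched_form
FROM training_schedules ts
LEFT JOIN training_discounts td ON ts.id = td.schedule_id
ORDER BY ts.id
""").fetchall()

for row in rows:
    print(row)
```

Schedule 1 has a discount for agent 2, yet only the searched form subtracts it; schedule 2 has no discount row, so both forms fall through to the plain price.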

Querying Customers who have rented a movie at least once every week or in the Weekend

I have a DB for movie_rental. The Tables I have are for :
Customer Level:
Primary key: Customer_id(INT)
first_name(VARCHAR)
last_name(VARCHAR)
Movie Level:
Primary key: Film_id(INT)
title(VARCHAR)
category(VARCHAR)
Rental Level:
Primary key: Rental_id(INT).
The other columns in this table are:
Rental_date(DATETIME)
customer_id(INT)
film_id(INT)
payment_date(DATETIME)
amount(DECIMAL(5,2))
Now the question is to Create a master list of customers categorized by the following:
Regulars, who rent at least once a week
Weekenders, for whom most of their rentals come on Saturday and Sundays
I am not looking for the code here but the logic to approach this problem. Have tried quite a number of ways but was not able to form the logic as to how I can look up for a customer id in each week. The code I tried is as follows:
select
r.customer_id
, concat(c.first_name, ' ', c.last_name) as Customer_Name
, dayname(r.rental_date) as day_of_rental
, case
when dayname(r.rental_date) in ('Monday','Tuesday','Wednesday','Thursday','Friday')
then 'Regulars'
else 'Weekenders'
end as Customer_Category
from rental r
inner join customer c on r.customer_id = c.customer_id;
I know it is not correct but I am not able to think beyond this.
First, you don't need the customer table for this. You can add that in after you have the classification.
To solve the problem, you need the following information:
The total number of rentals.
The total number of weeks with a rental.
The total number of weeks overall or with no rental.
The total number of rentals on weekend days.
You can obtain this information using aggregation:
select r.customer_id,
count(*) as num_rentals,
count(distinct yearweek(rental_date)) as num_weeks,
(to_days(max(rental_date)) - to_days(min(rental_date)) ) / 7 as num_weeks_overall,
sum(dayname(r.rental_date) in ('Saturday', 'Sunday')) as weekend_rentals
from rental r
group by r.customer_id;
Now, your question is a bit vague on thresholds and what to do if someone only rents on weekends but does so every week. So, I'll just make arbitrary assumptions for the final categorization:
select r.customer_id,
(case when num_weeks > 10 and
num_weeks >= num_weeks_overall * 0.9
then 'Regular' -- at least 10 weeks and rents in 90% of the weeks
when weekend_rentals >= 0.8 * num_rentals
then 'Weekender' -- 80% of rentals are on the weekend
else 'Hoi Polloi'
end) as category
from (select r.customer_id,
count(*) as num_rentals,
count(distinct yearweek(rental_date)) as num_weeks,
(to_days(max(rental_date)) - to_days(min(rental_date)) ) / 7 as num_weeks_overall,
sum(dayname(r.rental_date) in ('Saturday', 'Sunday')) as weekend_rentals
from rental r
group by r.customer_id
) r;
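A rough way to experiment with this aggregation without a MySQL server is to run it through Python's sqlite3 module. SQLite has no YEARWEEK() or DAYNAME(), so this sketch swaps in strftime('%Y-%W', ...) for the week key and strftime('%w', ...) for the day of week (0 is Sunday, 6 is Saturday); week boundaries may therefore differ from MySQL's default YEARWEEK mode. The rental rows are invented, and only the weekend threshold from the query above is applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY,
                     rental_date TEXT, customer_id INTEGER);
-- Customer 1: Mon, Tue, Wed and one Sat, spread over three weeks
INSERT INTO rental (rental_date, customer_id) VALUES
  ('2019-03-04 10:00:00', 1), ('2019-03-12 10:00:00', 1),
  ('2019-03-20 10:00:00', 1), ('2019-03-23 10:00:00', 1);
-- Customer 2: Sat and Sun of two consecutive weekends
INSERT INTO rental (rental_date, customer_id) VALUES
  ('2019-03-09 10:00:00', 2), ('2019-03-10 10:00:00', 2),
  ('2019-03-16 10:00:00', 2), ('2019-03-17 10:00:00', 2);
""")

rows = conn.execute("""
SELECT s.customer_id, s.num_rentals, s.num_weeks, s.weekend_rentals
     , CASE WHEN s.weekend_rentals >= 0.8 * s.num_rentals
            THEN 'Weekender' ELSE 'Other' END AS category
FROM (
    SELECT customer_id
         , COUNT(*) AS num_rentals
         -- stand-in for COUNT(DISTINCT YEARWEEK(rental_date))
         , COUNT(DISTINCT strftime('%Y-%W', rental_date)) AS num_weeks
         -- stand-in for SUM(DAYNAME(rental_date) IN ('Saturday','Sunday'))
         , SUM(strftime('%w', rental_date) IN ('0', '6')) AS weekend_rentals
    FROM rental
    GROUP BY customer_id
) AS s
ORDER BY s.customer_id
""").fetchall()

for row in rows:
    print(row)
```

The point of the inner query is that each customer collapses to a single row of counts, which is what makes a per-customer category possible at all.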
The problem with the current approach is that every rental of every customer will be treated separately. I am assuming a customer might rent more than once and so, we will need to aggregate all rental data for a customer to calculate the category.
So to create the master table, you have mentioned in the logic that weekenders are customers "for whom most of their rentals come on Saturday and Sundays", whereas regulars are customers who rent at least once a week.
2 questions:-
What is the logic for "most" for weekenders?
Are these two categories mutually exclusive? From the statement it does not seem so, because a customer might rent only on a Saturday or a Sunday.
I have tried a solution in Oracle SQL dialect (working but performance can be improved) with the logic being thus: If the customer has rented more on weekdays than on weekends, the customer is a Regular, else a Weekender. This query can be modified based on the answers to the above questions.
select
c.customer_id,
c.first_name || ' ' || c.last_name as Customer_Name,
case
when r.reg_count>r.we_count then 'Regulars'
else 'Weekenders'
end as Customer_Category
from customer c
inner join
(select customer_id, count(case when trim(to_char(rental_date, 'DAY')) in ('MONDAY','TUESDAY','WEDNESDAY','THURSDAY','FRIDAY') then 1 end) as reg_count,
count(case when trim(to_char(rental_date, 'DAY')) in ('SATURDAY','SUNDAY') then 1 end) as we_count
from rental group by customer_id) r on r.customer_id=c.customer_id;
Updated query based on clarity given in comment:-
select
c.customer_id,
c.first_name || ' ' || c.last_name as Customer_Name,
case when rg.cnt>0 then 1 else 0 end as REGULAR,
case when we.cnt>0 then 1 else 0 end as WEEKENDER
from customer c
left outer join
(select customer_id, count(rental_id) cnt from rental where trim(to_char(rental_date, 'DAY')) in ('MONDAY','TUESDAY','WEDNESDAY','THURSDAY','FRIDAY') group by customer_id) rg on rg.customer_id=c.customer_id
left outer join
(select customer_id, count(rental_id) cnt from rental where trim(to_char(rental_date, 'DAY')) in ('SATURDAY','SUNDAY') group by customer_id) we on we.customer_id=c.customer_id;
Test Data :
insert into customer values (1, 'nonsensical', 'coder');
insert into rental values(1, 1, sysdate, 1, sysdate, 500);
insert into customer values (2, 'foo', 'bar');
insert into rental values(2, 2, sysdate-5, 2, sysdate-5, 800); [Current day is Friday]
Query Output (first query):
CUSTOMER_ID  CUSTOMER_NAME      CUSTOMER_CATEGORY
1            nonsensical coder  Regulars
2            foo bar            Weekenders
Query Output (second query):
CUSTOMER_ID  CUSTOMER_NAME      REGULAR  WEEKENDER
1            nonsensical coder  0        1
2            foo bar            1        0
This is a study of cohorts. First find the minimal expression of each group:
# Weekday regulars
SELECT
customer_id
FROM rental
WHERE WEEKDAY(`date`) < 5 # 0-4 are weekdays
# Weekend warriors
SELECT
customer_id
FROM rental
WHERE WEEKDAY(`date`) > 4 # 5 and 6 are weekends
Now we know how to get a listing of customers who have rented on weekdays and weekends, inclusive. These queries only actually tell us that these were customers who visited on a day in the given series, hence we need to make some judgements.
Let's introduce a periodicity, which then allows us to gain thresholds. We'll need aggregation too, so we're going to count the weeks that are distinctly knowable by grouping to the rental.customer_id.
# Weekday regulars
SELECT
customer_id
, COUNT(DISTINCT YEARWEEK(`date`)) AS weeks_as_customer
FROM rental
WHERE WEEKDAY(`date`) < 5
GROUP BY customer_id
# Weekend warriors
SELECT
customer_id
, COUNT(DISTINCT YEARWEEK(`date`)) AS weeks_as_customer
FROM rental
WHERE WEEKDAY(`date`) > 4
GROUP BY customer_id
We also need a determinant period:
FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
Put those together:
# Weekday regulars
SELECT
customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(`date`)) AS weeks_as_customer
FROM rental
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(`date`) < 5
GROUP BY customer_id
# Weekend warriors
SELECT
customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(`date`)) AS weeks_as_customer
FROM rental
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(`date`) > 4
GROUP BY customer_id
So now we can introduce our threshold accumulator per cohort.
# Weekday regulars
SELECT
customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(`date`)) AS weeks_as_customer
FROM rental
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(`date`) < 5
GROUP BY customer_id
HAVING total_weeks = weeks_as_customer
# Weekend warriors
SELECT
customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(`date`)) AS weeks_as_customer
FROM rental
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(`date`) > 4
GROUP BY customer_id
HAVING total_weeks = weeks_as_customer
Then we can use these to subquery our master list.
SELECT
customer.customer_id
, CONCAT(customer.first_name, ' ', customer.last_name) as customer_name
, CASE
WHEN regulars.customer_id IS NOT NULL THEN 'regular'
WHEN weekenders.customer_id IS NOT NULL THEN 'weekender'
ELSE NULL
END AS category
FROM customer
LEFT JOIN (
SELECT
rental.customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(rental.`date`)) AS weeks_as_customer
FROM rental
# the period must be joined inside the derived table to be visible here
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(rental.`date`) < 5
GROUP BY rental.customer_id
HAVING total_weeks = weeks_as_customer
) AS regulars ON customer.customer_id = regulars.customer_id
LEFT JOIN (
SELECT
rental.customer_id
, period.total_weeks
, COUNT(DISTINCT YEARWEEK(rental.`date`)) AS weeks_as_customer
FROM rental
CROSS JOIN (
SELECT FLOOR(DATEDIFF(DATE(NOW()), '2019-01-01') / 7) AS total_weeks
) AS period
WHERE WEEKDAY(rental.`date`) > 4
GROUP BY rental.customer_id
HAVING total_weeks = weeks_as_customer
) AS weekenders ON customer.customer_id = weekenders.customer_id
HAVING category IS NOT NULL
There is some ambiguity as to whether customers who straddle both cohorts should be left out (for instance, regulars who missed a weekday rental in some week because they rented only on the weekend that week). You would need to work this inclusivity/exclusivity question out.
That would mean going back to the cohort-specific queries and tuning them, and/or adding further subqueries that cut across cohorts and can be combined in the master query.
However, I think what I've provided matches reasonably with what you've asked for, given this caveat.

Select column(s) corresponding to max/min of another column without joins

I have a table (id, employee_id, device_id, logged_time) [simplified] that logs attendances of employees from biometric devices.
I generate reports showing the first in and last out time of each employee by date.
Currently, I am able to fetch the first in and last out time of each employee by date, but I also need to fetch the first in and last out device_ids of each employee. The entries are not in sequential order of the logged time.
I do not want to (and probably cannot) use joins as in one of the reports the columns are dynamically generated and can lead to thousands of joins. Furthermore, these are subqueries and are joined to other queries to get further details.
A sample setup of the table and queries are at http://sqlfiddle.com/#!9/3bc755/4
The first one just lists the entry and exit time of every employee by date
select
attendance_logs.employee_id,
DATE(attendance_logs.logged_time) as date,
TIME(MIN(attendance_logs.logged_time)) as entry_time,
TIME(MAX(attendance_logs.logged_time)) as exit_time
from attendance_logs
group by date, attendance_logs.employee_id
The second one builds up an attendance chart given a date range
select
`attendance_logs`.`employee_id`,
DATE(MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END)) as date_2017_09_18,
MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END) as entry_2017_09_18,
MAX(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END) as exit_2017_09_18,
DATE(MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END)) as date_2017_09_19,
MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END) as entry_2017_09_19,
MAX(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END) as exit_2017_09_19
/*
* dynamically generated columns for dates in date range
*/
from `attendance_logs`
where `attendance_logs`.`logged_time` >= '2017-09-18 00:00:00' and `attendance_logs`.`logged_time` <= '2017-09-19 23:59:59'
group by `attendance_logs`.`employee_id`;
Tried:
Similar to max and min logged_time of each date using case, tried to select the device_id where logged_time is max/min.
MIN(case
when
`attendance_logs`.`logged_time` = MIN(
case when DATE(`attendance_logs`.`logged_time`)
= '2017-09-18' THEN `attendance_logs`.`logged_time` END
)
then `attendance_logs`.`device_id` end) as entry_device_2017_09_18
This results in invalid use of group by
A quick hack for your query is to pick the device id for in and out by using GROUP_CONCAT within SUBSTRING_INDEX. Order the concatenated ids by logged_time, so that the first element belongs to the latest log of that date (use asc instead of desc for the entry device):
SUBSTRING_INDEX(GROUP_CONCAT(case when DATE(`l`.`logged_time`) = '2017-09-18' THEN `l`.`device_id` END ORDER BY `l`.`logged_time` desc),',',1) exit_device_2017_09_18,
Or, if the device id is always the same for an in and its out, it can simply be written with GROUP_CONCAT alone:
GROUP_CONCAT(DISTINCT case when DATE(`l`.`logged_time`) = '2017-09-18' THEN `l`.`device_id` END)
To avoid joins I suggest you try "correlated subqueries" instead:
select
employee_id
, logdate
, TIME(entry_time) entry_time
, (select MIN(l.device_id)
from attendance_logs l
where l.employee_id = d.employee_id
and l.logged_time = d.entry_time) entry_device
, TIME(exit_time) exit_time
, (select MAX(l.device_id)
from attendance_logs l
where l.employee_id = d.employee_id
and l.logged_time = d.exit_time) exit_device
from (
select
attendance_logs.employee_id
, DATE(attendance_logs.logged_time) as logdate
, MIN(attendance_logs.logged_time) as entry_time
, MAX(attendance_logs.logged_time) as exit_time
from attendance_logs
group by
attendance_logs.employee_id
, DATE(attendance_logs.logged_time)
) d
;
see: http://sqlfiddle.com/#!9/06e0e2/3
Note: I have used MIN() and MAX() on those subqueries only to avoid any possibility that these return more than one value. You could use limit 1 instead if you prefer.
Note also: I do not normally recommend correlated subqueries as they can cause performance issues, but they do supply the data you need.
oh, and please try to avoid using date as a column name, it isn't good practice.
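If the sqlfiddle link is unavailable, the correlated-subquery approach can be tried locally with a sketch like the one below. It runs the same query shape through Python's sqlite3 module (SQLite's date() and time() play the role of MySQL's DATE() and TIME()), and the log rows are invented and deliberately inserted out of chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance_logs (id INTEGER PRIMARY KEY, employee_id INTEGER,
                              device_id INTEGER, logged_time TEXT);
-- Hypothetical rows, not in order of logged_time
INSERT INTO attendance_logs (employee_id, device_id, logged_time) VALUES
  (1, 20, '2017-09-18 17:05:00'),
  (1, 10, '2017-09-18 09:00:00'),
  (1, 30, '2017-09-18 12:30:00'),
  (2, 30, '2017-09-18 08:45:00'),
  (2, 10, '2017-09-18 18:10:00');
""")

rows = conn.execute("""
SELECT d.employee_id
     , d.logdate
     , time(d.entry_time) AS entry_time
       -- device of the row whose timestamp equals the day's minimum
     , (SELECT MIN(l.device_id) FROM attendance_logs l
        WHERE l.employee_id = d.employee_id
          AND l.logged_time = d.entry_time) AS entry_device
     , time(d.exit_time) AS exit_time
       -- device of the row whose timestamp equals the day's maximum
     , (SELECT MAX(l.device_id) FROM attendance_logs l
        WHERE l.employee_id = d.employee_id
          AND l.logged_time = d.exit_time) AS exit_device
FROM (
    SELECT employee_id
         , date(logged_time) AS logdate
         , MIN(logged_time)  AS entry_time
         , MAX(logged_time)  AS exit_time
    FROM attendance_logs
    GROUP BY employee_id, date(logged_time)
) AS d
ORDER BY d.employee_id, d.logdate
""").fetchall()

for row in rows:
    print(row)
```

Employee 1 used devices 20, 10 and 30 during the day, and the result picks 10 for the 09:00 entry and 20 for the 17:05 exit, which is the lookup by timestamp rather than by smallest or largest device id.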

Error code 1248, SQL state 42000: Every derived table must have its own alias

I am running this query on MySQL.
Select Vendor, sum(Rate) as Rate
from (select case Vendor when 'NSN' then 'Nokia' else Vendor end as Vendor, Rate
from ( Select vendor ,(count(1) )*100/(Select count(id_incident)from incident where open_time between '2015-01-01'and'2015-01-30') as Rate from incident where open_time between '2015-01-01'and'2015-01-30'group by upper (vendor) )) as y group by vendor;
and it is giving this error:
"Error code 1248, SQL state 42000: Every derived table must have its own alias".
What's the problem?
You forgot to give the inner subquery an alias. I chose x
Select Vendor, sum(Rate) as Rate
from
(
select case Vendor when 'NSN' then 'Nokia' else Vendor end as Vendor, Rate
from
(
Select vendor ,(count(1) )*100/(Select count(id_incident)from incident where open_time between '2015-01-01'and'2015-01-30') as Rate
from incident
where open_time between '2015-01-01'and'2015-01-30'
group by upper (vendor)
) as x
) as y
group by vendor;
Your query is way more complicated than it needs to be:
Select (case when vendor = 'NSN' then 'NOKIA' else upper(vendor) end) as vendor,
count(*)*100 / overall.cnt as Rate
from incident i cross join
(Select count(*) as cnt
from incident
where open_time between '2015-01-01'and'2015-01-30'
) overall
where open_time between '2015-01-01'and'2015-01-30'
group by (case when vendor = 'NSN' then 'NOKIA' else upper(vendor) end)
Comments:
Subqueries (in MySQL) incur extra overhead for materializing the intermediate results.
A subquery in the select clause is called once for each row. In the from, it is only calculated once.
If id_incident is not NULL, then just use count(*) (or count(1)). It is misleading to put a column in the count(), when you are not really checking for NULL values.
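As a quick check of the simplified query, here is a sketch that runs it on a handful of made-up incidents via Python's sqlite3 module instead of MySQL. Two details are my own assumptions rather than part of the answer: the output column is aliased vendor_name to keep it distinct from the base column, and overall.cnt is added to the GROUP BY, which does not change the result but should keep the query acceptable under strict GROUP BY settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incident (id_incident INTEGER PRIMARY KEY,
                       vendor TEXT, open_time TEXT);
-- Hypothetical rows; the last one falls outside the date range
INSERT INTO incident (vendor, open_time) VALUES
  ('NSN',      '2015-01-05'),
  ('Nokia',    '2015-01-10'),
  ('ericsson', '2015-01-12'),
  ('Ericsson', '2015-01-20'),
  ('Huawei',   '2015-01-25'),
  ('NSN',      '2015-02-15');
""")

rows = conn.execute("""
SELECT CASE WHEN i.vendor = 'NSN' THEN 'NOKIA'
            ELSE upper(i.vendor) END AS vendor_name
     , COUNT(*) * 100.0 / overall.cnt AS rate
FROM incident i
CROSS JOIN (
    -- computed once: total incidents in the period
    SELECT COUNT(*) AS cnt
    FROM incident
    WHERE open_time BETWEEN '2015-01-01' AND '2015-01-30'
) AS overall
WHERE i.open_time BETWEEN '2015-01-01' AND '2015-01-30'
GROUP BY CASE WHEN i.vendor = 'NSN' THEN 'NOKIA'
              ELSE upper(i.vendor) END
       , overall.cnt
ORDER BY vendor_name
""").fetchall()

for row in rows:
    print(row)
```

NSN and Nokia are folded into one group, as are the two spellings of Ericsson, and the percentages are taken over the five incidents inside the date range.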

Showing taxes per invoice items from database

I have a problem with getting tax values from the database. I will simplify it as much as I can.
First table is
Invoices
(
`Id`,
`Date`,
`InvoiceNumber`,
`Total`
)
Second table is
`InvoiceItems`
(
`Id`,
`Total`,
`TotalWithoutTax`,
`TotalTax`,
`InvoiceId`
)
InvoiceId is a foreign key for Id column from previous table Invoices
Third table is
`InvoiceItemTaxes`
(
`Id`,
`TaxAmmount`,
`InvoiceItemId`,
`TaxId`
)
and fourth table
`Taxes`
(
`Id`,
`Value`
)
This last table contains three taxes, let's say 3, 10 and 15 percent.
I am trying to get something like this - table with columns InvoiceNumber, Total without taxes, Tax1, Tax2, Tax3 and Total with taxes.
I tried a lot of different approaches and I simply cannot get the tax amount for every invoice. The end result would be a table where I can see every invoice with the amount of every tax (the sum of each tax amount over all of that invoice's items).
If I'm understanding correctly, you can use conditional aggregation with sum and case to get the breakdown by tax group:
select i.id, i.invoicenumber, i.total as pretaxtotal,
sum(case when t.value = 3 then iit.TaxAmmount end) taxes_3,
sum(case when t.value = 10 then iit.TaxAmmount end) taxes_10,
sum(case when t.value = 15 then iit.TaxAmmount end) taxes_15,
sum(ii.Total) as overalltotal
from invoices i
join InvoiceItems ii on i.id = ii.invoiceid
join InvoiceItemTaxes iit on ii.id = iit.InvoiceItemId
join Taxes t on t.id = iit.taxid
group by i.id, i.invoicenumber, i.total
Some of the fields may be a little off -- the sample data was not complete.
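Here is a small sketch of that pivot on made-up rows, run through Python's sqlite3 module rather than MySQL. One thing it does differently, as an assumption about what the asker wants: the pre-tax and overall totals are summed from InvoiceItems in their own subquery. Summing ii.Total inside the same join as InvoiceItemTaxes would count an item once per tax row, so an item carrying two taxes would be added twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Invoices (Id INTEGER, InvoiceNumber TEXT, Total INTEGER);
CREATE TABLE InvoiceItems (Id INTEGER, Total INTEGER, TotalWithoutTax INTEGER,
                           TotalTax INTEGER, InvoiceId INTEGER);
CREATE TABLE InvoiceItemTaxes (Id INTEGER, TaxAmmount INTEGER,
                               InvoiceItemId INTEGER, TaxId INTEGER);
CREATE TABLE Taxes (Id INTEGER, Value INTEGER);

-- Hypothetical data: item 1 carries two taxes (3% and 10%)
INSERT INTO Taxes VALUES (1, 3), (2, 10), (3, 15);
INSERT INTO Invoices VALUES (1, 'INV-1', 343), (2, 'INV-2', 55);
INSERT INTO InvoiceItems VALUES (1, 113, 100, 13, 1),
                                (2, 230, 200, 30, 1),
                                (3,  55,  50,  5, 2);
INSERT INTO InvoiceItemTaxes VALUES (1, 3, 1, 1), (2, 10, 1, 2),
                                    (3, 30, 2, 3), (4, 5, 3, 2);
""")

rows = conn.execute("""
SELECT i.InvoiceNumber, tot.without_tax
     , tx.tax_3, tx.tax_10, tx.tax_15
     , tot.with_tax
FROM Invoices i
JOIN (
    -- item totals, aggregated before any join to the tax rows
    SELECT InvoiceId
         , SUM(TotalWithoutTax) AS without_tax
         , SUM(Total)           AS with_tax
    FROM InvoiceItems
    GROUP BY InvoiceId
) AS tot ON tot.InvoiceId = i.Id
LEFT JOIN (
    -- conditional aggregation: one column per tax rate
    SELECT ii.InvoiceId
         , SUM(CASE WHEN t.Value = 3  THEN iit.TaxAmmount END) AS tax_3
         , SUM(CASE WHEN t.Value = 10 THEN iit.TaxAmmount END) AS tax_10
         , SUM(CASE WHEN t.Value = 15 THEN iit.TaxAmmount END) AS tax_15
    FROM InvoiceItems ii
    JOIN InvoiceItemTaxes iit ON ii.Id = iit.InvoiceItemId
    JOIN Taxes t ON t.Id = iit.TaxId
    GROUP BY ii.InvoiceId
) AS tx ON tx.InvoiceId = i.Id
ORDER BY i.InvoiceNumber
""").fetchall()

for row in rows:
    print(row)
```

A tax rate that never occurs on an invoice comes back as NULL (None); wrap the SUMs in COALESCE(..., 0) if zeros are preferred.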