Query for joining and counting some data from multiple tables - mysql

For a given datetime range, I want to get:
Airport name,
Number of flights for that airport (the number of times the airport is the arrivalAirport in Flight, where the departureDate of that same Flight is in the given datetime range),
Number of sold tickets for that airport (dateOfSale has to be in the given datetime range) and
Total price of those sold tickets for that airport.
I have 3 tables (Airport, Flight and Ticket):
Airport data:
id | name
1 | Madrid Airport
2 | Amsterdam Airport
3 | Belgrade Airport
----------------------------------------------
Flight data:
id | number | departureDate | arrivalDate | departureAirport | arrivalAirport | price
1 | 101 | 2019-01-29 19:21:44 | 2019-01-29 22:21:44 | Madrid Airport | Amsterdam Airport | 600
2 | 102 | 2019-01-29 22:21:44 | 2019-01-30 00:21:44 | Madrid Airport | Belgrade Airport | 450
3 | 103 | 2019-01-30 20:21:44 | 2019-01-30 22:21:44 | Belgrade Airport | Amsterdam Airport | 555
4 | 104 | 2019-02-10 20:21:44 | 2019-02-10 22:21:44 | Belgrade Airport | Madrid Airport | 555
----------------------------------------------
Ticket data:
id | FlightId | dateOfSale
1 | 3 | 2019-01-23 00:00:00
2 | 3 | 2019-01-27 10:00:00
3 | 1 | 2019-01-27 13:00:00
Example datetime range:
Minimal: 2019-01-25 19:21:44
Maximal: 2019-02-02 00:00:00
With the given datetime range, only the fourth flight fails the condition, because its departureDate is outside the range; the other three flights pass.
So now we have Amsterdam Airport (x2) and Belgrade Airport (x1) as arrivalAirports.
So the first two columns should look like this:
Name of Airport | Number of Flights
Amsterdam Airport | 2
Belgrade Airport | 1
The third column is the number of sold tickets whose dateOfSale is also in the given datetime range.
In the Ticket table, 3 tickets were sold: the first two were bought for the flight with id=3 and the third for the flight with id=1.
Since the first ticket's dateOfSale is not in the given datetime range, only the 2nd and 3rd tickets pass, and both belong to flights whose arrivalAirport is Amsterdam Airport.
So the end result should be:
Name of Airport | Number of Flights | Number of sold tickets | Total price
Amsterdam Airport | 2 | 2 | 1155
Belgrade Airport | 1 | 0 | 0
I tried something like this:
select a.name,
       count(f.arrivalAirport) as 'number of flights',
       count(t.dateOfSale) as 'number of sold tickets',
       sum(f.price)
from airport a, flight f, ticket t
where a.name = f.arrivalAirport
  and f.departureDate > '2019-01-25 19:21:44'
  and f.departureDate < '2019-02-02 00:00:00'
  and t.flightId = f.id
  and t.dateOfSale > '2019-01-25 19:21:44'
  and t.dateOfSale < '2019-02-02 00:00:00'
group by a.name;
With only this, the number of sold tickets is counted exactly as it should be, but the number of flights is wrong.
I am not sure how to proceed any further. What am I missing in this query? Can it be done like this, or does it have to use a join (which I also have problems with)?
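One possible way to get both counts right is to count the tickets per flight in a derived table first and only then join that result to the flights, so a flight with several tickets is still counted once and a flight with no tickets is not dropped. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made so the example is self-contained), with the table and column names from the question. The aliases sold, flights, sold_tickets and total_price are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); the data is the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airport (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE flight (id INTEGER PRIMARY KEY, number INTEGER, departureDate TEXT,
                     arrivalDate TEXT, departureAirport TEXT, arrivalAirport TEXT,
                     price INTEGER);
CREATE TABLE ticket (id INTEGER PRIMARY KEY, flightId INTEGER, dateOfSale TEXT);
INSERT INTO airport VALUES (1,'Madrid Airport'),(2,'Amsterdam Airport'),(3,'Belgrade Airport');
INSERT INTO flight VALUES
 (1,101,'2019-01-29 19:21:44','2019-01-29 22:21:44','Madrid Airport','Amsterdam Airport',600),
 (2,102,'2019-01-29 22:21:44','2019-01-30 00:21:44','Madrid Airport','Belgrade Airport',450),
 (3,103,'2019-01-30 20:21:44','2019-01-30 22:21:44','Belgrade Airport','Amsterdam Airport',555),
 (4,104,'2019-02-10 20:21:44','2019-02-10 22:21:44','Belgrade Airport','Madrid Airport',555);
INSERT INTO ticket VALUES
 (1,3,'2019-01-23 00:00:00'),(2,3,'2019-01-27 10:00:00'),(3,1,'2019-01-27 13:00:00');
""")

QUERY = """
SELECT a.name,
       COUNT(*)                           AS flights,
       COALESCE(SUM(t.sold), 0)           AS sold_tickets,
       COALESCE(SUM(t.sold * f.price), 0) AS total_price
FROM airport a
JOIN flight f ON f.arrivalAirport = a.name
-- tickets are counted per flight BEFORE the join, so each flight stays one row
LEFT JOIN (SELECT flightId, COUNT(*) AS sold
           FROM ticket
           WHERE dateOfSale > :lo AND dateOfSale < :hi
           GROUP BY flightId) t ON t.flightId = f.id
WHERE f.departureDate > :lo AND f.departureDate < :hi
GROUP BY a.name
ORDER BY a.name
"""

rows = conn.execute(QUERY, {"lo": "2019-01-25 19:21:44",
                            "hi": "2019-02-02 00:00:00"}).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps flights without any ticket in range (Belgrade Airport here), and COALESCE turns their NULL sums into 0.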

Related

Grouping in Mysql

I need to get the nationality with the top touristCount in each month. For example, if Zambia has a touristCount of 4 in January, I need to select only Zambia for January, and so on.
user
`user_id` | `username` | `email` | `nationality`
1 Joseph `` US
2 Abraham. `` UK
3 g.wood '' Zambia
4 Messi. '' France
5 Ronaldo. '' Namibia
6 Pogba. '' Holand.
bookings
booking_id | user_id | booking_date | tour_id
1 1 2022-01-01 1
2 1 2022-01-01 6
3 1 2022-05-01 2
4 3 2022-01-01 5
5 2 2022-04-01 5
6 2 2022-11-01 7
7 3 2022-12-01 2
8 6 2022-01-01 1
This is what I have tried:
SELECT s.nationality AS Nationality,
COUNT(b.tourist_id) AS touristsCount,
MONTH(STR_TO_DATE(b.booked_date, '%d-%m-%Y')) AS `MonthNumber`
FROM bookings b, users s
WHERE s.user_id = b.tourist_id
AND YEAR(STR_TO_DATE(b.booked_date, '%d-%m-%Y')) = '2022'
GROUP BY Nationality,MonthNumber
order BY MonthNumber ASC
LIMIT 100
I need the results to look like this:
nationality | TouristIdCount | MonthNumber
US 2 01
UK 1 04
US 1 05
UK 1 11
ZAMBIA 1 12
Try this:
SELECT nationality, COUNT(booking_id) AS TouristIdCount, MONTH(booking_date) AS MonthNumber
FROM users u
JOIN bookings b ON u.user_id = b.user_id
WHERE YEAR(booking_date) = 2022
GROUP BY nationality, MonthNumber
ORDER BY TouristIdCount DESC, MonthNumber ASC
You can use:
having COUNT(b.tourist_id) >= 2
You want to count bookings per month and tourist's nationality and then show only the top nationality (or nationalities) per month.
There are two very similar approaches:
Rank the nationalities' booking counts per month with RANK and only show the best ranked rows.
Select the top booking count per month and only show rows matching their top count.
The following query uses the second method. It shows one row per month and top booking nationality. Often there will be exactly one row for a month, showing the one top booking nationality, but there may also be months where nationalities tie and share the same top booking count, in which case we see more than one row for that month.
select year, month, nationality, booking_count
from
(
  select
    year(b.booking_date) as year,
    month(b.booking_date) as month,
    u.nationality,
    count(*) as booking_count,
    max(count(*)) over (partition by year(b.booking_date), month(b.booking_date)) as months_max_booking_count
  from bookings b
  join users u on u.user_id = b.tourist_id
  group by year(b.booking_date), month(b.booking_date), u.nationality
) ranked
where booking_count = months_max_booking_count
order by year, month, nationality;
As your own sample data doesn't contain any edge cases, here is some other sample data along with my query's result and an explanation. (In other words, this is what you should have shown in your request ideally.)
users
user_id | username | email | nationality
1 | Joseph | joseph#mail.us | US
2 | Mary | mary#mail.us | US
3 | Abraham | abraham#mail.uk | UK
bookings
booking_id | user_id | booking_date | tour_id
1 | 1 | 2022-01-11 | 1
2 | 2 | 2022-01-11 | 1
3 | 3 | 2022-01-11 | 1
4 | 3 | 2022-01-22 | 2
5 | 1 | 2022-05-01 | 3
6 | 2 | 2022-05-01 | 3
7 | 1 | 2022-05-12 | 4
8 | 2 | 2022-05-12 | 4
9 | 3 | 2022-05-14 | 5
10 | 3 | 2022-05-20 | 6
11 | 3 | 2022-05-27 | 7
result
year | month | nationality | booking_count
2022 | 1 | UK | 2
2022 | 1 | US | 2
2022 | 5 | US | 4
In January there were two tours, but we are not interested in tours. We see four bookings, two by the Americans, two by the British person. This is a tie, and we show two rows, one for UK and one for US with two bookings each.
In May there were five tours, but again, we are not interested in tours. There are seven bookings, four by the Americans, three by the British person. So we only show US as the top country with four bookings here.
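For comparison, the first approach mentioned above (ranking with RANK) might look roughly like the sketch below. It is only an illustration under some assumptions: Python's built-in sqlite3 module stands in for MySQL (window functions need SQLite 3.25 or later), the join uses the user_id column from the sample tables, strftime replaces MySQL's YEAR() and MONTH(), and the names counted, ranked, yr, mon and rnk are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL 8 here (assumption); data is the sample data above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT, nationality TEXT);
CREATE TABLE bookings (booking_id INTEGER PRIMARY KEY, user_id INTEGER,
                       booking_date TEXT, tour_id INTEGER);
INSERT INTO users VALUES (1,'Joseph','US'),(2,'Mary','US'),(3,'Abraham','UK');
INSERT INTO bookings VALUES
 (1,1,'2022-01-11',1),(2,2,'2022-01-11',1),(3,3,'2022-01-11',1),(4,3,'2022-01-22',2),
 (5,1,'2022-05-01',3),(6,2,'2022-05-01',3),(7,1,'2022-05-12',4),(8,2,'2022-05-12',4),
 (9,3,'2022-05-14',5),(10,3,'2022-05-20',6),(11,3,'2022-05-27',7);
""")

QUERY = """
WITH counted AS (
  -- bookings per month and nationality
  SELECT CAST(strftime('%Y', b.booking_date) AS INTEGER) AS yr,
         CAST(strftime('%m', b.booking_date) AS INTEGER) AS mon,
         u.nationality,
         COUNT(*) AS booking_count
  FROM bookings b
  JOIN users u ON u.user_id = b.user_id
  GROUP BY yr, mon, u.nationality
),
ranked AS (
  -- RANK gives tied nationalities the same rank, so ties both get rank 1
  SELECT counted.*,
         RANK() OVER (PARTITION BY yr, mon ORDER BY booking_count DESC) AS rnk
  FROM counted
)
SELECT yr, mon, nationality, booking_count
FROM ranked
WHERE rnk = 1
ORDER BY yr, mon, nationality
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On this sample data it returns the same three rows as the result table above, including both tied rows for January.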

Version control elements in a database-table using MySQL

I am developing a database using MySQL. The database (shopping) consists of one table (basket). When a customer comes in and purchases groceries, the items are stored in the table under a unique customer ID. For example, a customer named ABC came to my store, purchased 10 apples, and paid the bill. After 4 hours, he returned four apples to the shop, got his money back for them, and left. Therefore, I need to version-control those records in my table, so I can track each customer's history.
MySQL code:
SHOW DATABASES;
CREATE DATABASE shopping;
USE shopping;
-- The Timestamp column needs a type; DATETIME is assumed here.
CREATE TABLE basket (Customer_ID VARCHAR (20) NOT NULL, Phone INT, grocery VARCHAR (10), QTY INT, Timestamp DATETIME, Version VARCHAR (10));
The table looks like
Customer_ID Phone Grocery QTY Timestamp Version
ABC 34567 Apple 10 1/20/2020 7:00 am A1
ABC 34567 banana 5 1/20/2020 7:00 am B1
ABC 34567 oranges 4 1/20/2020 7:00 am O1
DEF 12345 jelly 10 1/20/2020 8:00am J1
DEF 12345 pineapple 6 1/20/2020 8:00am P1
GHI 67854 juice 4 1/20/2020 9:00 am J1
GHI 67854 icecream 6 1/20/2020 9:00 am I1
ABC 34567 Apple -4 1/20/2020 11:00 am A2
Now I need to record the second instance of customer ABC, when he returned four apples to the shop, and version it as A2, so I can track all the changes for each customer.
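One way this could work is to treat the table as append-only and derive the version label at insert time. The sketch below assumes the label is the first letter of the grocery plus a running number per customer and grocery, which is inferred from the sample rows and may not be the intended scheme. It uses Python's built-in sqlite3 module as a stand-in for MySQL, the helper record_change is hypothetical, and the timestamps are rewritten in ISO format so they sort correctly as text.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE basket (Customer_ID TEXT NOT NULL, Phone INTEGER,
                grocery TEXT, QTY INTEGER, Timestamp TEXT, Version TEXT)""")

def record_change(customer_id, phone, grocery, qty, timestamp):
    """Append one row and return its version label (hypothetical helper).

    Assumed scheme: first letter of the grocery + running number per
    customer and grocery, e.g. A1, A2, ...
    """
    (seen,) = conn.execute(
        "SELECT COUNT(*) FROM basket "
        "WHERE Customer_ID = ? AND lower(grocery) = lower(?)",
        (customer_id, grocery),
    ).fetchone()
    version = f"{grocery[0].upper()}{seen + 1}"
    conn.execute(
        "INSERT INTO basket VALUES (?, ?, ?, ?, ?, ?)",
        (customer_id, phone, grocery, qty, timestamp, version),
    )
    return version

first = record_change("ABC", 34567, "Apple", 10, "2020-01-20 07:00")
record_change("ABC", 34567, "banana", 5, "2020-01-20 07:00")
second = record_change("ABC", 34567, "Apple", -4, "2020-01-20 11:00")  # the return

# Full history of one customer's apples, oldest first, plus the net quantity.
history = conn.execute(
    "SELECT QTY, Version FROM basket "
    "WHERE Customer_ID = 'ABC' AND grocery = 'Apple' ORDER BY Timestamp"
).fetchall()
net_qty = sum(qty for qty, _ in history)
print(first, second, history, net_qty)
```

Because rows are never updated, every change stays visible, and the current quantity is the sum of QTY over the history.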

How to display data without using JOIN and Counting Number of employees?

I am having trouble with this question. I can't seem to get the count correct for each department, select only the highest one, and exclude "DALLAS".
This is the question:
"Write a SQL statement to display the name and location of all departments
(except the departments located in Dallas) with the highest number of
employees.
You cannot use join operations in your SQL statement (e.g., … FROM department,
employee WHERE …, department INNER JOIN employee ON …)."
DEPARTMENT_ID DEPARTMENT_NAME LOCATION
------------- -------------------- --------------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 IT DALLAS
50 EXECUTIVE NEW YORK
60 MARKETING CHICAGO
6 rows selected
EMPLOYEE_ID EMPLOYEE_NAME JOB_TITLE SUPERVISOR_ID HIRE_DATE SALARY COMMISSION DEPARTMENT_ID
----------- -------------------- -------------------------------------------------- ------------- --------- ---------- ---------- -------------
7839 KING PRESIDENT 20-NOV-01 5000 50
7596 JOST VICE PRESIDENT 7839 04-MAY-01 4500 50
7603 CLARK VICE PRESIDENT 7839 12-JUN-01 4000 50
7566 JONES CHIEF ACCOUNTANT 7596 05-APR-01 3000 10
7886 STEEL PUBLIC ACCOUNTANT 7566 08-MAR-03 2500 10
7610 WILSON BUSINESS ANALYST 7596 03-DEC-01 3000 20
7999 WOLFE TEST ANALYST 7610 15-FEB-02 2500 20
7944 LEE REPORTING ANALYST 7610 04-SEP-06 2400 20
7900 FISHER SALES EXECUTIVE 7603 06-DEC-01 3000 500 30
7921 JACKSON SALES REPRESENTATIVE 7900 25-FEB-05 2500 400 30
7952 LANCASTER SALES CONSULTANT 7900 06-DEC-06 2000 150 30
7910 SMITH DATABASE ADMINISTRATOR 7596 20-DEC-01 2900 40
7788 SCOTT PROGRAMMER 7910 15-JAN-03 2500 40
7876 ADAMS PROGRAMMER 7910 15-JAN-03 2000 40
7934 MILLER PROGRAMMER 7876 25-JAN-02 1000 40
8000 BREWSTER TBA 22-AUG-13 2500
8100 PHILLIPS TBA 7839 21-AUG-13 2800
7400 SMITH VICE PRESIDENT 7839 16-FEB-01 4300 50
7700 ANDRUS PUBLIC ACCOUNTANT 7566 18-FEB-02 2500 10
7601 SAMPSON PROGRAMMER 7910 09-JAN-01 2500 40
7588 DODSON TEST ANALYST 7610 02-AUG-08 2500 20
7888 SANDY SALES CONSULTANT 7900 05-AUG-04 2500 30
22 rows selected
SELECT DEPARTMENT_NAME,
location,
count(*)
FROM DEPARTMENT
WHERE department_id IN ( SELECT department_id
FROM department
WHERE UPPER(location) <> 'DALLAS'
)
group by department_NAME, location
ORDER BY location;
DEPARTMENT_NAME LOCATION COUNT(*)
-------------------- -------------------- ----------
MARKETING CHICAGO 1
SALES CHICAGO 1
ACCOUNTING NEW YORK 1
EXECUTIVE NEW YORK 1
You can try using subqueries if you are not allowed to use joins:
SELECT *
FROM (SELECT d.department_name,
d.location,
(SELECT COUNT(employee_id)
FROM employee e
WHERE e.department_id = d.department_id) no_employees
FROM department d
WHERE d.location <> 'DALLAS'
) t
WHERE no_employees = (SELECT COUNT(employee_id)
FROM employee
WHERE department_id IN (SELECT DISTINCT department_id
FROM department
WHERE location <> 'DALLAS')
GROUP BY department_id
ORDER BY 1 DESC
LIMIT 1)
Result
department_name location no_employees
SALES CHICAGO 4
EXECUTIVE NEW YORK 4
I am trying to find the department with the maximum count and then retrieve the corresponding name and location without using joins:
-- A scalar subquery may return only one column, so name and location are fetched separately.
SELECT (SELECT department_name
FROM department
WHERE department_id = q.department_id) AS department_name,
(SELECT location
FROM department
WHERE department_id = q.department_id) AS location,
q.ct countofdept
FROM
(SELECT count(*) ct, department_id
FROM EMPLOYEE
WHERE department_id in ( SELECT department_id
FROM department
WHERE UPPER(location) <> 'DALLAS'
)
GROUP BY department_id
ORDER BY ct desc
LIMIT 1) q
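Note that LIMIT 1 returns a single department even when several share the highest count. A variant that keeps ties, still without joins, could compare each department's employee count against the maximum count. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the real database (an assumption), loads just the columns needed from the sample data, and the aliases c and cnt are made up for the example.

```python
import sqlite3

# SQLite stands in for the real database (assumption); only the columns
# needed for the count are loaded from the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (department_id INTEGER PRIMARY KEY,
                         department_name TEXT, location TEXT);
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, department_id INTEGER);
INSERT INTO department VALUES
 (10,'ACCOUNTING','NEW YORK'),(20,'RESEARCH','DALLAS'),(30,'SALES','CHICAGO'),
 (40,'IT','DALLAS'),(50,'EXECUTIVE','NEW YORK'),(60,'MARKETING','CHICAGO');
INSERT INTO employee VALUES
 (7839,50),(7596,50),(7603,50),(7566,10),(7886,10),(7610,20),(7999,20),(7944,20),
 (7900,30),(7921,30),(7952,30),(7910,40),(7788,40),(7876,40),(7934,40),
 (8000,NULL),(8100,NULL),(7400,50),(7700,10),(7601,40),(7588,20),(7888,30);
""")

QUERY = """
SELECT d.department_name, d.location
FROM department d
WHERE d.location <> 'DALLAS'
  -- employees in this department (correlated subquery, no join)
  AND (SELECT COUNT(*) FROM employee e
       WHERE e.department_id = d.department_id)
      -- highest employee count among the non-Dallas departments
    = (SELECT MAX(c.cnt)
       FROM (SELECT COUNT(*) AS cnt
             FROM employee
             WHERE department_id IN (SELECT department_id
                                     FROM department
                                     WHERE location <> 'DALLAS')
             GROUP BY department_id) AS c)
ORDER BY d.department_name
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample data, SALES and EXECUTIVE both have four employees, so both rows come back.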

MySQL self join to get past averages

I am trying to find an average of past records in the database based on a specific time frame (between 9 and 3 months ago) if there is no value recorded for a recent sale. The reason is that recent sales on our website sometimes do not immediately collect commissions, so I need to go back to historic records to find out what a commission rate estimate might be.
Commission rate is calculated as:
total_commission / gross_sales
It is only necessary to find out what an estimate would be if a recent sale has no "total_commission" recorded
Here is what I have tried so far, but I think it is wrong:
SELECT
cs.*
,SUM(cs2.gross_sales)
,SUM(cs2.total_commission)
FROM
(SELECT
sale_id
, date
, customer_code
, customer_country
, gross_sales
, total_commission
FROM customer_sale cs ) cs
LEFT JOIN customer_sale cs2
ON cs2.customer_code = cs.customer_code
AND cs2.customer_country = cs.customer_country
AND cs2.date > cs.date - interval 9 month
AND cs2.date < cs.date - interval 3 month
GROUP BY cs.sale_id
The data is structured as follows:
sale_id date customer_code customer_country gross_sales total_commission
1 2013-12-01 cust1 united states 10000 1500
2 2013-12-01 cust2 france 20000 3000
3 2013-12-01 cust3 united states 15000 2250
4 2013-12-01 cust4 france 14000 2100
5 2013-12-01 cust5 united states 13000 1950
6 2013-12-01 cust6 france 12000 1800
7 2014-04-02 cust1 united states 10000
8 2014-04-02 cust2 france 20000
9 2014-04-02 cust3 united states 15000
10 2014-04-02 cust4 france 14000
11 2014-04-02 cust5 united states 13000
12 2014-04-02 cust6 france 12000
So I would need the query to output results similar to this (based on sales between 9 and 3 months ago from the same customer_code in the same customer_country):
sale_id date customer_code customer_country gross_sales total_commission gross_sales_past total_commission_past
1 2013-12-01 cust1 united states 10000 1500
2 2013-12-01 cust2 france 20000 3000
3 2013-12-01 cust3 united states 15000 2250
4 2013-12-01 cust4 france 14000 2100
5 2013-12-01 cust5 united states 13000 1950
6 2013-12-01 cust6 france 12000 1800
7 2014-04-02 cust1 united states 10000 10000 1500
8 2014-04-02 cust2 france 20000 20000 3000
9 2014-04-02 cust3 united states 15000 15000 2250
10 2014-04-02 cust4 france 14000 14000 2100
11 2014-04-02 cust5 united states 13000 13000 1950
12 2014-04-02 cust6 france 12000 12000 1800
Your query looks mostly right, but I think your outer query needs to be GROUP BY cs.sale_id (assuming that sale_id is unique in the customer_sale table, and assuming that the date column is datatype DATE, DATETIME, or TIMESTAMP).
And I think you want to include a join predicate so that you match "past" rows only to those rows where you don't have a total commission, e.g.
AND cs.total_commission IS NULL
And I don't think you really need an inline view.
Here's what I came up with:
SELECT cs.sale_id
, cs.date
, cs.customer_code
, cs.customer_country
, cs.gross_sales
, cs.total_commission
, SUM(ps.gross_sales) AS gross_sales_past
, SUM(ps.total_commission) AS total_commission_past
FROM customer_sale cs
LEFT
JOIN customer_sale ps
ON ps.customer_code = cs.customer_code
AND ps.customer_country = cs.customer_country
AND ps.date > cs.date - INTERVAL 9 MONTH
AND ps.date < cs.date - INTERVAL 3 MONTH
AND cs.total_commission IS NULL
GROUP
BY cs.sale_id
Appropriate indexes will likely improve performance of the query. Likely, the EXPLAIN output will show "Using temporary; Using filesort", and that can be expensive for large sets.
MySQL will likely be able to make use of a covering index for the JOIN:
... ON customer_sale (customer_code,customer_country,date,gross_sales,total_commission).
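To show how the past sums turn into the commission rate estimate (total_commission / gross_sales), here is a small sketch of the same self join on a subset of the sample rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption), so the interval arithmetic is written with SQLite's date() modifiers instead of INTERVAL. The est_rate alias is made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); four of the sample rows are loaded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_sale (sale_id INTEGER PRIMARY KEY, date TEXT,
                            customer_code TEXT, customer_country TEXT,
                            gross_sales INTEGER, total_commission INTEGER);
INSERT INTO customer_sale VALUES
 (1,'2013-12-01','cust1','united states',10000,1500),
 (2,'2013-12-01','cust2','france',20000,3000),
 (7,'2014-04-02','cust1','united states',10000,NULL),
 (8,'2014-04-02','cust2','france',20000,NULL);
""")

QUERY = """
SELECT cs.sale_id,
       SUM(ps.gross_sales)      AS gross_sales_past,
       SUM(ps.total_commission) AS total_commission_past,
       -- estimated rate = past commission / past gross sales
       1.0 * SUM(ps.total_commission) / SUM(ps.gross_sales) AS est_rate
FROM customer_sale cs
LEFT JOIN customer_sale ps
       ON ps.customer_code = cs.customer_code
      AND ps.customer_country = cs.customer_country
      AND ps.date > date(cs.date, '-9 months')
      AND ps.date < date(cs.date, '-3 months')
      AND cs.total_commission IS NULL  -- only look back when commission is missing
GROUP BY cs.sale_id
ORDER BY cs.sale_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Sales that already have a commission get NULL in the past columns, while sales 7 and 8 pick up the December figures and an estimated rate of 0.15.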

MySQL AVG and Grouping

I'm struggling with a MySQL statement and was hoping for some guidance, as I am close but not quite there. I have a database that contains a table of property addresses and a table of property rental listings. The property addresses are related to a table of regions, which is related to a table of districts, which is then related to a table of suburbs.
I am trying to create a result which gives me the average rent in each suburb per month and by the number of bedrooms.
For example:
District Suburb Month YEAR YMD Bedrooms DataAverage
Nelson The Brook 01 2012 2012-01-01 00:00 1 190
Nelson The Brook 01 2012 2012-01-01 00:00 2 274
Nelson The Brook 01 2012 2012-01-01 00:00 3 341
Which I can then convert into a table as follows:
Average Rent
Beds by Suburb Jan-12 Feb-12 Mar-12 Apr-12 May-12 Jun-12 Jul-12
The Brook
1 $150 $245 $160 $285 $135 $370 $350
2 $330 $340 $380 $310 $335 $345 $355
3 $350 $380 $310 $395 $380 $350 $350
Inner City
1 $160 $245 $260 $285 $295 $300 $350
2 $360 $440 $480 $410 $535 $545 $555
3 $370 $480 $510 $595 $480 $450 $550
My Current SQL query is this:
SELECT d.name as District, s.name AS Suburb,
FROM_UNIXTIME(l.StartDate,'%m') AS Month,
FROM_UNIXTIME(l.StartDate,'%Y') AS YEAR,
FROM_UNIXTIME(l.StartDate, '%Y-%m-01 00:00') AS YMD,
p.Bedrooms,
REPLACE(FORMAT(AVG(l.RentPerWeek),0),',','') AS DataAverage
FROM properties p
LEFT JOIN listings l on l.property_id=p.id
LEFT JOIN regions r on p.region_id=r.id
LEFT JOIN districts d on d.region_id=r.id
LEFT JOIN suburbs s on s.district_id=d.id
WHERE FROM_UNIXTIME(l.StartDate) BETWEEN DATE(NOW()) - INTERVAL (DAY(NOW()) - 1) DAY - INTERVAL 11 MONTH AND NOW()
GROUP BY District, Suburb, Year, Month, Bedrooms
ORDER BY District, Suburb ASC, YMD ASC, Bedrooms ASC
Unfortunately what I am getting is the same result for each and every suburb. I think I may need to create a subquery SQL statement to get this to work properly, but I'm not entirely sure.
So I am getting something like this:
District Suburb Month YEAR YMD Bedrooms DataAverage
Nelson The Brook 01 2012 2012-01-01 00:00 1 190
Nelson The Brook 01 2012 2012-01-01 00:00 2 330
Nelson The Brook 01 2012 2012-01-01 00:00 3 350
Nelson The Brook 02 2012 2012-02-01 00:00 1 245
Nelson The Brook 02 2012 2012-02-01 00:00 2 340
Nelson The Brook 02 2012 2012-02-01 00:00 3 380
...
Nelson Inner City 01 2012 2012-01-01 00:00 1 190
Nelson Inner City 01 2012 2012-01-01 00:00 2 330
Nelson Inner City 01 2012 2012-01-01 00:00 3 350
Nelson Inner City 02 2012 2012-02-01 00:00 1 245
Nelson Inner City 02 2012 2012-02-01 00:00 2 340
Nelson Inner City 02 2012 2012-02-01 00:00 3 380
.etc.
Average Rent
Beds by Suburb Jan-12 Feb-12 Mar-12 Apr-12 May-12 Jun-12 Jul-12
The Brook
1 $150 $245 $160 $285 $135 $370 $350
2 $330 $340 $380 $310 $335 $345 $355
3 $350 $380 $310 $395 $380 $350 $350
Inner City
1 $150 $245 $160 $285 $135 $370 $350
2 $330 $340 $380 $310 $335 $345 $355
3 $350 $380 $310 $395 $380 $350 $350
Any pointers or assistance would be greatly appreciated.
Assuming that id is the primary key of each table, then according to your query text, a property is associated with a region, by virtue of the region_id column on the properties table:
FROM properties p
LEFT
JOIN regions r
ON p.region_id=r.id
A district is associated with a region (presumably, a district is a subdivision of a region.)
LEFT
JOIN districts d
ON d.region_id=r.id
and a suburb is associated with a district (presumably, a suburb is a subdivision of a district.)
LEFT
JOIN suburbs s
ON s.district_id=d.id
The net result is that every property within a region is getting associated with EVERY district within that region, and associated with EVERY suburb within each district.
So, you are getting the rent values averaged for all properties within a region.
To get rent values per suburb, you really need the relationship between a property and its suburb.
What you really need is a suburb_id column on the properties table as a foreign key to the suburbs table.
LEFT
JOIN suburbs s
ON s.district_id=d.id
AND s.id = p.suburb_id
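The effect can be reproduced on a tiny data set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a reduced hypothetical schema (a rent column instead of RentPerWeek, no dates or bedrooms), and it assumes the suggested suburb_id column exists on properties.

```python
import sqlite3

# SQLite stands in for MySQL; the schema is a reduced, hypothetical version
# of the one in the question, with the suggested properties.suburb_id added.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (id INTEGER PRIMARY KEY);
CREATE TABLE districts (id INTEGER PRIMARY KEY, region_id INTEGER, name TEXT);
CREATE TABLE suburbs (id INTEGER PRIMARY KEY, district_id INTEGER, name TEXT);
CREATE TABLE properties (id INTEGER PRIMARY KEY, region_id INTEGER, suburb_id INTEGER);
CREATE TABLE listings (property_id INTEGER, rent INTEGER);
INSERT INTO regions VALUES (1);
INSERT INTO districts VALUES (1, 1, 'Nelson');
INSERT INTO suburbs VALUES (1, 1, 'The Brook'), (2, 1, 'Inner City');
INSERT INTO properties VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO listings VALUES (1, 200), (2, 400);
""")

TEMPLATE = """
SELECT s.name, AVG(l.rent)
FROM properties p
JOIN listings l  ON l.property_id = p.id
JOIN regions r   ON p.region_id = r.id
JOIN districts d ON d.region_id = r.id
JOIN suburbs s   ON s.district_id = d.id {suburb_filter}
GROUP BY s.name
ORDER BY s.name
"""

# Without the extra predicate every property is paired with every suburb
# in its region, so each suburb shows the regional average.
fanned_out = conn.execute(TEMPLATE.format(suburb_filter="")).fetchall()

# With it, each property is paired only with its own suburb.
per_suburb = conn.execute(
    TEMPLATE.format(suburb_filter="AND s.id = p.suburb_id")
).fetchall()

print(fanned_out)
print(per_suburb)
```

The first query reports the same average (300) for both suburbs, which mirrors the symptom in the question; the second reports 400 for Inner City and 200 for The Brook.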