Correct method to optimise MySQL query - mysql

I've been struggling with a query which selects from multiple tables. My original query was incredibly slow (53 seconds). From reading up, I'm now reasonably sure that I need to create an inner query to limit the data which is iterated over. But I'm not sure how to use the result of the subquery (inner query) when using more than 2 tables. Below are some dummy tables:
+-------+---------------------+------------+
| tr_id | tr_datecreated | tr_depart |
+-------+---------------------+------------+
| 1 | 2011-07-31 00:00:00 | 2011-08-20 |
| 2 | 2011-08-01 00:00:00 | 2011-08-30 |
| 3 | 2011-08-02 00:00:00 | 2011-09-01 |
+-------+---------------------+------------+
+------+--------+---------+---------+
| p_id | p_trid | p_name | p_lname |
+------+--------+---------+---------+
| 1 | 1 | Geoff | Thingy |
| 2 | 1 | Mildred | Thingy |
| 3 | 1 | Garry | Thingy |
| 4 | 2 | Linda | Doobrey |
| 5 | 2 | Kev | Doobrey |
| 6 | 3 | John | Wotsit |
| 7 | 3 | Jill | Wotsit |
+------+--------+---------+---------+
+------+--------+----------+
| h_id | h_trid | h_dest |
+------+--------+----------+
| 1 | 1 | France |
| 2 | 1 | Spain |
| 3 | 2 | Italy |
| 4 | 3 | Portugal |
+------+--------+----------+
I want to get a result such as:
+-------+---------------------+------------+---------+---------+----------+
| tr_id | tr_datecreated | tr_depart | p_name | p_lname | h_dest |
+-------+---------------------+------------+---------+---------+----------+
| 1 | 2011-07-31 00:00:00 | 2011-08-20 | Geoff | Thingy | France |
| 1 | 2011-07-31 00:00:00 | 2011-08-20 | Geoff | Thingy | Spain |
| 1 | 2011-07-31 00:00:00 | 2011-08-20 | Mildred | Thingy | France |
| 1 | 2011-07-31 00:00:00 | 2011-08-20 | Mildred | Thingy | Spain |
| 1 | 2011-07-31 00:00:00 | 2011-08-20 | Garry | Thingy | France |
| 1 | 2011-07-31 00:00:00 | 2011-08-20 | Garry | Thingy | Spain |
| 2 | 2011-08-01 00:00:00 | 2011-08-30 | Linda | Doobrey | Italy |
| 2 | 2011-08-01 00:00:00 | 2011-08-30 | Kev | Doobrey | Italy |
| 3 | 2011-08-02 00:00:00 | 2011-09-01 | John | Wotsit | Portugal |
| 3 | 2011-08-02 00:00:00 | 2011-09-01 | Jill | Wotsit | Portugal |
+-------+---------------------+------------+---------+---------+----------+
where we get a separate row for each person for each holiday destination.
My original effort was in the form of:
SELECT tr_id, tr_datecreated, tr_depart, p_name, p_lname, h_dest
FROM transaction, people, holiday
WHERE tr_id = p_trid
AND tr_id = h_trid
AND tr_datecreated >= "2010-12-12 00:00:00"
AND tr_datecreated <= "2012-12-12 00:00:00"
I think that this created a huge number of cross joins and the query ran very slowly.
Seeing as the tr_id is being referenced a number of times I wanted to do an inner query which reduced the number of rows that everything else was compared to.
So the inner query part will be:
SELECT tr_id FROM transaction WHERE tr_datecreated >= "2010-12-12 00:00:00"
AND tr_datecreated <= "2012-12-12 00:00:00"
How can I build my desired result so that both p_trid and h_trid are compared against the same inner query, without running that inner query twice (if possible)?
Would inner joins help in this situation? (I have read through but haven't fully absorbed it yet).
Grateful for any advice and suggestions here. The database is large and I need to be efficient.
Edit
Indexes:
tr_id, h_id and p_id are all primary keys
Result of EXPLAIN
+----+-------------+--------------+--------+---------------+---------+---------+---------------------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+---------------+---------+---------+---------------------+------+--------------------------------+
| 1 | SIMPLE | holiday | ALL | NULL | NULL | NULL | NULL | 4 | |
| 1 | SIMPLE | people | ALL | NULL | NULL | NULL | NULL | 7 | Using where; Using join buffer |
| 1 | SIMPLE | transactions | eq_ref | PRIMARY | PRIMARY | 4 | db.people.p_trid | 1 | Using where |
+----+-------------+--------------+--------+---------------+---------+---------+---------------------+------+--------------------------------+

I think that this should work. Let me know if it works.
Total Query
SELECT t.id, t.date, t.depart, p.p_name, p.p_lname, h.h_dest
FROM
(SELECT tr_id 'id', tr_datecreated 'date', tr_depart 'depart' FROM transaction
WHERE DATE(tr_datecreated) BETWEEN DATE("2010-12-12 00:00:00")
AND DATE("2012-12-12 00:00:00")) t
JOIN people p ON t.id = p.p_trid
JOIN holiday h ON t.id = h.h_trid;
Inner Query
(SELECT tr_id 'id', tr_datecreated 'date', tr_depart 'depart' FROM transaction
WHERE DATE(tr_datecreated) BETWEEN DATE("2010-12-12 00:00:00")
AND DATE("2012-12-12 00:00:00"))
Edit: Subquery explanation
The subquery selects the id, date created, and depart columns from the transaction table for the date range that you listed above. The 't' after the closing parenthesis at the end of the subquery is an alias for the inner query, so the outer query can refer to its data. The 'id', 'date', and 'depart' inside the subquery are column aliases. They let you use those values without typing out the full column names.
Hope this helped.

Have you tried joins?
SELECT tr.tr_id, tr.tr_datecreated, tr.tr_depart, p.p_name, p.p_lname, h.h_dest
FROM transaction tr
join people p on tr.tr_id = p.p_trid
join holiday h on tr.tr_id = h.h_trid
WHERE tr_datecreated >= "2010-12-12 00:00:00"
AND tr_datecreated <= "2012-12-12 00:00:00"
I haven't tested this yet, but that's the general idea.

I suggest adding an index on people.p_trid and on holiday.h_trid. The EXPLAIN output shows that no index is used for either table (type ALL, key NULL).
Also make sure that the data type of transactions.tr_id, people.p_trid and holiday.h_trid is the same.
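To make the index suggestion concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in so it runs without a MySQL server; the CREATE INDEX and JOIN statements are plain SQL that should carry over to MySQL. The index names are made up for illustration, and the table is called transactions as in the EXPLAIN output.

```python
import sqlite3


def holiday_rows():
    """Build the sample tables, index the join columns, and run the join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE transactions (tr_id INTEGER PRIMARY KEY,
                                   tr_datecreated TEXT, tr_depart TEXT);
        CREATE TABLE people  (p_id INTEGER PRIMARY KEY, p_trid INTEGER,
                              p_name TEXT, p_lname TEXT);
        CREATE TABLE holiday (h_id INTEGER PRIMARY KEY, h_trid INTEGER,
                              h_dest TEXT);
        -- Hypothetical index names; these cover the columns joined to tr_id.
        CREATE INDEX idx_people_trid  ON people  (p_trid);
        CREATE INDEX idx_holiday_trid ON holiday (h_trid);
    """)
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
        (1, "2011-07-31 00:00:00", "2011-08-20"),
        (2, "2011-08-01 00:00:00", "2011-08-30"),
        (3, "2011-08-02 00:00:00", "2011-09-01"),
    ])
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", [
        (1, 1, "Geoff", "Thingy"), (2, 1, "Mildred", "Thingy"),
        (3, 1, "Garry", "Thingy"), (4, 2, "Linda", "Doobrey"),
        (5, 2, "Kev", "Doobrey"), (6, 3, "John", "Wotsit"),
        (7, 3, "Jill", "Wotsit"),
    ])
    conn.executemany("INSERT INTO holiday VALUES (?, ?, ?)", [
        (1, 1, "France"), (2, 1, "Spain"), (3, 2, "Italy"), (4, 3, "Portugal"),
    ])
    # Explicit joins on the indexed columns, filtered by the date range.
    rows = conn.execute("""
        SELECT tr.tr_id, tr.tr_datecreated, tr.tr_depart,
               p.p_name, p.p_lname, h.h_dest
        FROM transactions tr
        JOIN people  p ON p.p_trid = tr.tr_id
        JOIN holiday h ON h.h_trid = tr.tr_id
        WHERE tr.tr_datecreated >= '2010-12-12 00:00:00'
          AND tr.tr_datecreated <= '2012-12-12 00:00:00'
        ORDER BY tr.tr_id, p.p_id, h.h_id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in holiday_rows():
        print(row)
```

On the real tables, re-running EXPLAIN after adding the indexes is the way to check whether people and holiday are still being scanned in full.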

Related

mysql return one row of right table

I am facing a big problem with MySQL.
I have a table called tperson with the following content:
+--------------+------------+
| tperson_id | first_name |
+--------------+------------+
| 1 | juan |
| 2 | miguel |
| 3 | Carlos |
| 4 | Diego |
+--------------+------------+
In the second table I have this data:
+--------------+------------+------------+
| tperson_id | trans_code | date_added |
+--------------+------------+------------+
| 1 | 2000-01 |2020/03/03 |
| 1 | 2000-02 |2020/03/04 |
| 2 | 1999-05 |2019/12/25 |
| 3 | 1999-06 |2019/12/26 |
| 3 | 1999-07 |2019/12/27 |
+--------------+------------+------------+
Now I want to get this result in MySQL:
+--------------+------------+------------+------------+
| tperson_id | first_name | trans_code | date_added |
+--------------+------------+------------+------------+
| 1 | juan |2000-02 | 2020/03/04 |
| 2 | miguel |1999-05 | 2019/12/25 |
| 3 | Carlos |1999-07 | 2019/12/27 |
| 4 | Diego | null | null |
+--------------+------------+------------+------------+
What is the right MySQL statement to generate the result I want?
Can anyone please help? I keep looking for the answer but can't find it anywhere. I am not experienced with databases.
Thank you so much.
I'm assuming your second table is named tdate, and that when a tperson_id has more than one row in tdate you want the latest trans_code and date_added.
SELECT tp.tperson_id, tp.first_name, MAX(td.trans_code), MAX(td.date_added)
FROM tperson tp
LEFT JOIN tdate td
ON tp.tperson_id = td.tperson_id
GROUP BY tp.tperson_id
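One caveat: taking MAX() of each column separately only returns a real row when trans_code and date_added always increase together, as they do in the sample data. A sketch of an alternative that picks the whole latest row per person is below. It runs on Python's built-in sqlite3 so it is self-contained; the table name tdate is the same assumption as above, and the query itself is plain SQL.

```python
import sqlite3


def latest_transactions():
    """Return each person with their most recent tdate row (or NULLs)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tperson (tperson_id INTEGER PRIMARY KEY, first_name TEXT);
        -- 'tdate' is an assumed name for the second table.
        CREATE TABLE tdate (tperson_id INTEGER, trans_code TEXT, date_added TEXT);
    """)
    conn.executemany("INSERT INTO tperson VALUES (?, ?)", [
        (1, "juan"), (2, "miguel"), (3, "Carlos"), (4, "Diego"),
    ])
    conn.executemany("INSERT INTO tdate VALUES (?, ?, ?)", [
        (1, "2000-01", "2020/03/03"), (1, "2000-02", "2020/03/04"),
        (2, "1999-05", "2019/12/25"),
        (3, "1999-06", "2019/12/26"), (3, "1999-07", "2019/12/27"),
    ])
    # LEFT JOIN keeps people without transactions; the correlated subquery
    # restricts the match to the row holding that person's latest date.
    rows = conn.execute("""
        SELECT tp.tperson_id, tp.first_name, td.trans_code, td.date_added
        FROM tperson tp
        LEFT JOIN tdate td
               ON td.tperson_id = tp.tperson_id
              AND td.date_added = (SELECT MAX(d2.date_added)
                                   FROM tdate d2
                                   WHERE d2.tperson_id = tp.tperson_id)
        ORDER BY tp.tperson_id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_transactions():
        print(row)
```

If one person can have two rows with the same latest date_added, this returns both of them, so a tie-breaker would be needed in that case.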

Left joins: I need an explanation of some code

I am watching a tutorial. It contains some code and I don't understand what it is supposed to do.
$sql = 'SELECT p.*,
a.screen_name AS author_name,
c.name AS category_name
FROM
posts p
LEFT JOIN
admin_users a ON p.author_id = a.id
LEFT JOIN
categories c ON p.category_id = c.id
WHERE
p.id = ?';
I read about left joins but I didn't understand them. Can somebody please explain the code I shared?
Thanks in advance!
Imagine you have two tables. One that stores the information about the programmers on your website, and the other table that keeps track of their online purchases.
PROGRAMMERS Table
+----+----------+-----+------------+---------+
| ID | NAME     | AGE | ADDRESS    | SALARY  |
+----+----------+-----+------------+---------+
| 1  | Desire   | 32  | 123 fake s | 3000.00 |
| 2  | Jamin    | 25  | 234 fake s | 2500.00 |
| 3  | Jon      | 23  | 567 fake s | 2000.00 |
| 4  | Bob      | 30  | 789 fake s | 1500.00 |
| 5  | OtherGuy | 31  | 890 fake s | 1000.00 |
| 6  | DudeMan  | 32  | 901 fake s | 500.00  |
+----+----------+-----+------------+---------+
PURCHASES Table
+----------+---------+----------+-------+
| ORDER_ID | PROG_ID | DATE     | PRICE |
+----------+---------+----------+-------+
| 1        | 1       | 1-1-2017 | 100   |
| 2        | 2       | 1-2-2017 | 200   |
| 3        | 6       | 1-3-2017 | 300   |
+----------+---------+----------+-------+
You decide you need to make a new table to consolidate this information to a table that contains
certain columns you want.
For example, you figure it would be nice for shipping purposes to have a table
that has the ID, the NAME, the PRICE, and the DATE columns.
Currently, the tables we have don't display all of that in a single table.
If we were to LEFT JOIN these tables, we would end up filling the desired columns
with NULL values where there is no information to join.
SELECT ID, NAME, PRICE, DATE
FROM PROGRAMMERS
LEFT JOIN PURCHASES
ON PROGRAMMERS.ID = PURCHASES.PROG_ID;
Notice that I'm selecting the columns I want from the starting table, then joining the right table
even though there might be missing information.
RESULTING TABLE
+----+----------+-------+----------+
| ID | NAME     | PRICE | DATE     |
+----+----------+-------+----------+
| 1  | Desire   | 100   | 1-1-2017 |
| 2  | Jamin    | 200   | 1-2-2017 |
| 3  | Jon      | NULL  | NULL     |
| 4  | Bob      | NULL  | NULL     |
| 5  | OtherGuy | NULL  | NULL     |
| 6  | DudeMan  | 300   | 1-3-2017 |
+----+----------+-------+----------+
For a visual representation of the difference between SQL JOINs check out
https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins .
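If it helps to see the difference side by side, here is a small sketch that runs the same two tables through a LEFT JOIN and an INNER JOIN. It uses Python's built-in sqlite3 so it is self-contained; the lowercase table and column names (and purchase_date in place of DATE) are renamed for this illustration only.

```python
import sqlite3


def left_vs_inner():
    """Return (left_join_rows, inner_join_rows) for the sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE programmers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchases (order_id INTEGER PRIMARY KEY, prog_id INTEGER,
                                purchase_date TEXT, price INTEGER);
    """)
    conn.executemany("INSERT INTO programmers VALUES (?, ?)", [
        (1, "Desire"), (2, "Jamin"), (3, "Jon"),
        (4, "Bob"), (5, "OtherGuy"), (6, "DudeMan"),
    ])
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", [
        (1, 1, "1-1-2017", 100), (2, 2, "1-2-2017", 200), (3, 6, "1-3-2017", 300),
    ])
    # LEFT JOIN: every programmer appears; missing purchases become NULL.
    left_rows = conn.execute("""
        SELECT pr.id, pr.name, pu.price, pu.purchase_date
        FROM programmers pr
        LEFT JOIN purchases pu ON pr.id = pu.prog_id
        ORDER BY pr.id
    """).fetchall()
    # INNER JOIN: only programmers with at least one purchase appear.
    inner_rows = conn.execute("""
        SELECT pr.id, pr.name, pu.price, pu.purchase_date
        FROM programmers pr
        JOIN purchases pu ON pr.id = pu.prog_id
        ORDER BY pr.id
    """).fetchall()
    conn.close()
    return left_rows, inner_rows


if __name__ == "__main__":
    left_rows, inner_rows = left_vs_inner()
    print(len(left_rows), "rows from LEFT JOIN")
    print(len(inner_rows), "rows from INNER JOIN")
```

Applied to the tutorial query, this means the post is still returned even when author_id or category_id has no matching row; author_name or category_name is simply NULL.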

Sum of all rows prior to (and including) date on current row in MYSQL

It's important to know that the date will be unknown during the query time, so I cannot just hard code a 'WHERE' clause.
Here's my table:
+-----------+----------+-------------+
| Date_ID | Customer | Order_Count |
+-----------+----------+-------------+
| 20150101 | Jones | 6 |
| 20150102 | Jones | 4 |
| 20150103 | Jones | 3 |
+-----------+----------+-------------+
Here's the desired output:
+-----------+----------+------------------+
| Date_ID | Customer | SUM(Order_Count) |
+-----------+----------+------------------+
| 20150101 | Jones | 6 |
| 20150102 | Jones | 10 |
| 20150103 | Jones | 13 |
+-----------+----------+------------------+
My guess is I need to use a variable or perhaps a join.
Edit: I'm still not able to get it fast enough; the query is very slow.
Try this query; it's most likely the best you can do without limiting the dataset you operate on. It should benefit from an index (customer, date_id).
select
t1.date_id, t1.customer, sum(t2.order_count)
from
table1 t1
left join
table1 t2 on t1.customer = t2.customer
and t1.date_id >= t2.date_id
group by
t1.date_id, t1.customer;
Sample SQL Fiddle.
One way you could go about it is by using a sub query which sums all orders up till the current order. Probably not the fastest way, but it should do the trick.
SELECT `Date_ID`, `Customer`,
(SELECT sum(b.`Order_Count`)
FROM tablename as b WHERE
b.`Date_ID` <= a.`Date_ID` AND
a.`Customer` = b.`Customer`)
FROM tablename as a
Where performance is an issue, consider a solution akin to the following:
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
SELECT i,@i:=@i+i FROM ints, (SELECT @i:=0)n ORDER BY i;
+---+----------+
| i | @i:=@i+i |
+---+----------+
| 0 | 0 |
| 1 | 1 |
| 2 | 3 |
| 3 | 6 |
| 4 | 10 |
| 5 | 15 |
| 6 | 21 |
| 7 | 28 |
| 8 | 36 |
| 9 | 45 |
+---+----------+
If your MySQL version supports window functions (8.0 or later), you can consider this solution:
select Date_ID,
Customer,
SUM(Order_Count) over (partition by Customer order by Date_ID rows unbounded preceding) as Running_Total
from table1
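A runnable sketch of the window-function approach, with a second customer added so the per-customer totals are visible. It uses Python's built-in sqlite3 (which needs SQLite 3.25 or newer for window functions) rather than MySQL; the table name orders, the alias running_total and the Smith rows are invented for the example.

```python
import sqlite3


def running_totals():
    """Return a per-customer running total of order_count by date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (date_id INTEGER, customer TEXT, order_count INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        (20150101, "Jones", 6), (20150102, "Jones", 4), (20150103, "Jones", 3),
        # Hypothetical second customer, not part of the original question.
        (20150101, "Smith", 5), (20150103, "Smith", 1),
    ])
    # PARTITION BY restarts the sum for each customer; the frame covers
    # every row from the start of the partition up to the current row.
    rows = conn.execute("""
        SELECT date_id, customer,
               SUM(order_count) OVER (
                   PARTITION BY customer
                   ORDER BY date_id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS running_total
        FROM orders
        ORDER BY customer, date_id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in running_totals():
        print(row)
```

Because the date is never hard-coded, this fits the requirement that the date is unknown at query time.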

Get the most recent "event" for distinct items using an SQL query (MySQL database)

I have a table of "events" that I'm using to store some statistical data on lab computer logins, logouts, shutdowns and startups.
What I'm looking to produce is a list of the last actions each individual computername did.
Here's a sample of what my table named raw looks like:
mysql> select * from raw limit 20;
+--------+--------------+-------+---------------------+
| id | computername | event | timestamp |
+--------+--------------+-------+---------------------+
| 148776 | REF-18 | 1 | 2014-11-05 15:05:29 |
| 148775 | DEC-02 | 3 | 2014-11-05 15:05:19 |
| 148774 | GPS-06 | 3 | 2014-11-05 15:05:18 |
| 148773 | DEC-15 | 3 | 2014-11-05 15:05:16 |
| 148772 | DEC-02 | 1 | 2014-11-05 15:04:33 |
| 148771 | REF-18 | 2 | 2014-11-05 15:04:18 |
| 148770 | REF-09 | 1 | 2014-11-05 15:04:14 |
| 148769 | REF-18 | 4 | 2014-11-05 15:04:02 |
| 148768 | DEC-02 | 2 | 2014-11-05 15:03:39 |
| 148767 | DEC-02 | 4 | 2014-11-05 15:03:24 |
| 148766 | REF-09 | 2 | 2014-11-05 15:03:00 |
| 148765 | DEC-08 | 3 | 2014-11-05 15:02:54 |
| 148764 | REF-09 | 4 | 2014-11-05 15:02:44 |
| 148763 | REF-09 | 3 | 2014-11-05 15:01:31 |
| 148762 | DEC-01 | 1 | 2014-11-05 15:01:13 |
| 148760 | REF-19 | 1 | 2014-11-05 15:00:50 |
| 148761 | DEC-04 | 3 | 2014-11-05 15:00:50 |
| 148759 | REF-18 | 3 | 2014-11-05 15:00:25 |
| 148758 | DEC-36 | 1 | 2014-11-05 15:00:10 |
| 148757 | DEC-01 | 2 | 2014-11-05 15:00:09 |
+--------+--------------+-------+---------------------+
I've come up with a couple of solutions I think could work;
SELECT r1.id, r1.computername, r1.event, r1.timestamp
FROM raw r1
JOIN (SELECT id, computername, event, MAX(timestamp) AS timestamp
FROM raw GROUP BY computername)
AS r2
ON r1.computername = r2.computername
AND r1.timestamp = r2.timestamp
GROUP BY r1.computername;
This seems to do the job, but it takes f o r e v e r
SELECT *
FROM (SELECT * from raw order by timestamp desc) row_result
GROUP BY computername;
This takes considerably less time by far, and yet seems to produce the same results. Which is better? Is the second query simply a hack on the way that MySQL works? Could I optimize my data, or query somehow to produce quicker more reliable results?
Thanks!
Have you tried something like this:
select r.id, r.computername, r.event, r.timestamp
from raw r
inner join (
select max(id) as id
from raw
group by computerName
) as maxEventPerComputer on r.id = maxEventPerComputer.Id
Granted, it's very similar to your initial query, but you might get somewhat better results, especially since your id column is (likely) indexed while your timestamp column might not be. Note that this assumes a higher id always means a later event.
From what I understand, MySQL handles subqueries less well than some other RDBMSes, but hopefully this will help.
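Here is a small self-contained sketch of the max(id) approach on a few rows from the question. It runs on Python's built-in sqlite3 instead of MySQL, and it relies on the same assumption as above: that id grows with timestamp.

```python
import sqlite3


def latest_events():
    """Return the most recent row per computername, using MAX(id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE raw (id INTEGER PRIMARY KEY, computername TEXT,
                          event INTEGER, timestamp TEXT)
    """)
    # A subset of the sample rows from the question.
    conn.executemany("INSERT INTO raw VALUES (?, ?, ?, ?)", [
        (148776, "REF-18", 1, "2014-11-05 15:05:29"),
        (148775, "DEC-02", 3, "2014-11-05 15:05:19"),
        (148772, "DEC-02", 1, "2014-11-05 15:04:33"),
        (148771, "REF-18", 2, "2014-11-05 15:04:18"),
        (148770, "REF-09", 1, "2014-11-05 15:04:14"),
        (148766, "REF-09", 2, "2014-11-05 15:03:00"),
    ])
    # The derived table holds one id per computer; joining on the primary
    # key then fetches the full row for each of those ids.
    rows = conn.execute("""
        SELECT r.id, r.computername, r.event, r.timestamp
        FROM raw r
        JOIN (SELECT MAX(id) AS id
              FROM raw
              GROUP BY computername) AS latest ON r.id = latest.id
        ORDER BY r.computername
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_events():
        print(row)
```

Unlike the GROUP BY over an ordered subquery, this does not depend on which row the server happens to pick for the non-aggregated columns.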

help in forming mysql query to find free(available) venues/resources for a given date range

I have tables & data like this:
venues table contains: id, name
+----+---------+
| id | name |
+----+---------+
| 1 | venue 1 |
| 2 | venue 2 |
+----+---------+
event_dates : id, event_id, event_from_datetime, event_to_datetime, venue_id
+----+----------+---------------------+---------------------+----------+
| id | event_id | event_from_datetime | event_to_datetime | venue_id |
+----+----------+---------------------+---------------------+----------+
| 1 | 1 | 2009-12-05 00:00:00 | 2009-12-07 00:00:00 | 1 |
| 2 | 1 | 2009-12-09 00:00:00 | 2009-12-12 00:00:00 | 1 |
| 3 | 1 | 2009-12-15 00:00:00 | 2009-12-20 00:00:00 | 2 |
+----+----------+---------------------+---------------------+----------+
This is my requirement: I want venues that will be free on 2009-12-06 00:00:00
i.e.
I should get
|venue_id|
|2 |
Currently I'm having the following query,
select ven.id , evtdt.event_from_datetime, evtdt.event_to_datetime
from venues ven
left join event_dates evtdt
on (ven.id=evtdt.venue_id)
where evtdt.venue_id is null
or not ('2009-12-06 00:00:00' between evtdt.event_from_datetime
and evtdt.event_to_datetime);
+----+---------------------+---------------------+
| id | event_from_datetime | event_to_datetime |
+----+---------------------+---------------------+
| 1 | 2009-12-09 00:00:00 | 2009-12-12 00:00:00 |
| 2 | 2009-12-15 00:00:00 | 2009-12-20 00:00:00 |
| 3 | NULL | NULL |
| 5 | NULL | NULL |
+----+---------------------+---------------------+
If you look at the results, the venue 1 booking that contains 2009-12-06 00:00:00 is excluded, but venue 1 still appears because of its other booking.
Please help me correct this query.
Thanks in advance.
SELECT *
FROM venues v
WHERE NOT EXISTS
(
SELECT NULL
FROM event_dates ed
WHERE ed.venue_id = v.id
AND '2009-12-06 00:00:00' BETWEEN ed.event_from_datetime AND ed.event_to_datetime
)
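A quick way to try the NOT EXISTS approach is the sketch below. It runs on Python's built-in sqlite3 rather than MySQL, and it adds a hypothetical venue 3 with no bookings to show that such venues count as free. The function name and the parameter are made up for the example.

```python
import sqlite3


def free_venues(moment):
    """Return ids of venues with no booking that covers `moment`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE venues (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE event_dates (id INTEGER PRIMARY KEY, event_id INTEGER,
                                  event_from_datetime TEXT,
                                  event_to_datetime TEXT,
                                  venue_id INTEGER);
    """)
    conn.executemany("INSERT INTO venues VALUES (?, ?)", [
        (1, "venue 1"), (2, "venue 2"),
        (3, "venue 3"),  # hypothetical venue with no bookings at all
    ])
    conn.executemany("INSERT INTO event_dates VALUES (?, ?, ?, ?, ?)", [
        (1, 1, "2009-12-05 00:00:00", "2009-12-07 00:00:00", 1),
        (2, 1, "2009-12-09 00:00:00", "2009-12-12 00:00:00", 1),
        (3, 1, "2009-12-15 00:00:00", "2009-12-20 00:00:00", 2),
    ])
    # A venue is free when no booking row exists whose range covers the moment.
    rows = conn.execute("""
        SELECT v.id
        FROM venues v
        WHERE NOT EXISTS (
            SELECT 1
            FROM event_dates ed
            WHERE ed.venue_id = v.id
              AND ? BETWEEN ed.event_from_datetime AND ed.event_to_datetime
        )
        ORDER BY v.id
    """, (moment,)).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(free_venues("2009-12-06 00:00:00"))
```

The check is made per venue rather than per booking row, which is why venue 1 no longer slips through on the strength of its other booking.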
or not ('2009-12-06 00:00:00' between evtdt.event_from_datetime
and evtdt.event_to_datetime);
12/6/2009 is between 12/5/09 and 12/7/09... that's why venue_id 1 is being excluded... what is it you're trying to extract from the data exactly?
The join query you've constructed says, take the venues table and for each row of it that has a matching venue_id make a copy of the venue table row and append the matching row. So if you just did:
select *
from venues ven
left join event_dates evtdt
on (ven.id=evtdt.venue_id);
It would yield:
+----+---------+------+----------+---------------------+---------------------+----------+
| id | name | id | event_id | event_from_datetime | event_to_datetime | venue_id |
+----+---------+------+----------+---------------------+---------------------+----------+
| 1 | venue 1 | 1 | 1 | 2009-12-05 00:00:00 | 2009-12-07 00:00:00 | 1 |
| 1 | venue 1 | 2 | 1 | 2009-12-09 00:00:00 | 2009-12-12 00:00:00 | 1 |
| 2 | venue 2 | 3 | 1 | 2009-12-15 00:00:00 | 2009-12-20 00:00:00 | 2 |
+----+---------+------+----------+---------------------+---------------------+----------+
If you then added your condition, which states the date of interest is not between the from and to date of the event, the query looks like:
select *
from venues ven
left join event_dates evtdt
on (ven.id=evtdt.venue_id)
where not ('2009-12-06' between evtdt.event_from_datetime and evtdt.event_to_datetime)
Which yields a result of:
+----+---------+------+----------+---------------------+---------------------+----------+
| id | name | id | event_id | event_from_datetime | event_to_datetime | venue_id |
+----+---------+------+----------+---------------------+---------------------+----------+
| 1 | venue 1 | 2 | 1 | 2009-12-09 00:00:00 | 2009-12-12 00:00:00 | 1 |
| 2 | venue 2 | 3 | 1 | 2009-12-15 00:00:00 | 2009-12-20 00:00:00 | 2 |
+----+---------+------+----------+---------------------+---------------------+----------+
These are my actual experimental results with your data in MySQL.
If you want to get the venue_ids that are free on the proposed date then you would write something like:
select ven.id, COALESCE(SUM('2009-12-06' between evtdt.event_from_datetime and evtdt.event_to_datetime), 0) as num_intersects
from venues ven left join event_dates evtdt on (ven.id=evtdt.venue_id)
group by ven.id
having num_intersects = 0;
which yields:
+----+----------------+
| id | num_intersects |
+----+----------------+
| 2 | 0 |
+----+----------------+
this also comes up with the right answer (without modification) in the case where you have a venue with no events in the event_date table.
At a guess, if you remove not from
or not ('2009-12-06 00:00:00' between evtdt.event_from_datetime
and evtdt.event_to_datetime)
this will then return row 1 from event dates but not the other event date rows.
I say "at a guess" because your where clause is a bit hard to understand. Maybe you mean
select ven.id , evtdt.event_from_datetime, evtdt.event_to_datetime
from venues ven
left join event_dates evtdt
on (ven.id=evtdt.venue_id)
where '2009-12-06 00:00:00' between evtdt.event_from_datetime
and evtdt.event_to_datetime;