Writing more better SQL

I've got a query here that's painfully slow. Part of the problem may be that tableA in the sub-query has a quite substantial size in comparison to the other tables.
TABLES STRUCTURE
*-------------------*------------------*-------------------*
| ID_TABLE          | DATA_TABLE       | DATA_TABLE_EXT    |
*-------------------*------------------*-------------------*
| id           n<|>1 id           1<|>n owner_id          |
| foreign_id        | owner_id         | information       |
| foreign_id_source | date_field       | ...               |
| ...               | ...              |                   |
*-------------------*------------------*-------------------*
QUERY
SELECT ID_TABLE.foreign_id_source, count(ID_TABLE.id) as count
FROM DATA_TABLE
LEFT JOIN ID_TABLE ON DATA_TABLE.id = ID_TABLE.id
WHERE DATA_TABLE.owner_id = 'some_id'
AND DATA_TABLE.date_field > 'some_date'
AND DATA_TABLE.id IN (
SELECT DATA_TABLE_EXT.owner_id FROM DATA_TABLE_EXT
JOIN DATA_TABLE ON DATA_TABLE_EXT.owner_id = DATA_TABLE.id
WHERE DATA_TABLE.owner_id = 'some_id'
GROUP BY DATA_TABLE.id
HAVING SUM(ABS(DATA_TABLE_EXT.information)) <> 0
)
GROUP BY ID_TABLE.foreign_id_source
ORDER BY count ASC
REQUIRED RESULT
*-------------------*-------------*
| foreign_id_source | count |
*-------------------*-------------*
| source1 | 45 |
| source2 | 10 |
| ... | |
*-------------------*-------------*
Each id in DATA_TABLE may have multiple records in ID_TABLE.
Many records in DATA_TABLE may have the same owner_id.
I'm looking for the number of records in DATA_TABLE with a foreign_id_source, grouped by that foreign_id_source, where the record is after 'some_date' and its DATA_TABLE_EXT records do not all have a value of 0 in the information field.
Short of creating indexes or other database manipulation, is there a way to improve this query in terms of performance?
Any other suggestions are also welcome.

The point is: SUM(ABS(DATA_TABLE_EXT.information)) <> 0 can only be true if at least one DATA_TABLE_EXT.information is non-zero. So we don't have to SUM() them; we only need to check whether a non-zero one exists.
[I don't know if MySQL is smart enough to handle the EXISTS(), but in theory it is cheaper and can be faster.]
SELECT it.foreign_id_source, count(it.id) as count
FROM DATA_TABLE dt
LEFT JOIN ID_TABLE it ON dt.id = it.id
WHERE dt.owner_id = 'some_id'
AND dt.date_field > 'some_date'
AND EXISTS (
SELECT *
FROM DATA_TABLE_EXT x
JOIN DATA_TABLE dt2 ON x.owner_id = dt2.id
WHERE x.owner_id = dt.id
AND dt2.owner_id = 'some_id'
AND x.information <> 0
)
GROUP BY it.foreign_id_source
ORDER BY count ASC
;
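To convince yourself that the two conditions pick the same owners, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, the dialects differ in places), and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_table_ext (owner_id INTEGER, information INTEGER);
INSERT INTO data_table_ext VALUES
    (1, 0), (1, 0),      -- owner 1: all zero
    (2, 0), (2, -3),     -- owner 2: one non-zero value
    (3, 5), (3, -5);     -- owner 3: values that would cancel without ABS
""")

# Aggregate form from the question: sum the absolute values per owner.
by_sum = [row[0] for row in conn.execute("""
    SELECT owner_id FROM data_table_ext
    GROUP BY owner_id
    HAVING SUM(ABS(information)) <> 0
    ORDER BY owner_id
""")]

# Existence form: an owner qualifies as soon as one non-zero row is found.
by_exists = [row[0] for row in conn.execute("""
    SELECT DISTINCT o.owner_id FROM data_table_ext o
    WHERE EXISTS (
        SELECT 1 FROM data_table_ext x
        WHERE x.owner_id = o.owner_id AND x.information <> 0
    )
    ORDER BY o.owner_id
""")]

print(by_sum, by_exists)
```

Both lists come out the same on this data. Whether EXISTS is actually faster on your MySQL version is something only EXPLAIN and a timing run can tell you.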

Often moving the subquery to the FROM will help:
SELECT ID_TABLE.foreign_id_source, count(DATA_TABLE.id) as count
FROM ID_TABLE LEFT JOIN
DATA_TABLE
ON DATA_TABLE.id = ID_TABLE.id JOIN
(SELECT DATA_TABLE.id
FROM DATA_TABLE_EXT JOIN
DATA_TABLE
ON DATA_TABLE_EXT.owner_id = DATA_TABLE.id
WHERE DATA_TABLE.owner_id = 'some_value'
GROUP BY DATA_TABLE.id
HAVING SUM(ABS(DATA_TABLE_EXT.information)) <> 0
) xx
ON DATA_TABLE.id = xx.id
WHERE DATA_TABLE.owner_id = 'some_value' AND
DATA_TABLE.date_field > 'some_date'
GROUP BY ID_TABLE.foreign_id_source
ORDER BY count ASC;
Then, you can think about indexes. These would be tableX(field2, fieldZ, field1, fieldX), tableI(field1), tableX(field2, field1, fieldB), and tableA(field1).

Related

How to query and group every continuous number series in MySQL?

I have this freight.or_nos table which contains a series of receipt numbers. I want to list all the ORs that were issued, excluding those with status='Cancelled', which break the series into groups.
For example, I have this receipt stub 125001-125050, and 125020 is cancelled, so the listing result would be:
+-------------------------------------------------------+
| OR Start | OR End | Quantity | Amount |
+-------------------------------------------------------+
| 125001 | 125019 | 19 | |
+-------------------------------------------------------+
| 125021 | 125050 | 30 | |
+-------------------------------------------------------+
This seems to be a tough query.

Thanks for reading, but I already made it, just now! :)
Here's my query (disregard the other characters, they're from our CGI):
{.while SELECT `start`,`end`,or_prefix,or_suffix,SUM(a.amount) AS g_total,COUNT(*) AS qcount FROM (SELECT l.id AS `start`,( SELECT MIN(a.id) AS id FROM ( SELECT a.or_no AS id FROM freight.`or_nos` a WHERE a.status!='Cancelled' AND a.log_user = 0#user_teller AND DATE(a.or_date)='#user_date`DATE' AND IF(a.status='Default' AND a.amount=0,0,1) ) AS a LEFT OUTER JOIN ( SELECT a.or_no AS id FROM freight.`or_nos` a WHERE a.status!='Cancelled' AND a.log_user = 0#user_teller AND DATE(a.or_date)='#user_date`DATE' AND IF(a.status='Default' AND a.amount=0,0,1) ) AS b ON a.id = b.id - 1 WHERE b.id IS NULL AND a.id >= l.id ) AS `end` FROM ( SELECT a.or_no AS id FROM freight.`or_nos` a WHERE a.status!='Cancelled' AND a.log_user = 0#user_teller AND DATE(a.or_date)='#user_date`DATE' AND IF(a.status='Default' AND a.amount=0,0,1) ) AS l LEFT OUTER JOIN ( SELECT a.or_no AS id FROM freight.`or_nos` a WHERE a.log_user = 0#user_teller AND DATE(a.or_date)='#user_date`DATE' AND IF(a.status='Default' AND a.amount=0,0,1) ) AS r ON r.id = l.id - 1 WHERE r.id IS NULL) AS k LEFT JOIN freight.`or_nos` a ON a.`or_no` BETWEEN k.start AND k.end AND DATE(a.`or_date`)='#user_date`DATE' AND a.log_user =0#user_teller AND IF(a.status='Default' AND a.amount=0,0,1) AND a.status!='Cancelled' GROUP BY `start`}
{.start}{.x.24.12:end}{.x`p0.40.-5:qcount}{.x`p2.57.-15:g_total}{.asc 255}
{.wend}{.asc 255}
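Stripped of the CGI markup, the idea in that query is: a run starts at a number with no live predecessor, and ends at the first number at or after the start that has no live successor. Here is a minimal sketch of just that part, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption). The view name live and the status value 'Issued' are made up for illustration, and the amount, user and date filters are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE or_nos (or_no INTEGER PRIMARY KEY, status TEXT);
-- 'live' hides cancelled receipts so every query below sees the broken series.
CREATE VIEW live AS SELECT or_no FROM or_nos WHERE status <> 'Cancelled';
""")
conn.executemany(
    "INSERT INTO or_nos VALUES (?, ?)",
    # 'Issued' is a made-up status value; only 'Cancelled' comes from the question.
    [(n, "Cancelled" if n == 125020 else "Issued") for n in range(125001, 125051)],
)

runs = conn.execute("""
    SELECT s.or_no AS or_start,
           (SELECT MIN(e.or_no) FROM live e
             WHERE e.or_no >= s.or_no
               AND NOT EXISTS (SELECT 1 FROM live n WHERE n.or_no = e.or_no + 1)
           ) AS or_end
    FROM live s
    WHERE NOT EXISTS (SELECT 1 FROM live p WHERE p.or_no = s.or_no - 1)
    ORDER BY s.or_no
""").fetchall()

# Quantity is the length of each run.
result = [(start, end, end - start + 1) for start, end in runs]
print(result)
```

On the 125001-125050 stub with 125020 cancelled, this gives the two groups from the question.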

difficulties getting a 3 table join to return expected results

I'm having some difficulty getting to the bottom of this sql query.
Tables:
--Tickets--     --Finance--     --Access--
id_tickets      id_finance      id_access
name_tickets    id_event        id_event
cat_tickets     id_tickets      id_tickets
                sold_finance    scan_access
Finance and Access each contain multiple rows for each ticket type listed in Tickets,
and I'm trying to get:
cat_tickets | total_sold | total_scan
-------------------------------------
single | 3043 | 2571
season | 481 | 292
comp | 114 | 75
-------------------------------------
total | 3638 | 2938
The closest I've been to the result I've used:
SELECT tickets.cat_tickets, COALESCE(SUM(finance.sold_finance), 0) AS total_sold, COALESCE(SUM(access.scan_access), 0) AS total_scan
FROM finance INNER JOIN tickets ON finance.id_tickets = tickets.id_tickets
INNER JOIN access ON access.id_tickets = tickets.id_tickets
WHERE access.id_event = 235 AND finance.id_event = access.id_event
GROUP BY tickets.cat_tickets
ORDER BY tickets.cat_tickets DESC
but that just returns:
cat_tickets | total_sold | total_scan
-------------------------------------
single | 4945 | 4437
season | 954 | 599
comp | 342 | 375
-------------------------------------
total | 6241 | 5411
Any ideas where I could be going wrong?
Thanks!

The problem is the relation between the access and finance tables: you have to join them. Even if you LEFT JOIN the table, the predicate finance.id_event = access.id_event turns it into an INNER JOIN. As a workaround, use UNION like this:
SELECT
tickets.cat_tickets,
SUM(CASE WHEN a.Type = 'f' THEN num ELSE 0 END) AS total_sold,
SUM(CASE WHEN a.Type = 'a' THEN num ELSE 0 END) AS total_scan
FROM tickets
LEFT JOIN
(
SELECT 'f' Type, id_tickets, sold_finance num
FROM finance f
WHERE id_event = 1
UNION ALL
SELECT 'a', id_tickets, scan_access
FROM access
WHERE id_event = 1
) a ON a.id_tickets = tickets.id_tickets
GROUP BY tickets.cat_tickets;
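Here is a small sketch of why the totals in the question come out too high and why stacking the rows with UNION ALL avoids it. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption), and the rows are made up: one ticket type with two finance rows and three access rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tickets (id_tickets INTEGER, cat_tickets TEXT);
CREATE TABLE finance (id_event INTEGER, id_tickets INTEGER, sold_finance INTEGER);
CREATE TABLE access  (id_event INTEGER, id_tickets INTEGER, scan_access INTEGER);
INSERT INTO tickets VALUES (1, 'single');
INSERT INTO finance VALUES (235, 1, 10), (235, 1, 20);             -- 30 sold
INSERT INTO access  VALUES (235, 1, 5), (235, 1, 7), (235, 1, 8);  -- 20 scanned
""")

# Joining both child tables directly pairs every finance row with every access row.
inflated = conn.execute("""
    SELECT t.cat_tickets, SUM(f.sold_finance), SUM(a.scan_access)
    FROM tickets t
    JOIN finance f ON f.id_tickets = t.id_tickets
    JOIN access a ON a.id_tickets = t.id_tickets
    WHERE f.id_event = 235 AND a.id_event = 235
    GROUP BY t.cat_tickets
""").fetchall()

# Stacking the rows with UNION ALL keeps each source row exactly once.
correct = conn.execute("""
    SELECT t.cat_tickets,
           SUM(CASE WHEN u.type = 'f' THEN u.num ELSE 0 END),
           SUM(CASE WHEN u.type = 'a' THEN u.num ELSE 0 END)
    FROM tickets t
    LEFT JOIN (
        SELECT 'f' AS type, id_tickets, sold_finance AS num
        FROM finance WHERE id_event = 235
        UNION ALL
        SELECT 'a', id_tickets, scan_access
        FROM access WHERE id_event = 235
    ) u ON u.id_tickets = t.id_tickets
    GROUP BY t.cat_tickets
""").fetchall()

print(inflated, correct)
```

In the direct join each sold_finance value is counted three times (once per access row) and each scan_access value twice, which is the same kind of inflation the question shows.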

Although I am not fully clear on what you want, try this query and see whether the result is what you are expecting.
SELECT tickets.cat_tickets, COALESCE(SUM(finance.sold_finance), 0) AS total_sold, COALESCE(SUM(access.scan_access), 0) AS total_scan
FROM finance LEFT JOIN tickets ON finance.id_tickets = tickets.id_tickets
LEFT JOIN access ON access.id_tickets = tickets.id_tickets
WHERE access.id_event = 235
GROUP BY tickets.cat_tickets
ORDER BY tickets.cat_tickets DESC

Disclaimer: This query is not tested due to incomplete data on the question.
SELECT z.Cat_tickets,
COALESCE(x.total_sold,0) total_sold,
COALESCE(y.total_scan,0) total_scan
FROM tickets z
LEFT JOIN
(
SELECT a.id_tickets,
a.cat_tickets,
SUM(b.sold_finance) total_sold
FROM tickets a
INNER JOIN finance b
ON a.id_tickets = b.id_tickets
WHERE id_event = 235
GROUP BY a.id_tickets, a.cat_tickets
) x ON z.id_tickets = x.id_tickets
LEFT JOIN
(
SELECT aa.id_tickets,
aa.cat_tickets,
SUM(bb.scan_access) total_scan
FROM tickets aa
INNER JOIN Access bb
ON aa.id_tickets = bb.id_tickets
WHERE id_event = 235
GROUP BY aa.id_tickets, aa.cat_tickets
) y ON z.id_tickets = y.id_tickets

Why is this MySQL query slow?

I have the following query, all relevant columns are indexed correctly. MySQL version 5.0.8. The query takes forever:
SELECT COUNT(*) FROM `members` `t` WHERE t.member_type NOT IN (1,2)
AND ( SELECT end_date FROM subscriptions s
WHERE s.sub_auth_id = t.member_auth_id AND s.sub_status = 'Completed'
AND s.sub_pkg_id > 0 ORDER BY s.id DESC LIMIT 1 ) < curdate( )
EXPLAIN output:
----+--------------------+-------+-------+-----------------------+---------+---------+------+------+-------------
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
----+--------------------+-------+-------+-----------------------+---------+---------+------+------+-------------
1 | PRIMARY | t | ALL | membership_type | NULL | NULL | NULL | 9610 | Using where
----+--------------------+-------+-------+-----------------------+---------+---------+------+------+-------------
2 | DEPENDENT SUBQUERY | s | index | subscription_auth_id, | PRIMARY | 4 | NULL | 1 | Using where
| | | | subscription_pkg_id, | | | | |
| | | | subscription_status | | | | |
----+--------------------+-------+-------+-----------------------+---------+---------+------+------+-------------
Why?

Your subselect refers to values in the parent query. This is known as a correlated (dependent) subquery, and such a query has to be executed once for every row in the parent query, which often leads to poor performance. It is often faster to rewrite the query as a JOIN, for example like this
(Note: without a sample schema to test with, it is impossible to say in advance if this will be faster and still correct, you might need to adjust it a little):
SELECT COUNT(*) FROM members t
LEFT JOIN (
SELECT latest.member_id, sdate.end_date
FROM (
SELECT sub_auth_id AS member_id, MAX(id) AS sid FROM subscriptions
WHERE sub_status = 'Completed'
AND sub_pkg_id > 0
GROUP BY sub_auth_id
) latest
LEFT JOIN (
SELECT id AS subid, end_date FROM subscriptions
WHERE sub_status = 'Completed'
AND sub_pkg_id > 0
) sdate ON latest.sid = sdate.subid
) sub ON sub.member_id = t.member_auth_id
WHERE t.member_type NOT IN (1,2)
AND sub.end_date < curdate( )
The logic here is:
For each member, find his latest subscription.
For each latest subscription, find its end date.
Join these member-latest_sub_date pairs to the members list.
Filter the list.
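The steps above can be sketched and checked against the original correlated form on a tiny data set. This uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: DATE('now') plays the role of curdate(), and dates are stored as text), and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (member_auth_id INTEGER, member_type INTEGER);
CREATE TABLE subscriptions (
    id INTEGER PRIMARY KEY, sub_auth_id INTEGER,
    sub_status TEXT, sub_pkg_id INTEGER, end_date TEXT);
INSERT INTO members VALUES (10, 3), (11, 3), (12, 1);
INSERT INTO subscriptions VALUES
    (1, 10, 'Completed', 1, '2000-01-01'),  -- member 10: older, expired
    (2, 10, 'Completed', 1, '2999-01-01'),  -- member 10: latest, still active
    (3, 11, 'Completed', 1, '2000-06-01'),  -- member 11: latest, expired
    (4, 12, 'Completed', 1, '2000-06-01');  -- member 12: excluded by member_type
""")

# Join form: latest subscription id per member, then look up its end date.
joined = conn.execute("""
    SELECT COUNT(*)
    FROM members t
    JOIN (
        SELECT sub_auth_id, MAX(id) AS sid
        FROM subscriptions
        WHERE sub_status = 'Completed' AND sub_pkg_id > 0
        GROUP BY sub_auth_id
    ) latest ON latest.sub_auth_id = t.member_auth_id
    JOIN subscriptions s ON s.id = latest.sid
    WHERE t.member_type NOT IN (1, 2)
      AND s.end_date < DATE('now')
""").fetchone()[0]

# Correlated form from the question, for comparison.
correlated = conn.execute("""
    SELECT COUNT(*) FROM members t
    WHERE t.member_type NOT IN (1, 2)
      AND (SELECT end_date FROM subscriptions s
           WHERE s.sub_auth_id = t.member_auth_id AND s.sub_status = 'Completed'
             AND s.sub_pkg_id > 0 ORDER BY s.id DESC LIMIT 1) < DATE('now')
""").fetchone()[0]

print(joined, correlated)
```

Only member 11 is counted in both forms: member 10's latest subscription is still running, and member 12 is filtered out by member_type.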

Your query is slow because as written you are considering 9,610 rows and therefore performing 9,610 SELECT subqueries in your WHERE clause. You really should rewrite your query to JOIN the members and subscriptions tables first, to which your WHERE conditions could still apply.
EDIT: Try this.
SELECT COUNT(*)
FROM `members` `t`
JOIN subscriptions s ON (s.sub_auth_id = t.member_auth_id)
WHERE t.member_type NOT IN (1,2)
AND s.sub_status = 'Completed'
AND s.sub_pkg_id > 0
AND end_date < curdate()
ORDER BY s.id DESC LIMIT 1

Caveat: I'm not a MySQL expert, though I'm pretty good in a different SQL flavour (VFP). I believe you will save some time if:
You count just one field, let's say memberid, instead of *.
Your comparison NOT IN (1,2) is replaced with > 2 (provided that is valid).
The ORDER BY in your subselect is unnecessary, I think. You're trying to get the last completed subscription?
The < curdate() should be inside your subselect's WHERE.
(SELECT end_date FROM subscriptions s
WHERE s.end_date < curdate() and s.sub_auth_id = t.member_auth_id AND
s.sub_status = 'Completed' AND s.sub_pkg_id > 0 ORDER BY s.id DESC LIMIT 1 )
Tune your subselect so as to trim down the set as quickly as possible. The first conditional should be the one least likely to occur.

I ended up doing it like this:
select count(*) from members t
JOIN subscriptions s ON s.sub_auth_id = t.member_auth_id
WHERE t.membership_type > 2 AND s.sub_status = 'Completed' AND s.sub_pkg_id > 0
AND s.sub_end_date < curdate( )
AND s.id = (SELECT MAX(ss.id) FROM subscriptions ss WHERE ss.sub_auth_id = t.member_auth_id)
I believe that the problem is due to a bug that won't be fixed until MySQL 6.

MySQL GROUP BY order

Please consider the following table structure and data:
+--------------------+-------------+
| venue_name | listed_by |
+--------------------+-------------+
| My Venue Name | 1 |
| Another Venue | 2 |
| My Venue Name | 5 |
+--------------------+-------------+
I am currently using MySQL's GROUP BY clause to select only unique venue names. However, this only returns the first occurrence of My Venue Name, but I would like to return it based on a condition (in this case, where the listed_by field has a value > 2).
Essentially here's some pseudo-code of what I'd like to achieve:
Select all records
Group by name
if grouped, return the occurrence with the higher value in listed_by
Is there an SQL statement that will allow this functionality?
Edit: I should have mentioned that there are other fields involved in the query, and the listed_by field needs to be used elsewhere in the query, too. Here is the original query that we're using:
SELECT l1.field_value AS venue_name,
base.ID AS listing_id,
base.user_ID AS user_id,
IF(base.user_ID > 1, 'b', 'a') AS flag,
COUNT(img.ID) AS img_num
FROM ( listingsDBElements l1, listingsDB base )
LEFT JOIN listingsImages img ON (base.ID = img.listing_id AND base.user_ID = img.user_id and img.active = 'yes')
WHERE l1.field_name = 'venue_name'
AND l1.field_value LIKE '%name%'
AND base.ID = l1.listing_id
AND base.user_ID = l1.user_id
AND base.ID = l1.listing_id
AND base.user_ID = l1.user_id
AND base.active = 'yes'
GROUP BY base.Title ORDER BY flag desc,img_num desc

As long as you didn't mention other fields - here is the simplest solution:
SELECT venue_name,
MAX(listed_by)
FROM tblname
WHERE listed_by > 2
GROUP BY venue_name
With other fields it could look like this (assuming there are no duplicates in venue_name + listed_by pairs):
SELECT *
FROM tblname t1
INNER JOIN (SELECT venue_name,
MAX(listed_by) max_listed_by
FROM tblname
WHERE listed_by > 2
GROUP BY venue_name) t2 ON t1.venue_name = t2.venue_name
AND t1.listed_by = t2.max_listed_by
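A minimal sketch of that join on the sample data from the question, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption). The note column is hypothetical and only there to show that the other fields of the winning row come along.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- note is a hypothetical extra column standing in for "other fields"
CREATE TABLE tblname (venue_name TEXT, listed_by INTEGER, note TEXT);
INSERT INTO tblname VALUES
    ('My Venue Name', 1, 'first listing'),
    ('Another Venue', 2, 'only listing'),
    ('My Venue Name', 5, 'second listing');
""")

# The derived table finds the highest listed_by per venue;
# joining back on both columns recovers the full row.
rows = conn.execute("""
    SELECT t1.venue_name, t1.listed_by, t1.note
    FROM tblname t1
    INNER JOIN (SELECT venue_name, MAX(listed_by) AS max_listed_by
                FROM tblname
                WHERE listed_by > 2
                GROUP BY venue_name) t2
            ON t1.venue_name = t2.venue_name
           AND t1.listed_by = t2.max_listed_by
""").fetchall()

print(rows)
```

Note that with the listed_by > 2 filter, Another Venue drops out entirely, since its only row has listed_by = 2.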

How to delete duplicates in SQL table based on multiple fields

I have a table of games, which is described as follows:
+---------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| date | date | NO | | NULL | |
| time | time | NO | | NULL | |
| hometeam_id | int(11) | NO | MUL | NULL | |
| awayteam_id | int(11) | NO | MUL | NULL | |
| locationcity | varchar(30) | NO | | NULL | |
| locationstate | varchar(20) | NO | | NULL | |
+---------------+-------------+------+-----+---------+----------------+
But each game has a duplicate entry in the table somewhere, because each game was in the schedules for two teams. Is there a sql statement I can use to look through and delete all the duplicates based on identical date, time, hometeam_id, awayteam_id, locationcity, and locationstate fields?

You should be able to do a correlated subquery to delete the data. Find all rows that are duplicates and delete all but the one with the smallest id. For MySQL, an inner join (the functional equivalent of EXISTS) needs to be used, like so:
delete games from games inner join
(select min(id) minid, date, time,
hometeam_id, awayteam_id, locationcity, locationstate
from games
group by date, time, hometeam_id,
awayteam_id, locationcity, locationstate
having count(1) > 1) as duplicates
on (duplicates.date = games.date
and duplicates.time = games.time
and duplicates.hometeam_id = games.hometeam_id
and duplicates.awayteam_id = games.awayteam_id
and duplicates.locationcity = games.locationcity
and duplicates.locationstate = games.locationstate
and duplicates.minid <> games.id)
To test, replace delete games from games with select * from games. Don't just run a delete on your DB :-)

You can try a query like this:
DELETE FROM table_name AS t1
WHERE EXISTS (
SELECT 1 FROM table_name AS t2
WHERE t2.date = t1.date
AND t2.time = t1.time
AND t2.hometeam_id = t1.hometeam_id
AND t2.awayteam_id = t1.awayteam_id
AND t2.locationcity = t1.locationcity
AND t2.locationstate = t1.locationstate
AND t2.id < t1.id )
This leaves only one row for each game in the database: the one with the smallest id.

The best thing that worked for me was to recreate the table.
CREATE TABLE newtable SELECT * FROM oldtable GROUP BY field1,field2;
You can then rename.

To get a list of duplicate entries matching two fields:
select t.ID, t.field1, t.field2
from (
select field1, field2
from table_name
group by field1, field2
having count(*) > 1) x, table_name t
where x.field1 = t.field1 and x.field2 = t.field2
order by t.field1, t.field2
And to delete only the duplicates:
DELETE x
FROM table_name x
JOIN table_name y
ON y.field1= x.field1
AND y.field2 = x.field2
AND y.id < x.id;

select orig.id,
dupl.id
from games orig,
games dupl
where orig.date = dupl.date
and orig.time = dupl.time
and orig.hometeam_id = dupl.hometeam_id
and orig.awayteam_id = dupl.awayteam_id
and orig.locationcity = dupl.locationcity
and orig.locationstate = dupl.locationstate
and orig.id < dupl.id
This should give you the duplicates; you can use it as a subquery to specify the IDs to delete.

As long as you are not selecting the id (primary key) of the table in your query and the other data is exactly the same, you can use SELECT DISTINCT to avoid getting duplicate results.

delete from games
where id not in
(select max(id) from games
group by date, time, hometeam_id, awayteam_id, locationcity, locationstate
);
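Workaround, if MySQL refuses to select from the table it is deleting from: put the ids to keep in a temporary table first.
create temporary table temp_table as
select max(id) id from games
group by date, time, hometeam_id, awayteam_id, locationcity, locationstate;
delete from games where id not in (select id from temp_table);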
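Here is a minimal sketch of the keep-the-max-id delete, with a dry run first so you can see which rows would go. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite accepts the subquery on the same table directly, so no temporary table is needed here), and the three sample games are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (
    id INTEGER PRIMARY KEY, date TEXT, time TEXT,
    hometeam_id INTEGER, awayteam_id INTEGER,
    locationcity TEXT, locationstate TEXT);
INSERT INTO games (date, time, hometeam_id, awayteam_id, locationcity, locationstate)
VALUES
    ('2012-09-01', '19:00', 1, 2, 'Austin', 'TX'),
    ('2012-09-01', '19:00', 1, 2, 'Austin', 'TX'),   -- duplicate of id 1
    ('2012-09-08', '13:00', 3, 1, 'Tulsa',  'OK');
""")

# One id to keep per group of identical games.
KEEP = """
    SELECT MAX(id) FROM games
    GROUP BY date, time, hometeam_id, awayteam_id, locationcity, locationstate
"""

# Dry run: list the rows the DELETE would remove.
doomed = [r[0] for r in conn.execute(
    f"SELECT id FROM games WHERE id NOT IN ({KEEP}) ORDER BY id")]

conn.execute(f"DELETE FROM games WHERE id NOT IN ({KEEP})")
remaining = [r[0] for r in conn.execute("SELECT id FROM games ORDER BY id")]

print(doomed, remaining)
```

Swap MAX for MIN if you would rather keep the oldest row of each pair.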

DELETE FROM table
WHERE id =
(SELECT t.id
FROM table as t
JOIN table as tj ON (t.date = tj.date
AND t.hometeam_id = tj.hometeam_id
AND t.awayteam_id = tj.awayteam_id
...))

DELETE FROM tbl
USING tbl, tbl t2
WHERE tbl.id > t2.id
AND t2.field = tbl.field;
in your case:
DELETE FROM tbl
USING games tbl, games t2
WHERE tbl.id > t2.id
AND t2.date = tbl.date
AND t2.time = tbl.time
AND t2.hometeam_id = tbl.hometeam_id
AND t2.awayteam_id = tbl.awayteam_id
AND t2.locationcity = tbl.locationcity
AND t2.locationstate = tbl.locationstate;
reference: https://dev.mysql.com/doc/refman/5.7/en/delete.html
