Conditional logic in SQL query - mysql

I have a table that looks like the following:
mysql> desc mlb_lineups;
+----------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| player_id | int(11) | NO | MUL | NULL | |
| team_id | int(11) | NO | | NULL | |
| game_id | int(11) | NO | MUL | NULL | |
| gamedate | date | NO | | NULL | |
| pos | int(11) | NO | | NULL | |
| is_home | int(11) | NO | | 0 | |
| is_pitcher | int(11) | YES | MUL | 0 | |
| opponent_team_id | int(11) | NO | MUL | NULL | |
| first_name | varchar(255) | YES | | NULL | |
| last_name | varchar(255) | YES | | NULL | |
| position | varchar(20) | YES | | NULL | |
| hand_throws_with | varchar(1) | YES | | NULL | |
+----------------------+--------------+------+-----+---------+----------------+
In order for me to retrieve a lineup that a team used last, let's say a team with team_id 31 in this case, I'd run the following query:
select * from mlb_lineups
where team_id = 31
and pos > -1
order by gamedate DESC,
pos ASC LIMIT 9;
That works fine. What I'm trying to do is trickier, and I can't work out how the inner query and/or conditional logic should fit together. I want a query that says: retrieve the last lineup a team used in a game where the opposing team (opponent_team_id) had a player with is_pitcher equal to 1 and hand_throws_with equal to L. For every game in which a team has a lefty on the mound, mlb_lineups contains at least one row where is_pitcher is 1 and hand_throws_with is L.
Essentially, to find the last lineup a team used against a left-handed pitcher, I'd have to find the most recent game in which the opponent_team_id had a pitcher with that handedness, and then retrieve the team's lineup for that game_id. Does this schema provide enough information to do that in a single query? Did I provide enough information to make my problem understandable?

Presumably you'll need to JOIN the table back to itself using the opponent_team_id and game_id fields. There are a couple of ways to do this.
Here is one method using EXISTS:
select *
from mlb_lineups ml1
where ml1.team_id = 31
and pos > -1
and exists (
select 1
from mlb_lineups ml2
where ml1.opponent_team_id = ml2.team_id
and ml2.is_pitcher = 1
and ml2.hand_throws_with = 'L'
and ml1.game_id = ml2.game_id
)
order by gamedate desc, pos
limit 9;
This method uses a standard JOIN, but it may require DISTINCT, depending on the data (for example, if more than one left-handed pitcher row matches the same game):
select ml1.*
from mlb_lineups ml1
inner join mlb_lineups ml2 on ml1.game_id = ml2.game_id
and ml1.team_id = ml2.opponent_team_id
and ml2.is_pitcher = 1
and ml2.hand_throws_with = 'L'
where ml1.team_id = 31
and ml1.pos > -1
order by ml1.gamedate desc, ml1.pos
limit 9;
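To check the EXISTS approach on a small scale, here is a minimal sketch in Python using an in-memory SQLite database. SQLite is an assumption made for portability (the question uses MySQL), and the sample rows, the reduced column set, and the helper name last_lineup_vs_lefty are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mlb_lineups (
        player_id INTEGER, team_id INTEGER, game_id INTEGER, gamedate TEXT,
        pos INTEGER, is_pitcher INTEGER, opponent_team_id INTEGER,
        hand_throws_with TEXT
    )
""")
conn.executemany(
    "INSERT INTO mlb_lineups VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        # Game 1 (older): team 31 faces a left-handed pitcher from team 40.
        (1, 31, 1, "2013-05-01", 1, 0, 40, "R"),
        (2, 31, 1, "2013-05-01", 2, 0, 40, "R"),
        (90, 40, 1, "2013-05-01", 0, 1, 31, "L"),
        # Game 2 (newer): team 31 faces a right-handed pitcher from team 50.
        (3, 31, 2, "2013-05-02", 1, 0, 50, "R"),
        (4, 31, 2, "2013-05-02", 2, 0, 50, "R"),
        (91, 50, 2, "2013-05-02", 0, 1, 31, "R"),
    ],
)

def last_lineup_vs_lefty(team_id, size=9):
    """Return player ids from the team's most recent game against a lefty."""
    sql = """
        SELECT ml1.player_id
        FROM mlb_lineups ml1
        WHERE ml1.team_id = ?
          AND ml1.pos > -1
          AND EXISTS (
              SELECT 1
              FROM mlb_lineups ml2
              WHERE ml2.game_id = ml1.game_id
                AND ml2.team_id = ml1.opponent_team_id
                AND ml2.is_pitcher = 1
                AND ml2.hand_throws_with = 'L'
          )
        ORDER BY ml1.gamedate DESC, ml1.pos
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (team_id, size))]

lineup = last_lineup_vs_lefty(31)
print(lineup)
```

With this sample data the newer game is skipped because its opposing pitcher throws right-handed. Note that LIMIT 9 only isolates a single game if every game stores exactly nine rows with pos > -1 per team; if that does not hold, filtering on the game_id of the most recent matching game would be safer.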

Related

SQL, delete only if exactly one row is found

I have a nested query that deletes a row in table terms only if exactly one row in definitions.term_id is found. It works, but it takes about 9 seconds on my system. I'm looking to optimize the query.
DELETE FROM terms
WHERE id
IN(
SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP BY term_id
HAVING COUNT(term_id) = 1
)
The database is only about 4000 rows. If I separate the query into 2 independent queries, each takes about 0.1 seconds.
terms
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term | varchar(50) | YES | | NULL | |
+-------+------------------+------+-----+---------+----------------+
definitions
+----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term_id | int(11) | YES | | NULL | |
| definition | varchar(500) | YES | | NULL | |
| example | varchar(500) | YES | | NULL | |
| submitter_name | varchar(50) | YES | | NULL | |
| approved | int(1) | YES | MUL | 0 | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
| votos | int(3) | NO | | NULL | |
+----------------+------------------+------+-----+---------+----------------+
To speed up the process, please consider creating an index on the relevant field, which is term_id in the definitions table:
CREATE INDEX term_id ON definitions (term_id)
How about using a correlated subquery with EXISTS? Try:
DELETE FROM terms
WHERE id = 1234
AND EXISTS (SELECT 1
FROM definitions d
WHERE d.term_id = terms.id
GROUP BY d.term_id
HAVING COUNT(*) = 1)
It's often quicker to create a new table retaining only the rows you wish to keep. That said, I'd probably write this as follows, and provide indexes as appropriate.
DELETE t
FROM terms t
JOIN
( SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP
BY term_id
HAVING COUNT(*) = 1
) x
ON x.term_id = t.id
Hehe; this may be a kludgy way to do it:
DELETE ... WHERE id = ( SELECT ... )
but without any LIMIT or other constraints.
I'm depending on getting an error something like "subquery returned more than one row" in order to prevent the DELETE being performed if multiple rows match.
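The "delete only when exactly one definition exists" rule can be tried out with a minimal sketch in Python and an in-memory SQLite database. SQLite stands in for MySQL here, and the sample rows, the index name idx_definitions_term_id, and the helper delete_if_single_definition are assumptions for illustration. The condition is written as a correlated COUNT(*) comparison, which is one more way to express the same check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE terms (id INTEGER PRIMARY KEY, term TEXT);
    CREATE TABLE definitions (
        id INTEGER PRIMARY KEY, term_id INTEGER, definition TEXT
    );
    -- The index belongs on the column used for the lookup.
    CREATE INDEX idx_definitions_term_id ON definitions (term_id);
    INSERT INTO terms (id, term) VALUES (1, 'one-def'), (2, 'two-defs');
    INSERT INTO definitions (term_id, definition)
    VALUES (1, 'a'), (2, 'b'), (2, 'c');
""")

def delete_if_single_definition(term_id):
    """Delete the term only if exactly one definition references it."""
    cur = conn.execute(
        """
        DELETE FROM terms
        WHERE id = ?
          AND (SELECT COUNT(*)
               FROM definitions d
               WHERE d.term_id = terms.id) = 1
        """,
        (term_id,),
    )
    return cur.rowcount  # number of rows actually deleted

deleted_single = delete_if_single_definition(1)
deleted_double = delete_if_single_definition(2)
remaining = [row[0] for row in conn.execute("SELECT id FROM terms ORDER BY id")]
print(deleted_single, deleted_double, remaining)
```

Term 1 has one definition and is removed; term 2 has two definitions and is left alone.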

MYSQL (MariaDB) - Invalid use of group function

I have two tables called addresses and house_sales
addresses
+-------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| house_number_name | varchar(150) | NO | | NULL | |
| address_line1 | varchar(150) | NO | MUL | NULL | |
| address_line2 | varchar(150) | YES | | NULL | |
| address_line3 | varchar(150) | YES | MUL | NULL | |
| town_city | varchar(150) | NO | MUL | NULL | |
| district | varchar(150) | YES | MUL | NULL | |
| county | varchar(150) | YES | MUL | NULL | |
| post_code | varchar(8) | NO | MUL | NULL | |
| updated_at | datetime | NO | | NULL | |
| created_at | datetime | NO | | NULL | |
+-------------------+------------------+------+-----+---------+----------------+
house_sales
+---------------+------------------------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------------------------------------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| address_id | int(11) unsigned | NO | MUL | NULL | |
| price | int(11) unsigned | NO | MUL | NULL | |
| date | datetime | NO | MUL | NULL | |
| updated_at | datetime | NO | | NULL | |
| created_at | datetime | NO | | NULL | |
+---------------+------------------------------------------------------------+------+-----+---------+----------------+
I'm trying to select all the addresses grouped by address_line1 and then get the average price for each street. The query works, but I only want streets with more than one house sale. However, when I add AND count(*) > 1, I get the error "Invalid use of group function". Below is the query:
SELECT count(*) as total_sales, avg(price) as average_price, `address_line1`, `town_city`
FROM `house_sales` `hs`
LEFT JOIN `addresses` `a` ON `hs`.`address_id` = `a`.`id`
WHERE `town_city` = 'London'
AND count(*) > 1
GROUP BY `address_line1`
ORDER BY `average_price` desc
I'm not sure why I'm getting this error. I've tried a sub query so I can use HAVING but haven't got this to work. Any help or pointers would be appreciated
You need a having clause to filter on the aggregate expression:
SELECT count(*) as total_sales, avg(price) as average_price, `address_line1`, `town_city`
FROM `house_sales` `hs`
LEFT JOIN `addresses` `a` ON `hs`.`address_id` = `a`.`id`
WHERE `town_city` = 'London'
GROUP BY `address_line1`, `town_city`
HAVING count(*) > 1
ORDER BY `average_price` desc
MySQL extends the SQL standard by allowing the use of aliases in the having clause, so you can also do:
having total_sales > 1
Side notes:
as commented by jarlh, it is a good practice to qualify (prefix) all column names with the table they belong to
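it is also a good practice to put all non-aggregated columns in the group by clause (I added town_city, which was missing in your original query) - newer versions of MySQL (5.7.5 and later, where ONLY_FULL_GROUP_BY is enabled by default) reject queries that select non-aggregated columns missing from the group by clause, unless those columns are functionally dependent on the grouped ones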
quoting all identifiers is usually not necessary (unless they contain special characters)
There are two ways to go here. One would be to add town_city to the GROUP BY list:
SELECT
address_line1,
town_city,
COUNT(*) AS total_sales,
AVG(price) AS average_price
FROM house_sales hs
LEFT JOIN addresses a ON hs.address_id = a.id
WHERE town_city = 'London'
GROUP BY address_line1, town_city
HAVING COUNT(*) > 1
ORDER BY average_price DESC;
The other would be to just keep your current query but remove town_city from the select list, since you are restricting to just London anyway.
SELECT
address_line1,
COUNT(*) AS total_sales,
AVG(price) AS average_price
FROM house_sales hs
LEFT JOIN addresses a ON hs.address_id = a.id
WHERE town_city = 'London'
GROUP BY address_line1
HAVING COUNT(*) > 1
ORDER BY average_price DESC;
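The difference between WHERE (filters rows before grouping) and HAVING (filters groups after aggregation) can be seen in a minimal sketch using Python and an in-memory SQLite database. The sample addresses and prices are invented, and the tables are cut down to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY, address_line1 TEXT, town_city TEXT
    );
    CREATE TABLE house_sales (
        id INTEGER PRIMARY KEY, address_id INTEGER, price INTEGER
    );
    INSERT INTO addresses VALUES
        (1, 'Baker Street', 'London'),
        (2, 'Baker Street', 'London'),
        (3, 'Abbey Road',   'London'),
        (4, 'Baker Street', 'Leeds');
    INSERT INTO house_sales (address_id, price) VALUES
        (1, 100), (2, 300), (3, 500), (4, 900);
""")

# WHERE removes non-London rows first; HAVING then drops streets
# whose group contains only a single sale.
streets = conn.execute("""
    SELECT a.address_line1,
           COUNT(*)      AS total_sales,
           AVG(hs.price) AS average_price
    FROM house_sales hs
    JOIN addresses a ON hs.address_id = a.id
    WHERE a.town_city = 'London'
    GROUP BY a.address_line1, a.town_city
    HAVING COUNT(*) > 1
    ORDER BY average_price DESC
""").fetchall()
print(streets)
```

Abbey Road is excluded because it has a single sale, and the Leeds sale never reaches the grouping step.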

HQL/MySQL for listing distincts and duplicates

I have a list of 20,000+ objects (tipps). Each has a foreign key to a table called title. Two tipps are considered duplicates if they are linked to the same title and belong to the same package (tipp_pkg_fk, which is a parameter).
I need a list of all objects, with the duplicates listed together. For example:
tippA.title.name = "One"
tippB.title.name = "Two"
tippC.title.name = "Two"
Ideally from the above I will get a list result like this: [[tippA],[tippB,tippC]]
I am not sure how to do this. I have made an attempt (first in MySQL so I can test it; then I'll change it to HQL):
select tipp.tipp_id, 1 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a, title_instance_package_platform b
where a.tipp_pkg_fk= 1 and b.tipp_pkg_fk = 1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates,
title_instance_package_platform tipp
where tipp.tipp_id != duplicates.id
union all
select duplicates.id, 2 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a , title_instance_package_platform b
where a.tipp_pkg_fk = 1 and b.tipp_pkg_fk=1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates
order by sortOrder, id;
This executed for 330 seconds, then MySQL Workbench showed the message "fetching", and my computer started dying at that point. The idea is that I first select all the IDs that are not duplicates, then select all the IDs that are duplicates, and then merge and order them so that they appear together. I am looking for the most efficient way to do this, as I will be executing this query several times during an overnight job.
For my TIPP model, the following are part of the mapping:
static mapping = {
pkg column:'tipp_pkg_fk', index: 'tipp_idx'
title column:'tipp_ti_fk', index: 'tipp_idx'
}
+-----------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+--------------+------+-----+---------+----------------+
| tipp_id | bigint(20) | NO | PRI | NULL | auto_increment |
| tipp_version | bigint(20) | NO | | NULL | |
| tipp_pkg_fk | bigint(20) | NO | MUL | NULL | |
| tipp_plat_fk | bigint(20) | NO | MUL | NULL | |
| tipp_ti_fk | bigint(20) | NO | MUL | NULL | |
| date_created | datetime | NO | | NULL | |
| last_updated | datetime | NO | | NULL | |
+-----------------------------+--------------+------+-----+---------+----------------+
+-----------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+----------------+
| ti_id | bigint(20) | NO | PRI | NULL | auto_increment |
| ti_version | bigint(20) | NO | | NULL | |
| date_created | datetime | NO | | NULL | |
| ti_imp_id | varchar(255) | NO | MUL | NULL | |
| last_updated | datetime | NO | | NULL | |
| ti_title | varchar(1024) | YES | | NULL | |
| ti_key_title | varchar(1024) | YES | | NULL | |
| ti_norm_title | varchar(1024) | YES | | NULL | |
| sort_title | varchar(1024) | YES | | NULL | |
+-----------------+---------------+------+-----+---------+----------------+
Update
After some changes it is working:
select tipp.tipp_id as id, 1 as sortOrder
from
title_instance_package_platform tipp
where tipp.tipp_id not in (select distinct a.tipp_id as id
from title_instance_package_platform a, title_instance_package_platform b
where a.tipp_pkg_fk= 1 and b.tipp_pkg_fk = 1 and a.tipp_ti_fk = b.tipp_ti_fk)
union all
select duplicates.id as id, 2 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a , title_instance_package_platform b
where a.tipp_pkg_fk = 1 and b.tipp_pkg_fk=1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates
order by sortOrder, id;
I still haven't got the duplicates grouped together though, instead everything comes as a list, which means I still need to group them.
Can you do your select from the other side?
Select all titles and packages, list all tipps for each, keep only those where a tipp exists (count > 0), and bundle these together to get the array you showed?
Seems like you could compute both the dups and the non-dups at the same time. Something like
SELECT ( a.tipp_ti_fk = b.tipp_ti_fk ) AS sortOrder,
a.tipp_id as id
from title_instance_package_platform a ,
title_instance_package_platform b
where a.tipp_pkg_fk = 1
and b.tipp_pkg_fk = 1
You might need a DISTINCT.
This composite index would help:
INDEX(tipp_pkg_fk, tipp_ti_fk, tipp_id)
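Another way to get the nested [[tippA],[tippB,tippC]] shape is to group by title within the package and collect the ids per group. The sketch below is only an illustration of that idea, using Python with an in-memory SQLite database; GROUP_CONCAT exists in both SQLite and MySQL, but the sample rows and the helper name grouped_tipps are assumptions, and the table is reduced to three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE title_instance_package_platform (
        tipp_id INTEGER PRIMARY KEY,
        tipp_pkg_fk INTEGER,
        tipp_ti_fk INTEGER
    );
    -- Composite index in the spirit of the one suggested above.
    CREATE INDEX tipp_idx
        ON title_instance_package_platform (tipp_pkg_fk, tipp_ti_fk, tipp_id);
    INSERT INTO title_instance_package_platform VALUES
        (1, 1, 10),   -- unique title in package 1
        (2, 1, 20),   -- duplicate pair in package 1
        (3, 1, 20),
        (4, 2, 20);   -- same title, but a different package
""")

def grouped_tipps(pkg_id):
    """Return one list of tipp ids per title, singletons first."""
    sql = """
        SELECT GROUP_CONCAT(tipp_id) AS ids
        FROM title_instance_package_platform
        WHERE tipp_pkg_fk = ?
        GROUP BY tipp_ti_fk
        ORDER BY COUNT(*), MIN(tipp_id)
    """
    # GROUP_CONCAT order is not guaranteed, so sort each group in Python.
    return [
        sorted(int(part) for part in row[0].split(","))
        for row in conn.execute(sql, (pkg_id,))
    ]

groups = grouped_tipps(1)
print(groups)
```

This needs a single pass over the package's rows instead of a self-join, which is where the original query spends most of its time.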

Calculate average of values between 2 columns sql

I have a table called validation_errors that looks like this:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| link | varchar(200) | NO | MUL | NULL | |
| message | varchar(500) | NO | | | |
| explanation | mediumtext | NO | | NULL | |
| type | varchar(50) | NO | | | |
| subtype | varchar(50) | NO | | | |
| message_id | varchar(50) | NO | | | |
+-------------+--------------+------+-----+---------+----------------+
The links table looks like this:
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| link | varchar(200) | NO | PRI | NULL | |
| visited | tinyint(1) | NO | | 0 | |
| validated | tinyint(1) | NO | | 0 | |
+-----------+--------------+------+-----+---------+-------+
I wish to calculate the average number of validation errors per page per topdomain.
I have a query that can fetch the amount of pages per topdomain:
SELECT substr(link, - instr(reverse(link), '.')) as domain , count(*) as count
FROM links
GROUP BY domain
ORDER BY count desc
limit 30;
And have a sql query that can fetch the amount of validation errors per top domain:
SELECT substr(link, - instr(reverse(link), '.')) as domain ,count(*) as count
FROM validation_errors
GROUP BY domain
ORDER BY count desc
limit 30;
What I now need to do is combine them into one query and divide the results of one column by the other, and I can't figure out how to do it.
Any help would be greatly appreciated.
First, use substring_index(), rather than your construct. Here is the query to join them together:
select domain, sum(numviews) as numviews, sum(numerrors) as numerrors,
sum(numerrors) / nullif(sum(numviews), 0) as error_rate
from ((SELECT substring_index(link, '.', -1) as domain , count(*) as numviews, 0 as numerrors
FROM links
GROUP BY domain
) UNION ALL
(SELECT substring_index(link, '.', -1) as domain , 0, count(*)
FROM validation_errors
GROUP BY domain
)
) d
GROUP BY domain;
With both variables, I don't know which 30 you want to choose, so I haven't included an order by.
Note that this doesn't use a join, it uses union all with aggregation. This ensures that you will get all domains, even those with no views and those with no errors.
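The UNION ALL plus aggregation pattern can be tried out with a minimal sketch in Python and an in-memory SQLite database. SQLite has no substring_index(), so the sketch registers a hypothetical helper function named tld that returns the text after the last dot. The sample links are invented and assumed to be bare host names, and the count column is named numpages here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for MySQL's substring_index(link, '.', -1).
conn.create_function("tld", 1, lambda link: link.rsplit(".", 1)[-1])
conn.executescript("""
    CREATE TABLE links (link TEXT PRIMARY KEY);
    CREATE TABLE validation_errors (id INTEGER PRIMARY KEY, link TEXT);
    INSERT INTO links VALUES ('a.com'), ('b.com'), ('c.org');
    INSERT INTO validation_errors (link)
    VALUES ('a.com'), ('a.com'), ('b.com');
""")

# Each branch contributes one count and a zero for the other,
# so the outer SUMs line the two counts up per domain.
rates = conn.execute("""
    SELECT domain,
           SUM(numpages)  AS numpages,
           SUM(numerrors) AS numerrors,
           1.0 * SUM(numerrors) / NULLIF(SUM(numpages), 0) AS errors_per_page
    FROM (
        SELECT tld(link) AS domain, COUNT(*) AS numpages, 0 AS numerrors
        FROM links
        GROUP BY tld(link)
        UNION ALL
        SELECT tld(link), 0, COUNT(*)
        FROM validation_errors
        GROUP BY tld(link)
    ) d
    GROUP BY domain
    ORDER BY domain
""").fetchall()
print(rates)
```

The org domain still appears with zero errors, which a plain inner join between the two aggregates would have dropped.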

Query returning multiple objects when only one is expected

I have a simple table:
+-------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| ID | bigint(20) | NO | PRI | NULL | |
| AdmissionDateTime | datetime | NO | | NULL | |
| AdmissionEvent | varchar(255) | YES | | NULL | |
| DischargeDateTime | datetime | YES | | NULL | |
| DischargeEvent | varchar(255) | YES | | NULL | |
| DemographicId | bigint(20) | NO | MUL | NULL | |
| FacilityId | bigint(20) | YES | MUL | NULL | |
| VisitId | bigint(20) | NO | MUL | NULL | |
| WardId | bigint(20) | NO | MUL | NULL | |
+-------------------+--------------+------+-----+---------+-------+
On which I run the following JPA (Spring-data) query:
@Query("SELECT w FROM WardTransaction w WHERE w.id = (SELECT MAX(x.id) FROM "
+ "WardTransaction x WHERE w = x AND w.visit = :visit)")
public WardTransaction findCurrent(@Param("visit") Visit visit);
On occasions I get the following exception.
org.springframework.dao.IncorrectResultSizeDataAccessException: More than one
result was returned from Query.getSingleResult(); nested exception is
javax.persistence.NonUniqueResultException: More than one result was returned from
Query.getSingleResult()
I have not been able to work out why this is happening. It does not seem to make a lot of sense to me as there can only be one 'MAX' - especially on Id (I have used 'admissionDate' in the past).
Any assistance appreciated.
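The subquery is correlated on the row itself (w = x), so MAX(x.id) is evaluated once per row, and every transaction for the visit equals its own maximum. Filter by visit inside the subquery instead.
Try this:
@Query("SELECT w FROM WardTransaction w WHERE w.id = (SELECT MAX(x.id) "
+ "FROM WardTransaction x WHERE x.visit = :visit)")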
This query is simpler and I think would get you what you want:
SELECT something
FROM sometable
Where something = someotherthing
ORDER BY sometable.id DESC
LIMIT 1
Basically it returns the results with the highest IDs at the top and grabs the first one.
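One likely explanation for the intermittent exception is that the subquery compares each row with itself, so any visit with more than one transaction returns several rows. The sketch below reproduces that behaviour in plain SQL through Python and an in-memory SQLite database. The table name ward_transaction, its two columns, and the sample rows are assumptions, since the real mapping is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ward_transaction (id INTEGER PRIMARY KEY, visit_id INTEGER);
    INSERT INTO ward_transaction VALUES (1, 7), (2, 7), (3, 8);
""")

# Mirrors the original query: the subquery is tied to the outer row
# (x.id = w.id), so each row of the visit is its own maximum.
correlated = conn.execute("""
    SELECT w.id
    FROM ward_transaction w
    WHERE w.id = (SELECT MAX(x.id)
                  FROM ward_transaction x
                  WHERE x.id = w.id AND w.visit_id = ?)
    ORDER BY w.id
""", (7,)).fetchall()

# Filtering by visit inside the subquery yields one maximum per visit.
fixed = conn.execute("""
    SELECT w.id
    FROM ward_transaction w
    WHERE w.id = (SELECT MAX(x.id)
                  FROM ward_transaction x
                  WHERE x.visit_id = ?)
""", (7,)).fetchall()

# The ORDER BY ... LIMIT 1 variant gives the same single row.
limited = conn.execute("""
    SELECT id FROM ward_transaction
    WHERE visit_id = ?
    ORDER BY id DESC
    LIMIT 1
""", (7,)).fetchall()
print(correlated, fixed, limited)
```

A visit with a single transaction would return one row from all three queries, which would explain why the exception only shows up on occasion.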