HQL/MySQL for listing distinct rows and duplicates

I have a list of 20,000+ objects (TIPPs). Each object has a foreign key to a table called title. Two TIPPs are considered duplicates if they are linked to the same title and belong to the same package (tipp_pkg_fk, which is a parameter).
I need a list of all objects, with the duplicates listed together. For example:
tippA.title.name = "One"
tippB.title.name = "Two"
tippC.title.name = "Two"
Ideally from the above I will get a list result like this: [[tippA],[tippB,tippC]]
I am not sure how to do this. I have made an attempt (first in MySQL so I can test it; then I'll change it to HQL):
select tipp.tipp_id, 1 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a, title_instance_package_platform b
where a.tipp_pkg_fk= 1 and b.tipp_pkg_fk = 1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates,
title_instance_package_platform tipp
where tipp.tipp_id != duplicates.id
union all
select duplicates.id, 2 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a , title_instance_package_platform b
where a.tipp_pkg_fk = 1 and b.tipp_pkg_fk=1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates
order by sortOrder, id;
This ran for 330 seconds, then MySQL Workbench showed the message "fetching" and my computer started to grind to a halt. The idea is to first select all the IDs that are not duplicates, then select all the IDs that are duplicates, and then merge and order them so that the duplicates appear together. I am looking for the most efficient way to do this, as I will be executing this query several times during an overnight job.
For my TIPP model, the following are part of the mapping:
static mapping = {
pkg column:'tipp_pkg_fk', index: 'tipp_idx'
title column:'tipp_ti_fk', index: 'tipp_idx'
}
title_instance_package_platform
+-----------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+--------------+------+-----+---------+----------------+
| tipp_id | bigint(20) | NO | PRI | NULL | auto_increment |
| tipp_version | bigint(20) | NO | | NULL | |
| tipp_pkg_fk | bigint(20) | NO | MUL | NULL | |
| tipp_plat_fk | bigint(20) | NO | MUL | NULL | |
| tipp_ti_fk | bigint(20) | NO | MUL | NULL | |
| date_created | datetime | NO | | NULL | |
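| last_updated | datetime | NO | | NULL | |
+-----------------------------+--------------+------+-----+---------+----------------+
title (the table referenced by tipp_ti_fk)
+-----------------+---------------+------+-----+---------+----------------+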
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+----------------+
| ti_id | bigint(20) | NO | PRI | NULL | auto_increment |
| ti_version | bigint(20) | NO | | NULL | |
| date_created | datetime | NO | | NULL | |
| ti_imp_id | varchar(255) | NO | MUL | NULL | |
| last_updated | datetime | NO | | NULL | |
| ti_title | varchar(1024) | YES | | NULL | |
| ti_key_title | varchar(1024) | YES | | NULL | |
| ti_norm_title | varchar(1024) | YES | | NULL | |
| sort_title | varchar(1024) | YES | | NULL | |
+-----------------+---------------+------+-----+---------+----------------+
Update
After some changes it is working:
select tipp.tipp_id as id, 1 as sortOrder
from
title_instance_package_platform tipp
where tipp.tipp_id not in (select distinct a.tipp_id as id
from title_instance_package_platform a, title_instance_package_platform b
where a.tipp_pkg_fk= 1 and b.tipp_pkg_fk = 1 and a.tipp_ti_fk = b.tipp_ti_fk)
union all
select duplicates.id as id, 2 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a , title_instance_package_platform b
where a.tipp_pkg_fk = 1 and b.tipp_pkg_fk=1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates
order by sortOrder, id;
I still haven't got the duplicates grouped together though, instead everything comes as a list, which means I still need to group them.

Can you do your select from the other side?
Select all titles and packages, list the tipps linked to each combination, keep only the combinations where at least one tipp exists (count > 0), and bundle these together to get the array you showed.

Seems like you could compute both the dups and the non-dups at the same time. Something like
SELECT ( a.tipp_ti_fk = b.tipp_ti_fk ) AS sortOrder,
a.tipp_id as id
from title_instance_package_platform a ,
title_instance_package_platform b
where a.tipp_pkg_fk = 1
and b.tipp_pkg_fk = 1
You might need a DISTINCT.
This composite index would help:
INDEX(tipp_pkg_fk, tipp_ti_fk, tipp_id)
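To get the duplicates grouped together as in [[tippA],[tippB,tippC]], one option is to skip the self-join entirely: read the package's rows ordered by title and group them in application code. The sketch below uses Python's sqlite3 as a stand-in for MySQL so it can run anywhere; the function name and the sample rows are made up, and only the three relevant columns are modeled.

```python
import sqlite3

def grouped_tipps(conn, pkg_id):
    """Return one list of tipp ids per title for the given package."""
    rows = conn.execute(
        """
        SELECT tipp_ti_fk, tipp_id
        FROM title_instance_package_platform
        WHERE tipp_pkg_fk = ?
        ORDER BY tipp_ti_fk, tipp_id
        """,
        (pkg_id,),
    )
    groups = {}  # title id -> list of tipp ids, in first-seen order
    for title_id, tipp_id in rows:
        groups.setdefault(title_id, []).append(tipp_id)
    return list(groups.values())

# Tiny in-memory stand-in for the real table (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE title_instance_package_platform ("
    "tipp_id INTEGER PRIMARY KEY, tipp_pkg_fk INTEGER, tipp_ti_fk INTEGER)"
)
conn.executemany(
    "INSERT INTO title_instance_package_platform VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 20), (3, 1, 20), (4, 2, 20)],
)
print(grouped_tipps(conn, 1))  # [[1], [2, 3]]
```

The query touches each row of the package once, so with the composite index suggested above it should stay cheap even when run several times a night. Groups of length 1 are the non-duplicates; longer groups are the duplicates.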

Related

SQL, delete only if exactly one row is found

I have a nested query that deletes a row in table terms only if exactly one row with a matching definitions.term_id is found. It works, but it takes about 9 seconds on my system. I'm looking to optimize the query.
DELETE FROM terms
WHERE id
IN(
SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP BY term_id
HAVING COUNT(term_id) = 1
)
The database has only about 4000 rows. If I split the query into 2 independent queries, each takes about 0.1 seconds.
terms
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term | varchar(50) | YES | | NULL | |
+-------+------------------+------+-----+---------+----------------+
definitions
+----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term_id | int(11) | YES | | NULL | |
| definition | varchar(500) | YES | | NULL | |
| example | varchar(500) | YES | | NULL | |
| submitter_name | varchar(50) | YES | | NULL | |
| approved | int(1) | YES | MUL | 0 | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
| votos | int(3) | NO | | NULL | |
+----------------+------------------+------+-----+---------+----------------+
To speed up the process, please consider creating an index on the relevant field. Note that term_id is a column of definitions, not of terms:
CREATE INDEX term_id ON definitions (term_id)
How about trying a correlated subquery with EXISTS? The subquery has to be correlated on terms.id, since terms has no term_id column:
DELETE FROM terms
WHERE id = 1234
AND EXISTS (SELECT 1
FROM definitions d
WHERE d.term_id = terms.id
GROUP BY d.term_id
HAVING COUNT(d.term_id) = 1)
It's often quicker to create a new table retaining only the rows you wish to keep. That said, I'd probably write this as follows, and provide indexes as appropriate.
DELETE t
FROM terms t
JOIN
( SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP
BY term_id
HAVING COUNT(*) = 1
) x
ON x.term_id = t.id
Hehe; this may be a kludgy way to do it:
DELETE ... WHERE id = ( SELECT ... )
but without any LIMIT or other constraints.
I'm depending on getting an error something like "subquery returned more than one row" in order to prevent the DELETE being performed if multiple rows match.
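For anyone who wants to try the "delete only if exactly one definition exists" condition without touching real data, here is a minimal sketch. It uses Python's sqlite3 as a stand-in for MySQL, and it swaps the GROUP BY/HAVING form for a scalar COUNT(*) comparison, which expresses the same condition. The function name and sample rows are made up.

```python
import sqlite3

def delete_term_if_single_definition(conn, term_id):
    """Delete the term only when exactly one definition points at it.

    Returns the number of rows deleted (0 or 1).
    """
    cur = conn.execute(
        """
        DELETE FROM terms
        WHERE id = ?
          AND (SELECT COUNT(*) FROM definitions
               WHERE definitions.term_id = terms.id) = 1
        """,
        (term_id,),
    )
    return cur.rowcount

# Hypothetical sample data: term 1 has one definition, term 2 has two,
# term 3 has none.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (id INTEGER PRIMARY KEY, term TEXT)")
conn.execute(
    "CREATE TABLE definitions (id INTEGER PRIMARY KEY, term_id INTEGER, definition TEXT)"
)
conn.execute("CREATE INDEX definitions_term_id ON definitions (term_id)")
conn.executemany("INSERT INTO terms VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany(
    "INSERT INTO definitions (term_id, definition) VALUES (?, ?)",
    [(1, "only one"), (2, "first"), (2, "second")],
)

for term_id in (1, 2, 3):
    print(term_id, delete_term_if_single_definition(conn, term_id))
```

Only term 1 is removed here. The index on definitions (term_id) is what keeps the count cheap.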

MySQL Select parent row from a many to many relation pivot

I have three tables, products, ingredients and ingredient_product.
I need to find all products, where the product has the related ingredients.
It also needs to have a matching value on the percentage column.
A product exists with two ingredients related.
+-----+------------+---------------+------------+
| id | product_id | ingredient_id | percentage |
+-----+------------+---------------+------------+
| 1 | 1 | 1 | 50 |
| 2 | 1 | 2 | 50 |
+-----+------------+---------------+------------+
SQL to retrieve:
SELECT
products.id
FROM
products,
ingredient_product
WHERE
ingredient_product.product_id = products.id
AND
(ingredient_product.ingredient_id = 1 AND ingredient_product.percentage = 50)
AND
(ingredient_product.ingredient_id = 2 AND ingredient_product.percentage = 50)
But this returns an empty result. Empty set (0.00 sec)
Products:
+-------------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-----------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
| short_description | text | YES | | NULL | |
| long_description | text | YES | | NULL | |
+-------------------+-----------------------+------+-----+---------+----------------+
Ingredients:
+-------------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-----------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| short_name | varchar(50) | NO | | NULL | |
| thumbnail | varchar(255) | NO | | NULL | |
+-------------------+-----------------------+------+-----+---------+----------------+
ingredient_product
+---------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| product_id | int(10) unsigned | NO | MUL | NULL | |
| ingredient_id | int(10) unsigned | NO | MUL | NULL | |
| percentage | tinyint(3) unsigned | NO | | NULL | |
+---------------+---------------------+------+-----+---------+----------------+
Based on Strawberry's answer:
SELECT product_id
FROM ingredient_product
WHERE ingredient_id IN (1,2) AND percentage = 50
GROUP BY product_id
HAVING COUNT(*) = 2;
However, this doesn't work if each ingredient needs its own percentage. In that case you can do this:
SELECT product_id FROM
(SELECT product_id
FROM ingredient_product
WHERE ingredient_id = 1 AND percentage = 50
UNION ALL
SELECT product_id
FROM ingredient_product
WHERE ingredient_id = 2 AND percentage = 50) AS tmp
GROUP BY product_id
HAVING COUNT(*) = 2;
Essentially you add a UNION ALL branch for each requirement (an id and percentage combination) and then raise the count in the HAVING condition to the number of requirements.
Notes: Since the requirements are mutually exclusive in this case, UNION ALL is quicker than UNION and will give you the same result.
Instead of a UNION in a subquery you could use OR, but we found that MySQL seems to like the UNION format better. That might change from version to version, however. For completeness' sake, here is that solution as well:
SELECT product_id
FROM ingredient_product
WHERE (ingredient_id = 1 AND percentage = 50) OR (ingredient_id = 2 AND percentage = 50)
GROUP BY product_id
HAVING COUNT(*) = 2;
On the assumption that id is redundant, and that you have a perfectly serviceable natural key on (product_id,ingredient_id)...
SELECT product_id
FROM ingredient_product
WHERE ingredient_id IN (1,2)
GROUP
BY product_id
HAVING COUNT(*) = 2;
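The original query returns nothing because its WHERE clause is evaluated one row at a time, and no single ingredient_product row can have ingredient_id = 1 and ingredient_id = 2 at once. The sketch below shows that, plus a small helper that builds the OR/HAVING form for any set of requirements. It uses Python's sqlite3 as a stand-in for MySQL; the helper name and the sample rows are made up, and it assumes (product_id, ingredient_id) is unique, as the last answer does.

```python
import sqlite3

def products_matching(conn, requirements):
    """requirements maps ingredient_id -> required percentage."""
    if not requirements:
        raise ValueError("at least one requirement is needed")
    clause = " OR ".join(["(ingredient_id = ? AND percentage = ?)"] * len(requirements))
    # Flatten {1: 50, 2: 30} into [1, 50, 2, 30] to match the placeholders.
    params = [value for pair in requirements.items() for value in pair]
    sql = (
        "SELECT product_id FROM ingredient_product "
        f"WHERE {clause} "
        "GROUP BY product_id HAVING COUNT(*) = ? "
        "ORDER BY product_id"
    )
    rows = conn.execute(sql, params + [len(requirements)])
    return [product_id for (product_id,) in rows]

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ingredient_product (id INTEGER PRIMARY KEY, "
    "product_id INTEGER, ingredient_id INTEGER, percentage INTEGER)"
)
conn.executemany(
    "INSERT INTO ingredient_product (product_id, ingredient_id, percentage) VALUES (?, ?, ?)",
    [(1, 1, 50), (1, 2, 50), (2, 1, 50), (2, 2, 30), (3, 1, 50)],
)

# The row-at-a-time AND from the question can never be satisfied.
naive = conn.execute(
    "SELECT product_id FROM ingredient_product "
    "WHERE (ingredient_id = 1 AND percentage = 50) "
    "AND (ingredient_id = 2 AND percentage = 50)"
).fetchall()
print(naive)                                    # []
print(products_matching(conn, {1: 50, 2: 50}))  # [1]
print(products_matching(conn, {1: 50, 2: 30}))  # [2]
```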

Conditional logic in SQL query

I have a table that looks like the following:
mysql> desc mlb_lineups;
+----------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| player_id | int(11) | NO | MUL | NULL | |
| team_id | int(11) | NO | | NULL | |
| game_id | int(11) | NO | MUL | NULL | |
| gamedate | date | NO | | NULL | |
| pos | int(11) | NO | | NULL | |
| is_home | int(11) | NO | | 0 | |
| is_pitcher | int(11) | YES | MUL | 0 | |
| opponent_team_id | int(11) | NO | MUL | NULL | |
| first_name | varchar(255) | YES | | NULL | |
| last_name | varchar(255) | YES | | NULL | |
| position | varchar(20) | YES | | NULL | |
| hand_throws_with | varchar(1) | YES | | NULL | |
+----------------------+--------------+------+-----+---------+----------------+
In order for me to retrieve a lineup that a team used last, let's say a team with team_id 31 in this case, I'd run the following query:
select * from mlb_lineups
where team_id = 31
and pos > -1
order by gamedate DESC,
pos ASC LIMIT 9;
That works fine. What I'm trying to do next is trickier, and I can't work out how the inner query and/or conditional logic would fit together. I want a query that says: retrieve the last lineup a team used in a game where the opposing team (opponent_team_id) had a player with is_pitcher equal to 1 and hand_throws_with equal to L. Whenever a team faces a lefty on the mound, the mlb_lineups table contains at least one row where is_pitcher is 1 and hand_throws_with is L.
In other words, to find the last lineup a team_id used against a left-handed pitcher, I'd have to work out the last game in which they faced an opponent with that handedness and then retrieve their lineup for that game_id. Does this schema provide enough information to do that in a single query? Have I given enough information to make my problem understandable?
Presumably you'll need to JOIN the table back to itself using the opponent_team_id and game_id fields. There are a couple of ways to do this.
Here is one method using EXISTS:
select *
from mlb_lineups ml1
where ml1.team_id = 31
and pos > -1
and exists (
select 1
from mlb_lineups ml2
where ml1.opponent_team_id = ml2.team_id
and ml2.is_pitcher = 1
and ml2.hand_throws_with = 'L'
and ml1.game_id = ml2.game_id
)
order by gamedate desc, pos
limit 9;
This method uses a standard JOIN, but it may require DISTINCT, depending on the data:
select ml1.*
from mlb_lineups ml1
inner join mlb_lineups ml2 on ml1.game_id = ml2.game_id
and ml1.team_id = ml2.opponent_team_id
and ml2.is_pitcher = 1
and ml2.hand_throws_with = 'L'
where ml1.team_id = 31
and ml1.pos > -1
order by ml1.gamedate desc, ml1.pos
limit 9;
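Both queries above lean on LIMIT 9 to cut the result down to one game, which only holds if every game has exactly nine rows with pos > -1. A variant that avoids that assumption is to pick the game_id first in a scalar subquery and then return that game's lineup. The sketch below runs on Python's sqlite3 as a stand-in for MySQL and has only been checked there; the function name and sample rows are made up, only the columns the query needs are modeled, and it assumes the opposing pitcher's row carries opponent_team_id equal to the team being looked up.

```python
import sqlite3

def last_lineup_vs_lefty(conn, team_id):
    """(player_id, pos) rows from the team's latest game against a lefty."""
    return conn.execute(
        """
        SELECT ml1.player_id, ml1.pos
        FROM mlb_lineups ml1
        WHERE ml1.team_id = ?
          AND ml1.pos > -1
          AND ml1.game_id = (
              SELECT ml2.game_id
              FROM mlb_lineups ml2
              WHERE ml2.opponent_team_id = ?
                AND ml2.is_pitcher = 1
                AND ml2.hand_throws_with = 'L'
              ORDER BY ml2.gamedate DESC, ml2.game_id DESC
              LIMIT 1
          )
        ORDER BY ml1.pos
        """,
        (team_id, team_id),
    ).fetchall()

# Hypothetical sample data: team 31 plays three games; the opposing
# pitcher is left-handed in games 99 and 100 and right-handed in 101.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mlb_lineups (player_id INTEGER, team_id INTEGER, game_id INTEGER, "
    "gamedate TEXT, pos INTEGER, is_pitcher INTEGER, opponent_team_id INTEGER, "
    "hand_throws_with TEXT)"
)
conn.executemany(
    "INSERT INTO mlb_lineups VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 31, 99, "2024-03-30", 1, 0, 42, "R"),
        (2, 31, 99, "2024-03-30", 2, 0, 42, "R"),
        (900, 42, 99, "2024-03-30", 0, 1, 31, "L"),
        (3, 31, 100, "2024-04-01", 1, 0, 40, "R"),
        (1, 31, 100, "2024-04-01", 2, 0, 40, "R"),
        (901, 40, 100, "2024-04-01", 0, 1, 31, "L"),
        (2, 31, 101, "2024-04-02", 1, 0, 41, "R"),
        (3, 31, 101, "2024-04-02", 2, 0, 41, "R"),
        (902, 41, 101, "2024-04-02", 0, 1, 31, "R"),
        (50, 31, 101, "2024-04-02", -1, 1, 41, "L"),  # team 31's own lefty
    ],
)
print(last_lineup_vs_lefty(conn, 31))  # [(3, 1), (1, 2)]
```

Game 101 is the most recent one, but the opposing pitcher there is right-handed, so the lineup from game 100 comes back.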

Query returning multiple objects when only one is expected

I have a simple table:
+-------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| ID | bigint(20) | NO | PRI | NULL | |
| AdmissionDateTime | datetime | NO | | NULL | |
| AdmissionEvent | varchar(255) | YES | | NULL | |
| DischargeDateTime | datetime | YES | | NULL | |
| DischargeEvent | varchar(255) | YES | | NULL | |
| DemographicId | bigint(20) | NO | MUL | NULL | |
| FacilityId | bigint(20) | YES | MUL | NULL | |
| VisitId | bigint(20) | NO | MUL | NULL | |
| WardId | bigint(20) | NO | MUL | NULL | |
+-------------------+--------------+------+-----+---------+-------+
On which I run the following JPA (Spring-data) query:
@Query("SELECT w FROM WardTransaction w WHERE w.id = (SELECT MAX(x.id) FROM
WardTransaction x WHERE w = x AND w.visit = :visit)")
public WardTransaction findCurrent(@Param("visit") Visit visit);
On occasions I get the following exception.
org.springframework.dao.IncorrectResultSizeDataAccessException: More than one
result was returned from Query.getSingleResult(); nested exception is
javax.persistence.NonUniqueResultException: More than one result was returned from
Query.getSingleResult()
I have not been able to work out why this is happening. It does not seem to make a lot of sense to me as there can only be one 'MAX' - especially on Id (I have used 'admissionDate' in the past).
Any assistance appreciated.
Why are you selecting the table? You should select columns.
Try this:
@Query("SELECT * FROM WardTransaction w WHERE w.id in (SELECT MAX(x.id)
FROM WardTransaction x WHERE w.id = x.id AND w.visit = :visit)")
This query is simpler and I think would get you what you want:
SELECT something
FROM sometable
Where something = someotherthing
ORDER BY sometable.id DESC
LIMIT 1
Basically it returns the results with the highest IDs at the top and grabs the first one.
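A likely reason for the intermittent exception, offered as a reading of the query rather than something confirmed by the asker: the subquery is correlated on w = x, so for every outer row MAX(x.id) is computed over that one row only and always equals w.id. Every transaction of the visit then matches, and the error appears as soon as a visit has more than one transaction. The sketch below reproduces this in plain SQL using Python's sqlite3; the table and column names are simplified stand-ins for the entity mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ward_transaction (id INTEGER PRIMARY KEY, visit_id INTEGER)")
# Hypothetical data: visit 7 has two transactions, visit 8 has one.
conn.executemany("INSERT INTO ward_transaction VALUES (?, ?)", [(1, 7), (2, 7), (3, 8)])

# Same shape as the question: the subquery only ever sees the outer row.
CORRELATED = """
    SELECT w.id FROM ward_transaction w
    WHERE w.id = (SELECT MAX(x.id) FROM ward_transaction x
                  WHERE x.id = w.id AND w.visit_id = ?)
    ORDER BY w.id
"""
# The visit filter moved onto x, so MAX ranges over the whole visit.
UNCORRELATED = """
    SELECT w.id FROM ward_transaction w
    WHERE w.id = (SELECT MAX(x.id) FROM ward_transaction x
                  WHERE x.visit_id = ?)
"""

def ids(sql, visit_id):
    return [row[0] for row in conn.execute(sql, (visit_id,))]

print(ids(CORRELATED, 7))    # [1, 2]
print(ids(UNCORRELATED, 7))  # [2]
```

In JPQL terms the second form would be SELECT w FROM WardTransaction w WHERE w.id = (SELECT MAX(x.id) FROM WardTransaction x WHERE x.visit = :visit).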

MySQL merge results into table from count of 2 other tables, matching ids

I've got 3 tables: model, model_views, and model_views2. In an effort to have one column per row to hold aggregated views, I've done a migration to make the model look something like this, with a new column for the views:
+---------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | | NULL | |
| [...] | | | | | |
| views | int(20) | YES | | 0 | |
+---------------+---------------+------+-----+---------+----------------+
This is what the columns for model_views and model_views2 look like:
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | smallint(5) | NO | MUL | NULL | |
| model_id | smallint(5) | NO | MUL | NULL | |
| time | int(10) unsigned | NO | | NULL | |
| ip_address | varchar(16) | NO | MUL | NULL | |
+------------+------------------+------+-----+---------+----------------+
model_views and model_views2 are gargantuan, each holding tens of millions of rows. Each row represents one view, which is a terrible mess for performance. So far, I've got this MySQL query to count the rows in both tables, grouped and summed by model_id:
SELECT model_id, SUM(c) FROM (
SELECT model_views.model_id, COUNT(*) AS c FROM model_views
GROUP BY model_views.model_id
UNION ALL
SELECT model_views2.model_id, COUNT(*) AS c FROM model_views2
GROUP BY model_views2.model_id)
AS foo GROUP BY model_id
So that I get a nice big table with the following:
+----------+--------+
| model_id | SUM(c) |
+----------+--------+
| 1 | 1451 |
| [...] | |
+----------+--------+
What would be the safest way from here to merge the values of SUM(c) into the column model.views, matching model.id against the model_id values from the above query? I only want to fill the rows for models that still exist; there are probably model_views rows referring to models that have since been deleted.
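You can just use UPDATE with a JOIN on your subquery. Keep the outer SUM so that each model_id appears only once in the derived table; with the bare UNION ALL a model that has rows in both tables would match twice and end up with only one of the two counts:
UPDATE model
JOIN (
SELECT model_id, SUM(c) AS total
FROM (
SELECT model_id, COUNT(*) AS c
FROM model_views
GROUP BY model_id
UNION ALL
SELECT model_id, COUNT(*) AS c
FROM model_views2
GROUP BY model_id) counts
GROUP BY model_id) toupdate ON model.id = toupdate.model_id
SET model.views = toupdate.total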
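If you would rather inspect the totals before touching model, a cautious variant is to stage them in a temporary table and update from that. This is a sketch of that different route, not the UPDATE ... JOIN itself: it runs on Python's sqlite3 as a stand-in for MySQL, uses a correlated subquery for the update, and the view_totals table name, the function name, and the sample rows are all made up.

```python
import sqlite3

def refresh_view_counts(conn):
    """Stage per-model totals, then copy them into model.views."""
    conn.execute("DROP TABLE IF EXISTS view_totals")
    conn.execute(
        """
        CREATE TEMPORARY TABLE view_totals AS
        SELECT model_id, SUM(c) AS total FROM (
            SELECT model_id, COUNT(*) AS c FROM model_views GROUP BY model_id
            UNION ALL
            SELECT model_id, COUNT(*) AS c FROM model_views2 GROUP BY model_id
        ) AS counts
        GROUP BY model_id
        """
    )
    # Models without any view rows get 0; totals for deleted models
    # simply never match a row in model.
    conn.execute(
        """
        UPDATE model
        SET views = COALESCE(
            (SELECT total FROM view_totals
             WHERE view_totals.model_id = model.id), 0)
        """
    )
    conn.commit()

# Hypothetical sample data: model 9 was deleted but still has a view row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, views INTEGER DEFAULT 0)")
conn.execute("CREATE TABLE model_views (id INTEGER PRIMARY KEY, model_id INTEGER)")
conn.execute("CREATE TABLE model_views2 (id INTEGER PRIMARY KEY, model_id INTEGER)")
conn.executemany("INSERT INTO model (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO model_views (model_id) VALUES (?)", [(1,), (1,), (9,)])
conn.executemany("INSERT INTO model_views2 (model_id) VALUES (?)", [(1,), (2,)])

refresh_view_counts(conn)
print(dict(conn.execute("SELECT id, views FROM model ORDER BY id")))
```

Model 1 ends up with 3 views (2 from model_views plus 1 from model_views2), which is the case that goes wrong if the two counts are not summed before the update.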