Order by count(*) of my second table takes long time - mysql

Let's assume I have 2 tables. One contains car manufacturer's names and their IDs, the second contains information about car models. I need to select few of them from the first table, but order them by quantity of linked from the second table data.
Currently, my query looks like this:
SELECT DISTINCT `manufacturers`.`name`,
`manufacturers`.`cars_link`,
`manufacturers`.`slug`
FROM `manufacturers`
JOIN `cars`
ON manufacturers.cars_link = cars.manufacturer
WHERE ( NOT ( `manufacturers`.`cars_link` IS NULL ) )
AND ( `cars`.`class` = 'sedan' )
ORDER BY (SELECT Count(*)
FROM `cars`
WHERE `manufacturers`.cars_link = `cars`.manufacturer) DESC
It was working ok for my table of scooters which size is few dozens of mb. But now i need to do the same thing for the cars table, which size is few hundreds megabytes. The problem is that the query takes very long time, sometimes it even causes nginx timeout. Also, i think, that i have all the necesary database indexes. Is there any alternative for the query above?

lets try to use subquery for your count instead.
select * from (
select distinct m.name, m.cars_link, m.slug
from manufacturers m
join cars c on m.cars_link=c.manufacturer
left join
(select count(1) ct, c1.manufacturer from manufacturers m1
inner join cars_link c2 on m1.cars_link=c2.manufacturer
where coalesce(m1.cars_link, '') != '' and c1.class='sedan'
group by c1.manufacturer) as t1
on t1.manufacturer = c.manufacturer
where coalesce(m.cars_link, '') != '' and c.class='sedan') t2
order by t1.ct

Related

MINUS operator in MySQL query [duplicate]

I am trying to perform a MINUS operation in MySql.I have three tables:
one with service details
one table with states that a service is offered in
another table (based on zipcode and state) shows where this service is not offered.
I am able to get the output for those two select queries separately. But I need a combined statement that gives the output as
'SELECT query_1 - SELECT query_2'.
Service_Details Table
Service_Code(PK) Service Name
Servicing_States Table
Service_Code(FK) State Country PK(Service_Code,State,Country)
Exception Table
Service_Code(FK) Zipcode State PK(Service_Code,Zipcode,State)
MySql does not recognise MINUS and INTERSECT, these are Oracle based operations. In MySql a user can use NOT IN as MINUS (other solutions are also there, but I liked it lot).
Example:
select a.id
from table1 as a
where <condition>
AND a.id NOT IN (select b.id
from table2 as b
where <condition>);
MySQL Does not supports MINUS or EXCEPT,You can use NOT EXISTS, NULL or NOT IN.
Here's my two cents... a complex query just made it work, originally expressed with Minus and translated for MySql
With MINUS:
select distinct oi.`productOfferingId`,f.name
from t_m_prod_action_oitem_fld f
join t_m_prod_action_oitem oi
on f.fld2prod_action_oitem = oi.oid;
minus
select
distinct r.name,f.name
from t_m_prod_action_oitem_fld f
join t_m_prod_action_oitem oi
on f.fld2prod_action_oitem = oi.oid
join t_m_rfs r
on r.name = oi.productOfferingId
join t_m_attr a
on a.attr2rfs = r.oid and f.name = a.name;
With NOT EXISTS
select distinct oi.`productOfferingId`,f.name
from t_m_prod_action_oitem_fld f
join t_m_prod_action_oitem oi
on f.fld2prod_action_oitem = oi.oid
where not exists (
select
r.name,f.name
from t_m_rfs r
join t_m_attr a
on a.attr2rfs = r.oid
where r.name = oi.productOfferingId and f.name = a.name
The tables have to have the same columns, but I think you can achieve what you are looking for with EXCEPT... except that EXCEPT only works in standard SQL! Here's how to do it in MySQL:
SELECT * FROM Servicing_states ss WHERE NOT EXISTS
( SELECT * FROM Exception e WHERE ss.Service_Code = e.Service_Code);
http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/
Standard SQL
SELECT * FROM Servicing_States
EXCEPT
SELECT * FROM Exception;
An anti-join pattern is the approach I typically use. That's an outer join, to return all rows from query_1, along with matching rows from query_2, and then filtering out all the rows that had a match... leaving only rows from query_1 that didn't have a match. For example:
SELECT q1.*
FROM ( query_1 ) q1
LEFT
JOIN ( query_2 ) q2
ON q2.id = q1.id
WHERE q2.id IS NULL
To emulate the MINUS set operator, we'd need the join predicate to compare all columns returned by q1 and q2, also matching NULL values.
ON q1.col1 <=> q2.col2
AND q1.col2 <=> q2.col2
AND q1.col3 <=> q2.col3
AND ...
Also, To fully emulate the MINUS operation, we'd also need to remove duplicate rows returned by q1. Adding the DISTINCT keyword would be sufficient to do that.
In case the tables are huge and are similar, one option is to save the PK to new tables. Then compare based only on the PK. In case you know that the first half is identical or so add a where clause to check only after a specific value or date .
create table _temp_old ( id int NOT NULL PRIMARY KEY )
create table _temp_new ( id int NOT NULL PRIMARY KEY )
### will take some time
insert into _temp_old ( id )
select id from _real_table_old
### will take some time
insert into _temp_new ( id )
select id from _real_table_new
### this version should be much faster
select id from _temp_old to where not exists ( select id from _temp_new tn where to.id = tn.id)
### this should be much slower
select id from _real_table_old rto where not exists ( select id from _real_table_new rtn where rto.id = rtn.id )

How efficiently check record exist more than 2 times in table using sub-query?

I have a query like this . I have compound index for CC.key1,CC.key2.
I am executing this in a big database
Select * from CC where
( (
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) > 2
AND
CC.key3='new'
)
OR
(
(select count(*) from Service s
where CC.key1=s.sr2 and CC.key2=s.sr1) <= 2
)
)
limit 10000;
I tried to make it as inner join , but its getting slower . How can i optimize this query ?
The trick here is being able to articulate a query for the problem:
SELECT *
FROM CC t1
INNER JOIN
(
SELECT cc.key1, cc.key2
FROM CC cc
LEFT JOIN Service s
ON cc.key1 = s.sr2 AND
cc.key2 = s.sr1
GROUP BY cc.key1, cc.key2
HAVING COUNT(*) <= 2 OR
SUM(CASE WHEN cc.key = 'new' THEN 1 ELSE 0 END) > 2
) t2
ON t1.key1 = t2.key1 AND
t1.key2 = t2.key2
Explanation:
Your original two subqueries would only add to the count if a given record in CC, with a given key1 and key2 value, matched to a corresponding record in the Service table. The strategy behind my inner query is to use GROUP BY to count the number of times that this happens, and use this instead of your subqueries. The first count condition is your bottom subquery, and the second one is the top.
The inner query finds all key1, key2 pairs in CC corresponding to records which should be retained. And recognize that these two columns are the only criteria in your original query for determining whether a record from CC gets retained. Then, this inner query can be inner joined to CC again to get your final result set.
In terms of performance, even this answer could leave something to be desired, but it should be better than a massive correlated subquery, which is what you had.
Basically get the Columns that must not have a duplicate then join them together. Example:
select *
FROM Table_X A
WHERE exists (SELECT 1
FROM Table_X B
WHERE 1=1
and a.SHOULD_BE_UNIQUE = b.SHOULD_BE_UNIQUE
and a.SHOULD_BE_UNIQUE2 = b.SHOULD_BE_UNIQUE2
/* excluded because these columns are null or can be Duplicated*/
--and a.GENERIC_COLUMN = b.GENERIC_COLUMN
--and a.GENERIC_COLUMN2 = b.GENERIC_COLUMN2
--and a.NULL_COLUMN = b.NULL_COLUMN
--and a.NULL_COLUMN2 = b.NULL_COLUMN2
and b.rowid > a.ROWID);
Where SHOULD_BE_UNIQUE and SHOULD_BE_UNIQUE2 are columns that shouldn't be repeated and have unique columns and the GENERIC_COLUMN and NULL_COLUMNS can be ignored so just leave them out of the query.
Been using this approach when we have issues in Duplicate Records.
With the limited information you've given us, this could be a rewrite using 'simplified' logic:
SEELCT *
FROM CC NATURAL JOIN
( SELECT key1, key2, COUNT(*) AS tally
FROM Service
GROUP
BY key1, key2 ) AS t
WHERE key3 = 'new' OR tally <= 2;
Not sure whether it will perform better but might give you some ideas of what to try next?

mysqli subquery unknown column

This almost seems like a scope issue- the select statement in the subquery doesn't recognize table 'candidate':
SELECT
candidate.id AS id,
candidate.image AS image,
candidate.name AS name,
candidate.party AS party,
player.order AS player_order,
c_pcts.pct AS pct
FROM `candidate`
INNER JOIN players player ON player.candidate_id = candidate.id
INNER JOIN lineups lineup ON player.lineup_id = lineup.id
INNER JOIN (
SELECT
pct
FROM candidate_pcts p
INNER JOIN weekly_game game ON p.weekly_game_id = (
SELECT id FROM weekly_game ORDER BY date DESC LIMIT 1
) WHERE p.candidate_id = candidate.id
) c_pcts
WHERE lineup.id = '31'
ORDER BY player.order ASC
gives the error: "Unknown column 'candidate.id' in 'where clause'." If instead of "FROM candidate_pcts p" I put
FROM candidate_pcts p, candidate c
then it doesn't see 'p.weekly_game_id' ...huh?
Seems like I need to identify the 'candidate' table for the subquery somehow but everything I'm trying leads me only further astray. And I have tried a mess of things: order of the tables, explicitly identifying them everywhere i could think of, backticks. I should note that the nested subquery works like a charm. Here it is again:
SELECT
pct
FROM `candidate_pcts`
INNER JOIN weekly_game game ON candidate_pcts.weekly_game_id = (
SELECT id FROM weekly_game ORDER BY date DESC LIMIT 1
) WHERE candidate_pcts.candidate_id = '5'
with a hardcoded an id value there, of course. I can supply database structure if needed here, but this is long already. The 'weekly_game' table is simply a set of scores for each candidate each week and we only want the most recent week's score, thus the 'ORDER BY date DESC LIMIT 1' clause.
Thanks very much for your time.
Tables:
table candidate: {id, image, name, party}
table candidate_pcts: {id, candidate_id, pct, weekly_game_id}
table lineups: {id, date, user_id}
table players: {id,candidate_id,lineup_id,order}
table weekly_game: {id,date}
You are basically on the right track around the problem. In essence the nested sub-select does not know about candidate.id. It you break apart the query and just look at the sub-select in question:
SELECT
pct
FROM candidate_pcts p
INNER JOIN weekly_game game ON p.weekly_game_id = (
SELECT id FROM weekly_game ORDER BY date DESC LIMIT 1
) WHERE p.candidate_id = candidate.id
You can see there is NO reference whatsoever in that query to the candidate table other than in your where clause, thus this is an unknown column.
Since a subselect is, in essence, made before the outer select that references it, the subselect must be a standalone, executable query.
Thanks to all, especially Mike for that excellent explanation. What I did was restructured the query like so:
SELECT
candidate.id AS id,
candidate.image AS image,
candidate.name AS name,
candidate.party AS party,
player.order AS player_order,
pcts.pct AS pct
FROM `candidate`
INNER JOIN players player ON player.candidate_id = candidate.id
INNER JOIN lineups lineup ON player.lineup_id = lineup.id
LEFT JOIN (
SELECT
p.candidate_id AS pct_id, pct AS pct
FROM candidate_pcts p
INNER JOIN weekly_game game ON p.weekly_game_id = (
SELECT id FROM weekly_game ORDER BY date DESC LIMIT 1
)
) pcts
ON pct_id = candidate.id
WHERE lineup.id = '$lineup_id'
ORDER BY player.order ASC

MySQL Inner Join with where clause sorting and limit, subquery?

Everything in the following query results in one line for each invBlueprintTypes row with the correct information. But I'm trying to add something to it. See below the codeblock.
Select
blueprintType.typeID,
blueprintType.typeName Blueprint,
productType.typeID,
productType.typeName Item,
productType.portionSize,
blueprintType.basePrice * 0.9 As bpoPrice,
productGroup.groupName ItemGroup,
productCategory.categoryName ItemCategory,
blueprints.productionTime,
blueprints.techLevel,
blueprints.researchProductivityTime,
blueprints.researchMaterialTime,
blueprints.researchCopyTime,
blueprints.researchTechTime,
blueprints.productivityModifier,
blueprints.materialModifier,
blueprints.wasteFactor,
blueprints.maxProductionLimit,
blueprints.blueprintTypeID
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
So what I need to get in here is the following table with the columns below it so I can use the values timestamp and sort the entire result by profitHour
tablename: invBlueprintTypesPrices
columns: blueprintTypeID, timestamp, profitHour
I need this information with the following select in mind. Using a select to show my intention of the JOIN/in-query select or whatever that can do this.
SELECT * FROM invBlueprintTypesPrices
WHERE blueprintTypeID = blueprintType.typeID
ORDER BY timestamp DESC LIMIT 1
And I need the main row from table invBlueprintTypes to still show even if there is no result from the invBlueprintTypesPrices. The LIMIT 1 is because I want the newest row possible, but deleting the older data is not a option since history is needed.
If I've understood correctly I think I need a subquery select, but how to do that? I've tired adding the exact query that is above with a AS blueprintPrices after the query's closing ), but did not work with a error with the
WHERE blueprintTypeID = blueprintType.typeID
part being the focus of the error. I have no idea why. Anyone who can solve this?
You'll need to use a LEFT JOIN to check for NULL values in invBlueprintTypesPrices. To mimic the LIMIT 1 per TypeId, you can use the MAX() or to truly make sure you only return a single record, use a row number -- this depends on whether you can have multiple max time stamps for each type id. Assuming not, then this should be close:
Select
...
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Left Join (
SELECT MAX(TimeStamp) MaxTime, TypeId
FROM invBlueprintTypesPrices
GROUP BY TypeId
) blueprintTypePrice On blueprints.blueprintTypeID = blueprintTypePrice.typeID
Left Join invBlueprintTypesPrices blueprintTypePrices On
blueprintTypePrice.TypeId = blueprintTypePrices.TypeId AND
blueprintTypePrice.MaxTime = blueprintTypePrices.TimeStamp
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
Order By
blueprintTypePrices.profitHour
Assuming you might have the same max time stamp with 2 different records, replace the 2 left joins above with something similar to this getting the row number:
Left Join (
SELECT #rn:=IF(#prevTypeId=TypeId,#rn+1,1) rn,
TimeStamp,
TypeId,
profitHour,
#prevTypeId:=TypeId
FROM (SELECT *
FROM invBlueprintTypesPrices
ORDER BY TypeId, TimeStamp DESC) t
JOIN (SELECT #rn:=0) t2
) blueprintTypePrices On blueprints.blueprintTypeID = blueprintTypePrices.typeID AND blueprintTypePrices.rn=1
You don't say where you are putting the subquery. If in the select clause, then you have a problem because you are returning more than one value.
You can't put this into the from clause directly, because you have a correlated subquery (not allowed).
Instead, you can put it in like this:
from . . .
(select *
from invBLueprintTypesPrices ibptp
where ibtp.timestamp = (select ibptp2.timestamp
from invBLueprintTypesPrices ibptp2
where ibptp.blueprintTypeId = ibptp2.blueprintTypeId
order by timestamp desc
limit 1
)
) ibptp
on ibptp.blueprintTypeId = blueprintType.TypeID
This identifies the most recent records for all the blueprintTypeids in the subquery. It then joins in the one that matches.

Mysql: same query, different results

EDIT:
Sorry about unreadable query, I was under deadline. I managed to solve problem by breaking this query into two smaller ones, and doing some business logic in Java. Still want to know why this query can random times return two different results.
So, it randomly returns once all expected results, other time just half. I noticed that when I write it join per join, and execute after each join, in the end it returns all expected results. So am wandering if there's some kind of MySql memory or other limitation that it doesn't take whole tables in joins. Also read on undeterministic queries but not sure what to tell.
Please help, ask if needs clarification, and thank you in advance.
RESET QUERY CACHE;
SET SQL_BIG_SELECTS=1;
set #displayvideoaction_id = 2302;
set #ticSessionId = 3851;
select richtext.id,richtextcross.name,richtextcross.updates_demo_field,richtext.content from
(
select listitemcross.id,name,updates_demo_field,listitem.text_id from
(
select id,name, updates_demo_field, items_id from
(
SELECT id, name, answertype_id, updates_demo_field,
#student:=CASE WHEN #class <> updates_demo_field THEN 0 ELSE #student+1 END AS rn,
#class:=updates_demo_field AS clset FROM
(SELECT #student:= -1) s,
(SELECT #class:= '-1') c,
(
select id, name, answertype_id, updates_demo_field from
(
select manytomany.questions_id from
(
select questiongroup_id from
(
select questiongroup_id from `ticnotes`.`scriptaction` where ticsession_id=#ticSessionId and questiongroup_id is not null
) scriptaction
inner join
(
select * from `ticnotes`.`questiongroup`
) questiongroup on scriptaction.questiongroup_id=questiongroup.id
) scriptgroup
inner join
(
select * from `ticnotes`.`questiongroup_question`
) manytomany on scriptgroup.questiongroup_id=manytomany.questiongroup_id
) questionrelation
inner join
(
select * from `ticnotes`.`question`
) questiontable on questionrelation.questions_id=questiontable.id
where updates_demo_field = 'DEMO1' or updates_demo_field = 'DEMO2'
order by updates_demo_field, id desc
) t
having rn=0
) firstrowofgroup
inner join
(
select * from `ticnotes`.`multipleoptionstype_listitem`
) selectlistanswers on firstrowofgroup.answertype_id=selectlistanswers.multipleoptionstype_id
) listitemcross
inner join
(
select * from `ticnotes`.`listitem`
) listitem on listitemcross.items_id=listitem.id
) richtextcross
inner join
(
select * from `ticnotes`.`richtext`
) richtext on richtextcross.text_id=richtext.id;
My first impression is - don't use short cuts to describe your tables. I am lost at which td3 is where ,then td6, tdx3... I guess you might be lost as well.
If you name your aliases more sensibly there will be less chance to get something wrong and mix 6 with 8 or whatever.
Just a sugestion :)
There is no limitation on mySQL so my bet would be on human error - somewhere there join logic fails.