The left table is the result of my query, and I need to sort it as shown in the right table.
I need to order by p_id if level >= 2. The blue box in the right table is the target of the ORDER BY.
Is it possible? This is of course just an example; the actual data is hundreds of rows and really needs to be sorted.
I searched a lot, but couldn't find a comparable case.
Edit: this result will be returned as a java.util.ArrayList. If this kind of ORDER BY is not possible in SQL, is it possible on the java.util.ArrayList?
I'm sure it's not possible in one query in MySQL.
In your diagram on the right, the ordering has been done in two separate steps:
Sort by id
Sort each block by p_id if level >= 2
That's quite difficult to do in MySQL as you would need to identify the blocks and then iterate over them, sorting each block separately.
I've done something similar where ordering within blocks was required and then selecting from those ordered blocks. You can view that here, but that SQL is horribly complicated, involving 5 temporary tables. You would probably need fewer temp tables here, but it would still be a very complicated procedure, quite slow and hard to maintain.
"Actual data is hundreds and really need to be sorted."
Is there any reason why you can't just sort it as you want in code?
$blockStart = 0;
$count = 0;
foreach($dataArray as $data){
    if($data['level'] < 2){ //Block has finished: sort the level >= 2 rows collected so far
        sortBlock($dataArray, $blockStart, $count - 1);
        $blockStart = $count + 1; //Next block starts after this row
    }
    $count++;
}
sortBlock($dataArray, $blockStart, $count - 1); //Sort the trailing block, if any

function sortBlock(array &$dataArray, $indexStart, $indexEnd){
    //Sort the elements of $dataArray, between $indexStart and $indexEnd inclusive,
    //by the value of p_id. The array is passed by reference so the caller sees the result.
    if($indexEnd <= $indexStart){
        return; //Zero or one row: nothing to sort
    }
    $block = array_slice($dataArray, $indexStart, $indexEnd - $indexStart + 1);
    usort($block, function($a, $b){
        return $a['p_id'] - $b['p_id']; //Assumes integer p_id values
    });
    array_splice($dataArray, $indexStart, count($block), $block);
}
Trying to solve a general programming problem in MySQL when you could solve it in 1/10th of the programmer time (and probably have it perform faster as well) in Java is not a good path to follow.
It is possible to do this in SQL, but it would be a very, very complicated query in MySQL. Here is the approach.
(1) Create a subquery that has the original ids and an indicator of whether something is in level 2 or not. The ids in this table are going to define the final order.
(2) Next, create a separate counter for each group in the above table. In other databases, you would use row_number(). In MySQL, this requires a correlated subquery. This provides the mapping from id to the new ordering.
(3) Next, create a counter for each group, but this time with the needed order (by id for the non-level-2 group, by p_id for the level-2 group).
(4) Join the tables together to get the matching.
(5) Order by the original id.
Here is an attempt:
select altord.*
from (select t.*,
             (select count(*)
              from t t2
              where t2.id <= t.id and
                    ((t2.level = 2 and t.level = 2) or (t2.level <> 2 and t.level <> 2))
             ) as seqnum
      from t
     ) ord join
     (select t.*,
             (select count(*)
              from t t2
              where (t2.id <= t.id and t2.level <> 2 and t.level <> 2) or
                    (t2.level = 2 and t.level = 2 and
                     (t2.pid < t.pid or (t2.pid = t.pid and t2.id <= t.id)))
             ) as seqnum
      from t
     ) altord
     on ord.seqnum = altord.seqnum and
        (ord.level = 2) = (altord.level = 2)
order by ord.id
I'm not sure if this SQL is correct, but the idea can be implemented in a single query.
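On MySQL 8.0 and later (or any database with window functions), the same idea can be written much more directly with ROW_NUMBER(). This is a minimal sketch, assuming the same hypothetical table t(id, pid, level) and the same level = 2 / level <> 2 grouping as the query above:

select altord.*
from (
    -- the "slots": each row's rank within its group, in id order
    select id,
           (level = 2) as grp,
           row_number() over (partition by (level = 2) order by id) as seqnum
    from t
) ord
join (
    -- the same ranks, but level-2 rows are ranked by pid instead of id
    select t.*,
           (level = 2) as grp,
           row_number() over (partition by (level = 2)
                              order by case when level = 2 then pid else id end, id) as seqnum
    from t
) altord
  on altord.grp = ord.grp
 and altord.seqnum = ord.seqnum
order by ord.id;

Joining on (grp, seqnum) drops each re-ranked row into the slot of the row that originally held that rank, and the final ORDER BY ord.id restores the slot order.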
Related
I have an SQL query:
SELECT tsc.Id
FROM TEST.Services tsc,
(
select * from DICT.Change sp
) spc
where tsc.serviceId = spc.service_id
and tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
and tsc.startDate > GREATEST(spc.StartTime, spc.startDate)
group by tsc.Id;
This query is very, very slow.
Explain:
Can this be optimized? How can this subquery be rewritten into something better?
What is the point of this query? Why the CROSS JOIN operation? Why do we need to return multiple copies of the id column from the Services table? And what are we doing with the millions of rows being returned?
Absent a specification, an actual set of requirements for the resultset, we're just guessing at it.
To answer your questions:
Yes, the query could be "optimized" by rewriting it to the resultset that is actually required, and do it much more efficiently than the monstrously hideous SQL in the question.
Some suggestions: ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead.
With no join predicates, it's a "cross" join: every row from one side is matched to every row from the other side. I recommend including the CROSS keyword as an indication to future readers that the absence of an ON clause (or of join predicates in the WHERE clause) is intentional, and not an oversight.
I'd also avoid an inline view, unless there is a specific reason for one.
UPDATE
The query in the question is updated to include some predicates. Based on the updated query, I would write it like this:
SELECT tsc.id
FROM TEST.Services tsc
JOIN DICT.Change spc
ON tsc.serviceid = spc.service_id
AND tsc.startdate > spc.starttime
AND tsc.startdate > spc.startdate
AND ( tsc.planid = spc.plan_id
OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 )
)
Ensure that the query is making use of suitable indexes by looking at the output of EXPLAIN to see the execution plan, in particular which indexes are being used.
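I can't tell from here which indexes already exist, so any specific suggestion is a guess, but indexes along these lines would give the optimizer something to work with (the index names are illustrative; adjust the columns to your real schema):

CREATE INDEX ix_services_service_start ON TEST.Services (serviceid, startdate, planid);
CREATE INDEX ix_change_service_plan ON DICT.Change (service_id, starttime, startdate, plan_id);

Then re-run EXPLAIN to confirm whether the optimizer actually picks them.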
Some notes:
If there are multiple rows from spc that "match" a row from tsc, the query will return duplicate values of tsc.id. (It's not clear why, or whether, we need to return duplicate values. If we need to count the number of matches for each tsc.id, we could do that in the query, returning distinct values of tsc.id along with a count. If we don't need duplicates, we could return just a distinct list.)
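A minimal sketch of the counting variant, reusing the join written above (the matches alias is just illustrative):

SELECT tsc.id, COUNT(*) AS matches
FROM TEST.Services tsc
JOIN DICT.Change spc
ON tsc.serviceid = spc.service_id
AND tsc.startdate > spc.starttime
AND tsc.startdate > spc.startdate
AND ( tsc.planid = spc.plan_id
OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 )
)
GROUP BY tsc.id;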
The GREATEST() function returns NULL if any of its arguments is NULL. If the condition we need is "a > GREATEST(b,c)", we can specify "a > b AND a > c" instead.
Also, this condition:
tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
can be re-written to return an equivalent result. (I'm suspicious about the actual specification, and whether the original condition actually satisfies it adequately; but without example data and a sample of the expected output, we have to rely on the SQL as the specification, so we honor that in the rewrite.)
If we don't need to return duplicate values of tsc.id, assuming id is unique in TEST.Services, we could also write
SELECT tsc.id
FROM TEST.Services tsc
WHERE EXISTS
( SELECT 1
FROM DICT.Change spc
WHERE spc.service_id = tsc.serviceid
AND spc.starttime < tsc.startdate
AND spc.startdate < tsc.startdate
AND ( ( spc.plan_id = tsc.planid )
OR ( spc.plan_id = -1 AND tsc.planid IS NOT NULL )
)
)
I have the following query which displays a list of accounts with a certain margin level:
SELECT
crm_margincall.id,
crm_margincall.CreationTime,
ba.name AS crm_bankaccount_id,
crm_margincall.name,
crm_margincall.MarginCallLevel,
crm_margincall.UseOfEquityForMargin,
crm_margincall.MarginRequired,
crm_margincall.NetEquityForMargin,
crm_margincall.MarginDeficit,
crm_margincall.balance,
crm_margincall.deposited,
crm_margincall.prefunded,
crm_margincall.required
FROM
crm_margincall
LEFT JOIN
crm_bankaccount ba ON crm_margincall.crm_bankaccount_id = ba.id
WHERE
crm_margincall.name = 'MarginCall'
AND
crm_margincall.MarginCallLevel >= 100
AND
crm_margincall.crm_account_id NOT IN
(
SELECT
x.crm_account_id
FROM
crm_margincall x
WHERE
x.crm_account_id = crm_margincall.crm_account_id
AND
x.name = 'LevelDrop'
AND
x.MarginCallLevel < 100
AND
x.id > crm_margincall.id
)
ORDER BY
id
DESC
This query, on a table of ~22,500 records, takes more than 10 seconds to run; this is caused by the subquery defining the NOT IN section (NOT EXISTS isn't much faster). How can I join this table on itself to achieve the same effect?
This can be done in several ways, but a scan of 22,500 records taking over 10 seconds means either a hardware issue or a very inefficient JOIN.
The most likely cause of the latter is a missing index or a misconfigured index, and to investigate this, you need to issue an EXPLAIN:
EXPLAIN SELECT ...
Shooting totally in the dark, judging from the columns being filtered on, I'd try:
CREATE INDEX test_index ON crm_margincall(name, crm_account_id, MarginCallLevel, id)
Other improvements might be possible, but you'd need to prepare a sample structure with some fake data in a SQLfiddle to really allow debugging.
Try something like this:
SELECT
crm_margincall.id,
crm_margincall.CreationTime,
ba.name AS crm_bankaccount_id,
crm_margincall.name,
crm_margincall.MarginCallLevel,
crm_margincall.UseOfEquityForMargin,
crm_margincall.MarginRequired,
crm_margincall.NetEquityForMargin,
crm_margincall.MarginDeficit,
crm_margincall.balance,
crm_margincall.deposited,
crm_margincall.prefunded,
crm_margincall.required
FROM
crm_margincall
LEFT JOIN
crm_bankaccount ba ON crm_margincall.crm_bankaccount_id = ba.id
LEFT JOIN
crm_margincall x ON x.crm_account_id = crm_margincall.crm_account_id
AND x.name = 'LevelDrop'
AND x.MarginCallLevel < 100
AND x.id > crm_margincall.id
WHERE
crm_margincall.name = 'MarginCall'
AND
crm_margincall.MarginCallLevel >= 100
AND
x.id IS NULL
ORDER BY
crm_margincall.id
DESC
The LEFT JOIN combined with x.id IS NULL is an anti-join: it keeps only the 'MarginCall' rows for which no later 'LevelDrop' row below 100 exists on the same account, which is exactly what the NOT IN subquery was checking, but evaluated as a single self-join instead of a per-row subquery.
I need some SQL help. I have a SELECT statement that references several tables and hangs up the MySQL database. Is there a better way to write this statement so that it runs efficiently and does not hang the DB? Any help/direction would be appreciated. Thanks.
Here is the code:
Select Max(b.BurID) As BurID
From My.AppTable a,
My.AddressTable c,
My.BurTable b
Where a.AppID = c.AppID
And c.AppID = b.AppID
And (a.Forename = 'Bugs'
And a.Surname = 'Bunny'
And a.DOB = '1936-01-16'
And c.PostcodeAnywhereBuildingNumber = '999'
And c.PostcodeAnywherePostcode = 'SK99 9Q9'
And c.isPrimary = 1
And b.ErrorInd <> 1
And DateDiff(CurDate(), a.ApplicationDate) <= 30)
There is NO mysql error in the log. Sorry.
Pro tip: use explicit JOINs rather than a comma-separated list of tables. It's easier to see the logic you're using to JOIN that way. Rewriting your query to do that gives us this.
select Max(b.BurID) As BurID
From My.AppTable AS a
JOIN My.AddressTable AS c ON a.AppID = c.AppID
JOIN My.BurTable AS b ON c.AppID = b.AppID
WHERE (a.Forename = 'Bugs'
And a.Surname = 'Bunny'
And a.DOB = '1936-01-16'
And c.PostcodeAnywhereBuildingNumber = '999'
And c.PostcodeAnywherePostcode = 'SK99 9Q9'
And c.isPrimary = 1
And b.ErrorInd <> 1
And DateDiff(CurDate(), a.ApplicationDate) <= 30)
Next pro tip: Don't wrap columns in functions (like DateDiff()) in WHERE clauses, because that prevents the index on the column from being used for the search. That means you should change the last line of your query to
AND a.ApplicationDate >= CurDate() - INTERVAL 30 DAY
This has the same logic as in your query, but it leaves a naked (and therefore index-searchable) column name in the search expression.
Next, we need to look at your columns to see how you are searching, and cook up appropriate indexes.
Let's start with AppTable. You're screening by specific values of Forename, Surname, and DOB, and by a range of ApplicationDate values. Finally, you need AppID to manage your join. So this compound index should help: its columns are in the right order for a range scan to satisfy your query, and it covers all of the columns the query needs.
CREATE INDEX search1 USING BTREE
ON AppTable
(Forename, Surname, DOB, ApplicationDate, AppID)
Next, we can look at your AddressTable. Similar logic applies. You'll enter this table via the JOINed AppID, and then screen by specific values of three columns. So, try this index
CREATE INDEX search2 USING BTREE
ON AddressTable
(AppID, PostcodeAnywherePostcode, PostcodeAnywhereBuildingNumber, isPrimary)
Finally, we're on to your BurTable. Use similar logic as the other two, and try this index.
CREATE INDEX search3 USING BTREE
ON BurTable
(AppID, ErrorInd, BurID)
This kind of index is called a compound covering index, and can vastly speed up the sort of summary query you have asked about.
I'm using this kind of query with different parameters:
EXPLAIN SELECT SQL_NO_CACHE `ilan_genel`.`id` , `ilan_genel`.`durum` , `ilan_genel`.`kategori` , `ilan_genel`.`tip` , `ilan_genel`.`ozellik` , `ilan_genel`.`m2` , `ilan_genel`.`fiyat` , `ilan_genel`.`baslik` , `ilan_genel`.`ilce` , `ilan_genel`.`parabirimi` , `ilan_genel`.`tarih` , `kgsim_mahalleler`.`isim` AS mahalle, `kgsim_ilceler`.`isim` AS ilce, (
SELECT `ilanresimler`.`resimlink`
FROM `ilanresimler`
WHERE `ilanresimler`.`ilanid` = `ilan_genel`.`id`
LIMIT 1
) AS resim
FROM (
`ilan_genel`
)
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `kgsim_mahalleler` ON `kgsim_mahalleler`.`id` = `ilan_genel`.`mahalle`
WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
ORDER BY `ilan_genel`.`id` DESC
LIMIT 225 , 15
and this is what I get in the EXPLAIN section:
These are the indexes that I already tried to use:
Any help will be deeply appreciated. What kind of index would be the best option, or should I use another table structure?
You should first simplify your query to understand your problem better. As it appears your problem is constrained to the ilan_genel table, the following query should show the same symptoms:
SELECT * FROM ilan_genel WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
So the first thing to do is check that this is the case. If so, the simpler question is why this query requires a filesort over 3661 rows. Now, the 'hepsi' index sort order is:
ilce -> mahalle -> durum -> kategori -> tip -> ozellik
I've written it that way to emphasise that it is sorted first on 'ilce', then 'mahalle', then 'durum', etc. Note that your query does not specify a 'mahalle' value, so the best the index can do is a lookup on 'ilce'. I don't know the distribution of your data, but the next logical step in debugging this would be:
SELECT * FROM ilan_genel WHERE `ilan_genel`.`ilce` = '703'
Does this return 3661 rows?
If so, you should be able to see what is happening: the database is using the hepsi index to the best of its ability, getting 3661 rows back and then sorting those rows in order to eliminate values according to the other criteria (i.e. 'durum', 'kategori', 'tip').
The key point here is that if data is sorted by A, B, C in that order and B is not specified, then the best logical thing that can be done is: first a look up on A then a filter on the remaining values against C. In this case, that filter is performed via a file sort.
Possible solutions
Supply 'mahalle' (B) in your query.
Add a new index on ilan_genel that doesn't require 'mahalle', i.e. A -> C -> D ... (see the sketch after this list).
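A minimal sketch of the second option, with the equality-tested columns first and id last so that the ORDER BY id DESC ... LIMIT can also be satisfied from the index (the index name is illustrative, and listing id explicitly is redundant if it is the InnoDB primary key):

CREATE INDEX ilce_durum_kategori_tip ON ilan_genel (ilce, durum, kategori, tip, id);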
Another tip
In case I have misdiagnosed your problem (easy to do when I don't have your system to test against), the important thing here is the approach to solving the problem. In particular, how to break a complicated query into a simpler query that produces the same behaviour, until you get to a very simple SELECT statement that demonstrates the problem. At this point, the answer is usually much clearer.
Is there a way to implement this algorithm in MySQL without 100500 queries and lots of resources?
if (exists %name% in table.name) {
    num = 2;
    while (exists %name%+(num) in table.name) num++;
    %name% = %name%+(num);
}
Thanks
I don't know how much better you can do with a stored procedure in MySQL, but you can definitely do better than 100500 queries:
SELECT name FROM table WHERE name LIKE 'somename%' ORDER BY name DESC LIMIT 1
At that point, you know that you can increment the number at the end of name and the result will be unused.
I'm glossing over some fine print (this approach will never find and fill any "holes" in the naming scheme that may exist, and it's still not guaranteed that the name will be available due to race conditions), but in practice it can be made to work quite easily.
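One more piece of fine print: ORDER BY name DESC compares strings, so 'somename9' sorts above 'somename10' and the top row may not carry the largest number. A sketch of a variant that orders on the numeric suffix instead (assuming the generated names are the base name followed by a bare integer):

SELECT CAST(SUBSTRING(name, LENGTH('somename') + 1) AS UNSIGNED) AS suffix
FROM `table`
WHERE name LIKE 'somename%'
ORDER BY suffix DESC
LIMIT 1;

The base name itself yields an empty suffix, which CAST turns into 0, so it still sorts below every numbered variant.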
The simplest way I can see of doing it is to create a table of sequential numbers
and then cross join on to it:
SELECT a.name, b.id
FROM `table` a
CROSS JOIN atableofsequentialnumbers b
WHERE a.name = 'somename'
AND NOT EXISTS (SELECT 1 FROM `table` x WHERE x.name = CONCAT(a.name, b.id))
ORDER BY b.id
LIMIT 10
This will return the first 10 available numbers/names
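The sequence table itself only has to be built once. A minimal sketch of one way to create and populate it (the names follow the query above, and the 1..1000 range is arbitrary):

CREATE TABLE atableofsequentialnumbers (id INT UNSIGNED NOT NULL PRIMARY KEY);

INSERT INTO atableofsequentialnumbers (id)
SELECT d1.n + d2.n * 10 + d3.n * 100 + 1
FROM (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d1,
     (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d2,
     (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d3;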