MySQL query: different performance between = and IN

Why is there such a difference in execution time between these two queries, even though they retrieve about the same number of rows from the same table?
select cognome, nome, lingua, count(*)
from archivio.utente
where cognome in ('rossi','pecchia','pironi')
group by cognome, nome, lingua;
…
…
…
| Rossi | Mario | it | 1 |
| Pironi | Luigi | it | 1 |
| Pecchia | Fabio | it | 1 |
+----------------------+---------+--------+----------+
779 rows in set (0.03 sec)
select cognome, nome, lingua, count(*)
from archivio.utente
where nome='corrado'
group by cognome, nome, lingua;
…
…
…
| Rossi | Mario | it | 1 |
| Pironi | Luigi | it | 1 |
| Pecchia | Fabio | it | 1 |
+----------------------+---------+--------+----------+
737 rows in set (0.47 sec)

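From the MySQL documentation:
https://dev.mysql.com/doc/refman/5.7/en/explain-output.html#explain-join-types
When we use IN (join type range):
Only rows that are in a given range are retrieved, using an index to select the rows.
The key column in the output row indicates which index is used.
When we use = (join type ALL):
A full table scan is done for each combination of rows from the previous tables.
So in one case all rows are read and compared, in the other only a range of the index is read. The operator alone does not explain this, though: an = comparison on an indexed column also uses the index (join type ref). The likely cause is that cognome is indexed and nome is not; running EXPLAIN on both queries shows which index, if any, each one uses.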
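To see the effect without the original data, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite's EXPLAIN QUERY PLAN plays the role of EXPLAIN). It assumes, hypothetically, that only cognome is indexed; the index names idx_cognome and idx_nome are made up for illustration. It compares the plans of the IN query and the = query, then adds an index on nome to show that the same = query switches from a full scan to an index search.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE utente (cognome TEXT, nome TEXT, lingua TEXT)")
# Hypothetical setup: only cognome has an index.
conn.execute("CREATE INDEX idx_cognome ON utente (cognome)")
conn.executemany(
    "INSERT INTO utente VALUES (?, ?, ?)",
    [("Rossi", "Mario", "it"), ("Pironi", "Luigi", "it"),
     ("Pecchia", "Fabio", "it"), ("Bianchi", "Corrado", "it")],
)

by_cognome = ("SELECT cognome, nome, lingua, COUNT(*) FROM utente "
              "WHERE cognome IN ('Rossi', 'Pecchia', 'Pironi') "
              "GROUP BY cognome, nome, lingua")
by_nome = ("SELECT cognome, nome, lingua, COUNT(*) FROM utente "
           "WHERE nome = 'Corrado' GROUP BY cognome, nome, lingua")

in_plan = plan(conn, by_cognome)   # index search on idx_cognome
eq_plan = plan(conn, by_nome)      # full scan: no index can be searched on nome

# Same '=' query once nome is indexed: the full scan becomes an index search.
conn.execute("CREATE INDEX idx_nome ON utente (nome)")
eq_plan_indexed = plan(conn, by_nome)

for label, p in [("IN on cognome", in_plan),
                 ("= on nome, no index", eq_plan),
                 ("= on nome, indexed", eq_plan_indexed)]:
    print(f"{label}: {p}")
```

In MySQL the equivalent check is simply EXPLAIN on each query, looking at the type and key columns.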

Related

Optimizing query that selects on the result of a group by

I have a table that contains pipeline jobs data. A pipeline is composed of many jobs that run independently, and each of them can finish at its own pace. Once the pipelines are finished, they are archived by setting one of the columns to 1. I want to get the list of jobs of the pipelines whose state is "Done" for all their jobs.
Let's say that my table looks like (sample data shown):
mysql> select id, pipeline, archived, state from jobs where archived=0 limit 4;
+---------+-----------+----------+-------+
| id | pipeline | archived | state |
+---------+-----------+----------+-------+
| 8572387 | pipeline1 | 0 | Done |
| 8572388 | pipeline1 | 0 | Done |
| 8572389 | pipeline2 | 0 | Done |
| 8572390 | pipeline2 | 0 | Fail |
+---------+-----------+----------+-------+
4 rows in set (0.00 sec)
I managed to get the list of failed pipelines:
mysql> select distinct(pipeline) from jobs where archived=0 group by pipeline, state having state!='Done';
+-----------+
| pipeline |
+-----------+
| pipeline2 |
+-----------+
1 row in set (0.01 sec)
And I even managed to get the answer I'm looking for (real data shown):
select j1.id
from jobs j1
where j1.archived=0
and j1.pipeline not in ( select distinct(j2.pipeline)
from jobs j2
where j2.archived=0
group by j2.pipeline, j2.state having j2.state!='Done'
);
+---------+
| id |
+---------+
| 8583200 |
| 8583201 |
| 8583202 |
| 8583203 |
.
.
.
| 8584305 |
| 8584306 |
+---------+
1107 rows in set (18.77 sec)
My issue is that the first query runs in 0.01 s on the real data, but as soon as I use it as the inner select, the time goes up dramatically. This last query took 19 s with only 4 pipelines in total, 2 of which failed, each having around 500 jobs.
When I run this on the full dataset, with hundreds of pipelines, it takes far too long.
I'm sure it can be done much faster, in less than 1 s, but I cannot get it right :-( Where is my query getting stuck?
For reference, the query plan is:
mysql> describe select j1.id from jobs j1 where j1.archived=0 and j1.pipeline not in (select distinct(j2.pipeline) from jobs j2 where j2.archived=0 group by j2.pipeline, j2.state having j2.state!='Done');
+----+--------------------+-------+------+---------------+----------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+----------+---------+-------+------+----------------------------------------------+
| 1 | PRIMARY | j1 | ref | archived | archived | 2 | const | 2306 | Using where |
| 2 | DEPENDENT SUBQUERY | j2 | ref | archived | archived | 2 | const | 2306 | Using where; Using temporary; Using filesort |
+----+--------------------+-------+------+---------------+----------+---------+-------+------+----------------------------------------------+
2 rows in set (0.00 sec)
You could rewrite it to something like the query below.
A combined index on (pipeline, archived, state) should speed this up.
The order of the index columns is vital and depends on the distribution of the data, so you can experiment with it to see which order gives better results.
SELECT
j1.id
FROM
jobs j1
WHERE
j1.archived = 0
AND NOT EXISTS
(SELECT 1 FROM jobs j2 WHERE j2.pipeline = j1.pipeline
AND
j2.archived = 0
AND j2.state != 'Done');
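A minimal sketch of the NOT EXISTS rewrite, using Python's sqlite3 module as a stand-in for MySQL, with the four sample rows from the question plus one hypothetical archived row. The index name is made up; its column order follows the (pipeline, archived, state) suggestion above. The NOT IN version is simplified to a plain WHERE filter (no GROUP BY/HAVING) so the two results can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, pipeline TEXT, "
             "archived INTEGER, state TEXT)")
# Composite index; the column order is a tuning choice, not a fixed rule.
conn.execute("CREATE INDEX idx_pipeline_archived_state "
             "ON jobs (pipeline, archived, state)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", [
    (8572387, "pipeline1", 0, "Done"),
    (8572388, "pipeline1", 0, "Done"),
    (8572389, "pipeline2", 0, "Done"),
    (8572390, "pipeline2", 0, "Fail"),
    (8500000, "pipeline0", 1, "Done"),  # hypothetical archived row: must be ignored
])

# Keep a job only if no unarchived job of the same pipeline is not 'Done'.
not_exists = """
SELECT j1.id FROM jobs j1
WHERE j1.archived = 0
  AND NOT EXISTS (SELECT 1 FROM jobs j2
                  WHERE j2.pipeline = j1.pipeline
                    AND j2.archived = 0
                    AND j2.state != 'Done')
ORDER BY j1.id
"""
# Simplified NOT IN formulation, for comparison only.
not_in = """
SELECT j1.id FROM jobs j1
WHERE j1.archived = 0
  AND j1.pipeline NOT IN (SELECT j2.pipeline FROM jobs j2
                          WHERE j2.archived = 0 AND j2.state != 'Done')
ORDER BY j1.id
"""
done_ids = [row[0] for row in conn.execute(not_exists)]
print(done_ids)  # [8572387, 8572388]
```

One more reason to prefer NOT EXISTS: if the subquery of a NOT IN ever returns a NULL pipeline, NOT IN yields no rows at all, while NOT EXISTS is unaffected.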

When inserting a new record into an existing table, it goes up instead of down

I have already created a table and want to add an extra row. When I add it, the new row appears at the top instead of at the bottom, where I want it.
MariaDB [armydetails]> insert into armydetails values('r05','Shishir','Bhujel','Jhapa','9845678954','male','1978-6-7','1994-1-3','ran5','Na11088905433');
Query OK, 1 row affected (0.17 sec)
MariaDB [armydetails]> select * from armydetails;
+-------+---------+---------+-----------+------------+--------+------------+------------+--------+----------------+
| regNo | fName | lName | address | number | gender | DOB | DOJ | rankID | accountNo |
+-------+---------+---------+-----------+------------+--------+------------+------------+--------+----------------+
| r05 | Shishir | Bhujel | Jhapa | 9845678954 | male | 1978-06-07 | 1994-01-03 | ran5 | Na11088905433 |
| ro1 | Milan | Katwal | Dharan | 9811095122 | Male | 1970-01-03 | 1990-01-01 | ran1 | Na11984567823 |
| ro2 | Hari | Yadav | Kathmandu | 9810756436 | male | 1980-06-07 | 2000-05-06 | ran2 | Na119876678543 |
| ro3 | Khrisna | Neupane | Itahari | 9864578934 | male | 1980-02-02 | 2001-01-07 | ran3 | Na11954437890 |
| ro4 | Lalit | Rai | Damak | 9842376547 | male | 1989-05-09 | 2005-01-02 | ran4 | Na11064553221 |
+-------+---------+---------+-----------+------------+--------+------------+------------+--------+----------------+
5 rows in set (0.00 sec)
MariaDB [armydetails]>
The SQL:2011 standard (ISO/IEC 9075) says:
In general, rows in a table are unordered; however, rows in a table are ordered if the table is the result of a « query expression » that immediately contains an « order by clause ».
In a SQL database, there is no underlying, default ordering for records. A relational database basically stores a table as a bunch of unordered records.
When records are SELECTed without an ORDER BY clause, they come out in an undefined order, which is in no way guaranteed to be consistent over subsequent queries (including the very same query being executed several times). This is true for MySQL and for other RDBMSs.
The only way to properly order records is to use an ORDER BY clause, like:
select * from armydetails order by regNo
Suggested reading: Tom Kyte's blog post "Order in the Court!".
You can simply add an ORDER BY clause to your statement as follows:
SELECT * FROM armydetails ORDER BY regNO DESC;
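Note that the new key is typed r05 with a digit zero, while the existing keys use the letter o (ro1, ro2, ...). Since '0' sorts before 'o', even ORDER BY regNo puts r05 first, which may explain why it shows up at the top: if regNo is the primary key, InnoDB typically returns a full scan in primary-key order, although that is not guaranteed. Below is a minimal sketch with Python's sqlite3 as a stand-in for MariaDB; the seq column is hypothetical and plays the role of an AUTO_INCREMENT column that records insertion order, which is what "new rows at the bottom" really means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical 'seq' column records insertion order (like AUTO_INCREMENT in MariaDB).
conn.execute("CREATE TABLE armydetails ("
             "seq INTEGER PRIMARY KEY AUTOINCREMENT, regNo TEXT UNIQUE, fName TEXT)")
for reg, name in [("ro1", "Milan"), ("ro2", "Hari"), ("ro3", "Khrisna"),
                  ("ro4", "Lalit"), ("r05", "Shishir")]:
    conn.execute("INSERT INTO armydetails (regNo, fName) VALUES (?, ?)", (reg, name))

by_reg = [r[0] for r in conn.execute("SELECT regNo FROM armydetails ORDER BY regNo")]
by_seq = [r[0] for r in conn.execute("SELECT regNo FROM armydetails ORDER BY seq")]
print(by_reg)  # ['r05', 'ro1', 'ro2', 'ro3', 'ro4']  ('0' sorts before 'o')
print(by_seq)  # ['ro1', 'ro2', 'ro3', 'ro4', 'r05']  (insertion order)
```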

INNER JOIN on the same value, but the other table has an extra word in front of the value

As I said in the title (or maybe my question is a little bit confusing), here it is.
I want to combine 2 tables using an INNER JOIN (of course), with one difference.
These are my tables:
Table 1, PK = steam_id
SELECT * FROM nmrihstats ORDER BY points DESC LIMIT 4;
+---------------------+----------------+--------+-------+--------+
| steam_id | name | points | kills | deaths |
+---------------------+----------------+--------+-------+--------+
| STEAM_0:1:88467338 | Alan14 | 50974 | 5438 | 12 |
| STEAM_0:0:95189481 | ? BlacKEaTeR ? | 35085 | 24047 | 316 |
| STEAM_0:1:79891668 | Lowell | 34410 | 44076 | 993 |
| STEAM_0:1:170948255 | Rain | 29780 | 30167 | 278 |
+---------------------+----------------+--------+-------+--------+
4 rows in set (0.01 sec)
Table 2, PK = authid
SELECT * FROM store_players ORDER BY credits DESC LIMIT 4;
+-----+-------------+---------------+---------+--------------+-------------------+
| id | authid | name | credits | date_of_join | date_of_last_join |
+-----+-------------+---------------+---------+--------------+-------------------+
| 309 | 1:88467338 | Alan14 | 15543 | 1475580801 | 1482260232 |
| 368 | 1:79891668 | Lowell | 10855 | 1475603908 | 1482253619 |
| 256 | 1:128211488 | Fuck[U]seLF | 10422 | 1475570061 | 1482316480 |
| 428 | 1:74910707 | Mightybastard | 7137 | 1475672897 | 1482209608 |
+-----+-------------+---------------+---------+--------------+-------------------+
4 rows in set (0.00 sec)
Now, how can I use an INNER JOIN without removing or adding "STEAM_0:"? An explanation would be appreciated, please.
You can join with the LIKE operator, e.g.:
SELECT n.*, sp.*
FROM nmrihstats n JOIN store_players sp ON n.steam_id LIKE CONCAT('%', sp.authid);
Another approach would be to use MySQL string functions to extract the relevant part of steam_id, but I believe that's not what you want:
SELECT SUBSTR(steam_id, LOCATE('STEAM_0:', steam_id) + CHAR_LENGTH('STEAM_0:'))
FROM nmrihstats;
It is not possible directly; you need to remove "STEAM_0:". Either match in the WHERE clause, using SUBSTRING to strip "STEAM_0:" from the column so that it equals the column in the other table, or add a new field to T1 without "STEAM_0:" so that the 2 columns match for the INNER JOIN.
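The approaches above can be compared side by side in a minimal sketch, using Python's sqlite3 as a stand-in for MySQL and a few rows from the sample tables. SQLite has no LOCATE(), and CONCAT() is not available in older versions, so the sketch uses || for concatenation and substr()/length() instead; the "prefix" variant (adding "STEAM_0:" inside the join condition, without changing any stored data) is an extra option not given in the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nmrihstats (steam_id TEXT PRIMARY KEY, name TEXT, points INTEGER)")
conn.execute("CREATE TABLE store_players "
             "(id INTEGER, authid TEXT PRIMARY KEY, name TEXT, credits INTEGER)")
conn.executemany("INSERT INTO nmrihstats VALUES (?, ?, ?)", [
    ("STEAM_0:1:88467338", "Alan14", 50974),
    ("STEAM_0:0:95189481", "? BlacKEaTeR ?", 35085),
    ("STEAM_0:1:79891668", "Lowell", 34410),
    ("STEAM_0:1:170948255", "Rain", 29780),
])
conn.executemany("INSERT INTO store_players VALUES (?, ?, ?, ?)", [
    (309, "1:88467338", "Alan14", 15543),
    (368, "1:79891668", "Lowell", 10855),
    (428, "1:74910707", "Mightybastard", 7137),
])

joins = {
    # Suffix match with LIKE; the leading % prevents using an index on steam_id.
    "like": "n.steam_id LIKE ('%' || sp.authid)",
    # Add the prefix on the fly and compare for equality (an index on steam_id can help).
    "prefix": "n.steam_id = ('STEAM_0:' || sp.authid)",
    # Strip the prefix from steam_id, like the SUBSTR/LOCATE answer.
    "strip": "substr(n.steam_id, length('STEAM_0:') + 1) = sp.authid",
}
results = {
    key: conn.execute(
        "SELECT n.name, sp.credits FROM nmrihstats n "
        f"JOIN store_players sp ON {cond} ORDER BY n.name"
    ).fetchall()
    for key, cond in joins.items()
}
print(results)
```

All three return the same two matched players here; the difference is mainly performance, since only the equality forms leave the optimizer a chance to use an index.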

Only return an ordered subset of the rows from a joined table

Given a structure like this in a MySQL database
#data_table
(id) | user_id | time | (...)
#relations_table
(id) | user_id | user_coach_id | (...)
we can select all data_table rows belonging to a certain user_coach_id (let's say 1) with
SELECT rel.`user_coach_id`, dat.*
FROM `relations_table` rel
LEFT JOIN `data_table` dat ON rel.`user_id` = dat.`user_id`
WHERE rel.`user_coach_id` = 1
ORDER BY dat.`time` DESC
returning something like
| user_coach_id | id | user_id | time | data1 | data2 | ...
| 1 | 9 | 4 | 15 | foo | bar | ...
| 1 | 7 | 3 | 12 | oof | rab | ...
| 1 | 6 | 4 | 11 | ofo | abr | ...
| 1 | 4 | 4 | 5 | foo | bra | ...
(And so on. Of course, time values are not integers in reality; this just keeps it simple.)
But now I would like to query (ideally) only up to an arbitrary number of rows from data_table per distinct user_id but still have those ordered (i.e. newest first). Is that even possible?
I know I can use GROUP BY user_id to only return 1 row per user, but then the ordering doesn't work and it seems kind of unpredictable which row will be in the result. I guess it's doable with a subquery, but I haven't figured it out yet.
Limiting the number of rows in each group is complicated. It is probably best done with an @variable to count the rows, plus an outer query to throw out the rows beyond the limit.
My blog post on groupwise max gives some hints on how to do this.
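On MySQL 8.0 and later (and MariaDB 10.2 and later), window functions are an alternative to the @variable technique: number the rows of each user_id with ROW_NUMBER(), newest first, and keep only those at or below the limit. Here is a minimal sketch with Python's sqlite3 (window functions need SQLite 3.25 or later), using the sample rows from the question; the limit of 2 rows per user is an arbitrary choice and the data columns are trimmed to one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (id INTEGER PRIMARY KEY, user_id INTEGER, "
             "time INTEGER, data1 TEXT)")
conn.execute("CREATE TABLE relations_table (id INTEGER PRIMARY KEY, user_id INTEGER, "
             "user_coach_id INTEGER)")
conn.executemany("INSERT INTO data_table VALUES (?, ?, ?, ?)",
                 [(9, 4, 15, "foo"), (7, 3, 12, "oof"), (6, 4, 11, "ofo"), (4, 4, 5, "foo")])
conn.executemany("INSERT INTO relations_table VALUES (?, ?, ?)", [(1, 4, 1), (2, 3, 1)])

PER_USER_LIMIT = 2  # arbitrary: keep at most this many rows per user_id

rows = conn.execute("""
    SELECT user_coach_id, id, user_id, time
    FROM (
        SELECT rel.user_coach_id, dat.id, dat.user_id, dat.time,
               -- 1 = newest row of each user, 2 = second newest, ...
               ROW_NUMBER() OVER (PARTITION BY dat.user_id
                                  ORDER BY dat.time DESC) AS rn
        FROM relations_table rel
        LEFT JOIN data_table dat ON rel.user_id = dat.user_id
        WHERE rel.user_coach_id = ?
    ) AS ranked
    WHERE rn <= ?
    ORDER BY time DESC
""", (1, PER_USER_LIMIT)).fetchall()
print(rows)  # [(1, 9, 4, 15), (1, 7, 3, 12), (1, 6, 4, 11)]
```

User 4's oldest row (id 4) is dropped because it is that user's third row; the outer ORDER BY restores the newest-first order across all users.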

MySQL: optimize query for scoring calculation

I have a data table that I use to do some calculations. The resulting data set after calculations looks like:
+------------+-----------+------+----------+
| id_process | id_region | type | result |
+------------+-----------+------+----------+
| 1 | 4 | 1 | 65.2174 |
| 1 | 5 | 1 | 78.7419 |
| 1 | 6 | 1 | 95.2308 |
| 1 | 4 | 1 | 25.0000 |
| 1 | 7 | 1 | 100.0000 |
+------------+-----------+------+----------+
On the other hand, I have another table containing a set of ranges that are used to classify the calculation results. The ranges table looks like:
+----------+-------+-----+--------+
| id_level | start | end | status |
+----------+-------+-----+--------+
| 1        | 0     | 75  | Danger |
| 2        | 76    | 90  | Alert  |
| 3        | 91    | 100 | Good   |
+----------+-------+-----+--------+
I need a query that adds the corresponding 'status' column to each value when doing the calculations. Currently, I can do that by adding the following field to the calculation query:
select
...,
...,
[math formula] as result,
(select status
from ranges r
where result between r.start and r.end) status
from ...
where ...
It works OK, but when I have a lot of rows (more than 200K), the calculation query becomes slow.
My question is: is there some way to find that 'status' value without that subquery?
Has anyone worked on something similar before?
Thanks
Yes, you are looking for a subquery and join:
select s.*, r.status
from (select s.*
from <your query here>
) s left outer join
ranges r
on s.result between r.start and r.end
Explicit joins often optimize better than nested selects. In this case, though, the ranges table seems pretty small, so this may not be the performance issue.
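A minimal sketch of the range join, using Python's sqlite3 as a stand-in for MySQL. A plain results table stands in for the real calculation query; it holds the values from the question plus one hypothetical value (75.5) chosen to show what happens between two ranges. start and end are quoted because END is an SQL keyword (in MySQL, backticks do the same job).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE ranges (id_level INTEGER PRIMARY KEY, '
             '"start" REAL, "end" REAL, status TEXT)')
conn.executemany("INSERT INTO ranges VALUES (?, ?, ?, ?)",
                 [(1, 0, 75, "Danger"), (2, 76, 90, "Alert"), (3, 91, 100, "Good")])
# Stand-in for the output of the calculation query.
conn.execute("CREATE TABLE results (id_process INTEGER, id_region INTEGER, "
             "type INTEGER, result REAL)")
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", [
    (1, 4, 1, 65.2174), (1, 5, 1, 78.7419), (1, 6, 1, 95.2308),
    (1, 4, 1, 25.0), (1, 7, 1, 100.0),
    (1, 8, 1, 75.5),  # hypothetical value in the gap between 75 and 76
])

rows = conn.execute('''
    SELECT s.result, r.status
    FROM (SELECT * FROM results) s      -- replace with the real calculation query
    LEFT OUTER JOIN ranges r
      ON s.result BETWEEN r."start" AND r."end"
    ORDER BY s.result
''').fetchall()
print(rows)
```

One design point worth checking: with closed ranges like 0-75 and 76-90, a non-integer result such as 75.5 matches no range and gets a NULL status. Comparing with result >= start AND result < the next level's start (half-open ranges) avoids such gaps.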