mysql left join, limit and sorting - mysql

I have a question. I need to left join two tables and get only the first result (I mean the first record in table A that has no match in table B).
This is an example
create table a (
id int not null auto_increment primary key,
name varchar(50),
surname varchar(50),
prov char(2)
) engine = myisam;
insert into a (name,surname,prov)
values ('aaa','aaa','ss'),('bbb','bbb','ca'),('ccc','ccc','mi'),('ddd','ddd','mi'),('eee','eee','to'),
('fff','fff','mi'),('ggg','ggg','ss'),('hhh','hhh','mi'),('jjj','jjj','ss'),('kkk','kkk','to');
create table b (
id int not null auto_increment primary key,
id_name int
) engine = myisam;
insert into b (id_name) values (3),(4),(8),(5),(10),(1);
Query A:
select a.*
from a
left join b
on a.id = b.id_name
where b.id_name is null and a.prov = 'ss'
order by a.id
limit 1
Query B:
select a.*
from a
left join b
on a.id = b.id_name
where b.id_name is null and a.prov = 'ss'
limit 1
Both queries give me the right result, which is the record with id = 7.
I want to know whether I can rely on query B even without specifying a sort on id, or whether it's just by chance that I get the right result.
I ask because on a large recordset (more than 10 million rows), the query without sorting gives me one record immediately, while with sorting it takes more than 20 seconds even though a.id is the primary key.
Thanks in advance.

You can't rely on query B. Without ORDER BY, MySQL gives no guarantee about row order; it simply returned whichever matching row it found first.

Is there an index on table "b" on column "id_name"? If not, create one and tell us what you get (I mean how fast it is). It doesn't matter that you are looking for unmatched rows: the JOIN has to be performed before MySQL can test whether there is a match or not.
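To make the point about explicit ordering concrete, here is a minimal sketch that loads the sample rows from the question and runs Query A next to a NOT EXISTS variant. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the index name b_id_name is hypothetical. It says nothing about timings on 10 million rows; it only shows that both ordered forms pick the same row.

```python
import sqlite3

# SQLite stands in for MySQL here; this is an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (id integer primary key, name text, surname text, prov text);
    create table b (id integer primary key, id_name int);
    create index b_id_name on b (id_name);  -- hypothetical index name
    insert into a (name, surname, prov) values
        ('aaa','aaa','ss'),('bbb','bbb','ca'),('ccc','ccc','mi'),('ddd','ddd','mi'),
        ('eee','eee','to'),('fff','fff','mi'),('ggg','ggg','ss'),('hhh','hhh','mi'),
        ('jjj','jjj','ss'),('kkk','kkk','to');
    insert into b (id_name) values (3),(4),(8),(5),(10),(1);
""")

# Query A from the question: anti-join with an explicit ORDER BY.
query_a = """
    select a.id from a
    left join b on a.id = b.id_name
    where b.id_name is null and a.prov = 'ss'
    order by a.id
    limit 1
"""

# Same intent written with NOT EXISTS, still with an explicit ORDER BY.
not_exists = """
    select a.id from a
    where a.prov = 'ss'
      and not exists (select 1 from b where b.id_name = a.id)
    order by a.id
    limit 1
"""

first_left_join = conn.execute(query_a).fetchone()[0]
first_not_exists = conn.execute(not_exists).fetchone()[0]
print(first_left_join, first_not_exists)
```

Either form keeps the ORDER BY, so the result does not depend on whichever access path the optimizer happens to choose.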

Related

What is auto_increment fields values order for MySQL INSERT .. SELECT statement

Let's say we have the following table:
CREATE TABLE test_insert_order (
id INT(11) NOT NULL AUTO_INCREMENT,
parent_id INT(11) NOT NULL,
name VARCHAR(20),
PRIMARY KEY (id)
);
With some data like
INSERT INTO test_insert_order (parent_id, name) VALUES (1, 'a'),(1, 'b'),(1,'c'),(2,'b'),(2,'d'),(2,'a'),(3,'d'),(3,'a'),(4,'aa'),(5,'bb'),(6,'a'),(3,'a'),(1,'d'),(2,'c');
If we do
INSERT INTO test_insert_order (parent_id, name) SELECT 7, `name` FROM test_insert_order WHERE parent_id = 2 ORDER BY id;
Can we assume that the new auto-generated ids will be in the same order as the ids in the result of this select?
SELECT id, 7, `name` FROM test_insert_order WHERE parent_id = 2 ORDER BY id;
So that in the following query a.name will always match b.name:
SET @i:=0; SET @j:=0;
SELECT a.id, a.parent_id, a.name, a.order_id, b.id, b.parent_id, b.name, b.order_id FROM
(SELECT *, @j:=@j+1 as order_id FROM test_insert_order WHERE parent_id = 2 ORDER BY id) a,
(SELECT *, @i:=@i+1 as order_id FROM test_insert_order WHERE parent_id = 7 ORDER BY id) b
WHERE a.order_id = b.order_id;
I have run a few tests with concurrent threads and it was always true. But I cannot find anything in the MySQL docs about this situation.
UPDATE:
I guess this might not hold in some cluster solutions where each instance has its own pattern for auto-increment values, one instance is lagging behind, and query execution somehow gets distributed between them. But I do not have an environment to check this.
After more research I will answer my own question.
I guess in most situations this will be true, but I was able to find cases where relying on it could cause problems. The first is the cluster case mentioned in the UPDATE to my question. The second I can imagine is when the table has gaps in its auto_increment id field (for example, it initially started from 1000000, not 0) and the auto_increment value is manually changed to a lower value while the insert statement is executing. That would break the auto_increment pattern.
I would suggest ordering instead by some meaningful fields whose uniqueness we can predict. If there are none, then it makes no difference which of two identical, just-inserted records we use.
Regarding the question in the title: auto_increment values in a single bulk insert on a single MySQL instance will usually be in increasing order, but there are cases where this can be interrupted by the auto_increment value being changed to a lower one. On a cluster it depends on the cluster implementation, and most likely it will not be predictable either.
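The suggestion to pair rows on a meaningful field rather than on insertion position can be sketched as follows. This is only an illustration under assumptions: it runs on Python's built-in sqlite3 instead of MySQL, and it relies on name being unique within parent_id = 2, which happens to hold for the sample data.

```python
import sqlite3

# SQLite stands in for MySQL; its id assignment is not evidence about MySQL's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table test_insert_order (
        id integer primary key autoincrement,
        parent_id int not null,
        name varchar(20)
    );
    insert into test_insert_order (parent_id, name) values
        (1,'a'),(1,'b'),(1,'c'),(2,'b'),(2,'d'),(2,'a'),(3,'d'),(3,'a'),
        (4,'aa'),(5,'bb'),(6,'a'),(3,'a'),(1,'d'),(2,'c');
    insert into test_insert_order (parent_id, name)
        select 7, name from test_insert_order where parent_id = 2 order by id;
""")

# Pair each original row with its copy by name, not by generated id order.
pairs = conn.execute("""
    select a.id, a.name, b.id, b.name
    from test_insert_order a
    join test_insert_order b on b.name = a.name
    where a.parent_id = 2 and b.parent_id = 7
    order by a.id
""").fetchall()

for old_id, old_name, new_id, new_name in pairs:
    print(old_id, old_name, "->", new_id, new_name)
```

Because the pairing is driven by the join on name, it stays correct whatever order the new ids were generated in.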

Query execution time increases after adding order by

I use the following query to fetch data from around 15 columns across four tables
SELECT company_table.some_values,
log_table.somevalues,
employee_table.somevalues,
manager_table.somevalues
FROM company_table
JOIN log_table
ON company_table.id = log_table.id
AND company_table.department = log_table.dept
JOIN employee_table
ON employee_table.id = log_table.id
AND employee_table.year = log_table.joining
AND employee_table.department = log_table.dept
JOIN manager_table
ON manager_table.id = log_table.id
AND manager_table.year = log_table.joining
AND manager_table.department = log_table.dept
ORDER BY log_table.id DESC
The output is correct, but it takes a long time to execute. If I remove the ORDER BY, the execution time is reduced by a considerable amount. I tried ordering ascending and it still takes longer.
I suspect your tables are not designed correctly.
I think you have recurring values in your log_table.id column, because I suspect it doesn't truly have a unique index or a correct primary key, since its id column doubles as the company id. This is what gives me that notion:
ON company_table.id = log_table.id
AND company_table.department = log_table.dept
If either or both of these columns are/make up your primary key, which I suspect they are not, they are not the correct choice.
So it will do a good job of retrieving things in the order they are found, but extra work has to be done when ordering because you can potentially have colliding values; on top of that, you have to join on 2 columns.
If the above is true, you can try this. Back up before trying, or try on dev.
Try adding a new column for a primary key and ordering on that:
ALTER TABLE log_table DROP PRIMARY KEY;
ALTER TABLE log_table ADD pk_column INT AUTO_INCREMENT PRIMARY KEY;
Then ORDER BY pk_column DESC.
If you have already specified another column in log_table as your primary key, order by that instead.

Select records based on multiple value of foreign key

I am currently writing a query. I want to select all records from a table based on multiple values of a foreign key, for example all records related to both 1 and 2.
E.g. the table might have
id name uid
1 bil 3
2 test 3
3 test 4
4 test 4
5 bil 5
6 bil 5
I want to select all records related to 3 that are also related to 4; in this case it is record number 2.
SELECT id
FROM `table`
WHERE uid = value1 AND like_id
IN (SELECT like_id
FROM likes
WHERE uid = uid2)
LIMIT 0 , 30
It's not at all clear where "value1" is coming from, or "uid2" is coming from, or where the column "like_id" is coming from. Those column names do not appear in your sample table. Your example query references two different table names (table and likes), yet you only show data for one example table, and that table does not have a column named like_id.
Assuming that "value1" and "uid2" in your query are literals, or bind parameters supplied to the query, seems reasonable, given that you variously mention values of 1, 2, 3 and 4. But we're still left with the "like_id" column. Given that it's referenced in the SELECT list of the IN subquery, we're going to presume that's a column in the "likes" table, and given that it's referenced in the outer query, we're going to assume that it's also a column in the (unfortunately named) table table.
(Bottom line, it's not at all clear how your query is returning a "correct" result, given that you've made it impossible to replicate a working test case.)
Given a single table, as shown in your example data, e.g.
CREATE TABLE likes (id INT, name VARCHAR(4), uid INT);
INSERT INTO likes VALUES (1,'bil',3),(2,'test',3),(3,'test',4)
,(4,'test',4),(5,'bil',5),(6,'bil',5);
ALTER TABLE likes ADD PRIMARY KEY (id);
-- non-unique index: the sample data repeats (uid, name) pairs (rows 3 and 4, rows 5 and 6)
ALTER TABLE likes ADD KEY likes_ix (uid, name);
Assuming that we're running a query against that single table, and that we're matching "likes" associated with uid=3 to "likes" associated with uid=4, and that the matching is done on the "name" column, then
SELECT t.id
FROM `likes` t
WHERE t.uid = 3
AND EXISTS
( SELECT 1
FROM `likes` s
WHERE s.name = t.name
AND s.uid = 4
)
That will return the id of the row from the likes table for uid=3 where we also find a row in the likes table for uid=4 with a matching name value.
Given a limited number of rows to be inspected from the likes table on the outer query, that gives a limited number of times a correlated subquery would need to be run, which should give reasonable performance.
For large sets, a join operation generally performs better and returns an equivalent result:
SELECT t.id
FROM `likes` t
JOIN `likes` s
ON s.name = t.name
AND s.uid = 4
WHERE t.uid = 3
GROUP
BY t.id
The key to optimum performance for either query is going to be appropriate indexes.
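As a quick check that the EXISTS and JOIN forms agree on the sample rows, here is a small sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL and uses a plain (non-unique) index on (uid, name); the third query is added only to show what the GROUP BY is doing in the join version.

```python
import sqlite3

# SQLite stands in for MySQL; assumed only to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table likes (id int primary key, name varchar(4), uid int);
    insert into likes values (1,'bil',3),(2,'test',3),(3,'test',4),
                             (4,'test',4),(5,'bil',5),(6,'bil',5);
    create index likes_ix on likes (uid, name);
""")

exists_sql = """
    select t.id from likes t
    where t.uid = 3
      and exists (select 1 from likes s where s.name = t.name and s.uid = 4)
"""
join_sql = """
    select t.id from likes t
    join likes s on s.name = t.name and s.uid = 4
    where t.uid = 3
    group by t.id
"""
# Same join without GROUP BY: one output row per matching uid = 4 row.
join_ungrouped_sql = """
    select t.id from likes t
    join likes s on s.name = t.name and s.uid = 4
    where t.uid = 3
"""

exists_ids = [row[0] for row in conn.execute(exists_sql)]
join_ids = [row[0] for row in conn.execute(join_sql)]
ungrouped_ids = [row[0] for row in conn.execute(join_ungrouped_sql)]
print(exists_ids, join_ids, ungrouped_ids)
```

The ungrouped join returns id 2 twice because two uid = 4 rows share the name 'test', which is why the join version needs GROUP BY (or DISTINCT) to match the EXISTS result.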

Generating a histogram from mysql data

I was wondering if anyone had some advice for me regarding a histogram-generating query. I have a query that I like (in that it works), but it is extremely slow. Here is the background:
I have a table of metadata, a table of data values where one row in meta_data is a key-row for many (perhaps several thousand) rows in data_values, and a table of histogram bin information:
create table meta_data (
id int not null primary key,
name varchar(100),
other_data char(10)
);
create table data_values (
id int not null primary key,
meta_data_id int not null,
data_value real
);
create table histogram_bins (
id int not null primary key,
bin_min real,
bin_max real,
bin_center real,
bin_size real
);
And a query that creates the histogram:
SELECT md.name AS `Name`,
md.other_data AS `OtherData`,
hist.bin_center AS `Bin`,
SUM(data.data_value BETWEEN hist.bin_min AND hist.bin_max) AS `Frequency`
FROM histogram_bins hist
LEFT JOIN data_values data ON 1 = 1
LEFT JOIN meta_data md ON md.id = data.meta_data_id
GROUP BY md.id, `Bin`;
In an earlier version of this query, the BETWEEN ... AND logical statement was down in the JOIN (replacing 1 = 1), but then I would only receive histogram rows with non-zero frequency. I need rows for all of the bins (even the zero-frequency ones), for analysis purposes.
It's pretty darn slow, to the tune of 10-15 minutes or so. The data_values table has about 7.9 million rows, and meta_data weighs in at 15,900 rows -- so maybe it is just going to take a long time!
Thanks very much!
I think this might help:
SELECT h.bin_center AS `Bin`,
IFNULL(F.Frequency, 0) AS `Frequency`
FROM histogram_bins h
LEFT JOIN
(SELECT hist.bin_center,
COUNT(*) AS `Frequency`
FROM data_values data
LEFT JOIN histogram_bins hist ON data.data_value BETWEEN hist.bin_min AND hist.bin_max
GROUP BY hist.bin_center) F ON F.bin_center = h.bin_center
I changed the order of the tables because I think it's best to find the corresponding bin for every record in the data and then just count how many there are, grouped by bin. Note that this counts across all meta_data rows; for a separate histogram per meta_data row, meta_data would also have to be joined in and grouped on.
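The count-per-bin-then-left-join idea can be tried on toy data like this. It is a sketch under assumptions: Python's built-in sqlite3 stands in for MySQL, the three bins and three data values are invented for illustration, and the values are chosen away from bin edges because BETWEEN is inclusive at both ends.

```python
import sqlite3

# SQLite stands in for MySQL; the bins and values below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table data_values (id int primary key, meta_data_id int, data_value real);
    create table histogram_bins (id int primary key, bin_min real, bin_max real,
                                 bin_center real, bin_size real);
    insert into histogram_bins values (1,0,10,5,10),(2,10,20,15,10),(3,20,30,25,10);
    insert into data_values values (1,1,1.0),(2,1,2.5),(3,2,12.0);
""")

# Inner query: assign each value to its bin and count per bin.
# Outer query: left join from the bins so empty bins survive with frequency 0.
rows = conn.execute("""
    select h.bin_center, ifnull(f.frequency, 0)
    from histogram_bins h
    left join (
        select hist.bin_center, count(*) as frequency
        from data_values data
        left join histogram_bins hist
          on data.data_value between hist.bin_min and hist.bin_max
        group by hist.bin_center
    ) f on f.bin_center = h.bin_center
    order by h.bin_center
""").fetchall()

for center, frequency in rows:
    print(center, frequency)
```

The bin centered at 25 has no values and still appears with a frequency of 0, which is the property the original question asked for.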

Update to table joined on composite key

I am trying to update rows in a data table that intersect rows in a smaller index table. The two tables are joined on the composite PK of the data table, and explain-select using the same criteria shows that the index is being used properly, and the correct unique rows are fetched - but I'm still having issues with the update.
The update on the joined tables works fine when there's only 1 row in the temp table, but when I have more rows, I get MySql Error 1175, and none of the WHERE conditions I specify are recognized.
I'm aware that I can just switch off safe mode with SET SQL_SAFE_UPDATES=0, but can anyone tell me what I'm not understanding here? Why is my WHERE condition not accepted, and why does it even need a where when I'm doing a NATURAL JOIN - and why does this work with only one row in the right-hand-side table (MyTempTable)?
The Code
Below is vastly simplified, but structurally identical create table & updates representing my problem.
-- The Data Table.
Create Table MyDataTable
(
KeyPartOne int not null,
KeyPartTwo varchar(64) not null,
KeyPartThree int not null,
RelevantData varchar(200) null,
Primary key (KeyPartOne, KeyPartTwo, KeyPartThree)
) Engine=InnoDB;
-- The 'Temp' table.
Create Table MyTempTable
(
KeyPartOne int not null,
KeyPartTwo varchar(64) not null,
KeyPartThree int not null,
Primary key (KeyPartOne, KeyPartTwo, KeyPartThree)
)Engine=Memory;
-- The Update Query (works fine with only 1 row in Temp table)
update MyDataTable natural join MyTempTable
set RelevantData = 'Something Meaningful';
-- Specifying 'where' - produces same effect as the other update query
update MyDataTable mdt join MyTempTable mtt
on mdt.KeyPartOne = mtt.KeyPartOne
and mdt.KeyPartTwo = mtt.KeyPartTwo
and mdt.KeyPartThree = mtt.KeyPartThree
set RelevantData = 'Something Meaningful'
where mdt.KeyPartOne = mtt.KeyPartOne
and mdt.KeyPartTwo = mtt.KeyPartTwo
and mdt.KeyPartThree = mtt.KeyPartThree;
P.S. Both of the above update statements work as expected when the temp table contains only one row, but give me the error when there's more than one row. I'm seriously curious about why!
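In your first UPDATE query, you use NATURAL JOIN, which in MySQL is an inner join on all columns that have the same name in both tables (NATURAL LEFT JOIN is the outer variant).
In your second UPDATE query, you use JOIN, which is the same as INNER JOIN.
Because the two tables share exactly the three key columns, both queries should match the same rows; the WHERE clause in the second one only repeats the join condition.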
Not sure what you're trying to do, but if you are trying to update all rows in MyDataTable where a corresponding entry exists in MyTempTable, this query should do the trick:
UPDATE
MyDataTable mdt
INNER JOIN MyTempTable mtt ON
mdt.KeyPartOne = mtt.KeyPartOne
AND mdt.KeyPartTwo = mtt.KeyPartTwo
AND mdt.KeyPartThree = mtt.KeyPartThree
SET
mdt.RelevantData = 'Something Meaningful'
If that's not what you're trying to do, please clarify and I will update my answer.
Per the MySql forum, the update queries are valid, and the fact that they don't work in Workbench with safe-update mode turned on does not indicate that there's anything wrong with the index. It's just a quirk of Workbench's "don't-shoot-yourself-in-the-foot" mode. :-)