MySQL LEFT JOIN index - mysql

I used the following query:
select a.*,b.* from firsttable a left join secondtable b on a.id=b.item_id ORDER BY a.id DESC LIMIT 0,10
to display items from two tables, where the id of the first table is the item_id of the second. My question is: when I display the result in PHP and want to show a.id, I can try either:
while($row=$go->fetch_assoc()){
echo $row['id'];
}
or
while($row=$go->fetch_assoc()){
echo $row['a.id'];
}
Since both tables have an id column, the first example only displays a value when there are matching rows in both tables, and then it displays the id of the second table; I want the id of the first.
The second example says "undefined index".
Can you explain why this happens, please?
Edit:
Adding the tables as an example.
firsttable: id, info, username
secondtable: id, item_id, username

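Both tables have a column with the same name, so with select * it is ambiguous which column id refers to. fetch_assoc() keys each row by the bare column name, without the table alias, so there is no 'a.id' key (hence the undefined index). When two columns share a name the last one wins, so $row['id'] holds b.id, which is NULL whenever the left join finds no match.
The only way to remove the ambiguity is to explicitly list all the columns you want to select, using aliases for the columns that share a name: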
select
a.id,
b.id b_id, -- alias for b.id
b.item_id
-- more columns here as needed
from firsttable a
left join secondtable b on a.id=b.item_id
order by a.id desc
limit 0,10
This is one of the many reasons why select * is generally considered a bad practice in SQL.
Recommended reading: What is the reason not to use select *?
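The name collision can be reproduced outside PHP. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL; the sample rows and the fetch_assoc helper are hypothetical and only imitate how an associative fetch keys rows by bare column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE firsttable (id INTEGER PRIMARY KEY, info TEXT);
    CREATE TABLE secondtable (id INTEGER PRIMARY KEY, item_id INTEGER);
    INSERT INTO firsttable VALUES (1, 'matched'), (2, 'unmatched');
    INSERT INTO secondtable VALUES (70, 1);
""")

def fetch_assoc(sql):
    """Hypothetical helper: build one dict per row, keyed by column name.
    Duplicate names overwrite each other, so the last column wins."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur]

# select * exposes two columns called "id"; the dict keeps only the last one.
star_rows = fetch_assoc("""
    SELECT a.*, b.* FROM firsttable a
    LEFT JOIN secondtable b ON a.id = b.item_id
    ORDER BY a.id DESC
""")

# Explicit columns with aliases keep both ids reachable.
aliased_rows = fetch_assoc("""
    SELECT a.id AS id, b.id AS b_id, b.item_id AS item_id FROM firsttable a
    LEFT JOIN secondtable b ON a.id = b.item_id
    ORDER BY a.id DESC
""")

print(star_rows)
print(aliased_rows)
```

With select *, the row for the unmatched item carries the second table's id (NULL, shown as None), while the aliased query always exposes the first table's id under the key id.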

Related

Speeding up select where column condition exists in another table without duplicates

If I have the following two tables:
Table "a" with 2 columns: id (int) [Primary Index], column1 [Indexed]
Table "b" with 3 columns: id_table_a (int),condition1 (int),condition2 (int) [all columns as Primary Index]
I can run the following query to select rows from Table a where Table b condition1 is 1
SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id_table_a=a.id && condition1=1 LIMIT 1) ORDER BY a.column1 LIMIT 50
With a couple hundred million rows in both tables this query is very slow. If I do:
SELECT a.id FROM a INNER JOIN b ON a.id=b.id_table_a && b.condition1=1 ORDER BY a.column1 LIMIT 50
It is pretty much instant but if there are multiple matching rows in table b that match id_table_a then duplicates are returned. If I do a SELECT DISTINCT or GROUP BY a.id to remove duplicates the query becomes extremely slow.
Here is an SQLFiddle showing the example queries: http://sqlfiddle.com/#!9/35eb9e/10
Is there a way to make a join without duplicates fast in this case?
*Edited to show that INNER instead of LEFT join didn't make much of a difference
*Edited to show moving condition to join did not make much of a difference
*Edited to add LIMIT
*Edited to add ORDER BY
You can try an inner join with distinct:
SELECT distinct a.id
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
If you combine distinct with select *, make sure the selected id columns do not make otherwise identical rows differ, which would give the wrong result; in that case list only the columns you need:
SELECT distinct col1, col2, col3 ....
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
You could also add a composite index that includes condition1, e.g. key(id, condition1).
If you can, you could also run
ANALYZE TABLE table_name;
on both tables.
Another technique is to reverse the leading table:
SELECT distinct a.id
FROM b INNER JOIN a ON a.id=b.id_table_a AND b.condition1=1
so that the most selective table leads the query.
Written this way, the index usage appears to differ: http://sqlfiddle.com/#!9/35eb9e/15 (the last query adds a "Using where").
# USING DISTINCT TO REMOVE DUPLICATES without col and order
EXPLAIN
SELECT DISTINCT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
;
It looks like I found the answer.
SELECT a.id FROM a
INNER JOIN b ON
b.id_table_a=a.id &&
b.condition1=1 &&
b.condition2=(select b.condition2 from b WHERE b.id_table_a=a.id && b.condition1=1 LIMIT 1)
ORDER BY a.column1
LIMIT 5;
I don't know if there is a flaw in this or not, please let me know if so. If anyone has a way to compress this somehow I will gladly accept your answer.
SELECT id FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
Move the condition into the ON clause of the join; that way the index on table b can be used to filter. Also use INNER JOIN rather than LEFT JOIN.
Then there should be fewer results to group.
Wrap the fast version in a query that handles de-duping and limit:
SELECT DISTINCT * FROM (
SELECT a.id, a.column1 -- column1 must be selected so the outer ORDER BY can see it
FROM a
JOIN b ON a.id = b.id_table_a && b.condition1 = 1
) x
ORDER BY column1
LIMIT 50
We know the inner query is fast. The de-duping and ordering have to happen somewhere. This way they happen on the smallest rowset possible.
See SQLFiddle.
Option 2:
Try the following:
Create indexes as follows:
create index a_id_column1 on a(id, column1);
create index b_id_table_a_condition1 on b(id_table_a, condition1);
These are covering indexes - ones that contain all the columns you need for the query, which in turn means that index-only access to data can achieve the result.
Then try this:
SELECT * FROM (
SELECT a.id, MIN(a.column1) column1
FROM a
JOIN b ON a.id = b.id_table_a
AND b.condition1 = 1
GROUP BY a.id) x
ORDER BY column1
LIMIT 50
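To see the duplicate problem and the GROUP BY wrapper side by side, here is a small sketch. It assumes SQLite (via Python's sqlite3) instead of MySQL and uses a handful of made-up rows, so it only illustrates the result shape and says nothing about performance on hundreds of millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, column1 INTEGER);
    CREATE TABLE b (id_table_a INTEGER, condition1 INTEGER, condition2 INTEGER,
                    PRIMARY KEY (id_table_a, condition1, condition2));
    INSERT INTO a VALUES (1, 30), (2, 10), (3, 20);
    -- id 1 matches twice, id 2 once, id 3 never (condition1 = 0)
    INSERT INTO b VALUES (1, 1, 0), (1, 1, 1), (2, 1, 0), (3, 0, 0);
""")

# The plain join repeats a.id once per matching row in b.
plain_join = [r[0] for r in conn.execute("""
    SELECT a.id FROM a
    JOIN b ON a.id = b.id_table_a AND b.condition1 = 1
    ORDER BY a.column1
""")]

# Grouping inside a derived table collapses the duplicates and keeps
# column1 available for the outer ORDER BY.
grouped = [r[0] for r in conn.execute("""
    SELECT id FROM (
        SELECT a.id, MIN(a.column1) AS column1
        FROM a
        JOIN b ON a.id = b.id_table_a AND b.condition1 = 1
        GROUP BY a.id
    ) x
    ORDER BY column1
    LIMIT 50
""")]

print(plain_join)  # id 1 appears twice
print(grouped)     # each id appears once, still ordered by column1
```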
Use your fast query in a subselect and remove the duplicates in the outer select:
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
WHERE b.id_table_a > :offset
ORDER BY a.column1
LIMIT 50
) sub
Because duplicates are removed, you might get fewer than 50 rows. Just repeat the query until you have enough rows. Start with :offset = 0, and use the last ID from the previous result as :offset in the following queries.
If you know your statistics, you can also use two limits. The limit in the inner query should be high enough to return 50 distinct rows with a probability that is high enough for you.
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
ORDER BY a.column1
LIMIT 1000
) sub
LIMIT 50
For example: if you have an average of 10 duplicates per ID, LIMIT 1000 in the inner query will return an average of 100 distinct rows. It's very unlikely that you get fewer than 50 rows.
If the condition2 column is a boolean, you know that you can have a maximum of two duplicates. In this case LIMIT 100 in the inner query would be enough.
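The effect of the inner limit can be checked on toy data. This is a sketch under stated assumptions: SQLite via Python's sqlite3 rather than MySQL, ten invented ids with exactly three matching rows each, and a hypothetical distinct_ids helper that runs the two-limit query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, column1 INTEGER)")
conn.execute("CREATE TABLE b (id_table_a INTEGER, condition1 INTEGER, condition2 INTEGER)")
# Ten ids, each with three matching rows in b (three duplicates per id).
conn.executemany("INSERT INTO a VALUES (?, ?)", [(i, i) for i in range(1, 11)])
conn.executemany("INSERT INTO b VALUES (?, 1, ?)",
                 [(i, c) for i in range(1, 11) for c in range(3)])

def distinct_ids(inner_limit, outer_limit):
    """Hypothetical helper: run the two-limit query with the given limits."""
    sql = """
        SELECT DISTINCT sub.id FROM (
            SELECT a.id FROM a
            INNER JOIN b ON a.id = b.id_table_a AND b.condition1 = 1
            ORDER BY a.column1
            LIMIT ?
        ) sub
        LIMIT ?
    """
    return [r[0] for r in conn.execute(sql, (inner_limit, outer_limit))]

# 12 joined rows cover only 4 distinct ids, so fewer than 5 come back.
too_small = distinct_ids(12, 5)
# 30 joined rows cover all 10 ids, so the outer limit of 5 is reached.
large_enough = distinct_ids(30, 5)

print(len(too_small), len(large_enough))
```

With three duplicates per id, the inner limit has to be at least three times the outer limit to guarantee a full page, which is the same reasoning as the LIMIT 100 case for a boolean condition2.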

BigQuery does not recognize field of a subselect in a join

I tried to combine id and first_time from table B with time_record and type from table A, joining on id, but I got this error:
a.time_record is not a field of either table in the join
Any idea how I could fix it? I am pretty sure table A has both columns. Below is the query I used.
select b.id, b.first_time as day0, a.time_record,a.type
from mydata.b as b
left join each
(select id
from table_date_range(mydata.b, timestamp("2016-01-20"),timestamp("2016-02-03"))
group by id)
as a
on a.id = b.id
Your subselect a does not have the field time_record. Try adding it to the subselect. (Same for a.type.)
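The same rule holds in other SQL engines: the outer query can only see the columns a subselect actually returns. The sketch below assumes SQLite via Python's sqlite3 as a stand-in (not BigQuery, so table_date_range and "join each" are left out), and it assumes the subselect reads from a table that really has time_record and type; table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, time_record TEXT, type TEXT);
    CREATE TABLE b (id INTEGER, first_time TEXT);
    INSERT INTO a VALUES (1, '2016-01-21', 'click');
    INSERT INTO b VALUES (1, '2016-01-20');
""")

# The derived table only returns id, so a.time_record cannot be resolved.
missing_columns = """
    SELECT b.id, b.first_time AS day0, a.time_record, a.type
    FROM b
    LEFT JOIN (SELECT id FROM a GROUP BY id) AS a ON a.id = b.id
"""
try:
    conn.execute(missing_columns)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Returning the extra columns from the subselect makes them visible outside.
with_columns = """
    SELECT b.id, b.first_time AS day0, a.time_record, a.type
    FROM b
    LEFT JOIN (SELECT id, time_record, type FROM a
               GROUP BY id, time_record, type) AS a ON a.id = b.id
"""
rows = conn.execute(with_columns).fetchall()

print(error)
print(rows)
```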

Eliminating duplicates from SQL query

What would be the best way to return one item for each id instead of every item in the table? Currently the query below returns all manufacturers:
SELECT m.name
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
I have solved my question by using the DISTINCT keyword in my query:
SELECT DISTINCT m.name, m.id
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
ORDER BY m.name
There are 4 main ways I can think of to delete duplicate rows.
Method 1
Delete all rows with a rowid greater than the smallest (or less than the greatest) rowid for the same key. Example:
delete from tableName a where rowid > (select min(rowid) from tableName b where a.key1=b.key1 and a.key2=b.key2)
Method 2
Usually faster, but you must recreate all indexes, constraints and triggers afterward.
Copy the distinct rows into a new table, then drop the first table and rename the new table to the old table name. Example:
create table t2 as select distinct * from t1; drop table t1; rename t2 to t1;
Method 3
Delete using where exists based on rowid. Example:
delete from tableName a where exists(select 'x' from tableName b where a.key1=b.key1 and a.key2=b.key2 and b.rowid > a.rowid)
Note: if the key columns can contain nulls, wrap them in nvl.
Method 4
Collect the first row for each key value and delete the rows not in this set. Example:
delete from tableName a where rowid not in (select min(rowid) from tableName b group by key1, key2)
Note that you don't have to use nvl for method 4.
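Method 4 can be tried without an Oracle instance, because SQLite also exposes an implicit rowid. This is only a sketch: it assumes SQLite via Python's sqlite3, uses invented rows, and drops the table aliases, which the statement does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableName (key1 TEXT, key2 TEXT);
    INSERT INTO tableName VALUES
        ('x', 'p'), ('x', 'p'), ('x', 'q'), ('y', 'p'), ('y', 'p');
""")

# Keep the row with the smallest rowid for each (key1, key2) pair
# and delete every other row.
conn.execute("""
    DELETE FROM tableName
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM tableName GROUP BY key1, key2
    )
""")

remaining = conn.execute(
    "SELECT key1, key2 FROM tableName ORDER BY key1, key2"
).fetchall()
print(remaining)
```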
Using DISTINCT is often a bad practice. It may be a sign that there is something wrong with your SELECT statement, or that your data structure is not normalized.
In your case I would use this (assuming that default_ps_products_manufacturers has unique records):
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE EXISTS (SELECT 1 FROM default_ps_products p WHERE p.manufacturer_id = m.id)
Or an equivalent query with IN:
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE m.id IN (SELECT p.manufacturer_id FROM default_ps_products p)
One caveat: among all the possible queries, pick the one with the best execution plan, which may depend on your vendor and on the physical structure, statistics, etc. of your database.
I think in most cases EXISTS will work better.

NOT IN not work in my query

I have 2 tables, data and page. The data table holds some records, and the ids of some of those records are stored in the page table.
Now I want to select the id and title from the data table for records that are not in the page table.
So I wrote this query:
SELECT d.id,d.title
FROM data AS d, page AS p
WHERE d.id NOT IN (p.data_id)
ORDER BY d.title ASC
This query works, but when the page table is empty it returns no records!
Use LEFT JOIN:
SELECT a.*
FROM data a
LEFT JOIN page b
ON a.ID = b.data_id
WHERE b.data_id IS NULL
ORDER BY a.title ASC
SQLFiddle Demo
SQLFiddle Demo (empty page table)
Here is it with subquery, but without join:
SELECT id, title
FROM data
WHERE id NOT IN (SELECT data_id FROM page)
ORDER BY title ASC
The NOT IN will give you what you want, but depending on your database system (and the type of indexes) it may not be the best (fastest) solution. More often than not EXISTS will be faster. But your mileage may vary.
Give it a try:
SELECT id, title FROM data
WHERE NOT EXISTS (SELECT * FROM page WHERE page.data_id = data.id)
ORDER BY title ASC
I think you are trying to determine what data does not have a page:
SELECT d.id, d.title
FROM data d
WHERE d.id NOT IN (
SELECT data_id FROM page
)
ORDER BY d.title ASC;
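Why the original query returns nothing for an empty page table: "FROM data AS d, page AS p" is a cross join, and a cross join with an empty table has no rows to filter. A minimal sketch, assuming SQLite via Python's sqlite3 as a stand-in and two invented rows in data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE page (data_id INTEGER);
    INSERT INTO data VALUES (1, 'alpha'), (2, 'beta');
""")

# Query from the question: every data row is paired with every page row.
cross_join = """
    SELECT d.id, d.title
    FROM data AS d, page AS p
    WHERE d.id NOT IN (p.data_id)
    ORDER BY d.title ASC
"""
# Anti-join from the answers: no join, just a correlated subquery.
not_exists = """
    SELECT id, title FROM data
    WHERE NOT EXISTS (SELECT * FROM page WHERE page.data_id = data.id)
    ORDER BY title ASC
"""

# With page empty, the cross join yields no rows at all.
empty_cross = conn.execute(cross_join).fetchall()
empty_exists = conn.execute(not_exists).fetchall()

# Once page references id 1, only id 2 is left without a page.
conn.execute("INSERT INTO page VALUES (1)")
filled_exists = conn.execute(not_exists).fetchall()

print(empty_cross, empty_exists, filled_exists)
```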

ORDER BY in UNION query

I have seen topics explaining this, but in my case it does not work.
I have this query:
( SELECT * FROM my_table
left join table2 on table2.id = my_table.id
left join table3 on pension.age = my_table.age
WHERE table3.id IS NULL )
UNION
( SELECT * FROM my_table
left join table2 on table2.id = my_table.id
left join table3 on pension.age = my_table.age
WHERE my_table.id FROM 75 to 245 )
ORDER BY my_table.id ASC, table2.wage DESC, table3.compensation DESC
This does not work; it says user_table. or table2. or table3. is not found.
When I remove the table prefixes, so that it reads
ORDER BY id ASC, wage DESC, compensation DESC
it somewhat works, but does not give the desired result. Please assist.
Is there part of the code missing? I see no reference to table_fired. Also, aren't those curly braces used as part of an outer join in a larger query? That's why I think there's a larger part of the query missing, which might be relevant.
SELECT * FROM my_table
left join table2 on table2.id = my_table.id
left join table3 on pension.age = my_table.age
WHERE my_table.id IS NULL OR my_table.id FROM 75 to 245
ORDER BY my_table.id ASC, table2.wage DESC, table3.compensation DESC
I replaced your "table_fired" with "my_table" and combined the two subselects into one.
The union operation requires that each of your two queries have exactly the same number of columns in their result set. In MySQL, UNION will always use the column names from the first query, so if the second query uses different column names, they will be mapped by order onto the columns that were defined by the first query.
Your ORDER BY will be applied after the UNION has been run, and so it can only refer to columns that are in the result set of the UNION. These columns are not qualified by table identifiers from the constituent queries (that's why removing the table identifiers from your ORDER BY clause gets rid of the explicit errors).
Beyond that, the problem is likely that your component queries produce multiple columns that have the same name, and are distinguishable only by their table identifiers (for example my_table.id and table2.id). When you use ORDER BY id ASC ..., which of those "id" fields will be used?
Solve this problem by replacing the * with an explicit list of the relevant columns for each of the two component queries. Ensure that each column you select is given a unique name. For example:
SELECT
my_table.id as my_table_id,
table2.id as table2_id,
table2.wage as wage,
table3.compensation as compensation
...
Your union will then pick up distinctly named columns, and your order by clause would need to refer to those instead of the table-qualified columns in the original queries:
ORDER BY my_table_id ASC, wage DESC, compensation DESC
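A cut-down version of the aliased UNION can be run to check the ordering. This sketch assumes SQLite via Python's sqlite3 in place of MySQL, leaves out table3 for brevity, and uses invented rows and id ranges; the point is only that the final ORDER BY names the aliases of the combined result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER, wage INTEGER);
    INSERT INTO my_table VALUES (1), (2), (3), (4);
    INSERT INTO table2 VALUES (1, 100), (2, 300), (2, 200), (4, 50);
""")

# Both branches select the same uniquely named columns; the ORDER BY
# after the UNION refers to those aliases, not to table-qualified names.
rows = conn.execute("""
    SELECT my_table.id AS my_table_id, table2.wage AS wage
    FROM my_table
    LEFT JOIN table2 ON table2.id = my_table.id
    WHERE my_table.id < 3
    UNION
    SELECT my_table.id AS my_table_id, table2.wage AS wage
    FROM my_table
    LEFT JOIN table2 ON table2.id = my_table.id
    WHERE my_table.id BETWEEN 2 AND 4
    ORDER BY my_table_id ASC, wage DESC
""").fetchall()

print(rows)
```

UNION removes the rows for id 2 that both branches return, and the sort is applied once to the combined result.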