Eliminating duplicates from SQL query - mysql

What would be the best way to return one row per manufacturer id instead of one row for every product in the table? Currently the query below returns a manufacturer name for every product, so names are repeated:
SELECT m.name
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id

I have solved my question by using the DISTINCT keyword in my query:
SELECT DISTINCT m.name, m.id
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
ORDER BY m.name

There are 4 main ways I can think of to delete duplicate rows. The examples use Oracle syntax (rowid, nvl), so they need adapting for MySQL.
method 1
Delete all rows whose rowid is greater than the smallest (or less than the greatest) rowid for the same key values. Example:
delete from tableName a where rowid> (select min(rowid) from tableName b where a.key=b.key and a.key2=b.key2)
method 2
Usually faster, but you must recreate all indexes, constraints and triggers afterward.
Copy the distinct rows into a new table, then drop the original table and rename the new table to the original name.
Example:
create table t2 as select distinct * from t1; drop table t1; rename t2 to t1;
method 3
Delete using WHERE EXISTS based on rowid. Example:
delete from tableName a where exists(select 'x' from tableName b where a.key1=b.key1 and a.key2=b.key2 and b.rowid >a.rowid)
Note: if the key columns can contain nulls, wrap them in nvl, because null = null does not evaluate to true.
method 4
Collect the first row for each key value and delete the rows not in this set. Example:
delete from tableName a where rowid not in(select min(rowid) from tableName b group by key1, key2)
Note that you don't have to use nvl for method 4, because GROUP BY puts null keys in the same group.
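As a rough illustration of method 4, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is used only because it also exposes a rowid; the table t, its columns and the sample rows are made up for the example, and other databases need their own row identifier.

```python
import sqlite3

# In-memory SQLite database; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (key1 TEXT, key2 TEXT, payload TEXT);
    INSERT INTO t VALUES
        ('a', 'x',  'first'),
        ('a', 'x',  'second'),   -- duplicate key (a, x)
        ('b', NULL, 'third'),
        ('b', NULL, 'fourth'),   -- duplicate key (b, NULL)
        ('c', 'z',  'fifth');
""")

# Keep the row with the smallest rowid in each (key1, key2) group, delete the rest.
# GROUP BY treats NULL keys as one group, so no nvl-style wrapping is needed here.
conn.execute("""
    DELETE FROM t
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM t GROUP BY key1, key2)
""")

remaining = conn.execute(
    "SELECT key1, key2, payload FROM t ORDER BY rowid"
).fetchall()
print(remaining)
```

The row kept for each key is simply the one inserted first; if a different survivor is wanted, MIN(rowid) would have to be replaced by some other choice.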

Using DISTINCT is often bad practice. It may be a sign that there is something wrong with your SELECT statement, or that your data structure is not normalized.
In your case I would use this (assuming that default_ps_products_manufacturers has unique records):
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE EXISTS (SELECT 1 FROM default_ps_products p WHERE p.manufacturer_id = m.id)
Or an equivalent query with IN:
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE m.id IN (SELECT p.manufacturer_id FROM default_ps_products p)
One caveat: among all the possible queries, prefer the one with the better execution plan. That may depend on your database vendor and on the physical structure, statistics, etc. of your database.
I think in most cases EXISTS will work better.
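To see the difference between these queries on a small data set, here is a sketch using Python's sqlite3 module as a stand-in for MySQL. The shortened table names (manufacturers, products) and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manufacturers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, manufacturer_id INTEGER);
    INSERT INTO manufacturers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    -- Acme has two products, Globex one, Initech none.
    INSERT INTO products VALUES (10, 1), (11, 1), (12, 2);
""")

# Plain join: one row per product, so 'Acme' appears twice.
joined = conn.execute("""
    SELECT m.name
    FROM products p
    INNER JOIN manufacturers m ON p.manufacturer_id = m.id
    ORDER BY m.name
""").fetchall()

# DISTINCT collapses the repeated rows after the join.
distinct_rows = conn.execute("""
    SELECT DISTINCT m.id, m.name
    FROM products p
    INNER JOIN manufacturers m ON p.manufacturer_id = m.id
    ORDER BY m.name
""").fetchall()

# EXISTS never produces the duplicates in the first place.
exists_rows = conn.execute("""
    SELECT m.id, m.name
    FROM manufacturers m
    WHERE EXISTS (SELECT 1 FROM products p WHERE p.manufacturer_id = m.id)
    ORDER BY m.name
""").fetchall()

# IN is equivalent here because manufacturers.id is unique.
in_rows = conn.execute("""
    SELECT m.id, m.name
    FROM manufacturers m
    WHERE m.id IN (SELECT p.manufacturer_id FROM products p)
    ORDER BY m.name
""").fetchall()

print(joined, distinct_rows, exists_rows, in_rows, sep="\n")
```

All three de-duplicated forms return the same two manufacturers; which one runs fastest is a question for the execution plan on the real database, not for this toy data.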

Related

Save the intermediate result of SQL query

I am wondering if there is any way to save the intermediate result or tables in SQL. For example assume you have two different SQL statements that in the first statement you join two tables, then you want to see how many rows the resulting table has. I know there are many ways to do this but I am interested in seeing how this can be done sequentially. Consider the following example:
select * from order_table left join customer_table on order_table.id = customer_table.id
Then I want to see count of number of rows (as an easy example)
select count(*) from table
But I do not know what this table should be. How may I save the result of above query in some logical table or how to refer to what was created before in SQL.
You can use WITH like below:
WITH resultTable as ( select * from order_table left join customer_table on order_table.id = customer_table.id )
select count(*) from resultTable
For this particular example you can simply wrap the original query in a sub-query:
select count(*)
from (
select *
from order_table
left join customer_table on order_table.id = customer_table.id
) as x
If you want to store the result in a physical table (temporary or permanent) then the procedure varies for each rdbms. In SQL Server for example you would use SELECT INTO:
select *
into #temp_table
from order_table
left join customer_table on order_table.id = customer_table.id
You can also use a CTE. For your question it would be:
;
with table1 as (
select * from order_table
left join customer_table on order_table.id = customer_table.id
)
select count(*) from table1
GO
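The three approaches mentioned in this thread (CTE, sub-query, temporary table) can be compared side by side. This is a sketch using Python's sqlite3 module; the column names item and name and the sample rows are assumptions, and SQLite's CREATE TEMP TABLE ... AS stands in for SQL Server's SELECT INTO. The join lists its columns explicitly, since some databases reject a derived table that contains two columns both named id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_table (id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE customer_table (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO order_table VALUES (1, 'pen'), (2, 'ink'), (3, 'pad');
    INSERT INTO customer_table VALUES (1, 'Ann'), (2, 'Bob');
""")

# The "intermediate result" we want to reuse (hypothetical column list).
join_sql = """
    SELECT o.id AS order_id, o.item, c.name AS customer_name
    FROM order_table o
    LEFT JOIN customer_table c ON o.id = c.id
"""

# 1) Common table expression: named for the duration of one statement.
cte_count = conn.execute(
    f"WITH result_table AS ({join_sql}) SELECT COUNT(*) FROM result_table"
).fetchone()[0]

# 2) Sub-query (derived table) with an alias.
sub_count = conn.execute(
    f"SELECT COUNT(*) FROM ({join_sql}) AS x"
).fetchone()[0]

# 3) Temporary table: persists for the connection, so later statements can reuse it.
conn.execute(f"CREATE TEMP TABLE temp_result AS {join_sql}")
temp_count = conn.execute("SELECT COUNT(*) FROM temp_result").fetchone()[0]

print(cte_count, sub_count, temp_count)
```

Only the temporary table really "saves" the result between statements; the CTE and the sub-query are re-evaluated each time they are used.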

Check if table a primary key is exist in table b

Table A:
ID, Name, etc.
Table B:
ID, TableA-ID.
SELECT * FROM A;
and I want to return a boolean value in the same result for this condition ( if A.ID Exists in Table B).
There are several ways of achieving what you need. Below are three possibilities. They produce different execution plans, so depending on your record count one may be more efficient than the others. It's best to test them against your own data.
1) Use LEFT JOIN and check if a non-null field from B is not null to ensure the record exists. Then apply DISTINCT clause if relationship is 1:N to only show rows from A without duplicates.
select distinct a.*, b.id is not null as exists_b
from a
left join b on
a.id = b.tablea-id
2) Use exists() function, which will be evaluated for each row being returned from table A.
select a.*, exists(select 1 from b where a.id = b.tablea-id) as exists_b
from a
3) Use the subquery expression EXISTS and its negation NOT EXISTS in two queries to find the records that do and do not have a match within table B. Then UNION ALL to combine both results into one.
select *, true as exists_b
from a
where exists (
select 1
from b
where a.id = b.tablea-id
)
union all
select *, false as exists_b
from a
where not exists (
select 1
from b
where a.id = b.tablea-id
)
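select A.*, IFNULL((select 1 from B where B.`TableA-ID` = A.ID limit 1),0) as `exists` from A;
The above statement returns 1 if the key exists and 0 if it does not. LIMIT 1 is important when B contains multiple matching records, because a scalar subquery must not return more than one row. (The hyphenated column name is quoted with backticks; unquoted, the hyphen would be parsed as subtraction.)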
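A small runnable check of the EXISTS and LEFT JOIN variants, sketched with Python's sqlite3 module. The column is renamed to tablea_id here (an assumption, to avoid quoting a hyphenated name), and the sample rows are invented. SQLite reports the boolean as 1 or 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, tablea_id INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    -- a.id = 1 is referenced twice, a.id = 3 once, a.id = 2 never.
    INSERT INTO b VALUES (10, 1), (11, 1), (12, 3);
""")

# EXISTS as a column: evaluated once per row of a, never multiplies rows.
exists_rows = conn.execute("""
    SELECT a.id, a.name,
           EXISTS (SELECT 1 FROM b WHERE b.tablea_id = a.id) AS exists_b
    FROM a
    ORDER BY a.id
""").fetchall()

# LEFT JOIN: a.id = 1 matches two rows in b, so DISTINCT is needed to collapse them.
join_rows = conn.execute("""
    SELECT DISTINCT a.id, a.name, b.id IS NOT NULL AS exists_b
    FROM a
    LEFT JOIN b ON a.id = b.tablea_id
    ORDER BY a.id
""").fetchall()

print(exists_rows)
print(join_rows)
```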

sql stored procedure with out duplicates name

How can I create a query to SELECT ALL DB WITHOUT duplicates
Like (old DB that is no longer in use c,f,g. basically if it does have eur and has an original name than it is relevant):
a
b
c
ceur
d
f
feur
g
geur
I need it to be like:
a
b
ceur
d
feur
geur
Many thanks...
SELECT DISTINCT is what you're looking for.
For instance, let's say you have a table that contains the following columns:
name, city, address, country.
You now wish to get the countries that have been stored, without duplicates. Multiple people might come from the same country, so the table would most likely contain duplicate entries for that country.
You achieve this by using SELECT DISTINCT.
Example:
SELECT DISTINCT country FROM table_name;
This retrieves the country column without duplicates, so you can see which countries are actually stored in that table.
If you have multiple databases (I don't know if that's what you were getting at), then you will need to perform a JOIN on the relevant tables, given you have access to them all. I would recommend doing a LEFT JOIN if you are to join more than just 1 extra table.
Example:
SELECT DISTINCT table_name.row_name, table_name.row_name2, table_name.row_name3
FROM table_name
LEFT JOIN table_name2 ON table_name.row_name = table_name2.row_name
LEFT JOIN table_name3 ON table_name2.row_name = table_name3.row_name
[...]
WHERE table_name.row_name = 'value';
Can you query information_schema.TABLES and distinct in the select, plus a predicate to filter out whatever you don't want?
You can do:
select t.*
from t
where name like '%eur'
union all
select t.*
from t
where name not like '%eur' and
not exists (select 1 from t t2 where t2.name = concat(t.name, 'eur'));
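Running that UNION ALL query against the sample names from the question shows that it produces the requested list. This is a sketch using Python's sqlite3 module; the table t with a single name column is an assumption, and SQLite's || operator replaces MySQL's concat().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (name TEXT PRIMARY KEY);
    INSERT INTO t VALUES
        ('a'), ('b'), ('c'), ('ceur'), ('d'), ('f'), ('feur'), ('g'), ('geur');
""")

rows = conn.execute("""
    SELECT u.name
    FROM (
        -- every name that already ends in 'eur'
        SELECT t.name FROM t WHERE t.name LIKE '%eur'
        UNION ALL
        -- plus every other name that has no 'eur' counterpart
        SELECT t.name FROM t
        WHERE t.name NOT LIKE '%eur'
          AND NOT EXISTS (SELECT 1 FROM t t2 WHERE t2.name = t.name || 'eur')
    ) AS u
    ORDER BY u.name
""").fetchall()

names = [r[0] for r in rows]
print(names)
```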

MySql - Multitable - AND is not the correct choice, but what is?

I have two tables:
mytable1
UserId (int) (primary_key)
Save (blob)
mytable2
UserId (int) (primary_key)
Save (blob)
I make the following mysql command:
UPDATE mytable1 tb1, mytable2 tb2 SET tb1.Save='', tb2 .Save='' WHERE tb1.UserId=25 AND dbSv1.UserId=25
When both tables have a user with UserId = 25, then this works and Save is set to ''. However, if one table does not have a user with UserId = 25, but the other one does, then Save is not set to '' in the one that does. This is not the behaviour I want.
OR is not the thing to use, as other Saves will be set to '' which do not have an UserId of 25. So what do I need?
Your query is using the old-school comma syntax for a join operation. (There are some problems in the SQL: dbSv1 is used as a qualifier, but it doesn't appear as a table name or table alias. We're going to assume that was supposed to be tb2.)
Your query is equivalent to:
UPDATE mytable1 tb1
JOIN mytable2 tb2
SET tb1.save=''
, tb2.save=''
WHERE tb1.userid=25
AND tb2.userid=25
If a matching row is not found in either tb1 or tb2, then the JOIN operation will produce an empty set. This is expected behavior.
Consider the result set returned from this query:
SELECT tb1.userid
, tb2.userid
FROM mytable1 tb1
JOIN mytable2 tb2
WHERE tb1.userid=25
AND tb2.userid=25
When there are no rows in tb2 that satisfy the predicates, the query won't return any rows, so the UPDATE has nothing to modify in tb1 either.
You could use an "outer" join to make returning rows from one of the tables optional. For example, to update mytable1 even when no matching rows exist in mytable2...
UPDATE mytable1 tb1
LEFT
JOIN mytable2 tb2
ON tb2.userid=25
SET tb1.save=''
, tb2.save=''
WHERE tb1.userid=25
If there are no rows in mytable1 that have userid=25, then this won't update any rows.
MySQL doesn't support FULL OUTER JOIN. But you can try something like this, using an inline view to return a row, and then performing outer joins to both mytable1 and mytable2...
UPDATE ( SELECT 25 + 0 AS userid ) i
LEFT
JOIN mytable1 tb1
ON tb1.userid = i.userid
LEFT
JOIN mytable2 tb2
ON tb2.userid = i.userid
SET tb1.save = ''
, tb2.save = ''
SQLFiddle demonstration: http://sqlfiddle.com/#!9/6f8598/1
FOLLOWUP
A "join" is a common SQL operation. You shouldn't have any trouble finding information about what it is and what it does.
The "+ 0" isn't strictly necessary. It's a convenient shorthand in MySQL to CAST to numeric. As a test, see what MySQL returns for this:
SELECT '25' + 0
, '25xyz' + 0
, 'abc' + 0
The purpose of the inline view was to return a single row. We could have written the query to hardcode the user_id two times, and ignore what's returned from the inline view...
SELECT t1.user_id AS t1_user_id
, t2.user_id AS t2_user_id
FROM ( SELECT 'foo' AS dontcare ) i
LEFT
JOIN mytable1 t1
ON t1.user_id = 25
LEFT
JOIN mytable2 t2
ON t2.user_id = 25
My preference is to make it more clear that our intent is for both of the values to be the same. We could code where one of them is 23 and the other is 27. That's syntactically valid to do that. When we convert this to a prepared statement with bind placeholders...
SELECT t1.user_id AS t1_user_id
, t2.user_id AS t2_user_id
FROM ( SELECT 'foo' AS dontcare ) i
LEFT
JOIN mytable1 t1
ON t1.user_id = ?
LEFT
JOIN mytable2 t2
ON t2.user_id = ?
We kind of "lose" the idea that those two values are the same. To get that hardcoded value specified only one time, I have the inline view return the value we want to "match" in the ON clause of the outer joins.
SELECT t1.user_id AS t1_user_id
, t2.user_id AS t2_user_id
FROM ( SELECT ? AS user_id ) i
LEFT
JOIN mytable1 t1
ON t1.user_id = i.user_id
LEFT
JOIN mytable2 t2
ON t2.user_id = i.user_id
Now my intent is more clear... I'm looking for "one" user_id value. Adding the "+ 0" indicates that whatever value gets passed in (e.g. '25', 'foo', or whatever), my statement is going to interpret that as a numeric value.
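To make the difference between the inner join and the inline-view pattern concrete, here is a sketch using Python's sqlite3 module. It only runs the SELECT forms, since SQLite has no multi-table UPDATE; the user_id column name follows the follow-up queries above, and the sample rows and the lookup helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable1 (user_id INTEGER PRIMARY KEY, save BLOB);
    CREATE TABLE mytable2 (user_id INTEGER PRIMARY KEY, save BLOB);
    INSERT INTO mytable1 VALUES (25, 'x'), (30, 'y');
    INSERT INTO mytable2 VALUES (30, 'z');   -- no row for user 25
""")

# Inner join: user 25 is missing from mytable2, so nothing comes back at all.
inner_rows = conn.execute("""
    SELECT t1.user_id, t2.user_id
    FROM mytable1 t1
    JOIN mytable2 t2
    WHERE t1.user_id = ? AND t2.user_id = ?
""", (25, 25)).fetchall()


def lookup(user_id):
    """Hypothetical helper: the value is bound once, in the inline view."""
    return conn.execute("""
        SELECT t1.user_id AS t1_user_id, t2.user_id AS t2_user_id
        FROM ( SELECT ? AS user_id ) i
        LEFT JOIN mytable1 t1 ON t1.user_id = i.user_id
        LEFT JOIN mytable2 t2 ON t2.user_id = i.user_id
    """, (user_id,)).fetchall()


print(inner_rows)
print(lookup(25), lookup(30), lookup(99))
```

The inline view always contributes exactly one row, so each table is matched independently and a missing row on one side shows up as NULL (None in Python) instead of removing the whole result.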
inline view
I used the term "inline view". That's just a SELECT query used in a context where we usually have a table.
e.g. if i have a table named mine, i can write a query...
SELECT m.id, m.name FROM mine m
test it and see that it returns rows, yada, yada.
I can also do this: wrap that query in parens and reference it in place of a table in another statement, like this...
SELECT t.*
FROM ( SELECT m.id, m.name FROM mine m ) t
MySQL requires that we assign an alias to that, like we can do if it were a table. We call that an inline view because it's similar to the pattern we use for a stored view. Let's look at a demonstration of doing that.
(This is just a demonstration of the pattern; there's some reasons we wouldn't want to do this.)
CREATE VIEW myview
AS
SELECT m.id, m.name FROM mine m
;
Then we can do this:
SELECT t.* FROM myview t
With the inline view we're following the same pattern, but we're bypassing a separate create view statement. (That's a DDL statement that causes an implicit commit, and creating a database object.) Bypassing that, we're effectively creating a view that exists only in the context of the statement, and doing that "inline", within the statement.
SELECT t.* FROM ( SELECT m.id, m.name FROM mine m ) t
The MySQL documentation refers to the inline view as a "derived table". If we (accidentally) forget the alias, the error we get back says "Every derived table must have its own alias". The more general term, used for databases other than MySQL, is "inline view".
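The equivalence between a stored view and an inline view can be checked quickly. This is a sketch using Python's sqlite3 module rather than MySQL; the extra secret column and the sample rows are made up, and note that SQLite, unlike MySQL, does not insist on the alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mine (id INTEGER PRIMARY KEY, name TEXT, secret TEXT);
    INSERT INTO mine VALUES (1, 'alpha', 's1'), (2, 'beta', 's2');

    -- Stored view: a separate DDL statement that creates a database object.
    CREATE VIEW myview AS SELECT m.id, m.name FROM mine m;
""")

# Query the stored view as if it were a table.
stored = conn.execute("SELECT t.* FROM myview t ORDER BY t.id").fetchall()

# Inline view: the same SELECT written in place, aliased as t.
inline = conn.execute("""
    SELECT t.*
    FROM ( SELECT m.id, m.name FROM mine m ) t
    ORDER BY t.id
""").fetchall()

print(stored)
print(inline)
```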

Create a VIEW where a record in t1 is not present in t2 ? Confirmation on Union/Left Join/Inner Join?

I am trying to make a view of records in t1 where the source id from t1 is not in t2.
Like... "what records are not present in the other table?"
Do I need to include t2 in the FROM clause? Thanks
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1
WHERE t1.fee_source_id NOT IN (
SELECT t1.fee_source_id
FROM t1 INNER JOIN t2 ON t1.fee_source_id = t2.fee_source
)
ORDER BY t1.aif_id DESC
You're looking to effect an anti-join, for which there are three possibilities in MySQL:
Using IN:
SELECT fee_source_id, company_name, document
FROM t1
WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
ORDER BY aif_id DESC
Using EXISTS:
SELECT fee_source_id, company_name, document
FROM t1
WHERE NOT EXISTS (
SELECT * FROM t2 WHERE t2.fee_source = t1.fee_source_id LIMIT 1
)
ORDER BY aif_id DESC
Using JOIN:
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1 LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
WHERE t2.fee_source IS NULL
ORDER BY t1.aif_id DESC
According to @Quassnoi's analysis:
Summary
MySQL can optimize all three methods to do a sort of NESTED LOOPS ANTI JOIN.
It will take each value from t_left and look it up in the index on t_right.value. In case of an index hit or an index miss, the corresponding predicate will immediately return FALSE or TRUE, respectively, and the decision to return the row from t_left or not will be made immediately without examining other rows in t_right.
However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
However, I'm not entirely sure how this analysis reconciles with the MySQL manual section on Optimizing Subqueries with EXISTS Strategy which (to my reading) suggests that the second approach above should be more efficient than the first.
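The three anti-join forms can be checked against each other on a small data set. This is a sketch using Python's sqlite3 module, so it says nothing about MySQL's relative performance; the sample rows are invented, and fee_source is declared NOT NULL on purpose, because NOT IN returns no rows at all if the subquery yields a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (fee_source_id INTEGER PRIMARY KEY, company_name TEXT, aif_id INTEGER);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, fee_source INTEGER NOT NULL);
    INSERT INTO t1 VALUES (1, 'A', 100), (2, 'B', 101), (3, 'C', 102), (4, 'D', 103);
    -- t2 references fee sources 1 and 3 only (3 twice).
    INSERT INTO t2 VALUES (1, 1), (2, 3), (3, 3);
""")

not_in = conn.execute("""
    SELECT fee_source_id, company_name
    FROM t1
    WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
    ORDER BY aif_id DESC
""").fetchall()

not_exists = conn.execute("""
    SELECT fee_source_id, company_name
    FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.fee_source = t1.fee_source_id)
    ORDER BY aif_id DESC
""").fetchall()

left_join = conn.execute("""
    SELECT t1.fee_source_id, t1.company_name
    FROM t1
    LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
    WHERE t2.fee_source IS NULL
    ORDER BY t1.aif_id DESC
""").fetchall()

print(not_in, not_exists, left_join, sep="\n")
```

Any of the three SELECT statements can be placed after CREATE VIEW ... AS to get the view asked for in the question.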
Another option below (similar to anti-join)... Great answer above though. Thanks!
SELECT D1.deptno, D1.dname
FROM dept D1
MINUS
SELECT D2.deptno, D2.dname
FROM dept D2, emp E2
WHERE D2.deptno = E2.deptno
ORDER BY 1;