Mysql, EXIST / NOT EXIST, asterisk as column name - mysql

An example from a book about MySql:
SELECT vendor_id, vendor_name, vendor_state
FROM vendors
WHERE NOT EXISTS
(SELECT *
FROM invoices
WHERE vendor_id = vendors.vendor_id)
"In this example, the correlated subquery selects all invoices that have the same vendor_id value as the current vendor in the outer query. Because the subquery doesn't actually return a result set, it doesn't matter what columns are included in the SELECT clause. As a result it's customary to just code an asterisk."
The invoices table has like 10 separate columns which look like this: http://prntscr.com/h3106k
I am not fully understanding the asterisk part. Since there is 10 separate columns in this table is it not possible that some columns will be empty (or not empty) and we can check for that? There is no use of checking individual columns, and it only makes sense to check a table as a whole (so nothing else that the asterisk is needed here)?

In this example, there is no row satisfying the condition (WHERE …=…). So, it is not important which column is checked as there is no row to check at all.
An alternative would be the following clause, maybe it is easier to understand:
SELECT vendor_id, vendor_name, vendor_state
FROM vendors
WHERE
(
SELECT COUNT(vendor_id)
FROM invoices
WHERE vendor_id = vendors.vendor_id
) = 0

Related

Mysql DISTINCT with more than one column (remove duplicates)

My database is called: (training_session)
I try to print out some information from my data, but I do not want to have any duplicates. I do get it somehow, may someone tell me what I do wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if i use only one column, but when I add another. it does not work.
I think you misunderstood the use of DISTINCT.
There is big difference between using DISTINCT and GROUP BY.
Both have some sort of goal, but they have different purpose.
You use DISTINCT if you want to show a series of columns and never repeat. That means you dont care about calculations or group function aggregates. DISTINCT will show different RESULTS if you keep adding more columns in your SELECT (if the table has many columns)
You use GROUP BY if you want to show "distinctively" on a certain selected columns and you use group function to calculate the data related to it. Therefore you use GROUP BY if you want to use group functions.
Please check group functions you can use in this link.
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
EDIT 1:
It seems like you are trying to get the "latest" of a certain athlete, I'll assume the current scenario if there is no ID.
Here is my alternate solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This syntax is called IN-SELECT subquery. With the help of LIMIT 1, it shows the topmost record. In-select subquery must have 1 record to return or else it shows error.
MySQL's DISTINCT clause is used to filter out duplicate recordsets.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration) then each record resulting from your query are unique, hence the results you're getting. In other words the query is working correctly.

Subquery returns more rows than straight same query in MySQL

I want to remove duplicates based on the combination of listings.product_id and listings.channel_listing_id
This simple query returns 400.000 rows (the id's of the rows I want to keep):
SELECT id
FROM `listings`
WHERE is_verified = 0
GROUP BY product_id, channel_listing_id
While this variation returns 1.600.000 rows, which are all records on the table, not only is_verified = 0:
SELECT *
FROM (
SELECT id
FROM `listings`
WHERE is_verified = 0
GROUP BY product_id, channel_listing_id
) AS keepem
I'd expect them to return the same amount of rows.
What's the reason for this? How can I avoid it (in order to use the subselect in the where condition of the DELETE statement)?
EDIT: I found that doing a SELECT DISTINCT in the outer SELECT "fixes" it (it returns 400.000 records as it should). I'm still not sure if I should trust this subquery, for there is no DISTINCT in the DELETE statement.
EDIT 2: Seems to be just a bug in the way phpMyAdmin reports the total count of the rows.
Your query as it stands is ambiguous. Suppose you have two listings with the same product_id and channel_id. Then what id is supposed to be returned? The first, the second? Or both, ignoring the GROUP request?
What if there is more than one id with different product and channel ids?
Try removing the ambiguity by selecting MAX(id) AS id and adding DISTINCT.
Are there any foreign keys to worry about? If not, you could pour the original table into a copy, empty the original and copy back in it the non-duplicates only. Messier, but you only do SELECTs or DELETEs guaranteed to succeed, and you also get to keep a backup.
Assign aliases in order to avoid field reference ambiguity:
SELECT
keepem.*
FROM
(
SELECT
innerStat.id
FROM
`listings` AS innerStat
WHERE
innerStat.is_verified = 0
GROUP BY
innerStat.product_id,
innerStat.channel_listing_id
) AS keepem

Optimise query sql on mysql

I have a query like this:
DELETE FROM doublon WHERE id in
( Select id from `doublon` where `id` not in
( Select id
From `doublon`
group by etablissement_id,amenities_id
having Count(etablissement_id) > 1 and Count(amenities_id) > 1
union
Select id
From `doublon`
group by etablissement_id,amenities_id
having Count(etablissement_id) = 1 and Count(amenities_id) = 1
)
)
My table 'doublon' is structured like that:
id
etablissement_id
amenities_id
The structure table it's like this:
http://hpics.li/bbb5eda
I have 2 millions rows and the query is to slow , many hours..
Anybody know how to optimize this query to execute that faster ?
SqlFiddle
Your query is not correct, in the first place. But keep reading, it's possible that by the end of the answer I discovered the reason you need such a strange query.
Let's discuss the last subquery:
Select id
From `doublon`
group by etablissement_id,amenities_id
having Count(etablissement_id) = 1 and Count(amenities_id) = 1
You can use a column in the SELECT clause of a query that has GROUP BY only if at least one of the following happens:
it is present in the GROUP BY clause too;
it is used as an argument of an aggregate function;
the value of that column is functionally dependent of the values of the columns that are present in the GROUP BY clause; for example, if a column that has an UNIQUE index is present (or all the columns that are present in an UNIQUE index of the table).
The column id doesn't fit in any of the cases above1. This makes the query illegal according to the SQL specification.
MySQL, however, accepts it and struggles to produce a result set for it but it says in the documentation:
... the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.
The HAVING clause contains Count(etablissement_id) and Count(amenities_id). When etablissement_id and amenities_id are both not-NULL then these two expressions have the same value and that is the same as COUNT(*) (the number of rows in the group). And it is always greater than 0 (a group cannot contain 0 rows).
For the groups generated when etablissement_id or amenities_id is NULL the corresponding COUNT() returns 0. This applies also when both are NULL on the same time.
Using this information, this query returns the ids of rows whose combination (etablissement_id, amenities_id) is unique in the table (the groups contain only one row) and both fields are not NULL.
The other GROUP BY query (that is UNION-ed with this one) returns indeterminate values from the groups of rows whose combination (etablissement_id, amenities_id) is not unique in the table (and both fields are not NULL), as explained in the fragment quoted from the documentation.
It seems the UNION picks one (random) id from each group of (etablissement_id, amenities_id) where both etablissement_id and amenities_id are not-NULL. The outer SELECT intends to ignore the ids chosen by the UNION and provide to DELETE the rest of them.
(I think the intermediate SELECT is not even needed, you could use its WHERE clause in the DELETE query).
The only reason I could imagine you need to run this query is that table doublon is the correspondence table of a many-to-many relationship that was created without an UNIQUE index on (etablissement_id, amenities_id)(the FOREIGN KEY columns imported from the related tables).
If this is your intention then there are simpler ways to achieve this goal.
I would create a duplicate of the doublon table, with the correct structure then I would use an INSERT ... SELECT query with DISTINCT to get from the old table the needed values. Then I would swap the tables and remove the old one.
The queries:
# Create the new table
CREATE TABLE `doublon_fixed` LIKE `doublon`;
# Add the needed UNIQUE INDEX
ALTER TABLE `doublon_fixed`
ADD UNIQUE INDEX `etablissement_amenities`(`etablissement_id`, `amenities_id`);
# Copy the needed values
INSERT INTO `doublon_fixed` (`etablissement_id`, `amenities_id`)
SELECT DISTINCT `etablissement_id`, `amenities_id`
FROM `doublon`;
# Swap the tables
RENAME TABLE `doublon` TO `doublon_old`, `doublon_fixed` TO `doublon`;
# Remove the old table
DROP TABLE `doublon_old`;
The RENAME query atomically operates the renames, from left to right. It is useful to avoid downtime.
Notes:
1 If the id column is functionally dependent on the (etablissement_id, amenities_id) pair then all the groups produced by the UNION-ed queries contain a single row. The first SELECT won't produce any result and the second SELECT will return the entire table).
If am not wrong this should work
DELETE FROM doublon
WHERE id IN (SELECT id
FROM doublon
WHERE id NOT IN (SELECT id
FROM doublon
GROUP BY etablissement_id,
amenities_id
HAVING Count(etablissement_id) >= 1
AND Count(amenities_id) >= 1))

How do I use distinct for a column along with a where clause in sql server 2008?

I want to get the distinct value of a particular column however duplicity is not properly managed if more than 3 columns are selected.
The query is:
SELECT DISTINCT
ShoppingSessionId, userid
FROM
dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId, userid
HAVING
userid = 7
This query produces correct result, but if we add another column then result is wrong.
Please help me as I want to use the ShoppingSessionId as a distinct, except when I want to use all the columns from the table, including with the where clause .
How can I do that?
The DISTINCT keyword applies to the entire row, never to a column.
Presently DISTINCT is not needed at all, because your script already makes sure that ShoppingSession is distinct: by specifying the column in GROUP BY and filtering on the other grouping column (userid).
When you add a third column to GROUP BY and it results in duplicated ShoppingSession, it means that some ShoppingSession values are associated with many different values of the added column.
If you want ShoppingSession to remain distinct after including that third column, you should decide which values of the the added column should be left in the output and which should be discarded. This is called aggregating. You could apply the MAX() function to that column, or MIN() or any other suitable aggregate function. Note that the column should not be included in GROUP BY in this case.
Here's an illustration of what I'm talking about:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId,
userid
HAVING userid = 7
There's one more note on your query. The HAVING clause is typically used for filtering on aggregated columns. If your filter does not involve aggregated columns, you'll be better off using the WHERE clause instead:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId,
userid
Although both queries would produce identical results, their efficiency would be different, because the first query would have to pull all rows, group/aggregate them, then discard all rows except userid = 7, but the second one would discard rows first and only then group/aggregate the remaining, which is much more efficient.
You could go even further and exclude the userid column from GROUP BY and pull its value with an aggregate function:
SELECT
ShoppingSessionId,
MAX(userid) AS userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId
Since all userid values in your output are supposed to contain 7 (because that's in your filter), you can just pick a maximum value per every ShoppingSession, knowing that it'll always be 7.

What does it mean by select 1 from table?

I have seen many queries with something as follows.
Select 1
From table
What does this 1 mean, how will it be executed and, what will it return?
Also, in what type of scenarios, can this be used?
select 1 from table will return the constant 1 for every row of the table. It's useful when you want to cheaply determine if record matches your where clause and/or join.
SELECT 1 FROM TABLE_NAME means, "Return 1 from the table". It is pretty unremarkable on its own, so normally it will be used with WHERE and often EXISTS (as #gbn notes, this is not necessarily best practice, it is, however, common enough to be noted, even if it isn't really meaningful (that said, I will use it because others use it and it is "more obvious" immediately. Of course, that might be a viscous chicken vs. egg issue, but I don't generally dwell)).
SELECT * FROM TABLE1 T1 WHERE EXISTS (
SELECT 1 FROM TABLE2 T2 WHERE T1.ID= T2.ID
);
Basically, the above will return everything from table 1 which has a corresponding ID from table 2. (This is a contrived example, obviously, but I believe it conveys the idea. Personally, I would probably do the above as SELECT * FROM TABLE1 T1 WHERE ID IN (SELECT ID FROM TABLE2); as I view that as FAR more explicit to the reader unless there were a circumstantially compelling reason not to).
EDIT
There actually is one case which I forgot about until just now. In the case where you are trying to determine existence of a value in the database from an outside language, sometimes SELECT 1 FROM TABLE_NAME will be used. This does not offer significant benefit over selecting an individual column, but, depending on implementation, it may offer substantial gains over doing a SELECT *, simply because it is often the case that the more columns that the DB returns to a language, the larger the data structure, which in turn mean that more time will be taken.
If you mean something like
SELECT * FROM AnotherTable
WHERE EXISTS (SELECT 1 FROM table WHERE...)
then it's a myth that the 1 is better than
SELECT * FROM AnotherTable
WHERE EXISTS (SELECT * FROM table WHERE...)
The 1 or * in the EXISTS is ignored and you can write this as per Page 191 of the ANSI SQL 1992 Standard:
SELECT * FROM AnotherTable
WHERE EXISTS (SELECT 1/0 FROM table WHERE...)
it does what it says - it will always return the integer 1. It's used to check whether a record matching your where clause exists.
select 1 from table is used by some databases as a query to test a connection to see if it's alive, often used when retrieving or returning a connection to / from a connection pool.
The result is 1 for every record in the table.
To be slightly more specific, you would use this to do
SELECT 1 FROM MyUserTable WHERE user_id = 33487
instead of doing
SELECT * FROM MyUserTable WHERE user_id = 33487
because you don't care about looking at the results. Asking for the number 1 is very easy for the database (since it doesn't have to do any look-ups).
Although it is not widely known, a query can have a HAVING clause without a GROUP BY clause.
In such circumstances, the HAVING clause is applied to the entire set. Clearly, the SELECT clause cannot refer to any column, otherwise you would (correct) get the error, "Column is invalid in select because it is not contained in the GROUP BY" etc.
Therefore, a literal value must be used (because SQL doesn't allow a resultset with zero columns -- why?!) and the literal value 1 (INTEGER) is commonly used: if the HAVING clause evaluates TRUE then the resultset will be one row with one column showing the value 1, otherwise you get the empty set.
Example: to find whether a column has more than one distinct value:
SELECT 1
FROM tableA
HAVING MIN(colA) < MAX(colA);
If you don't know there exist any data in your table or not, you can use following query:
SELECT cons_value FROM table_name;
For an Example:
SELECT 1 FROM employee;
It will return a column which contains the total number of rows & all rows have the same constant value 1 (for this time it returns 1 for all rows);
If there is no row in your table it will return nothing.
So, we use this SQL query to know if there is any data in the table & the number of rows indicates how many rows exist in this table.
If you just want to check a true or false based on the WHERE clause, select 1 from table where condition is the cheapest way.
This means that You want a value "1" as output or Most of the time used as Inner Queries because for some reason you want to calculate the outer queries based on the result of inner queries.. not all the time you use 1 but you have some specific values...
This will statically gives you output as value 1.
I see it is always used in SQL injection,such as:
www.urlxxxxx.com/xxxx.asp?id=99 union select 1,2,3,4,5,6,7,8,9 from database;
These numbers can be used to guess where the database exists and guess the column name of the database you specified.And the values of the tables.
it simple means that you are retrieving the number first column from table ,,,,means
select Emply_num,Empl_no From Employees ;
here you are using select 1 from Employees;
that means you are retrieving the Emply_num column.
Thanks
The reason is another one, at least for MySQL. This is from the MySQL manual
InnoDB computes index cardinality values for a table the first time that table is accessed after startup, instead of storing such values in the table. This step can take significant time on systems that partition the data into many tables. Since this overhead only applies to the initial table open operation, to “warm up” a table for later use, access it immediately after startup by issuing a statement such as SELECT 1 FROM tbl_name LIMIT 1
This is just used for convenience with IF EXISTS(). Otherwise you can go with
select * from [table_name]
Image In the case of 'IF EXISTS', we just need know that any row with specified condition exists or not doesn't matter what is content of row.
select 1 from Users
above example code, returns no. of rows equals to no. of users with 1 in single column