SQL: do we need ANY/SOME and ALL keywords? - mysql

I've been using SQL (SQL Server, PostgreSQL) for over 10 years and I've still never used the ANY/SOME and ALL keywords in my production code. In every situation I've encountered I could get away with IN, MAX, MIN, or EXISTS, and I think those are more readable.
For example:
-- = ANY
select * from Users as U where U.ID = ANY(select P.User_ID from Payments as P);
-- IN
select * from Users as U where U.ID IN (select P.User_ID from Payments as P);
Or
-- < ANY
select * from Users as U where U.Salary < ANY(select P.Amount from Payments as P);
-- EXISTS
select * from Users as U where EXISTS (select * from Payments as P where P.Amount > U.Salary);
Using ANY/SOME and ALL:
PostgreSQL
SQL Server
MySQL
SQL FIDDLE with some examples
So the question is: am I missing something? is there some situation where ANY/SOME and ALL shine over other solutions?
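A minimal sketch of the rewrites described in the question, assuming an in-memory SQLite database as a stand-in (SQLite has no ANY/SOME/ALL subquery operators, so only the IN, EXISTS, and MAX forms are run). The sample rows are made up for illustration; the table and column names follow the queries above.

```python
import sqlite3

# In-memory SQLite database as a stand-in; sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, Salary INTEGER);
    CREATE TABLE Payments (User_ID INTEGER, Amount INTEGER);
    INSERT INTO Users VALUES (1, 100), (2, 500), (3, 900);
    INSERT INTO Payments VALUES (1, 300), (1, 600);
""")

# "= ANY (subquery)" written as IN.
in_ids = [r[0] for r in conn.execute(
    "SELECT U.ID FROM Users AS U "
    "WHERE U.ID IN (SELECT P.User_ID FROM Payments AS P) ORDER BY U.ID")]

# "< ANY (subquery)" written as a correlated EXISTS ...
exists_ids = [r[0] for r in conn.execute(
    "SELECT U.ID FROM Users AS U "
    "WHERE EXISTS (SELECT 1 FROM Payments AS P WHERE P.Amount > U.Salary) "
    "ORDER BY U.ID")]

# ... or as a comparison against MAX of the subquery column.
max_ids = [r[0] for r in conn.execute(
    "SELECT U.ID FROM Users AS U "
    "WHERE U.Salary < (SELECT MAX(P.Amount) FROM Payments AS P) ORDER BY U.ID")]

print(in_ids, exists_ids, max_ids)
```

On this data the EXISTS and MAX forms pick the same users, which is the equivalence the question relies on.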

I find ANY and ALL to be very useful when you're not just testing equality or inequality. Consider
'blah' LIKE ANY (ARRAY['%lah', '%fah', '%dah']);
as used in my answer to this question.
ANY, ALL and their negations can greatly simplify code that'd otherwise require non-trivial subqueries or CTEs, and they're significantly under-used in my view.
Consider that ANY will work with any operator. It's very handy with LIKE and ~, but will work with tsquery, array membership tests, hstore key tests, and more.
'a => 1, e => 2'::hstore ? ANY (ARRAY['a', 'b', 'c', 'd'])
or:
'a => 1, b => 2'::hstore ? ALL (ARRAY['a', 'b'])
Without ANY or ALL you'd probably have to express those as a subquery or CTE over a VALUES list with an aggregate to produce a single result. Sure, you can do that if you want, but I'll stick to ANY.
There's one real caveat here: On older Pg versions, if you're writing ANY( SELECT ... ), you're almost certainly going to be better off in performance terms with EXISTS (SELECT 1 FROM ... WHERE ...). If you're on a version where the optimizer will turn ANY (...) into a join then you don't need to worry. If in doubt, check EXPLAIN output.

No, I've never used the ANY, ALL, or SOME keywords either, and I've never seen them used in other people's code. I assume these are vestigial syntax, like the various optional keywords that appear in some places in SQL (for example, AS).
Keep in mind that SQL was defined by a committee.

I've tried them and I don't think you're missing anything; it's just a different habit. The only difference I see is with a NOT condition: EXISTS and IN need NOT added in front, while with ANY/SOME and ALL you just change the operator to <> (note that NOT IN corresponds to <> ALL, not <> ANY). I only use SQL Server, so I'm not sure whether other databases offer something I'm missing.

Related

Join to tables and String Compare (large data set)

I am very new to SQL and don't really know much about what I'm doing. I'm trying to figure out how to get a list of leads and owners whose corresponding campaign record types start with "inter".
So far I have tried joining the two tables and running a string compare I found on a different Stack Overflow page. Separately they work fine, but together everything breaks. I only get the error "You have an error in your SQL syntax; check the manual".
select a.LeadId, b.OwnerId from
(select * from CampaignMember as a
join
select * from Campaign as b
on b.id = a.CampaignId)
where b.RecordTypeId like "inter%"
Schema:
Campaign        CampaignMember
-------------   ----------------
Id              CampaignId
OwnerId         LeadId
RecordTypeId    ContactId
The string compare is also very slow. I am looking at a table of 600M values. Is there a faster alternative?
Is there also a way to get more specific errors in MySQL?
If you format your code properly, it will be very easy to see why it's not working.
select a.LeadId, b.OwnerId
from (
select *
from CampaignMember as a
join select *
from Campaign as b on b.id = a.CampaignId
)
where b.RecordTypeId like "inter%"
That is not valid JOIN syntax. Also, in the last part, standard SQL uses single quotes (') for string literals rather than double quotes (").
Probably what you want is something like this
SELECT a.LeadId, b.OwnerId
FROM CampaignMember a
JOIN Campaign b ON b.id = a.CampaignId
WHERE b.RecordTypeId LIKE 'inter%'
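As a quick check, here is a sketch that runs the plain JOIN form against an in-memory SQLite database (a stand-in for MySQL). The table layout follows the schema in the question; the column types and sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite stand-in; column types and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Campaign (Id INTEGER PRIMARY KEY, OwnerId TEXT, RecordTypeId TEXT);
    CREATE TABLE CampaignMember (CampaignId INTEGER, LeadId TEXT, ContactId TEXT);
    INSERT INTO Campaign VALUES (1, 'owner-a', 'internal'), (2, 'owner-b', 'external');
    INSERT INTO CampaignMember VALUES
        (1, 'lead-1', NULL), (1, 'lead-2', NULL), (2, 'lead-3', NULL);
""")

# Join the two tables directly and filter on the campaign's record type prefix.
rows = conn.execute("""
    SELECT a.LeadId, b.OwnerId
    FROM CampaignMember a
    JOIN Campaign b ON b.Id = a.CampaignId
    WHERE b.RecordTypeId LIKE 'inter%'
    ORDER BY a.LeadId
""").fetchall()

print(rows)
```

Only the members of the campaign whose RecordTypeId starts with "inter" come back, each paired with that campaign's owner.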
Try this:
select CampaignMember.LeadId, Campaign.OwnerId from
Campaign
inner join
CampaignMember
on CampaignMember.CampaignId= Campaign.id
where Campaign.RecordTypeId like "inter%"
MySQL is generally pretty poor at handling sub-selects, so you should avoid them when possible. Also, your sub-select isn't filtering any rows, so it has to evaluate every row before applying the LIKE filter. This is sometimes "intelligently" handled by the query engine, but you should try to minimize reliance on the engine to optimize the query.
Additionally, you really should only return the columns that you care about; SELECT * is ok for confirming things, but slows queries down.
Therefore, the query posted by Eric (above) is actually the best choice.

SQL query to select based on many-to-many relationship

This is really a two-part question, but in order not to mix things up, I'll divide into two actual questions. This one is about creating the correct SQL statement for selecting a row based on values in a many-to-many related table:
Now, the question is: what is the absolute simplest way of getting all resources where e.g metadata.category = subject AND where that category's corresponding metadata.value ='introduction'?
I'm sure this could be done in a lot of different ways, but I'm a novice in SQL, so please provide the simplest way possible... (If you could describe briefly what the statement means in plain English that would be great too. I have looked at introductions to SQL, but none of those I have found (for beginners) go into these many-to-many selections.)
The easiest way is to use the EXISTS clause. I'm more familiar with MSSQL but this should be close
SELECT *
FROM resources r
WHERE EXISTS (
SELECT *
FROM metadata_resources mr
INNER JOIN metadata m ON (mr.metadata_id = m.id)
WHERE mr.resource_id = r.id AND m.category = 'subject' AND m.value = 'introduction'
)
Translated into English it's 'return me all records where this subquery returns one or more rows, without returning the data for those rows'. This subquery is correlated to the outer query by the predicate mr.resource_id = r.id, which uses the outer row as the predicate value.
I'm sure you can google around for more examples of the EXISTS clause.
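A small runnable sketch of the EXISTS approach, assuming an in-memory SQLite database and a guessed schema: the question's table definitions are not shown, so the `title` column, the column types, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in; schema details and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resources (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE metadata (id INTEGER PRIMARY KEY, category TEXT, value TEXT);
    CREATE TABLE metadata_resources (resource_id INTEGER, metadata_id INTEGER);
    INSERT INTO resources VALUES (1, 'Intro text'), (2, 'Advanced text'), (3, 'Untagged');
    INSERT INTO metadata VALUES
        (10, 'subject', 'introduction'),
        (11, 'subject', 'advanced'),
        (12, 'author',  'introduction');
    INSERT INTO metadata_resources VALUES (1, 10), (2, 11), (2, 12);
""")

# Keep a resource only if at least one linked metadata row has
# category 'subject' AND value 'introduction' on that same row.
titles = [r[0] for r in conn.execute("""
    SELECT r.title
    FROM resources r
    WHERE EXISTS (
        SELECT 1
        FROM metadata_resources mr
        INNER JOIN metadata m ON mr.metadata_id = m.id
        WHERE mr.resource_id = r.id
          AND m.category = 'subject'
          AND m.value = 'introduction'
    )
    ORDER BY r.id
""")]

print(titles)
```

Resource 2 is linked to a 'subject' row and to an 'introduction' row, but not to a single row that is both, so it is not returned.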

Rewriting MySQL query with subquery for version 3.23.58

I need some help with a query on a database with MySQL version 3.23.58.
In principle I wanted to do this query (with some php included):
SELECT *
FROM abstracts
LEFT JOIN
(SELECT * FROM reviewdata WHERE reviewerid='$userid') as reviewdata
ON abstracts.id=reviewdata.abstractid
WHERE session='$session'
In words, I want a list with all abstracts combined with reviewdata for the abstracts if the reviewer already has reviewed it. Otherwise I still want the abstract but with empty reviewdata (so that he can change it).
The above query works fine in the newer versions but in this old version I'm not allowed to use subqueries, so MySQL complains when I use the nested SELECT in the LEFT JOIN.
So now I'm looking for a way to avoid this subquery...
I tried:
SELECT *
FROM abstracts
LEFT JOIN reviewdata
ON abstracts.id=reviewdata.abstractid
WHERE session='$session' AND (reviewerid='$userid' or reviewerid is null)
But with this query, abstracts that were reviewed by other reviewers, but not by this specific reviewer, are not shown to him at all.
My problem is that I don't know what was allowed back then...
Without an old installation I can't try this, but try putting the check for the reviewerid in the ON clause.
SELECT *
FROM abstracts
LEFT OUTER JOIN reviewdata
ON abstracts.id=reviewdata.abstractid
AND reviewerid='$userid'
WHERE session='$session'
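To illustrate why the position of the reviewer check matters, here is a sketch on an in-memory SQLite database (a stand-in, since the old MySQL version is not available). The `score` column, the reviewer names, and the rows are hypothetical, and bound parameters replace the PHP variables.

```python
import sqlite3

# In-memory SQLite stand-in; the score column and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE abstracts (id INTEGER PRIMARY KEY, session TEXT);
    CREATE TABLE reviewdata (abstractid INTEGER, reviewerid TEXT, score INTEGER);
    INSERT INTO abstracts VALUES (1, 's1'), (2, 's1'), (3, 's1');
    -- abstract 1: reviewed by alice; abstract 2: only by bob; abstract 3: unreviewed
    INSERT INTO reviewdata VALUES (1, 'alice', 5), (2, 'bob', 3);
""")

# Reviewer check in the ON clause: non-matching reviews are simply not joined,
# so every abstract of the session survives with NULL review data.
on_rows = conn.execute("""
    SELECT a.id, r.score
    FROM abstracts a
    LEFT JOIN reviewdata r
           ON a.id = r.abstractid AND r.reviewerid = ?
    WHERE a.session = ?
    ORDER BY a.id
""", ("alice", "s1")).fetchall()

# Reviewer check in the WHERE clause: abstract 2 joins to bob's row first,
# and that row is then filtered out, so the abstract disappears.
where_rows = conn.execute("""
    SELECT a.id, r.score
    FROM abstracts a
    LEFT JOIN reviewdata r ON a.id = r.abstractid
    WHERE a.session = ? AND (r.reviewerid = ? OR r.reviewerid IS NULL)
    ORDER BY a.id
""", ("s1", "alice")).fetchall()

print(on_rows)
print(where_rows)
```

The second query reproduces the problem from the question: an abstract reviewed only by someone else is dropped instead of being shown with empty review data.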
From my perspective your refactored query should display the abstracts of all reviewers, including the specific one. The following condition
AND (reviewerid='$userid' or reviewerid is null)
doesn't make any sense and can just be removed. In order to restrict the result set to data related to a specific reviewer, you must exclude the "or reviewerid is null" part.
try
SELECT *
FROM abstracts
LEFT JOIN reviewdata
ON abstracts.id=reviewdata.abstractid
WHERE session='$session' AND reviewerid='$userid'
Another piece of advice: use aliases when specifying fields in a query.

How to retrieve "dynamic" attributes stored in multiple rows as normal records?

I have a system built on a relational MySQL database that allows people to store details of "leads". In addition, people can create their own columns under which to store data and then when adding new accounts can add data under them. The table structure looks like this:
LEADS - id, email, user_id
ATTRIBUTES - id, attr_name, user_id
ATTR_VALUES - lead_id, attr_id, value, user_id
Obviously in these tables "user_id" refers to a "Users" table that just contains people that can log into the system.
I am writing a function to output lead details and currently am just pulling through the basic lead details as a query, and then pulling through every attribute value associated with that lead (joining on the attributes table to get the name) and then joining the arrays in PHP. This is a little messy, and I was wondering if there was a way to do this in one SQL query. I have read a little about something called a "pivot table", but am struggling to understand how it works.
Any help would be greatly appreciated. Thanks!
You could do the pivoting in a single query like the following:
select l.id lead_id,
l.email,
group_concat(distinct case when a.attr_name = 'Home Phone' then v.value end) HomePhone,
...
from leads l
left join attr_values v on l.id = v.lead_id
left join attributes a on v.attr_id = a.id
group by l.id
You will need to include a separate group_concat-derived field for each attribute you want to display.
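Here is a runnable sketch of that pivot on an in-memory SQLite database, which also has group_concat (used here as a stand-in for MySQL). The 'City' attribute and all sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in; sample attributes and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leads (id INTEGER PRIMARY KEY, email TEXT, user_id INTEGER);
    CREATE TABLE attributes (id INTEGER PRIMARY KEY, attr_name TEXT, user_id INTEGER);
    CREATE TABLE attr_values (lead_id INTEGER, attr_id INTEGER, value TEXT, user_id INTEGER);
    INSERT INTO leads VALUES (1, 'a@example.com', 7), (2, 'b@example.com', 7);
    INSERT INTO attributes VALUES (1, 'Home Phone', 7), (2, 'City', 7);
    INSERT INTO attr_values VALUES
        (1, 1, '555-0100', 7), (1, 2, 'Oslo', 7), (2, 2, 'Lima', 7);
""")

# One group_concat(CASE ...) column per attribute to display.
# The CASE yields NULL for other attributes, and group_concat skips NULLs.
rows = conn.execute("""
    SELECT l.id AS lead_id,
           l.email,
           group_concat(DISTINCT CASE WHEN a.attr_name = 'Home Phone' THEN v.value END) AS HomePhone,
           group_concat(DISTINCT CASE WHEN a.attr_name = 'City' THEN v.value END) AS City
    FROM leads l
    LEFT JOIN attr_values v ON l.id = v.lead_id
    LEFT JOIN attributes a ON v.attr_id = a.id
    GROUP BY l.id, l.email
    ORDER BY l.id
""").fetchall()

print(rows)
```

A lead without a value for an attribute gets NULL in that column, and the list of columns has to be known when the query is written.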
I would have a look at this link, which explains the fundamentals of a pivot:
"pivot table" or a "crosstab report"
SQL Characteristic Functions: Do it without "if", "case", or "GROUP_CONCAT". Yes, there is use for this... "if" statements sometimes cause problems when used in combination.
The simple secret, and it's also why they work in almost all databases, is the following functions:
sign(x) returns -1, 0, +1 for values x < 0, x = 0, x > 0 respectively
abs(sign(x)) returns 0 if x = 0, else 1 if x > 0 or x < 0
1 - abs(sign(x)) is the complement of the above, since this returns 1 only if x = 0
It also explains a simpler way of pivoting, using exams as the example. Maybe this can shed some light on it?
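A rough sketch of the characteristic-function idea, assuming a hypothetical `exams(student, exam_no, score)` table in an in-memory SQLite database. Because sign() is not available in every SQLite build, the sketch registers its own `sign` function from Python; that registration is an assumption of this example and not part of the quoted article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Register sign(x) explicitly: returns -1, 0, or +1.
conn.create_function("sign", 1, lambda x: (x > 0) - (x < 0))

# Hypothetical exams table and rows, for illustration only.
conn.executescript("""
    CREATE TABLE exams (student TEXT, exam_no INTEGER, score INTEGER);
    INSERT INTO exams VALUES ('ann', 1, 80), ('ann', 2, 90), ('bob', 1, 70);
""")

# 1 - abs(sign(exam_no - k)) is 1 only when exam_no = k, so multiplying
# by it keeps the score for exam k and zeroes out every other row.
rows = conn.execute("""
    SELECT student,
           SUM(score * (1 - abs(sign(exam_no - 1)))) AS exam1,
           SUM(score * (1 - abs(sign(exam_no - 2)))) AS exam2
    FROM exams
    GROUP BY student
    ORDER BY student
""").fetchall()

print(rows)
```

Note that a missing exam shows up as 0 rather than NULL with this approach, and it only fits numeric values.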
What you probably want from MySQL is to turn a value (attr_name in your case) into a column. This is called a pivot table (sometimes also a cross table or crosstab query) and is not supported by MySQL. That's not because MySQL is insufficient, but because the pivot operation is not a database operation: the result is not a normal database table and is not designed for further database operations. The only purpose of a pivot operation is presentation, which is why it belongs to the presentation layer, not the database.
Thus, every solution of trying to get a pivot table from mysql will always be hacky. What I recommend is to get the data from database in normal format, by simply doing something like:
select *
from attr_values join attributes on attr_id = attributes.id
join leads on leads.id = lead_id
and then transform the database output in the presentation language (PHP, JSP, Python or whatever you use).
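A sketch of that split, with Python standing in for the presentation language and an in-memory SQLite database standing in for MySQL. The sample rows and the shape of the resulting dictionary are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stand-in; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leads (id INTEGER PRIMARY KEY, email TEXT, user_id INTEGER);
    CREATE TABLE attributes (id INTEGER PRIMARY KEY, attr_name TEXT, user_id INTEGER);
    CREATE TABLE attr_values (lead_id INTEGER, attr_id INTEGER, value TEXT, user_id INTEGER);
    INSERT INTO leads VALUES (1, 'a@example.com', 7), (2, 'b@example.com', 7);
    INSERT INTO attributes VALUES (1, 'Home Phone', 7), (2, 'City', 7);
    INSERT INTO attr_values VALUES
        (1, 1, '555-0100', 7), (1, 2, 'Oslo', 7), (2, 2, 'Lima', 7);
""")

# Fetch the data in its normal "one row per attribute value" form.
rows = conn.execute("""
    SELECT leads.id, leads.email, attributes.attr_name, attr_values.value
    FROM attr_values
    JOIN attributes ON attr_values.attr_id = attributes.id
    JOIN leads ON leads.id = attr_values.lead_id
    ORDER BY leads.id, attributes.id
""").fetchall()

# Pivot in application code: one record per lead, attributes as keys.
leads = {}
for lead_id, email, attr_name, value in rows:
    lead = leads.setdefault(lead_id, {"email": email, "attrs": {}})
    lead["attrs"][attr_name] = value

print(leads)
```

Because these are inner joins starting from attr_values, a lead with no attribute values at all would not appear; starting from leads with left joins would keep it.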
I'd be careful about assuming that a pivot will achieve your simplification goal. A pivot will only work if your attr_name values are consistent. Since you tied a user_id to them, I assume they won't be. In addition, you will have multiple values for one attr_name. I'm afraid a pivot table wouldn't produce the result you are looking for.
I would suggest that you keep your transactional and reporting tables separate. Have an ETL routine that cleans the data (i.e. makes the attr_name and attr_value consistent) through translation. This will make your reports more meaningful.
In summary, for immediate output to end-user, PHP is the best you can do. For reporting, transform the EAV to a row/column first before attempting to report on it.

MySQL - Fastest way to select relational data avoiding left join

I've currently got a query that selects metrics data from two tables whilst getting the projects to query from two other tables (one is owned projects, the other is projects to which the user has access).
SELECT v.`projectID`,
(SELECT COUNT(m.`session`)
FROM `metricData` m
WHERE m.`projectID` = v.`projectID`) AS `sessions`,
(SELECT COUNT(pb.`interact`)
FROM `interactionData` pb WHERE pb.`projectID` = v.`projectID` GROUP BY pb.`projectID`) AS `interactions`
FROM `medias` v
LEFT JOIN `projectsExt` pa ON v.`projectsExtID` = pa.`projectsExtID`
WHERE (pa.`user` = '1' OR v.`ownerUser` = '1')
GROUP BY v.`projectID`
It takes too long: 1-2 seconds. This is obviously the multi left-join scenario. But I've got a couple of ideas to improve speed and wondered what the thoughts were in principle. Do I:
Try and select the list in the query and then get the data, rather than doing the joins. Not sure how this would work.
Do a select in a separate query to get the projectIDs and then run queries on each projectID afterwards. This may lead to hundreds or potentially thousands of requests, but may be better for the processing?
Other ideas?
There are two questions here:
how can I get my result in less than 2 seconds
how can I avoid a left join.
To answer #1 properly there has to be more information. Technical information, such as the explain plan for this particular query is a good start. Even better if we'd have the SHOW CREATE TABLE of all tables that you access, as well as the number of rows they contain.
But I'd also appreciate more functional information: what exactly is the question you're trying to answer? Right now, it seems you're looking at two different sets of medias:
either there is no matching row in projectsExt, in which case medias.ownerUser must equal '1' (is that '1' supposed to be a string btw?)
or there is exactly one matching row in projectsExt for which projectsExt.user must equal '1' (is that '1' supposed to be a string btw?)
Lacking enough information to answer #1, I can answer #2 - "how to avoid a left join". The answer is: write a UNION of the two sets, one where there is a match and one where there isn't a match.
SELECT v.`projectID`
, (
SELECT COUNT(m.`session`)
FROM `metricData` m
WHERE m.`projectID` = v.`projectID`
) AS `sessions`
, (
SELECT COUNT(pb.`interact`)
FROM `interactionData` pb
WHERE pb.`projectID` = v.`projectID`
GROUP BY pb.`projectID`
) AS `interactions`
FROM (
SELECT v.projectID
FROM medias v
WHERE v.ownerUser = '1'
GROUP BY v.projectID
UNION ALL
SELECT v.projectID
FROM medias v
INNER JOIN projectsExt pa
ON v.projectsExtID = pa.projectsExtID
WHERE v.ownerUser != '1'
AND pa.user = '1'
GROUP BY v.`projectID`
) v
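To check the idea on a small scale, here is a sketch that compares the LEFT JOIN with OR against a two-branch union, using an in-memory SQLite database as a stand-in. The column types and sample rows are hypothetical. One deliberate change: the sketch uses UNION rather than UNION ALL, because a projectID could appear in both branches if one of its medias rows is owned and another is shared.

```python
import sqlite3

# In-memory SQLite stand-in; column types and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE medias (projectID INTEGER, projectsExtID INTEGER, ownerUser TEXT);
    CREATE TABLE projectsExt (projectsExtID INTEGER, `user` TEXT);
    INSERT INTO medias VALUES (1, 10, '1'), (2, 20, '2'), (3, 30, '2'), (4, NULL, '1');
    INSERT INTO projectsExt VALUES (20, '1'), (30, '3');
""")

# Original shape: LEFT JOIN plus an OR across both tables.
left_join_ids = [r[0] for r in conn.execute("""
    SELECT v.projectID
    FROM medias v
    LEFT JOIN projectsExt pa ON v.projectsExtID = pa.projectsExtID
    WHERE (pa.`user` = '1' OR v.ownerUser = '1')
    GROUP BY v.projectID
    ORDER BY v.projectID
""")]

# Rewritten shape: owned projects, unioned with projects shared via projectsExt.
union_ids = [r[0] for r in conn.execute("""
    SELECT p.projectID
    FROM (
        SELECT v.projectID
        FROM medias v
        WHERE v.ownerUser = '1'
        UNION
        SELECT v.projectID
        FROM medias v
        INNER JOIN projectsExt pa ON v.projectsExtID = pa.projectsExtID
        WHERE v.ownerUser != '1' AND pa.`user` = '1'
    ) AS p
    ORDER BY p.projectID
""")]

print(left_join_ids, union_ids)
```

Both forms return the same project list on this data; whether the union is actually faster on the real tables is something only the EXPLAIN output can show.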
Have you tried, instead, to refactor everything into left joins? Seeing as how you're always grouping on the same field, it shouldn't be a problem. Try that and post an EXPLAIN to see what the bottlenecks are.
Subselects are less performant than joins, because the engine can optimize joins to a much higher degree. In fact, where possible, the engine will usually rewrite subselects into joins itself.
As a rule of thumb, there is no gain in splitting queries; all you gain is overhead and a confused optimizer. There are, as always, exceptions to this rule, but they come into play only after you've done what you can traditionally and know you need such an approach.
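One possible reading of "refactor everything into left joins", sketched on an in-memory SQLite database as a stand-in: each count is pre-aggregated in a derived table and then left-joined, which avoids multiplying rows when two one-to-many tables are joined at once. The sample rows are hypothetical, the projectsExt access filter is left out to keep the sketch short, and the inner GROUP BY of the original subquery is dropped so missing rows count as 0.

```python
import sqlite3

# In-memory SQLite stand-in; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE medias (projectID INTEGER);
    CREATE TABLE metricData (projectID INTEGER, session TEXT);
    CREATE TABLE interactionData (projectID INTEGER, interact TEXT);
    INSERT INTO medias VALUES (1), (2), (3);
    INSERT INTO metricData VALUES (1, 's1'), (1, 's2'), (2, 's3');
    INSERT INTO interactionData VALUES (1, 'click'), (3, 'tap'), (3, 'swipe');
""")

# Correlated subselects, one lookup per output row.
correlated = conn.execute("""
    SELECT v.projectID,
           (SELECT COUNT(m.session) FROM metricData m
             WHERE m.projectID = v.projectID) AS sessions,
           (SELECT COUNT(pb.interact) FROM interactionData pb
             WHERE pb.projectID = v.projectID) AS interactions
    FROM medias v
    GROUP BY v.projectID
    ORDER BY v.projectID
""").fetchall()

# Left joins to pre-aggregated derived tables, one row per projectID each.
joined = conn.execute("""
    SELECT v.projectID,
           COALESCE(m.sessions, 0) AS sessions,
           COALESCE(pb.interactions, 0) AS interactions
    FROM (SELECT DISTINCT projectID FROM medias) v
    LEFT JOIN (SELECT projectID, COUNT(session) AS sessions
               FROM metricData GROUP BY projectID) m
           ON m.projectID = v.projectID
    LEFT JOIN (SELECT projectID, COUNT(interact) AS interactions
               FROM interactionData GROUP BY projectID) pb
           ON pb.projectID = v.projectID
    ORDER BY v.projectID
""").fetchall()

print(correlated)
print(joined)
```

Both shapes give the same counts here; which one the MySQL optimizer handles better depends on the version and the indexes, so compare the EXPLAIN output before committing to either.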