Lazily evaluate MySQL view - mysql

I have some MySQL views which define a number of extra columns based on some relatively straightforward subqueries. The database is also multi-tenanted so each row has a company ID against it.
The problem I have is my views are evaluated for every row before being filtered by the company ID, giving huge performance issues. Is there any way to lazily evaluate the view so the 'where' clause in the outer query applies to the subqueries in the view. Or is there something similar to views that I can use to add the extra fields. I want to calculate them in SQL so the calculated fields can be used for filtering/searching/sorting/pagination.
I've taken a look at the MySQL docs that explain the algorithms available and am aware that the views can't be proccessed as a 'merge' since they contain subqueries.
view
create view companies_view as
select *,
(
select count(id) from company_user where company_user.company_id = companies.id
) as user_count,
(
select count(company_user.user_id)
from company_user join users on company_user.user_id = users.id
where company_user.company_id = companies.id
and users.active = 1
) as active_user_count,
(
select count(company_user.user_id)
from company_user join users on company_user.user_id = users.id
where company_user.company_id = companies.id
and users.active = 0
as inactive_user_count
from companies;
query
select * from companies_view where company_id = 123;
I want the subqueries in the view to be evaluated AFTER applying the 'where company_id = 123' from the main query scope. I can't hard code the company ID in the view since I want the view to be usable for any company ID.

You cannot change the order of evaluation, that is set by the MySQL server.
However, in this particular case you could rewrite the whole sql statement to use joins and conditional counts instead of subqueries:
select c.*,
count(u.id) as user_count,
count(if(u.active=1, 1, null)) as active_user_count,
count(if(u.active=0, 1, null)) as inactive_user_count
from companies c
left join company_user cu on c.id=cu.company_id
left join users u on cu.user_id = u.id
group by c.company_id, ...
If you have MySQL v5.7, then you may not need to add any further fields to the group by clause since the other fields in the companies table would be functionally dependent on the company_id. In earlier versions you may have to list all fields in the companies table (depends on the sql mode settings).
Another way to optimalise such query would be using denormalisation. Your users and company_user table probably have a lot more records than your companies table. You could add a user_count, an active_user_count, and an inactive_user_count field to the companies table, add after insert / update / delete triggers to the company_user table and an after update to the users table and update these 2 fields there. This way you would not need to do the joins and the conditional counts in the view.

It is possible to convince the optimizer to handle a view with scalar subqueries using the MERGE algorithm... you just have to beat the optimizer at its own game.
This will seem quite unorthodox to some, but it is a pattern I use with success in cases where this is needed.
Create a stored function to encapsulate each subquery, then reference the stored function in the view. The optimizer remains blissfully unaware that the functions will invoke the subqueries.
CREATE FUNCTION user_count (_cid INT) RETURNS INT
DETERMINISTIC
READS SQL DATA
RETURN (SELECT count(id) FROM company_user WHERE company_user.company_id = _cid);
Note that a stored function with a single statement does not need BEGIN/END or a change of DELIMITER.
Then in the view, replace the subquery with:
user_count(id) AS user_count,
And repeat the process for each subquery.
The optimizer will then process the view as a MERGE view, select the one appropriate row from the companies table based on the outer WHERE, invoke the functions, and... problem solved.

Related

Optimizing these nested IN statements by using JOIN in MySQL

I'm writing a query to exclude a certain group of employees from a table. Let's say I have Table 1, transaction_details, that has information that I want to select. Table 2, employee_task_associations, maps each employee to a task that they were assigned to on a particular day. It has a field for the employee_id, and a field called map_id which is the id of the different tasks. This table is an associative table so that Tables 1 and 3 can have a many-to-many relationship. Table 3, employee_tasks, has a list of all tasks that an employee can have.
I have written this query, which is functional, but not optimized:
SELECT someInfo FROM transaction_details TD
WHERE TD.employee_id NOT IN
(SELECT employee_id from employee_task_associations ETA
WHERE map_id IN
(SELECT id FROM employee_tasks ET
WHERE ET.taskName = "The task I want to exclude"))
While this works, it will run multiple queries. I want to speed things up by replacing my nested NOT IN and IN statements with JOINS.
I know that I can replace the bottom four lines with the following:
SELECT employee_id FROM employee_task_assocations ETA
LEFT OUTER JOIN employee_tasks ET
ON ETA.map_id = ET.id
WHERE ET.taskName = "The task I want to exclude"
This will return a list of all ids of the employees that have had this task. I want to exclude these from my SELECT statement from transaction_details by using a JOIN instead of a subquery. I have tried using a LEFT OUTER JOIN WHERE ETA.id IS NULL, but this does not work. How can I use a JOIN to exclude certain employees in this case?
You appear to be close on your initial query, but why not have the NOT IN query joined and get distinct employees... Something like
SELECT
TD.someInfo
FROM
transaction_details TD
WHERE
TD.employee_id NOT IN
(SELECT DISTINCT
employee_id
from
employee_task_associations ETA
JOIN employee_tasks ET
ON ETA.map_id = ET.ID
AND ET.taskName = "The task I want to exclude")
You seem to think that an outer join is more performant than a subquery, but that's not true. It all depends on the SQL planner, SQL optimizer, existing indexes, table stats, and definitively on the repertoire of data operators your database engine offers.
Also, you should consider that after parsing, the query enters the transformation phase where the database engine is free to rewrite your query in a more efficient way. This means that behind the scenes your query may actually be executed using outer joins, even if you write subqueries. After that the [rewritten] query enters the query planner, and later the SQL optimizer.
The only way of optimizing it is to get the execution plan of all query options and compare them.

MySQL subquery returns more than one row selecting count with subquery

Im executing this query
SELECT COUNT(*) AS CONNECTIONS, SERVERS.NAME AS SERVER_NAME, USERS.USERNAME AS USER, CONECT_DATE AS CONNECTION_DATE
FROM CONECTIONS JOIN SERVERS ON(CONECTIONS.SERVERID = SERVERS.ID)
JOIN USERS ON (CONECTIONS.USERID = USERS.ID)
WHERE CONECTIONS.USERID = (SELECT ID FROM USERS
WHERE UPPER(USERNAME) = UPPER((SELECT USERNAME FROM USERS)))
AND CONECT_DATE BETWEEN '2010-06-11' AND '2019-06-11'
GROUP BY SERVERID, CONECT_DATE;
Im trying to get this query from each user in the db TABLE USERS with making a subquery to select everyone 'UPPER((SELECT USERNAME FROM USERS)))' and the thing is that is returning more than one row of the result but if I put directly the USERNAME like 'ADMIN' for example it gives me the results.
You're doing username=select.... A subquery used in an equality test can return only ONE field/row. Since you're going an unfiltered "gimme everything in the users table", you're returning MANY rows of one field each. For that you need an IN match:
SELECT ...
WHERE ... UPPER(username) IN (SELECT USERNAME FROM users)
^^^^---this
SQL doesn't permit what is essentially
WHERE foo=1,2,3,4,5,6
which is why the IN operator works, for comparisons of sets of values.
This is your where clause:
WHERE CONECTIONS.USERID = (SELECT ID
FROM USERS
WHERE UPPER(USERNAME) = UPPER((SELECT USERNAME FROM USERS))
)
This has at least three problems.
The subquery argument to UPPER() might return more than one value, and likely will, unless the USERS table has 0 or 1 rows.
The = to the UPPER() has a similar problem. An in is more appropriate.
The use of = in the outer where clause has a similar problem. An in is more appropriate.
The following fixes these problems:
WHERE CONECTIONS.USERID IN (SELECT ID
FROM USERS
WHERE UPPER(USERNAME) IN (SELECT UPPER(USERNAME) FROM USERS))
)
However, I doubt that will fix your overall query. Your knowledge of SQL seems a bit limited. I would suggest that you ask another question, provide sample data, desired results, and explain what you want to do. That will make it easier for knowledgeable people on this site to point you in the right directions for writing your queries.

Need to find all client id's the most efficient/fastest way using mySql

I have a bridging table that looks like this
clients_user_groups
id = int
client_id = int
group_id = int
I need to find all client_id's of of clients that belong to the same group as client_id 46l
I can achieve it doing a query as below which produces the correct results
SELECT client_id FROM clients_user_groups WHERE group_id = (SELECT group_id FROM clients_user_groups WHERE client_id = 46);
Basically what I need to find out is if there's a way achieving the same results without using 2 queries or a faster way, or is the method above the best solution
You're using a WHERE-clause subquery which, in MySQL, ends up being reevaluated for every single row in your table. Use a JOIN instead:
SELECT a.client_id
FROM clients_user_groups a
JOIN clients_user_groups b ON b.client_id = 46
AND a.group_id = b.group_id
Since you plan on facilitating clients having more than one group in the future, you might want to add DISTINCT to the SELECT so that multiple of the same client_ids aren't returned when you do switch (as a result of the client being in more than one of client_id 46's groups).
If you haven't done so already, create the following composite index on:
(client_id, group_id)
With client_id at the first position in the index since it most likely offers the best initial selectivity. Also, if you've got a substantial amount of rows in your table, ensure that the index is being utilized with EXPLAIN.
you can try with a self join also
SELECT a.client_id
FROM clients_user_groups a
LEFT JOIN clients_user_groups b on b.client_id=46
Where b.group_id=a.group_id
set #groupID = (SELECT group_id FROM clients_user_groups WHERE client_id = 46);
SELECT client_id FROM clients_user_groups WHERE group_id = #groupID;
You will have a query which gets the group ID and you store it into a variable. After this you select the client_id values where the group_id matches the value stored in your variable. You can speed up this query even more if you define an index for clients_user_groups.group_id.
Note1: I didn't test my code, hopefully there are no typos, but you've got the idea I think.
Note2: This should be done in a single request, because DB requests are very expensive if we look at the needed time.
Based on your comment that each client can only belong to one group, I would suggest a schema change to place the group_id relation into the client table as a field. Typically, one would use the sort of JOIN table you have described to express many-to-many relationships within a relational database (i.e. clients could belong to many groups and groups could have many clients).
In such a scenario, the query would be made without the need for a sub-select like this:
SELECT c.client_id
FROM clients as c
INNER JOIN clients as c2 ON c.group_id = c2.group_id
WHERE c2.client_id = ?

MySQL JOIN tables with WHERE clause

I need to gather posts from two mysql tables that have different columns and provide a WHERE clause to each set of tables. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but i'll admit, I am no mysql veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that you can't use subqueries like this in a view definition.
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you since a relative newbie in SQL. Use "aliases" for your tables to help reduce SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It can help your life go easier, and anyone AFTER you to better know which columns come from what table, especially if same column name in different tables. Prevents ambiguity in the query. Your left join, I think, from the sample, may be ambigous, but confirm the join of B.ID to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry could have the "Team_ID" that such posting was from, in addition to its OWN "ID" for the blabbing table's unique key indicator.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR b.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
Also, not logical "||" or, but SQL "OR"
EDIT -- per your comment
This could be done in a variety of ways, such as dynamic SQL building and executing, calling multiple times, once for each ID and merging the results, or additionally, by doing a join to yet another temp table that gets cleaned out say... daily.
If you have another table such as "TeamJoins", and it has say... 3 columns: a date, a sessionid and team_id, you could daily purge anything from a day old of queries, and/or keep clearing each time a new query by the same session ID (as it appears coming from PHP). Have two indexes, one on the date (to simplify any daily purging), and second on (sessionID, team_id) for the join.
Then, loop through to do inserts into the "TempJoins" table with the simple elements identified.
THEN, instead of a hard-coded list IN, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TB.Team_ID IN NOT NULL
OR B.mem_id ... rest of query
What I ended up doing is;
I added an extra column to my blabbing table called team_id and set it to null as well as another field in my team_blabbing table called mem_id
Then I changed the insert script to also insert a value to the mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thought on what I did. Try not to be too harsh though:) Thanks again for all the info.

In what order are MySQL JOINs evaluated?

I have the following query:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123;
I have the following questions:
Is the USING syntax synonymous with ON syntax?
Are these joins evaluated left to right? In other words, does this query say: x = companies JOIN users; y = x JOIN jobs; z = y JOIN useraccounts;
If the answer to question 2 is yes, is it safe to assume that the companies table has companyid, userid and jobid columns?
I don't understand how the WHERE clause can be used to pick rows on the companies table when it is referring to the alias "j"
Any help would be appreciated!
USING (fieldname) is a shorthand way of saying ON table1.fieldname = table2.fieldname.
SQL doesn't define the 'order' in which JOINS are done because it is not the nature of the language. Obviously an order has to be specified in the statement, but an INNER JOIN can be considered commutative: you can list them in any order and you will get the same results.
That said, when constructing a SELECT ... JOIN, particularly one that includes LEFT JOINs, I've found it makes sense to regard the third JOIN as joining the new table to the results of the first JOIN, the fourth JOIN as joining the results of the second JOIN, and so on.
More rarely, the specified order can influence the behaviour of the query optimizer, due to the way it influences the heuristics.
No. The way the query is assembled, it requires that companies and users both have a companyid, jobs has a userid and a jobid and useraccounts has a userid. However, only one of companies or user needs a userid for the JOIN to work.
The WHERE clause is filtering the whole result -- i.e. all JOINed columns -- using a column provided by the jobs table.
I can't answer the bit about the USING syntax. That's weird. I've never seen it before, having always used an ON clause instead.
But what I can tell you is that the order of JOIN operations is determined dynamically by the query optimizer when it constructs its query plan, based on a system of optimization heuristics, some of which are:
Is the JOIN performed on a primary key field? If so, this gets high priority in the query plan.
Is the JOIN performed on a foreign key field? This also gets high priority.
Does an index exist on the joined field? If so, bump the priority.
Is a JOIN operation performed on a field in WHERE clause? Can the WHERE clause expression be evaluated by examining the index (rather than by performing a table scan)? This is a major optimization opportunity, so it gets a major priority bump.
What is the cardinality of the joined column? Columns with high cardinality give the optimizer more opportunities to discriminate against false matches (those that don't satisfy the WHERE clause or the ON clause), so high-cardinality joins are usually processed before low-cardinality joins.
How many actual rows are in the joined table? Joining against a table with only 100 values is going to create less of a data explosion than joining against a table with ten million rows.
Anyhow... the point is... there are a LOT of variables that go into the query execution plan. If you want to see how MySQL optimizes its queries, use the EXPLAIN syntax.
And here's a good article to read:
http://www.informit.com/articles/article.aspx?p=377652
ON EDIT:
To answer your 4th question: You aren't querying the "companies" table. You're querying the joined cross-product of ALL four tables in your FROM and USING clauses.
The "j.jobid" alias is just the fully-qualified name of one of the columns in that joined collection of tables.
In MySQL, it's often interesting to ask the query optimizer what it plans to do, with:
EXPLAIN SELECT [...]
See "7.2.1 Optimizing Queries with EXPLAIN"
Here is a more detailed answer on JOIN precedence. In your case, the JOINs are all commutative. Let's try one where they aren't.
Build schema:
CREATE TABLE users (
name text
);
CREATE TABLE orders (
order_id text,
user_name text
);
CREATE TABLE shipments (
order_id text,
fulfiller text
);
Add data:
INSERT INTO users VALUES ('Bob'), ('Mary');
INSERT INTO orders VALUES ('order1', 'Bob');
INSERT INTO shipments VALUES ('order1', 'Fulfilling Mary');
Run query:
SELECT *
FROM users
LEFT OUTER JOIN orders
ON orders.user_name = users.name
JOIN shipments
ON shipments.order_id = orders.order_id
Result:
Only the Bob row is returned
Analysis:
In this query the LEFT OUTER JOIN was evaluated first and the JOIN was evaluated on the composite result of the LEFT OUTER JOIN.
Second query:
SELECT *
FROM users
LEFT OUTER JOIN (
orders
JOIN shipments
ON shipments.order_id = orders.order_id)
ON orders.user_name = users.name
Result:
One row for Bob (with the fulfillment data) and one row for Mary with NULLs for fulfillment data.
Analysis:
The parenthesis changed the evaluation order.
Further MySQL documentation is at https://dev.mysql.com/doc/refman/5.5/en/nested-join-optimization.html
SEE http://dev.mysql.com/doc/refman/5.0/en/join.html
AND start reading here:
Join Processing Changes in MySQL 5.0.12
Beginning with MySQL 5.0.12, natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. The goal was to align the syntax and semantics of MySQL with respect to NATURAL JOIN and JOIN ... USING according to SQL:2003. However, these changes in join processing can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions must be rewritten to comply with the standard.
These changes have five main aspects:
The way that MySQL determines the result columns of NATURAL or USING join operations (and thus the result of the entire FROM clause).
Expansion of SELECT * and SELECT tbl_name.* into a list of selected columns.
Resolution of column names in NATURAL or USING joins.
Transformation of NATURAL or USING joins into JOIN ... ON.
Resolution of column names in the ON condition of a JOIN ... ON.
Im not sure about the ON vs USING part (though this website says they are the same)
As for the ordering question, its entirely implementation (and probably query) specific. MYSQL most likely picks an order when compiling the request. If you do want to enforce a particular order you would have to 'nest' your queries:
SELECT c.*
FROM companies AS c
JOIN (SELECT * FROM users AS u
JOIN (SELECT * FROM jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123)
)
as for part 4: the where clause limits what rows from the jobs table are eligible to be JOINed on. So if there are rows which would join due to the matching userids but don't have the correct jobid then they will be omitted.
1) Using is not exactly the same as on, but it is short hand where both tables have a column with the same name you are joining on... see: http://www.java2s.com/Tutorial/MySQL/0100__Table-Join/ThekeywordUSINGcanbeusedasareplacementfortheONkeywordduringthetableJoins.htm
It is more difficult to read in my opinion, so I'd go spelling out the joins.
3) It is not clear from this query, but I would guess it does not.
2) Assuming you are joining through the other tables (not all directly on companyies) the order in this query does matter... see comparisons below:
Origional:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123
What I think it is likely suggesting:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = u.userid
JOIN useraccounts AS us on us.userid = u.userid
WHERE j.jobid = 123
You could switch you lines joining jobs & usersaccounts here.
What it would look like if everything joined on company:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = c.userid
JOIN useraccounts AS us on us.userid = c.userid
WHERE j.jobid = 123
This doesn't really make logical sense... unless each user has their own company.
4.) The magic of sql is that you can only show certain columns but all of them are their for sorting and filtering...
if you returned
SELECT c.*, j.jobid....
you could clearly see what it was filtering on, but the database server doesn't care if you output a row or not for filtering.