Getting LIMIT value from table in SQL - mysql

I am trying to run an INSERT statement in SQL a fixed number of times. So far I have tried this and it works as I wanted, but is there any way to combine these two parts?
INSERT INTO assigns (AgencyName, ScoutID, RequestID)
SELECT AgencyName, ScoutID, RequestID
FROM employs NATURAL JOIN Scout NATURAL JOIN agency_response NATURAL JOIN Request
WHERE Answer = #option AND AgencyName = #agency_name
LIMIT 1;
This inserts into the assigns table once. But the desired LIMIT value is already in the data I obtain from the NATURAL JOINs; it is stored in the NumberOfScouts column. The query below returns 8, for example, and I want to limit the insert to 8 rows.
SELECT NumberOfScouts
FROM employs NATURAL JOIN Scout NATURAL JOIN agency_response NATURAL JOIN Request
WHERE Answer = #option AND AgencyName = #agency_name;
Is there any way to take the integer used in LIMIT from the table I queried? I tried moving LIMIT into the upper parts of the query, but that gave a syntax error.

You can use window functions:
WITH asr as (
SELECT ?.AgencyName, ?.ScoutID, ?.RequestID, ?.NumberOfScouts,
ROW_NUMBER() OVER (ORDER BY ?.AgencyName) as seqnum
FROM employs e JOIN
Scout s
ON ?? JOIN -- JOIN conditions here
agency_response ar
ON ?? JOIN -- JOIN conditions here
Request r
ON ?? -- JOIN conditions here
WHERE ?.Answer = #option AND ?.AgencyName = #agency_name
)
SELECT asr.AgencyName, asr.ScoutID, asr.RequestID
FROM asr
WHERE seqnum <= NumberOfScouts;
Notes:
Use table aliases which are abbreviations of table names.
Qualify all column references so you, and everyone else, know which table each column comes from.
Use JOIN with ON/USING so you know what columns are used for the JOINs.
I describe NATURAL JOIN as an abomination because it does not use properly declared foreign key relationships for the JOIN condition. Plus, most of my tables have createdAt and createdBy columns which would confuse the so-called "natural" join.
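As a rough illustration of the row-number technique above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (window functions need MySQL 8.0+ or SQLite 3.25+). The single `candidates` table is hypothetical and stands in for the result of the four joined tables; the sample data is invented. It also shows that the filtered rows can feed the INSERT directly.

```python
import sqlite3

# Hypothetical, simplified schema: one pre-joined table stands in for the
# four joined tables of the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidates (AgencyName TEXT, ScoutID INTEGER,
                             RequestID INTEGER, NumberOfScouts INTEGER);
    CREATE TABLE assigns (AgencyName TEXT, ScoutID INTEGER, RequestID INTEGER);
""")
# Five candidate scouts, but the request only asks for three.
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?, ?)",
    [("Acme", scout_id, 7, 3) for scout_id in range(1, 6)],
)

# Number the rows, then keep only those whose sequence number does not
# exceed the per-row limit stored in NumberOfScouts.
conn.execute("""
    INSERT INTO assigns (AgencyName, ScoutID, RequestID)
    SELECT AgencyName, ScoutID, RequestID
    FROM (SELECT c.*, ROW_NUMBER() OVER (ORDER BY c.ScoutID) AS seqnum
          FROM candidates AS c
          WHERE c.AgencyName = ?) AS numbered
    WHERE seqnum <= NumberOfScouts
""", ("Acme",))

assigned = [row[0] for row in conn.execute("SELECT ScoutID FROM assigns ORDER BY ScoutID")]
print(assigned)
```

The comparison `seqnum <= NumberOfScouts` plays the role of LIMIT, which cannot take its value from a column.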

Incorrect ordering on query with group by clause

So I have the following query:
SELECT sensor.id as `sensor_id`,
sensor_reading.id as `reading_id`,
sensor_reading.reading as `reading`,
from_unixtime(sensor_reading.reading_timestamp) as `reading_timestamp`,
sensor_reading.lower_threshold as `lower_threshold`,
sensor_reading.upper_threshold as `upper_threshold`,
sensor_type.units as `unit`
FROM sensor
LEFT JOIN sensor_reading ON sensor_reading.sensor_id = sensor.id
LEFT JOIN sensor_type ON sensor.sensor_type_id = sensor_type.id
WHERE sensor.company_id = 1
GROUP BY sensor_reading.sensor_id
ORDER BY sensor_reading.reading_timestamp DESC
There are three tables in play here. A sensor_type table, which is just used for a single display field (units), a sensor table, which contains information on a sensor, and a sensor_reading table, which contains the individual readings for a sensor. There are multiple readings which apply to a single sensor, and so each entry in the sensor_reading table has a sensor_id which is linked to the ID field in the sensor table with a foreign key constraint.
In theory, this query should return the most recent sensor_reading for EACH unique sensor. Instead, it's returning the first reading for each sensor. I've seen a few posts on here with similar issues, but haven't been able to resolve this using any of their answers. Ideally, the query needs to be as efficient as possible, as this table has several thousand readings (and continues to grow).
Does anyone know how I might change this query to return the most recent reading? If I remove the GROUP BY clause, it returns the right order, but I then have to sift through the data to get the most recent for each sensor.
Ideally, I don't want to run sub-queries as this slows things down a lot, and speed is a big factor here.
Thanks!
In theory, this query should return the most recent sensor_reading for EACH unique sensor.
This is a fairly common misconception about the MySQL GROUP BY extension, which allows you to select non-aggregated columns that are not contained in the GROUP BY clause. What the documentation states is:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause
So since you are grouping by sensor_reading.sensor_id, MySQL will choose any row from sensor_reading for each sensor_id, and only after choosing one row per sensor_id will it apply the ordering to the rows that were chosen.
Since you only want the latest row for each sensor, the general approach would be:
SELECT *
FROM sensor_reading AS sr
WHERE NOT EXISTS
( SELECT 1
FROM sensor_reading AS sr2
WHERE sr2.sensor_id = sr.sensor_id
AND sr2.reading_timestamp > sr.reading_timestamp
);
However, MySQL (at least in older versions) tends to optimise LEFT JOIN/IS NULL better than NOT EXISTS, so a MySQL-specific solution would be:
SELECT sr.*
FROM sensor_reading AS sr
LEFT JOIN sensor_reading AS sr2
ON sr2.sensor_id = sr.sensor_id
AND sr2.reading_timestamp > sr.reading_timestamp
WHERE sr2.id IS NULL;
So incorporating this into your query, you would end up with:
SELECT sensor.id as `sensor_id`,
sensor_reading.id as `reading_id`,
sensor_reading.reading as `reading`,
from_unixtime(sensor_reading.reading_timestamp) as `reading_timestamp`,
sensor_reading.lower_threshold as `lower_threshold`,
sensor_reading.upper_threshold as `upper_threshold`,
sensor_type.units as `unit`
FROM sensor
LEFT JOIN sensor_reading
ON sensor_reading.sensor_id = sensor.id
LEFT JOIN sensor_type
ON sensor.sensor_type_id = sensor_type.id
LEFT JOIN sensor_reading AS sr2
ON sr2.sensor_id = sensor_reading.sensor_id
AND sr2.reading_timestamp > sensor_reading.reading_timestamp
WHERE sensor.company_id = 1
AND sr2.id IS NULL
ORDER BY sensor_reading.reading_timestamp DESC;
An alternative method for getting the maximum per group is to inner join back to the latest row, so something like:
SELECT sr.*
FROM sensor_reading AS sr
INNER JOIN
( SELECT sensor_id, MAX(reading_timestamp) AS reading_timestamp
FROM sensor_reading
GROUP BY sensor_id
) AS sr2
ON sr2.sensor_id = sr.sensor_id
AND sr2.reading_timestamp = sr.reading_timestamp;
You may find that this is more efficient than the other method, or you may not, YMMV. It basically depends on your data and indexes, and as you have said, subqueries can be an issue in MySQL because the full result is materialised first.
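To see that both approaches pick the same rows, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table is cut down to the columns that matter and the sample readings are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sensor_reading (id INTEGER PRIMARY KEY, sensor_id INTEGER,
                                 reading REAL, reading_timestamp INTEGER);
    INSERT INTO sensor_reading (sensor_id, reading, reading_timestamp) VALUES
        (1, 20.5, 100), (1, 21.0, 200), (1, 19.8, 300),
        (2, 55.0, 150), (2, 57.5, 250);
""")

# Anti-join: keep a reading only if no later reading exists for the same sensor.
anti_join = """
    SELECT sr.sensor_id, sr.reading, sr.reading_timestamp
    FROM sensor_reading AS sr
    LEFT JOIN sensor_reading AS sr2
           ON sr2.sensor_id = sr.sensor_id
          AND sr2.reading_timestamp > sr.reading_timestamp
    WHERE sr2.id IS NULL
    ORDER BY sr.sensor_id
"""

# Join back to the per-sensor maximum timestamp.
max_join = """
    SELECT sr.sensor_id, sr.reading, sr.reading_timestamp
    FROM sensor_reading AS sr
    INNER JOIN (SELECT sensor_id, MAX(reading_timestamp) AS reading_timestamp
                FROM sensor_reading
                GROUP BY sensor_id) AS latest
            ON latest.sensor_id = sr.sensor_id
           AND latest.reading_timestamp = sr.reading_timestamp
    ORDER BY sr.sensor_id
"""

latest_by_anti_join = conn.execute(anti_join).fetchall()
latest_by_max_join = conn.execute(max_join).fetchall()
print(latest_by_anti_join)
```

Note that both variants return more than one row for a sensor if two readings share the same latest timestamp. An index on (sensor_id, reading_timestamp) is the usual first thing to try for either form.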

Use * in subquery

I have two tables, finaldata and finalqualification. finlpcqlfctin_qlifictins_id is a foreign key in finaldata, i.e. it is the primary key of the finalqualification table.
I want to get all the data from these two tables through a subquery in MySQL, but it is showing me an error. My query is:
SELECT result.name FROM (SELECT * FROM finalpocpassportdata f LEFT JOIN finalpocqualification q ON f.id=q.id)result
Quick hint here: if your subquery returns more than one column with the same name, it will fall over. We strongly GUID all our data with consistent field names, e.g. customerGUID, so I come across this quite regularly, especially when using subqueries with joins.
Your subquery returns more than one value:
SELECT result.name FROM (SELECT * FROM finaldata f LEFT JOIN finalqualification ON f.id=f.finlpcqlfctin_qlifictins_id)
You need to specify the name of the column. You also have a problem with your alias:
SELECT result.name FROM (SELECT only_one_column FROM finaldata f LEFT JOIN finalqualification ON f.id=finalqualification.finlpcqlfctin_qlifictins_id) result
This is your query:
SELECT result.name
FROM (SELECT *
FROM finalpocpassportdata f LEFT JOIN
finalpocqualification q
ON f.id = q.id
) result;
Obviously, this is a test query, because the subquery is unnecessary and adversely affects performance.
Equally obvious, the SELECT * has two columns with the name id. It might have other columns with the same name as well. But, if id is the only duplicate, then one simple solution is to use USING:
SELECT result.name
FROM (SELECT *
FROM finalpocpassportdata f LEFT JOIN
finalpocqualification q
USING (id)
) result;
(Although NATURAL JOIN is also an option, I strongly advise not to use it here or anywhere. It is just bugs waiting to happen.)
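The difference between ON and USING in the column list can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in, with two hypothetical cut-down tables. One caveat: SQLite tolerates the duplicate id inside a derived table, whereas MySQL rejects it with a duplicate column name error, so the sketch only inspects the column lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down versions of the two tables
    CREATE TABLE passport (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE qualification (id INTEGER PRIMARY KEY, degree TEXT);
    INSERT INTO passport VALUES (1, 'Asha');
    INSERT INTO qualification VALUES (1, 'BSc');
""")

def column_names(sql):
    """Return the result column names of a query."""
    return [col[0] for col in conn.execute(sql).description]

# With ON, SELECT * carries the id column of both tables.
on_columns = column_names(
    "SELECT * FROM passport f LEFT JOIN qualification q ON f.id = q.id")

# With USING, the shared id column appears only once.
using_columns = column_names(
    "SELECT * FROM passport f LEFT JOIN qualification q USING (id)")

print(on_columns)
print(using_columns)
```

The ON version yields one more column than the USING version, and that extra id is what makes the derived table ambiguous.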
If there are other columns with the same name (which can happen for very reasonable purposes), then you can list the columns individually. That is sometimes a pain, so it can be easier to use * for one table and list the columns from the other:
SELECT . . .
FROM (SELECT f.*, q.name
. . .
In practice, I cannot see a need for this sort of subquery. There is little need to do a join as a subquery, because you can have many joins in a single FROM clause.
I don't think you need a subquery to get name:
SELECT name FROM finalpocpassportdata f LEFT JOIN
finalpocqualification q ON f.id=q.id
This query will also give the same result.

Query efficiency (multiple selects)

I have two tables - one called customer_records and another called customer_actions.
customer_records has the following schema:
CustomerID (auto increment, primary key)
CustomerName
...etc...
customer_actions has the following schema:
ActionID (auto increment, primary key)
CustomerID (relates to customer_records)
ActionType
ActionTime (UNIX time stamp that the entry was made)
Note (TEXT type)
Every time a user carries out an action on a customer record, an entry is made in customer_actions, and the user is given the opportunity to enter a note. ActionType can be one of a few values (like 'designatory update' or 'added case info' - can only be one of a list of options).
What I want to be able to do is display a list of records from customer_records where the last ActionType was a certain value.
So far, I've searched the net/SO and come up with this monster:
SELECT * FROM (
SELECT * FROM (
SELECT * FROM `customer_actions` ORDER BY `EntryID` DESC
) list1 GROUP BY `CustomerID`
) list2 WHERE `ActionType`='whatever' LIMIT 0,30
Which is great - it lists each customer ID and their last action. But the query is extremely slow on occasions (note: there are nearly 20,000 records in customer_records). Can anyone offer any tips on how I can sort this monster of a query out or adjust my table to give faster results? I'm using MySQL. Any help is really appreciated, thanks.
Edit: To be clear, I need to see a list of customers whose last action was 'whatever'.
To filter customers by their last action, you could use a correlated sub-query...
SELECT
*
FROM
customer_records
INNER JOIN
customer_actions
ON customer_actions.CustomerID = customer_records.CustomerID
AND customer_actions.ActionTime = (
SELECT
MAX(lookup.ActionTime)
FROM
customer_actions AS lookup
WHERE
lookup.CustomerID = customer_records.CustomerID
)
WHERE
customer_actions.ActionType = 'Whatever'
You may find it more efficient to avoid the correlated sub-query as follows...
SELECT
*
FROM
customer_records
INNER JOIN
(SELECT CustomerID, MAX(ActionTime) AS ActionTime FROM customer_actions GROUP BY CustomerID) AS last_action
ON customer_records.CustomerID = last_action.CustomerID
INNER JOIN
customer_actions
ON customer_actions.CustomerID = last_action.CustomerID
AND customer_actions.ActionTime = last_action.ActionTime
WHERE
customer_actions.ActionType = 'Whatever'
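A quick way to check the derived-table version is to run it on a few rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with invented sample data and the ActionTime column name from the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_records (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE customer_actions (ActionID INTEGER PRIMARY KEY, CustomerID INTEGER,
                                   ActionType TEXT, ActionTime INTEGER);
    INSERT INTO customer_records VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO customer_actions (CustomerID, ActionType, ActionTime) VALUES
        (1, 'added case info', 100), (1, 'designatory update', 200),
        (2, 'designatory update', 100), (2, 'added case info', 300),
        (3, 'added case info', 50);
""")

# Find each customer's latest ActionTime, join back to that action row,
# and only then filter on the action type.
query = """
    SELECT cr.CustomerName
    FROM customer_records AS cr
    INNER JOIN (SELECT CustomerID, MAX(ActionTime) AS ActionTime
                FROM customer_actions
                GROUP BY CustomerID) AS last_action
            ON last_action.CustomerID = cr.CustomerID
    INNER JOIN customer_actions AS ca
            ON ca.CustomerID = last_action.CustomerID
           AND ca.ActionTime = last_action.ActionTime
    WHERE ca.ActionType = ?
    ORDER BY cr.CustomerName
"""
names = [row[0] for row in conn.execute(query, ("added case info",))]
print(names)
```

Ann is excluded even though she has an 'added case info' action, because it is not her most recent one. That is the difference from a plain join with a WHERE clause.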
I'm not sure if I understand the requirements but it looks to me like a JOIN would be enough for that.
SELECT cr.CustomerID, cr.CustomerName, ...
FROM customer_records cr
INNER JOIN customer_actions ca ON ca.CustomerID = cr.CustomerID
WHERE `ActionType` = 'whatever'
ORDER BY
ca.EntryID
Note that 20,000 records should not pose a performance problem.
Please note that I've adapted Lieven's answer (I made a separate post as this was too long for a comment). Any credit for the solution itself goes to him, I'm just trying to show you some key points for improving performance.
If speed is a concern then the following should give you some suggestions for improving it:
select
cr.CustomerID ,
cr.CustomerName,
cr.MoreDetail1,
cr.Etc
from customer_records cr
inner join customer_actions ca
on ca.CustomerID = cr.CustomerID
where ca.ActionType = 'x'
order by cr.CustomerID
limit 100 -- Change as required; MySQL uses LIMIT rather than TOP
A few notes:
In some cases I find left outer joins to be faster than inner joins. It would be worth measuring performance for both for this query.
Avoid returning * wherever possible.
You don't have to reference 'cr.x' in the initial select, but it's a good habit to get into for when you start working on large queries that can have multiple joins in them (this will make a lot of sense once you start doing this).
When using joins, always join on a primary key.
Maybe I'm missing something but what's wrong with a simple join and a where clause?
Select ActionType, ActionTime, Note
FROM Customer_Records CR
INNER JOIN customer_Actions CA
ON CR.CustomerID = CA.CustomerID
Where ActionType = 'added case info'

MySQL JOIN tables with WHERE clause

I need to gather posts from two mysql tables that have different columns and provide a WHERE clause to each set of tables. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but I'll admit, I am no MySQL veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that in MySQL versions before 5.7.7 you can't use subqueries like this in a view definition.
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you, since you are relatively new to SQL. Use "aliases" for your tables to help reduce SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It makes your life easier, and helps anyone AFTER you know which columns come from which table, especially when the same column name exists in different tables. It prevents ambiguity in the query. Your left join, judging from the sample, may be ambiguous: can you confirm the join of B.ID to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry could have the "Team_ID" that the posting was from, in addition to its OWN "ID" as the blabbing table's unique key.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR B.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
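Also, use the SQL keyword OR rather than "||". MySQL accepts || as OR by default, but it is nonstandard and means string concatenation when the PIPES_AS_CONCAT SQL mode is enabled.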
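One common way to build out the IN list is to generate one placeholder per ID and bind the values, rather than pasting them into the SQL string. The sketch below shows the idea in Python with the built-in sqlite3 module as a stand-in. The `team_ids` list is hypothetical and plays the role of $team_array, and the table is cut down to a few invented rows. The same pattern applies to prepared statements in PHP, with that driver's placeholder syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team_blabbing (id INTEGER PRIMARY KEY, team_id INTEGER, the_blab TEXT);
    INSERT INTO team_blabbing (team_id, the_blab) VALUES
        (1, 'a'), (4, 'b'), (9, 'c'), (18, 'd');
""")

team_ids = [1, 4, 18]  # hypothetical list, standing in for $team_array

# One "?" per ID, so the statement becomes: ... IN (?, ?, ?)
placeholders = ", ".join("?" for _ in team_ids)
sql = (
    "SELECT team_id FROM team_blabbing "
    f"WHERE team_id IN ({placeholders}) ORDER BY team_id"
)

# The values are bound separately and never interpolated into the SQL text.
matched = [row[0] for row in conn.execute(sql, team_ids)]
print(matched)
```

An empty list needs separate handling, since `IN ()` is a syntax error in MySQL.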
EDIT -- per your comment
This could be done in a variety of ways, such as dynamic SQL building and executing, calling multiple times, once for each ID and merging the results, or additionally, by doing a join to yet another temp table that gets cleaned out say... daily.
If you have another table such as "TeamJoins", and it has say... 3 columns: a date, a sessionid and team_id, you could daily purge anything from a day old of queries, and/or keep clearing each time a new query by the same session ID (as it appears coming from PHP). Have two indexes, one on the date (to simplify any daily purging), and second on (sessionID, team_id) for the join.
Then, loop through to do inserts into the "TeamJoins" table with the simple elements identified.
THEN, instead of a hard-coded list IN, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TJ.Team_ID IS NOT NULL
OR B.mem_id ... rest of query
What I ended up doing is this:
I added an extra column called team_id to my blabbing table and set it to NULL, and added another field called mem_id to my team_blabbing table.
Then I changed the insert script to also insert a value to the mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thoughts on what I did. Try not to be too harsh though :) Thanks again for all the info.

In what order are MySQL JOINs evaluated?

I have the following query:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123;
I have the following questions:
Is the USING syntax synonymous with ON syntax?
Are these joins evaluated left to right? In other words, does this query say: x = companies JOIN users; y = x JOIN jobs; z = y JOIN useraccounts;
If the answer to question 2 is yes, is it safe to assume that the companies table has companyid, userid and jobid columns?
I don't understand how the WHERE clause can be used to pick rows on the companies table when it is referring to the alias "j"
Any help would be appreciated!
USING (fieldname) is a shorthand way of saying ON table1.fieldname = table2.fieldname.
SQL doesn't define the 'order' in which JOINS are done because it is not the nature of the language. Obviously an order has to be specified in the statement, but an INNER JOIN can be considered commutative: you can list them in any order and you will get the same results.
That said, when constructing a SELECT ... JOIN, particularly one that includes LEFT JOINs, I've found it makes sense to regard the second JOIN as joining the new table to the results of the first JOIN, the third JOIN as joining to the results of the second JOIN, and so on.
More rarely, the specified order can influence the behaviour of the query optimizer, due to the way it influences the heuristics.
No. The way the query is assembled, it requires that companies and users both have a companyid, jobs has a userid and a jobid, and useraccounts has a userid. However, only one of companies or users needs a userid for the JOIN to work.
The WHERE clause is filtering the whole result -- i.e. all JOINed columns -- using a column provided by the jobs table.
I can't answer the bit about the USING syntax. That's weird. I've never seen it before, having always used an ON clause instead.
But what I can tell you is that the order of JOIN operations is determined dynamically by the query optimizer when it constructs its query plan, based on a system of optimization heuristics, some of which are:
Is the JOIN performed on a primary key field? If so, this gets high priority in the query plan.
Is the JOIN performed on a foreign key field? This also gets high priority.
Does an index exist on the joined field? If so, bump the priority.
Is a JOIN operation performed on a field in WHERE clause? Can the WHERE clause expression be evaluated by examining the index (rather than by performing a table scan)? This is a major optimization opportunity, so it gets a major priority bump.
What is the cardinality of the joined column? Columns with high cardinality give the optimizer more opportunities to discriminate against false matches (those that don't satisfy the WHERE clause or the ON clause), so high-cardinality joins are usually processed before low-cardinality joins.
How many actual rows are in the joined table? Joining against a table with only 100 values is going to create less of a data explosion than joining against a table with ten million rows.
Anyhow... the point is... there are a LOT of variables that go into the query execution plan. If you want to see how MySQL optimizes its queries, use the EXPLAIN syntax.
And here's a good article to read:
http://www.informit.com/articles/article.aspx?p=377652
ON EDIT:
To answer your 4th question: You aren't querying the "companies" table. You're querying the joined cross-product of ALL four tables in your FROM and USING clauses.
The "j.jobid" alias is just the fully-qualified name of one of the columns in that joined collection of tables.
In MySQL, it's often interesting to ask the query optimizer what it plans to do, with:
EXPLAIN SELECT [...]
See "7.2.1 Optimizing Queries with EXPLAIN"
Here is a more detailed answer on JOIN precedence. In your case, the JOINs are all commutative. Let's try one where they aren't.
Build schema:
CREATE TABLE users (
name text
);
CREATE TABLE orders (
order_id text,
user_name text
);
CREATE TABLE shipments (
order_id text,
fulfiller text
);
Add data:
INSERT INTO users VALUES ('Bob'), ('Mary');
INSERT INTO orders VALUES ('order1', 'Bob');
INSERT INTO shipments VALUES ('order1', 'Fulfilling Mary');
Run query:
SELECT *
FROM users
LEFT OUTER JOIN orders
ON orders.user_name = users.name
JOIN shipments
ON shipments.order_id = orders.order_id
Result:
Only the Bob row is returned.
Analysis:
In this query the LEFT OUTER JOIN was evaluated first and the JOIN was evaluated on the composite result of the LEFT OUTER JOIN.
Second query:
SELECT *
FROM users
LEFT OUTER JOIN (
orders
JOIN shipments
ON shipments.order_id = orders.order_id)
ON orders.user_name = users.name
Result:
One row for Bob (with the fulfillment data) and one row for Mary with NULLs for fulfillment data.
Analysis:
The parentheses changed the evaluation order.
Further MySQL documentation is at https://dev.mysql.com/doc/refman/5.5/en/nested-join-optimization.html
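The two results can be reproduced with a short script. This is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL. As a simplification, the parenthesized join is written as a derived table with the hypothetical alias `os`, which expresses the same grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT);
    CREATE TABLE orders (order_id TEXT, user_name TEXT);
    CREATE TABLE shipments (order_id TEXT, fulfiller TEXT);
    INSERT INTO users VALUES ('Bob'), ('Mary');
    INSERT INTO orders VALUES ('order1', 'Bob');
    INSERT INTO shipments VALUES ('order1', 'Fulfilling Mary');
""")

# Left to right: the inner JOIN is applied to the result of the LEFT JOIN,
# so Mary's NULL-extended row finds no shipment and is discarded.
flat = conn.execute("""
    SELECT users.name, shipments.fulfiller
    FROM users
    LEFT OUTER JOIN orders ON orders.user_name = users.name
    JOIN shipments ON shipments.order_id = orders.order_id
    ORDER BY users.name
""").fetchall()

# Grouped: orders and shipments are joined first, and that result is
# LEFT JOINed to users, so Mary is kept with a NULL fulfiller.
grouped = conn.execute("""
    SELECT users.name, os.fulfiller
    FROM users
    LEFT OUTER JOIN (SELECT orders.user_name, shipments.fulfiller
                     FROM orders
                     JOIN shipments ON shipments.order_id = orders.order_id) AS os
                 ON os.user_name = users.name
    ORDER BY users.name
""").fetchall()

print(flat)
print(grouped)
```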
SEE http://dev.mysql.com/doc/refman/5.0/en/join.html
AND start reading here:
Join Processing Changes in MySQL 5.0.12
Beginning with MySQL 5.0.12, natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. The goal was to align the syntax and semantics of MySQL with respect to NATURAL JOIN and JOIN ... USING according to SQL:2003. However, these changes in join processing can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions must be rewritten to comply with the standard.
These changes have five main aspects:
The way that MySQL determines the result columns of NATURAL or USING join operations (and thus the result of the entire FROM clause).
Expansion of SELECT * and SELECT tbl_name.* into a list of selected columns.
Resolution of column names in NATURAL or USING joins.
Transformation of NATURAL or USING joins into JOIN ... ON.
Resolution of column names in the ON condition of a JOIN ... ON.
I'm not sure about the ON vs USING part (though this website says they are the same).
As for the ordering question, it's entirely implementation (and probably query) specific. MySQL most likely picks an order when compiling the request. If you do want to enforce a particular order, you would have to 'nest' your queries:
SELECT c.*
FROM companies AS c
JOIN (SELECT u.companyid
FROM users AS u
JOIN (SELECT j.userid
FROM jobs AS j
JOIN useraccounts AS us USING (userid)
WHERE j.jobid = 123) AS ju USING (userid)
) AS cu USING (companyid);
As for part 4: the WHERE clause limits which rows from the jobs table are eligible to be JOINed on. So if there are rows which would join because of matching userids but don't have the correct jobid, they will be omitted.
1) USING is not exactly the same as ON, but it is shorthand for the case where both tables have a column with the same name that you are joining on... see: http://www.java2s.com/Tutorial/MySQL/0100__Table-Join/ThekeywordUSINGcanbeusedasareplacementfortheONkeywordduringthetableJoins.htm
It is more difficult to read in my opinion, so I'd go spelling out the joins.
3) It is not clear from this query, but I would guess it does not.
2) Assuming you are joining through the other tables (not all directly on companies), the order in this query does matter... see the comparisons below:
Original:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123
What I think it is likely suggesting:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = u.userid
JOIN useraccounts AS us on us.userid = u.userid
WHERE j.jobid = 123
You could switch the lines joining jobs and useraccounts here.
What it would look like if everything joined on company:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = c.userid
JOIN useraccounts AS us on us.userid = c.userid
WHERE j.jobid = 123
This doesn't really make logical sense... unless each user has their own company.
4) The magic of SQL is that you can show only certain columns, but all of them are there for sorting and filtering...
if you returned
SELECT c.*, j.jobid....
you could clearly see what it was filtering on, but the database server doesn't care if you output a row or not for filtering.