I am currently experiencing some (to me) very strange behaviour with one of my MySQL 5.6 queries.
I have a given system I am trying to optimize. One step is to only select the fields necessary for the next operation.
The given query looks as follows:
SELECT oxv_oxcategories_6_fr.*
FROM oxv_oxobject2category_6 AS oxobject2category
LEFT JOIN oxv_oxcategories_6_fr ON oxv_oxcategories_6_fr.oxid =
oxobject2category.oxcatnid
WHERE oxobject2category.oxobjectid = '<hashed id>'
AND oxv_oxcategories_6_fr.oxid IS NOT NULL
AND (oxv_oxcategories_6_fr.oxactive = 1
AND oxv_oxcategories_6_fr.oxhidden = '0')
ORDER BY oxobject2category.oxtime
I have taken the liberty of using more sensible naming in my own query:
SELECT
category_view.*
FROM oxv_oxobject2category_6 category_mapping_view
LEFT JOIN oxv_oxcategories_6_fr category_view ON category_view.OXID =
category_mapping_view.OXCATNID
WHERE category_mapping_view.OXOBJECTID = '<hashed id>'
AND category_view.OXID IS NOT NULL
AND (category_view.OXACTIVE = 1
AND category_view.OXHIDDEN = '0')
ORDER BY category_mapping_view.OXTIME
As you can see, there is not much difference; only the naming has changed. So far, everything works as expected. Now I am trying to select only the values I need, so the query looks like this:
SELECT
category_view.OXID,
category_view.OXTITLE
FROM oxv_oxobject2category_6 category_mapping_view
LEFT JOIN oxv_oxcategories_6_fr category_view ON category_view.OXID =
category_mapping_view.OXCATNID
WHERE category_mapping_view.OXOBJECTID = '<hashed id>'
AND category_view.OXID IS NOT NULL
AND (category_view.OXACTIVE = 1
AND category_view.OXHIDDEN = '0')
ORDER BY category_mapping_view.OXTIME;
This also works as expected. But I also need the field OXPARENTID, so I change the SELECT list to
category_view.OXID,
category_view.OXTITLE,
category_view.OXPARENTID
Now the order of the items is different, and I cannot figure out why. Both the new query and the original one sort by OXTIME without that field being present in the final result set. There are about 10 entries where OXTIME is 0, and it is those items that change order as soon as I also query OXPARENTID.
In the original query, OXPARENTID is present as well, so why does it make a difference now? I am guessing there is some ordering logic going on that I do not yet know about.
Note that both joined tables are actually views; maybe that has something to do with it. Also, OXID and OXPARENTID are both MD5-hashed values.
Any help would be greatly appreciated.
EDIT
To clarify: I know that because multiple entries have OXTIME equal to 0, it is impossible to predict beforehand which entry will come out on top. However, I still expected the order of the entries to be the same every time I run the query (regardless of which columns I select).
One answer (@GordonLinoff) explains that
[...] the same query can return the results in different order on different runs
Where does this "randomness" come from?
Your ordering is:
ORDER BY category_mapping_view.OXTIME;
And then you state:
There are about 10 entries where OXTIME is 0, and it is those items that get turned around (ordering-wise) as soon as I query for OXPARENTID.
What you have are ties in the keys. The results can be in any order -- and the same query can return the results in different order on different runs. Technically, the ordering in SQL is unstable.
You can fix this by including another column in the ORDER BY so each row is uniquely defined by the ORDER BY keys. Perhaps that is OXID:
ORDER BY category_mapping_view.OXTIME, category_view.OXID;
By the way, it is "obvious" that sorting in SQL is unstable. Why? SQL tables represent unordered sets. There is no ordering to fall back on when the keys are the same.
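Applied to the query from the question, the deterministic version would look like the following sketch (assuming OXID is unique per category row, which its use as the join key suggests):

SELECT
category_view.OXID,
category_view.OXTITLE,
category_view.OXPARENTID
FROM oxv_oxobject2category_6 category_mapping_view
LEFT JOIN oxv_oxcategories_6_fr category_view ON category_view.OXID =
category_mapping_view.OXCATNID
WHERE category_mapping_view.OXOBJECTID = '<hashed id>'
AND category_view.OXID IS NOT NULL
AND category_view.OXACTIVE = 1
AND category_view.OXHIDDEN = '0'
-- OXID breaks ties, so rows sharing the same OXTIME keep a stable order
ORDER BY category_mapping_view.OXTIME, category_view.OXID;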
Related
[Screenshot of the indexes on tblNewsToCity omitted.]
I have this query:
SELECT *
FROM `tblnews`
INNER JOIN `tblnewstocity`
ON `tblnews`.`id` = `tblnewstocity`.`fkid`
WHERE `tblnewstocity`.`city` = '233'
AND `tblnews`.`id` != '1771'
ORDER BY `tblnews`.`id` DESC
LIMIT 3 offset 0
which is running slow
but if I change `tblnewstocity`.`city` = '233' to `tblnewstocity`.`city` LIKE '233'
I get good results (screenshot omitted).
I would love to understand why. What am I missing? Why does the first query not run as well as the second, when the second uses the LIKE operator on integers and should, if anything, be even slower?
MySQL has several options to execute that query:
look for ALL rows with tblnewstocity.city = '233' by using the index, do the join, and order by the id. It has to check all rows given by the index, because the first 3 are not necessarily the ones with the largest tblnews.id.
go through tblnews in ORDER BY order (from the end), do the join, and look for the FIRST 3 rows that happen to have the right city value. It can stop after having found 3 rows, as there cannot be a larger tblnews.id.
It depends on your data which way is faster. If, for example, only 2 rows fit your index conditions (and the join), the first way will be faster, because it only has to check a handful of rows, while the second would have to scan the whole table to realize there are only 2. If, on the other hand, all rows had city = 233, the first query would have to find all of them (by index, but it is still all of them), order them, and take the first 3, while the second would only have to test the first 3 rows, because they are already ordered.
A realistic distribution lies somewhere between these extremes, so MySQL has to guess. It guessed that the index (for =) would return only a small number of rows, so it took option 1. LIKE makes MySQL trust the index less, so it preferred the second option, which, luckily, was the faster one here. But it could have gone the other way too: try, for example, LIKE and = with a city value that matches no rows (although, depending on your MySQL server version, the optimizer might check this and not fall for it, so maybe test with a value that would return only a handful of rows without the LIMIT).
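If you want to see which plan MySQL actually picked in each case, EXPLAIN makes the guess visible; a quick check along these lines (same tables and values as in the question):

-- equality predicate: the optimizer tends to drive from the city index (option 1)
EXPLAIN
SELECT * FROM tblnews
INNER JOIN tblnewstocity ON tblnews.id = tblnewstocity.fkid
WHERE tblnewstocity.city = '233' AND tblnews.id != '1771'
ORDER BY tblnews.id DESC LIMIT 3;

-- LIKE predicate: the optimizer trusts the index estimate less and may
-- scan tblnews from the end instead (option 2)
EXPLAIN
SELECT * FROM tblnews
INNER JOIN tblnewstocity ON tblnews.id = tblnewstocity.fkid
WHERE tblnewstocity.city LIKE '233' AND tblnews.id != '1771'
ORDER BY tblnews.id DESC LIMIT 3;

Comparing the key and rows columns of the two plans shows which access path was chosen.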
Long story short: there are two solutions for that:
either replace INNER JOIN tblnewstocity with STRAIGHT_JOIN tblnewstocity; this will force MySQL to take the second way (but of course it risks slow execution for the counterexamples)
or add a proper index: tblnewstocity (city, fkid) should take care of the problem. You might have to change ORDER BY tblnews.id DESC to ORDER BY tblnewstocity.fkid DESC, which is equivalent because tblnewstocity.fkid = tblnews.id according to the join condition; MySQL should realize this on its own, but you never know (see the sketch below)...
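As a sketch, the second option could look like this (idx_city_fkid is a made-up index name; adjust to your schema):

-- composite index so the filter and the join column are both covered
CREATE INDEX idx_city_fkid ON tblnewstocity (city, fkid);

SELECT *
FROM tblnews
INNER JOIN tblnewstocity ON tblnews.id = tblnewstocity.fkid
WHERE tblnewstocity.city = '233'
AND tblnews.id != '1771'
-- fkid equals tblnews.id via the join condition, so this ordering is equivalent
ORDER BY tblnewstocity.fkid DESC
LIMIT 3 OFFSET 0;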
Not sure about the LIKE issue, but you should include the filters in the JOIN:
SELECT *
FROM `tblnews`
INNER JOIN `tblnewstocity`
ON `tblnews`.`id` = `tblnewstocity`.`fkid`
AND `tblnews`.`id` != '1771'
AND `tblnewstocity`.`city` = '233'
ORDER BY `tblnews`.`id` DESC
LIMIT 3 offset 0
I'm trying to figure out the best way to get data from a MySQL database and process it. I have two tables, 'objects' and 'objects_metadata'. Rows in the objects_metadata table belong to rows in the objects table; the link is defined by a 'parent_id' column in objects_metadata that corresponds to the 'id' column in objects. (SQLFiddle below.)
The Scenario
When I search against these tables I'm always looking for rows from the objects table. I sometimes have to query the objects_metadata table to get the right results. I do this by defining boundaries such as "hasMetadataWithValue". This boundary would run the following query by itself:
SELECT * FROM objects
INNER JOIN objects_metadata ON objects.id=objects_metadata.parent_id
WHERE objects_metadata.type_id = ? AND objects_metadata.value = ?
Another example boundary "notSelf" would use a query such as:
SELECT * FROM objects WHERE objects.id != ?
My scenario caters for multiple boundaries at a time. For a row from the objects table to be selected, it MUST pass all boundaries (i.e. if each boundary query were run independently, the row would appear in every result set).
I'm wondering if anyone has any thoughts on the best way to do this?
Use each boundary's query as a subquery in a single query on the database (my original goal)
Run each boundary's query as a full query and then use PHP to process the results
I would prefer to make the database do most of the work and simply spit out the results, to avoid running a bunch of queries instead of a single one. Here's the tricky part: I've tried to create a full query using subqueries, but I'm not getting the hang of it at all. My latest attempt is below:
SELECT * FROM objects
WHERE type_id = 7
AND confirmed = 1
AND (SELECT * FROM objects WHERE objects.id != 1)
AND (SELECT * FROM objects LEFT JOIN objects_metadata ON objects.id=objects_metadata.parent_id WHERE objects_metadata.type_id = 8 AND objects_metadata.value ='male')
LIMIT 0,20
I can see that the way I'm trying to use these subqueries is obviously wrong, but I can't figure out what the right way is.
SQL Fiddle is here
Any insights into the best way of doing this would be much appreciated.
I think you can just put those 'boundaries' inside your joined query.
SELECT
*
FROM objects LEFT JOIN objects_metadata
ON objects.id = objects_metadata.parent_id
WHERE
objects_metadata.type_id = 8
AND objects.confirmed=1
AND ( objects.id!=1 )
AND ( objects_metadata.type_id=8 AND objects_metadata.value='male' )
LIMIT 0,20
SQL Fiddle: http://sqlfiddle.com/#!2/0ee42/34
Just mind that both tables share some column names, so you have to qualify them with the exact table (e.g., objects_metadata.type_id = 8). If I've completely misunderstood your question, let me know! :)
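One caveat with the single-join form: if you ever need two different metadata boundaries at the same time (two type_id/value pairs), a single joined row cannot match both. A sketch of an alternative, with one EXISTS per boundary (the type_id = 9 / 'blue' boundary is made up for illustration):

SELECT objects.*
FROM objects
WHERE objects.type_id = 7
AND objects.confirmed = 1
AND objects.id != 1 -- the "notSelf" boundary
AND EXISTS ( -- the "hasMetadataWithValue" boundary
SELECT 1 FROM objects_metadata
WHERE objects_metadata.parent_id = objects.id
AND objects_metadata.type_id = 8
AND objects_metadata.value = 'male')
AND EXISTS ( -- a second, hypothetical boundary
SELECT 1 FROM objects_metadata
WHERE objects_metadata.parent_id = objects.id
AND objects_metadata.type_id = 9
AND objects_metadata.value = 'blue')
LIMIT 0, 20;

Each EXISTS mirrors one boundary query, so a row is returned only if it would appear in every boundary's independent result set.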
This is my first post here, since most of the time I have already found a suitable solution :)
However, this time nothing seems to help.
I'm trying to migrate information from a MySQL database to which I have only read-only access.
My problem is similar to this one: Group by doesn't give me the newest group
I also need to get the latest information out of some tables, but my tables have >300k entries; therefore, checking whether the "time attribute" value is the same as in the subquery (as suggested in the first answer) would be too slow (once I tried "... WHERE EXISTS ..." and the server hung up).
In addition, I can hardly ever find the important information (e.g. time) in a single attribute, and there is never a single primary key. Until now I did it as suggested in the second answer, by joining with a subquery that contains the latest "time attribute" entry and some primary keys, but that gets me into a huge mess after using multiple joins and unions on the results.
Therefore I would prefer using the HAVING clause as done here: Select entry with maximum value of column after grouping
But when I tried it out and looked for a good candidate for the "time attribute", I noticed that these two queries give me different results (MORE = 39721, LESS = 37870):
SELECT COUNT(MATNR) AS MORE
FROM(
SELECT DISTINCT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG
FROM
FKT_LAB
) AS TEMP1
SELECT COUNT(MATNR) AS LESS
FROM(
SELECT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG,
LAB_PDATUM
FROM
FKT_LAB
GROUP BY
LAB_MTKNR,
LAB_STG,
LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
) AS TEMP2
This is even though both are applied to the same table and use GROUP BY / SELECT DISTINCT on the same columns.
Any ideas?
If nothing helps and I have to go back to my mess, I will use string variables as placeholders to tidy it up, but then I lose track of how many subqueries, joins and unions I have in one query... how many temporary tables will the server be able to cope with?
Your second query is not doing what you expect it to be doing. This is the query:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG, LAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
) TEMP2;
The problem is the HAVING clause. You are mixing an unaggregated column (LAB_PDATUM) with an aggregated value (MAX(LAB_PDATUM)). What MySQL does is choose an arbitrary value for the column and compare it to the max.
Often, the arbitrary value will not be the maximum value, so those rows get filtered out. The reference you give (although an accepted answer) is incorrect. I have put a comment there.
If you want the most recent value, here is a relatively easy way:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
max(LAB_PDATUM) as maxLAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
) TEMP2;
It does not, however, affect the outer count.
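If what you ultimately want is the full latest row per group rather than the count, a common pattern is to join the table back to its grouped maximums; a sketch reusing the question's column names:

SELECT f.*
FROM FKT_LAB f
INNER JOIN (SELECT LAB_MTKNR, LAB_STG, LAB_STGNR, MAX(LAB_PDATUM) AS MAX_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
) latest
ON f.LAB_MTKNR = latest.LAB_MTKNR
AND f.LAB_STG = latest.LAB_STG
AND f.LAB_STGNR = latest.LAB_STGNR
-- keep only the row(s) carrying the group's latest date
AND f.LAB_PDATUM = latest.MAX_PDATUM;

Note that groups with ties on the maximum LAB_PDATUM will return more than one row each, which is one reason such counts can differ.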
I have read a few posts on this, but I can't seem to fix my problem.
I am calling two database queries to populate two arrays that run side by side, but they aren't matching, as the order they come out in is different. I believe it has something to do with the GROUP BY, and this may require a subquery, but again I'm a little lost...
Query 1:
SELECT count(bids_bid.total_bid), bidtime_bid, users_usr.company_usr, users_usr.id_usr
FROM bids_bid
INNER JOIN users_usr
ON bids_bid.user_bid = users_usr.id_usr
WHERE auction_bid = 36
GROUP BY user_bid
ORDER BY bidtime_bid ASC
Query 2:
SELECT auction_bid, user_bid, bidtime_bid, bids_bid.total_bid
FROM bids_bid
WHERE auction_bid = 36
ORDER BY bidtime_bid ASC
Even though the ORDER BY is the same, the results aren't matching; the users come out in a different sequence.
I hope this makes sense, and thanks in advance.
* Update *
I just wanted to add a bit of clarity about the output I want. I need to show only one result per user (user_bid); the second query shows all of each user's rows. I only need the first query to show the first row entered for each user. So if I could order before the group, by minimum date, that would be ace...
It's to be expected. You're fetching fields that are NOT involved in the grouping and are not part of an aggregate function. MySQL allows such things, but generally the results for the ungrouped/unaggregated columns can be wonky.
Because MySQL is free to choose WHICH of the potentially multiple 'free' rows to use for the actual result row, you will get different results. Generally it picks the first-encountered 'free choice' result, but that is not defined/guaranteed.
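One way to make the first query deterministic is to pick each user's row explicitly instead of letting MySQL choose; a sketch using the question's tables (it assumes bidtime_bid is unique per user within an auction, otherwise ties reappear):

SELECT users_usr.company_usr, users_usr.id_usr, first_bid.minbid
FROM bids_bid
-- earliest bid time per user for this auction
INNER JOIN (SELECT user_bid, MIN(bidtime_bid) AS minbid
FROM bids_bid
WHERE auction_bid = 36
GROUP BY user_bid) first_bid
ON bids_bid.user_bid = first_bid.user_bid
AND bids_bid.bidtime_bid = first_bid.minbid
INNER JOIN users_usr ON bids_bid.user_bid = users_usr.id_usr
WHERE bids_bid.auction_bid = 36
ORDER BY first_bid.minbid ASC;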
You use grouping when you want unique results in the result set according to some group id (column name). Usually grouping is used with aggregate functions such as MIN, MAX, COUNT, SUM, and so on.
Ordering inside an inner query has nothing to do with the final result set. I suggest reading some introductory tutorials about grouping, and thinking of SQL as a set-based language; most of set theory applies to SQL, so treat it that way and you'll be fine.
So I was complicating issues that I didn't need to. The solution I found is below.
SELECT users_usr.company_usr,
users_usr.id_usr,
bids_bid.bidtime_bid,
MIN(bidtime_bid) AS minbid
FROM bids_bid
INNER JOIN users_usr ON bids_bid.user_bid = users_usr.id_usr
WHERE auction_bid = 36
GROUP BY id_usr
ORDER BY minbid ASC
Thanks everyone for making me look (try) harder...
So I have a couple of SQL commands that I basically want to turn into a proc, but while doing so, I'd like to optimize them a little more.
The first part of it is this:
select tr_reference_nbr
from cfo_daily_trans_hist
inner join cfo_fas157_valuation on fv_dh_daily_trans_hist_id = dh_daily_trans_hist_id
inner join cfo_tran_quote on tq_tran_quote_id = dh_tq_tran_quote_id
inner join cfo_transaction on tq_tr_transaction_id = tr_transaction_id
inner join cfo_fas157_project_valuation ON fpv_fas157_project_valuation_id = fv_fpv_fas157_project_valuation_id AND fpv_status_bit = 1
group by tr_reference_nbr, fv_dh_daily_trans_hist_id
having count(*)>1
This query tells me which tr_reference_nbr values have duplicate data in our system that needs to be removed. After this is run, I run the following query, pasting in the tr_reference_nbr values from the query above one at a time:
select
tr_reference_nbr , dh_daily_trans_hist_id ,cfo_fas157_project_valuation.*,
cfo_daily_trans_hist.* ,
cfo_fas157_valuation.*
from cfo_daily_trans_hist
inner join cfo_fas157_valuation on fv_dh_daily_trans_hist_id = dh_daily_trans_hist_id
inner join cfo_tran_quote on tq_tran_quote_id = dh_tq_tran_quote_id
inner join cfo_transaction on tq_tr_transaction_id = tr_transaction_id
INNER JOIN cfo_fas157_project_valuation ON fpv_fas157_project_valuation_id = fv_fpv_fas157_project_valuation_id
where
tr_reference_nbr in
(
[PASTEDREFERENCENUMBER]
)
and fpv_status_bit = 1
order by dh_val_time_stamp desc
Now this query gives me a bunch of records for that specific tr_reference_nbr. I then have to look through this data and find the rows that have a matching (duplicate) dh_daily_trans_hist_id. Once those are found, I check that the following columns also match for those rows, so I know they are true duplicates: fpv_unadjusted_sponsor_charge, fpv_adjusted_sponsor_charge, fpv_unadjusted_counterparty_charge, and fpv_adjusted_counterparty_charge.
If THOSE all match, I then look at yet another column, fv_create_dt, and make sure there is less than a minute's difference between the two timestamps. If there is, I run yet another query on the row that was stored EARLIER, which looks like this:
begin tran
update cfo_fas157_valuation set fpv_status_bit = 0 where fpv_fas157_project_valuation_id = [IDRECIEVEDFROMTHEOTHERTABLE]
commit
As you can see, this is still a very manual process even though we have a few queries written. I'm trying to find a solution where we can just run one query that basically does EVERYTHING except the final update; something that would give us the few fpv_fas157_project_valuation_id values that need to be updated.
From looking at these queries, do any of you guys see an easy way to combine all this? I've been working on it all day and can't seem to get something to run. I feel like I keep screwing up the joins and stuff.
Thanks!
You can combine these queries in multiple ways:
use temporary tables to store results of queries - suitable for stored procedure
use table variables to store results of queries - suitable for stored procedure
use Common Table Expressions (CTEs) to store results of queries - suitable for single query
Once you have them in separate tables/variables/CTEs, you can easily join them.
Then you have to do one more thing: find the difference in datetime between two consecutive rows. There is a trick for this:
use ROW_NUMBER() to add a row-number column, partitioned by the grouping fields (tr_reference_nbr, ... ) and ordered by fv_create_dt
do a self join on A.ROW_NUMBER = B.ROW_NUMBER + 1
check the difference between A.fv_create_dt and B.fv_create_dt to filter the rows with difference less than a minute
Just test your self-join carefully to make sure you filter only the rows you need to filter; a sketch follows below.
If you still have problems with this, don't hesitate to leave a comment.
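A rough sketch of the CTE + ROW_NUMBER approach, reusing the joins and columns from the question (the pairing rules, in particular partitioning by tr_reference_nbr and dh_daily_trans_hist_id, are assumptions to verify against your data):

WITH numbered AS (
SELECT tr_reference_nbr,
dh_daily_trans_hist_id,
fv_fpv_fas157_project_valuation_id,
fv_create_dt,
fpv_unadjusted_sponsor_charge,
fpv_adjusted_sponsor_charge,
fpv_unadjusted_counterparty_charge,
fpv_adjusted_counterparty_charge,
ROW_NUMBER() OVER (PARTITION BY tr_reference_nbr, dh_daily_trans_hist_id
ORDER BY fv_create_dt) AS rn
FROM cfo_daily_trans_hist
INNER JOIN cfo_fas157_valuation ON fv_dh_daily_trans_hist_id = dh_daily_trans_hist_id
INNER JOIN cfo_tran_quote ON tq_tran_quote_id = dh_tq_tran_quote_id
INNER JOIN cfo_transaction ON tq_tr_transaction_id = tr_transaction_id
INNER JOIN cfo_fas157_project_valuation
ON fpv_fas157_project_valuation_id = fv_fpv_fas157_project_valuation_id
AND fpv_status_bit = 1
)
-- the EARLIER row of each qualifying adjacent pair is the one to deactivate
SELECT earlier.fv_fpv_fas157_project_valuation_id
FROM numbered AS earlier
INNER JOIN numbered AS later
ON later.tr_reference_nbr = earlier.tr_reference_nbr
AND later.dh_daily_trans_hist_id = earlier.dh_daily_trans_hist_id
AND later.rn = earlier.rn + 1
-- true duplicates: all four charge columns must match
AND later.fpv_unadjusted_sponsor_charge = earlier.fpv_unadjusted_sponsor_charge
AND later.fpv_adjusted_sponsor_charge = earlier.fpv_adjusted_sponsor_charge
AND later.fpv_unadjusted_counterparty_charge = earlier.fpv_unadjusted_counterparty_charge
AND later.fpv_adjusted_counterparty_charge = earlier.fpv_adjusted_counterparty_charge
-- created less than a minute apart
WHERE DATEDIFF(SECOND, earlier.fv_create_dt, later.fv_create_dt) < 60;

The ids this returns can then feed the existing UPDATE that sets fpv_status_bit = 0.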
Interesting note: SQL Server Denali has the T-SQL enhancements LEAD and LAG to access the next and previous rows without self-joins.