So I have a couple SQL commands that I basically want to make a proc, but while doing this, I'd like to optimize them a little bit more.
The first part of it is this:
select tr_reference_nbr
from cfo_daily_trans_hist
inner join cfo_fas157_valuation on fv_dh_daily_trans_hist_id = dh_daily_trans_hist_id
inner join cfo_tran_quote on tq_tran_quote_id = dh_tq_tran_quote_id
inner join cfo_transaction on tq_tr_transaction_id = tr_transaction_id
inner join cfo_fas157_project_valuation ON fpv_fas157_project_valuation_id = fv_fpv_fas157_project_valuation_id AND fpv_status_bit = 1
group by tr_reference_nbr, fv_dh_daily_trans_hist_id
having count(*)>1
This query returns to me which tr_reference_nbr's exist that have duplicate data in our system, which needs to be removed. After this is run, I run this other query, copying and pasting in the tr_reference_nbr one at a time that the above query gave me:
select
tr_reference_nbr , dh_daily_trans_hist_id ,cfo_fas157_project_valuation.*,
cfo_daily_trans_hist.* ,
cfo_fas157_valuation.*
from cfo_daily_trans_hist
inner join cfo_fas157_valuation on fv_dh_daily_trans_hist_id = dh_daily_trans_hist_id
inner join cfo_tran_quote on tq_tran_quote_id = dh_tq_tran_quote_id
inner join cfo_transaction on tq_tr_transaction_id = tr_transaction_id
inner join cfo_fas157_project_valuation ON fpv_fas157_project_valuation_id = fv_fpv_fas157_project_valuation_id
where
tr_reference_nbr in
(
[PASTEDREFERENCENUMBER]
)
and fpv_status_bit = 1
order by dh_val_time_stamp desc
Now this query gives me a bunch of records for that specific tr_reference_nbr. I then have to look through this data and find the rows that have a matching (duplicate) dh_daily_trans_hist_id. Once this is found, I look and make sure that the following columns also match for that row so I know they are true duplicates: fpv_unadjusted_sponsor_charge, fpv_adjusted_sponsor_charge, fpv_unadjusted_counterparty_charge, and fpv_adjusted_counterparty_charge.
If THOSE all match, I then look to yet another column, fv_create_dt, and make sure that there is less than a minute's difference between the two timestamps there. If there is, I run yet another query on the row that was stored EARLIER, which looks like this:
begin tran
update cfo_fas157_valuation set fpv_status_bit = 0 where fpv_fas157_project_valuation_id = [IDRECIEVEDFROMTHEOTHERTABLE]
commit
As you can see, this is still a very manual process even though we do have a few queries written, but I'm trying to find a solution to where we can just run one query, and it would basically do EVERYTHING except for the final query. So basically something that would provide to us a few fpv_fas157_project_valuation_id's that need to be updated.
From looking at these queries, do any of you guys see an easy way to combine all this? I've been working on it all day and can't seem to get something to run. I feel like I keep screwing up the joins and stuff.
Thanks!
You can combine these queries in multiple ways:
use temporary tables to store results of queries - suitable for stored procedure
use table variables to store results of queries - suitable for stored procedure
use Common Table Expressions (CTEs) to store results of queries - suitable for single query
Once you have them in separate tables/variables/CTEs, you can easily join them.
Then you have to do one more thing, and that is to find the difference in datetime between two consecutive rows. There is a trick to do this:
use ROW_NUMBER() to add a column with number of row partitioned by grouping fields (tr_reference_nbr, ... ) ordered by fv_create_dt
do a self join on A.ROW_NUMBER = B.ROW_NUMBER + 1
check the difference between A.fv_create_dt and B.fv_create_dt to filter the rows with difference less than a minute
Just test your self-join thoroughly to make sure you filter only the rows you need to filter.
If you still have problems with this, don't hesitate to leave a comment.
Interesting note: SQL Server 2012 (code-named Denali) adds the T-SQL functions LEAD and LAG, which access the subsequent and previous rows without self-joins.
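The ROW_NUMBER() plus self-join steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than on SQL Server, the single `valuation` table is a hypothetical flattening of the joined cfo_* tables, and one `charge` column stands in for the four fpv_*_charge columns. In T-SQL the time comparison would use DATEDIFF instead of strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE valuation (               -- hypothetical, simplified table
    fpv_id       INTEGER PRIMARY KEY,  -- stands in for fpv_fas157_project_valuation_id
    dh_id        INTEGER,              -- stands in for dh_daily_trans_hist_id
    charge       REAL,                 -- stands in for the four fpv_*_charge columns
    fv_create_dt TEXT
);
INSERT INTO valuation VALUES
    (1, 100, 5.0, '2012-01-01 10:00:00'),
    (2, 100, 5.0, '2012-01-01 10:00:30'),  -- same values as row 1, 30 seconds later
    (3, 100, 5.0, '2012-01-01 12:00:00'),  -- same values, but hours later
    (4, 200, 7.0, '2012-01-01 10:00:00'),
    (5, 200, 8.0, '2012-01-01 10:00:10');  -- different charge, so not a duplicate
""")

query = """
WITH numbered AS (
    -- Number the rows inside each group of potential duplicates by creation time.
    SELECT fpv_id, dh_id, charge, fv_create_dt,
           ROW_NUMBER() OVER (PARTITION BY dh_id, charge
                              ORDER BY fv_create_dt) AS rn
    FROM valuation
)
-- Pair each row with the one stored just before it in the same group.
SELECT earlier.fpv_id
FROM numbered AS later
JOIN numbered AS earlier
  ON  earlier.dh_id  = later.dh_id
  AND earlier.charge = later.charge
  AND later.rn = earlier.rn + 1
-- Keep only pairs created less than 60 seconds apart.
WHERE strftime('%s', later.fv_create_dt) - strftime('%s', earlier.fv_create_dt) < 60
ORDER BY earlier.fpv_id
"""
ids_to_deactivate = [row[0] for row in conn.execute(query)]
print(ids_to_deactivate)
```

The query returns the id of the EARLIER row of each close pair, which is the row the final update in the question targets.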
I have trawled many of the similar responses on this site and have improved my code at several stages along the way. Unfortunately, this 3-row query still won't run.
I have one table with 100k+ rows and about 30 columns of which I can filter down to 3-rows (in this example) and then perform INNER JOINs across 21 small lookup tables.
In my first attempt, I was lazy and used implicit joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `lookup_table` x 21
WHERE `master_table`.`indexed_col` = "value"
AND `lookup_table`.`id` = `lookup_col` x 21
The query looked to be timing out:
#2013 - Lost connection to MySQL server during query
Following this, I tried being explicit about the joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `master_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `master_table`.`lookup_col` x 21
WHERE `master_table`.`indexed_col` = "value"
Still got the same result. I then realised that the query was probably trying to perform the joins first, then filter down via the WHERE clause. So after a bit more research, I learned how I could apply a subquery to perform the filter first and then perform the joins on the newly created table. This is where I got to, and it still returns the same error. Is there any way I can improve this query further?
SELECT `temp_table`.*, `lookup_table`.`data_point` x 21
FROM (SELECT * FROM `master_table` WHERE `indexed_col` = "value") as `temp_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `temp_table`.`lookup_col` x 21
Is this the best way to write up this kind of query? I tested the subquery to ensure it only returns a small table and can confirm that it returns only three rows.
First, at its most simple aspect you are looking for
select
mt.*
from
Master_Table mt
where
mt.indexed_col = 'value'
That is probably instantaneous provided you have an index on your master table on the given indexed_col in the first position (in case you had a compound index of many fields)…
Now, if I am understanding you correctly on your different lookup columns (21 in total), you have just simplified them for brevity in this post, but are actually doing something to the effect of
select
mt.*,
lt1.lookupDescription1,
lt2.lookupDescription2,
...
lt21.lookupDescription21
from
Master_Table mt
JOIN Lookup_Table1 lt1
on mt.lookup_col1 = lt1.pk_col1
JOIN Lookup_Table2 lt2
on mt.lookup_col2 = lt2.pk_col2
...
JOIN Lookup_Table21 lt21
on mt.lookup_col21 = lt21.pk_col21
where
mt.indexed_col = 'value'
I had a project well over a decade ago dealing with a similar situation... the master table had about 21+ million records and had to join to about 30+ lookup tables. The system crawled, and queries died after running for more than 24 hrs.
This too was on a MySQL server and the fix was a single MySQL keyword...
Select STRAIGHT_JOIN mt.*, ...
With your master table in the primary position and the where clause criteria directly on the master table, you are good. You know the relationships of the tables. STRAIGHT_JOIN tells MySQL to do the query in the exact order you presented it: don't try to think for me on this and optimize based on a subsidiary table that may have a smaller record count in the hope that it will make the query faster... it won't.
Try the STRAIGHT_JOIN keyword. It took the query I was working on and finished it in about 1.5 hrs... it was returning all 21 million rows with all corresponding lookup key descriptions for final output, hence still needed a longer duration than just 3 records.
First, don't use a subquery. Write the query as:
SELECT mt.*, lt.`data_point`
FROM `master_table` mt INNER JOIN
`lookup_table` lt
ON lt.`id` = mt.`lookup_col`
WHERE mt.`indexed_col` = 'value';
The indexes that you want are master_table(indexed_col, lookup_col) and lookup_table(id, data_point).
If you are still having performance problems, then there are multiple possibilities. High among them is that the result set is simply too big to return in a reasonable amount of time. To see if that is the case, you can use select count(*) to count the number of returned rows.
I am currently experiencing a (to me) very strange behaviour for one of my mysql 5.6 queries.
I have a given system I am trying to optimize. One step is to only select the fields necessary for the next operation.
The given query looks as follows:
SELECT oxv_oxcategories_6_fr.*
FROM oxv_oxobject2category_6 AS oxobject2category
LEFT JOIN oxv_oxcategories_6_fr ON oxv_oxcategories_6_fr.oxid =
oxobject2category.oxcatnid
WHERE oxobject2category.oxobjectid = '<hashed id>'
AND oxv_oxcategories_6_fr.oxid IS NOT NULL
AND (oxv_oxcategories_6_fr.oxactive = 1
AND oxv_oxcategories_6_fr.oxhidden = '0')
ORDER BY oxobject2category.oxtime
I have taken the liberty of using more sensible naming in my own query:
SELECT
category_view.*
FROM oxv_oxobject2category_6 category_mapping_view
LEFT JOIN oxv_oxcategories_6_fr category_view ON category_view.OXID =
category_mapping_view.OXCATNID
WHERE category_mapping_view.OXOBJECTID = '<hashed id>'
AND category_view.OXID IS NOT NULL
AND (category_view.OXACTIVE = 1
AND category_view.OXHIDDEN = '0')
ORDER BY category_mapping_view.OXTIME
As you can see, there is not much difference, only the naming is different. So far, everything works as expected. Now I am trying to only select the values I need. So the query looks like this:
SELECT
category_view.OXID,
category_view.OXTITLE
FROM oxv_oxobject2category_6 category_mapping_view
LEFT JOIN oxv_oxcategories_6_fr category_view ON category_view.OXID =
category_mapping_view.OXCATNID
WHERE category_mapping_view.OXOBJECTID = '<hashed id>'
AND category_view.OXID IS NOT NULL
AND (category_view.OXACTIVE = 1
AND category_view.OXHIDDEN = '0')
ORDER BY category_mapping_view.OXTIME;
This also works as expected. But, I also need the field OXPARENTID, so I change the SELECT statement to
category_view.OXID,
category_view.OXTITLE,
category_view.OXPARENTID
Now the order of the items is different and I cannot seem to find out why that is. The new as well as the original query both sort for OXTIME without that field being present in the final result set. There are about 10 entries where OXTIME is 0, and it is those items that get turned around (ordering-wise) as soon as I query for OXPARENTID.
In the original query, OXPARENTID is present as well, so why does it make a difference now? I am guessing that there is some sort of ordering logic going on I do not yet know about.
Mind, that both joined tables are actually views, maybe that has something to do with it. Also, OXID and OXPARENTID are both md5 hashed values.
Any help would be greatly appreciated.
EDIT
In order to clarify, I know that the fact that multiple entries have OXTIME equal 0 makes it impossible to predict beforehand, which entry will be the top one. However, I still expected the order of the entries to be the same every time I call the query (regardless of what I am selecting).
One answer (@GordonLinoff) explains that
[...] the same query can return the results in different order on different runs
Where does this "randomness" come from?
Your ordering is:
ORDER BY category_mapping_view.OXTIME;
And then you state:
There are about 10 entries where OXTIME is 0, and it is those items that get turned around (ordering-wise) as soon as I query for OXPARENTID.
What you have are ties in the keys. The results can be in any order -- and the same query can return the results in different order on different runs. Technically, the ordering in SQL is unstable.
You can fix this by including another column in the ORDER BY so each row is uniquely defined by the ORDER BY keys. Perhaps that is OXID:
ORDER BY category_mapping_view.OXTIME, category_view.OXID;
By the way, it is "obvious" that sorting in SQL is unstable. Why? SQL tables represent unordered sets. There is no ordering to fall back on when the keys are the same.
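As a small illustration of the tie-breaker fix, here is a sketch that runs on SQLite through Python's sqlite3 module instead of MySQL. The `mapping` table and its rows are made up for the example; only the column names oxtime and oxcatnid come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mapping (oxcatnid TEXT, oxtime INTEGER);  -- hypothetical stand-in table
INSERT INTO mapping VALUES ('c', 0), ('a', 0), ('b', 0), ('d', 5);
""")

# With ORDER BY oxtime alone, the relative order of 'a', 'b' and 'c' is
# unspecified because all three tie on oxtime = 0. Adding a unique column
# as a second sort key makes every row's position fully determined.
rows = conn.execute(
    "SELECT oxcatnid FROM mapping ORDER BY oxtime, oxcatnid"
).fetchall()
ordered_ids = [r[0] for r in rows]
print(ordered_ids)
```

Because every row now has a distinct sort key, the result no longer depends on which columns are selected or on the plan the engine happens to pick.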
I have a view where I combine some normalized tables. Based on a "master" table, I join connected tables (e.g. JOIN child ON master.child_fk = child.pk). This is pretty straightforward. Now, I'd like to extend this query to perform a join on ALL child rows in some special cases, for example if master.child_fk equals -1.
I managed to get a working query by creating a view where I duplicate all rows and set the pk to -1 in the duplicates, but this is incredibly slow (I have quite a lot of data). The same result could be produced by iterating over all the child.pks and performing a separate join for each, but I can't imagine that being faster.
What would be the best way to go about this using MySQL? Please ask questions if something is not clear.
edit: I can add that it seems the reason my attempt was slow was poor index utilization. See attached EXPLAIN output here https://i.imgur.com/8zfT0HM.png
Replace your join condition with JOIN child ON CASE WHEN master.child_fk != -1 THEN master.child_fk = child.pk ELSE 1 END
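A minimal sketch of how that CASE-based join condition behaves, run on SQLite through Python's sqlite3 module rather than MySQL. The `master` and `child` tables follow the question's naming, but their columns beyond child_fk and pk, and all of the rows, are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE child  (pk INTEGER PRIMARY KEY);
CREATE TABLE master (id INTEGER PRIMARY KEY, child_fk INTEGER);
INSERT INTO child  VALUES (10), (20), (30);
INSERT INTO master VALUES (1, 10),   -- ordinary row: joins to child 10 only
                          (2, -1);   -- special row: should join to ALL children
""")

rows = conn.execute("""
SELECT m.id, c.pk
FROM master AS m
JOIN child AS c
  -- Normal rows match on the key; rows with child_fk = -1 match every child.
  ON CASE WHEN m.child_fk <> -1 THEN m.child_fk = c.pk ELSE 1 END
ORDER BY m.id, c.pk
""").fetchall()
print(rows)
```

Whether this uses indexes well is a separate question: a CASE in the join condition can make it hard for the optimizer to use an index on child.pk, so it is worth checking EXPLAIN on real data.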
I'm trying to figure out the best way to get data from a MySQL database and process it. I have 2 tables, 'objects' and 'objects_metadata'. Rows in the objects_metadata table belong to rows in the objects table, and the link is defined by a 'parent_id' column in objects_metadata that corresponds to an 'id' column in objects. (SQLFiddle below).
The Scenario
When I search against these tables I'm always looking for rows from the objects table. I sometimes have to query the objects_metadata table to get the right results. I do this by defining boundaries such as "hasMetadataWithValue". This boundary would run the following query by itself:
SELECT * FROM objects
INNER JOIN objects_metadata ON objects.id=objects_metadata.parent_id
WHERE objects_metadata.type_id = ? AND objects_metadata.value = ?
Another example boundary "notSelf" would use a query such as:
SELECT * FROM objects WHERE objects.id != ?
My scenario caters for multiple boundaries at a time. For a row from the objects table to be selected it MUST pass all boundaries. (i.e. if each boundary query was run independently the row would appear in every set of results)
I'm wondering if anyone has any thoughts on the best way to do this?
Use each boundary's query as a subquery in a single query on the database (my original goal)
Run each boundary's query as a full query and then use PHP to process the results
I would prefer to make the database do most of the work and spit out the results simply to avoid running a bunch of queries instead of a single one. Here's the tricky part, I've tried to create a full query using subqueries, but I'm not getting the hang of it at all. My latest attempt is below:
SELECT * FROM objects
WHERE type_id = 7
AND confirmed = 1
AND (SELECT * FROM objects WHERE objects.id != 1)
AND (SELECT * FROM objects LEFT JOIN objects_metadata ON objects.id=objects_metadata.parent_id WHERE objects_metadata.type_id = 8 AND objects_metadata.value ='male')
LIMIT 0,20
I can see that the way I'm trying to use these subqueries is obviously wrong, but I can't figure out what the right way is.
SQL Fiddle is here
Any insights into the best way of doing this would be much appreciated.
I think you can just put those 'boundaries' inside your joined query.
SELECT
*
FROM objects LEFT JOIN objects_metadata
ON objects.id = objects_metadata.parent_id
WHERE
objects_metadata.type_id = 8
AND objects.confirmed=1
AND ( objects.id!=1 )
AND ( objects_metadata.type_id=8 AND objects_metadata.value='male' )
LIMIT 0,20
SQL Fiddle: http://sqlfiddle.com/#!2/0ee42/34
Just mind the same column names for both tables, so you have to specify the exact table as well (e.g., objects_metadata.type_id = 8). If I completely misunderstand your question let me know! :)
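Another way to express "a row must pass all boundaries" is to keep the query on objects alone and turn each metadata boundary into an EXISTS subquery. This is a different technique from the join above, sketched here on SQLite through Python's sqlite3 module; the table contents are made up, and only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (id INTEGER PRIMARY KEY, type_id INTEGER, confirmed INTEGER);
CREATE TABLE objects_metadata (parent_id INTEGER, type_id INTEGER, value TEXT);
INSERT INTO objects VALUES (1, 7, 1), (2, 7, 1), (3, 7, 1), (4, 7, 0), (5, 6, 1);
INSERT INTO objects_metadata VALUES
    (1, 8, 'male'),
    (2, 8, 'male'), (2, 8, 'male'),  -- repeated metadata must not repeat the object
    (3, 8, 'female'),
    (4, 8, 'male'),
    (5, 8, 'male');
""")

sql = """
SELECT o.id
FROM objects AS o
WHERE o.type_id = ?
  AND o.confirmed = 1
  AND o.id <> ?                         -- the "notSelf" boundary
  AND EXISTS (                          -- the "hasMetadataWithValue" boundary
        SELECT 1
        FROM objects_metadata AS m
        WHERE m.parent_id = o.id
          AND m.type_id = ?
          AND m.value = ?)
ORDER BY o.id
LIMIT 20
"""
matching_ids = [r[0] for r in conn.execute(sql, (7, 1, 8, "male"))]
print(matching_ids)
```

Each additional boundary becomes one more AND condition, and because EXISTS only tests for a match, an object with several matching metadata rows still appears once.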
I have followed the tutorial over at tizag for the MAX() mysql function and have written the query below, which does exactly what I need. The only trouble is I need to JOIN it to two more tables so I can work with all the rows I need.
$query = "SELECT idproducts, MAX(date) FROM results GROUP BY idproducts ORDER BY MAX(date) DESC";
I have this query below, which has the JOIN I need and works:
$query = ("SELECT *
FROM operators
JOIN products
ON operators.idoperators = products.idoperator JOIN results
ON products.idProducts = results.idproducts
ORDER BY drawndate DESC
LIMIT 20");
Could someone show me how to merge the top query with the JOIN element from my second query? I am new to PHP and MySQL, this being my first adventure into a computer language. I have read and tried really hard to get those two queries to work, but I am at a brick wall. I cannot work out how to add the JOIN element to the first query :(
Could some kind person take pity on a newb and help me?
Try this query.
SELECT
*
FROM
operators
JOIN products
ON operators.idoperators = products.idoperator
JOIN
(
SELECT
idproducts,
MAX(date)
FROM results
GROUP BY idproducts
) AS t
ON products.idproducts = t.idproducts
ORDER BY drawndate DESC
LIMIT 20
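Here is a runnable sketch of the same derived-table idea, using SQLite through Python's sqlite3 module instead of MySQL. The table and column names follow the question, but the `name` column, the sample rows and the `latest` alias are assumptions added for illustration. Giving MAX(date) an alias lets the outer query select and sort on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE operators (idoperators INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products  (idproducts INTEGER PRIMARY KEY, idoperator INTEGER);
CREATE TABLE results   (idproducts INTEGER, date TEXT);
INSERT INTO operators VALUES (1, 'Acme'), (2, 'Bolt');
INSERT INTO products  VALUES (10, 1), (20, 2);
INSERT INTO results   VALUES (10, '2013-01-01'), (10, '2013-03-01'),
                             (20, '2013-02-01');
""")

rows = conn.execute("""
SELECT o.name, p.idproducts, t.latest
FROM operators AS o
JOIN products AS p
  ON o.idoperators = p.idoperator
JOIN (
    -- One row per product carrying its most recent result date.
    SELECT idproducts, MAX(date) AS latest
    FROM results
    GROUP BY idproducts
) AS t
  ON p.idproducts = t.idproducts
ORDER BY t.latest DESC
LIMIT 20
""").fetchall()
print(rows)
```

The aggregation happens inside the derived table, so the outer joins see exactly one row per product and cannot multiply the result.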
JOINs function somewhat independently of aggregation functions; they just change the intermediate result-set upon which the aggregate functions operate. I like to point to the way the MySQL documentation is written, which uses the term 'table_reference' in the SELECT syntax and expands on what that means in the JOIN syntax. Basically, any simple query which has a table specified can simply expand that table to a complete JOIN clause and the query will operate the same basic way, just with a modified intermediate result-set.
I say "intermediate result-set" to hint at the mindset which helped me understand JOINs and aggregation. Understanding the order in which MySQL builds your final result is critical to knowing how to reliably get the results you want. Generally, it starts by looking at the first row of the first table you specify after 'FROM', and decides if it might match by looking at 'WHERE' clauses. If it is not immediately discardable, it attempts to JOIN that row to the first JOIN specified, and repeats the "will this be discarded by WHERE?" check. This repeats for all JOINs, which either add rows to your result set, remove them, or leave just the one, as appropriate for your JOINs, WHEREs and data. This process builds what I am referring to when I say "intermediate result-set". Somewhere between starting and finishing your complete query, MySQL has in its memory a potentially massive table-like structure of data which it built using the process I just described. Only then does it begin to aggregate (GROUP) the results according to your criteria.
So for your query, it depends on what specifically you are going for (not entirely clear in OP). If you simply want the MAX(date) from the second query, you can simply add that expression to the SELECT clause and then add an aggregation spec to the end:
SELECT *, MAX(date)
FROM operators
...
GROUP BY idproducts
ORDER BY ...
Alternatively, you can add the JOIN section of the second query to the first.