Mysql subqueries or multiple queries?

I'm trying to figure out the best way to get data from a MySQL database and process it. I have two tables, 'objects' and 'objects_metadata'. Rows in the objects_metadata table belong to rows in the objects table, and the link is defined by a 'parent_id' column in objects_metadata that corresponds to the 'id' column in objects. (SQLFiddle below.)
The Scenario
When I search against these tables I'm always looking for rows from the objects table. I sometimes have to query the objects_metadata table to get the right results. I do this by defining boundaries such as "hasMetadataWithValue". This boundary would run the following query by itself:
SELECT * FROM objects
INNER JOIN objects_metadata ON objects.id=objects_metadata.parent_id
WHERE objects_metadata.type_id = ? AND objects_metadata.value = ?
Another example boundary "notSelf" would use a query such as:
SELECT * FROM objects WHERE objects.id != ?
My scenario caters for multiple boundaries at a time. For a row from the objects table to be selected it MUST pass all boundaries. (i.e. if each boundary query was run independently the row would appear in every set of results)
I'm wondering if anyone has any thoughts on the best way to do this?
Use each boundary's query as a subquery in a single query on the database (my original goal)
Run each boundary's query as a full query and then use PHP to process the results
I would prefer to make the database do most of the work and simply spit out the results, to avoid running a bunch of queries instead of a single one. Here's the tricky part: I've tried to create a full query using subqueries, but I'm not getting the hang of it at all. My latest attempt is below:
SELECT * FROM objects
WHERE type_id = 7
AND confirmed = 1
AND (SELECT * FROM objects WHERE objects.id != 1)
AND (SELECT * FROM objects LEFT JOIN objects_metadata ON objects.id=objects_metadata.parent_id WHERE objects_metadata.type_id = 8 AND objects_metadata.value ='male')
LIMIT 0,20
I can see that the way I'm trying to use these subqueries is obviously wrong, but I can't figure out what the right way is.
SQL Fiddle is here
Any insights into the best way of doing this would be much appreciated.

I think you can just put those 'boundaries' inside your joined query.
SELECT
*
FROM objects LEFT JOIN objects_metadata
ON objects.id = objects_metadata.parent_id
WHERE
objects_metadata.type_id = 8
AND objects.confirmed=1
AND ( objects.id!=1 )
AND ( objects_metadata.type_id=8 AND objects_metadata.value='male' )
LIMIT 0,20
SQL Fiddle: http://sqlfiddle.com/#!2/0ee42/34
Just mind that both tables have columns with the same names, so you have to qualify them with the exact table as well (e.g., objects_metadata.type_id = 8). If I've completely misunderstood your question, let me know! :)
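If you'd rather keep each boundary self-contained instead of folding it into one join, another option (just a sketch along the same lines, not tested against the fiddle) is to express the metadata boundary as a correlated EXISTS subquery, so every extra boundary becomes one more AND condition on objects:
SELECT *
FROM objects
WHERE objects.type_id = 7
  AND objects.confirmed = 1
  AND objects.id != 1                 -- the "notSelf" boundary
  AND EXISTS (                        -- the "hasMetadataWithValue" boundary
        SELECT 1
        FROM objects_metadata
        WHERE objects_metadata.parent_id = objects.id
          AND objects_metadata.type_id = 8
          AND objects_metadata.value = 'male'
      )
LIMIT 0,20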

Related

mysql select field determines order of result set

I am currently experiencing a (to me) very strange behaviour for one of my mysql 5.6 queries.
I have a given system I am trying to optimize. One step is to only select the fields necessary for the next operation.
The given query looks as follows:
SELECT oxv_oxcategories_6_fr.*
FROM oxv_oxobject2category_6 AS oxobject2category
LEFT JOIN oxv_oxcategories_6_fr ON oxv_oxcategories_6_fr.oxid =
oxobject2category.oxcatnid
WHERE oxobject2category.oxobjectid = '<hashed id>'
AND oxv_oxcategories_6_fr.oxid IS NOT NULL
AND (oxv_oxcategories_6_fr.oxactive = 1
AND oxv_oxcategories_6_fr.oxhidden = '0')
ORDER BY oxobject2category.oxtime
I have taken the liberty of using more sensible naming in my own query:
SELECT
category_view.*
FROM oxv_oxobject2category_6 category_mapping_view
LEFT JOIN oxv_oxcategories_6_fr category_view ON category_view.OXID =
category_mapping_view.OXCATNID
WHERE category_mapping_view.OXOBJECTID = '<hashed id>'
AND category_view.OXID IS NOT NULL
AND (category_view.OXACTIVE = 1
AND category_view.OXHIDDEN = '0')
ORDER BY category_mapping_view.OXTIME
As you can see, there is not much difference; only the naming differs. So far, everything works as expected. Now I am trying to select only the values I need. So the query looks like this:
SELECT
category_view.OXID,
category_view.OXTITLE
FROM oxv_oxobject2category_6 category_mapping_view
LEFT JOIN oxv_oxcategories_6_fr category_view ON category_view.OXID =
category_mapping_view.OXCATNID
WHERE category_mapping_view.OXOBJECTID = '<hashed id>'
AND category_view.OXID IS NOT NULL
AND (category_view.OXACTIVE = 1
AND category_view.OXHIDDEN = '0')
ORDER BY category_mapping_view.OXTIME;
This also works as expected. But, I also need the field OXPARENTID, so I change the SELECT statement to
category_view.OXID,
category_view.OXTITLE,
category_view.OXPARENTID
Now the order of the items is different, and I cannot seem to find out why that is. Both the new and the original query sort by OXTIME without that field being present in the final result set. There are about 10 entries where OXTIME is 0, and it is those items that get turned around (ordering-wise) as soon as I query for OXPARENTID.
In the original query, OXPARENTID is present as well, so why does it make a difference now? I am guessing that there is some sort of ordering logic going on I do not yet know about.
Mind that both joined tables are actually views; maybe that has something to do with it. Also, OXID and OXPARENTID are both MD5-hashed values.
Any help would be greatly appreciated.
EDIT
In order to clarify: I know that the fact that multiple entries have OXTIME equal to 0 makes it impossible to predict beforehand which entry will be the top one. However, I still expected the order of the entries to be the same every time I call the query (regardless of what I am selecting).
One answer (#GordonLinoff) explains that
[...] the same query can return the results in different order on different runs
Where does this "randomness" come from?
Your ordering is:
ORDER BY category_mapping_view.OXTIME;
And then you state:
There are about 10 entries where OXTIME is 0, and it is those items that get turned around (ordering-wise) as soon as I query for OXPARENTID.
What you have are ties in the keys. The results can be in any order -- and the same query can return the results in different order on different runs. Technically, the ordering in SQL is unstable.
You can fix this by including another column in the ORDER BY so each row is uniquely defined by the ORDER BY keys. Perhaps that is OXID:
ORDER BY category_mapping_view.OXTIME, category_view.OXID;
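Applied to the full query from the question, that would look like this (OXID acting as the tiebreaker makes the ordering deterministic):
SELECT
category_view.OXID,
category_view.OXTITLE,
category_view.OXPARENTID
FROM oxv_oxobject2category_6 category_mapping_view
LEFT JOIN oxv_oxcategories_6_fr category_view ON category_view.OXID = category_mapping_view.OXCATNID
WHERE category_mapping_view.OXOBJECTID = '<hashed id>'
AND category_view.OXID IS NOT NULL
AND (category_view.OXACTIVE = 1
AND category_view.OXHIDDEN = '0')
ORDER BY category_mapping_view.OXTIME, category_view.OXID;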
By the way, it is "obvious" that sorting in SQL is unstable. Why? SQL tables represent unordered sets. There is no ordering to fall back on when the keys are the same.

select all from one table but compare to multiple tables

Hello I have the following query:
SELECT *
FROM ams.TestResultHiPot_Archive,ams.unit u
WHERE timestamp >='3/16/2017 20:39 ' AND timestamp <= '3/17/2017, 20:39' AND LOWER(line_code)=LOWER('aac04')
AND LOWER(unitmodelnumber) like LOWER('%%%') AND unitmodelnumber != 'VTI' and u.serial_num=unitserialnumber and u.date_deleted is null
This table has many fields, so I want to stay away from hard-coding each field, and this structure also works for multiple tables.
The only issue I am having is that some of the comparison items are in the units table, so I want to visit that table to compare, and if it matches, include that record in my result from the original TestResultHiPot_Archive table, NOT the units table.
The main issue I am seeing is that the SELECT * is causing both tables to return all of their fields.
Is there a way to use select * ,and still compare 2 tables but get the columns ONLY from one table ?
I have tried right join, left join, inner join but nothing seems to work, they all return all the columns from both tables, maybe I have done it incorrectly, or maybe this can't be done?
I also thought about running a query that selects all the table's fields, storing them in an array, and passing that array as my select parameters; that way I am passing exactly the needed parameters without hardcoding (since I would always consult the table), but that seems like it would take longer since pgsql is slower. Any suggestions are greatly appreciated.
select ams.TestResultHiPot_Archive.*
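In other words, qualify the star with the table you want, so only that table's columns come back. Applied to the query from the question, a sketch keeping the original join and filter conditions would be:
SELECT ams.TestResultHiPot_Archive.*
FROM ams.TestResultHiPot_Archive, ams.unit u
WHERE timestamp >= '3/16/2017 20:39' AND timestamp <= '3/17/2017 20:39'
AND LOWER(line_code) = LOWER('aac04')
AND LOWER(unitmodelnumber) LIKE LOWER('%%%')
AND unitmodelnumber != 'VTI'
AND u.serial_num = unitserialnumber
AND u.date_deleted IS NULL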

Select records with same value of another (MySql)

I have a mysql table that has a column "my_set", a non-unique key, and another column "my_element", a unique key. Each my_set value can correspond to multiple my_element values, whereas each my_element corresponds to only one my_set.
All values in these two columns are unsigned integers (11).
Starting from a my_element value, in a single query and without nested selects, I need to find all other my_element values that have the same my_set.
The solution I would think of is a nested select:
select my_element
from table
where my_set = (
select my_set
from table
where my_element = <elementValue>
)
But, as I explained, I'd like to find a better, maybe faster, way to do it without a subselect, as performance is an issue due to the huge number of similar queries during the db's scheduled maintenance phase.
Also, advice on a better db structure would be appreciated, but currently db refactoring is not allowed.
I am not 100% sure what you are asking, but I will try to answer from what I understood.
I think you need to use a self join to get all the elements related to the given element (i.e., those that share the same my_set). Try the query below.
select t2.my_element
from table t1
join table t2 on t1.my_set = t2.my_set and t2.my_element <> t1.my_element
where t1.my_element = "element";
If it does not work, create an SQL Fiddle with sample data; that would make it easier for us.
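Since the concern is performance during the scheduled maintenance phase, a composite index covering both columns should let this self-join (or the original subselect) be satisfied from the index alone. This is only a suggestion, assuming no such index already exists, and my_table stands in for the real table name:
ALTER TABLE my_table ADD INDEX idx_my_set_element (my_set, my_element);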
You can try the query below. If I understand it correctly, you can use a left outer join between two copies of the table to get the ordered set of sets and elements.
select
a.setid,
b.elementid
from
tableset a
left outer join
tableset b
on
a.setid = b.setid
order by
a.setid, b.elementid
Thanks.

Subquery for fetching table name

I have a query like this :
SELECT * FROM (SELECT linktable FROM adm_linkedfields WHERE name = 'company') as cbo WHERE group='BEST'
Basically, the table name for the main query is fetched through the subquery.
I get an error that #1054 - Unknown column 'group' in 'where clause'
When I investigate (removing the where clause), I find that the query only returns the subquery result at all times.
Subquery table adm_linkedfields has structure id | name | linktable
Currently I am using MySQL with PDO, but the query should be compatible with the major DBs (viz. Oracle, MSSQL, PgSQL and MySQL).
Update:
The subquery should return the name of the table for the main query. In this case it will return tbl_company
The table tbl_company for the main query has this structure :
id | name | group
Thanks in advance.
Dynamic SQL doesn't work like that; what you created is an inline view, so read up on that. What's more, you can't create a dynamic SQL query that will work on every DB. If you have a limited number of link tables you could try using left joins or unions to select from all of them, but unless you have a good reason you don't want that.
Just select the table name in one query and then issue another one to access the right table (by creating the query string in PHP).
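For example, something along these lines (tbl_company being the value your update says the first query returns):
-- query 1: look up the table name
SELECT linktable FROM adm_linkedfields WHERE name = 'company';
-- query 2: built in application code once query 1 has returned 'tbl_company'
-- `group` must be quoted because it is a reserved word (backticks in MySQL, double quotes elsewhere)
SELECT * FROM tbl_company WHERE `group` = 'BEST';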
Here is an issue:
SELECT * FROM (SELECT linktable FROM adm_linkedfields WHERE name = 'company') as cbo
WHERE group='BEST';
You are selecting from a derived table (DT) which contains only one column, linktable, so you can't put any other column in the WHERE clause of the outer block. Think in terms of blocks: the outer SELECT refers to a derived table which contains only one column.
Your problem is similar when you try to do:
create table t1(x1 int);
select * from t1 where z1 = 7; -- error: unknown column z1
Your query is:
SELECT *
FROM (SELECT linktable
FROM adm_linkedfields
WHERE name = 'company'
) cbo
WHERE group='BEST'
First, if you are interested in cross-database compatibility, do not name columns or tables after SQL reserved words. group is a really, really bad name for a column.
Second, the from clause is returning a table containing a list of names (of tables, but that is irrelevant). There is no column called group, so that is the problem you are having.
What can you do to fix this? A naive solution would be to take the subquery, run it, and use the resulting table name in a dynamic statement to execute the query you want.
The fundamental problem is your data structure. Having multiple tables with the same structure is generally a sign of a bad design. You basically have two choices.
One. If you have control over the database structure, put all the data in a single table, linktable for instance. This would have the information for all companies, and a column for group (or whatever you rename it). This solution is compatible across all databases. If you have lots and lots of data in the tables (think tens of millions of rows), then you might think about partitioning the data for performance reasons.
Two. If you don't have control over the data, create a view that concatenates all the tables together. Something like:
create view vw_linktable as
select 'table1' as which, t.* from table1 t union all
select 'table2', t.* from table2 t
This is also compatible across all databases.
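With the view in place, the original filter can then be written against a single object, for example (a sketch; as noted above, group really should be renamed):
select *
from vw_linktable
where `group` = 'BEST';
-- quote the group column (backticks in MySQL, double quotes elsewhere) because it is a reserved word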

Putting together a SQL Stored Proc

So I have a couple of SQL commands that I basically want to turn into a proc, but while doing this, I'd like to optimize them a little bit more.
The first part of it is this:
select tr_reference_nbr
from cfo_daily_trans_hist
inner join cfo_fas157_valuation on fv_dh_daily_trans_hist_id = dh_daily_trans_hist_id
inner join cfo_tran_quote on tq_tran_quote_id = dh_tq_tran_quote_id
inner join cfo_transaction on tq_tr_transaction_id = tr_transaction_id
inner join cfo_fas157_project_valuation ON fpv_fas157_project_valuation_id = fv_fpv_fas157_project_valuation_id AND fpv_status_bit = 1
group by tr_reference_nbr, fv_dh_daily_trans_hist_id
having count(*)>1
This query returns the tr_reference_nbr's that have duplicate data in our system, which needs to be removed. After this is run, I run this other query, pasting in, one at a time, the tr_reference_nbr values that the above query gave me:
select
tr_reference_nbr , dh_daily_trans_hist_id ,cfo_fas157_project_valuation.*,
cfo_daily_trans_hist.* ,
cfo_fas157_valuation.*
from cfo_daily_trans_hist
inner join cfo_fas157_valuation on fv_dh_daily_trans_hist_id = dh_daily_trans_hist_id
inner join cfo_tran_quote on tq_tran_quote_id = dh_tq_tran_quote_id
inner join cfo_transaction on tq_tr_transaction_id = tr_transaction_id
iNNER JOIN cfo_fas157_project_valuation ON fpv_fas157_project_valuation_id = fv_fpv_fas157_project_valuation_id
where
tr_reference_nbr in
(
[PASTEDREFERENCENUMBER]
)
and fpv_status_bit = 1
order by dh_val_time_stamp desc
Now this query gives me a bunch of records for that specific tr_reference_nbr. I then have to look through this data and find the rows that have a matching (duplicate) dh_daily_trans_hist_id. Once this is found, I look and make sure that the following columns also match for that row so I know they are true duplicates: fpv_unadjusted_sponsor_charge, fpv_adjusted_sponsor_charge, fpv_unadjusted_counterparty_charge, and fpv_adjusted_counterparty_charge.
If THOSE all match, I then look to yet another column, fv_create_dt, and make sure that there is less than a minute's difference between the two timestamps there. If there is, I run yet another query on the row that was stored EARLIER, which looks like this:
begin tran
update cfo_fas157_valuation set fpv_status_bit = 0 where fpv_fas157_project_valuation_id = [IDRECIEVEDFROMTHEOTHERTABLE]
commit
As you can see, this is still a very manual process even though we do have a few queries written, but I'm trying to find a solution where we can just run one query that would basically do EVERYTHING except the final update. So basically something that would provide us with the few fpv_fas157_project_valuation_id's that need to be updated.
From looking at these queries, do any of you guys see an easy way to combine all this? I've been working on it all day and can't seem to get something to run. I feel like I keep screwing up the joins and stuff.
Thanks!
You can combine these queries in multiple ways:
use temporary tables to store results of queries - suitable for stored procedure
use table variables to store results of queries - suitable for stored procedure
use Common Table Expressions (CTEs) to store results of queries - suitable for single query
Once you have them in separate tables/variables/CTEs, you can easily join them.
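For instance, the duplicate-finding query from the question could be materialized inside the procedure like this (a sketch, assuming SQL Server temp-table syntax since the question is about a stored proc and mentions Denali):
SELECT tr_reference_nbr, fv_dh_daily_trans_hist_id
INTO #duplicates
FROM cfo_daily_trans_hist
INNER JOIN cfo_fas157_valuation ON fv_dh_daily_trans_hist_id = dh_daily_trans_hist_id
INNER JOIN cfo_tran_quote ON tq_tran_quote_id = dh_tq_tran_quote_id
INNER JOIN cfo_transaction ON tq_tr_transaction_id = tr_transaction_id
INNER JOIN cfo_fas157_project_valuation ON fpv_fas157_project_valuation_id = fv_fpv_fas157_project_valuation_id AND fpv_status_bit = 1
GROUP BY tr_reference_nbr, fv_dh_daily_trans_hist_id
HAVING COUNT(*) > 1;
The detail query from the question can then join to #duplicates instead of pasting reference numbers in by hand.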
Then you have to do one more thing, and that is find the difference in datetime between two consecutive rows. There is a trick to do this:
use ROW_NUMBER() to add a row-number column, partitioned by the grouping fields (tr_reference_nbr, ...) and ordered by fv_create_dt
do a self join on A.ROW_NUMBER = B.ROW_NUMBER + 1
check the difference between A.fv_create_dt and B.fv_create_dt to filter the rows with a difference of less than a minute
Just test your self-join well to make sure you filter only the rows you need to filter.
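A rough, untested sketch of that self-join, wrapping the detail query from the question in a CTE and keeping only the columns that take part in the duplicate check:
WITH numbered AS (
    SELECT
        tr_reference_nbr,
        dh_daily_trans_hist_id,
        fpv_fas157_project_valuation_id,
        fpv_unadjusted_sponsor_charge,
        fpv_adjusted_sponsor_charge,
        fpv_unadjusted_counterparty_charge,
        fpv_adjusted_counterparty_charge,
        fv_create_dt,
        ROW_NUMBER() OVER (
            PARTITION BY tr_reference_nbr, dh_daily_trans_hist_id
            ORDER BY fv_create_dt
        ) AS rn
    FROM cfo_daily_trans_hist
    INNER JOIN cfo_fas157_valuation ON fv_dh_daily_trans_hist_id = dh_daily_trans_hist_id
    INNER JOIN cfo_tran_quote ON tq_tran_quote_id = dh_tq_tran_quote_id
    INNER JOIN cfo_transaction ON tq_tr_transaction_id = tr_transaction_id
    INNER JOIN cfo_fas157_project_valuation ON fpv_fas157_project_valuation_id = fv_fpv_fas157_project_valuation_id
    WHERE fpv_status_bit = 1
)
SELECT A.fpv_fas157_project_valuation_id        -- the EARLIER of the two rows, to be flagged
FROM numbered A
JOIN numbered B
    ON  A.tr_reference_nbr = B.tr_reference_nbr
    AND A.dh_daily_trans_hist_id = B.dh_daily_trans_hist_id
    AND B.rn = A.rn + 1                          -- B is the next row by fv_create_dt
WHERE A.fpv_unadjusted_sponsor_charge = B.fpv_unadjusted_sponsor_charge
  AND A.fpv_adjusted_sponsor_charge = B.fpv_adjusted_sponsor_charge
  AND A.fpv_unadjusted_counterparty_charge = B.fpv_unadjusted_counterparty_charge
  AND A.fpv_adjusted_counterparty_charge = B.fpv_adjusted_counterparty_charge
  AND DATEDIFF(second, A.fv_create_dt, B.fv_create_dt) < 60;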
If you still have problems with this, don't hesitate to leave a comment.
Interesting note: SQL Server Denali has T-SQL enhancements LEAD and LAG to access subsequent and previous row without self-joins.