Note: I am using an AsyncSession, so there is no "query" method attached to it.
I am trying to build the following subquery (referred to below as the distance subquery):
WITH some_table AS
    (SELECT asset_id, {some_math_functions} AS distance FROM table)
SELECT * FROM some_table WHERE distance < threshold
After building the subquery, I want to join it with other queries and do some more operations there.
select(selections).join(
table1, table1.model_id == models.id).join(
..................).join(
..................).join(
distance_subquery, distance_subquery.asset_id == table1.asset_id)
And then perform the final asyncsession.execute(selections.filter(..).order_by(...))
I have not found good resources on taking a subquery written in raw SQL and combining it with other subqueries in SQLAlchemy. Any help will be highly appreciated.
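A rough sketch of one way to do this (the table and column names here are made up for illustration): wrap the raw SQL in text(), declare its result columns with .columns() so SQLAlchemy knows what the subquery exposes, turn it into a subquery, and join it into a select(). The example uses a synchronous engine for brevity; the same select() statement can be passed to AsyncSession.execute() and awaited.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        column, create_engine, select, text)

# Hypothetical schema standing in for the real tables in the question.
engine = create_engine("sqlite://")
metadata = MetaData()
assets = Table(
    "assets", metadata,
    Column("asset_id", Integer, primary_key=True),
    Column("name", String),
    Column("x", Integer),
)
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(assets.insert(), [
        {"asset_id": 1, "name": "a", "x": 3},
        {"asset_id": 2, "name": "b", "x": 50},
    ])

# Raw-SQL distance subquery; .columns() declares what it exposes so it
# can participate in joins like any other selectable.
distance_sq = (
    text("SELECT asset_id, abs(x - 5) AS distance FROM assets")
    .columns(column("asset_id"), column("distance"))
    .subquery("distance_sq")
)

# Join the raw-SQL subquery against a regular Core table, then filter/order.
stmt = (
    select(assets.c.name, distance_sq.c.distance)
    .join(distance_sq, distance_sq.c.asset_id == assets.c.asset_id)
    .where(distance_sq.c.distance < 10)
    .order_by(distance_sq.c.distance)
)
with engine.connect() as conn:
    rows = conn.execute(stmt).all()
```

The key step is .columns(): without it, the TextClause has no column metadata, so there is nothing for the join condition to reference.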
I have a MySQL query to get the count of all places in an area. If I query for only one id it's really quick; if I query for two or more ids, it's really slow.
Areas.geometry and Places.location have SPATIAL indexes.
There are only 3 rows in the areas table (all have complex geometry; row 3 is the most complex) and 3000 rows in places. I built a demo SQL file to import if you want to test: geospatial-exemple.sql
Some examples:
This query runs in 260ms:
select a.name,
(
SELECT count(*)
FROM places p
WHERE ST_Contains(a.geometry,p.location)
) as places_count
FROM areas a
WHERE a.id in (1)
This query runs in 320ms:
select a.name,
(
SELECT count(*)
FROM places p
WHERE ST_Contains(a.geometry,p.location)
) as places_count
FROM areas a
WHERE a.id in (3)
This query runs in 50s:
select a.name,
(
SELECT count(*)
FROM places p
WHERE ST_Contains(a.geometry,p.location)
) as places_count
FROM areas a
WHERE a.id in (1,3)
I also tried hardcoding areas.geometry in the query, using the most complex MULTIPOLYGON.
This query runs in 380ms:
select a.name,
(
SELECT count(*)
FROM places p
WHERE ST_Contains(ST_GeomFromText("MULTIPOLYGON((...))",
4326,
'axis-order=long-lat'),p.location)
) as places_count
FROM areas a
WHERE a.id in (1,3)
So it is clearly faster to run multiple queries than a single one that takes nearly a minute. Does anybody know whether this is a MySQL bug, or whether there is another way to do this?
Rewriting it as a JOIN query gives the same results.
According to John Powell's answer here, there is an undocumented limitation on spatial indexes:
For the Contains and Intersects functions to work properly, and for the index to be used, you need to have one of the geometries be a constant. This doesn't appear to be documented, although all the examples you will see with MySQL with Intersects/Contains work this way.
So running multiple queries with one area each would indeed be faster.
If you have the permissions to create functions, you can however work around this by running your subquery in a function, where areas.geometry will act as a constant parameter for ST_Contains():
CREATE FUNCTION fn_getplacescount(_targetarea GEOMETRY)
RETURNS INT READS SQL DATA
RETURN (SELECT COUNT(*) FROM places p WHERE ST_Contains(_targetarea, p.location));
Now
SELECT a.name, fn_getplacescount(a.geometry) AS places_count
FROM areas a WHERE a.id in (1,3);
would be similar to running each area separately, and should have an execution time similar to using two separate queries.
I would try to express it as a join and see whether MySQL runs it faster. I am not sure whether MySQL has optimized spatial joins, but this would be faster in the databases I have worked with.
Something like this (I did not check the syntax):
SELECT a.name, count(*) as places_count
FROM places p JOIN areas a
ON ST_Contains(a.geometry, p.location)
WHERE a.type = "city"
GROUP BY 1;
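Setting the spatial predicate aside, the two query shapes (correlated scalar subquery vs. JOIN + GROUP BY) can be compared on any engine. This sketch uses SQLite with a plain equality join standing in for ST_Contains(), and a LEFT JOIN plus count(p.id) so that areas containing no places still show a zero count (an INNER JOIN would silently drop them):

```python
import sqlite3

# Hypothetical miniature schema standing in for areas/places; the spatial
# predicate is replaced by a plain equality join so the example runs on SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE areas (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE places (id INTEGER PRIMARY KEY, area_id INTEGER);
INSERT INTO areas VALUES (1, 'north'), (2, 'south'), (3, 'empty');
INSERT INTO places (area_id) VALUES (1), (1), (2);
""")

# Correlated scalar subquery: one count per area, zero-count areas included.
sub = conn.execute("""
    SELECT a.name,
           (SELECT count(*) FROM places p WHERE p.area_id = a.id) AS places_count
    FROM areas a ORDER BY a.id
""").fetchall()

# JOIN + GROUP BY: LEFT JOIN and count(p.id) keep the zero-count areas too.
joined = conn.execute("""
    SELECT a.name, count(p.id) AS places_count
    FROM areas a LEFT JOIN places p ON p.area_id = a.id
    GROUP BY a.id, a.name ORDER BY a.id
""").fetchall()
```

Both forms return the same rows here; whether the join form is faster depends on the engine's optimizer, which is exactly the point under discussion.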
I am having trouble finding the syntax issue that is causing the following query to give me no results:
SELECT Table1.Country, Table_Data2.Part, Table_Data2.Description, Sum(Table_Data2.Quantity) AS Quantity, Table1.ship_time
FROM Table1 INNER JOIN Table_Data2 ON Table1.CodeValue = Table_Data2.CodeValue
GROUP BY Table1.Country, Table_Data2.Part, Table_Data2.Description, Table1.ship_time
HAVING (((Table_Data2.Part)="BB1234" Or (Table_Data2.Part)="BB-3454") AND ((Table1.ship_time)=Date()));
This should result in a table that looks like this:
Example of what result should look like
Instead, no syntax error arises, but no records load either.
There seems to be a syntax issue in the code above, as it does not work in MySQL the way it does in MS Access.
A few possible corrections:
To get the current date in MySQL, use Current_Date(). MySQL's Date() function behaves differently: it extracts the date part of a date(time) expression.
Parentheses around bare field names are unnecessary. Use table aliases in multi-table queries for code clarity and readability.
Moreover, the conditions in your Having clause are better suited to the Where clause: they are not aggregated values, and you are grouping on those same fields anyway. Your query will be more performant if you move them to the Where clause, since MySQL will then aggregate over already-filtered (reduced) data, minimizing temp table space.
Also, you can rewrite multiple OR conditions on the same field as IN (...).
You can rewrite as:
SELECT
t1.Country,
t2.Part,
t2.Description,
Sum(t2.Quantity) AS Quantity,
t1.ship_time
FROM Table1 AS t1
INNER JOIN Table_Data2 AS t2
ON t1.CodeValue = t2.CodeValue
WHERE
t2.Part IN ('BB1234', 'BB-3454')
AND t1.ship_time = Current_Date()
GROUP BY
t1.Country,
t2.Part,
t2.Description,
t1.ship_time
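The HAVING-to-WHERE move can be sanity-checked on SQLite. The date functions differ between engines, so this hypothetical data drops the date condition and keeps only the Part filter; the point is solely the placement of the non-aggregated filter:

```python
import sqlite3

# Miniature stand-ins for Table1/Table_Data2; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (CodeValue TEXT, Country TEXT);
CREATE TABLE Table_Data2 (CodeValue TEXT, Part TEXT, Quantity INTEGER);
INSERT INTO Table1 VALUES ('c1', 'US'), ('c2', 'DE');
INSERT INTO Table_Data2 VALUES
    ('c1', 'BB1234', 5), ('c1', 'BB1234', 2), ('c2', 'ZZ-999', 9);
""")

# Filtering after grouping, on non-aggregated columns (the original shape).
having = conn.execute("""
    SELECT t1.Country, t2.Part, sum(t2.Quantity) AS Quantity
    FROM Table1 t1 JOIN Table_Data2 t2 ON t1.CodeValue = t2.CodeValue
    GROUP BY t1.Country, t2.Part
    HAVING t2.Part IN ('BB1234', 'BB-3454')
""").fetchall()

# Same filter moved to WHERE: rows are discarded before aggregation.
where = conn.execute("""
    SELECT t1.Country, t2.Part, sum(t2.Quantity) AS Quantity
    FROM Table1 t1 JOIN Table_Data2 t2 ON t1.CodeValue = t2.CodeValue
    WHERE t2.Part IN ('BB1234', 'BB-3454')
    GROUP BY t1.Country, t2.Part
""").fetchall()
```

The results are identical because the filtered columns are also grouping keys; the WHERE form simply aggregates fewer rows.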
I am querying a view with a single predicate, which gives me the records in 4-7 seconds, but when I run the same predicate directly against the view's underlying query, it gives me the records in under a second. I am using MySQL.
I compared the execution plans of both queries, and they show major differences when there are hundreds of thousands of records in the tables.
So, any clue or idea why performance is better when using the query directly?
Following is my view definition:
SELECT entity_info.source_entity_info_id AS event_sync_id,
entity_info.source_system_id AS source_system_id,
entity_info.target_system_id AS destination_system_id,
event_sync_info.integrationid AS integration_id,
event_sync_info.source_update_time AS last_updated,
entity_info.source_internal_id AS source_entity_internal_id,
entity_info.source_entity_project AS source_entity_project,
entity_info.target_internal_id AS destination_entity_internal_id,
entity_info.destination_entity_project AS destination_entity_project,
entity_info.source_entity_type AS source_entity_type,
entity_info.destination_entity_type AS destination_entity_type,
event_sync_info.opshub_update_time AS opshub_update_time,
event_sync_info.entity_info_id AS entity_info_id,
entity_info.global_id AS global_id,
entity_info.target_entity_info_id AS target_entity_info_id,
entity_info.source_entity_info_id AS source_entity_info_id,
(
    SELECT Count(*)
    FROM ohrv_failed_event_view_count failed_event_view
    WHERE failed_event_view.integration_id = event_sync_info.integrationid
      AND failed_event_view.entityinfo = entity_info.source_entity_info_id
) AS no_of_failures
FROM ohrv_entity_info entity_info
LEFT JOIN ohmt_eai_event_sync_info event_sync_info
    ON entity_info.source_entity_info_id = event_sync_info.entity_info_id
WHERE entity_info.source_entity_info_id IS NOT NULL
Query examples
select * from view where integration_id=10
The execution plan for this processes 142668 rows for the subquery in the view.
select QUERY_OF_VIEW and integration_id=10
The execution plan for this looks good, and only the required rows are processed.
I think the issue is in the following query:
SELECT * FROM view WHERE integration_id = 10;
This forces MySQL to materialize an intermediate table, against which it then has to query again to apply the restriction in the WHERE clause. On the other hand, in the second version:
SELECT (QUERY_OF_VIEW with WHERE integration_id = 10)
MySQL does not have to materialize anything other than the query in the view itself. That is, in your second version MySQL just has to execute the query in the view, without any subsequent subquery.
Referring to this link in the documentation, you can see that it depends on whether the MERGE algorithm can be used; if it can, it will be. If it is not applicable, a new temp table must be generated to resolve the data. See also this answer, which discusses optimization and when you should and should not use a view.
If the MERGE algorithm cannot be used, a temporary table must be used
instead. MERGE cannot be used if the view contains any of the
following constructs:
Aggregate functions (SUM(), MIN(), MAX(), COUNT(), and so forth)
DISTINCT
GROUP BY
HAVING
LIMIT
UNION or UNION ALL
Subquery in the select list
Refers only to literal values (in this case, there is no underlying
table)
About query optimization, I'm wondering whether statements like the one below get optimized:
select *
from (
select *
from table1 t1
join table2 t2 using (entity_id)
order by t2.sort_order, t1.name
) as foo -- main query of object
where foo.name = ?; -- inserted
Consider that the query is managed by a dependency object that just (rightly?) allows one to tack on a WHERE condition. I'm thinking that at least not a lot of data gets pulled into your favorite language, but I'm having second thoughts about whether that's an adequate optimization; maybe the database still spends some time working through the query.
Or is it better to take that query out and write a separate query method that has the WHERE, and maybe a LIMIT 1 clause, too?
In MySQL, no.
The predicate in an outer query does not get "pushed" down into the inline view query.
The query in the inline view is processed first, independent of the outer query. (MySQL will optimize that view query just like it would optimize that query if you submitted that separately.)
The way that MySQL processes this query: the inline view query gets run first, and the result is materialized as a 'derived table'. That is, the result set from that query gets stored as a temporary table, in memory in some cases (if it's small enough and doesn't contain any columns that aren't supported by the MEMORY engine); otherwise, it's spun out to disk as a MyISAM table, using the MyISAM storage engine.
Once the derived table is populated, then the outer query runs.
(Note that the derived table does not have any indexes on it. That's true in MySQL versions before 5.6; I think there are some improvements in 5.6 where MySQL will actually create an index.)
Clarification: indexes on derived tables: As of MySQL 5.6.3 "During query execution, the optimizer may add an index to a derived table to speed up row retrieval from it." Reference: http://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html
Also, I don't think MySQL "optimizes out" any unneeded columns from the inline view. If the inline view query is a SELECT *, then all of the columns will be represented in the derived table, whether those are referenced in the outer query or not.
This can lead to some significant performance issues, especially when we don't understand how MySQL processes a statement. (And the way that MySQL processes a statement is significantly different from other relational databases, like Oracle and SQL Server.)
You may have heard a recommendation to "avoid using views in MySQL". The reasoning behind this general advice (which applies to both "stored" views and "inline" views) is the significant performance issues that can be unnecessarily introduced.
As an example, for this query:
SELECT q.name
FROM ( SELECT h.*
FROM huge_table h
) q
WHERE q.id = 42
MySQL does not "push" the predicate id=42 down into the view definition. MySQL first runs the inline view query, and essentially creates a copy of huge_table, as an un-indexed MyISAM table. Once that is done, then the outer query will scan the copy of the table, to locate the rows satisfying the predicate.
If we instead re-write the query to "push" the predicate into the view definition, like this:
SELECT q.name
FROM ( SELECT h.*
FROM huge_table h
WHERE h.id = 42
) q
We expect a much smaller resultset to be returned from the view query, and the derived table should be much smaller. MySQL will also be able to make effective use of an index ON huge_table (id). But there's still some overhead associated with materializing the derived table.
If we eliminate the unnecessary columns from the view definition, that can be more efficient (especially if there are a lot of columns, there are any large columns, or any columns with datatypes not supported by the MEMORY engine):
SELECT q.name
FROM ( SELECT h.name
FROM huge_table h
WHERE h.id = 42
) q
And it would be even more efficient to eliminate the inline view entirely:
SELECT q.name
FROM huge_table q
WHERE q.id = 42
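All four rewrites above are result-equivalent; only their cost differs, and only at scale. A small SQLite sketch (a toy stand-in for huge_table) confirms the equivalence of the unpushed, pushed, and flattened forms:

```python
import sqlite3

# A tiny stand-in for huge_table, just to confirm the rewrites return the
# same rows; the performance difference only shows up with real data volumes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE huge_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO huge_table VALUES (?, ?)",
                 [(i, "row%d" % i) for i in range(1, 101)])

# Predicate left in the outer query (derived table materializes everything).
outer = conn.execute(
    "SELECT q.name FROM (SELECT h.* FROM huge_table h) q WHERE q.id = 42"
).fetchall()

# Predicate pushed into the inline view, unneeded columns trimmed.
pushed = conn.execute(
    "SELECT q.name FROM (SELECT h.name FROM huge_table h WHERE h.id = 42) q"
).fetchall()

# Inline view eliminated entirely.
direct = conn.execute(
    "SELECT q.name FROM huge_table q WHERE q.id = 42"
).fetchall()
```

On MySQL versions without derived-table merging, only the second and third forms can use the index on id, which is the whole argument of this answer.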
I can't speak for MySQL - not to mention the fact that it probably varies by storage engine and MySQL version, but for PostgreSQL:
PostgreSQL will flatten this into a single query. The inner ORDER BY isn't a problem, because adding or removing a predicate cannot affect the ordering of the remaining rows.
It'll get flattened to:
select *
from table1 t1
join table2 t2 using (entity_id)
where t1.name = ?
order by t2.sort_order, t1.name;
then the join predicate will get internally converted, producing a plan corresponding to the SQL:
select t1.col1, t1.col2, ..., t2.col1, t2.col2, ...
from table1 t1, table2 t2
where
t1.entity_id = t2.entity_id
and t1.name = ?
order by t2.sort_order, t1.name;
Example with a simplified schema:
regress=> CREATE TABLE demo1 (id integer primary key, whatever integer not null);
CREATE TABLE
regress=> INSERT INTO demo1 (id, whatever) SELECT x, x FROM generate_series(1,100) x;
INSERT 0 100
regress=> EXPLAIN SELECT *
FROM (
SELECT *
FROM demo1
ORDER BY id
) derived
WHERE whatever % 10 = 0;
QUERY PLAN
-----------------------------------------------------------
Sort (cost=2.51..2.51 rows=1 width=8)
Sort Key: demo1.id
-> Seq Scan on demo1 (cost=0.00..2.50 rows=1 width=8)
Filter: ((whatever % 10) = 0)
Planning time: 0.173 ms
(5 rows)
... which is the same plan as:
EXPLAIN SELECT *
FROM demo1
WHERE whatever % 10 = 0
ORDER BY id;
QUERY PLAN
-----------------------------------------------------------
Sort (cost=2.51..2.51 rows=1 width=8)
Sort Key: id
-> Seq Scan on demo1 (cost=0.00..2.50 rows=1 width=8)
Filter: ((whatever % 10) = 0)
Planning time: 0.159 ms
(5 rows)
If there were a LIMIT, OFFSET, a window function, or certain other constructs that prevent qualifier push-down/pull-up/flattening in the inner query, PostgreSQL would recognise that it can't safely flatten it. It would evaluate the inner query either by materializing it or by iterating over its output and feeding that to the outer query.
The same applies for a view. PostgreSQL will in-line and flatten views into the containing query where it is safe to do so.
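The LIMIT caveat is a correctness issue, not just a planner detail: pushing a predicate inside a LIMIT subquery changes which rows the LIMIT applies to. This SQLite sketch (hypothetical table t) shows the two forms returning different results:

```python
import sqlite3

# Why LIMIT blocks flattening: filtering before vs. after the LIMIT is
# applied selects different rows, so the planner must not move the predicate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, i % 2) for i in range(1, 7)])
# rows: (1,1) (2,0) (3,1) (4,0) (5,1) (6,0)

# Filter applied AFTER the inner LIMIT: odd rows among ids 1..3 only.
outside = conn.execute(
    "SELECT id FROM (SELECT id, val FROM t ORDER BY id LIMIT 3) WHERE val = 1"
).fetchall()

# Filter pushed INSIDE the subquery: the first three odd rows overall.
inside = conn.execute(
    "SELECT id FROM (SELECT id, val FROM t WHERE val = 1 ORDER BY id LIMIT 3)"
).fetchall()
```

Because the two result sets differ, a planner may only flatten when no LIMIT/OFFSET (or similar construct) sits in the inner query, exactly as the answer states.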
Hi, I have this query but it's giving me an error of "Operand should contain 1 column(s)", and I'm not sure why:
Select *,
(Select *
FROM InstrumentModel
WHERE InstrumentModel.InstrumentModelID=Instrument.InstrumentModelID)
FROM Instrument
According to your query, you wanted to get data from the Instrument and InstrumentModel tables. A subquery used in the select list must return a single column, so SELECT * inside it causes this error. To fetch matching results from both tables, you can use a join, or select particular fields as tableName.fieldName and put your matching condition in the WHERE clause,
like:
select Instrument.x,InstrumentModel.y
from instrument,instrumentModel
where instrument.x=instrumentModel.y
You can use a join to select from the two connected tables:
select *
from Instrument i
join InstrumentModel m on m.InstrumentModelID = i.InstrumentModelID
When you use subqueries in the column list, they need to return exactly one value (a single column and at most a single row). You can read more in the documentation.
As a user commented in the documentation, using subqueries like this can ruin your performance:
when the same subquery is used several times, mysql does not use this fact to optimize the query, so be careful not to run into performance problems.
example:
SELECT
col0,
(SELECT col1 FROM table1 WHERE table1.id = table0.id),
(SELECT col2 FROM table1 WHERE table1.id = table0.id)
FROM
table0
WHERE ...
the join of table0 with table1 is executed once for EACH subquery, leading to very bad performance for this kind of query.
Therefore, you should rather join the tables, as described in the other answer.
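The join rewrite of the quoted example can be sketched on SQLite (hypothetical table0/table1 data; a LEFT JOIN preserves the NULLs that the scalar subqueries produce for unmatched rows):

```python
import sqlite3

# Hypothetical table0/table1 matching the quoted example: the two scalar
# subqueries are replaced by a single LEFT JOIN fetching both columns at once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table0 (id INTEGER PRIMARY KEY, col0 TEXT);
CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
INSERT INTO table0 VALUES (1, 'a'), (2, 'b');
INSERT INTO table1 VALUES (1, 'x', 'y');
""")

# Original shape: one correlated lookup per selected column.
scalar = conn.execute("""
    SELECT col0,
           (SELECT col1 FROM table1 WHERE table1.id = table0.id),
           (SELECT col2 FROM table1 WHERE table1.id = table0.id)
    FROM table0 ORDER BY id
""").fetchall()

# Join shape: one lookup total; LEFT JOIN keeps rows with no match as NULLs.
joined = conn.execute("""
    SELECT t0.col0, t1.col1, t1.col2
    FROM table0 t0 LEFT JOIN table1 t1 ON t1.id = t0.id
    ORDER BY t0.id
""").fetchall()
```

Both return the same rows, but the join visits table1 once per row instead of once per subquery per row.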