T-SQL Inner Join Optimization between View and Table-Valued Function

I have a view and a table-valued function, and I make an inner join between them. There are a few million records on each side to be joined.
I have read a lot about how to optimize joins (the most common answer is to use indexes, but my view cannot be an indexed view), yet I still have not found how a join between a view and a function should be written correctly and optimized.
EDIT:
To show that the problem really is the inner join, I ran SELECT COUNT(*) for each query:
View - 0 seconds
Function - 18 seconds
Function inner join View - 42 seconds

Let's look at it from another point of view: do you really need this join on such a huge amount of data?
I had a similar issue; here is my situation and solution:
There was no need for all of the millions of records in my case - just specific filtered data.
I created triggers for insert/update/delete on that huge table that maintain a copy of the data I need in another table (a sketch follows below).
Now I can make fast joins on that specific (let's say filtered) data.
This approach has its own pros and cons; the main inconvenience is that you have to rebuild/review your DB structure, so it may not be suitable in all cases.
Anyway, this is my solution to my particular issue, and it still works fine - it improved performance at least 10 times.
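A rough T-SQL sketch of that trigger idea (every table, column, and filter name here is invented for illustration):

-- Small side table holding only the filtered rows the joins actually need.
CREATE TABLE dbo.TrackedData (
    Id        INT PRIMARY KEY,
    SomeValue INT
);
GO

-- Keep it in sync on insert; similar triggers handle update and delete.
CREATE TRIGGER trg_HugeTable_Insert
ON dbo.HugeTable
AFTER INSERT
AS
BEGIN
    INSERT INTO dbo.TrackedData (Id, SomeValue)
    SELECT i.Id, i.SomeValue
    FROM inserted AS i
    WHERE i.SomeValue > 0;   -- the filter that defines the data you need
END;
GO

The joins are then run against dbo.TrackedData instead of the huge table.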

Related

Performance, Why JOIN is faster than IN

I tried to optimize some PHP code that performs a lot of queries on different tables (that contain data).
The logic was to take some fields from each table by neighborhood id(s), depending on whether it was a city (many neighborhood ids) or a specific neighborhood.
For example, assume that I have 10 tables of this format:
neighborhood_id | some_data_field
The queries were something like that:
SELECT `some_data_field`
FROM `table_name` AS `data_table`
LEFT JOIN `neighborhoods_table` AS `neighborhoods` ON `data_table`.`neighborhood_id` = `neighborhoods`.`neighborhood_id`
WHERE `neighborhoods`.`city_code` = SOME_ID
Because there were about 10 queries like that, I tried to optimize the code by removing the join from all 10 queries and performing one query against the neighborhoods table to get all the neighborhood ids.
Then, in each query, I did WHERE IN on those neighborhood ids, roughly as sketched below.
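In other words, the rewrite looked something like this (reusing the table and column names from the example above):

SELECT `neighborhood_id`
FROM `neighborhoods_table`
WHERE `city_code` = SOME_ID;

-- then, for each of the 10 data tables, with the fetched ids substituted in:
SELECT `some_data_field`
FROM `table_name`
WHERE `neighborhood_id` IN (1, 2, 3);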
The expected result was better performance, but it turned out that it wasn't.
When I send a request to my server, the first query takes 20ms, the second takes more, and the third takes more still (the second and third take something like 200ms); with JOIN, the first query takes 40ms but the rest of the queries take 20ms-30ms.
The first query per request suggests that WHERE IN is faster, but I assume MySQL has some cache that kicks in when dealing with JOINs.
So I wanted to know: how can I improve my WHERE IN queries?
EDIT
I read the answer and the comments, and I realize I didn't explain well why I have 10 tables: each table is categorized by a property.
For example, one table contains values by floor, one by rooms, and one by date,
so it isn't possible to union all the tables into one.
Second Edit
I'm still being misunderstood.
I don't have only one data column per table; every table has its own set of fields - it can be 5 fields for one table and 3 for another - with different data types or formatting, such as a date or a money presentation. Additionally, in my queries I perform some calculations on those fields; sometimes it is an AVG or a weighted average, and in some tables it's just a pure SELECT.
Additionally, I GROUP BY some fields: in one table it can be by rooms, and in another by floor.
For example, assume that I have 10 tables of this format:
This is the basis of your problem. Don't store the same information in multiple tables. Store the results in a single table and let MySQL optimize the query.
If the original table had "information" -- say the month the data was generated -- then you may need to include this as an additional column.
Once the data is in a single table, you can use indexes and partitioning to speed the queries.
Note that storing the data in a single table may require changes to your ingestion processes -- namely, inserting the data rather than creating a new table. But your queries will be simpler and you can optimize the database.
As for which is faster, an IN or a JOIN: both do similar things under the hood. In some circumstances one or the other is faster, but both should make use of indexes and partitions if they are available.
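As an illustration of the single-table idea (the table name, the extra month column, and the data column are hypothetical, following the example format above):

CREATE TABLE neighborhood_data (
    neighborhood_id INT NOT NULL,
    data_month      DATE NOT NULL,       -- the "additional column" mentioned above
    some_data_field DECIMAL(10,2),
    INDEX (neighborhood_id),
    INDEX (data_month)
);

-- The original per-table query then becomes one indexed query:
SELECT d.some_data_field
FROM neighborhood_data AS d
INNER JOIN neighborhoods_table AS n
      ON n.neighborhood_id = d.neighborhood_id
WHERE n.city_code = SOME_ID;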

SQL Optimization: how to JOIN a table with itself

I'm trying to optimize a SQL query and I am not sure if further optimization is possible.
Here's my query:
SELECT someColumns
FROM (((smaller_table))) AS a
INNER JOIN (((smaller_table))) AS b
ON a.someColumnProperty = b.someColumnProperty
...the problem with this approach is that my table has half a trillion records in it. In my query, you'll notice (((smaller_table))). I wrote that as an abbreviation for a SELECT statement being run on MY_VERY_LARGE_TABLE to reduce its size.
(((smaller_table))) appears twice, and the code within is exactly the same both times. There's no reason for me to run the same sub-query twice. This table is several TB, and I shouldn't scan through it twice just to get the same results.
Do you have any suggestions on how I can avoid running the exact same reduction twice? I tried replacing the INNER JOIN line with INNER JOIN a AS b but got an "unrecognized table a" warning. Is there any way to store the value of a so I can reuse it?
Thoughts:
Make sure there is an index on userid and dayid.
I would ask you to define better what it is you are trying to find out.
Examples:
What is the busiest time of the week?
Who are the top 25 people who come to the gym the most often?
Who are the top 25 people who utilize the gym the most? (This is different from the one above because maybe I have a user that comes 5 times a month but stays 5 hours per session, versus a user that comes 30 times a month and stays 0.5 hour per session.)
Maybe laying all the days out horizontally (day1, day2, day3) would make it visually easier to find what you are looking for; a query for that layout is sketched below. You could easily put the result into Excel or LibreOffice and color the populated days to get a visual "picture" of people who come consecutively.
It might be interesting to run this for multiple months to see what the seasonality looks like.
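One way to get that horizontal day1/day2/day3 layout is conditional aggregation (the visits table is assumed here; userid and dayid are the columns mentioned above):

SELECT userid,
       MAX(CASE WHEN dayid = 1 THEN 1 ELSE 0 END) AS day1,
       MAX(CASE WHEN dayid = 2 THEN 1 ELSE 0 END) AS day2,
       MAX(CASE WHEN dayid = 3 THEN 1 ELSE 0 END) AS day3
FROM visits
GROUP BY userid;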
Alas, CTEs are not available in MySQL. The rough equivalent is:
CREATE TABLE tmp (
    INDEX(someColumnProperty)
)
SELECT ...;   -- the (((smaller_table))) subquery goes here
But...
You can't use CREATE TEMPORARY TABLE because such can't be used twice in the same query. (No, I don't know why.)
Adding the INDEX (or PK or ...) during the CREATE (or afterwards) provides the very necessary key for doing the self join.
You still need to worry about DROPping the table (or otherwise dealing with it).
The choice of ENGINE for tmp depends on a number of factors. If you are sure it will be "small" and has no TEXT/BLOB, then MEMORY may be optimal.
In a Replication topology, there are additional considerations.
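Putting those pieces together, a minimal sketch of the whole pattern (the reduction's SELECT list and filter are placeholders for whatever (((smaller_table))) actually does):

-- Materialize the reduction once, with the key needed for the self join.
CREATE TABLE tmp (
    INDEX(someColumnProperty)
)
SELECT someColumnProperty, someColumns
FROM MY_VERY_LARGE_TABLE
WHERE some_filter = 1;

-- Join the small materialized copy to itself instead of scanning twice.
SELECT a.someColumns, b.someColumns
FROM tmp AS a
INNER JOIN tmp AS b
      ON a.someColumnProperty = b.someColumnProperty;

-- The cleanup mentioned above.
DROP TABLE tmp;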

MySQL: Save Query Results For Subsequent Joins

I am developing an application for my college where users will be able to define a filter and view posts of students that match the filter criterion.
Initially, MySQL will run a query to find the user_id of all students that match the filter parameters (year, major, etc.). It will then use this result to query and find the corresponding posts/events linked to those user_ids via JOIN.
QUESTION:
Since the same user_ids are used several times for separate JOIN queries (events, posts, etc.), I was wondering if it is possible to store the intermediate result internally in MySQL to speed up the subsequent JOIN queries that use it.
REJECTED SOLUTIONS:
Use the MySQL query cache - does not apply, as the queries are not the same each time; the initial join sequence is the same, but a different join parameter is applied to each query.
Pull the data into the API (PHP) and then send each query with a long WHERE user_id IN (#, #, #, ...). There may be 10,000 user ids to send back to MySQL; the query would be so large it would offset the JOIN savings.
Don't solve performance problems that don't exist. That is, first try out the various queries. If they meet the performance criteria for the application, move on to other things; users are more interested in more features and more stability than in squeezing microseconds out of inner loops.
That said, the normal approach is a temporary table. However, if your joins are properly indexed and the result sets are small (that is, you are not doing full table scans), the performance gain may be negligible.
Alternatively: CREATE OR REPLACE VIEW database.foo_query_view AS SELECT id FROM students WHERE [match-criteria]. Do note that views are read-only; however, given that you seem to want to do only selects, that should be fine.
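A minimal sketch of the temporary-table approach (the filter columns and the events/posts schemas are assumed from the question):

CREATE TEMPORARY TABLE matched_students AS
SELECT user_id
FROM students
WHERE year = 2024 AND major = 'CS';   -- the filter the user defined

-- Each subsequent JOIN is a separate query, so MySQL's restriction on
-- referencing a TEMPORARY table twice in one query does not apply here.
SELECT e.*
FROM matched_students AS ms
INNER JOIN events AS e ON e.user_id = ms.user_id;

SELECT p.*
FROM matched_students AS ms
INNER JOIN posts AS p ON p.user_id = ms.user_id;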

Efficient Query, Table Bridge/Indexing and Structure

In my phpMyAdmin database I have 9 tables, 8 of which are relevant to me at the moment. I would like my queries to execute quickly enough, but I am not sure the design/structure of the tables is the most efficient. Any suggestion for merging a table with another or creating another bridge table? Also, I am struggling to build a query that will display bridged results from the following tables: semester, schedule, office_hours, faculty, section, major_minor, major_class_br, class.
TABLE STRUCTURE (diagram not reproduced here)
Basic query that shows class details:
SELECT class_name, class_caption, class_credit_hours, class_description
FROM class
Here's a start:
select *
from schedule
inner join semester
      on schedule.semester_id = semester.id
inner join office_hours
      on office_hours.id = schedule.???
It's not clear how office_hours correlates with schedule?
'queries to be executed quickly enough'
INSERT or SELECT?
If you want your INSERTs/UPDATEs to be fast, normalise it to the nth degree.
If you want your SELECTs to be fast, denormalise it (which of course makes INSERTs/UPDATEs complicated and slow).
If your main table (schedule?) has < 10,000 records and it's a decent RDBMS then it's probably running as fast as it can.
Normally the performance tuning process involves
Identifying a workload (what queries do I usually run)
Getting a baseline (how long do they take to run)
Tuning (adding indexes, changing design)
Repeat
So we would really need to have an idea of what kind of queries are performing slowly, or alternatively what kind of growth you expect in which tables.
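As a concrete instance of the tuning step above, an index on the join column used in the starter query is usually the first thing to try (the index name is illustrative):

CREATE INDEX ix_schedule_semester_id ON schedule (semester_id);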
I'm not sure what you mean by 'bridge results'. What happens when you build a query that joins the tables as per the physical diagram? An error, or unexpected results?

SQL Server OUTER JOIN multiple linked fields

I am trying to query data from two tables into one table using an OUTER JOIN. The thing is that three fields are needed to uniquely identify the rows. This brings me to a query containing this expression:
FROM Data1 DB
RIGHT OUTER JOIN Data2 FT
      ON (DB.field1 = FT.Value1
      AND DB.field2 = FT.field2
      AND DB.field3 = FT.field3)
However, the query runs pretty much forever. To test the whole thing, I tried both plain WHERE conditions and a FULL OUTER JOIN: with the WHERE conditions it finishes almost instantly, whereas with the FULL OUTER JOIN I had the same trouble and usually ended up cancelling the whole thing after 5 minutes or so.
Can anyone see what I am doing wrong with my query? Thanks for any help!
Do you really need all the records back from the query? Some WHERE criteria could cut execution time down considerably.
Yes, and indexes. Check the execution plan and create the recommended indexes.
Your best bet is to view the execution plan (and if you are comfortable with it, post a screenshot of it in your question). That'll tell you where the most expensive portion of the query is happening.
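If the plan shows scans on the join, a composite index covering the three join fields on each side is the usual first candidate (index names are illustrative; the columns follow the query above):

CREATE INDEX IX_Data1_Join ON Data1 (field1, field2, field3);
CREATE INDEX IX_Data2_Join ON Data2 (Value1, field2, field3);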