MySQL: Save Query Results For Subsequent Joins

I am developing an application for my college where users will be able to define a filter and view posts of students that match the filter criteria.
Initially, MySQL is queried to find the user_id of all students that match the filter parameters (year, major, etc.). This result is then used to find the corresponding posts/events linked to those user_ids via JOIN.
QUESTION:
Since the same user_ids are used several times for separate JOIN queries (events, posts, etc.), I was wondering whether it is possible to store the results internally in MySQL to speed up the subsequent JOIN queries that use them.
REJECTED SOLUTIONS:
Use the MySQL query cache - does not apply, as the queries are not the same each time; the initial join sequence is the same, but a different join parameter is then applied to each query.
Pull the data into the API (PHP) and then send each query with a long WHERE user_id IN (#, #, #, ...). There may be 10,000 user ids to send back to MySQL, and the query would be so large that it would offset the JOIN savings.

Don't solve performance problems that don't exist. That is, first try out the various queries. If they meet the performance criteria for the application, move on and do other things. Users are more interested in more features and more stability than in squeezing microseconds out of inner loops.
That said, the normal approach is a temporary table. However, if your joins are properly indexed and the result sets are small (that is, you are not doing full table scans), then the performance gain may be negligible.
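For illustration, a minimal sketch of the temporary-table approach; the filter values are stand-ins, and the posts/events table and column names are assumptions based on the question:
-- Materialize the matching student ids once per connection
CREATE TEMPORARY TABLE tmp_matched_students AS
SELECT user_id FROM students WHERE major = 'CS' AND year = 2015;  -- stand-in filter criteria
ALTER TABLE tmp_matched_students ADD INDEX (user_id);

-- Reuse the temporary table for each subsequent join
SELECT p.* FROM tmp_matched_students t JOIN posts  p ON p.user_id = t.user_id;
SELECT e.* FROM tmp_matched_students t JOIN events e ON e.user_id = t.user_id;
A temporary table only lives for the current connection, which fits the per-request pattern described above.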

Another option is a view:
create or replace view database.foo_query_view as select id from students where [match-criteria]
Do note that views are read-only. However, given that you seem to want to do only selects, it should be fine.
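As a quick illustration of how the view could then be reused for each join (the posts table and user_id column are assumptions taken from the question):
SELECT p.*
FROM database.foo_query_view AS v
JOIN posts AS p ON p.user_id = v.id;
Note that MySQL does not cache the view's result set; the underlying filter runs again each time the view is referenced, so the gain is reusability of the filter logic rather than stored results.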

Related

MySQL - How to maintain acceptable response time while querying with many filters (Should I use Redis?)

I have a table called users with a couple dozen columns such as height, weight, city, state, country, age, gender etc...
The only keys/indices on the table are for the columns id and email.
I have a search feature on my website that filters users based on these various columns. The query could contain anywhere from zero to a few dozen different where clauses (such as where `age` > 40).
The search is set to LIMIT 50 and ORDER BY `id`.
There are about 100k rows in the database right now.
If I perform a search with zero filters or loose filters, mysql basically just returns the first 50 rows and doesn't have to read very many more rows than that. It usually takes less than 1 second to complete this type of query.
If I create a search with a lot of complex filters (for instance, 5+ where clauses), MySQL ends up reading through the entire database of 100k rows, trying to accumulate 50 valid rows, and the resulting query takes about 30 seconds.
How can I more efficiently query to improve the response time?
I am open to using caching (I already use Redis for other caching purposes, but I don't know where to start with properly caching a MySQL table).
I am open to adding indices, although there are a lot of different combinations of where clauses that can be built. Also, several of the columns are JSON where I am searching for rows that contain certain elements. To my knowledge I don't think an index is a viable solution for that type of query.
I am using MySQL version 8.0.15.
In general you need to create indexes for the columns that are mentioned in the criteria of the WHERE clauses. You can also create indexes on JSON columns by using a generated column index: https://dev.mysql.com/doc/refman/8.0/en/create-table-secondary-indexes.html.
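A minimal sketch of that approach, assuming one of the JSON columns is called preferences and you filter on a scalar attribute inside it (these names are illustrative only):
-- Extract the JSON attribute into a generated column and index it
ALTER TABLE users
    ADD COLUMN pref_language VARCHAR(32)
        GENERATED ALWAYS AS (preferences->>'$.language') STORED,
    ADD INDEX idx_pref_language (pref_language);

-- Filters on the generated column can now use the index
SELECT id FROM users WHERE pref_language = 'en' ORDER BY id LIMIT 50;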
Per the responses in the comments from ysth and Paul, the problem was just the server capacity. After upgrading to an 8GB RAM server, the query times dropped to under 1s.

Performance, Why JOIN is faster than IN

I tried to optimize some PHP code that performs a lot of queries on different tables (that hold the data).
The logic was to take some fields from each table by neighborhood id(s), depending on whether it was a city (a lot of neighborhood ids) or a specific neighborhood.
For example, assume that I have 10 tables of this format:
neighborhood_id | some_data_field
The queries were something like that:
SELECT `some_data_field`
FROM `table_name` AS `data_table`
LEFT JOIN `neighborhoods_table` AS `neighborhoods` ON `data_table`.`neighborhood_id` = `neighborhoods`.`neighborhood_id`
WHERE `neighborhoods`.`city_code` = SOME_ID
Because there were about 10 queries like that, I tried to optimize the code by removing the join from the 10 queries and performing one query against the neighborhoods table to get all the neighborhood codes.
Then in each query I did a WHERE IN on the neighborhood ids.
The expected result was better performance, but it turns out that it wasn't.
When I send a request to my server, the first query takes 20ms, the second takes more, the third takes more, and so on (the second and third take something like 200ms). With JOIN, the first query takes 40ms, but the rest of the queries take 20ms-30ms.
The first query in each request suggests that WHERE IN is faster, but I assume that MySQL does some caching when dealing with JOINs.
So I wanted to know how I can improve my WHERE IN queries.
EDIT
I read the answer and the comments and I realized I didn't explain well why I have 10 tables: each table is categorized by property.
For example, one table contains values by floor, one by rooms, and one by date,
so it isn't possible to union all the tables into one table.
Second Edit
I'm still being misunderstood.
I don't have only one data column per table; every table has its own set of fields. It can be 5 fields for one table and 3 for another, with different data types or formatting (it can be a date or a monetary presentation). Additionally, I perform some calculations on those fields in my queries; sometimes it is an AVG or a weighted average, and in some tables it is only a pure select.
Additionally, I GROUP BY some fields; in one table it can be by rooms and in another it can be by floor.
For example, assume that I have 10 tables of this format:
This is the basis of your problem. Don't store the same information in multiple tables. Store the results in a single table and let MySQL optimize the query.
If the original table had "information" -- say the month the data was generated -- then you may need to include this as an additional column.
Once the data is in a single table, you can use indexes and partitioning to speed the queries.
Note that storing the data in a single table may require changes to your ingestion processes -- namely, inserting the data rather than creating a new table. But your queries will be simpler and you can optimize the database.
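For illustration, a minimal sketch of the single-table layout (the names are assumptions, and this glosses over the later edit about the tables having different columns):
CREATE TABLE neighborhood_data (
    neighborhood_id INT NOT NULL,
    category        VARCHAR(32) NOT NULL,     -- e.g. 'floor', 'rooms', 'date'
    some_data_field DECIMAL(12,2),
    KEY idx_category_neighborhood (category, neighborhood_id)
);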
As for which is faster, an IN or a JOIN: both do similar things under the hood. In some circumstances one or the other is faster, but both should make use of indexes and partitions if they are available.
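For reference, the two forms being compared look roughly like this; with an index on neighborhood_id, both can be resolved efficiently (the aliases and the literal city code are placeholders):
-- JOIN form
SELECT d.some_data_field
FROM table_name AS d
JOIN neighborhoods_table AS n ON n.neighborhood_id = d.neighborhood_id
WHERE n.city_code = 123;

-- IN form, using a subquery rather than a pre-fetched id list
SELECT d.some_data_field
FROM table_name AS d
WHERE d.neighborhood_id IN (SELECT neighborhood_id FROM neighborhoods_table WHERE city_code = 123);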

Shall I get the count of records for each category by using COUNT(*) or a separate count table? Or any other way?

I am developing a website using ASP.net and my DB is MySQL.
Users can post ads in each category, and I want to display how many ads there are for each category, in front of the category name.
Like this.
To achieve this, I am currently using code similar to this:
SELECT b.name, COUNT(*) AS count
FROM `vehicle_cat` a
INNER JOIN `vehicle_type` b
ON a.`type_id_ref` = b.`vehicle_type_id`
GROUP BY b.name
This is my explain result
So assume I have 200,000 records for each category.
So am I doing the right thing, considering performance and efficiency?
What if I maintain a separate table to store the count for each category? When a user saves a record in a category, I increment the value for the corresponding type. Assume 100,000 users will post records at once. Will it crash my DB?
Or are there any other solutions?
Start by developing the application using the query. If performance is a problem, then create indexes to optimize the query. If indexes are not sufficient, then think about partitioning.
Things not to do:
Don't create a separate table for each category.
Don't focus on performance before you have a performance problem. Do reasonable things, but get the functionality to work first.
If you do need to maintain counts in a separate table for performance reasons, you will probably have to maintain them using triggers.
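If it does come to that, a rough sketch of the trigger approach (the summary table and its columns are assumptions, not part of the question's schema):
-- Hypothetical summary table keyed by vehicle type
CREATE TABLE vehicle_type_counts (
    vehicle_type_id INT PRIMARY KEY,
    ad_count        INT NOT NULL DEFAULT 0
);

-- Keep the count in sync on insert; a matching AFTER DELETE trigger would decrement it
CREATE TRIGGER trg_vehicle_cat_count
AFTER INSERT ON vehicle_cat
FOR EACH ROW
    INSERT INTO vehicle_type_counts (vehicle_type_id, ad_count)
    VALUES (NEW.type_id_ref, 1)
    ON DUPLICATE KEY UPDATE ad_count = ad_count + 1;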
You can use any caching solution, probably in-memory caching like Redis or Memcached, and store your counters there. On cache initialization, populate them with your SQL script; later, change these counters when adding or deleting ads. It will be faster than storing them in SQL.
But you should probably check whether COUNT(*) is really a heavy operation in your SQL database. The SQL engine is clever, and maybe this SELECT works fast enough or can be optimized well. If it works, you'd better do nothing until you have performance problems!
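One way to check, using the tables from the question (the index name is an assumption; skip the ALTER if the index already exists):
-- See whether the count query is using an index
EXPLAIN
SELECT b.name, COUNT(*) AS count
FROM vehicle_cat a
INNER JOIN vehicle_type b ON a.type_id_ref = b.vehicle_type_id
GROUP BY b.name;

-- If type_id_ref is unindexed, an index usually makes the join and count cheap
ALTER TABLE vehicle_cat ADD INDEX idx_type_id_ref (type_id_ref);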

Return number of related records for the results of a query

I have 2 related tables, Tapes & Titles, related through the field TapeID:
Tapes.TapeID & Titles.TapeID
I want to be able to query the Tapes table on the Artist column and then return the number of titles for each of the matching Artist records.
My query is as follows:
SELECT Tapes.Artist,COUNT(Titles.TapeID)
FROM Tapes
INNER JOIN Titles on Titles.TapeID=Tapes.TapeID
GROUP BY Tapes.Artist
HAVING Tapes.Artist LIKE '<ArtistName%>'
The query appears to run then seems to go into an indefinite loop
I get no syntax errors and no results
Please point out the error in my query
Here are two likely culprits for this poor performance. The first would be the lack of index on Tapes.TapeId. Based on the naming, I would expect this to be the primary key on the Tapes table. If there are no indexes, then you could get poor performance.
The second would involve the selectivity of the having clause. As written, MySQL is going to aggregate all the data for the group by and then filter out the groups. In many cases, this would not make much of a difference. But, if you have lots of data and the condition is selective (meaning few rows match), then moving it to a where clause would make a difference.
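For illustration, the same query with the filter moved into a WHERE clause (the question's placeholder is kept as-is):
SELECT Tapes.Artist, COUNT(Titles.TapeID)
FROM Tapes
INNER JOIN Titles ON Titles.TapeID = Tapes.TapeID
WHERE Tapes.Artist LIKE '<ArtistName%>'
GROUP BY Tapes.Artist;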
There are definitely other possibilities. For instance, the server could be processing other queries. An update query could be locking one of the tables. Or, the columns TapeId could have different types in the two tables.
You can modify your question to include the definition of the two tables. Also, put explain before the query and include the output in the question. This indicates the execution plan chosen by MySQL.

MySQL performance on storing and returning ids

I have an API where I need to log which ids from a table were returned in a query, and in another query, return results sorted based on the log of ids.
For example:
The tables products and users each have a PK called id. I can create a log table with one insert/update per returned id. I'm wondering about the performance and the design of this.
Essentially, for each returned ID in the API, I would:
INSERT INTO log (product_id, user_id, counter)
VALUES (#the_product_id, #the_user_id, 1)
ON DUPLICATE KEY UPDATE counter=counter+1;
I'd either have an id column as the PK, or a combination of product_id and user_id (alternatively, having those two as a UNIQUE index).
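For reference, the composite-key variant might look like this (a sketch only; the question leaves the final key choice open):
CREATE TABLE log (
    product_id INT UNSIGNED NOT NULL,
    user_id    INT UNSIGNED NOT NULL,
    counter    INT UNSIGNED NOT NULL DEFAULT 1,
    PRIMARY KEY (product_id, user_id)   -- this is the key ON DUPLICATE KEY UPDATE fires on
);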
So the first issue is the performance of this (20 insert/updates and their effect on my SELECT calls in the API) - is there a better/smarter way to log these IDs? Extracting them from the webserver log?
Second is the performance of the select statements to include the logged data, to allow a user to see new products every request (a simplified example, I'd specify the table fields instead of * in real life):
SELECT p.*, IFNULL(
    (SELECT log.counter
     FROM log
     WHERE log.product_id = p.id
       AND log.user_id = #the_user_id)
, 0) AS seen_by_user
FROM products AS p
ORDER BY seen_by_user ASC
In our database, the products table has millions of rows, and the users table is growing rapidly. Am I right in my thinking to do it this way, or are there better ways? How do I optimize the process, and are there tools I can use?
Callie, I just wanted to flag a different perspective to keymone, and it doesn't fit into a comment hence this answer.
Performance is sensitive to the infrastructure environment: are you running in a shared hosting service (SHS), a private virtual server (PVS) or dedicated server, or even a multi-server configuration with separate web and database servers?
What are your transaction rates and volumetrics? How many insert/updates are you doing per minute in your two peak trading hours of the day? What are your integrity requirements with respect to the staleness of the log counters?
Yes, keymone's points are appropriate if you are doing, say, 3-10 updates per second, and as you move into this domain, some form of collection process to batch up inserts and allow bulk insert becomes essential. But just as important here are the choice of storage engine, the transactional vs. batch split, and the choice of infrastructure architecture itself (in-server DB instance vs. separate DB server, master/slave configurations, ...).
However, if you are averaging <1/sec then INSERT ON DUPLICATE KEY UPDATE has comparable performance to the equivalent UPDATE statements and it is the better approach if doing single row insert/updates as it ensures ACID integrity of the counts.
Any form of PHP process start-up will typically take ~100 ms on your web server, so even thinking of doing an asynchronous update this way is just plain crazy, as the performance hit is significantly larger than the update itself.
Your SQL statement just doesn't jibe with your comment that you have "millions of rows" in the products table, as it will do a full fetch of the products table, executing a correlated subquery on every row. I would have used a LEFT OUTER JOIN myself, with some sort of strong constraint to filter which product items are appropriate to this result set. However it runs, it will take materially longer to execute than any count update.
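A sketch of the LEFT OUTER JOIN form being suggested, keeping the question's #placeholders; the WHERE filter and LIMIT are assumed stand-ins for whatever "strong constraint" fits the application:
SELECT p.*, IFNULL(l.counter, 0) AS seen_by_user
FROM products AS p
LEFT OUTER JOIN log AS l
       ON l.product_id = p.id
      AND l.user_id = #the_user_id
WHERE p.category_id = #some_category   -- assumed filter to avoid a full scan of products
ORDER BY seen_by_user ASC
LIMIT 50;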
You will have really bad performance with such an approach.
MySQL is not exactly well suited for logging, so here are a few steps you might take to achieve good performance:
Instead of maintaining the stats table on the fly (the ON DUPLICATE KEY UPDATE bit, which will absolutely destroy your performance), have a single raw logs table where you just do inserts, and every now and then (say, daily) run a script that aggregates data from that table into the real statistics table.
Instead of having a single statistics table, have daily stats, monthly stats, etc. Aggregation jobs would then build up data from already aggregated stuff - awesome for performance. It also allows you to drop (or archive) stats granularity over time - who the hell cares about daily stats in 2 years' time? Or at least about "real-time" access to those stats.
Instead of inserting into the log table, use something like syslog-ng to gather such information into log files (much less load on the MySQL server[s]) and then aggregate the data into MySQL from the raw text files (many choices here; you can even import the raw files back into MySQL if your aggregation routine really needs some SQL flexibility).
That's about it.
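To make the first point concrete, a rough sketch of a raw log table plus a daily roll-up; the table and column names are assumptions, and log_stats is assumed to have a unique key on (product_id, user_id):
-- Append-only raw log: inserts only, no ON DUPLICATE KEY UPDATE
CREATE TABLE log_raw (
    product_id INT UNSIGNED NOT NULL,
    user_id    INT UNSIGNED NOT NULL,
    logged_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    KEY idx_logged_at (logged_at)
);

-- Daily job: fold yesterday's raw rows into the aggregated stats table
INSERT INTO log_stats (product_id, user_id, counter)
SELECT product_id, user_id, COUNT(*)
FROM log_raw
WHERE logged_at >= CURRENT_DATE - INTERVAL 1 DAY
  AND logged_at <  CURRENT_DATE
GROUP BY product_id, user_id
ON DUPLICATE KEY UPDATE counter = counter + VALUES(counter);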