I am making a website and have a series of SQL statements that are used over and over. I'm wondering if there's any way to optimize this process (in terms of performance) using views, procedures, or something else that I don't know about. The backend works like so:
The frontend makes a request to www.api.com/{page}/{user} for the page and user that it needs data for.
The backend receives the request and executes a pre-written prepared sql statement, simply passing in the user's name (each page returns the same amount/type of data, the only difference is what user's data we need to get)
The backend converts the result into json and passes it to the frontend
The MySQL query ends up looking like SELECT * FROM ... WHERE user = :user for each page. Because it's essentially the same query being run over and over, is there any way to optimize this for performance using the various features of MySQL?
Views are syntactic sugar -- no performance gain.
Stored procedures are handy when you can bundle several things together. However, you can do similar stuff with an application subroutine. One difference is that the SP is all performed on the server, thereby possibly avoiding some network lag between the client and server.
Within an SP, there is PREPARE and EXECUTE. This provides only a small performance improvement.
The best help (in your one example) is to have INDEX(user) on that table.
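For example (a minimal sketch; the table and index names are hypothetical stand-ins, since the question does not show the actual schema):
ALTER TABLE user_page_data ADD INDEX idx_user (`user`);
-- check that the index is actually used:
EXPLAIN SELECT * FROM user_page_data WHERE `user` = 'alice';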
Will you be performing a query more than a thousand times a second? If so, we need to dig deeper into more of the moving parts of the app.
"Premature optimization" comes to mind for the simple example given.
My situation:
MySQL 5.5, but possible to migrate to 5.7
Legacy app is executing a single MySQL query to get some data (1-10 rows, 20 columns)
Query can be modified via application configuration
Query is a very complex SELECT with multiple JOINs and conditions; it's about 20KB of code
Query is well profiled and index usage is fine-tuned; I spent much time on this and see no room for improvement without splitting it into smaller queries
With a traditional app I would split this large query into several smaller ones and use caching to avoid many JOINs, but my legacy app does not allow that. I can use only one query to return results
My plan to improve performance is:
Reduce parsing time. Parsing 20KB of SQL on every request, while only parameter values change, seems inefficient
I'd like to turn this query into a prepared statement and only fill the placeholders with data
The query would be parsed once and executed multiple times, which should be much faster
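Roughly what I have in mind, using a trivial stand-in query instead of the real 20KB one:
PREPARE big_report FROM 'SELECT o.id, o.total FROM orders AS o WHERE o.customer_id = ?';
SET @cust := 42;
EXECUTE big_report USING @cust;    -- re-executed with different parameters, no re-parsing
DEALLOCATE PREPARE big_report;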
Problems/questions:
First of all: does the above solution make sense?
MySQL prepared statements seem to be session-related. I can't use them, since I cannot execute any additional code ("init code") to create the statements for each session
The other solution I see is to use a prepared statement generated inside a procedure or function. But the examples I saw rely on dynamically generating queries using CONCAT() and executing the prepared statement locally inside the procedure. It seems that this kind of statement will be prepared on every procedure call, so it will not save any processing time
Is there any way to declare a server-wide, not session-related, prepared statement in MySQL, so that it survives application restarts and server restarts?
If not, is it possible to cache prepared statements declared in functions/procedures?
I think the following will achieve your goal...
Put the monster in a Stored Routine.
Arrange to always execute that Stored Routine from the same connection. (This may involve restructuring your client and/or inserting a "web service" in the middle.)
The logic here is that Stored Routines are compiled once per connection. I don't know whether that includes caching the "prepare". Nor do I know whether you should leave the query naked, or artificially prepare & execute.
Suggest you try some timings, plus try some profiling. The latter may give you clues into what I am uncertain about.
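A rough sketch of the "monster in a Stored Routine" idea; the routine name, parameter, and body below are made-up stand-ins for your real query:
DELIMITER //
CREATE PROCEDURE run_big_report(IN p_customer_id INT)
BEGIN
    -- the real 20KB SELECT goes here, using p_customer_id instead of a placeholder
    SELECT o.id, o.total, c.name
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.customer_id = p_customer_id;
END //
DELIMITER ;
-- always called from the same long-lived connection:
CALL run_big_report(42);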
I was wondering: when we call the Perl DBI APIs to query a database, are all the results returned at once? Or do we get part of the result set and, as we iterate, retrieve more and more rows from the database?
The reason I am asking is that I noticed the following in a Perl script.
I did a query to a database which returns a really large number of records. After getting these records I did a for loop over the results and created a hash of the data.
What I noticed is that the actual query to the database returned in a reasonable amount of time (the results were numerous), but the big delay was in looping over the data to create the hash.
I don't understand this. I would expect that the query would be the slow part since the for loop and the construction of the hash would be in-memory and would be cheap.
Any explanation/idea why this happens? Am I misunderstanding something basic here?
Update
I understand that MySQL caches data, so when I run the same query multiple times it will be faster the second time and onward. But I still would not expect the for loop over the in-memory data set to take as much time as (or more than) the query to the MySQL DB.
Assuming you are using DBD::mysql, the default is to pull all the results from the server at once and store them in memory. This avoids tying up the server's resources and works fine for the majority of result sets as RAM is usually plentiful.
That answers your original question, but if you would like more assistance, I suggest pasting code - it's possible your hash building code is doing something wrong, or unnecessary queries are being made. See also Speeding up the DBI for tips on efficient use of the DBI API, and how to profile what DBI is doing.
I have a mysql query that is taking 8 seconds to execute/fetch (in workbench).
I won't go into the details of why it may be slow (I think GROUP BY isn't helping though).
What I really want to know is how I can basically cache it so it responds more quickly, because the tables only change about 5-10 times/hour, while users access the site thousands of times/hour.
Is there a way to just have the results regenerated/cached when the db changes so results are not constantly regenerated?
I'm quite new to SQL, so any basic thought may go a long way.
I am not familiar with such a caching facility in MySQL. There are alternatives.
One mechanism would be to use application level caching. The application would store the previous result and use that if possible. Note this wouldn't really work well for multiple users.
What you might want to do is store the report in a separate table. Then you can run that every five minutes or so. This would be a simple mechanism using a job scheduler to run the job.
A variation on this would be to have a stored procedure that first checks if the data has changed. If the underlying data has changed, then the stored procedure would regenerate the report table. When the stored procedure is done, the report table would be up-to-date.
An alternative would be to use triggers, whenever the underlying data changes. The trigger could run the query, storing the results in a table (as above). Alternatively, the trigger could just update the rows in the report that would have changed (harder, because it involves understanding the business logic behind the report).
All of these require some change to the application. If your application query is stored in a view (something like vw_FetchReport1) then the change is trivial and all on the server side. If the query is embedded in the application, then you need to replace it with something else. I strongly advocate using views (or in other databases user defined functions or stored procedures) for database access. This defines the API for the database application and greatly facilitates changes such as the ones described here.
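For example, if the application already reads from a view (the SELECT body below is only a made-up stand-in for the real report query), switching it to the pre-built report table later is a one-statement change on the server:
CREATE VIEW vw_FetchReport1 AS
SELECT c.id, c.name, SUM(o.amount) AS total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name;
-- later, point the same view at the pre-computed table:
CREATE OR REPLACE VIEW vw_FetchReport1 AS
SELECT id, name, total FROM ReportTable;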
EDIT: (in response to comment)
More information about scheduling jobs in MySQL is here. I would expect the SQL code to be something like:
truncate table ReportTable;
insert into ReportTable
select * from <ReportQuery>;
(In practice, you would include column lists in the select and insert statements.)
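For instance, MySQL's event scheduler can run that refresh every few minutes (the event name is made up, and <ReportQuery> is the same placeholder as above):
SET GLOBAL event_scheduler = ON;
DELIMITER //
CREATE EVENT ev_refresh_report
ON SCHEDULE EVERY 5 MINUTE
DO
BEGIN
    TRUNCATE TABLE ReportTable;
    INSERT INTO ReportTable
    SELECT * FROM <ReportQuery>;
END //
DELIMITER ;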
A simple solution that can be used to speed up the response time for long-running queries is to periodically generate summary tables, based on how often the underlying data refreshes or on business needs.
For example, if your business doesn't care about sub-minute "accuracy", you can run the process once each minute and have your user interface query this calculated table instead of summarizing the raw data online.
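A minimal sketch of that pattern, with made-up table and column names: the expensive GROUP BY runs once a minute (from a scheduled event or cron job) and the UI reads the small summary table instead.
CREATE TABLE IF NOT EXISTS daily_sales_summary (
    sale_date     DATE PRIMARY KEY,
    order_count   INT,
    total_amount  DECIMAL(12,2)
);
REPLACE INTO daily_sales_summary (sale_date, order_count, total_amount)
SELECT DATE(created_at), COUNT(*), SUM(amount)
FROM sales
GROUP BY DATE(created_at);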
Straight to the question:
The problem: to do async bulk inserts (not necessarily bulk, if MySQL can handle it) using Node.js (coming from a .NET and PHP background).
Example :
Assume I have 40 (adjustable) functions doing some work (async), each adding a record to the table after its single iteration. It is very probable that more than one function makes an insertion call at the same time. Can MySQL handle that directly, considering there is going to be an auto-update field?
In C# (.NET) I would have used a DataTable to hold all the rows from each function, launched a thread for each function, and bulk-inserted the DataTable into the database table at the end.
What approach would you suggest in this case?
Should the approach change if I need to handle 10,000 or 4 million rows per table?
Also, the DB schema is not going to change; would MongoDB be a better choice for this?
I am new to Node and NoSQL and in the learning phase at the moment, so if you can provide some explanation with your answer, it would be awesome.
Thanks.
EDIT:
Answer: Neither MySQL nor MongoDB supports any sort of bulk insert; under the hood it is just a foreach loop.
Both of them are capable of handling a large number of connections simultaneously; the performance will largely depend on your requirements and production environment.
1) In MySQL, queries are executed sequentially per connection. If you are using one connection, your ~40 functions will result in 40 queries enqueued (via an explicit queue in the mysql library, your code, or a system queue based on synchronisation primitives), not necessarily in the same order you started the 40 functions. MySQL won't have any race condition problems with auto-update fields in that case.
2) If you really want to execute 40 queries in parallel, you need to open 40 connections to MySQL (which is not a good idea from a performance point of view, but again, MySQL is designed to handle auto-increments correctly for multiple clients).
3) There is no special bulk insert command in the MySQL protocol at the wire level; any library exposing a bulk insert API is in fact just issuing a long 'INSERT ... VALUES' query.
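In other words, the "bulk" insert that such a library sends is just one long multi-row statement, something like this (table and column names are illustrative):
INSERT INTO results (worker_id, payload)
VALUES
    (1, 'a'),
    (2, 'b'),
    (3, 'c');
-- an AUTO_INCREMENT id on `results` is assigned safely by the server,
-- even when several connections insert at the same time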
Is it possible to fetch data from Redis from within MySQL (using a native function or some other mechanism)? I would like to be able to use this information in ORDER BY statements for paging with LIMIT. Otherwise I will have to fetch all the data from MySQL, fetch additional data for each row from Redis, sort in my application and keep the page I need.
It would be much more efficient if MySQL could, say, call a function for every row to get data from Redis, do the sorting, and only send me the page I need.
Even if this is possible (with open source, everything is technically possible), it's unlikely to improve performance much over the cleaner approach of sorting within your app. If your data set is small, returning everything is not a problem. If your data set is large, you probably need to be sorting by an indexed column to get decent performance out of SQL, and you can't index a function.
Also, if the result set isn't huge the dominant performance issue is usually latency rather than processing or data transfer. One query from sql and one mget from redis should be reasonably quick.
If you really want just one query when a page is viewed, you will need to have both the record and sorting data in one place: either add the data from Redis as a column in SQL or cache your queries in Redis lists.
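For the first option, a minimal sketch with made-up table and column names: keep a copy of the Redis score in an indexed MySQL column and let MySQL do the paging.
ALTER TABLE items
    ADD COLUMN redis_score INT NOT NULL DEFAULT 0,
    ADD INDEX idx_redis_score (redis_score);
-- the application updates redis_score whenever the value in Redis changes;
-- paging then becomes a single indexed query:
SELECT id, title, redis_score
FROM items
ORDER BY redis_score DESC
LIMIT 20 OFFSET 40;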