I've got a quiz application that is running several queries to a MySQL database. The backend application is running using Java. Every time the app runs a query to the database, there are additional queries that are being executed that I am not specifying within the application. As a result, this is causing a lot of additional overhead to the database, sometimes resulting in an error.
For example, I've got a 'Questions' table that only contains regular characters, such as the below:
The application does a simple SELECT * from Questions to get the list of questions. However, when that is executed, I can see in the database logs that there are 4 additional queries that are also run (the first I assume is the connectivity to the database), which I have not specified. Those are:
Could someone tell me why this is happening? Essentially, every query that is run against the database (specified by the application) renders the same exact additional 3 queries that (to me) are coming out of nowhere.
I have been trying to run a report for my CEO that shows income. Our agency management software uses FoxPro databases (it originally came out in the early '80s, I think). I have linked the .dbf files to an Access database, and I have been setting up queries based on queries to get the information I need on a live basis without having to export the data. The problem that I have run into is that I cleaned up the selection criteria in the first query, but I did not run that query (it takes about ten minutes to run each of these). When I ran the last query (with data based on the first), I still had bad data in that result.
So here's the dumb question: (a) do I need to create a macro that runs the queries (there are four of them) in sequence so that they are all updated each time, (b) is there some better way to do this, and/or (c) does Access automatically run the prior queries when I run the downstream query?
I'm working on a migration from MySQL to Postgres on a large Rails app, and most operations are performing at a normal rate. However, we have a particular operation that generates job records every 30 minutes or so. Usually about 200 records are generated and inserted, after which separate workers pick up the jobs and work on them from another server.
Under MySQL it takes about 15 seconds to generate the records, and then another 3 minutes for the worker to perform and write back the results, one at a time (so 200 more updates to the original job records).
Under Postgres it takes around 30 seconds, and then another 7 minutes for the worker to perform and write back the results.
The table being written to has roughly 2 million rows, and 1 sequence column under ID.
I have tried tweaking checkpoint timeouts and sizes with no luck.
The table is heavily indexed and really shouldn't be any different than it was before.
I can't post code samples as it's a huge codebase, and without posting pages and pages of code it wouldn't make sense.
My question is, can anyone think of why this would possibly be happening? There is nothing in the Postgres log and the process of creating these objects has not changed really. Is there some sort of blocking synchronous write behavior I'm not aware of with Postgres?
I've added all sorts of logging in my code to spot errors or transaction failures but I'm coming up with nothing, it just takes twice as long to run, which doesn't seem correct to me.
The Postgres instance is hosted on AWS RDS on a M3.Medium instance type.
We also use New Relic, and it's showing nothing of interest here, which is surprising.
Why does your job queue contain 2 million rows? Are they all live, or have you just not moved them to an archive table to keep your reporting simpler?
Have you used EXPLAIN on your SQL from a psql prompt or your preferred SQL IDE/tool?
Postgres is a completely different RDBMS than MySQL. It allocates and manipulates space differently, so it may need to be indexed differently.
Additionally there's a tool called pgtune that will suggest configuration changes.
edit: 2014-08-13
Also, Rails comes with a profiler that might add some insight. Here's a StackOverflow thread about Rails profiling.
You also want to watch your DB server at the disk IO level. Does your job fulfillment do a large number of updates? Postgres creates a new row when you update an existing row and marks the old row as available, instead of just overwriting the existing row. So you may be seeing a lot more IO as a result of your RDBMS switch.
The database
I'm working with a database that has pretty big tables and it's causing me problems. One in particular has more than 120k lines.
What I'm doing with it
I'm looping over this table in a MakeAverage.php file to merge them into about 1k lines in a new table in my database.
What doesn't work
Laravel doesn't allow me to process it all at once, even if I call DB::disableQueryLog() or apply a take(1000) limit, for example. It returns a blank page every time, even with error reporting enabled (kind of like this). Also, I had no Laravel log file for this. I had to look in my php_error.log (I'm using MAMP) to realize that it was actually a memory_limit problem.
What I did
I increased the amount of memory before executing my code by using ini_set('memory_limit', '512M'). (It's bad practice, I should do it in php.ini.)
What happened?
It worked! However, Laravel threw an error because the page hadn't finished loading after 30 seconds, due to the large amount of data.
What I will do
After spending some time on this issue and looking at other people having similar problems (see: Laravel forum, 19453595, 18775510 and 12443321), I thought that maybe PHP isn't the solution.
Since I'm only creating Table B from the average values of Table A, I believe that SQL is going to fit my needs best, as it's clearly faster than PHP for that type of operation (see: 6449072) and I can use functions such as SUM, AVG and COUNT together with GROUP BY (Reference).
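To make the idea concrete, here is a minimal sketch of letting the database do the averaging in one statement instead of looping in application code. It is written in Python against an in-memory SQLite database so it runs anywhere; the table names (table_a, table_b), the columns (group_id, value) and the function build_averages are all hypothetical, since the real schema isn't shown. The same INSERT ... SELECT ... GROUP BY shape should carry over to MySQL.

```python
import sqlite3


def build_averages(conn: sqlite3.Connection) -> None:
    """Fill table_b with one aggregated row per group of table_a."""
    conn.execute("DELETE FROM table_b")  # start from an empty target table
    conn.execute(
        """
        INSERT INTO table_b (group_id, avg_value, total, n)
        SELECT group_id, AVG(value), SUM(value), COUNT(*)
        FROM table_a
        GROUP BY group_id
        """
    )
    conn.commit()


# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (group_id INTEGER, value REAL)")
conn.execute(
    "CREATE TABLE table_b ("
    "group_id INTEGER PRIMARY KEY, avg_value REAL, total REAL, n INTEGER)"
)
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 5.0)],
)

build_averages(conn)
rows = conn.execute(
    "SELECT group_id, avg_value, total, n FROM table_b ORDER BY group_id"
).fetchall()
print(rows)
```

Because the rows never leave the database, neither the PHP memory limit nor the 30-second page timeout should come into play the way they do with a row-by-row loop.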
I have a MySQL query that is taking 8 seconds to execute/fetch (in Workbench).
I won't go into the details of why it may be slow (I think GROUP BY isn't helping though).
What I really want to know is how I can cache it so it works more quickly, because the tables only change about 5-10 times per hour, while users access the site thousands of times per hour.
Is there a way to just have the results regenerated/cached when the db changes so results are not constantly regenerated?
I'm quite new to sql so any basic thought may go a long way.
I am not familiar with such a caching facility in MySQL. There are alternatives.
One mechanism would be to use application level caching. The application would store the previous result and use that if possible. Note this wouldn't really work well for multiple users.
What you might want to do is store the report in a separate table. Then you can run that every five minutes or so. This would be a simple mechanism using a job scheduler to run the job.
A variation on this would be to have a stored procedure that first checks if the data has changed. If the underlying data has changed, then the stored procedure would regenerate the report table. When the stored procedure is done, the report table would be up-to-date.
An alternative would be to use triggers, whenever the underlying data changes. The trigger could run the query, storing the results in a table (as above). Alternatively, the trigger could just update the rows in the report that would have changed (harder, because it involves understanding the business logic behind the report).
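As a rough illustration of the second trigger variant (updating only the affected report rows), here is a minimal sketch in Python using an in-memory SQLite database. The tables orders and report_table, their columns, and the trigger name are hypothetical, and the trigger syntax is SQLite's, so a MySQL version would look somewhat different. Only inserts are handled here; updates and deletes on the underlying table would need their own triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical underlying table and report table.
    CREATE TABLE orders (category TEXT, amount REAL);
    CREATE TABLE report_table (
        category TEXT PRIMARY KEY,
        n INTEGER,
        total REAL
    );

    -- Keep the report row for the inserted category up to date.
    CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
    BEGIN
        -- Make sure a report row exists for this category.
        INSERT OR IGNORE INTO report_table (category, n, total)
        VALUES (NEW.category, 0, 0);
        -- Fold the new order into the running count and total.
        UPDATE report_table
        SET n = n + 1, total = total + NEW.amount
        WHERE category = NEW.category;
    END;
    """
)

conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("books", 10.0), ("books", 5.0), ("toys", 2.5)],
)
conn.commit()

report = conn.execute(
    "SELECT category, n, total FROM report_table ORDER BY category"
).fetchall()
print(report)
```

The trade-off is the one noted above: the report is always current, but the trigger has to encode the report's business logic, and every write to the underlying table pays a little extra.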
All of these require some change to the application. If your application query is stored in a view (something like vw_FetchReport1) then the change is trivial and all on the server side. If the query is embedded in the application, then you need to replace it with something else. I strongly advocate using views (or in other databases user defined functions or stored procedures) for database access. This defines the API for the database application and greatly facilitates changes such as the ones described here.
EDIT: (in response to comment)
More information about scheduling jobs in MySQL is here. I would expect the SQL code to be something like:
truncate table ReportTable;
insert into ReportTable
select * from <ReportQuery>;
(In practice, you would include column lists in the select and insert statements.)
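For illustration, the same refresh pattern can be sketched as a small runnable example. This uses Python with an in-memory SQLite database rather than MySQL, so DELETE stands in for TRUNCATE (SQLite has no TRUNCATE), and the orders table, the report columns and the REPORT_QUERY text are hypothetical placeholders for the real report query.

```python
import sqlite3

# Hypothetical report query; stands in for <ReportQuery> above.
REPORT_QUERY = """
    SELECT category, COUNT(*) AS n, SUM(amount) AS total
    FROM orders
    GROUP BY category
"""


def refresh_report(conn: sqlite3.Connection) -> None:
    """Rebuild report_table from scratch inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM report_table")  # SQLite has no TRUNCATE
        conn.execute(
            "INSERT INTO report_table (category, n, total) " + REPORT_QUERY
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, amount REAL)")
conn.execute("CREATE TABLE report_table (category TEXT, n INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("books", 10.0), ("toys", 2.5)],
)

refresh_report(conn)
first = conn.execute("SELECT * FROM report_table ORDER BY category").fetchall()

# The underlying data changes; the next scheduled refresh picks it up.
conn.execute("INSERT INTO orders VALUES ('books', 5.0)")
refresh_report(conn)
second = conn.execute("SELECT * FROM report_table ORDER BY category").fetchall()
print(first, second)
```

In a real setup the call to refresh_report would be made by whatever scheduler you use, and the application would only ever read from report_table.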
A simple solution that can be used to speed-up the response time for long running queries is to periodically generate summarized tables, based on underlying data refreshing or business needs.
For example, if your business doesn't care about sub-minute "accuracy", you can run the process once a minute and have your user interface query this calculated table instead of summarizing raw data online.
I have a special security need with MySQL. I need to forcibly restrict the number of rows a query returns, issuing an error if the returned rows would exceed, say, a million rows. Here is the setup:
Need - The data has 100s of millions of rows, and we don't want the client to run down the server or do a complete extraction (they would never need all the lines, just aggregations). The idea is that if they do need it, they run into an error or barrier and come to us with the reason why they need to pull so many rows with a query.
System - Clients can use any query tool, so we have no control over what query is generated. Thus, we cannot use LIMIT x, which seems to be the solution suggested everywhere.
I have tried searching for a solution, and for now it seems that the only way to do it is at the application level (which we do not own).
Is there any way to achieve this?
Setting
1- We need to have SSL enabled.
2- MySQL 5.5
Thanks!
J
It seems like you might be able to get close with MySQL Proxy.
https://launchpad.net/mysql-proxy
See this page for manipulating results. Not sure if it does a buffered or unbuffered read, or if you can cancel the reading of results or not...
http://dev.mysql.com/doc/refman/5.1/en/mysql-proxy-scripting-read-query-result.html
It's open source, so you might be able to hire someone to tweak it if needed as well.
There may be other ways to restrict the overloading of your database server. Take a look at this link for more info:
MySQL - can I limit the maximum time allowed for a query to run?