Which would be more efficient, having each user create a database connection, or caching? - html

I'm not sure if caching would be the correct term for this but my objective is to build a website that will be displaying data from my database.
My problem: There is a high probability of a lot of traffic and all data is contained in the database.
My hypothesized solution: Would it be faster if I created a separate program (in java for example) to connect to the database every couple of seconds and update the html files (where the data is displayed) with the new data? (this would also increase security as users will never be connecting to the database) or should I just have each user create a connection to MySQL (using php) and get the data?
If you've had any experiences in a similar situation please share, and I'm sorry if I didn't word the title correctly, this is a pretty specific question and I'm not even sure if I explained myself clearly.

Here are some thoughts for you to think about.
First, I do not recommend you create files but trust MySQL. However, work on configuring your environment to support your traffic/application.
You should understand your data a little more (How much is the data in your tables change? What kind of queries are you running against the data. Are your queries optimized?)
Make sure your tables are optimized and indexed correctly. Make sure all your query run fast (nothing causing a long row locks.)
If your tables are not being updated very often, you should consider using MySQL cache as this will reduce your IO and increase the query speed. (BUT wait! If your table is being updated all the time this will kill your server performance big time)
Your query cache is set to "ON". Based on my experience this is always bad idea unless your data does not change on all your tables. When you have it set to "ON" MySQL will cache every query. Then as soon as they data in the table changes, MySQL will have to clear the cached query "it is going to work harder while clearing up cache which will give you bad performance." I like to keep it set to "ON DEMAND"
from there you can control which query should be cache and which should not using SQL_CACHE and SQL_NO_CACHE
Another thing you want to review is your server configuration and specs.
How much physical RAM does your server have?
What types of Hard Drives are you using? SSD is not at what speed do they rotate? perhaps 15k?
What OS are you running MySQL on?
How is the RAID setup on your hard drives? "RAID 10 or RAID 50" will help you out a lot here.
Your processor speed will make a big different.
If you are not using MySQL 5.6.20+ you should consider upgrading as MySQL have been improved to help you even more.
How much RAM does your server have? is your innodb_log_buffer_size set to 75% of your total physical RAM? Are you using innodb table?
You can also use MySQL replication to increase the read sources of the data. So you have multiple servers with the same data and you can point half of your traffic to read from server A and the other half from Server B. so the same work will be handled by multiple server.
Here is one argument for you to think about: Facebook uses MySQL and have millions of hits per seconds but they are up 100% of the time. True they have trillion dollar budget and their network is huge but the idea here is to trust MySQL to get the job done.

Related

Reduce database writes with memached

I would like to convert my stats tracking system not to write to the database directly, as we're hitting bottlenecks.
We're currently using memcached for certain aspects of the site, and I wanted to use it for storing stats and committing them to mysql DB periodically.
The issue lies however in the number of items (which is in the millions) for which potentially there could be stats collected between the cronjob runs that would commit them into the database. Other than running a SELECT * FROM data and checking for existence of every single memcache key, and then updating the table.... is there any other way to do this?
(I'm not saying below is gospel, this is just my gut feeling. As said later on, I don't have the specifics of your system :) And obviously no offence meant etc :) )
I would advice against using memcached for this. Memcached is build te quickly retrieve values that you've gotten before, not to store values. The big difference is that is your cache is getting full, you'll loose your data.
Normally, you'd just have no data in your cache, and recollect the data from the source, which is impossible in this case. That alone would be a reason for me to try an dissuade you from this.
Now you say the major problem is the mysql connection limit you are hitting. If you do simple stuff (like what we talked about in the comments: the insert delayed), it's just a case of increasing the limit. You should probably have enough power to have your scripts/users go to the database once and say "this should eventually be added", and then go away. If your users can't even open 1 connection for that, there's a serious resource problem you probably won't fix by adding extra layers of cache?
Obviously hard to say without any specs of the system, soft and hardware, but my suggestion would be to see if you can just let them open their connections by increasing the limit, and fiddle with the server variables a bit, instead of monkey-patching your system by using a memcached as an in-between layer.
I had a similar issue with statistic data. But please don't use memcached for it. You can't be sure that ALL your items will moved to DB. You can loose data and/or double process data.
You should analyse your bottleneck against how much data you are writing/reading and how many connections you need. And than switch to something scalable like Hadoop, Cassandra, Scripe and other systems.
You need to provide additional information on the platform that you are running: O/S, database (version), storage engine, RAM, CPU (if possible)?
Are you inserting into a single table or more than one table?
Can you disable the indexes on the tables you are inserting into as this slows down the insert functions.
Are you running any triggers or stored procedures to compute values as you insert the raw data?

sqlite or mysql for large datasets

I am working with large datasets (10s of millions of records, at times, 100s of millions), and want to use a database program that links well with R. I am trying to decide between mysql and sqlite. The data is static, but there are lot of queries that I need to do.
In this link to sqlite help, it states that:
"With the default page size of 1024 bytes, an SQLite database is limited in size to 2 terabytes (241 bytes). And even if it could handle larger databases, SQLite stores the entire database in a single disk file and many filesystems limit the maximum size of files to something less than this. So if you are contemplating databases of this magnitude, you would do well to consider using a client/server database engine that spreads its content across multiple disk files, and perhaps across multiple volumes."
I'm not sure what this means. When I have experimented with mysql and sqlite, it seems that mysql is faster, but I haven't constructed very rigorous speed tests. I'm wondering if mysql is a better choice for me than sqlite due to the size of my dataset. The description above seems to suggest that this might be the case, but my data is no where near 2TB.
I'd appreciate any insights into understanding this constraint of maximum file size from the filesystem and how this could affect speed for indexing tables and running queries. This could really help me in my decision of which database to use for my analysis.
The SQLite database engine stores the entire database into a single file. This may not be very efficient for incredibly large files (SQLite's limit is 2TB, as you've found in the help). In addition, SQLite is limited to one user at a time. If your application is web based or might end up being multi-threaded (like an AsyncTask on Android), mysql is probably the way to go.
Personally, since you've done tests and mysql is faster, I'd just go with mysql. It will be more scalable going into the future and will allow you to do more.
I'm not sure what this means. When I have experimented with mysql and sqlite, it seems that mysql is faster, but I haven't constructed very rigorous speed tests.
The short short version is:
If your app needs to fit on a phone or some other embedded system, use SQLite. That's what it was designed for.
If your app might ever need more than one concurrent connection, do not use SQLite. Use PostgreSQL, MySQL with InnoDB, etc.
It seems that (in R, at least), that SQLite is awesome for ad hoc analysis. With the RSQLite or sqldf packages it is really easy to load data and get started. But for data that you'll use over and over again, it seems to me that MySQL (or SQL Server) is the way to go because it offers a lot more features in terms of modifying your database (e.g., adding or changing keys).
SQL if you are mainly using this as a web service.
SQLite, if you want it to able to function offline.
SQLite generally is much much faster, as majority (or ALL) of data/indexes will be cached in memory. However, in the case of SQLite. If the data is split up across multiple tables, or even multiple SQLite database files, from my experience so far. For even millions of records (i yet to have 100's of millions though), it is far more effective then SQL (compensate the latency / etc). However that is when the records are split apart in differant tables, and queries are specific to such tables (dun query all tables).
An example would be a item database used in a simple game. While this may not sound much, a UID would be issued for even variations. So the generator soon quickly work out to more then a million set of 'stats' with variations. However this was mainly due to each 1000 sets of records being split among different tables. (as we mainly pull records via its UID). Though the performance of splitting was not properly measured. We were getting queries that were easily 10 times faster then SQL (Mainly due to network latency).
Amusingly though, we ended up reducing the database to a few 1000 entries, having item [pre-fix] / [suf-fix] determine the variations. (Like diablo, only that it was hidden). Which proved to be much faster at the end of the day.
On a side note though, my case was mainly due to the queries being lined up one after another (waiting for the one before it). If however, you are able to do multiple connections / queries to the server at the same time. The performance drop in SQL, is more then compensated, from your client side. Assuming this queries do not branch / interact with one another (eg. if got result query this, else that)

Using a MySQL database is slow

We have a dedicated MySQL server, with about 2000 small databases on it. (It's a Drupal multi-site install - each database is one site).
When you load each site for the first time in a while, it can take up to 30s to return the first page. After that, the pages return at an acceptable speed. I've traced this through the stack to MySQL. Also, when you connect with the command line mysql client, connection is fast, then "use dbname" is slow, and then queries are fast.
My hunch is that this is due to the server not being configured correctly, and the unused dbs falling out of a cache, or something like that, but I'm not sure which cache or setting applies in this case.
One thing I have tried is the innodb_buffer_pool size. This was set to the default 8M. I tried raising it to 512MB (The machine has ~ 2GB of RAM, and the additional RAM was available) as the reading I did indicated that more should give better performance, but this made the system run slower, so it's back at 8MB now.
Thanks for reading.
With 2000 databases you should adjust the table cache setting. You certainly have a lot of cache miss in this cache.
Try using mysqltunner and/or tunning_primer.sh to get other informations on potential issues with your settings.
Now drupal makes Database intensive work, check you Drupal installations, you are maybe generating a lot (too much) of requests.
About the innodb_buffer_pool_size, you certainly have a lot of pagination cache miss with a little buffer (8Mb). The ideal size is when all your data and indexes size can fit in this buffer, and with 2000 databases... well it is quite certainly a very little size but it will be hard for you to grow. Tunning a MySQL server is hard, if MySQL takes too much RAM your apache won't get enough RAM.
Solutions are:
check that you do not make the connexion with DNS names but with IP
(in case of)
buy more RAM
set MySQL on a separate server
adjust your settings
For Drupal, try to set the session not in the database but in memcache (you'll need RAM for that but it will be better for MySQL), modules for that are available. If you have Drupal 7 you can even try to set some of the cache tables in memcache instead of MySQL (do not do that with big cache tables).
edit: last thing, I hope you have not modified Drupal to use persistent database connexions, some modules allows that (or having an old drupal 5 which try to do it automatically). With 2000 database you would kill your server. Try to check mysql error log for "too many connections" errors.
Hello Rupertj as I read you are using tables type innodb, right?
innodb table is a bit slower than myisam tables, but I don't think it is a major problem, as you told, you are using drupal system, is that a kind of mult-sites, like a word-press system?
If yes, sorry about but this kind of systems, each time you install a plugin or something else, it grow your database in tables and of course in datas.. and it can change into something very very much slow. I have experiencied by myself not using Drupal but using Word-press blog system, and it was a nightmare to me and my friends..
Since then, I have abandoned the project... and my only advice to you is, don't install a lot of plugins in your drupal system.
I hope this advice help you, because it help me a lot in word-press.
This sounds like a caching issue in Drupal, not MYSQL. It seems there are a few very heavy queries, or many, many small ones, or both, that hammer the database-server. Once that is done, Drupal caches that in several caching layers. After which only one (or very few) queries are all that is needed to build up a page. Slow in the beginning, fast after that.
You will have to profile it to determine what the cause is, but the table cache seems like a likely suspect.
However, you should also be mindful of persistent connections - which should absolutely definitely, always be turned off (yes, for everyone, not just you). Apache / PHP persistent connections are a pessimisation that you and everyone else can generally do without.

How many databases can MySQL handle?

My MySql server currently has 235 databases. Should I worry?
They all have same structure with MyISAM tables.
The hardware is a virtual machine with 2 GB RAM running on a Quad-Core AMD Opteron 2.2GHz.
Recently cPanel sent me an email saying that MySql has failed and a restart has been made.
New databases are being expected to be created and I wonder if I should add more memory or if I should simply add another virtual machine.
The "databases" in mysql are really catalogues, is has no effect on its limits whether you put all the tables in one or each in its own.
The main problem is the table cache. Without tuning it, you're going to have the default table cache (=64 typically), which means you will be closing a table every time you open one. This is incredibly bad.
Except in MyISAM, it's even worse, because closing a table throws its key blocks out of the key cache, which means subsequent index lookups or scans will be reading actual blocks from disc, which is horrible and slow and really needs to be avoided.
My advice is:
If possible, immediately increase the table cache to > the total number of tables
Monitor the global status variable Opened_Tables in your monitoring; if it increases rapidly, this is bad.
Carry out performance and robustness testing on your the same hardware in a non-production environment (if you are not doing so already).
(reposting my comment for better visibility)
Thank you all for your comments. The system is something similar with Google Analytics. Users website's visits are being logged into a "master" table. A native application is monitoring the master table and processes the registered visits and writes them to users' database. Each user has its own DB. This has been decided for sharding. Various reports and statistics are being run for each user. And it is faster if it only runs on specific DB (less data) I know this is not the best setup. But we have to deal with it for a while.
I dont believe there is a hard limit, the only thing that's really limiting you will be your hardware and the traffic these databases will be getting.
You seem to have very little memory, which probably means you dont have massive numbers of connections...
You should start by profiling usage for each database (or set of databases, depending on how they are used of course).
My suggestion - MySQL (or any database server for that matter) could use more memory. You can never have enough.
You are doing it wrong.
Comment with some specifics about your databases, and we can probably fill you in on where your design went wrong.

Best storage engine for constantly changing data

I currently have an application that is using 130 MySQL table all with MyISAM storage engine. Every table has multiple queries every second including select/insert/update/delete queries so the data and the indexes are constantly changing.
The problem I am facing is that the hard drive is unable to cope, with waiting times up to 6+ seconds for I/O access with so many read/writes being done by MySQL.
I was thinking of changing to just 1 table and making it memory based. I've never used a memory table for something with so many queries though, so I am wondering if anyone can give me any feedback on whether it would be the right thing to do?
One possibility is that there may be other issues causing performance problems - 6 seconds seems excessive for CRUD operations, even on a complex database. Bear in mind that (back in the day) ArsDigita could handle 30 hits per second on a two-way Sun Ultra 2 (IIRC) with fairly modest disk configuration. A modern low-mid range server with a sensible disk layout and appropriate tuning should be able to cope with quite a substantial workload.
Are you missing an index? - check the query plans of the slow queries for table scans where they shouldn't be.
What is the disk layout on the server? - do you need to upgrade your hardware or fix some disk configuration issues (e.g. not enough disks, logs on the same volume as data).
As the other poster suggests, you might want to use InnoDB on the heavily written tables.
Check the setup for memory usage on the database server. You may want to configure more cache.
Edit: Database logs should live on quiet disks of their own. They use a sequential access pattern with many small sequential writes. Where they share disks with a random access work load like data files the random disk access creates a big system performance bottleneck on the logs. Note that this is write traffic that needs to be completed (i.e. written to physical disk), so caching does not help with this.
I've now changed to a MEMORY table and everything is much better. In fact I now have extra spare resources on the server allowing for further expansion of operations.
Is there a specific reason you aren't using innodb? It may yield better performance due to caching and a different concurrency model. It likely will require more tuning, but may yield much better results.
should-you-move-from-myisam-to-innodb
I think that that your database structure is very wrong and needs to be optimised, has nothing to do with the storage