MySQL Server Runs out of Disk Space? - mysql

Our company's web application stores a ton of data points on thousands of visitors a day, and we are anticipating the hard disks will fill up soon. Our server can not support more hard drives, and we are not interested in little tricks to free up some space to buy us a few hours worth of space.
How can we solve this issue? The database is huge, over 200GB, and our website needs to be available, so I don't believe copying it and moving it to a new, larger server is a good option for us. Furthermore, what happens when THAT server runs out of disk space?
What do large scale web sites normally do to remedy this issue?
Thanks!

You may want to investigate separating into multiple database servers as "shards. You will likely have to add some logic to your application to know where to find a set of data and how to join queries with data that originates from multiple shards. There are third-party applications that can assist you with this process.

Related

Wordpress site malfunctions when server disk is full

My company has some private servers for web development.
The server has storage space of 600GB, but at times it runs out of disk space.
At such situations one of our websites using wordpress malfunctions and some of its features can't be accessed.
Can any one tell what is the possible cause for this & how it can be prevented?
Thanks in advance....
At such situations one of our websites using wordpress malfunctions and some of its features can't be accessed.
That's what happens when the disk is full - nothing we can do about that. Temporary files can no longer be created, including session files that are necessary to track whether a user is logged in or not. MySQL may need to write temporary data even when doing only a SELECT. The web server may need some swap space on the hard disk in peak times. etc. etc.
I'm more concerned with WP malfunctioning rather than "why server disk is full".
That won't work. You need to fix the source of the problem, and that is the server's disk filling up. You can't make Wordpress work despite the disk being full.
It could be down to so many things, probably not at WordPress level. With no disk space apache can't write it's access/error logs, sessions can't be written by PHP, MySQL can't write and so on.
The only answer applicable to you at this stage is to stop maxing out the disk space.

is mysql capable of managing the data for a site which holds lots of data

is mysql capable of managing the data for a site which holds lots of data (say with hundreds of millions of users)? which database would be the most capable/beneficial?
Wikipedia is based on MySQL. I don't think it has 100M users, but it must be close by now.
No database will handle hundreds of millions of users unless you know how to set it up properly. No single server could handle that kind of traffic, so you need to know how to setup replication and load balancing. Once you reach a certain level, there is no out of the box solution, only tools you can use. MySQL being a very capable tool.
There are a couple of answers to this.
Yes, MySQL can store hundreds of millions of records; you need to know what you're doing, have a decent database schema, pretty robust hardware, but you're not pushing the limits.
When you talk about "hundreds of millions of users", you're talking about a site along the lines of Wikipedia/Facebook/Google/Amazon in scale. You need a custom, highly cached, distributed architecture to run a site at that scale - and the traditional database driven application architecture will almost certainly not be enough. You could still store your data in MySQL, but you'd need a whole bunch of additional components to make it all work - and without knowing more about the application, nobody could tell you what that might be. At that scale, none of the commonly used databases would suffice, so MySQL is no better or worse than any of the other options...
Your question is really irrelevant, because creating a product or service that hundreds of millions of customers actually want is a much bigger and more difficult challenge than choosing a database engine.
If you're starting a business from nothing, pick a technical platform you already know and go with it: productivity and quick implementation will be more important than being scalable to a level you may never reach anyway.
If you do eventually become successful enough to have to deal with hundreds of millions of customers, then you'll certainly be able to raise the cash to buy whatever expertise and hardware you need.

Cloud Database Service Latency/Performance

I am running a heavy traffic site and our server is beginning to get to its limits, at the moment the entire LAMP stack is on one box (not ideal).
I would like to move the database onto it's own box or onto a cloud service, but from my previous experience moving the database off the same box as the webserver increases the latency of reads quite dramatically slowing down the site.
Is using a cloud service for this going to overcome this problem, because as far as I can tell its essentially the same situation (as moving it onto a separate box in my control)? In which case why is there so much popularity around cloud based database services at the moment?
Are cloud based database services so quick that the latency of reads is so low that its almost like having it on the same box in the same datacentre?
Using a cloud service just for your database won't help your situation.
If you only move the database, you're physically placing it in a remote location - which will always increase latencies, no matter how powerful the hardware serving the content.
I would suggest that you will see a benefit in hosting your database on a separate machines from your web server, so long as they are physically next to each other sharing a dedicated network (as already suggested).
If you wanted to explore the benefits of cloud services, I would suggest only doing so if you can move both database and web server together. Furthermore, it's really only of benefit if you explore load balancing across multiple web-servers and/or replicated databases. (The ability to scale dynamically is a major benefit of cloud based platforms).
Clouds are about paying someone else to manage the infrastructure so you don't have to. They also come with some nice benefits about being able to rapidly acquire infrastructure since you don't have to wait for physical machines to be landed you can simply tap into the "cloud's" unused capacity. Sure people build features on top of this infrastructure to make it easier to scale (this is usually programming against a certain model).
If you are thinking about a cloud when are you planning on moving to 10 servers...or 100? Do you deal with traffic that comes in large bursts where the peaks in your traffic are very high?
Since you are talking about moving to a second box I don't think you need to have the cloud discussion yet. Just add a database server and use caching like e4c5 recommended.
There will be increased latency going across the network, but it shouldn't be that noticeable. Gigabit ethernet is pretty fast. When you have tried splitting the boxes, how did you access the other box? You should be using a local, internal IP address (i.e. 192.168.#.#). If you are not, then your requests may get routed over the internet, even though the boxes are physically next to each other.
Moving to a cloud won't solve your problems if the servers aren't networked properly.

Can splitting .MDB files into segments help with stability?

Is this a realistic solution to the problems associated with larger .mdb files:
split the large .mdb file into
smaller .mdb files
have one 'central' .mdb containing
links to the tables in the smaller
.mdb files
How easy would it be to make this change to an .mdb backed VB application?
Could the changes to the database be done so that there are no changes required to the front-end application?
Edit Start
The short answer is "No, it won't solve the problems of a large database."
You might be able to overcome the DB size limitation (~2GB) by using this trick, but I've never tested it.
Typically, with large MS Access databases, you run into problems with speed and data corruption.
Speed
Is it going to help with speed? You still have the same amount of data to query and search through, and the same algorithm. So all you are doing is adding the overhead of having to open up multiple files per query. So I would expect it to be slower.
You might be able to speed it up by reducing the time time that it takes to ge tthe information off of disk. You can do this in a few ways:
faster drives
put the MDB on a RAID (anecdotally RAID-1,0 may be faster)
split the MDB up (as you suggest) into multiple MDBs, and put them on separate drives (maybe even separate controllers).
(how well this would work in practice vs. theory, I can't tell you - if I was doing that much work, I'd still choose to switch DB engines)
Data Corruption
MS Access has a well deserved reputation for data corruption. To be fair, I haven't had it happen to me fore some time. This may be because I've learned not to use it for anything big; or it may be because MS has put a lot of work in trying to solve these problems; or more likely a combination of both.
The prime culprits in data corruption are:
Hardware: e.g., cosmic rays, electrical interference, iffy drives, iffy memory and iffy CPUs - I suspect MS Access does not have as good error handling/correcting as other Databases do.
Networks: lots of collisions on a saturated network can confuse MS Access and convince it to scramble important records; as can sub-optimally implemented network protocols. TCP/IP is good, but it's not invincible.
Software: As I said, MS has done a lot of work on MS Access over the years, if you are not up to date on your patches (MS Office and OS), get up to date. Problems typically happen when you hit extremes like the 2GB limit (some bugs are hard to test and won't manifest them selves except at the edge cases, which makes the less likely to have been seen or corrected, unless reported by a motivated user to MS).
All this is exacerbated with larger databases, because larger databases typically have more users and more workstations accessing it. Altogether the larger database and number of users multiply to provide more opportunity for corruption to happen.
Edit End
Your best bet would be to switch to something like MS SQL Server. You could start by migrating your data over, and then linking one MDB to to it. You get the stability of SQL server and most (if not all) of your code should still work.
Once you've done that, you can then start migrating your VB app(s) over to us SQL Server instead.
If you have more data than fits in a single MDB then you should get a different database engine.
One main issue that you should consider is that you can't enforce referential integrity between tables stored in different MDBs. That should be a show-stopper for any actual database.
If it's not, then you probably don't have a proper schema designed in the first place.
For reasons more adequately explained by CodeSlave the answer is No and you should switch to a proper relational database.
I'd like to add that this does not have to be SQL Server. Quite possibly the reason why you are reluctant to do this is one of cost, SQL Server being quite expensive to obtain and deploy if you are not in an educational or charitable organisation (when it's remarkably cheap and then usually a complete no-brainer).
I've recently had extremely good results moving an Access system from MDB to MySQL. At least 95% of the code functioned without modification, and of the remaining 5% most was straightforward with only a few limited areas where significant effort was required. If you have sloppy code (not closing connections or releasing objects) then you'll need to fix these, but generally I was remarkably surprised how painless this approach was. Certainly I would highly recommend that if the reason you are reluctant to move to a database backend is one of cost then you should not attempt to manipulate .mdb files and go instead for the more robust database solution.
Hmm well if the data is going through this central DB then there is still going to be a bottle neck in there. The only reason I can think why you would do this is to get around the size limit of an access mdb file.
Having said that if the business functions can be split off in the separate applications then that might be a good option with a central DB containing all the linked tables for reporting purposes. I have used this before to good effect

Scaling up from 1 Web Server + 1 DB Server

We are Web 2.0 company that built a hosted Content Management solution from the ground up using LAMP. In short, people log into our backend to manage their website content and then use our API to extract that content. This API gets plugged into templates that can be hosted anywhere on the interwebs.
Scaling for us has progressed as follows:
Shared hosting (1and1)
Dedicated single server hosting (Rackspace)
1 Web Server, 1 DB Server (Rackspace)
1 Backend Web Server, 1 API Web Server, 1 DB Server
Memcache, caching, caching, caching.
The question is, what's next for us? Every time one of our sites are dugg or mentioned in a popular website, our API server gets crushed with too many connections. Or every time our DB server gets overrun with queries, our Web server requests back up.
This is obviously the 'next problem' for any company like ours and I was wondering if you could point me in some directions.
I am currently attracted to the virtualization solutions (like EC2) but need some pointers on what to consider.
What/where/how to scale is dependent on what your issues are. Since you've been hit a few times, and you know it's the API server, you need to identify what's actually causing the issue.
Is it DB lookup times?
A volume of requests that the web server just can't handle even though they're shortlived?
API requests take too long to process? (independent of DB lookups, e.g., does the code take a bit to run)?
Once you identify WHAT the problem is, you should have a pretty clear picture of what you need to do. If it's just volume of requests, and it's the API server, you just need more web servers (and code changes to allow horizontal scaling) or a beefier web server. If it's API requests taking too long, you're looking at code optimizations. There's never a 1-shot fix when it comes to scalability.
The most common scaling issues have to do with slow (2-3 seconds) execution of the actual code for each request, which in turn leads to more web servers, which leads to more database interactions (for cross-server sessions, etc.) which leads to database performance issues. High performance, server independent code with memcache (I actually prefer a wrapper around memcache so the application doesn't know/care where it gets the data from, just that it gets it and the translation layer handles DB/memcache lookups as well as populating memcache).
Depends really if your bottleneck is reads or writes. Scaling writes is much harder than reads.
It also depends on how much data you have in the database.
If your database is small, but cannot cope with the read load, you can deploy enough ram that it fits in ram. If it still cannot cope, you can add read-replicas, possibly on the same box as your web servers, this will give you good read-scalability - the number of slaves from one MySQL master is quite high and will depend chiefly on the write workload.
If you need to scale writes, that's a totally different game. To do that you'll need to split your data out, either horizontally (partitioning / sharding) or vertically (functional partitioning etc) so that you can spread the workload over several write servers which do not need to do each others' work.
I'm not sure what EC2 can do for you, it essentially offers slow, high latency machines with nonpersistent discs and low IO performance on the end of a more-or-less nonexistent SLA. I guess it might be useful in your case as you can provision them relatively quickly - provided you're just using them as read-replicas and you don't have too much data (remember they have nonpersistent discs and sucky IO)
What is the level of scaling you are looking for? Is it a stop-gap solution e.g. scale vertically? If it is a more strategic scaling project, does your current architecture support scaling horizontally?