I was asked to write some PHP scripts against a MySQL DB to display some data, and that's when I noticed the strange design they had.
They want to perform a study that would require collecting up to 2000 records per user, and they automatically create a new table for each user who registers. It's a pilot study at this stage, so they have around 30 tables, but they expect around 3000 users for the real study.
I wanted to suggest gathering all of them in a single table, but since there might be around 1500 INSERTs per minute to that database during the study period, I wanted to ask this question here first. Will that cause table locks in MySQL?
So, is it one table with 1500 INSERTs per minute and a maximum size of 6,000,000 records, or 3000 tables with 30 INSERTs per minute and a maximum size of 2000 records each? I would like to suggest the first option, but I want to be sure that it will not cause any issues.
I read that InnoDB has row-level locking. So, will that give better performance combined with the one-table option?
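For reference, a minimal sketch of the single-table design I have in mind (column names are just placeholders; the real study data has more fields):

    CREATE TABLE study_records (
      id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      user_id     INT UNSIGNED    NOT NULL,
      recorded_at DATETIME        NOT NULL,
      value       VARCHAR(255)    NOT NULL,        -- stands in for the actual measurement columns
      PRIMARY KEY (id),
      KEY idx_user_time (user_id, recorded_at)     -- so per-user queries don't scan the whole table
    ) ENGINE=InnoDB;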
This is a huge loaded question. In my experience, performance is not really measured accurately by table size alone; it comes down to design. Do you have the primary keys and indexes in place? Is it over-indexed? That being said, I have also found that almost always one trip to the DB is faster than dozens. How big is the single table (how many columns)? What kind of data are you saving (larger than 4000K?)? It may be that you need to create some prototypes to see what performs best for you. The most I can recommend is that you carefully judge the size of the data you are collecting and allocate accordingly, create indexes (but not too many; don't over-index), and test.
Related
I am making a car tracking system and I want to store the data that each car sends every 5 seconds in a MySQL database. Assume that I have 1000 cars transmitting data to my system every 5 seconds, and the data is stored in one table. At some point I would want to query this table to generate reports for a specific vehicle. I am torn between logging all the vehicles' data in one table and creating a table for each vehicle (1000 tables). Which is more efficient?
OK: 86,400 seconds per day / 5 = 17,280 records per car per day.
That results in 17,280,000 records per day for 1000 cars. This is not an issue for MySQL in general.
And a well-designed table will be easy to query.
If you go for one table per car, what happens when there are 2000 cars in the future?
But the question is also: how long do you want to store the data?
It is easy to calculate when your database will reach 200 GB, 800 GB, 2 TB, ...
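For example, assuming a hypothetical row size of about 50 bytes (yours will differ), a back-of-the-envelope estimate plus a query to check the real size later could look like this:

    -- Rough estimate (assumed ~50 bytes/row; adjust to your schema):
    --   17,280,000 rows/day * 50 bytes = about 864 MB/day, or roughly 315 GB/year (before indexes).

    -- Once data is flowing, check the actual on-disk size per table:
    SELECT table_name,
           ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
    FROM information_schema.tables
    WHERE table_schema = 'tracking';               -- hypothetical schema name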
One table, not one table per car. A database with 1000 tables will be a dumpster fire when you try to back it up or maintain it.
Keep the rows of that table as short as you possibly can; it will have many records.
Index that table both on timestamp and on (car_id, timestamp). The second index will allow you to report on individual cars efficiently.
Read https://use-the-index-luke.com/
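A minimal sketch of such a table (column names and types are assumptions, not a spec):

    CREATE TABLE car_position (
      car_id    INT UNSIGNED      NOT NULL,
      ts        DATETIME          NOT NULL,
      latitude  DECIMAL(9,6)      NOT NULL,
      longitude DECIMAL(9,6)      NOT NULL,
      speed_kmh SMALLINT UNSIGNED NOT NULL,   -- keep rows short: small, fixed-width types
      PRIMARY KEY (car_id, ts),               -- covers the (car_id, timestamp) access pattern
      KEY idx_ts (ts)                         -- covers time-range reports across all cars
    ) ENGINE=InnoDB;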
This is the "tip of the iceberg". There are about 5 threads here and on dba.stackexchange relating to tracking cars/trucks. Here are some further tips.
Keep datatypes as small as possible. Your table(s) will become huge -- threatening to overflow the disk, and slowing down queries because bulky rows mean that fewer rows can be cached in RAM.
Do you keep the "same" info for a car that is sitting idle overnight? Think of how much disk space this is taking.
If you are using HDD disks, plan on 100 INSERTs/second before you need to do some redesign of the ingestion process. (1000/sec for SSDs.) There are techniques that can give you 10x, maybe 100x, but you must apply them.
Will you be having several servers collecting the data, then doing simple inserts into the database? My point is that that may be your first bottleneck.
PRIMARY KEY(car_id, ...) so that accessing data for one car is efficient.
Today, you say the data will be kept forever. But have you computed how big your disk will need to be?
One way to shrink the data drastically is to consolidate "old" data into, say, 1-minute intervals after, say, one month. Start thinking about what you want to keep. For example: min/max/avg speed, not just instantaneous speed. Have an extra record when any significant change occurs (engine on; engine off; airbag deployed; etc)
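A rough sketch of that consolidation, assuming made-up column names like car_id, ts and speed_kmh in the raw table:

    -- Summary table: one row per car per minute for data older than a month.
    CREATE TABLE car_position_1min (
      car_id    INT UNSIGNED      NOT NULL,
      minute_ts DATETIME          NOT NULL,
      min_speed SMALLINT UNSIGNED NOT NULL,
      max_speed SMALLINT UNSIGNED NOT NULL,
      avg_speed SMALLINT UNSIGNED NOT NULL,
      PRIMARY KEY (car_id, minute_ts)
    ) ENGINE=InnoDB;

    -- Roll up the old detail rows (then delete them from the raw table).
    INSERT INTO car_position_1min
    SELECT car_id,
           DATE_FORMAT(ts, '%Y-%m-%d %H:%i:00') AS minute_ts,
           MIN(speed_kmh),
           MAX(speed_kmh),
           ROUND(AVG(speed_kmh))
    FROM car_position
    WHERE ts < NOW() - INTERVAL 1 MONTH
    GROUP BY car_id, DATE_FORMAT(ts, '%Y-%m-%d %H:%i:00');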
(I probably have more tips.)
We have a case where we have to record millions of status updates a day in a table holding up to 146,000,000 records. I am not sure if MySQL is up to it. Here is the complete scenario:
On day one the table will have 1 million records, which will eventually grow to up to 150 million over 2 years.
At any given point in time, at most 1.4 million records will be live, even once we have 150 million records.
Records should not grow beyond 150 million, as we plan to archive data older than 2 years.
The live 1.4 million records will receive status updates, which we need to apply in the same table. The updates will be on the order of 20 million a day across these 1.4 million records.
I would be OK with not having any foreign key constraints on the table if they get in the way of so many updates.
We are using MySQL 5.5.
My concern and question is: will MySQL be able to handle our requirements out of the box (I have a feeling that we may see deadlocks when updates happen at the pace mentioned above)? And in case it can't, what should we do to build all that is described above?
Thanks in advance,
I would suggest you go with a NoSQL database, like Cassandra, which will increase performance by 30 to 40% compared to MySQL.
Is it possible to split the table into multiple ones, e.g. one table for each hour/day/week/month?
If yes, you can create a separate table for the desired time interval and store the updates separately.
Or you can try to separate the data into multiple tables by another criterion, e.g. by customer.
It's hard to advise without more info.
Are you going to gather some statistics from the table(s)?
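If you stay on plain MySQL, native partitioning (available in 5.5) is one way to get that time-based split without managing separate tables by hand. A sketch with made-up table and column names:

    CREATE TABLE status_updates (
      id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      entity_id  INT UNSIGNED    NOT NULL,
      status     TINYINT         NOT NULL,
      updated_at DATETIME        NOT NULL,
      PRIMARY KEY (id, updated_at)       -- the partitioning column must be part of every unique key
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(updated_at)) (
      PARTITION p2014_01 VALUES LESS THAN (TO_DAYS('2014-02-01')),
      PARTITION p2014_02 VALUES LESS THAN (TO_DAYS('2014-03-01')),
      PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

Archiving data older than 2 years then becomes dropping old partitions rather than running a huge DELETE.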
There are a lot of NoSQL databases available on the market for exactly the use case you have.
You can have all your problems resolved via one of those.
Oh my....
I still remember in 2002 when we replaced an Oracle database as the requirement was to insert/update 50M records a day and keep the records for at least 7 years with a need for simultaneous reporting/monitoring ;-) (one NMS application for mobile networks)
Before saying go for NoSQL or NewSQL or whatever, it would be helpful to understand the bigger picture. Do you, for example, need real-time reporting? Replication? Online backups and other features that are needed for mission-critical applications?
How experienced is your team in DBMS technology? Have you worked with anything else but MySQL?
You mention deadlocks? Do you see them now? I would say, without wanting to hurt anybody's feelings, that deadlocks are mostly due to bad design! There are hundreds of ways to avoid or swiftly resolve deadlocks, starting from queuing, transaction encapsulation, transaction monitors, optimistic concurrency, etc.
To me the numbers you provided aren't that high, but performance can be ruined by design whatever DBMS you are using. Without understanding the full set of needs in ACID, SQL compatibility and other areas, I will NOT recommend jumping on the XyzzySQL bandwagon WITHOUT fully understanding where the true requirements and related issues are.
So, in short: your numbers aren't that worrying. I have seen bigger systems being implemented (using InnoDB = money). I have seen MySQL easily digest a steady 24-hour mixed load of something like 15M inserts/updates a day with no hiccups or tricks.
If your needs are relaxed, you could take the "black hole" or in-memory approach and write to non-permanent tables which would be replicated (over time) to reliable storage within MySQL files. This is a good approach when it does not matter if you lose a "few seconds or minutes" of inserts/updates.
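A rough sketch of what I mean, with made-up names: the ingesting server discards the rows locally but still binary-logs them, so a replica can materialize them into InnoDB.

    -- On the server that absorbs the write burst:
    CREATE TABLE status_updates (
      id         BIGINT UNSIGNED NOT NULL,
      status     TINYINT         NOT NULL,
      updated_at DATETIME        NOT NULL
    ) ENGINE=BLACKHOLE;   -- rows are not stored here, only written to the binary log

    -- On the replica, the same table is defined with ENGINE=InnoDB,
    -- so replication turns the incoming stream into durable rows.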
The situation is very different if you have a steady load with limited peaks or, as in many real-life applications, a huge peak load during certain hours of the day.
Cheers, //Jari
Folks, I'm a developer of a social game. There are already 700k players in the game, about 7k new players register every day, and about 5k players are constantly online.
The DB server is running on pretty powerful hardware: 16 CPU cores, 24 GB RAM, and RAID-10 with BBU built on 4 SAS disks. I'm using Percona Server (patched MySQL 5.1) and currently the InnoDB buffer pool is 18 GB (although according to innotop only a few free buffers are available). The DB server is performing pretty well (2k QPS, iostat %util is 10-15%, almost always 0 processes in the "b" state in vmstat, loadavg is 5-6). However, from time to time (every few minutes) I get about 10-100 slow queries (each of which may last about 5-6 seconds).
There is one big InnoDB table in the MySQL database which occupies the most space. It has about 300 million rows and its size is about 20 GB. Of course, this table is gradually growing... I'm starting to worry that it's affecting the overall performance of the database in a negative way. In the near future I'll have to do something about it, but I'm not sure what exactly.
Basically the question boils down to whether to shard or simply add more RAM. The latter is simpler, of course. It looks like I can add up to 256 GB of RAM. But should I invest more time in implementing sharding instead, since it's more scalable?
Sharding seems reasonable if you need to keep all 300M+ rows. It may be a pain to change now, but as your table grows and grows there will be a point when no amount of RAM will solve your problem. With such massive amounts of data it may be worth using something like CouchDB, since you could store documents of data rather than rows, i.e. one document could contain all records for an individual user.
Sounds to me like your main database table could use some normalization. Does all your information belong in that one table, or can you split it out to smaller tables? Normalization may invoke a small performance hit now, but as your table grows, that will be overwhelmed by the extra processing involved in accessing a huge, monolithic table.
I'm getting about 10-100 slow queries (where each may last about 5-6 seconds).
Quote of a comment: the database is properly normalized. The database has many tables; one of them is really huge, and that has nothing to do with normalization.
Reading this, I would say it has to do with your queries, not with your hardware. Average companies would dream of the kind of server you have!
If you write bad queries, it doesn't matter how well your tables are normalized; it will be slow.
Maybe this is of some use to you; it's almost a similar question with an answer (database is slow and stuff like that).
Have you also thought about archiving some data? For example, out of those 300 million rows the table started with ID 1; is that ID still being used? If not, why not archive it to another database or table (I would recommend a database)? I also don't believe that all 700k users log in every day (respect if they do, but I don't believe it).
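A rough sketch of that archiving idea, with made-up table and column names (the archive table would need to exist first):

    -- Copy rows belonging to players who have been inactive for six months...
    INSERT INTO archive_db.player_items
    SELECT pi.*
    FROM player_items AS pi
    JOIN players AS p ON p.id = pi.player_id
    WHERE p.last_login < NOW() - INTERVAL 6 MONTH;

    -- ...then remove them from the hot table.
    DELETE pi
    FROM player_items AS pi
    JOIN players AS p ON p.id = pi.player_id
    WHERE p.last_login < NOW() - INTERVAL 6 MONTH;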
You also said 'This table contains player specific items'. What kind of specific items?
Another question: can you post some of your 'slow' queries?
Have you also considered a caching system for data that changes maybe once a month, like gear and other game stuff?
I'm designing a system, and digging deep into the numbers, I realize it could reach a point where there is a table with 54,240,211,584 records/year (approximately). WOW!!!!
So, I broke it down and down to 73,271,952 records/year (approximately).
I got the numbers by building a spreadsheet of what would happen if:
a) no success = 87 users,
b) low moderated success = 4300 users,
c) high moderated success = 13199 users,
d) success = 55100 users
e) incredible success = nah
Taking into account that the table is used for SELECT, INSERT, UPDATE & JOIN statements and that these statements would be executed by any user logged into the system hourly/daily/weekly (historical data is not an option):
Question 1: is the 2nd quantity suitable/manageable for the MySQL engine, such that performance would suffer little impact?
Question 2: I set the table as InnoDB but, given the fact that I handle all of the statements with JOINs and that I'm worried about running into the 4GB limit problem, is InnoDB useful?
Quick overview of the tables:
table #1: user/event purchase. Up to 15 columns, some of them VARCHAR.
table #2: tickets by purchase. Up to 8 columns, only TINYINT. Primary Key INT. From 4 to 15 rows inserted by each table #1 insertion.
table #3: items by ticket. 4 columns, only TINYINT. Primary Key INT. 3 rows inserted by each table #2 insertion. I want to keep it as a separate table, but if someone has to die...
Table #3 is the target of the question. The way I reduced it to the 2nd quantity was by making each of table #3's rows a column in table #2.
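To make the layout concrete, a stripped-down sketch of the three tables (column names are placeholders, not the real schema):

    CREATE TABLE purchase (              -- table #1: user/event purchase
      purchase_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      user_id     INT UNSIGNED NOT NULL,
      event_id    INT UNSIGNED NOT NULL,
      buyer_name  VARCHAR(100) NOT NULL
      -- ...up to ~15 columns, some VARCHAR
    ) ENGINE=InnoDB;

    CREATE TABLE ticket (                -- table #2: 4 to 15 rows per purchase
      ticket_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      purchase_id INT UNSIGNED NOT NULL,
      ticket_type TINYINT      NOT NULL,
      -- ...up to 8 columns, mostly TINYINT
      KEY idx_purchase (purchase_id)
    ) ENGINE=InnoDB;

    CREATE TABLE ticket_item (           -- table #3: 3 rows per ticket (the one in question)
      item_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      ticket_id INT UNSIGNED NOT NULL,
      item_type TINYINT      NOT NULL,
      item_flag TINYINT      NOT NULL,
      KEY idx_ticket (ticket_id)
    ) ENGINE=InnoDB;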
Something that I don't want to do, but would do if necessary, is partition the tables by week and add more logic to the application.
Every answer helps, but something like the following would be more helpful:
i) 33,754,240,211,584: No, so let's drop the last digit.
ii) 3,375,424,021,158: No, so let's drop the last digit.
iii) 337,542,402,115: No, so let's drop the last digit. And so on, until we get something like "well, it depends on many factors..."
What would I consider "little performance impact"? Up to 1,000,000 records, it takes no more than 3 seconds to execute the queries. If 33,754,240,211,584 records take around 10 seconds, that's excellent to me.
Why don't I just test it myself? I don't think I'm capable of running such a test. All I would do is insert that quantity of rows and see what happens; I'd rather first hear the point of view of someone who has already dealt with something similar. Remember, I'm still in the design stage.
Thanks in advance.
54,240,211,584 is a lot. I only have experience with MySQL tables of up to 300 million rows, and it handles that with little problem. I'm not sure what you're actually asking, but here are some notes:
Use InnoDB if you need transaction support, or are doing a lot of inserts/updates.
MyISAM tables are bad for transactional data, but ok if you're very read heavy and only do bulk inserts/updates every now and then.
There's no 4 GB limit with MySQL if you're using recent releases/recent operating systems. My biggest table is 211 GB now.
Purging data in large tables is very slow. e.g. deleting all records for a month takes me a few hours. (Deleting single records is fast though).
Don't use int/tinyint if you're expecting many billions of records, they'll wrap around.
Get something working, fix the scaling after the first release. An unrealized idea is pretty much useless, something that works(for now) might be very useful.
Test. There's no real substitute - your app and db usage might be wildly different from someone else's huge database.
Look into partitioned tables, a recent feature in MySQL that can help you scale in many ways.
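For example, if a big table is partitioned by month, the slow monthly purge mentioned above becomes a near-instant metadata operation (table and partition names are hypothetical):

    -- Assumes the table was created with PARTITION BY RANGE on a date column,
    -- e.g. PARTITION BY RANGE (TO_DAYS(created_at)) with one partition per month.
    -- Dropping a whole month is then quick, instead of a multi-hour DELETE:
    ALTER TABLE transactions DROP PARTITION p2009_01;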
Start at the level you're at. Build from there.
There are plenty of people out there who will sell you services you don't need right now.
If $10/month shared hosting isn't working anymore, then upgrade, and eventually hire someone to help you get around the record limitations of your DB.
There is no 4Gb limit, but of course there are limits. Don't plan too far ahead. If you're just starting up and you plan to be the next Facebook, that's great but you have no resources.
Get something working so you can show your investors :)
I'm writing an app with a MySQL table that indexes 3 columns. I'm concerned that after the table reaches a significant number of records, the time to save a new record will be slow. Please advise on how best to approach the indexing of columns.
UPDATE
I am indexing a point_value, the user_id, and an event_id, all required for client-facing purposes. For an instance such as scoring baseball runs by player id and game id. What would be the cost of inserting about 200 new records a day, after the table holds records for two seasons, say 72,000 runs, and after 5 seasons, maybe a quarter million records? Only for illustration, but I'm expecting to insert between 25 and 200 records a day.
Index what seems the most logical (that should hopefully be obvious, for example, a customer ID column in the CUSTOMERS table).
Then run your application and collect statistics periodically to see how the database is performing. RUNSTATS on DB2 is one example, I would hope MySQL has a similar tool.
When you find some oft-run queries doing full table scans (or taking too long for other reasons), then, and only then, should you add more indexes. It does little good to optimise a once-a-month-run-at-midnight query so it can finish at 12:05 instead of 12:07. However, it's a huge improvement to reduce a customer-facing query from 5 seconds down to 2 seconds (that's still too slow, customer-facing queries should be sub-second if possible).
More indexes tend to slow down inserts and speed up queries. So it's always a balancing act. That's why you only add indexes in specific response to a problem. Anything else is premature optimization and should be avoided.
In addition, revisit the indexes you already have periodically to see if they're still needed. It may be that the queries that caused you to add those indexes are no longer run often enough to warrant it.
To be honest, I don't believe indexing three columns on a table will cause you to suffer unless you plan on storing really huge numbers of rows :-) - indexing is pretty efficient.
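In MySQL terms, that "collect statistics, then index only what is proven slow" loop might look like this (table and column names are hypothetical):

    -- Log anything slower than 2 seconds so problem queries identify themselves:
    SET GLOBAL slow_query_log = ON;
    SET GLOBAL long_query_time = 2;

    -- MySQL's rough equivalent of RUNSTATS: refresh the table's index statistics.
    ANALYZE TABLE customers;

    -- Check a suspect query; "type: ALL" in the output means a full table scan.
    EXPLAIN SELECT * FROM customers WHERE last_name = 'Smith';

    -- Only once the scan is shown to hurt, add the index.
    ALTER TABLE customers ADD INDEX idx_last_name (last_name);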
After your edit which states:
I am indexing a point_value, the user_id, and an event_id, all required for client-facing purposes. For an instance such as scoring baseball runs by player id and game id. What would be the cost of inserting about 200 new records a day, after the table holds records for two seasons, say 72,000 runs, and after 5 seasons, maybe a quarter million records? Only for illustration, but I'm expecting to insert between 25 and 200 records a day.
My response is that 200 records a day is an extremely small volume for a database; you definitely won't have anything to worry about with those three indexes.
Just this week, I imported a day's worth of transactions into one of our database tables at work and it contained 2.1 million records (we get at least one transaction per second across the entire day from 25 separate machines). And it has four separate composite keys, which is somewhat more intensive than your three individual keys.
Now granted, that's on a DB2 database but I can't imagine IBM are so much better than the MySQL people that MySQL can only handle less than 0.01% of the DB2 load.
I ran some simple tests using my real project and a real MySQL database.
My results: adding an average index (1-3 columns in an index) to a table makes inserts about 2.1% slower. So, if you add 20 indexes, your inserts will be slower by 40-50%. But your selects will be 10-100 times faster.
So is it ok to add many indexes? - It depends :) I gave you my results - You decide!
Nothing for SELECT queries, though updates and especially inserts will be orders of magnitude slower - which you won't really notice until you start inserting a LOT of rows at the same time...
In fact, at a previous employer (single-user, desktop system) we actually DROPPED the indexes before starting our "import routine" - which would first delete all records before inserting a huge number of records into the same table...
Then when we were finished with the insertion job we would re-create the indexes...
We would save 90% of the time for this operation by dropping the indexes before starting the operation and re-creating the indexes afterwards...
This was a Sybase database, but the same numbers apply for any database...
So be careful with indexes, they're FAR from "free"...
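The MySQL version of that drop-and-recreate trick would look roughly like this (hypothetical table and index names, borrowing the column names from the question):

    -- Drop the secondary indexes before the mass delete + bulk insert...
    ALTER TABLE import_target
      DROP INDEX idx_point_value,
      DROP INDEX idx_event;

    -- ...run the big DELETE and the bulk INSERT / LOAD DATA here...

    -- ...then rebuild the indexes once, which is far cheaper than maintaining them row by row.
    ALTER TABLE import_target
      ADD INDEX idx_point_value (point_value),
      ADD INDEX idx_event (event_id);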
Only for illustration, but I'm expecting to insert between 25 and 200 records a day.
With that kind of insertion rate, the cost of indexing an extra column will be negligible.
Without some more details about the expected usage of the data in your table, worrying about indexes slowing you down smells a lot like premature optimization, which should be avoided.
If you are really concerned about it, then set up a test database and simulate performance in the worst-case scenarios. A test proving that it is or is not a problem will probably be much more useful than trying to guess and worry about what may happen. If there is a problem, you will be able to use your test setup to try different methods of fixing the issue.
The index is there to speed up retrieval of data, so the question should be "What data do I need to access quickly?". Without the index, some queries will do a full table scan (go through every row in the table) in order to find the data that you want. With a significant number of records this will be a slow and expensive operation. If it is for a report that you run once a month then maybe that's okay; if it is for frequently accessed data then you will need the index to give your users a better experience.
If you find the speed of the insert operations are slow because of the index then this is a problem you can solve at the hardware level by throwing more CPUs, RAM and better hard drive technology at the problem.
What Pax said.
For the dimensions you describe, the only significant concern I can imagine is "What is the cost of failing to index multiple db columns?"