Benefit of duplicating a MySQL table to spread load - mysql

Case 1: I have a table A that receives about 1 insert per second.
From my admin interface I need to run some heavy reads and deletes on this table for statistics and maintenance.
Does it make sense to insert incoming data into 2 different tables, A and B, and use table B for my administration? The goal is to not overload table A.
Case 2:
Another example to make the logic clear: I have a table (tmpA) dedicated to holding search results. Each time there is a search, the results are inserted into this table to help with pagination. At night, old results are deleted.
Currently I have 5 requests per second for this table, so approximately 500 rows * 5 = 2500 rows per second.
Does it make sense to create more tables (tmpA, tmpB, tmpC, etc.) to spread the inserts and avoid overload?
For case 1, if it makes sense to duplicate, what is the difference between "manually" inserting incoming data into 2 (or more) different tables and using MySQL replication?
Thanks,
jess

This is kinda difficult to answer, as it depends on your setup hardware-wise.
An insert per second isn't that much. A properly set up server should be able to handle it.
Reads on a table are non-blocking, so gathering info for statistics (assuming you don't do the calculations for the statistics in the database) shouldn't influence the performance of your database.
Deletes, on the other hand, are blocking and will add to the load on a table with heavy inserts.
For Case 1, I do not understand how you would want to split the load on different tables. Generally speaking, there's a database-server load, and not specifically a table load (unless we define blocking processes as table load).
I gather from the comments that Case 1 is user signups/registrations. Splitting user information over two tables is horrid from a maintenance perspective, plus the coupling of the two tables that inevitably needs to happen only increases overhead (load) instead of decreasing it. Deleting data (users?) is also a major issue if the data is divided over two tables. Can you explain how you see yourself administering your data if it is divided over two tables? I'm probably missing something.
Looking at the above, I do not recommend splitting this data between tables.
What I do recommend is:
Use InnoDB as the storage engine. It uses row-level locking, whereas MyISAM locks the whole table.
Optimize your RAM/memory usage for MySQL. Proper memory settings allow for very quick reads and writes.
Optimize your indexes. The EXPLAIN statement can show which ones are used for each query.
Case 2
I don't fully understand the use case, but it might make sense to split this data up into several tables. Depending on why you want to push the data into these temp tables, splitting might happen per user, keyword or other significant feature.
Depending on the use case, try limiting the search results (and thus implementing pagination) through LIMIT clauses. That way you don't need to store results for pagination, or store the results at all. Can you explain why you want to store these results? 2500 rows/sec is a lot.
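As a rough illustration of paginating with LIMIT instead of storing the results, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the products table, its columns, and the search_page helper are made up for the example. In MySQL the same query shape works with LIMIT row_count OFFSET offset (or LIMIT offset, row_count).

```python
import sqlite3


def search_page(conn, keyword, page, page_size=10):
    """Return one page of matches, computed on demand (nothing is stored)."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT id, name FROM products "
        "WHERE name LIKE ? ORDER BY id LIMIT ? OFFSET ?",
        (f"%{keyword}%", page_size, offset),
    ).fetchall()


# Hypothetical sample data: 35 rows that all match the keyword.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (id, name) VALUES (?, ?)",
    [(i, f"widget {i}") for i in range(1, 36)],
)

page_2 = search_page(conn, "widget", page=2)
last_page = search_page(conn, "widget", page=4)
print(page_2[0], len(page_2), len(last_page))
```

The ORDER BY matters: without a stable sort order, consecutive pages are not guaranteed to be consistent with each other.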
Replication is a whole other topic, much more complicated and not achieved by copying tables, but by copying servers. I can't help you with that; I've never done it, as I never needed it. (My largest MySQL server was approx. 80 GB, 350 million rows, with inserts peaking at 224 rows per second.)
Can you paste the structure of the tables you currently use, and some sample data? That might make the cases a tad clearer.

Related

MySQL: what if there is too much data in a table?

Data in one table is increasing every day, and it might lower performance. I was wondering whether I can create a trigger that moves table A into A1 and creates a new table A every so often, so that inserts and updates stay fast in table A. Is this the right way to preserve performance? If not, what should I do?
(For example, when inserting or updating 1000 rows per second in table A, how is the performance after 3 years?)
We are designing software for a factory. There are production lines on which PCB boards are made. We need to insert almost 60 PCB records per second for years. (1000 rows seems to be exaggerated.)
First, you are talking about several terabytes for a single table. Is your disk that big? Yes, MySQL can handle that big a table.
Will it slow down? It depends on
The indexes. If you have 'random' indexes, the INSERTs will slow down to about 1 insert per disk hit. On a spinning HDD, that is only about 100 per second. SSD might be able to handle 1000/sec. Please provide SHOW CREATE TABLE.
Does the table have an AUTO_INCREMENT? If so, it needs to be BIGINT, not INT. But, if possible, get rid of it altogether (to save space). Again, let's see the SHOW.
"Point" queries (load one row via an index) are mostly unaffected by the size of the table. They will be about twice as slow in a trillion-row table as in a million-row table. A point query will take milliseconds or tens of milliseconds; no big deal.
A table scan will take hours or days; hopefully you are not doing that.
A billion-row scan of part of the table will take days or weeks unless you are using the PRIMARY KEY or have a "covering" index. Let's see the queries and the SHOW.
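To put numbers on the row counts behind the BIGINT advice, here is a small worked calculation. The insert rates and the 3-year horizon come from the question; the INT limits are MySQL's documented ranges for 4-byte integers. The rows_after helper is just for illustration and ignores leap days.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignoring leap days


def rows_after(rows_per_second, years):
    """Total rows inserted at a constant rate over the given number of years."""
    return rows_per_second * SECONDS_PER_YEAR * years


INT_SIGNED_MAX = 2**31 - 1     # MySQL INT
INT_UNSIGNED_MAX = 2**32 - 1   # MySQL INT UNSIGNED
BIGINT_SIGNED_MAX = 2**63 - 1  # MySQL BIGINT

for rate in (60, 1000):
    total = rows_after(rate, 3)
    print(
        f"{rate} rows/s for 3 years: {total:,} rows; "
        f"overflows INT UNSIGNED: {total > INT_UNSIGNED_MAX}; "
        f"fits BIGINT: {total < BIGINT_SIGNED_MAX}"
    )
```

At 60 rows per second that is about 5.7 billion rows in 3 years, which already exceeds what an INT UNSIGNED counter can hold.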
The best technique is not to store the data. Summarize it as it arrives, save the summaries, then toss the raw data. (OK, you might store the raw in a csv file just in case you need to build a new summary table or fix a bug in an existing one.)
Having a few summary tables instead of the raw data would shrink the data to under 1TB and allow the relevant queries to run 10 times as fast. (OK, point queries would be only slightly faster.)
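A minimal sketch of the "summarize as it arrives" idea follows. It uses Python's sqlite3 as a stand-in for MySQL; the hourly_summary table, its columns, and the record helper are assumptions made for the example, not anything from the question. In MySQL the update-or-insert step would more typically be a single INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hourly_summary ("
    " line_id INTEGER, hour TEXT, boards INTEGER, defects INTEGER,"
    " PRIMARY KEY (line_id, hour))"
)


def record(conn, line_id, timestamp, defective):
    """Fold one incoming record into its hourly bucket; the raw row is not stored."""
    hour = timestamp[:13]  # 'YYYY-MM-DD HH'
    cur = conn.execute(
        "UPDATE hourly_summary SET boards = boards + 1, defects = defects + ? "
        "WHERE line_id = ? AND hour = ?",
        (int(defective), line_id, hour),
    )
    if cur.rowcount == 0:  # first record for this (line, hour) bucket
        conn.execute(
            "INSERT INTO hourly_summary (line_id, hour, boards, defects) "
            "VALUES (?, ?, 1, ?)",
            (line_id, hour, int(defective)),
        )


# Hypothetical incoming records: (production line, timestamp, defective?)
events = [
    (1, "2024-01-01 08:00:01", False),
    (1, "2024-01-01 08:00:02", True),
    (1, "2024-01-01 09:15:00", False),
    (2, "2024-01-01 08:30:00", False),
]
for event in events:
    record(conn, *event)

summary = conn.execute(
    "SELECT line_id, hour, boards, defects FROM hourly_summary "
    "ORDER BY line_id, hour"
).fetchall()
print(summary)
```

Four raw records collapse into three summary rows here; at 60 records per second the reduction per hourly bucket would be far larger.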
PARTITIONing (or otherwise splitting up the table)? It depends. Let's see the queries and the SHOW. In many situations, PARTITIONing does not speed up anything.
Will you be deleting or modifying existing rows? I hope not. That adds more dimensions of problems. If, on the other hand, you need to purge 'old' data, then that is an excellent use for PARTITIONing. For 3 years' worth of data, I would PARTITION BY RANGE(TO_DAYS(..)) and have monthly partitions. Then a monthly DROP PARTITION would be very fast.
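The monthly-partition scheme could be scripted roughly as below. This sketch only generates the MySQL statements as strings and does not execute them; the table name pcb_records and the column created_at are hypothetical. Note that MySQL requires the partitioning column to be part of every unique key on the table, including the primary key.

```python
from datetime import date


def month_starts(first, count):
    """Yield `count` consecutive first-of-month dates starting at `first`."""
    year, month = first.year, first.month
    for _ in range(count):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def partition_ddl(table, column, first_month, months):
    """Build an ALTER TABLE that creates one RANGE partition per month."""
    starts = list(month_starts(first_month, months + 1))
    parts = [
        f"  PARTITION p{lo:%Y%m} VALUES LESS THAN (TO_DAYS('{hi:%Y-%m-%d}'))"
        for lo, hi in zip(starts, starts[1:])
    ]
    return (
        f"ALTER TABLE {table}\n"
        f"PARTITION BY RANGE (TO_DAYS({column})) (\n" + ",\n".join(parts) + "\n)"
    )


def drop_ddl(table, month):
    """Build the statement that purges one month by dropping its partition."""
    return f"ALTER TABLE {table} DROP PARTITION p{month:%Y%m}"


ddl = partition_ddl("pcb_records", "created_at", date(2024, 11, 1), 3)
purge = drop_ddl("pcb_records", date(2024, 11, 1))
print(ddl)
print(purge)
```

Dropping a partition removes that month's rows without deleting them one at a time, which is why the monthly purge is fast.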
Very large data may decrease the performance of the server, so here is a way to handle it:
1) Create another table to store archive data (old data) using the ARCHIVE storage engine (https://dev.mysql.com/doc/refman/8.0/en/archive-storage-engine.html).
2) Create a MySQL scheduled job to move older records to the archive table. Schedule it in the time slot when the server is most idle.
3) After moving older records to the archive table, re-index the original table.
This should keep performance up.
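The "move older records" step can be sketched as a copy followed by a delete. The example below uses Python's sqlite3 as a stand-in for MySQL; the records and records_archive tables and the archive_old_rows helper are invented for illustration. Whether both statements can share one transaction in MySQL depends on the storage engines involved, so treat the transaction here as an assumption.

```python
import sqlite3


def archive_old_rows(conn, cutoff):
    """Copy rows older than `cutoff` into the archive table, then delete them."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "INSERT INTO records_archive (id, created_at, payload) "
            "SELECT id, created_at, payload FROM records WHERE created_at < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM records WHERE created_at < ?", (cutoff,))
    return cur.rowcount  # number of rows moved


conn = sqlite3.connect(":memory:")
for name in ("records", "records_archive"):
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)"
    )
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, "2023-12-30", "a"), (2, "2023-12-31", "b"), (3, "2024-01-02", "c")],
)

moved = archive_old_rows(conn, "2024-01-01")
remaining = conn.execute("SELECT id FROM records").fetchall()
archived = conn.execute("SELECT id FROM records_archive ORDER BY id").fetchall()
print(moved, remaining, archived)
```

On a large table it is usually safer to run this in bounded chunks (for example one day at a time) so that each delete stays short.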
It is unlikely that 1000 row tables perform sufficiently poorly that doing a table copy every once in a while is an overall net gain. And anyway, what would the new table have that the old one did not which would improve performance?
The key to having tables perform efficiently is intelligent table design and management of indexes. That is how zillion row tables are effective in geospatial work, library catalogs, astronomy, and how internet search engines find useful data, etc.
Each index you define adds work for MySQL, especially at row insert time. Assuming there are more reads than inserts, that cost is worth paying because most queries complete quickly thanks to a suitable index.
Indexes are best defined with a thorough understanding of the queries made against the table—both in quality and quantity. And, if there is any tendency for the nature of the queries to trend over months or years, then the indexes would need additions, modifications, or—yes—even deletions.
It seems to me there is something inherently wrong with the way you are using MySQL to begin with.
A database system is supposed to manage data that is required by your application in order for it to work. If you think flushing the table every so often is something acceptable, then that doesn't seem to be the case.
Perhaps you are better off just using log files. Split them by date, delete old ones if and when you decide they are no longer relevant or need the disk space. It's even safer to do that way from a recovery perspective.
If you need a better suggestion, then improve your question to include exactly what you are trying to accomplish so we can help you with it.

Join 10 tables on a single join id called session_id that's stored in session table. Is this good/bad practice?

There are 10 tables, all with a session_id column, and a single session table. The goal is to join them all on the session table. I get the feeling that this is a major code smell. Is this good or bad practice?
What problems could occur?
Whether this is a good design or not depends deeply on what you are trying to represent with it. So, it might be OK or it might not be... there's no way to tell just from your question in its current form.
That being said, there are a couple of ways to speed up a join:
Use indexes.
Use covering indexes.
Under the right DBMS, you could use a materialized view to store pre-joined rows. You should be able to simulate that under MySQL by maintaining a special table via triggers (or even manually).
Don't join a table unless you actually need its fields. List only the fields you need in the SELECT list (instead of blindly using *). The fastest operation is the one you don't have to do!
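To make the "covering index" point concrete, here is a small sketch using Python's sqlite3 as a stand-in (MySQL's EXPLAIN reports the same situation as "Using index" in the Extra column). The session and page_view tables are hypothetical. The index contains every page_view column the query touches, so the join never has to read the wide table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE session (id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE page_view (
        id INTEGER PRIMARY KEY, session_id INTEGER, url TEXT, payload TEXT
    );
    -- Holds every page_view column that the query below needs.
    CREATE INDEX idx_page_view_session_url ON page_view (session_id, url);
    INSERT INTO session VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO page_view VALUES
        (1, 1, '/home', 'big blob'),
        (2, 1, '/cart', 'big blob'),
        (3, 2, '/home', 'big blob');
    """
)

# Only the needed fields are listed; payload is never requested.
query = (
    "SELECT s.user_name, v.url FROM session s "
    "JOIN page_view v ON v.session_id = s.id "
    "WHERE s.id = ? ORDER BY v.url"
)
rows = conn.execute(query, (1,)).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [step[-1] for step in conn.execute("EXPLAIN QUERY PLAN " + query, (1,))]
print(rows)
print(plan)
```

If the SELECT list were changed to include v.payload, the index would no longer cover the query and the engine would have to visit the table rows as well.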
And above all, measure on representative amounts of data! Possible results:
It's lightning fast. Yay!
It's slow, but it doesn't matter that it's slow (i.e. rarely used / not important).
It's slow and it matters that it's slow. Strap in, you have work to do!
We need the query with the 11 joins and its EXPLAIN output posted in the original question when available, please. And be kind to your community: for every table involved, also post SHOW CREATE TABLE tblname and SHOW INDEX FROM tblname, to avoid additional requests for these 11 tables. Then we will know the scope of the data and the cardinality of each indexed column.
Of course, more joins hurt performance.
But it depends! If your data model is like that, there is not much you can do short of a complete redesign of the data model.
1) Is it an online (real-time transaction) DB or an offline DB (data warehouse)?
If online, it is better to maintain a single table: keep the data in one table and let the number of columns grow.
If offline, it is better to maintain separate tables, because you will not always need all the columns.

Does MySQL table size matter when doing JOINs?

I'm currently trying to design a high-performance database for tracking clicks and then displaying analytics of these clicks.
I expect at least 10M clicks to be coming in per 2 weeks time.
There are a few variables (each of them would need its own column) that I'll allow people to use with the click tracking, but I don't want to limit them to 5 or so of these variables. That's why I thought about creating Table B, where I can store these variables for each click.
However, each click might have 5-15+ of these variables, depending on how many they are using. If I store them in a separate table, that will multiply the 10M per 2 weeks by the number of variables that the user might use.
In order to display analytics for the variables, I'll need to JOIN the tables.
Looking at both writing and most importantly reading performance, is there any difference if I JOIN a 100M rows table to a:
500 rows table OR to a 100M rows table?
Would anyone recommend denormalizing it, like having 20 columns and storing NULL values if they're not in use?
is there any difference if I JOIN a 100M rows table to a...
Yes, there is. A JOIN's performance depends mainly on how long it takes to find matching rows based on your ON condition. This means increasing the number of rows in a joined table will increase the JOIN time, since there are more rows to sift through for matches. In general, a JOIN can be thought of as taking A*B time, where A is the number of rows in the first table and B is the number of rows in the second. This is a very broad statement, as there are many strategies the optimizer may use to change this value, but it can be thought of as a general rule.
To increase a JOIN's efficiency, for reads specifically, you should look into indexing. Indexing allows you to mark a column for which the database should maintain an index, that is, a structure kept up to date to allow quicker evaluation of the values. This increases the cost of any write operation, since each write must also modify the index data structure, usually a B-tree, but decreases the time for read operations, since the data is presorted in this structure, allowing for quick lookups.
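The A*B rule of thumb and the effect of an index can be illustrated with a toy join in plain Python. This is only a model: the dict below acts as a hash lookup, whereas MySQL indexes are usually B-trees, and the data (clicks and variables) is made up. The point is the count of comparisons, not the absolute speed.

```python
def nested_loop_join(left, right):
    """Compare every left row with every right row: len(left) * len(right) checks."""
    comparisons, matches = 0, []
    for l_key, l_val in left:
        for r_key, r_val in right:
            comparisons += 1
            if l_key == r_key:
                matches.append((l_val, r_val))
    return matches, comparisons


def indexed_join(left, right):
    """Build a lookup on the right side's key once, then probe it per left row."""
    index = {}
    for r_key, r_val in right:
        index.setdefault(r_key, []).append(r_val)
    probes, matches = 0, []
    for l_key, l_val in left:
        probes += 1
        for r_val in index.get(l_key, []):
            matches.append((l_val, r_val))
    return matches, probes


# Hypothetical data: 1000 clicks, each referring to one of 50 variables.
clicks = [(i % 50, f"click{i}") for i in range(1000)]
variables = [(i, f"var{i}") for i in range(50)]

slow, comparisons = nested_loop_join(clicks, variables)
fast, probes = indexed_join(clicks, variables)
print(comparisons, probes, slow == fast)
```

Both functions return the same rows, but the unindexed version performs 1000 * 50 = 50,000 comparisons while the indexed one performs 1,000 lookups.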
Would anyone recommend denormalizing it, like having 20 columns and storing NULL values if they're not in use?
There are a lot of factors that would go into saying yes or no here. Mainly: would storage space be an issue, and how likely is duplicate data to appear? If storage space is not an issue and duplicates are not likely to appear, then one large table may be the right decision. If you have limited storage space, then storing the excess NULLs may not be smart. If you have many duplicate values, then one large table may be less efficient than a JOIN.
Another factor to consider when denormalizing is if another table would ever want to access values from just one of the previous two tables. If yes, then the JOIN to obtain these values after denormalizing would be more inefficient than having the two tables separate. This question is really something you need to handle yourself when designing the database and seeing how it is used.
First: There is a huge difference between joining 10m to 500 or 10m to 10m entries!
But using a proper index and a structured table design will make this manageable for your goals, I think (at least depending on the hardware used to run the application).
I would totally NOT recommend using denormalized tables, because adding more than your 20 values will be a mess once you have 20m entries in your table. So even if there are some good reasons that might speak for denormalized tables (performance, tablespace, ...), this is a bad idea for future changes. But in the end it's your decision ;)

Very big data in mysql table. Even select statements take much time

I am working on a database, and it's a pretty big one, with 1.3 billion rows and around 35 columns. Here is what I get after checking the status of the table:
Name:Table Name
Engine:InnoDB
Version:10
Row_format:Compact
Rows:12853961
Avg_row_length:572
Data_length:7353663488
Max_data_length:0
Index_length:5877268480
Data_free:0
Auto_increment:12933138
Create_time:41271.0312615741
Update_time:NULL
Check_time:NULL
Collation:utf8_general_ci
Checksum:NULL
Create_options:
Comment:InnoDB free: 11489280 kB
The problem I am facing is that even a single select query takes too much time to process. For example, the query SELECT * FROM Table_Name LIMIT 0,50000 takes around 2.48 minutes.
Is that expected?
I have to make a report in which I have to use the whole historical data, that is whole 1.3 bil rows. I could do this batch by batch but then I would have to run queries which are taking too much time many times again and again.
When the simple query is taking so much time I am not able to do any other complex query which needs joins and case statements.
A common practice is, if you have a huge amount of data, you ...
should not SELECT *: you should only select the columns you want
should limit your fetch range to a smaller number: I bet you won't handle 50000 records at the same time. Try to fetch them batch by batch.
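One way to fetch batch by batch while selecting only the needed columns is sketched below. It resumes each batch from the last primary key seen (often called keyset pagination) rather than using a growing offset; that choice is mine, not something the answer prescribes. The example uses Python's sqlite3 as a stand-in for MySQL, and big_table with its columns is hypothetical. It assumes ids are positive integers.

```python
import sqlite3


def iter_batches(conn, batch_size):
    """Yield lists of (id, amount) rows, resuming after the last id seen."""
    last_id = 0  # assumes ids start above 0
    while True:
        batch = conn.execute(
            "SELECT id, amount FROM big_table WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return
        yield batch
        last_id = batch[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE big_table (id INTEGER PRIMARY KEY, amount INTEGER, notes TEXT)"
)
conn.executemany(
    "INSERT INTO big_table VALUES (?, ?, ?)",
    [(i, i * 2, "wide text column the report never reads") for i in range(1, 2501)],
)

batch_sizes = []
total = 0
for batch in iter_batches(conn, 1000):
    batch_sizes.append(len(batch))
    total += sum(amount for _id, amount in batch)
print(batch_sizes, total)
```

Because each batch starts with an indexed lookup on id, later batches cost about the same as the first one, which is not true of large OFFSET values.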
This is a common problem many database administrators face. The solution: caching.
Break the queries into simpler, smaller queries. Use Memcached or other caching techniques and tools. Memcached stores key-value pairs: check for the data in Memcached and, if it is available, use it. If not, fetch it from the database, use it, and cache it. Next time, the data will be available from the cache.
You will have to develop your own logic and change some queries. Memcached is available here:
http://memcached.org/
Many tutorials are available on the Web
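The check-then-fetch-then-store flow described above can be sketched as follows. To keep the example runnable without a server, DictCache is a made-up in-process stand-in that only mimics the get/set shape of a cache client; it is not the Memcached API. The orders table and the order_total helper are likewise assumptions, and sqlite3 stands in for MySQL.

```python
import sqlite3


class DictCache:
    """In-process stand-in for a cache client; only get/set are modelled."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


stats = {"db_reads": 0}  # counts how often the database is actually queried


def order_total(conn, cache, customer_id):
    """Return the customer's order total, hitting the database only on a miss."""
    key = f"order_total:{customer_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    stats["db_reads"] += 1
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    cache.set(key, total)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(7, 10), (7, 32), (8, 5)])

cache = DictCache()
first = order_total(conn, cache, 7)   # miss: reads the database
second = order_total(conn, cache, 7)  # hit: served from the cache
print(first, second, stats["db_reads"])
```

A real deployment also needs a rule for expiring or invalidating cached entries when the underlying rows change; that part is left out here.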
Enable the slow query log in your my.cnf with a threshold of N seconds (the slow_query_log and long_query_time settings), then execute some queries and watch the log. This gives you some clues, and maybe you could add some indexes to this table.
Or run some queries with EXPLAIN: http://hackmysql.com/case1
A quick note that is usually an easy win ...
If you have any columns that are large text blobs, try selecting everything except for those fields. I've seen varchar(max) fields absolutely kill query efficiency.
You have a very wide average row size and 35 columns. You could try vertically partitioning the table, that is, split the table up into smaller tables that are related to each other 1:1 with a subset of columns from the table. InnoDB stores rows in pages and is not efficient for very wide rows.
If the data is append-only, consider looking at ICE (Infobright Community Edition).
You might also look at TokuDB because it supports good compression.
You can consider using partitioning and Shard-Query (http://code.google.com/p/shard-query) to access data in parallel. You can also split data over more than one server for parallelism using Shard-Query.
Try adding a WHERE clause: WHERE 1=1
If that has no effect, then you should change your engine type to MyISAM.

MySQL - why not index every field?

Recently I've learned the wonder of indexes, and performance has improved dramatically. However, with all I've learned, I can't seem to find the answer to this question.
Indexes are great, but why couldn't someone just index all fields to make the table incredibly fast? I'm sure there's a good reason to not do this, but how about three fields in a thirty-field table? 10 in a 30 field? Where should one draw the line, and why?
Indexes take up space in memory (RAM); with too many or too-large indexes, the DB is going to have to swap them to and from disk. They also increase insert, update and delete time (each index must be updated for every piece of data inserted/deleted/updated).
You don't have infinite memory. Making it so all indexes fit in RAM = good.
You don't have infinite time. Indexing only the columns you need indexed minimizes the insert/delete/update performance hit.
Keep in mind that every index must be updated any time a row is updated, inserted, or deleted. So the more indexes you have, the slower performance you'll have for write operations.
Also, every index takes up further disk space and memory space (when called), so it could potentially slow read operations as well (for large tables).
You have to balance CRUD needs. Writing to tables becomes slow. As for where to draw the line, that depends on how the data is being accessed (sorting, filtering, etc.).
Indexing takes up more space on both disk and in RAM, but it also improves performance a lot. Unfortunately, when memory runs out, the system falls back to disk and performance suffers. In practice, you shouldn't index a field that isn't involved in any kind of data traversal, whether inserting or searching (WHERE clause). Otherwise, you should. By default, index all fields; the fields to consider leaving unindexed are those used only in moderator queries, unless those need speed too.
It is not a good idea to index all the columns in a table. While this will make the table very fast to read from, it also becomes much slower to write to. Writing to a table that has every column indexed involves putting the new record in that table and then putting each column's information in its own index table.
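The storage side of "each column gets its own index structure" can be observed with a small experiment. This sketch uses Python's sqlite3 (not MySQL), with a made-up 10-column table filled with 5000 rows of arbitrary integers, and compares how many database pages are used with and without an index on every column. The exact numbers depend on the SQLite build, so none are quoted here.

```python
import sqlite3


def pages_used(index_every_column):
    """Fill a 10-column table and report how many database pages it occupies."""
    conn = sqlite3.connect(":memory:")
    columns = [f"c{i}" for i in range(10)]
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, "
        + ", ".join(f"{c} INTEGER" for c in columns)
        + ")"
    )
    if index_every_column:
        for c in columns:
            conn.execute(f"CREATE INDEX idx_{c} ON t ({c})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO t ({', '.join(columns)}) VALUES ({placeholders})",
        # Arbitrary, scattered values so the indexes are not trivially ordered.
        [tuple((row * 7919 + i) % 100003 for i in range(10)) for row in range(5000)],
    )
    conn.commit()
    (pages,) = conn.execute("PRAGMA page_count").fetchone()
    conn.close()
    return pages


plain = pages_used(index_every_column=False)
indexed = pages_used(index_every_column=True)
print(plain, indexed)
```

Every one of those extra pages also has to be kept up to date on each insert, which is where the write slowdown comes from.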
This answer is my personal opinion, based on my own mathematical reasoning.
The second question was about where to draw the line. First, let's do some calculation. Suppose we have N rows with L fields in a table. If we index all the fields, we get L new index tables, each sorting the data of its indexed field in a meaningful way. At first glance, if your table has weight W, it will become W*2 (1 TB becomes 2 TB). If you have 100 big tables (I have worked on a project with around 1800 tables), you will waste 100 times this space (100 TB), which is far from wise.
If we index every field in every table, we also have to think about index updates: one update triggers an update of every index, which takes about as long as an unordered select over everything.
From this I conclude that, in this scenario, if you are going to lose that time, it is preferable to lose it in a select rather than in an update, because selecting on a field that is not indexed does not trigger another select on all the other unindexed fields.
What to index?
foreign keys: a must, based on
primary key: I'm not yet sure about it; maybe someone reading this can help with this case
other fields: the first natural answer is half of the remaining fields. Why: if you should have indexed more, you are not far from the best answer; if you should have indexed less, you are also not far, because we know that no indexes is bad and indexing everything is also bad.
From these 3 points I conclude that if we have L fields, K of which are keys, the limit should be somewhere near ((L-K)/2)+K, give or take L/10.
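As a worked example of that rule of thumb (which is this answer's own heuristic, not an established guideline), here it is applied to the thirty-field table from the question, assuming 4 of the fields are keys. The function name and the choice of K = 4 are made up for the illustration.

```python
def suggested_index_count(total_fields, key_fields):
    """Return (low, midpoint, high) for ((L - K) / 2) + K, give or take L / 10."""
    midpoint = (total_fields - key_fields) / 2 + key_fields
    margin = total_fields / 10
    return midpoint - margin, midpoint, midpoint + margin


low, mid, high = suggested_index_count(total_fields=30, key_fields=4)
print(low, mid, high)
```

For L = 30 and K = 4 the heuristic lands at 17 indexed fields, with a range of 14 to 20.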
This answer is based on my own logic and personal practices.
First of all, at least in SAP ABAP and its underlying database tables, we can create one index table for all the required index fields; it holds only their addresses. So other SQL database systems could also use one table for all the fields to be indexed.
Secondly, what does write performance mean here? Say a company records 50 sales orders in one day. And let's assume there is a sales order header table, VBAK, with 30 fields, each of length 20 CHAR.
I can write to the real table in seconds, while the index table is updated in the background. If a report is run at the same time and needs to search the index table, the database logic can make it wait until the index write in progress has finished (say 5 sales orders were being recorded at the same time and take maybe 5 seconds). So the report waits 5 seconds and then runs for 5 seconds, 10 seconds in total.
Without an index, the report does not wait those 5 seconds for the writes, but it runs for maybe 40 seconds.
So what does write performance matter? No one writes thousands of records at the same time, but they do read them.
And reading a second table means the fields are already sorted. I have 3 fields selected, I can find which sorted sets I need to search for this data, and then I fetch it. What RAM, what memory? It is just a copied index table holding a single piece of data, the address, for each field. What memory?
I think this is one of the secrets software companies hide from customers so as not to wake them up; otherwise they would not need another expensive system in the future.