This is a bit ambiguous and, I think, relies more on your own personal experience, so any input is welcome.
I have a database X and within X is table Y. Table Y has become very large (1.1 million rows) and it cannot be refactored any more than it already has been.
So... within your experience, how much further can this table grow before I begin seeing problems (if any), and what are those problems likely to be?
Why would a mere 1.1 million rows cause problems? Most (if not all) RDBMSes can handle many, many more (like billions), as long as storage etc. suffices of course, and as long as the filesystem can handle files of considerable size (FAT32, for example, only supports files up to 4GB).
Also, you need to be more specific about what you're referring to when saying "before I begin seeing problems (if any)". What kind of problems? You might already have problems if you're not using the correct indices, for example, which might slow queries down. That might be a problem but can, in some cases, also be fine.
Another issue that might actually be a problem is something like an auto-increment primary key field of type (unsigned) int, which might overflow at around 2.1 billion rows (signed) or 4.2 billion rows (unsigned); but since you're at 1.1 million rows currently, that is way outside of what to worry about now. (The exact values are, of course, 2^31-1 and 2^32-1 for signed and unsigned int respectively.) In that case you'll have to think about using types like bigint or others (maybe even (var)char etc.) for your PK.
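If that day ever comes, the usual fix is to widen the key. A hedged sketch, assuming table Y from the question and a placeholder column name id (note that on a big table this ALTER rebuilds the table, so expect some downtime):

-- Widen an auto-increment primary key from INT to BIGINT UNSIGNED
-- (table Y is from the question; the column name `id` is an assumption).
ALTER TABLE Y MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;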
The only thing interesting here, for MySQL specifically, could be: are you using InnoDB or MyISAM? I don't know the exact details since I don't usually work with MySQL, but I seem to remember that MyISAM can cause trouble (probably in old(er) versions like <5.0 or something). Correct me if I'm wrong. Edit: read up here. MyISAM apparently supports a maximum of 2^32 rows, unless compiled with specific options.
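If you're not sure which engine (or roughly how many rows) table Y currently has, you can ask the server; this assumes database X and table Y from the question:

-- Engine, row format, and estimated row count for table Y in database X.
SELECT ENGINE, ROW_FORMAT, TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'X' AND TABLE_NAME = 'Y';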
It depends on the operating system being used. For older systems the typical issue is the maximum file size: the 32-bit file addressing used by older operating systems and filesystems (e.g. FAT32) could not seek past 2GB (or 4GB for FAT32) per file.
See Maximum table size documentation.
I'm actually building a database with phpMyAdmin and I'm asking myself whether something is possible and how I could implement it.
The fact is that I'm building lists through a website and then saving them in my database, but these lists are only composed of items I already have stored in another table of my database.
I thought that a column with a SET datatype holding all the selected items would save memory and be clearer than creating x rows linked to the created list by an ID column.
So, the question I'm asking is: can I create this kind of set for a column, one that will update by itself when I add items to the other table? If yes, can I do it through the phpMyAdmin interface, or do I have to work on the MySQL server itself?
Finally, it won't be possible to use the SET datatype in my application because it can only store up to 64 items and I'll be manipulating around a thousand.
I'm still interested if any of you have an idea of how to do it, because a table with x rows of (ID, wordID#) (see my situation, explained a bit higher in this post, in the answers part) doesn't seem like a very optimized or lightweight option.
Have a nice day :)
It is possible to simulate a SET in a BLOB (max 64KB, so about 512K bits) or MEDIUMBLOB (max 16MB), but it takes a bit of code -- find the byte, modify it using & or |, stuff it back in.
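To make that concrete, here is a rough sketch of what that bit of code might look like; the table and column names (word_lists, list_id, bits) are made up, the bitmap is pre-allocated to 128 zero bytes (1024 bits), and a single bit is set and tested with MySQL's string functions:

-- Made-up table: one pre-allocated bitmap per list.
CREATE TABLE word_lists (
    list_id INT UNSIGNED NOT NULL PRIMARY KEY,
    bits    BLOB NOT NULL              -- up to 65,535 bytes of bitmap
);
INSERT INTO word_lists VALUES (1, REPEAT(CHAR(0), 128));   -- room for 1024 bits

SET @id := 1, @n := 137;   -- which list, and which 0-based bit to turn on

-- "Find the byte, modify it using |, stuff it back in":
UPDATE word_lists
SET bits = INSERT(bits,                       -- string INSERT() replaces one byte
                  FLOOR(@n / 8) + 1,          -- 1-based byte position
                  1,
                  CHAR(ORD(SUBSTRING(bits, FLOOR(@n / 8) + 1, 1)) | (1 << (@n % 8))))
WHERE list_id = @id;

-- Test whether bit @n is set:
SELECT (ORD(SUBSTRING(bits, FLOOR(@n / 8) + 1, 1)) >> (@n % 8)) & 1 AS is_set
FROM word_lists
WHERE list_id = @id;

Clearing a bit works the same way with & and a negated mask. Since only a single byte is ever treated as an integer, this approach does not run into the 64-bit limit mentioned below.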
Before MySQL 8.0, bitwise operations (e.g. ANDing two SETs, etc.) were limited to 64 bits. With 8.0, BLOBs can be operated on that way.
If your SETs tend to be sparse, then a list of bit numbers (in a comma-separated list or in a table) may be more compact. However, "dictionary" implies to me that your SETs are likely to be somewhat dense.
If you are doing some other types of operations, clue us in.
I am considering a problem.
In the C language, we are told that the size of a struct gets padded to a multiple of some alignment size (e.g. a multiple of a few bytes).
e.g.:
struct text {
    int index;    /* assume int is 4 bytes */
    char word[8];
};  /* although the members total only 12 bytes, assume the compiler pads the struct to 16 bytes */
Therefore, I am wondering: does the record size (thanks to Gordon Linoff) in MySQL run into the same problem?
Moreover, how can I optimize MySQL by controlling the table size?
First, you are referring to a record size and not the table size.
Second, databases do not work the way that procedural languages do. Records are stored on pages, which are filled up until no more fit. Then additional pages are used. Typically, there are many records on a page.
You can get an idea of what a page looks like here. They are complicated but basically hidden from the user.
It sounds like you are attempting "premature optimization". This isn't quite the root of all evil, but it is a major distraction to getting things accomplished. In other words, define the record as you need it defined. Do what you want to do. If you have performance problems, then fix those when they arise.
The size of a record is going to be the least of your problems. Databases perform I/O in units of pages, so the difference between 12 and 16 bytes is meaningless for a single record. You still have to read the entire page (which is much larger).
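If you're curious what record size MySQL actually ends up using, you can ask the server instead of counting bytes by hand; the database and table names below are placeholders:

-- Approximate per-record and total sizes, as tracked by the server.
-- AVG_ROW_LENGTH is an estimate (data size divided by row count), not an exact figure.
SELECT AVG_ROW_LENGTH, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_db' AND TABLE_NAME = 'your_table';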
I am in the process of writing a web app backed by a MySQL database, where one of the tables has the potential to get very large (on the order of gigabytes) with a significant proportion of table operations being writes. One of the table columns needs to store a string sequence that can be quite big. In my tests thus far it has reached a size of 289 bytes, but to be on the safe side I want to design for a maximum size of 1 KB. Currently I am storing that column as a MySQL MediumBlob field in an InnoDB table.
At the same time I have been googling to establish the relative merits and demerits of BLOBs vs other forms of storage. There is a plethora of information out there, perhaps too much. What I have gathered is that InnoDB stores the first few bytes (768 if memory serves me right) of the BLOB in the table row itself and the rest elsewhere. I have also got the notion that if a row has more than one BLOB column (which my table does not), then the "elsewhere" is a different location for each BLOB. Apart from that, I have got the impression that accessing BLOB data is significantly slower than accessing row data (which sounds reasonable).
My question is just this: in light of my BLOB size and the large potential size of the table, should I bother with a BLOB at all? Also, if I use some form of in-row storage instead, will that not have an adverse effect on the maximum number of rows that the table will be able to accommodate?
MySQL is neat and lets me get away with pretty much everything in my development environment. But... that ain't the real world.
I'm sure you've already looked here but it's easy to overlook some of the details since there is a lot to keep in mind when it comes to InnoDB limitations.
The easy answer to one of your questions (maximum size of a table) is 64TBytes. Using variable-size types to move that storage into a separate file would certainly change the upper limit on the number of rows, but 64TBytes is quite a lot of space, so the difference might be very small.
Having a column with a 1KByte string type that is stored inside the table row seems like a viable solution, since 1KByte is also very small compared to 64TBytes, especially if you have very strict requirements for query speed.
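For what it's worth, a minimal sketch of that in-row alternative, assuming the value really can be capped at 1 KB (table and column names are made up); a VARBINARY(1024) column normally stays with the rest of the row instead of being pushed out to separate BLOB pages:

-- Hypothetical table: the ~1 KB value stored in-row as VARBINARY instead of MEDIUMBLOB.
CREATE TABLE sequences (
    id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    seq_data VARBINARY(1024) NOT NULL   -- hard 1 KB cap, kept with the row
) ENGINE=InnoDB;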
Also, keep in mind that the InnoDB 64TByte limit might be pushed down by the maximum file size of the OS you're using. You can always link several files together to get more space for your table, but then it starts to get a bit messier.
If the BLOB data is more than 250KB it is not worth it. In your case I wouldn't bother with BLOBs. Read this.
I think this question has been asked many times, but searching for it I have found only some notes in various responses.
I know that files are generally handled by the OS and filesystem, but there are (or should be) ways to change this. Other big DB systems use at least file preallocation and grow their files by adding big chunks. As far as I know, MySQL lacks this type of feature (or is my knowledge outdated?) and offers only OPTIMIZE TABLE, which defragments the records inside the files, but the files themselves could be very fragmented.
As a specific problem, I have a table that should act as a stack: a lot of INSERTs and DELETEs, the data has a short life (from several seconds to hours), and the maximum size of the stack is known. The table will be modified often - thousands of times per day - and there are other active tables too, so this scenario will cause a very fragmented disk layout over time. My current idea is to preallocate the entire stack table and then use a top index and just UPDATEs.
Anyway, besides my specific problem (it would be nice to have a solution for it too): what methods are used to diminish or even eliminate fragmentation (if possible) of MySQL data files, both MyISAM and InnoDB, preferably on *nix systems? And are raw devices a solution, at least for InnoDB?
If you do not need the stack to survive server/mysql reboot, and it fits in memory, create the table as a MEMORY table.
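A minimal sketch of that suggestion (the column names are made up); everything lives in RAM, so nothing ever touches the data files, and the contents are lost when the server restarts:

-- Stack-like scratch table kept entirely in memory.
CREATE TABLE job_stack (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload VARCHAR(255) NOT NULL
) ENGINE=MEMORY;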
How much of a difference does using tinyint or smallint (when applicable) instead of just int make? Or restricting a char field to the minimum number of characters needed?
Do these choices affect performance or just allocated space?
On an indexed field in a significantly large table, the size of your field can have a large effect on performance. On a non-indexed field it's not nearly as important, but the extra data still has to be written.
That said, the downtime for resizing a large table can be several minutes or even several hours, so don't make columns smaller than you could ever imagine needing.
Yes it affects performance too.
If the indexes are larger, it takes longer to read them from disk, and less can be cached in memory.
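To make that concrete, here is a hypothetical table (all names made up) where the narrower types shave bytes off both the rows and the index entries:

-- Narrow types: 1-2 bytes per column instead of 4, and a fixed-length CHAR(2) code.
CREATE TABLE page_views (
    user_id     INT UNSIGNED      NOT NULL,   -- 4 bytes, up to ~4.29 billion values
    country     CHAR(2)           NOT NULL,   -- fixed-length 2-character code
    device_type TINYINT UNSIGNED  NOT NULL,   -- 1 byte, 0-255
    view_count  SMALLINT UNSIGNED NOT NULL,   -- 2 bytes, 0-65,535
    KEY idx_country_device (country, device_type)
) ENGINE=InnoDB;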
I've frequently seen these three schema design defects causing problems:
A varchar(n) field was created with n only big enough for the sample of data that the designer had pulled in, not the global population: fine in unit tests, silent truncations in the real world.
A varchar(n) used where the data is fixed size. This masks data bugs.
A char(n) used for variable-length data. This provides performance improvements (by enabling the data to sit in-line in the row on disk), but all the client code (and various stored procs/views etc.) needs to cope with whitespace-padding issues (and often it doesn't). Whitespace padding can be difficult to track down, because spaces don't show up too well, and various libraries/SQL clients suppress them.
I've never seen a well-intentioned (i.e. not just using varchar(255) for all columns) but conservative selection of the wrong data size cause significant performance problems. By significant, I mean a factor of 10. I regularly see algorithmic design flaws (missing indexes, sending too much data over the wire, etc.) causing much bigger performance hits.
Both, in some cases. But IMO it's more a question of design than of performance and storage considerations. The reason you don't make everything varchar(...) is that it doesn't accurately reflect what sort of data should be stored there, and it reduces your data's integrity and type-safety.