Innodb + 8K length row limit - mysql

I've a Innodb table with, amongst other fields, a blob field that can contain up to ~15KB of data. Reading here and there on the web I found that my blob field can lead (when the overall fields exceed ~8000 bytes) to a split of some records into 2 parts: on one side the record itself with all fields + the leading 768 bytes of my blob, and on another side the remainder of the blob stored into multiple (1-2?) chunks stored as a linked list of memory pages.
So my question: in such cases, what is more efficient regarding the way data is cached by MySQL ? 1) Let the engine deal with the split of my data or 2) Handle the split myself in a second table, storing 1 or 2 records depending on the length of my blob data and having these records cached by MySQL ? (I plan to allocate as much memory as I can for this to occur)

Unless your system's performance is so terrible that you have to take immediate action, you are better off using the internal mechanisms for record splitting.
The people who work on MySQL and its forks (e.g. MariaDB) spend a lot of time implementing and testing optimizations. You will be much happier with simple application code; spend your development and test time on your application's distinctive logic rather than trying to work around internals issues.

Related

Using Redis to cache data that is in use in a Real TIme Single Page App

I've got a web application, it has the normal feature, user settings etc these are all stored in MYSQL with the user etc.....
A particular part of the application a is a table of data for the user to edit.
I would like to make this table real time, across multiple users. Ie multiple users can open the page edit the data and see changes in real time done by other users editing the table.
My thinking is to cache the data for the table in Redis, then preform all the actions in redis like keeping all the clients up to date.
Once all the connection have closed for a particular table save the data back to mysql for persistence, I know Redis can be used as a persistent NoSQL database but as RAM is limited and all my other data is stored in MYSQL, mysql seems a better option.
Is this a correct use case for redis? Is my thinking correct?
It depends on the scalability. The number of records you are going to deal with and the structure you are going to use for saving it.
I will discuss about pros and crons of using redis. The decision is up to you.
Advantages of using redis:
1) It can handle heavy writes and reads in comparison with MYSQL
2) It has flexible structures (hashmap, sorted set etc) which can
localise your writes instead of blocking the whole table.
3) Read queries will be much faster as it is served from cache.
Disadvantages of using redis:
1) Maintaining transactions. What happens if both users try to access a
particular cell at a time? Do you have a right data structure in redis to
handle this case?
2) What if the data is huge? It will exceed the memory limit.
3) What happens if there is a outage?
4) If you plan for persistence of redis. Say using RDB or AOF. Will you
handle those 5-10 seconds of downtime?
Things to be focussed:
1) How much data you are going to deal with? Assume for a table of 10000 rows wit 10 columns in redis takes 1 GB of memory (Just an assumption actual memory will be very much less). If your redis is 10GB cluster then you can handle only 10 such tables. Do a math of about how many rows * column * live tables you are going to work with and the memory it consumes.
2) Redis uses compression for data within a range http://redis.io/topics/memory-optimization. Let us say you decide to save the table with a hashmap, you have two options, for each column you can have a hashmap or for each row you can have a hashmap. Second option will be the optimal one. because storing 1000 (hashmaps -> rows) * 20 (records in each hash map -> columns) will take 10 time less memory than storing in the other way. Also in this way if a cell is changed you can localize in hashmap of within 20 values.
3) Loading the data back in your MYSQL. how often will this going to happen? If your work load is high then MYSQL begins to perform worse for other operations.
4) How are you going to deal with multiple clients on notifying the changes? Will you load the whole table or the part which is changes? Loading the changed part will be the optimal one. In this case, where will you maintain the list of cells which have been altered?
Evaluate your system with these questions and you will find whether it is feasible or not.

The effect of the fields length on the querying time

I have a mysql database in which I keep information of item and also I keep description.
The thing is that the description column can hold up to 150 chars which I think is long and I wondered if it slows the querying time. Also I wanted to know if its recommended to shorten the size of the int I mean if I have a price which is normally not that big should I limit the column to small/medium int?
The columns are something like this:
id name category publisher mail price description
Thanks in advance.
Store your character data as varchar() and not as char() and read up on the MySQL documentation on these data types (here). This only stores the characters actually in the description, plus a few more bytes of overhead.
As for whether or not the longer fields imply worse-performing queries. That is a complicated subject. Obviously, at the extreme, having the maximum size records is going to slow things down versus a 10-byte record. The reason has to do with I/O performance. MySQL reads in pages and a page can contain one or more records. The records on the page are then processed.
The more records that fit on the page, the fewer the I/Os.
But then it gets more complicated, depending on the hardware and the storage engine. Disks, nowadays, do read-aheads as do operating systems. So, the next read of a page (if pages are not fragmented and are adjacent to each other) may be much faster than the read of the initial page. In fact, you might have the next page in memory before processing on the first page has completed. At that point, it doesn't really matter how many records are on each page.
And, 200 bytes for a record is not very big. You should worry first about getting your application working and second about getting it to meet performance goals. Along the way, make reasonable choices, such as using varchar() instead of char() and appropriately sized numerics (you might consider fixed point numeric types rather than float for monetary values).
It is only you that considers 150 long - the database most likely does not, as they're designed to handle much more at once. Do not consider sacrificing your data for "performance". If the nature of your application requires you to store up to 150 characters of text at once, don't be afraid to do so, but do look up optimization tips.
Using proper data types, though, can help you save space. For instance, if you have a field which is meant to store values 0 to 20, there's no need for an INT field type. A TINYINT will do.
The documentation lists the data types and provides information on how much space they use and how they're managed.

To BLOB or not to BLOB

I am in the process of writing a web app backed up by a MySQL database where one of the tables has a potential for getting very large (order of gigabytes) with a significant proportion of table operations being writes. One of the table columns needs to store a string sequence that can be quite big. In my tests thus far it has reached a size of 289 bytes but to be on the safe side I want to design for a maximum size of 1 kb. Currently I am storing that column as a MySQL MediumBlob field in an InnoDB table.
At the same time I have been googling to establish the relative merits and demerits of BLOBs vs other forms of storage. There is a plethora of information out there, perhaps too much. What I have gathered is that InnoDB stores the first few bytes (789 if memory serves me right) of the BLOB in the table row itself and the rest elsewhere. I have also got the notion that if a row has more than one BLOB (which my table does not) per column then the "elsewhere" is a different location for each BLOB. That apart I have got the impression that accessing BLOB data is significantly slower than accessing row data (which sounds reasonable).
My question is just this - in light of my BLOB size and the large potential size of the table should I bother with a blob at all? Also, if I use some form of inrow storage instead will that not have an adverse effect on the maximum number of rows that the table will be able to accommodate?
MySQL is neat and lets me get away with pretty much everything in my development environment. But... that ain't the real world.
I'm sure you've already looked here but it's easy to overlook some of the details since there is a lot to keep in mind when it comes to InnoDB limitations.
The easy answer to one of your questions (maximum size of a table) is 64TBytes. Using variable size types to move that storage into a separate file would certainly change the upper limit on number of rows but 64TBytes is quite a lot of space so the ratio might be very small.
Having a column with a 1KByte string type that is stored inside the table seems like a viable solution since it's also very small compared to 64TBytes. Especially if you have very strict requirements for query speed.
Also, keep in mind that the InnoDB 64TByte limit might be pushed down by the the maximum file size for the OS you're using. You can always link several files together to get more space for your table but then it's starting to get a bit more messy.
if the BLOB data is more then 250kb it is not worth it. In your case i wouldn't bother myself whit BLOB'n. Read this

Maximum number of rows in an MS Access database engine table?

We know the MS Access database engine is 'throttled' to allow a maximum file size of 2GB (or perhaps internally wired to be limited to fewer than some power of 2 of 4KB data pages). But what does this mean in practical terms?
To help me measure this, can you tell me the maximum number of rows that can be inserted into a MS Access database engine table?
To satisfy the definition of a table, all rows must be unique, therefore a unique constraint (e.g. PRIMARY KEY, UNIQUE, CHECK, Data Macro, etc) is a requirement.
EDIT: I realize there is a theoretical limit but what I am interested in is the practical (and not necessarily practicable), real life limit.
Some comments:
Jet/ACE files are organized in data pages, which means there is a certain amount of slack space when your record boundaries are not aligned with your data pages.
Row-level locking will greatly reduce the number of possible records, since it forces one record per data page.
In Jet 4, the data page size was increased to 4KBs (from 2KBs in Jet 3.x). As Jet 4 was the first Jet version to support Unicode, this meant that you could store 1GB of double-byte data (i.e., 1,000,000,000 double-byte characters), and with Unicode compression turned on, 2GBs of data. So, the number of records is going to be affected by whether or not you have Unicode compression on.
Since we don't know how much room in a Jet/ACE file is taken up by headers and other metadata, nor precisely how much room index storage takes, the theoretical calculation is always going to be under what is practical.
To get the most efficient possible storage, you'd want to use code to create your database rather than the Access UI, because Access creates certain properties that pure Jet does not need. This is not to say there are a lot of these, as properties set to the Access defaults are usually not set at all (the property is created only when you change it from the default value -- this can be seen by cycling through a field's properties collection, i.e., many of the properties listed for a field in the Access table designer are not there in the properties collection because they haven't been set), but you might want to limit yourself to Jet-specific data types (hyperlink fields are Access-only, for instance).
I just wasted an hour mucking around with this using Rnd() to populate 4 fields defined as type byte, with composite PK on the four fields, and it took forever to append enough records to get up to any significant portion of 2GBs. At over 2 million records, the file was under 80MBs. I finally quit after reaching just 700K 7 MILLION records and the file compacted to 184MBs. The amount of time it would take to get up near 2GBs is just more than I'm willing to invest!
Here's my attempt:
I created a single-column (INTEGER) table with no key:
CREATE TABLE a (a INTEGER NOT NULL);
Inserted integers in sequence starting at 1.
I stopped it (arbitrarily after many hours) when it had inserted 65,632,875 rows.
The file size was 1,029,772 KB.
I compacted the file which reduced it very slightly to 1,029,704 KB.
I added a PK:
ALTER TABLE a ADD CONSTRAINT p PRIMARY KEY (a);
which increased the file size to 1,467,708 KB.
This suggests the maximum is somewhere around the 80 million mark.
As others have stated it's combination of your schema and the number of indexes.
A friend had about 100,000,000 historical stock prices, daily closing quotes, in an MDB which approached the 2 Gb limit.
He pulled them down using some code found in a Microsoft Knowledge base article. I was rather surprised that whatever server he was using didn't cut him off after the first 100K records.
He could view any record in under a second.
It's been some years since I last worked with Access but larger database files always used to have more problems and be more prone to corruption than smaller files.
Unless the database file is only being accessed by one person or stored on a robust network you may find this is a problem before the 2GB database size limit is reached.
We're not necessarily talking theoretical limits here, we're talking about real world limits of the 2GB max file size AND database schema.
Is your db a single table or
multiple?
How many columns does each table have?
What are the datatypes?
The schema is on even footing with the row count in determining how many rows you can have.
We have used Access MDBs to store exports of MS-SQL data for statistical analysis by some of our corporate users. In those cases we've exported our core table structure, typically four tables with 20 to 150 columns varying from a hundred bytes per row to upwards of 8000 bytes per row. In these cases, we would bump up against a few hundred thousand rows of data were permissible PER MDB that we would ship them.
So, I just don't think that this question has an answer in absence of your schema.
It all depends. Theoretically using a single column with 4 byte data type. You could store 300 000 rows. But there is probably alot of overhead in the database even before you do anything. I read some where that you could have 1.000.000 rows but again, it all depends..
You can also link databases together. Limiting yourself to only disk space.
Practical = 'useful in practice' - so the best you're going to get is anecdotal. Everything else is just prototyping and testing results.
I agree with others - determining 'a max quantity of records' is completely dependent on schema - # tables, # fields, # indexes.
Another anecdote for you: I recently hit 1.6GB file size with 2 primary data stores (tables), of 36 and 85 fields respectively, with some subset copies in 3 additional tables.
Who cares if data is unique or not - only material if context says it is. Data is data is data, unless duplication affects handling by the indexer.
The total row counts making up that 1.6GB is 1.72M.
When working with 4 large Db2 tables I have not only found the limit but it caused me to look really bad to a boss who thought that I could append all four tables (each with over 900,000 rows) to one large table. the real life result was that regardless of how many times I tried the Table (which had exactly 34 columns - 30 text and 3 integer) would spit out some cryptic message "Cannot open database unrecognized format or the file may be corrupted". Bottom Line is Less than 1,500,000 records and just a bit more than 1,252,000 with 34 rows.

I can store a lot of data (<=4GB) in one table column. But is it a good idea?

To make a long story short, one part of the application I'm working on needs to store a somewhat large volume of data in a database, for another part of the application to pick up later on. Normally this would be < 2000 rows, but can occasionally exceed 300,000 rows. The data needs to be temporarily stored and can be deleted afterwards.
I've been playing around with various ideas and one thing came to mind today. The LONGTEXT datatype can store a maximum of 2^32 bytes, which equates to 4 GB. Now, that's a lot of stuff to cram into one table row. Mind you, the data would probably not exceed 60-80 MB at the very most. But my question is, is it a good idea to actually do that?
The two solutions I'm currently toying with using are something like this:
Inserting all data as individual rows into a "temporary" table that would be truncated after finish.
Inserting all data as a serialized string into a LONGTEXT column in a row that would be deleted after finish.
Purely from a performance perspective, would it be better to store the data as potentially >300,000 individual rows, or as a 60 MB LONGTEXT entry?
If it's a wash, I will probably go with the LONGTEXT option, as it would make the part of the application that picks up the data easier to write. It would also tie in better with yet another part, which would increase the overall performance of the application.
I would appreciate any thoughts on this.
Serializing all that data into a LONGTEXT... blasphemy!! :)
Seriously though, it occurs to me that if you do that, you would have no choice than to extract it all in one, giant, piece. If you spread it into individual rows, on the other hand, you can have your front-end fetch it in smaller batches.
At least giving yourself that option seems the smart thing to do. (Keep in mind that underestimating the future size requirements of once data can be a fatal error!)
And if you design your tables right, I doubt very much that 60MiB of data spread over 300.000 rows would be any less efficient than fetching 60MiB of text and parsing that on the front-end.
Ultimately the question is: do you think your front-end can parse the text more efficiently than MySQL can fetch it?
This should be fine as long as you use a memory storage engine. In MySQL, this means using the MEMORY storage engine instead of InnoDB or MyISAM. Otherwise disk usage will bring your app to its knees.
What kind of data and how it will be used? Probably it will be much better to store and process it in memory of your application. At least, it will be much faster and will not load DB engine.
You could always store it in the database as the 300,000 row format and use memcached to cache the data so you don't have to do it again. Please note that memcached stores it in the memory of the machine so if your using a lot of this data you may way to set a low expire on it. But memcached significantly speeds up the time to fetch data because you dont have to do queries every page load.
If you're going to just be writing a large, temporary BLOB you might consider writing to a temporary file on a shared file system instead.