Too Many Columns in MySQL - error 1117

I just added a new field to my table in MySQL and it came back with the warning "1117: Too many columns".
The table has (gasp) 1449 columns. I know, I know, it's a ridiculous number of columns and we are in the process of refactoring the schema, but I need to extend this architecture just a bit more. That said, this doesn't seem to be anywhere near the theoretical limit of 3398 as per the MySQL documentation. We are also not close to the 65,535-byte (64K) limit per row, as we are in the 50K range right now.
The warning does not prevent me from adding fields to the schema, so I'm not sure how it fails, if at all. How do I interpret this error given that it does not seem to cause any issues?

Perhaps some of these factors are adding to the total byte count:
http://dev.mysql.com/doc/refman/5.5/en/column-count-limit.html
e.g., if a column allows NULLs, that adds to the total, or if Unicode is used, that more than triples the space required by character columns, etc.
For MyISAM:
row length = 1
           + (sum of column lengths)
           + (number of NULL columns + delete_flag + 7)/8
           + (number of variable-length columns)
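For example (the table in this calculation is invented for illustration, not taken from the question): a MyISAM table with 100 CHAR(10) NOT NULL columns and 50 nullable INT columns (4 bytes each) comes out, using integer division, at:
row length = 1 + (100*10 + 50*4) + (50 + 1 + 7)/8 + 0
           = 1 + 1200 + 7 + 0
           = 1208 bytes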
You could check whether it's really a row-size issue rather than a column-count issue by adding just a TINYINT NOT NULL column, then dropping it and adding a CHAR(x) column until you get the error, as in the sketch below.
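A minimal sketch of that test (the table and column names are hypothetical):

-- If even the smallest possible column triggers the error,
-- you are hitting a column-count or metadata limit, not row size.
ALTER TABLE wide_table ADD COLUMN probe TINYINT NOT NULL;
ALTER TABLE wide_table DROP COLUMN probe;

-- If the TINYINT fits, grow a fixed-width column until it fails;
-- the size at which it fails shows how close you are to the row-size limit.
ALTER TABLE wide_table ADD COLUMN probe CHAR(50) NOT NULL;
ALTER TABLE wide_table MODIFY COLUMN probe CHAR(150) NOT NULL;
ALTER TABLE wide_table MODIFY COLUMN probe CHAR(255) NOT NULL;
ALTER TABLE wide_table DROP COLUMN probe;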

If it is a warning as you say, then you should remember that warnings are exactly that: warnings. It means you're okay for now but, if you continue with the behaviour that elicited the warning, you will probably be punished in one form or another.
However, it's more likely that this is an error in that it's refused to actually let you add more columns. Even if it does let you add more columns, that's unlikely to last for long.
So, regardless of whether it's a warning or error, the right response is to listen to what it's telling you, and fix it.
If you need a quick'n'dirty fix while you're thinking about the best way to fix it properly, you can split the row across two tables with a common identifier.
This will make your queries rather ugly but will at least allow you to add more columns to the "table" (quoted because it's actually two tables with a common key).
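A minimal sketch of that split, with hypothetical table and column names - two tables in a 1:1 relationship sharing the same key:

-- First half of the "table".
CREATE TABLE item_part1 (
    item_id INT NOT NULL PRIMARY KEY,
    col_a   INT,
    col_b   VARCHAR(50)
    -- ...more columns, up to a safe number...
);

-- Second half, keyed by the same identifier.
CREATE TABLE item_part2 (
    item_id INT NOT NULL PRIMARY KEY,
    col_c   INT,
    col_d   VARCHAR(50),
    FOREIGN KEY (item_id) REFERENCES item_part1 (item_id)
);

-- Queries stitch the halves back together.
SELECT p1.*, p2.col_c, p2.col_d
FROM item_part1 AS p1
JOIN item_part2 AS p2 USING (item_id);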
Don't use this as a final solution, not least of all because it breaks normalisation rules. But, to be honest, with more than a thousand columns, there's a good chance they're already broken :-)
I'm finding it very hard to imagine an item that would have thousands of attributes that couldn't be organised into a better hierarchy.

Related

MySQL / MariaDB / InnoDB: Moving variable width columns to a secondary table

I have a MariaDB InnoDB table with several million rows, but with short, fixed-width rows consisting of numbers and timestamps only.
We usually search, filter and sort the rows using any of the existing columns.
We want to add a column to store an associated "url" for each row. Ideally every row will have its url.
We know for a fact that we won't be sorting, searching and filtering by the url column.
We don't mind truncating the URL to its first 255 bytes, so we are going to give it the VARCHAR type.
But of course that column's width would be variable. The whole record will become variable-width and the width of the original record will double in many cases.
We were considering the alternative of using a different, secondary table for storing the varchar.
We could join them when querying the data or, probably even more efficiently, just fetch the urls for the page we are showing.
Would this approach be advisable?
Is there a better alternative that would also allow us to preserve performance?
Update: As user Bill Karwin noted in one comment below, InnoDB does not benefit from fixed width as much as MyISAM does, so the real issue here is the size of the row, not so much fixed versus variable width.
Assuming you have control over how the URL is generated, you may want to make it fixed-length. YouTube video URIs, for instance, are always 11 base-64 characters. This removes the variable-length problem and avoids joining tables.
If changing URI generation is not an option, you have a few alternatives to make it fixed-length:
You could pad every url to 255 characters with a special filler character inside the database and strip it just before returning the value. This is not a clean solution, but it keeps read (DQL) operations faster than a join.
You could fetch the urls separately as you have stated, but beware that two round trips may be more time-consuming than any other option with just one.
You could join with another table only when the user requires it, as opposed to it being the default.
Consider that variable length may not be as big a problem as it seems, depending on your needs. The only real issue would be grossly oversized fields, and that doesn't seem to be your case.
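A minimal sketch of the secondary-table approach from the question (all names are hypothetical):

-- Main table keeps its short, fixed-width rows.
CREATE TABLE readings (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    value      INT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- The variable-width url lives in a 1:1 side table.
CREATE TABLE reading_urls (
    reading_id INT UNSIGNED NOT NULL PRIMARY KEY,
    url        VARCHAR(255) NOT NULL,
    FOREIGN KEY (reading_id) REFERENCES readings (id)
);

-- Fetch urls only for the rows on the current page.
SELECT r.id, r.value, u.url
FROM readings AS r
LEFT JOIN reading_urls AS u ON u.reading_id = r.id
ORDER BY r.id DESC
LIMIT 25;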

Calculate and minimize row size to get around effective MySQL column limit

I've read the more nuanced responses to the question of how many columns is too many and would like to ask a follow up.
I inherited a pretty messy project (a survey framework), but one could argue that the DB design is actually properly normalized, i.e. a person really has as many attributes as there are questions.
I wouldn't defend that notion in a debate, but the more pressing point is that I have very limited time. I'm trying to help users of this framework out, and the quickest fix I can think of right now is reducing the row size; I doubt I have the skill to change the DB model in the time I have.
The column count is now 4661, but they can hopefully reduce it to 3244, probably fewer (by reducing the actual number of questions).
The hard column limit is 4096, but right now I don't even succeed in adding 2500 columns, presumably because of the row-size limit of 65,535 bytes.
However, when I calculate my row size I end up with a much lower value, because nearly all columns are TINYINTs (survey responses ranging from 1-12).
It doesn't even work with 2000 TINYINTs (an example query that fails).
Using the formula given in the documentation I get 4996 bytes or less:
column.lengths = tinyints * 1
null.columns = length(all.columns)
variable.lengths.columns = 0

row.length = 1 +
    (column.lengths) +
    (null.columns + 7)/8 +
    (variable.lengths.columns)
## 4996
What did I misunderstand in that row length calculation?
I overlooked this paragraph:
"Thus, using long column names can reduce the maximum number of columns, as can the inclusion of ENUM or SET columns, or use of column or table comments."
I had long column names; replacing them with sequential numbers allowed me to add more columns (ca. 2693). I'll have to see if the increase is sufficient. I don't know exactly how the names are stored, presumably as strings, so maybe I can shorten them even further using letters.
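As a hedged illustration (the long name below is invented, not from the actual survey) - per the paragraph quoted above, the column names count against the same metadata budget, so shortening them frees room for more columns:

-- Rename one long, descriptive column to a short sequential name.
ALTER TABLE survey_responses
    CHANGE COLUMN how_satisfied_are_you_with_our_service q17 TINYINT;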

MySQL: deleting rows vs. using a "removed" column

I was previously under the impression that deleting rows in an autoincremented table can harm SELECT performance, and so I've been using a tinyint column called "removed" to mark whether an item is removed or not.
My SELECT queries are something like this:
SELECT * FROM items WHERE removed = 0 ORDER BY id DESC LIMIT 25
But I'm wondering whether it does, in fact, make sense to just delete those rows instead. Less than 1% of rows are marked as "removed", so it seems dumb for MySQL to have to check whether removed = 0 for each row.
So can deleting rows harm performance in any way?
That depends a lot on your use case - and on your users. Marking the row as deleted can help you in various situations:
if a user decides "oh, I did need that item after all", you don't need to go through the backups to restore it - just flip the "deleted" bit again (note potential privacy implications)
with foreign keys, you can't just go around deleting rows, you'd break the relationships in the database; same goes for security/audit logs
you aren't changing the number of rows (though if the removed rows add up, they may decrease index efficiency)
Moreover, when the column is properly indexed, the impact in my measurements was always insignificant (note that I wrote "measurements" - go and profile likewise, don't just blindly trust some people on the Internet). So my advice would be: use the removed column; it has significant benefits and no significant negative impact.
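A minimal sketch of such an index (the index name is hypothetical): a composite index on (removed, id) lets MySQL jump straight to the removed = 0 entries and read them already ordered by id, so the LIMIT stops early.

-- Covers both the WHERE clause and the ORDER BY ... LIMIT.
CREATE INDEX idx_removed_id ON items (removed, id);

SELECT * FROM items WHERE removed = 0 ORDER BY id DESC LIMIT 25;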
I don't think deleting rows harms SELECT queries. Normally people add an extra column named deleted (removed in your case) to provide restore-like functionality. So if you are not providing restore functionality, you can delete the rows; as far as I know it will not affect the SELECT query. But while deleting, keep relationships in mind: related rows should be deleted as well, or you will get errors or wrong results.
You just fill the table with more and more records you don't need. If you don't plan to use them in the future, I don't think you need to store them at all. If you want to keep them anyway but don't plan to use them often, you can move your "removed" records into a separate holding table.
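A minimal sketch of that approach, with a hypothetical archive table (wrap the move in a transaction so rows are not lost between the two statements; this assumes an InnoDB table):

-- One-time setup: an archive table with the same structure.
CREATE TABLE items_removed LIKE items;

-- Periodically move flagged rows out of the hot table.
START TRANSACTION;
INSERT INTO items_removed SELECT * FROM items WHERE removed = 1;
DELETE FROM items WHERE removed = 1;
COMMIT;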

SQL Server maximum 8KB per row?

I just happened to read the Maximum Capacity Specifications for SQL Server 2008 and saw a maximum of 8,060 bytes per row. What the... only 8KB allowed per row? (Yes, I saw the "row-overflow storage" special handling; I'm talking about standard behavior.)
Did I misunderstand something here? I'm sure I have, because I'm sure I've seen binary objects of several MB stored inside SQL Server databases. Does this ominous per-row limit really mean a table row, as in one row with multiple columns?
So when I have three nvarchar columns with 4000 characters each (suppose three legal documents written in textboxes...) - the server spits out a warning?
Yes, you'll get a warning on CREATE TABLE and an error on INSERT or UPDATE.
LOB types (nvarchar(max), varchar(max) and varbinary(max)) allow 2GB-1 bytes, which is how you'd store large chunks of data and is what you'll have seen before.
For a single field > 4000 characters / 8000 bytes, I'd use nvarchar(max).
For 3 x nvarchar(4000) in one row I'd consider one of:
my design is wrong
nvarchar(max) for one or more columns
a 1:1 child table for the "least populated" columns
2008 will handle the overflow, while 2000 would simply refuse to insert a record that overflowed. However, it is still best to design with this in mind, because a significant number of overflowing records might cause performance issues in querying.
In the case you described, I might consider a related table with a column for the document type, a large field for the document, and a foreign key to the initial table. If, however, it is unlikely that all three columns would be filled in the same record, or filled to their maximum values, then the design might be fine. You have to know your data to determine which is best.
Another consideration is to continue as you are now until you have problems, and then move the documents to a separate table. You could even refactor by renaming the existing table, creating a new one, and then creating a view with the existing table name that pulls the data from the new structure (see the sketch below). This could keep a lot of your code from breaking, although you would still have to adjust any INSERT or UPDATE statements.
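A hedged sketch of that rename-plus-view refactor in T-SQL (every name here is hypothetical, and it assumes LegalDocsBase keeps DocId as its primary key and has had the big document columns copied out and dropped):

-- Move the wide table aside.
EXEC sp_rename 'LegalDocs', 'LegalDocsBase';
GO

-- The large documents move to a child table, one row per document.
CREATE TABLE LegalDocTexts (
    DocId   INT NOT NULL REFERENCES LegalDocsBase (DocId),
    DocType VARCHAR(20) NOT NULL,
    Body    NVARCHAR(MAX) NOT NULL,
    PRIMARY KEY (DocId, DocType)
);
GO

-- Old code keeps querying "LegalDocs" through a view.
CREATE VIEW LegalDocs AS
SELECT b.DocId,
       d1.Body AS Document1,
       d2.Body AS Document2,
       d3.Body AS Document3
FROM LegalDocsBase AS b
LEFT JOIN LegalDocTexts AS d1 ON d1.DocId = b.DocId AND d1.DocType = 'doc1'
LEFT JOIN LegalDocTexts AS d2 ON d2.DocId = b.DocId AND d2.DocType = 'doc2'
LEFT JOIN LegalDocTexts AS d3 ON d3.DocId = b.DocId AND d3.DocType = 'doc3';
GO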

Is there any harm in resetting the auto-increment?

I have a table with 100 million rows, and it's getting too big.
I see a lot of gaps (since I delete, add, delete, add).
I want to fill these gaps with auto-increment.
If I do reset it, is there any harm?
If I do this, will it fill the gaps?:
mysql> ALTER TABLE tbl AUTO_INCREMENT = 1;
Potentially very dangerous, because you can get a number again that is already in use.
What you propose is resetting the sequence to 1 again. It would just produce 1, 2, 3, 4, 5, 6, 7, ... and so on, regardless of whether those numbers fall in a gap or not.
Update: According to Martin's answer, because of the dangers involved, MySQL will not even let you do that. It will reset the counter to at least the current value + 1.
Think again about what real problem the existence of gaps actually causes. Usually it is only an aesthetic issue.
If the number gets too big, switch to a larger data type (bigint should be plenty).
FWIW... According to the MySQL docs applying
ALTER TABLE tbl AUTO_INCREMENT = 1
where tbl contains existing data should have no effect:
"To change the value of the AUTO_INCREMENT counter to be used for new rows, do this:
ALTER TABLE t2 AUTO_INCREMENT = value;
You cannot reset the counter to a value less than or equal to any that have already been used. For MyISAM, if the value is less than or equal to the maximum value currently in the AUTO_INCREMENT column, the value is reset to the current maximum plus one. For InnoDB, if the value is less than the current maximum value in the column, no error occurs and the current sequence value is not changed."
I ran a small test that confirmed this for a MyISAM table; a minimal version of it is sketched below.
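A minimal version of such a test (throwaway table, names hypothetical):

CREATE TABLE t_test (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM;
INSERT INTO t_test VALUES (NULL), (NULL), (NULL); -- ids 1, 2, 3
DELETE FROM t_test WHERE id = 2;                  -- leave a gap

ALTER TABLE t_test AUTO_INCREMENT = 1;            -- attempt the reset
INSERT INTO t_test VALUES (NULL);
SELECT MAX(id) FROM t_test;                       -- returns 4: counter went to max+1, gap untouched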
So the answers to your questions are: no harm, and no, it won't fill the gaps. As other responders have said, a change of data type looks like the least painful choice.
Chances are you wouldn't gain anything from doing this, and you could easily screw up your application with ID collisions, since resetting the counter means the next inserted rows would try to reuse IDs 1, 2, and so on, and fail with duplicate-key errors. What will you gain from filling the gaps? If the number gets too big, just change the column to a larger type (such as BIGINT).
Edit: I stand corrected. It won't do anything at all, which supports my point that you should just change the type of the column to a larger integer type. The maximum value of an unsigned BIGINT is 2^64 - 1, which is over 18 quintillion. If you only have 100 million rows at the moment, that should be plenty for the foreseeable future.
I agree with musicfreak... The maximum for an integer (INT(10)) is 4,294,967,295 (unsigned, of course). If you need to go even higher, switching to BIGINT brings you up to 18,446,744,073,709,551,615.
Since you can't lower the next auto-increment value, you have other options. The data-type switch could be done, but it seems a little unsettling to me since you don't actually have that many rows. You'd have to make sure your code can handle IDs that large, which may or may not be tough for you.
Are you able to do much downtime? If you are, there are two options I can think of:
Dump/reload the data in a way that doesn't keep the ID numbers. For example, you could use INSERT ... SELECT to copy the data, sans IDs, to a new table with identical DDL, then drop the old table and rename the new table to the old name (see the sketch after this list). Depending on how much data there is, this could take a noticeable amount of time (and temporary disk space).
You could write a little program that issues UPDATE statements to change the IDs. If you let it run slowly, it would "defragment" your IDs over time. Then you could temporarily stop the inserts (just a minute or two), update the last IDs, and restart. After updating the last IDs you can change the AUTO_INCREMENT value to the next number and your hole will be gone. This shouldn't cause any real downtime (at least on InnoDB), but it could take quite a while depending on how aggressive your program is.
Of course, both of these ignore referential integrity. I'm assuming that's not a problem (log statements that aren't used as foreign keys, or some such).
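A minimal sketch of the dump/reload option, assuming a hypothetical items table with a single data column and an acceptable maintenance window:

-- New table with identical DDL; its counter starts at 1.
CREATE TABLE items_new LIKE items;

-- Copy everything except the IDs; rows get renumbered densely.
INSERT INTO items_new (payload)
SELECT payload FROM items ORDER BY id;

-- Swap the tables in one atomic step, then drop the old one.
RENAME TABLE items TO items_old, items_new TO items;
DROP TABLE items_old;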
Does it really matter if there are gaps?
If you really want to go back and fill them, you can always turn off auto-increment and manually scan for the next available id every time you want to insert a row - remembering to lock the table to avoid race conditions, of course (see the sketch below). But it's a lot of work to do for not much gain.
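A hedged sketch of that manual scan (table and column names hypothetical; note the self-join only finds gaps after the first existing id):

-- Lock the table and each alias used in the self-join.
LOCK TABLES items WRITE, items AS t1 READ, items AS t2 READ;

-- First unused id that directly follows an existing row.
SELECT MIN(t1.id) + 1 AS next_id
FROM items AS t1
LEFT JOIN items AS t2 ON t2.id = t1.id + 1
WHERE t2.id IS NULL;

-- Supplying an explicit value bypasses the auto-increment counter
-- (42 stands in for the next_id found above).
INSERT INTO items (id, payload) VALUES (42, 'example');

UNLOCK TABLES;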
Do you really need a surrogate key anyway? Depending on the data (you haven't mentioned a schema), you can probably find a natural key.