MySQL table specification for numerous varchar fields

I have a table with over 100 columns imported from Access into MySQL. The table will be displayed in a typical shared-hosting Apache environment using PHP and HTML. Almost all fields came in as varchar(255). This caused row-size-too-large errors on import, so I switched many of them to text(0) for the import. I would like to give these fields a size and a varchar type so I can index them for search speed. Each of the fields will only contain maybe at most 10 words.
I need the fields to be as small as they can be so I don't push past MySQL's row maximum.
How do I calculate the size I need for the varchar?
I am a noob at MySQL, so if I am asking the wrong question or lack understanding, please explain.

-- LENGTH() counts bytes, which is what the row-size limit cares about;
-- use CHAR_LENGTH() instead if you want the count in characters.
SELECT MAX(LENGTH(field1)) length_field1,
       MAX(LENGTH(field2)) length_field2,
       -- ..........
       MAX(LENGTH(fieldN)) length_fieldN
FROM source_table;
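Once you have the measured maximums, you can resize each column with a little headroom and then index it. A minimal sketch, assuming hypothetical measurements of around 60 and 100 characters for field1 and field2:

-- Resize based on the measured maximums, leaving some headroom
ALTER TABLE source_table
    MODIFY field1 VARCHAR(80),
    MODIFY field2 VARCHAR(120);

-- Then index the columns you search on
CREATE INDEX idx_field1 ON source_table (field1);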

Related

Updating partially a blob or binary type in MySQL

I am currently involved in a project that requires storing a chunk of bits in a column of a MySQL table. Originally, the column's type was MySQL BIT, and I was able to do, for example, something like:
UPDATE table SET column = column | b'0001'
in order to set a bit 1 in the column in a certain position.
However, because of the size limitation (a maximum of 64 bits), and due to the requirements of the project, I have tried to find another column type, like BLOB or BINARY, that would let us achieve something like the example above.
So, the question is: how can a certain bit be set in the value of a column in MySQL?
Thanks in advance,
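One possible approach (a sketch, not from this thread, with hypothetical table and column names) is to store the bits in a VARBINARY column and rewrite the single byte that contains the target bit:

-- Set bit 0 of the third byte of a VARBINARY column:
-- SUBSTRING() extracts the byte, ORD() turns it into an integer,
-- | sets the bit, CHAR() converts it back, and INSERT() splices it in.
UPDATE flags_table
SET bitmap = INSERT(bitmap, 3, 1,
                    CHAR(ORD(SUBSTRING(bitmap, 3, 1)) | 1))
WHERE flag_id = 42;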

Creation of a MySQL table with a field name that may contain an INT and VARCHAR

I am extremely new to MySQL database/table creation.
I am trying to set up a SQL table in MySQL and have all of the commands down thus far. The one thing I noticed in my data set is that certain pieces of data (vendors) have both numbers and letters in their names.
How can I work around this issue? I cannot edit the data, as there are certain vendors with names such as M5 and things along those lines.
Also, I am on a Mac, and using phpMyAdmin, if that helps!
Thank you in advance!
Use VARCHAR; it stores character data, which includes the digits 0-9, so a name like M5 is not a problem.
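As a minimal sketch (hypothetical names), a plain VARCHAR column happily holds mixed letters and digits:

CREATE TABLE vendors (
    vendor_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name      VARCHAR(100) NOT NULL   -- letters, digits, or both
);

INSERT INTO vendors (name) VALUES ('M5'), ('Acme Supplies');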

Find all characters in a table column of MySQL database?

Is there any easy way to find out all characters used in a specific column of a table in MySQL?
For example, these records:
"title"
"DP&E"
"UI/O"
"B,B#M"
All the characters used in the "title" column would be: DPEUIOBM&/#,
I'm not aware of any easy way to do this in MySQL. The best you'll be able to do is test each potential character, one by one, with EXISTS statements. This will also be very slow, since it will read your whole table once for every character that is not present.
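A single probe in that style looks like this (a sketch using the same hypothetical names as the query below):

SELECT EXISTS (
    SELECT 1 FROM yourtable WHERE yourfield LIKE '%&%'
) AS has_ampersand;

-- Repeat one such probe per candidate character.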
If you have the option, create a temporary table that aggregates the needed data into one huge text field, dump it, and populate a compatible table in PostgreSQL. That lets you extract the needed data with a query that looks like this:
select distinct regexp_split_to_table(yourfield, '') as letter
from yourtable;
It'll still be very slow, but at least you'll go through the data a single time.

SQL Server maximum 8KB per row?

I just happened to read the Maximum Capacity Specifications for SQL Server 2008 and saw a maximum of 8060 bytes per row? What the... only 8KB per row allowed? (Yes, I saw the "row-overflow storage" special handling; I'm talking about standard behavior.)
Did I misunderstand something here? I'm sure I have, because I'm sure I've seen binary objects several MB in size stored inside SQL Server databases. Does this ominous "per row" really mean a table row, as in one row with multiple columns?
So when I have three nvarchar columns with 4000 characters each (suppose three legal documents written in textboxes...), the server spits out a warning?
Yes, you'll get a warning on CREATE TABLE and an error on INSERT or UPDATE.
LOB types (nvarchar(max), varchar(max), and varbinary(max)) allow 2^31-1 bytes, which is how you'd store large chunks of data and is what you'd have seen before.
For a single field > 4000 characters/8000 bytes I'd use nvarchar(max)
For 3 x nvarchar(4000) in one row I'd consider one of:
my design is wrong
nvarchar(max) for one or more columns
1:1 child table for the "least populated" columns
SQL Server 2008 will handle the overflow, while 2000 would simply refuse to insert a record that overflowed. However, it is still best to design with the limit in mind, because a significant number of overflowing records might cause performance issues in querying.
In the case you described, I might consider a related table with a column for document type, a large field for the document, and a foreign key to the initial table. If, however, it is unlikely that all three columns would be filled in the same record, or filled to their maximum sizes, then the design might be fine. You have to know your data to determine which is best.
Another option is to continue as you are now until you have problems, and then move the documents to a separate table. You could even refactor by renaming the existing table, creating a new one, and then creating a view with the existing table name that pulls the data from the new structure. This could keep a lot of your code from breaking, although you would still have to adjust any insert or update statements.
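As an illustration of that related-table idea, a minimal T-SQL sketch with hypothetical names:

-- Documents live in their own table, so the wide-row problem goes away.
CREATE TABLE LegalDocuments (
    DocumentId   INT IDENTITY(1,1) PRIMARY KEY,
    ContractId   INT NOT NULL
        REFERENCES Contracts (ContractId),  -- FK to the initial table
    DocumentType VARCHAR(50) NOT NULL,      -- e.g. 'lease', 'amendment'
    DocumentBody NVARCHAR(MAX) NOT NULL     -- LOB storage, no row-size limit
);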

varchar(max) everywhere?

Is there any problem with making all your SQL Server 2008 string columns varchar(max)? My allowable string sizes are managed by the application. The database should just persist what I give it. Will I take a performance hit by declaring all string columns to be of type varchar(max) in SQL Server 2008, no matter what the size of the data that actually goes into them?
By using VARCHAR(MAX) you are basically telling SQL Server "store the values in this field how you see best", and SQL Server will then choose whether to store values as a regular VARCHAR or as a LOB (large object). In general, if the values stored are less than 8,000 bytes, SQL Server will treat them as a regular VARCHAR type.
If the values stored are too large, the column is allowed to spill off the page into LOB pages, exactly as it does for the other LOB types (text, ntext, and image). If this happens, additional page reads are required to fetch the data stored in the extra pages (i.e., there is a performance penalty), but this only happens if the values stored are too large.
In fact, under SQL Server 2008 or later, data can overflow onto additional pages even with the length-capped data types (e.g. VARCHAR(3000)); however, those pages are called row-overflow data pages and are treated slightly differently.
Short version: from a storage perspective there is no disadvantage of using VARCHAR(MAX) over VARCHAR(N) for some N.
(Note that this also applies to the other variable-length field types NVARCHAR and VARBINARY)
FYI - You can't create indexes on VARCHAR(MAX) columns
For one, index keys cannot be over 900 bytes wide, so you can never create an index on a VARCHAR(MAX) column. If your data is less than 900 bytes, use varchar(900).
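To make that concrete, a minimal sketch (hypothetical names; 900 bytes is the SQL Server 2008-era key limit):

-- A sized column can be an index key; VARCHAR(MAX) cannot.
CREATE TABLE notes (
    note_id INT PRIMARY KEY,
    summary VARCHAR(900)   -- fits within the 900-byte index key limit
);
CREATE INDEX ix_notes_summary ON notes (summary);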
This is one downside: it gives really bad searching performance and allows no unique constraints.
Simon Sabin wrote a post on this some time back. I don't have the time to grab it now, but you should search for it, because he comes up with the conclusion that you shouldn't use varchar(max) by default.
Edited: Simon's got a few posts about varchar(max). The links in the comments below show this quite nicely. I think the most significant one is http://sqlblogcasts.com/blogs/simons/archive/2009/07/11/String-concatenation-with-max-types-stops-plan-caching.aspx, which talks about the effect of varchar(max) on plan caching. The general principle is to be careful. If you don't need it to be max, then don't use max - if you need more than 8000 characters, then sure... go for it.
For this question specifically, a few points I don't see mentioned:
On 2005/2008/2008 R2 if a LOB column is included in an index this will block online index rebuilds.
On 2012 the online index rebuild restriction is lifted but LOB columns cannot participate in the new functionality Adding NOT NULL Columns as an Online Operation.
Locks can be held for longer on rows containing columns of this data type. (more)
A couple of other reasons are covered in my answer as to why not varchar(8000) everywhere.
Your queries may end up requesting huge memory grants not justified by the size of data.
On tables with triggers, it can prevent an optimisation where versioning tags are not added.
I asked a similar question earlier and got some interesting replies; check it out here.
There was one site with someone talking about the detriment of using wide columns; however, if your data is limited in the application, my testing disproved it.
The fact that you can't create indexes on these columns means I wouldn't use them all the time (personally I wouldn't use them much at all, but I'm a bit of a purist in that regard).
However, if you know there isn't much stored in them, I don't think they are that bad.
If you do any sorting on a recordset with a varchar(max) in it (or any wide char or varchar column), you could suffer performance penalties. These could be resolved (if required) by indexes, but you can't put indexes on varchar(max).
If you want to future-proof your columns, why not just set them to something reasonable, e.g. a name column at 255 characters instead of max, that kind of thing.
There is another reason to avoid using varchar(max) on all columns. For the same reason we use check constraints (to avoid filling tables with junk caused by errant software or user entries), we would want to guard against any faulty process that adds much more data than intended. For example, if someone or something tried to add 3,000 bytes into a City field, we would know for certain that something is amiss and would want to stop the process dead in its tracks to debug it at the earliest possible point. We would also know that a 3000-byte city name could not possibly be valid and would mess up reports and such if we tried to use it.
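As a sketch of that idea (hypothetical names): a sized column stops a runaway value dead, much like a CHECK constraint stops junk:

CREATE TABLE customers (
    customer_id INT IDENTITY(1,1) PRIMARY KEY,
    city        VARCHAR(60) NOT NULL   -- no 3000-byte city names allowed
);

-- This fails with a truncation error instead of storing garbage:
-- INSERT INTO customers (city) VALUES (REPLICATE('x', 3000));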
Ideally, you should only allow what you need. Meaning if you're certain a particular column (say a username column) is never going to be more than 20 characters long, using a VARCHAR(20) vs. a VARCHAR(MAX) lets the database optimize queries and data structures.
From MSDN:
http://msdn.microsoft.com/en-us/library/ms176089.aspx
Variable-length, non-Unicode character data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes.
Are you really ever going to come close to 2^31-1 bytes for these columns?