Storing a binary array in MySQL

I have an array of boolean values called A, B... X, Y, Z. Fun though it would be to have 26 columns in the table, I can't help but feel there is a better way. I have considered creating a second table holding the id of the row from the first table, the id of the item in the array, and the boolean value, but it seems clunky and confusing.
Is there a better way?

Short answer, no. Long answer, it depends.
You can store binary data in a bunch of ways - abusing a number, using a BINARY or VARBINARY, using a BLOB or TINYBLOB, etc. BINARY types will generally be faster than BLOB types, provided your data is a known size.
However, relational databases aren't designed for doing anything intelligent with binary data. On a project I used to work on, there was a table where each record had a specific binary pattern - stored as some sort of integer - and searching required a lot of ANDs, ORs, XORs and NOTs. It never really worked very well, performance sucked, and it held the whole project back. Looking back, I would have taken a completely different approach.
So if you just want to drop the data in and pull it out again, great. If you want to use it for anything intelligent, tough.
The situation may be different on other database vendors. In fact, have you considered using something else in place of the database? Some sort of object persistence?

Are your possible array values static?
If so, try using MySQL's SET data type.

You can try storing it as a TINYBLOB, or even an UNSIGNED INT, but you'll have to do bit masking in your code.
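A minimal sketch of the bit-masking approach in Python, assuming the 26 values are booleans named A to Z and that bit 0 holds A. The helper names (pack_flags, unpack_flags, is_set) are made up for illustration. The packed value fits in an UNSIGNED INT column, which has 32 bits.

```python
import string

NAMES = string.ascii_uppercase  # "A".."Z"; assumed order, bit 0 is A


def pack_flags(flags):
    """Pack a dict such as {"A": True, "C": True} into one integer."""
    value = 0
    for position, name in enumerate(NAMES):
        if flags.get(name, False):
            value |= 1 << position  # switch on the bit for this name
    return value


def unpack_flags(value):
    """Rebuild the full dict of 26 booleans from the stored integer."""
    return {name: bool(value & (1 << position))
            for position, name in enumerate(NAMES)}


def is_set(value, name):
    """Test a single flag without unpacking everything."""
    return bool(value & (1 << NAMES.index(name)))


packed = pack_flags({"A": True, "C": True, "Z": True})
```

The same test can be written in SQL with MySQL's bitwise operators, for example a condition like flags & 4 to check C, but such a condition generally cannot use an index.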

You can store it as a string and use text manipulation functions to (re)create your array.
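One possible reading of the string approach, sketched in Python: each of the 26 booleans becomes a "0" or "1" character, so the whole array fits in a CHAR(26) column. The function names and the A-to-Z ordering are assumptions for illustration only.

```python
import string

NAMES = string.ascii_uppercase  # assumed fixed order: position 0 is A


def to_flag_string(flags):
    """Encode a dict of booleans as a 26-character string of 0s and 1s."""
    return "".join("1" if flags.get(name, False) else "0" for name in NAMES)


def from_flag_string(text):
    """Decode the stored string back into a dict of booleans."""
    if len(text) != len(NAMES):
        raise ValueError("expected exactly %d characters" % len(NAMES))
    return {name: char == "1" for name, char in zip(NAMES, text)}


encoded = to_flag_string({"B": True, "Y": True})
decoded = from_flag_string(encoded)
```

A string like this is easy to read in a database client, at the cost of more storage than a packed integer.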

Related

Do I need to provide a reference table for boolean columns in SQL?

If I created a table about a user with a tinyint column with boolean states such as "is_vegan", do I need to provide a reference table to explain these values or is it already self-explanatory?
If you are using actual boolean values (setting the data type as bit is a good way to enforce this), then you can make a field fairly instructive through extremely descriptive naming. For example, if the table is named website_user and the field is named confirms_currently_vegan, then you might interpret this as "my website user confirmed on such and such date that they were indeed a vegan". However, you should be wary of how you interpret False values in boolean tables when your fields are not descriptive. If it is false, does that mean they eat meat? Or was that just the default value of your field?
That doesn't mean you shouldn't use booleans, just that you should think about the variety of scenarios that either state might represent.
You can go a long way with descriptive naming in your database, but invariably you will reach a point where you want to document something in a way that the database isn't robust enough to support. I would suggest the use of a formatted data dictionary rather than trying to store all of your descriptions in the database, as this simply clutters your data model and makes the purpose of your data less clear (in my opinion).

MySQL best way to store long strings

I'm looking for some advice from the MySQL experts on the best way to store long strings of data.
I have a general purpose table which is used to store any kind of data, by which I mean it should be able to hold alphanumeric and numeric data.
Currently, the table structure is simple with an ID and the actual data stored in a single column as follows:
id INT(11)
data VARCHAR(128)
I now have a requirement to store a larger amount of data (up to 500 characters) and am wondering whether the best way would be to simply increase the varchar column size, or whether I should add a new column (a TEXT type column?) for the times I need to store longer strings.
If any experts out there have any advice, I'm all ears!
My preferred method would be to simply increase the varchar column, but that's because I'm lazy.
The MySQL version I'm running is 5.0.77.
I should mention the new 500 character requirement will only be for the odd record; most records in the table will be not longer than 50 characters.
I thought I'd be future-proofing by making the column 128. Shows how much I knew!
Generally speaking, this is not a question that has a "correct" answer. There is no "infinite length" text storage type in MySQL. You could use LONGTEXT, but that still has an (absurdly high) upper limit. Yet if you do, you're kicking your DBMS in the teeth for having to deal with that absurd blob of a column for your 50-character text. Not to mention the fact that you hardly do anything with it.
So, most futureproofness(TM) is probably offered by LONGTEXT. But it's also a very bad method of resolving the issue. Honestly, I'd revisit the application requirements. Storing strings that have no "domain" (as in, being well-defined in their application) and arbitrary length is not one of the strengths of RDBMS.
If I wanted to solve this on the "application design" level, I'd use a NoSQL key-value store for this (and I'm as anti-NoSQL-hype as they get, so you know it's serious), even though I recognize it's a rather expensive move for such a minor change. But if this is an indication of what your DBMS is eventually going to hold, it might be more prudent to switch now to avoid this same problem a hundred times in the future. Data domain is very important in RDBMS, whereas it's explicitly sidelined in non-relational solutions, which seems to be what you're trying to solve here.
Stuck with MySQL? Just increase it to VARCHAR(1000). If you have no requirements for your data, it's irrelevant what you do anyway.
Be careful if using TEXT. TEXT values are handled differently from CHAR and VARCHAR: a TEXT column can only be indexed on a prefix of its value, and a query that needs a temporary table containing a TEXT column forces MySQL to use an on-disk temporary table instead of an in-memory one, which is much slower. A NoSQL database may be a better place to store long strings.
Alternatively, we can use VARCHAR(<maximum_limit>). The maximum limit that we can pass is 65,535 bytes.
Note: this limit is the maximum row size, so it is shared among all columns in the table (TEXT/BLOB columns contribute only a few bytes each toward it), and the maximum number of characters also depends on the character set used.
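As a rough worked example of that arithmetic, here is a Python sketch. It is a simplified model under stated assumptions, not MySQL's exact accounting: it assumes a 2-byte length prefix for the VARCHAR, one byte of NULL bookkeeping when the column is nullable, and a caller-supplied byte count for the other columns. The function name max_varchar_chars is hypothetical.

```python
ROW_LIMIT_BYTES = 65535  # maximum row size, shared by all columns in the table


def max_varchar_chars(bytes_per_char, other_columns_bytes=0, nullable=False):
    """Estimate the largest N usable in VARCHAR(N) for one column.

    bytes_per_char: worst-case bytes per character in the column's
    character set (1 for latin1, 3 for utf8, 4 for utf8mb4).
    """
    length_prefix = 2                  # assumed: long VARCHARs store a 2-byte length
    null_flag = 1 if nullable else 0   # assumed: one byte when the column allows NULL
    available = (ROW_LIMIT_BYTES - length_prefix
                 - null_flag - other_columns_bytes)
    return available // bytes_per_char


latin1_max = max_varchar_chars(1)
utf8_max = max_varchar_chars(3)
utf8_with_int_id = max_varchar_chars(3, other_columns_bytes=4)
```

For the 500-character requirement in the question, every one of these limits leaves plenty of room.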

Is it OK to use Hex's as database IDs?

I've just been handed a database schema that looks a little odd to me. The database sits behind a SOAP web service and I've noticed that all the table IDs are strings in hex format
eg: 0x1D283F
I'm going to have to duplicate the data into a MySQL database. I've never used hex values as IDs, so I've no idea whether it's a good idea, a bad idea, or doesn't matter either way. I'm guessing Auto Increment wouldn't work here, which makes me think this is going to be a bad idea.
I could convert them to integers, or leave them as they are, but what are the implications?
First of all, whether a number is 'hexadecimal' or 'decimal' is just a matter of outer representation, it doesn't change the storage format. Using strings of numbers as database indices is obviously inefficient and may prevent certain db features and optimizations, though.
Therefore, convert the hex strings to plain integers and use them as database indices and you'll be fine.
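The conversion itself is a one-liner in most languages. A small Python sketch, using the example ID from the question; the helper names are made up for illustration:

```python
def hex_id_to_int(hex_id):
    """Convert an ID string such as "0x1D283F" to a plain integer."""
    return int(hex_id, 16)  # base 16 accepts the optional "0x" prefix


def int_to_hex_id(value):
    """Format an integer back into the "0x..." form used by the web service."""
    return "0x{:X}".format(value)


as_int = hex_id_to_int("0x1D283F")
round_trip = int_to_hex_id(as_int)
```

Note that any leading zeros in the original strings are lost on the way back, so check whether the source system pads its IDs to a fixed width.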
Broadly speaking, you want your primary keys to be meaningless, and (obviously) unique. Hex is just one way of representing a number - underneath, it's still an integer. So, I'd convert to int before storing in the DB.
Bigger question is - are you sure they're unique? If not, you can't use them as your primary key....
It's a bad idea. SELECTs will be faster with integer keys. There might be advantages to using string keys in some instances, but using numeric values stored as strings as keys is just silly.

Best way to store 'extra' user data in MySQL?

I'm adding a new feature to my user module for my CMS and I've hit a road block... Or I guess, a fork in the road, and I wanted to get some opinions from Stack Overflow before I commit to anything.
Basically I want to allow admins to add new, 'extra' user fields that users can fill out on registration, edit in their profile, and/or be controlled by other modules. An example of this would be a birthday field, a lengthy description of themselves, or maybe points the user has earned on the site. Needless to say, the data stored will be varied and can range from large amounts of text, to a small integer value. To make matters worse - I want there to be the option to search this data.
With that out of the way - what would be the best way to do this? Right now I'm leaning towards having a table with the following columns.
userid, refFieldID, varchar, tinyint, smallint, int, text, date, datetime, etc.
I would prefer this as it would make searching significantly faster, and the reference table (Which holds all of the field's data, such as the name of the field, whether it's searchable or not, etc.) can reference which column should be used when storing data for that field.
The other idea, which was suggested to me and which I've seen used in other solutions (vBulletin being one, although I have seen others whose names escape me at the moment), is to have just the userid, reference id, and a MEDIUMTEXT field. I don't know enough about MySQL to say this with any certainty, but this method seems like it would be slower to search, and possibly have a larger overhead.
So which method would be 'best'? Is there another method I'm missing? Whichever method I end up using, it needs to be fast to search, not massive (A tiny bit of overhead is fine), and preferably allow complex queries used against the data.
I agree that a key-value table is probably the best solution. My first inclination would be to just store a text column, like vBulletin did. But, if you wanted to add the ability for the data store to be a bit more extensible and searchable like you've laid out, I might suggest:
1 medium/longtext or medium/longblob field for arbitrary text/binary storage (whatever is stored + overhead of 3-4 bytes for string length). Only reason to choose medium over long is to limit what can be stored to 2^24 bytes (16.7 MB) versus 2^32 bytes (4 GB).
1 integer (4 bytes) or bigint (8 bytes)
1 datetime (8 bytes)
Perhaps 1 float or double (4-8 bytes) for floating point storage
These fields will allow you to store nearly any type of data in the table but without inflating the width of table** (like a varchar would) and avoid any redundant storage (like having tinyint and mediumint etc). The text stored in the longtext field can still be reasonably searched using a fulltext index or a regular limited length index (e.g. index longtext_storage(8)).
** all blob values, such as longtext, are stored independently from the main table.
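A rough sketch of such a table, written in Python with the standard library's SQLite driver standing in for MySQL so that it runs without a server. The table name, column names, and field ids are all hypothetical, and SQLite's column types are looser than MySQL's, so treat this as an illustration of the layout rather than a MySQL schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used only as a stand-in for MySQL
conn.execute("""
    CREATE TABLE user_extra (
        userid         INTEGER NOT NULL,
        field_id       INTEGER NOT NULL,  -- points at the field reference table
        text_value     TEXT,              -- arbitrary text storage
        int_value      INTEGER,
        datetime_value TEXT,              -- SQLite has no DATETIME type
        float_value    REAL,
        PRIMARY KEY (userid, field_id)
    )
""")
# Index the typed column so searches on integer fields stay fast.
conn.execute("CREATE INDEX idx_extra_int ON user_extra (field_id, int_value)")

POINTS, BIO = 1, 2  # hypothetical field ids from the reference table
rows = [
    (1, POINTS, None, 120, None, None),
    (1, BIO, "Likes databases", None, None, None),
    (2, POINTS, None, 45, None, None),
]
conn.executemany("INSERT INTO user_extra VALUES (?, ?, ?, ?, ?, ?)", rows)


def users_with_points_at_least(minimum):
    """Search one integer field using the typed column."""
    cursor = conn.execute(
        "SELECT userid FROM user_extra "
        "WHERE field_id = ? AND int_value >= ? ORDER BY userid",
        (POINTS, minimum),
    )
    return [row[0] for row in cursor]
```

Each row fills only the column that matches its field's type and leaves the others NULL.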
One technique that might work for you is to store this arbitrary data as text, in some notation like JSON, XML, or YAML. This decision depends on how you'll need to access the data: if you only look up each user's full chunk of user data, it could be ideal. If you need to run SQL queries on specific fields in the user data, you'll need to use a pure SQL or a hybrid approach.
Many of the newer, highly scalable "NoSQL" systems seem to favor JSON data (eg, MongoDB, CouchDB, and Project Voldemort). It's nice and terse, and you can create arbitrarily complex structures including maps (JSON objects) and lists (JSON arrays).
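A minimal Python sketch of the store-as-JSON idea, assuming the serialized string goes into a single text column per user. The field names in the sample data are invented for illustration.

```python
import json


def serialize_extra(extra):
    """Turn a user's extra fields into one string for a text column."""
    return json.dumps(extra, sort_keys=True)


def deserialize_extra(text):
    """Rebuild the dict after reading the column back."""
    return json.loads(text)


stored = serialize_extra({
    "birthday": "1990-05-17",           # dates kept as ISO strings
    "points": 120,
    "interests": ["chess", "hiking"],   # lists map to JSON arrays
})
loaded = deserialize_extra(stored)
```

This works well when the application always loads the whole chunk for one user. Searching on a single field inside the string is where the approach gets awkward.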

When setting MySQL schema, why use certain types?

When I'm setting up a MySQL table, it asks me to define the name of the column, type of input, and length. My assumption, without having read anything about it, is that it's for minimization. Specify the smallest possible int/smallint/tinyint for your needs, and it will reduce overhead of some sort. If it's all positives, make it unsigned to double your positive range, etc.
What happens if I just make every field a varchar-200 characters? When/why is this bad, what will I miss out on, and when will any inefficiencies manifest themselves? 100k records?
I think about this every time I set up a DB, but I haven't built anything at a scale where my schema turned out to be set up inappropriately, either too "strict/small" or too "loose/big". Can someone confirm that I'm making good assumptions about speed and efficiency?
Thanks!
Data types not only optimize storage, but how data is indexed. As your databases get bigger, it will become apparent that it's quicker to search for all the records that have a 1 in an integer field than those that have a "1" in a varchar field. This becomes especially important when you're joining data from more than one table and your database engine is having to do this sort of thing repeatedly. (Daren also rightly points out below that it's important that the types of the fields you're matching on are identical as well.)
The level at which these inefficiencies become an issue depends greatly on your hardware and your application design. We have big enough iron these days that if you're building moderate-scale apps, you may not see an appreciable difference. (Aside from feeling a little bit guilty about your database design!) But establishing good habits on small projects makes the bigger ones easier when they come along.
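If you store numbers in varchar columns, they behave as text. In application code and in some databases, adding the values 10 and 20 gives you 1020 instead of the 30 you'd likely expect. MySQL itself converts strings to numbers for the + operator, but sorting and comparison still happen as text, so "9" sorts after "10".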
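The same effect is easy to reproduce in application code. A small Python sketch (variable names are illustrative) comparing concatenation with addition, and text ordering with numeric ordering:

```python
a, b = "10", "20"  # values as they would come back from varchar columns

as_text = a + b               # strings are concatenated
as_numbers = int(a) + int(b)  # numbers are added

# Ordering differs too: text is compared character by character.
text_sorted = sorted(["9", "10", "100"])
number_sorted = sorted([9, 10, 100])
```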
Sure, you could save everything as VARCHAR strings. But you'd be giving up a lot of functionality provided by the database engine.
You should choose the database type that most closely matches the intended use of the column. For example, using DATE or DATETIME to store dates provides you with all sorts of date/time functions that you don't get with basic VARCHAR types.
Likewise, fields used to count things or provide simple unique IDs should be INT or one of its related types. Also bear in mind that an INT occupies only 4 bytes, whereas a 9-digit string uses at least 9 bytes.
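The size difference can be checked with a few lines of Python. This only illustrates the byte counts; it packs the number as a generic 4-byte signed integer and does not claim to reproduce MySQL's on-disk format.

```python
import struct

number = 123456789  # a 9-digit value

as_int = struct.pack("<i", number)        # 4-byte signed integer
as_text = str(number).encode("ascii")     # one byte per digit

int_size = len(as_int)
text_size = len(as_text)
```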
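For character data, it's wise to use NVARCHAR for internationalized values that users in any locale are going to enter (esp. names and locations). In MySQL, NVARCHAR is shorthand for a VARCHAR that uses the predefined national character set (utf8), and you can get the same effect by choosing a Unicode character set for the column or table. If you know the text is limited to US or internal use only, plain VARCHAR is safe.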