I need to store some flags for user records in a MySQL table (I'm using InnoDB):
------------------------
| UserId | Mask        |
|--------|-------------|
| 1      | 00000...001 |
| 2      | 00000...010 |
------------------------
The number of flags is greater than 64, so I can't use a BIGINT or BIT column to store the value.
I don't want to use a many-to-many association table, because each user can have more than one profile, each with its own set of flags, and the table would grow too big very quickly.
So, my question is, is it possible to store these flags in a VARCHAR, BLOB or TEXT type column and still do bitwise operations on them? If yes, how?
For now I just need one operation: given a mask A with X bits set to 1, find which users have at least those X bits set to 1.
Thanks!
EDIT
To anyone reading this, I've found a solution (for me, at least). I'm using a VARCHAR for the mask field and when searching for a specific mask I use this query:
select * from my_table where mask like '__1__1'
Every record that has the 3rd and last bits set to 1 will be returned. The "_" symbol is the SQL LIKE placeholder for "any single character" (standard SQL, as it turns out, not MySQL-specific).
In terms of speed it's doing fine right now; I'll have to check again later when my user base grows.
Anyway, thanks for your input. Other ideas welcomed, of course.
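For anyone who wants a concrete starting point, here is a minimal sketch of this approach; the table definition and the 100-flag width are assumptions for illustration, not from the original setup:

CREATE TABLE user_flags (
    user_id INT UNSIGNED NOT NULL PRIMARY KEY,
    mask    CHAR(100)    NOT NULL   -- one '0'/'1' character per flag, fixed width
);

-- Users with at least the 3rd and 6th flags set; '_' matches any single
-- character and '%' matches the remaining flag positions:
SELECT user_id
FROM user_flags
WHERE mask LIKE '__1__1%';

-- Turn the 6th flag on for one user (string positions are 1-based):
UPDATE user_flags
SET mask = CONCAT(SUBSTRING(mask, 1, 5), '1', SUBSTRING(mask, 7))
WHERE user_id = 1;

One caveat: because the pattern begins with '_' wildcards rather than a fixed prefix, no index can be used for the LIKE, so every row's mask gets scanned.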
Related
I have the following scenario:
A form with many checkboxes, around 100.
I have 2 ideas on how to save them in database:
1. Multicolumn
I create a table looking like this:
id | box1 | box2 | ... | box100 | updated | created
id: int
box1: bit(1)
SELECT * FROM table WHERE box1 = 1 AND box22 = 1 ...
2. Single data column
Table is simply:
id | data | updated | created
data: varchar(100)
SELECT * FROM table WHERE data LIKE '_______1___ ... ____1____1'
where data looks like 0001100101010......01, each character representing whether the box was checked or not.
Considering that the table will have 200k+ rows, which is a more scalable solution?
3. Single data column of type JSON
I have no good information about this yet.
Or...
4. A few SETs
5. A few INTs
These are much more compact: about 8 checkboxes per byte.
They are a bit messy to set/test.
Since they are limited to 64 bits, you would need more than one SET or INT. I recommend grouping the bits in some logical way, based on the app.
Be aware of FIND_IN_SET().
Be aware of (1 << $n) for creating the value 2^n.
Be aware of the | and & operators.
Which of the 5 is best? That depends on the queries you need to run -- for searching (if necessary?), for inserting, for updating (if necessary?), and for selecting.
An example: for INTs, WHERE (bits & 0x2C08) = 0x2C08 would simultaneously check for 4 flags being 'ON'. That constant could either be constructed in app code, or written as ((1<<13) | (1<<11) | (1<<10) | (1<<3)) for bits 3, 10, 11, 13. Meanwhile, the other flags are ignored. If you need those same flags to be 'OFF', the test would be WHERE (bits & 0x2C08) = 0. If either of these kinds of tests is your main activity, then choice 5 is probably the best for both performance and space, though it is somewhat cryptic to read.
When adding another option, SET requires an ALTER TABLE. INT usually has some spare bits (TINYINT UNSIGNED has 8 bits, ... BIGINT UNSIGNED has 64). So, about one time in 8, you would need an ALTER to get a bigger INT or add another INT. Deleting an option: suggest just abandoning that SET element or bit of INT.
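As a rough illustration of choice 5, here is a sketch; the table, column names, and bit assignments are invented for the example:

-- Two BIGINT UNSIGNED columns cover 128 flags (64 bits each):
CREATE TABLE user_flags (
    user_id INT UNSIGNED    NOT NULL PRIMARY KEY,
    bits1   BIGINT UNSIGNED NOT NULL DEFAULT 0,  -- flags 0..63
    bits2   BIGINT UNSIGNED NOT NULL DEFAULT 0   -- flags 64..127
);

-- Set flags 3 and 10 (both live in bits1):
UPDATE user_flags
SET bits1 = bits1 | (1 << 3) | (1 << 10)
WHERE user_id = 1;

-- Users with flags 3, 10, 11, 13 all ON (0x2C08, as above):
SELECT user_id FROM user_flags WHERE (bits1 & 0x2C08) = 0x2C08;

-- Users with those same flags all OFF:
SELECT user_id FROM user_flags WHERE (bits1 & 0x2C08) = 0;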
'customer_data' table:
id - int auto increment
user_id - int
json - TEXT field containing json object
tags - varchar 200
* id and user_id are indexed.
Each customer (user_id) may have multiple lines.
"json" is text because it may be very large with many keys or or not so big with few keys containing short values.
I usually search for the json for user_id.
Problem: with over 100,000 lines and it takes forever to complete a query. I understand that TEXT field are very wasteful and mysql does not index them well.
Fix 1:
Convert the "json" field to multiple columns in the same table where some columns may be blank.
Fix 2:
Create another table with user_id|key|value, but I may end up with huge "joins"; won't that be much slower? Also, the key is a string but the value may be an int or text of various lengths. How do I reconcile that?
I know this is a pretty common use case; what are the "industry standards" for it?
UPDATE
So I guess Fix 2 is the best option; how would I query this table and get a one-row result, efficiently?
id | key | value
-------------------
1 | key_1 | A
2 | key_1 | D
1 | key_2 | B
1 | key_3 | C
2 | key_3 | E
result:
id | key_1 | key_2 | key_3
---------------------------
1 | A | B | C
2 | D | | E
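One common way to get that one-row-per-id shape is conditional aggregation; a minimal sketch, assuming the key/value table is named customer_kv (the keys have to be known in advance, or the SQL has to be built dynamically):

SELECT id,
       MAX(CASE WHEN `key` = 'key_1' THEN value END) AS key_1,
       MAX(CASE WHEN `key` = 'key_2' THEN value END) AS key_2,
       MAX(CASE WHEN `key` = 'key_3' THEN value END) AS key_3
FROM customer_kv
GROUP BY id;

Keys that are missing for an id (like key_2 for id 2) simply come back as NULL, and an index on (id, `key`) should help this query.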
This answer is a bit outside the box defined in your question, but I'd suggest:
Fix 3: Use MongoDB instead of MySQL.
This is not to criticize MySQL at all -- MySQL is a great structured relational database implementation. However, you don't seem interested in using either the structured aspects or the relational aspects (either because of the specific use case and requirements or because of your own programming preferences, I'm not sure which). Using MySQL because relational architecture suits your use case (if it does) would make sense; using relational architecture as a workaround to make MySQL efficient for your use case (as seems to be the path you're considering) seems unwise.
MongoDB is another great database implementation, which is less structured and not relational, and is designed for exactly the sort of use case you describe: flexibly storing big blobs of json data with various identifiers, and storing/retrieving them efficiently, without having to worry about structural consistency between different records. JSON is Mongo's native document representation.
I'm trying to find a way to compare two DNA-like strings with MySQL; stored functions are no problem. Also, the string may be changed, but it needs to have the following format: [code][id]-[value], like C1-4. (The - may be changed as well.)
Example of the string:
C1-4,C2-5,C3-9,S5-2,S8-3,L2-4
If a value does not exist in the other string, for example S3-1, it will score 10 (the max value). If the asked string has C1-4 and the given string has C1-5, the score has to be 4 - 5 = -1; and if the asked string has C1-4 and the given string has C1-2, the score has to be 4 - 2 = 2.
The reason for this is that my realtime algorithm is getting slow with 10,000 results (already optimized with stored functions, indexes, and query optimizations), because 10,000 small and quick queries add up to a lot.
And the score has to be calculated before I can order my query and get the right limit.
Thanks and if you have any questions let me know by comment.
** EDIT **
I'm thinking that it's also possible to not use a string, but rather a table where the DNA bits are stored as a 1-n relation:
ID | CODE | ID | VALUE
----------------------
1. | C... | 2. | 4....
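If you do normalize the string into such a table, the score can be computed with a single join; a minimal sketch, assuming a table dna_parts(dna_id, code, value) where a row like (1, 'C1', 4) represents C1-4 of string 1:

-- Per-code scores of the "asked" string (dna_id = 1) against the "given" one (dna_id = 2):
SELECT a.code,
       COALESCE(a.value - b.value, 10) AS score  -- 10 when the code is absent on the other side
FROM dna_parts AS a
LEFT JOIN dna_parts AS b
       ON b.dna_id = 2
      AND b.code   = a.code
WHERE a.dna_id = 1;

-- Or the total, which can feed ORDER BY / LIMIT:
SELECT SUM(COALESCE(a.value - b.value, 10)) AS total_score
FROM dna_parts AS a
LEFT JOIN dna_parts AS b ON b.dna_id = 2 AND b.code = a.code
WHERE a.dna_id = 1;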
I am facing a problem regarding a string comparison in MySQL.
I have the following table,
res_id | image_min_allowed_dimension | canvas_dimension
-------|-----------------------------|-----------------
1      | 400x500                     | 8x10
2      | 800x600                     | 11x14
As you can see in this table,
the image_min_allowed_dimension column has 2 sets of records, and canvas_dimension also has 2 sets.
Now, my goal is to get these 2 sets of records with a given value for image_min_allowed_dimension.
Say, if I give 1024x768 for image_min_allowed_dimension in the PHP script, it will give me the 2 sets of records from the canvas_dimension field.
The probable algo would be,
Fetch all records' canvas_dimension
IF image_min_allowed_dimension is less than or equal to the given value (i.e., 1024x768), return its canvas_dimension
ELSE return nothing.
But as the fields are VARCHAR, how can I achieve that?
Please help.
Refactor your schema to store your resolutions in a sane manner.
res_id | image_min_allowed_width | image_min_allowed_height | canvas_width | canvas_height
Your future self will thank you for the extra effort.
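A minimal sketch of that refactor; the table name and types are assumptions for illustration:

CREATE TABLE resolutions (
    res_id                   INT UNSIGNED NOT NULL PRIMARY KEY,
    image_min_allowed_width  INT UNSIGNED NOT NULL,
    image_min_allowed_height INT UNSIGNED NOT NULL,
    canvas_width             INT UNSIGNED NOT NULL,
    canvas_height            INT UNSIGNED NOT NULL
);

-- All canvas sizes whose minimum image dimensions fit within 1024x768:
SELECT canvas_width, canvas_height
FROM resolutions
WHERE image_min_allowed_width  <= 1024
  AND image_min_allowed_height <= 768;

With numeric columns the comparison is a plain integer <=, which VARCHAR values like '400x500' can never give you directly.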
I have two tables:
Avatars:
Id | UserId | Name | Size
-----------------------------------------------
1 | 2 | 124.png | Large
2 | 2 | 124_thumb.png | Thumb
Profiles:
Id | UserId | Location | Website
-----------------------------------------------
1 | 2 | Dallas, Tx | www.example.com
These tables could be merged into something like:
User Meta:
Id | UserId | MetaKey | MetaValue
-----------------------------------------------
1 | 2 | location | Dallas, Tx
2 | 2 | website | www.example.com
3 | 2 | avatar_lrg | 124.png
4 | 2 | avatar_thmb | 124_thumb.png
This to me could be a cleaner, more flexible setup (at least at first glance). For instance, if I need to allow a "user status message", I can do so without touching the database schema.
However, the user's avatars will be pulled far more than their profile information.
So I guess my real questions are:
What kind of performance hit would this produce?
Is merging these tables just a really bad idea?
This is almost always a bad idea. What you are doing is a form of the Entity Attribute Value model. This model is sometimes necessary when a system needs a flexible attribute system to allow the addition of attributes (and values) in production.
This type of model is essentially built on metadata in lieu of real relational data. This can lead to referential integrity issues, orphan data, and poor performance (depending on the amount of data in question).
As a general matter, if your attributes are known up front, you want to define them as real data (i.e. actual columns with actual types) as opposed to string-based metadata.
In this case, it looks like users may have one large avatar and one small avatar, so why not make those columns on the user table?
We have a similar type of table at work that probably started with good intentions, but is now quite the headache to deal with. This is because it now has 100s of different "MetaKeys", and there is no good documentation about what is allowed and what each does. You basically have to look at how each is used in the code and figure it out from there. Thus, figure out how you will document this for future developers before you go down that route.
Also, retrieving all the information about a user is no longer a 1-row query, but an n-row query (where n is the number of fields on the user). Once you have that data, you have to post-process each row based on its meta-key to get the details about your user (which usually turns out to be more development effort, because you have to do a bunch of string comparisons). Next, many databases only allow a certain number of rows to be returned from a query, so the number of users you can retrieve at once is divided by n. Last, ordering users based on information stored this way will be much more complicated and expensive.
In general, I would say that you should make any fields that have specialized functionality or require ordering to be columns in your table. Since they will require a development effort anyway, you might as well add them as an extra column when you implement them. I would say your avatar pics fall into this category, because you'll probably have one of each, and will always want to display the large one in certain places and the small one in others. However, if you wanted to allow users to make their own fields, this would be a good way to do this, though I would make it another table that can be joined to from the user table. Below are the tables I'd suggest. I assume that "Status" and "Favorite Color" are custom fields entered by user 2:
User:
| Id | Name |Location | Website | avatarLarge | avatarSmall
----------------------------------------------------------------------
| 2 | iPityDaFu |Dallas, Tx | www.example.com | 124.png | 124_thumb.png
UserMeta:
Id | UserId | MetaKey | MetaValue
-----------------------------------------------
1 | 2 | Status | Hungry
2 | 2 | Favorite Color | Blue
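Reading a user back under this layout stays cheap for the fixed columns and only falls back to extra rows for the custom fields; a sketch using the tables above:

-- One row for the built-in fields:
SELECT Id, Name, Location, Website, avatarLarge, avatarSmall
FROM User
WHERE Id = 2;

-- Extra rows only for the user-defined fields:
SELECT MetaKey, MetaValue
FROM UserMeta
WHERE UserId = 2;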
I'd stick with the original layout. Here are the downsides of replacing your existing table structure with a big table of key-value pairs that jump out at me:
Inefficient storage - since the data stored in the metavalue column is mixed, the column must be declared with the worst-case data type, even if all you would need to hold is a boolean for some keys.
Inefficient searching - should you ever need to do a lookup from the value in the future, the mishmash of data will make indexing a nightmare.
Inefficient reading - reading a single user record now means doing an index scan for multiple rows, instead of pulling a single row.
Inefficient writing - writing out a single user record is now a multi-row process.
Contention - having mixed your user data and avatar data together, you've forced threads that care about only one or the other to operate on the same table, increasing your risk of running into locking problems.
Lack of enforcement - your data constraints have now moved into the business layer. The database can no longer ensure that all users have all the attributes they should, or that those attributes are of the right type, etc.