Is there an indexable way to store several bitfields in MySQL? - mysql

I have a MySQL table which needs to store several bitfields...
notification.id -- autonumber int
association.id -- BIT FIELD 1 -- stores one or more association ids (which are obtained from another table)
type.id -- BIT FIELD 2 -- stores one or more types that apply to this notification (again, obtained from another table)
notification.day_of_week -- BIT FIELD 3 -- stores one or more days of the week
notification.target -- where to send the notification -- data type is irrelevant, as we'll never index or sort on this field, but
will probably store an email address.
My users will be able to configure their notifications to trigger on one or more days, in one or more associations, for one or more types. I need a quick, indexable way to store this data.
Bit fields 1 and 2 can expand to have more values than they do presently. Currently 1 has values as high as 125, and 2 has values as high as 7, but both are expected to go higher.
Bit field 3 stores days of the week, and as such, will always have only 7 possible values.
I'll need to run a script frequently (every few minutes) that scans this table based on type, association, and day, to determine if a given notification should be sent. Queries need to be fast, and the simpler it is to add new data, the better. I'm not above using joins, subqueries, etc as needed, but I can't imagine these being faster.
One last requirement -- if I have 1000 different notifications stored in here, with 125 association possibilities, 7 types, and 7 days of the week, the number of rows required is too high for my taste if I use plain integers and store multiple copies of each row instead of bit fields, so it seems like using bit fields is a requirement.
However, from what I've heard, if I wanted to select everything from a particular day of the week, say Tuesday (b0000100 in a bit field, perhaps), bit fields are not indexed such that I can do...
SELECT * FROM `mydb`.`mytable` WHERE `notification.day_of_week` & 4 = 4;
This, from my understanding, would not use an index at all.
Any suggestions on how I can do this, or something similar, in an indexable fashion?
(I work on a pretty standard LAMP stack, and I'm looking for specifics on how the MySQL indexing works on this or a similar alternative.)
Thanks!

There's no "good" way (that I know of) to accomplish what you want to.
Note that the BIT datatype is limited to a size of 64 bits.
For bits that can be statically defined, MySQL provides the SET datatype, which resembles BIT in some ways and differs in others.
For days of the week, for example, you could define a column
dow SET('SUN','MON','TUE','WED','THU','FRI','SAT')
There's no built-in way (that I know of) to get the internal bit representation back out, but you can add 0 to the column, or cast it to unsigned, to get a decimal representation.
SELECT dow+0, CONVERT(dow,UNSIGNED), dow, ...
1 1 SUN
2 2 MON
3 3 SUN,MON
4 4 TUE
5 5 SUN,TUE
6 6 MON,TUE
7 7 SUN,MON,TUE
It is possible for MySQL to use a "covering index" to satisfy a query with a predicate on a SET column, when the SET column is the leading column in the index. (i.e. EXPLAIN shows 'Using where; Using index') But MySQL may be performing a full scan of the index, rather than doing a range scan. (And there may be differences between the MyISAM engine and the InnoDB engine.)
SELECT id FROM notification WHERE FIND_IN_SET('SUN',dow)
SELECT id FROM notification WHERE (dow+0) MOD 2 = 1
BUT... this usage is non-standard, and can't really be recommended. For one thing, this behavior is not guaranteed, and MySQL may change this behavior in a future release.
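The SET-to-integer mapping shown in the table above is just positional bits: the Nth member of the SET definition gets bit value 2^(N-1). A minimal Python sketch of that mapping (the helper names here are made up for illustration, not MySQL internals):

```python
# The Nth member of a SET('SUN','MON',...) definition maps to bit 2^(N-1).
DOW = ('SUN', 'MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT')

def set_to_int(members):
    """Decimal representation of a SET value, like dow+0 in MySQL."""
    return sum(1 << DOW.index(m) for m in members)

def int_to_set(value):
    """Inverse: decode the bitmask back into member names."""
    return [d for i, d in enumerate(DOW) if value & (1 << i)]

print(set_to_int(['SUN', 'MON', 'TUE']))  # 7, matching the table above
print(int_to_set(5))                      # ['SUN', 'TUE']
```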

I've done a bit more research on this, and realized there's no way to get the indexing to work as I outlined above. So, I've created an auxiliary table (somewhat like the WordPress meta table format) which stores entries for day of week, etc. I'll just join these tables as needed. Fortunately, I don't anticipate having more than ~10,000 entries at present, so it should join quickly enough.
I'm still interested in a better answer if anyone has one!
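For illustration, the auxiliary-table approach described above can be sketched like this (using Python's sqlite3 so the sketch is self-contained; table and column names are made up, and a real deployment would use MySQL):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# One row per notification; days live in a separate, indexable table.
cur.execute("CREATE TABLE notification (id INTEGER PRIMARY KEY, target TEXT)")
cur.execute("""CREATE TABLE notification_day (
                   notification_id INTEGER, dow INTEGER,
                   PRIMARY KEY (dow, notification_id))""")

cur.execute("INSERT INTO notification VALUES (1, 'a@example.com')")
cur.execute("INSERT INTO notification VALUES (2, 'b@example.com')")
# Notification 1 fires Mon (1) and Tue (2); notification 2 fires Tue only.
cur.executemany("INSERT INTO notification_day VALUES (?, ?)",
                [(1, 1), (1, 2), (2, 2)])

# "Everything scheduled for Tuesday" becomes a plain indexed join.
rows = cur.execute("""SELECT n.id, n.target
                      FROM notification n
                      JOIN notification_day d ON d.notification_id = n.id
                      WHERE d.dow = 2 ORDER BY n.id""").fetchall()
print(rows)  # [(1, 'a@example.com'), (2, 'b@example.com')]
```

The composite primary key leading with `dow` is what makes the day-of-week lookup a range scan rather than a table scan.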

Related

How to design a database/table which adds a lot of rows every minute

I am in a situation where I need to store data for 1,900+ cryptocurrencies every minute; I use MySQL with InnoDB.
Currently, the table looks like this
coins_minute_id | coins_minute_coin_fk | coins_minute_usd | coins_minute_btc | coins_minute_datetime | coins_minute_timestamp
coins_minute_id = autoincrement id
coins_minute_coin_fk = medium int unsigned
coins_minute_usd = decimal 20,6
coins_minute_btc = decimal 20,8
coins_minute_datetime = datetime
coins_minute_timestamp = timestamp
The table grows incredibly fast: every minute, 1,900+ rows are added.
The data will be used for historical price display as a D3.js line graph for each cryptocurrency.
My question is: how do I best optimize this database? I have thought of collecting the data every 5 minutes instead of every minute, but it will still add up to a lot of data in no time. I have also wondered whether it would be better to create a separate table for each cryptocurrency. Does anyone who loves to design databases know a smarter, cleverer way to do this kind of thing?
Kindly Regards
(From Comment)
SELECT coins_minute_coin_fk, coins_minute_usd
FROM coins_minutes
WHERE coins_minute_datetime >= DATE_ADD(NOW(),INTERVAL -1 DAY)
AND coins_minute_coin_fk <= 1000
ORDER BY coins_minute_coin_fk ASC
Get rid of coins_minute_ prefix; it clutters the SQL without providing any useful info.
Don't store the time twice -- there are simple functions to convert between DATETIME and TIMESTAMP. Why do you have both a 'created' and an 'updated' timestamp? Are you doing UPDATE statements? If so, the code is more complicated than simply "inserting", and you need a unique key to know which row to update.
Provide SHOW CREATE TABLE; it is more descriptive than what you provided.
30 inserts/second is easily handled. 300/sec may have issues.
Do not PARTITION the table without some real reason to do so. The common valid reason is that you want to delete 'old' data periodically. If you are deleting after 3 months, I would build the table with PARTITION BY RANGE(TO_DAYS(...)) and use weekly partitions. More discussion: http://mysql.rjweb.org/doc.php/partitionmaint
Show us the queries. A schema cannot be optimized without knowing how it will be accessed.
"Batch" inserts are much faster than single-row INSERT statements. This can be in the form of INSERT INTO x (a,b) VALUES (1,2), (11,22), ... or LOAD DATA INFILE. The latter is very good if you already have a CSV file.
Does your data come from a single source? Or 1900 different sources?
MySQL and MariaDB are probably identical for your task. (Again, need to see queries.) PDO is fine for either; no recoding needed.
After seeing the queries, we can discuss what PRIMARY KEY to have and what secondary INDEX(es) to have.
1 minute vs 5 minutes? Do you mean that you will gather only one-fifth as many rows in the latter case? We can discuss this after the rest of the details are brought out.
That query does not make sense in multiple ways. Why stop at 1000? The output is quite large; what client cares about that much data? The ordering is indeterminate -- the datetime is not guaranteed to be in order. Why select usd without also selecting the datetime? Please provide a realistic query; then I can help you with INDEX(es).
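The batch-insert advice above ("one multi-row statement instead of many single-row INSERTs") can be sketched as follows (sqlite3 for illustration, with the `coins_minute_` prefix dropped as suggested; all names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("""CREATE TABLE coins_minutes (
                   coin_fk INTEGER, usd REAL, btc REAL, dt TEXT)""")

# One batched statement per minute instead of 1,900 single-row INSERTs.
batch = [(1, 40000.0, 1.0,    '2024-01-01 00:00:00'),
         (2, 2500.0,  0.0625, '2024-01-01 00:00:00')]
cur.executemany("INSERT INTO coins_minutes VALUES (?, ?, ?, ?)", batch)

count = cur.execute("SELECT COUNT(*) FROM coins_minutes").fetchone()[0]
print(count)  # 2
```

With MySQL the same idea is a single `INSERT ... VALUES (...), (...), ...` statement or `LOAD DATA INFILE`, as the answer notes.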

Generate future reservation dates in MySQL

EDIT: To avoid confusion because the word "tables" has two meanings: every time I refer to "100 tables", I'm referring to 100 physical tables in a single business, available for booking each day.
I've come to the conclusion that for a table-booking system such as the one I'm trying to develop, a single MySQL table with a unique index made up of tableid and date will suffice, meaning I can keep my table reservations in a single table and, according to my research, store 100+ years into the future without any performance issues. Please correct me if I'm wrong.
Further explained: I have a set of let's say 100 bookable tables just to not run out of tables (for this project I will rarely require more than 30, but you never know). Each table is numbered 1-100 and the combination of table number (tableid) and the date is a unique entry in the database. I.e. you can only have the row of table 4 on date 2014-06-18 once. That's fine, and I can just generate 100 rows for each day for the next 100 years, yes?
I use a BIGINT as Primary Key for each row with Auto-Increment starting at 1.
Now - what is the easiest solution to generating all these rows in MySQL? Each row just needs to have INSERT INTO tables (TABLEID,DATE) VALUES ([id],[date]) as the rest of the fields populate by default. Dates can just start from today as this project has no business in the past. I did some Googling but cannot really figure out the difference between a script and a stored procedure, and the variable declaration for each seems to be different and confuses me a bit.
This should be fairly simple, though, and the question is also about whether this approach is good practice or not.
Thanks
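For illustration, the row generation described in the question can be sketched like this (Python with sqlite3 so the sketch is runnable; the real system would use MySQL, and the 7-day range is just to keep the example small):

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("""CREATE TABLE reservations (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   tableid INTEGER, date TEXT,
                   UNIQUE (tableid, date))""")

# Generate (tableid, date) pairs for 100 tables over the next 7 days;
# extend the day range as far into the future as needed.
start = date(2024, 1, 1)
rows = [(t, (start + timedelta(days=d)).isoformat())
        for d in range(7) for t in range(1, 101)]
cur.executemany("INSERT INTO reservations (tableid, date) VALUES (?, ?)", rows)

count = cur.execute("SELECT COUNT(*) FROM reservations").fetchone()[0]
print(count)  # 700 rows: 100 tables x 7 days
```

The UNIQUE (tableid, date) constraint enforces the "each table/date combination appears once" rule from the question.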

DATABASE optimization insert and search

I was having an argument with a friend of mine. Suppose we have a DB table with a userid and some other fields. This table might have a lot of rows. Let's suppose also that by design we limit each userid to about 50 records in the table. My friend suggested that if I insert the rows for each userid one after another, the lookup would be faster, e.g.
userid otherfield
1 .........
1 .........
.....until 50...
2 ........
etc. So when user id 1 is created, I pre-populate its 50 rows with null values, and so on. The idea is that if I know the number of rows and find the first row with userid = 1, I just have to look at the next 49, and voila -- I don't have to search the whole table. Is this correct? Can this be done without indexing? Is the pre-population an expensive process? Is there a performance difference versus just inserting the old-fashioned way, like
1 ........
2 ........
2 ........
1 ........
etc?
To answer a performance question like this, you should run performance tests on the different configurations.
But, let me make a few points.
First, although you might know that the records for a given id are located next to each other, the database does not know this. So, if you are searching for one user -- without an index -- then the engine needs to search through all the records (unless you have a limit clause in the query).
Second, if the data is fixed-length (numeric and dates), then populating it with real values after populating it with NULLs will occupy the same space on the page. But if the data is variable-length, a given page will initially be filled with empty records; when you later modify those records with real values, you will get page splits.
What you are trying to do is to outsmart the database engine. This isn't necessary, because MySQL provides indexes, which provide almost all the benefits that you are describing.
Now, having said that, there is some performance benefit from having all the records for a user co-located. If a user has 50 records, then reading them via an index could require loading up to 50 pages into memory. If the records are co-located, only one or two pages would need to be read. Typically, this is a very small performance gain, because most frequently accessed tables fit into memory, but there might be some circumstances where it is worth it.

mySQL SELECT rows where a specific bit of an integer is set

I have to do a SELECT query on a postings table where a specific bit of an integer is set.
The integer represents a set of categories in a bitmask:
E.g.
1 => health
2 => marketing
3 => personal
4 => music
5 => video
6 => design
7 => fashion
8 => ......
Data example:
id | categories | title
1 | 11 | bla bla
2 | 48 | blabla, too
I need a MySQL query that selects postings that are marked with a specific category.
Let's say "all video postings".
This means I need a result set of postings where the 5th bit of the categories column is set (e.g. 16, 17, 48, ...).
SELECT * FROM postings WHERE ....????
Any ideas ?
You can use bitwise operators like this. For video (bit 5):
WHERE categories & 16 = 16
Substitute the value 16 using the following values for each bit:
1 = 1
2 = 2
3 = 4
4 = 8
5 = 16
6 = 32
7 = 64
8 = 128
Note that this numbering runs from the least significant bit upward but starts at 1, whereas most programmers number bits starting from zero.
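The category-to-bit-value mapping listed above is just 2^(n-1); a quick sketch (the helper name is made up for illustration):

```python
def category_bit(n):
    """Bit value for 1-based category number n: 1 -> 1, 5 -> 16, 8 -> 128."""
    return 1 << (n - 1)

print(category_bit(5))  # 16  (the 'video' category)
print(category_bit(8))  # 128
```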
How about
SELECT * FROM postings WHERE (categories & 16) > 0; -- 16 is 5th bit over
One issue with this is you probably won't hit an index, so you could run into perf issues if it's a large amount of data.
Certain databases (such as PostgreSQL) let you define an index on an expression like this. I'm not sure if mySQL has this feature. If this is important, you might want to consider breaking these out into separate Boolean columns or a new table.
SQL (not just MySQL) is not well suited to bitwise operations. If you do a bitwise AND, you will force a table scan, since SQL will not be able to use any index and will have to check each row one at a time.
It would be better if you created a separate "Categories" table and a properly indexed many-to-many PostingCategories table to connect the two.
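For illustration, that normalized many-to-many layout might look like this (sqlite3 for a self-contained sketch; table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE postings (id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE posting_categories (
                   posting_id INTEGER, category_id INTEGER,
                   PRIMARY KEY (category_id, posting_id))""")

cur.execute("INSERT INTO postings VALUES (1, 'bla bla'), (2, 'blabla, too')")
cur.execute("INSERT INTO categories VALUES (5, 'video')")
# Posting 2 (bitmask 48 = 32 + 16 in the original design) is 'video' (bit 5).
cur.execute("INSERT INTO posting_categories VALUES (2, 5)")

# 'All video postings' uses the index instead of a bitwise table scan.
rows = cur.execute("""SELECT p.id, p.title FROM postings p
                      JOIN posting_categories pc ON pc.posting_id = p.id
                      WHERE pc.category_id = 5""").fetchall()
print(rows)  # [(2, 'blabla, too')]
```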
UPDATE
For people insisting that bitmap fields aren't an issue, it helps to check Joe Celko's BIT of a Problem.  At the bottom of the article is a list of serious problems caused by bitmaps.
Regarding the comment that a blanket statement can't be right, note #10 - it breaks 1NF so yes, bitmap fields are bad:
The data is unreadable. ...
Constraints are a b#### to write....
You are limited to two values per field. That is very restrictive; even the ISO sex code cannot fit into such a column...
There is no temporal element to the bit mask (or to single bit flags). For example, a flag “is_legal_adult_flg” ... A DATE for the birth date (just 3 bytes) would hold complete fact and let us compute what we need to know; it would always be correct, too. ...
You will find out that using the flags will tend to split the status of an entity over multiple tables....
Bit flags invite redundancy. In the system I just mentioned, we had "is_active_flg" and "is_completed_flg" in the same table. A completed auction is not active and vice versa. It is the same fact in two flags. Human psychology (and the English language) prefers to hear an affirmative wording (remember the old song "Yes, we have no bananas today!"?).
All of these bit flags and sequence validations are being replaced by two sets of state transition tables, one for bids and one for shipments. The history of each auction is now in one place and has to follow the business rules.
By the time you disassemble a bit mask column and throw out the fields you did not need, performance is not going to be any better than with simpler data types.
Grouping and ordering on the individual fields is a real pain. Try it.
You have to index the whole column, so unless you luck up and have them in the right order, you are stuck with table scans.
Since a bit mask is not in First Normal Form (1NF), you have all the anomalies we wanted to avoid in RDBMS.
I'd also add: what about NULLs? What about missing flags? What if something is neither true nor false?
Finally, regarding the compression claim: most databases pack bit fields into bytes and ints internally, so the bitmap field doesn't offer any compression in that case. Other databases (e.g. PostgreSQL) actually have a Boolean type that can be true/false/unknown. It may take 1 byte, but that's not a lot of storage, and transparent compression is available if a table gets too large.
In fact, if a table gets large, the bitmap-field problems become a lot more serious. Saving a few MBs in a GB table is no gain if you are forced to use table scans, or if you lose the ability to group and order on the individual fields.

SQL - Date Between (xxxx-06-21 and xxxx-09-21)

How is it possible to use the BETWEEN function in MySQL, to search for dates in any year but between a specific day/month to a specific day/month? Or if it's not possible using BETWEEN, how else could I accomplish it?
To be more descriptive, I am trying to add a seasonal search to my photo archive website. So if a user chose to search for "summer" photos, it would search photos taken between 21 June and 21 September, but from any year.
If Carlsberg made SQL, I think it would be :)
WHERE date BETWEEN 'xxxx-06-21' AND 'xxxx-09-21'
Many thanks
The solution given elsewhere, WHERE DATE_FORMAT(date_col, '%m%d') BETWEEN '0621' AND '0921', is a good one (and it's well worth upvoting, since it's fine for most MySQL databases), but I'd like to point out that it (like most queries that involve per-row functions) won't scale that well to large tables.
Granted, my experience is that MySQL is not used that often for the sizes where it would make a huge difference but, in case it is, you should also consider the following.
A trick we've used in the past is to combine extra columns with insert/update triggers so that the cost of calculation is only incurred when necessary (when the data changes), rather than on every select.
Since the vast majority of databases are read far more often than written, this amortises that cost over all selects.
For example, add a new CHAR(4) column called MMDD and whack an index on it. Then set up insert/update triggers on the table so that the date column is used to set the new one, using the formula already provided, DATE_FORMAT(date_col, '%m%d').
Then, when doing your query, skip the per-row functions and instead use:
WHERE MMDD BETWEEN '0621' AND '0921'
The fact that it's indexed will keep the speed blindingly fast, at the small cost of a trigger during insert/update and an extra column.
The cost of the trigger is irrelevant since it's less than the cost of doing it for every select operation. The extra storage required for a column is a downside but, if you examine all the questions people ask about databases, the ratio of speed problems to storage problems is rather high :-)
And, though this technically "breaks" 3NF in that you duplicate data, it's a time honoured tradition to do so for performance reasons, if you know what you're doing (ie, the triggers mitigate the "damage").
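For illustration, here is the precomputed-MMDD idea in runnable form (sqlite3 for a self-contained sketch, computing the column at write time in the application rather than in a trigger; names are made up):

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, taken TEXT, mmdd TEXT)")
cur.execute("CREATE INDEX idx_mmdd ON photos (mmdd)")

# mmdd is derived from the date once, at write time, so the seasonal
# query below becomes a plain indexed range scan.
for pid, d in [(1, date(2009, 7, 4)), (2, date(2015, 12, 25)),
               (3, date(2021, 9, 1))]:
    cur.execute("INSERT INTO photos VALUES (?, ?, ?)",
                (pid, d.isoformat(), d.strftime('%m%d')))

summer = cur.execute("""SELECT id FROM photos
                        WHERE mmdd BETWEEN '0621' AND '0921'
                        ORDER BY id""").fetchall()
print(summer)  # [(1,), (3,)] -- the July and September photos
```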
I'd think the easy way is to take the date's month and day as a string and run it through BETWEEN. In MySQL:
WHERE DATE_FORMAT(date_col, '%m%d') BETWEEN '0621' AND '0921'
the RIGHT function may help
select
current_date,
right(current_date,5),
right(current_date,5) between '06-21' and '09-21' `IsTodaySummer?`,
'09-21' between '06-21' and '09-21' `Is 09-21 Summer?`;
+--------------+-----------------------+-----------------+-------------------+
| current_date | right(current_date,5) | IsTodaySummer? | Is 09-21 Summer? |
+--------------+-----------------------+-----------------+-------------------+
| 2011-09-29 | 09-29 | 0 | 1 |
+--------------+-----------------------+-----------------+-------------------+
but as ajeal said, adding another column is better for making use of an index.
If you really need this format of search, I guess it is better to break the date column into two:
year int(4) unsigned (which is the year)
month_day int(4) unsigned (which is mmdd)
Build a composite index on (month_day, year); the idea is to make the index work better.
example query like
where month_day between 621 and 921 -- makes use of the index
another example
where year=2011 and month_day between 900 and 930 -- makes use of the index too
MySQL's STR_TO_DATE() function is the equivalent of Oracle's to_date().
You could pull the dates apart with the month and day functions:
where month(date_column) between 6 and 9
and if(month(date_column) = 6 and day(date_column) < 21, 0, 1)
and if(month(date_column) = 9 and day(date_column) > 21, 0, 1)
The if stuff is only needed because your boundaries don't line up nicely with the months.
If you're going to be doing a lot of this sort of thing, then you might be better off precomputing the month-day version of the full date so that you can index it.
Since you don't need the year, subtract it from the date.
In MySQL, you can do something like this:
select * from some_table
where date_sub(date_col, interval year(date_col) year)
between '0000-06-21' and '0000-09-21';
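The same year-independent comparison can be sketched in Python by comparing (month, day) tuples, which is what subtracting the year effectively achieves:

```python
from datetime import date

def is_summer(d):
    """True if d falls between 21 June and 21 September, in any year."""
    return (6, 21) <= (d.month, d.day) <= (9, 21)

print(is_summer(date(2009, 7, 4)))    # True
print(is_summer(date(2015, 12, 25)))  # False
```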