SQL - Date Between (xxxx-06-21 and xxxx-09-21) - mysql

How is it possible to use the BETWEEN function in MySQL, to search for dates in any year but between a specific day/month to a specific day/month? Or if it's not possible using BETWEEN, how else could I accomplish it?
To be more descriptive, I am trying to add a seasonal search to my photo archive website. So if a user chose to search for "summer" photos, it would search photos taken between 21 June and 21 September, but from any year.
If Carlsberg made SQL, I think it would be :)
WHERE date BETWEEN 'xxxx-06-21' AND 'xxxx-09-21'
Many thanks

The solution given elsewhere, WHERE DATE_FORMAT(date, '%m%d') BETWEEN '0621' AND '0921', is a good one (and it's well worth upvoting since it's fine for most MySQL databases), but I'd like to point out that it (like most queries that involve per-row functions) won't scale well to large tables.
Granted, my experience is that MySQL is not used that often for the sizes where it would make a huge difference but, in case it is, you should also consider the following.
A trick we've used in the past is to combine extra columns with insert/update triggers so that the cost of calculation is only incurred when necessary (when the data changes), rather than on every select.
Since the vast majority of databases are read far more often than written, this amortises that cost over all selects.
For example, add a new CHAR(4) column called MMDD and whack an index on it. Then set up insert/update triggers on the table so that the date column is used to set this new one, based on the formula already provided, DATE_FORMAT(date, '%m%d').
Then, when doing your query, skip the per-row functions and instead use:
WHERE MMDD BETWEEN '0621' AND '0921'
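As a sketch (assuming a photos table whose date column is named taken; adjust the names to your schema):

ALTER TABLE photos ADD COLUMN mmdd CHAR(4), ADD INDEX idx_mmdd (mmdd);

-- Keep mmdd in sync whenever a row is inserted or updated:
CREATE TRIGGER photos_mmdd_bi BEFORE INSERT ON photos
FOR EACH ROW SET NEW.mmdd = DATE_FORMAT(NEW.taken, '%m%d');

CREATE TRIGGER photos_mmdd_bu BEFORE UPDATE ON photos
FOR EACH ROW SET NEW.mmdd = DATE_FORMAT(NEW.taken, '%m%d');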
The fact that it's indexed will keep the speed blindingly fast, at the small cost of a trigger during insert/update and an extra column.
The cost of the trigger is irrelevant since it's less than the cost of doing it for every select operation. The extra storage required for a column is a downside but, if you examine all the questions people ask about databases, the ratio of speed problems to storage problems is rather high :-)
And, though this technically "breaks" 3NF in that you duplicate data, it's a time-honoured tradition to do so for performance reasons, provided you know what you're doing (i.e., the triggers mitigate the "damage").

I'd think the easy way is to take the dates, turn them into month-day strings, and run those through BETWEEN:
WHERE DATE_FORMAT(date, '%m%d') BETWEEN '0621' AND '0921'

the RIGHT function may help
select
current_date,
right(current_date,5),
right(current_date,5) between '06-21' and '09-21' `IsTodaySummer?`,
'09-21' between '06-21' and '09-21' `Is 09-21 Summer?`;
+--------------+-----------------------+----------------+------------------+
| current_date | right(current_date,5) | IsTodaySummer? | Is 09-21 Summer? |
+--------------+-----------------------+----------------+------------------+
| 2011-09-29   | 09-29                 | 0              | 1                |
+--------------+-----------------------+----------------+------------------+
but as ajeal said, adding another column is better because it can use an index.

If you really need this kind of search, I think it's better to break the date column in two:
year int(4) unsigned (the year)
month_day int(4) unsigned (mmdd)
Build a composite index on (month_day, year); the idea is to let the index do the work.
An example query:
where month_day between 621 and 921 -- makes use of the index
Another example:
where year=2011 and month_day between 900 and 930 -- makes use of the index too
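As a sketch (assuming a photos table with a taken date column; all names are placeholders):

ALTER TABLE photos
    ADD COLUMN `year` SMALLINT UNSIGNED,
    ADD COLUMN month_day SMALLINT UNSIGNED,
    ADD INDEX idx_md_y (month_day, `year`);

-- Backfill the new columns from the existing date column:
UPDATE photos
SET `year` = YEAR(taken),
    month_day = MONTH(taken) * 100 + DAY(taken);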

MySQL's equivalent of Oracle's TO_DATE() function is STR_TO_DATE().
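For instance:
SELECT STR_TO_DATE('21.06.2015', '%d.%m.%Y');  -- returns 2015-06-21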

You could pull the dates apart with the month and day functions:
where month(date_column) between 6 and 9
and if(month(date_column) = 6 and day(date_column) < 21, 0, 1)
and if(month(date_column) = 9 and day(date_column) > 21, 0, 1)
The if stuff is only needed because your boundaries don't line up nicely with the months.
If you're going to be doing a lot of this sort of thing, then you might be better off precomputing the month-day version of the full date so that you can index it.
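On MySQL 5.7 and later, a stored generated column is one way to precompute this without triggers. A sketch, assuming a photos table with a taken date column (both names are placeholders):

ALTER TABLE photos
    -- mmdd is computed from taken automatically and kept on disk, so it can be indexed
    ADD COLUMN mmdd CHAR(4) AS (DATE_FORMAT(taken, '%m%d')) STORED,
    ADD INDEX idx_mmdd (mmdd);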

Since you don't need the year, subtract it from the date.
In MySQL, you can do something like this:
select * from some_table
where date_sub(date_col, interval year(date_col) year)
between '0000-06-21' and '0000-09-21';

Related

Making a groupby query faster

This is my data from my table:
I have exactly one million rows, so this is just a snippet.
I would like to make this query faster:
It basically groups the values by time (ev represents year, honap represents month, and so on). The problem is that it takes a lot of time. I tried to apply indexes, as you can see here:
but it does absolutely nothing.
Here is my index:
I have also tried to include perc (which represents minutes) because of its cardinality, but MySQL doesn't want to use it. Could you give me any suggestions?
Is the data realistic? If so, why run the query -- it essentially delivers exactly what was in the table.
If, on the other hand, you had several rows per minute, then the GROUP BY makes sense.
The index you have is not worth using. However, the Optimizer seemed to like it. That's a bug.
In that case, I would simplify it to this:
SELECT AVG(konyha1) AS 'avg',
LEFT(time, 16) AS 'time'
FROM onemilliondata
GROUP BY LEFT(time, 16)
A DATE, TIME, or DATETIME can be treated either as its own datatype or as a VARCHAR; here I'm deliberately treating it as a string.
Even in this case, no index is useful. However, this would make it a little faster:
PRIMARY KEY(time)
and the table would have only 2 columns: time, konyha1.
It is rarely beneficial to break a date and/or time into components and put them into columns.
A million points will probably choke a graphing program. And the screen -- which has a resolution of only a few thousand.
Perhaps you should group by hour? And use LEFT(time, 13)? Performance would probably be slightly faster -- but only because less data is being sent to the client.
If you are collecting this data "forever", consider building and maintaining a "summary table" of the averages for each unit of time. Then the incremental effort is, say, aggregating yesterday's data each morning.
You might find MIN(konyha1) and MAX(konyha1) interesting to keep on an hourly or daily basis. Note that daily or weekly aggregates can be derived from hourly values.
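A minimal sketch of such a summary table, reusing the table and column names from the query above. Storing the count and sum (rather than the average) is what lets hourly rows be re-aggregated into daily or weekly figures:

CREATE TABLE hourly_summary (
    hr DATETIME NOT NULL,          -- the hour, e.g. 2011-09-29 13:00:00
    ct INT UNSIGNED NOT NULL,      -- row count for that hour
    sum_konyha1 DOUBLE NOT NULL,
    min_konyha1 DOUBLE NOT NULL,
    max_konyha1 DOUBLE NOT NULL,
    PRIMARY KEY (hr)
);

-- Run each morning for yesterday's data:
INSERT INTO hourly_summary
SELECT CONCAT(LEFT(`time`, 13), ':00:00'),
       COUNT(*), SUM(konyha1), MIN(konyha1), MAX(konyha1)
FROM onemilliondata
WHERE `time` >= CURDATE() - INTERVAL 1 DAY
  AND `time` <  CURDATE()
GROUP BY LEFT(`time`, 13);

-- The average over any range of hours is then SUM(sum_konyha1) / SUM(ct).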

How to design a database/table which adds a lot of rows every minute

I am in a situation where I need to store data for 1900+ cryptocurrencies every minute; I use MySQL InnoDB.
Currently, the table looks like this
coins_minute_id | coins_minute_coin_fk | coins_minute_usd | coins_minute_btc | coins_minute_datetime | coins_minute_timestamp
coins_minute_id = autoincrement id
coins_minute_coin_fk = medium int unsigned
coins_minute_usd = decimal 20,6
coins_minute_btc = decimal 20,8
coins_minute_datetime = datetime
coins_minute_timestamp = timestamp
The table grows incredibly fast: every minute, 1900+ rows are added.
The data will be used for historical price display as a D3.js line graph for each cryptocurrency.
My question is: how do I best optimize this database? I have thought of only collecting the data every 5 minutes instead of every minute, but it will still add up to a lot of data in no time. I have also wondered whether it would be better to create a separate table for each cryptocurrency. Do any of you who love designing databases know some other smart and clever way to handle this?
Kind regards
(From Comment)
SELECT coins_minute_coin_fk, coins_minute_usd
FROM coins_minutes
WHERE coins_minute_datetime >= DATE_ADD(NOW(),INTERVAL -1 DAY)
AND coins_minute_coin_fk <= 1000
ORDER BY coins_minute_coin_fk ASC
Get rid of coins_minute_ prefix; it clutters the SQL without providing any useful info.
Don't store the time twice -- there are simple functions to convert between DATETIME and TIMESTAMP. Why do you have both 'created' and 'updated' timestamps? Are you doing UPDATE statements? If so, then the code is more complicated than simply "inserting". And you need a unique key to know which row to update.
Provide SHOW CREATE TABLE; it is more descriptive than what you provided.
30 inserts/second is easily handled. 300/sec may have issues.
Do not PARTITION the table without some real reason to do so. The common valid reason is that you want to delete 'old' data periodically. If you are deleting after 3 months, I would build the table with PARTITION BY RANGE(TO_DAYS(...)) and use weekly partitions (see the sketch at the end of this answer). More discussion: http://mysql.rjweb.org/doc.php/partitionmaint
Show us the queries. A schema cannot be optimized without knowing how it will be accessed.
"Batch" inserts are much faster than single-row INSERT statements. This can be in the form of INSERT INTO x (a,b) VALUES (1,2), (11,22), ... or LOAD DATA INFILE. The latter is very good if you already have a CSV file.
Does your data come from a single source? Or 1900 different sources?
MySQL and MariaDB are probably identical for your task. (Again, need to see queries.) PDO is fine for either; no recoding needed.
After seeing the queries, we can discuss what PRIMARY KEY to have and what secondary INDEX(es) to have.
1 minute vs 5 minutes? Do you mean that you will gather only one-fifth as many rows in the latter case? We can discuss this after the rest of the details are brought out.
That query does not make sense in multiple ways. Why stop at "1000"? The output is quite large; what client cares about that much data? The ordering is indefinite -- the datetime is not guaranteed to be in order. Why select the usd without selecting the datetime? Please provide a rational query; then I can help you with INDEX(es).
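If purging old data does become the reason to partition, a minimal sketch might look like this (column names shortened per the first point above; the partition boundaries are placeholders):

CREATE TABLE coins_minutes (
    coin_fk MEDIUMINT UNSIGNED NOT NULL,
    usd DECIMAL(20,6) NOT NULL,
    btc DECIMAL(20,8) NOT NULL,
    dt DATETIME NOT NULL,
    PRIMARY KEY (coin_fk, dt)  -- the partition column must appear in every unique key
)
PARTITION BY RANGE (TO_DAYS(dt)) (
    PARTITION p2018w01 VALUES LESS THAN (TO_DAYS('2018-01-08')),
    PARTITION p2018w02 VALUES LESS THAN (TO_DAYS('2018-01-15')),
    PARTITION pfuture  VALUES LESS THAN MAXVALUE
);

-- Purging a week of old data is then a quick metadata operation:
ALTER TABLE coins_minutes DROP PARTITION p2018w01;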

In MySql, is it worthwhile creating more than one multi-column indexes on the same set of columns?

I am new to SQL, and certainly to MySQL.
I have created a table from streaming market data named trade that looks like
date       | time                    | instrument | price   | quantity
-----------|-------------------------|------------|---------|---------
2017-09-08 | 2017-09-08 13:16:30.919 | 12899586   | 54.15   | 8000
2017-09-08 | 2017-09-08 13:16:30.919 | 13793026   | 1177.75 | 750
2017-09-08 | 2017-09-08 13:16:30.919 | 1346049    | 1690.8  | 1
2017-09-08 | 2017-09-08 13:16:30.919 | 261889     | 110.85  | 50
This table is huge (150 million rows per date).
To retrieve data efficiently, I have created an index date_time_inst (date,time,instrument) because most of my queries will select a specific date
or date range and then a time range.
But that does not help speed up a query like:
select * from trade where date="2017-09-08", instrument=261889
So, I am considering creating another index date_inst_time (date, instrument, time). Will that help speed up queries where I wish to get the time-series of one or a few instruments out of the thousands?
Should I worry too much about the additional database write time due to index updates?
I get data every second, and take about 100 ms to process it and store in a database. As long as I continue to take less than 1 sec I am fine.
To get the most efficient query you need to query on a clustered index. According to the documentation, this is automatically set on the primary key and cannot be set on any other columns.
I would suggest ditching the date column and creating a composite primary key on time and instrument
A couple of recommendations:
There is no need to store date and time separately when the time column already contains the full date; you can keep a single datetime column instead
You can then have one index on the datetime and instrument columns, which will make the queries run faster (see the sketch after these recommendations)
With so many inserts and fixed format of SELECT query (i.e. always by date first, followed by instrument), I would suggest looking into other columnar databases (like Cassandra). You will get faster writes and reads for such structure
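A minimal sketch of the first two recommendations, assuming the existing time column already holds the full datetime (as in the sample data):

ALTER TABLE trade
    DROP COLUMN `date`,                            -- redundant: time already carries the date
    ADD INDEX idx_time_instrument (`time`, instrument);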
First, your use case sounds like two indexes would be useful (date, instrument) and (date, time).
Given your volume of data, you may want to consider partitioning the data. This involves storing different "shards" of data in different files. One place to start is with the documentation.
From your description, you would want to partition by date, although instrument is another candidate.
Another approach would be a clustered index with date as the first column in the index. This assumes that the data is inserted "in order", to reduce movement of the data on inserts.
You are dealing with a large quantity of data. MySQL should be able to handle the volume. But, you may need to dive into more advanced functionality, such as partitioning and clustered indexes to get the functionality you need.
Typo?
I assume you meant
select * from trade where date="2017-09-08" AND instrument=261889
                                            ^^^
Optimal index for such is
INDEX(instrument, date)
And, contrary to other Comments/Answers, it is better to have the date last, especially if you want more than one day.
Splitting date and time
It is usually a bad idea to split date and time. It is also usually a bad idea to have redundant data; in this case, the date is repeated. Instead, use
WHERE `time` >= "2017-09-08"
AND `time` < "2017-09-08" + INTERVAL 1 DAY
and get rid of the date column. Note: This pattern works for DATE, DATETIME, DATETIME(3), etc, without messing up with the midnight at the end of the range.
Data volume?
150M rows? 10 new rows per second? That means you have about 5 years' data? A steady 10/sec insertion rate is rarely a problem.
Need to see SHOW CREATE TABLE. If there are a lot of indexes, then there could be a problem. Need to see the datatypes to look for shrinking the size.
Will you be purging 'old' data? If so, we need to talk about partitioning for that specific purpose.
How many "instruments"? How much RAM? Need to discuss the ramifications of an index starting with instrument.
The query
Is that the main SELECT you use? Is it always 1 day? One instrument? How many rows are typically returned?
Depending on the PRIMARY KEY and whatever index is used, fetching 100 rows could take anywhere from 10ms to 1000ms. Is this issue important?
Millisecond resolution
It is usually folly to think that any time resolution is not going to have duplicates.
Is there an AUTO_INCREMENT already?
SPACE IS CHEAP. Indexes take time during creation/insertion (once), but shave time off retrieval (many, many times).
In my experience, it pays to create indexes covering all the relevant fields in all orders. That way, MySQL can choose the best index for your query.
So if you have 3 relevant fields
INDEX 1 (field1,field2,field3)
INDEX 2 (field1,field3)
INDEX 3 (field2,field3)
INDEX 4 (field3)
The first index will be used when all fields are present. The others are for shorter WHERE conditions.
Unless you know that some combinations will never be used, this will give MySQL the best chance to optimize your query. I'm also assuming that field1 is the biggest driver of the data.

MySql date inside varchar (select correct date)?

I have "varchar" field in my database where I have stored records like:
(11.1.2015) Log info 1
(17.4.2015) Log info 2
(22.5.2015) Log info 3
(25.5.2015) Log info 3
...
Now I would like to run a SELECT WHERE the date inside the parentheses is the same as or later than today, and select the first such row (so with this sample data and today's date I should get 22.5.2015). I just can't figure out how to do that, so I need some help.
In principle I agree with Pekka웃 on this one.
You should always strive to use proper data types for your data.
This also means never use one column to store 2 different data segments.
However, from the comments to Pekka웃's answer I understand that changing the table is not possible, so here's my attempt to do it.
Assuming your dates are always at the start of the varchar, and always surrounded by parenthesis, you can probably do something like this:
SELECT *
FROM (
    SELECT
        STR_TO_DATE(SUBSTR(Log_data, 2, LOCATE(')', Log_data) - 2), '%e.%c.%Y') AS LogDate,
        SUBSTR(Log_data, LOCATE(')', Log_data) + 1) AS LogData
    FROM logs_table
) NormalizedLogTable
WHERE LogDate >= CURDATE()
ORDER BY LogDate
LIMIT 1
Note #1: This is a workaround for your specific situation.
If you ever get the chance, you should normalize your table.
Note #2: I'm not a MySQL guy; most of my SQL experience is with SQL Server.
I've used STR_TO_DATE() with an explicit format above to avoid the ambiguity of values like 1.3.2015.
This is likely to be impossible, or hellishly complex, to do with a varchar field. While you can theoretically use regex functions in MySQL to match patterns, you are looking for a date range. Even if it were possible to build such a query somehow, it would be a gigantic pile of work and there is no way MySQL could optimize any aspect of it.
The normal, fast, and straightforward way to go about this is normalizing your table.
In your case, you would probably create a new table named "logs" and connect it to your existing table through an ID field that shows which parent record each log entry belongs to, as sketched below.
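A sketch of what that normalized table might look like (all names hypothetical):

CREATE TABLE logs (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    parent INT UNSIGNED NOT NULL,      -- ID of the row in the existing table
    log_date DATE NOT NULL,
    log_info VARCHAR(255) NOT NULL,
    INDEX idx_parent_date (parent, log_date)
);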
Querying for a certain date range for log entries belonging to a specific parent then becomes as easy as
SELECT log_date FROM logs WHERE parent = 155 AND log_date >= CURDATE()
It's painful to do at first (as you have to rebuild parts of your structure and likely, your app) and makes some everyday queries more complex, but cases like this become much easier and faster.

Is there an indexable way to store several bitfields in MySQL?

I have a MySQL table which needs to store several bitfields...
notification.id -- autonumber int
association.id -- BIT FIELD 1 -- stores one or more association ids (which are obtained from another table)
type.id -- BIT FIELD 2 -- stores one or more types that apply to this notification (again, obtained from another table)
notification.day_of_week -- BIT FIELD 3 -- stores one or more days of the week
notification.target -- where to send the notification -- data type is irrelevant, as we'll never index or sort on this field, but
will probably store an email address.
My users will be able to configure their notifications to trigger on one or more days, in one or more associations, for one or more types. I need a quick, indexable way to store this data.
Bit fields 1 and 2 can expand to have more values than they do presently. Currently 1 has values as high as 125, and 2 has values as high as 7, but both are expected to go higher.
Bit field 3 stores days of the week, and as such, will always have only 7 possible values.
I'll need to run a script frequently (every few minutes) that scans this table based on type, association, and day, to determine if a given notification should be sent. Queries need to be fast, and the simpler it is to add new data, the better. I'm not above using joins, subqueries, etc as needed, but I can't imagine these being faster.
One last requirement: if I have 1000 different notifications stored here, with 125 association possibilities, 7 types, and 7 days of the week, the number of rows gets too high for my taste if I use plain integers and store multiple copies of each row instead of bit fields, so bit fields seem like a requirement.
However, from what I've heard, if I wanted to select everything from a particular day of the week, say Tuesday (b0000100 in a bit field, perhaps), bit fields are not indexed such that I can do...
SELECT * FROM `mydb`.`mytable` WHERE `notification.day_of_week` & 4 = 4;
This, from my understanding, would not use an index at all.
Any suggestions on how I can do this, or something similar, in an indexable fashion?
(I work on a pretty standard LAMP stack, and I'm looking for specifics on how the MySQL indexing works on this or a similar alternative.)
Thanks!
There's no "good" way (that I know of) to accomplish what you want to.
Note that the BIT datatype is limited to a size of 64 bits.
For bits that can be statically defined, MySQL provides the SET datatype, which is in some ways the same as BIT, and in other ways it is different.
For days of the week, for example, you could define a column
dow SET('SUN','MON','TUE','WED','THU','FRI','SAT')
There's no built-in way (that I know of) to get the internal bit representation back out, but you can add 0 to the column, or cast it to unsigned, to get a decimal representation.
SELECT dow+0, CONVERT(dow,UNSIGNED), dow, ...

dow+0 | CONVERT(dow,UNSIGNED) | dow
------+-----------------------+-------------
    1 |                     1 | SUN
    2 |                     2 | MON
    3 |                     3 | SUN,MON
    4 |                     4 | TUE
    5 |                     5 | SUN,TUE
    6 |                     6 | MON,TUE
    7 |                     7 | SUN,MON,TUE
It is possible for MySQL to use a "covering index" to satisfy a query with a predicate on a SET column, when the SET column is the leading column in the index. (i.e. EXPLAIN shows 'Using where; Using index') But MySQL may be performing a full scan of the index, rather than doing a range scan. (And there may be differences between the MyISAM engine and the InnoDB engine.)
SELECT id FROM notification WHERE FIND_IN_SET('SUN',dow)
SELECT id FROM notification WHERE (dow+0) MOD 2 = 1
BUT... this usage is non-standard, and can't really be recommended. For one thing, this behavior is not guaranteed, and MySQL may change this behavior in a future release.
I've done a bit more research on this, and realized there's no way to get the indexing to work as I outlined above. So, I've created an auxiliary table (somewhat like the WordPress meta table format) which stores entries for day of week, etc. I'll just join these tables as needed. Fortunately, I don't anticipate having more than ~10,000 entries at present, so it should join quickly enough.
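A sketch of the kind of auxiliary table I mean (all names hypothetical):

CREATE TABLE notification_day (
    notification_id INT UNSIGNED NOT NULL,
    day_of_week TINYINT UNSIGNED NOT NULL,   -- 1 = Sunday ... 7 = Saturday
    PRIMARY KEY (notification_id, day_of_week),
    INDEX idx_day (day_of_week)
);

-- Finding Tuesday's notifications then uses the idx_day index:
SELECT n.*
FROM notification n
JOIN notification_day d ON d.notification_id = n.id
WHERE d.day_of_week = 3;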
I'm still interested in a better answer if anyone has one!