I was wondering if there is any way to solve this.
My table has a column of type DATE that is incremented by one day, daily, until the end of the respective month. At the beginning of a new month a new row has to be generated, and the updates start again until the end of that month, and so on.
Here's a way to think about the problem in MySQL's dialect of SQL.
First, you need a function that maps a datestamp to a value that's unique for each month. That is LAST_DAY(datestamp). It generates DATE values like 2017-09-30 from arbitrary inputs.
Next, you can exploit MySQL's INSERT ... ON DUPLICATE KEY UPDATE capability. You will create a table months with, say, these columns:
month_ending DATE
category VARCHAR(20)
sum_of_input INT
Then you make (month_ending, category) into a unique compound index.
Then you do something like this
INSERT INTO months /* warning! not debugged! */
(month_ending, category, sum_of_input)
VALUES (LAST_DAY(?date), ?category, ?value)
ON DUPLICATE KEY UPDATE
sum_of_input = sum_of_input + ?value
However, this has the hallmarks of a big, hard-to-debug pain in the neck. (Note that ON DUPLICATE KEY UPDATE takes no table name or WHERE clause; MySQL already knows which row collided.) It may make more sense to use features inside your ETL system to do this summarizing work.
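The accumulate-per-month pattern above can be sketched in Python with sqlite3, whose ON CONFLICT clause plays the role of MySQL's ON DUPLICATE KEY UPDATE. The table and column names follow the answer; the last_day helper is a stand-in for MySQL's LAST_DAY, which SQLite lacks.

```python
import calendar
import sqlite3
from datetime import date

def last_day(d: date) -> str:
    """Stand-in for MySQL's LAST_DAY(): the last date of d's month."""
    return date(d.year, d.month, calendar.monthrange(d.year, d.month)[1]).isoformat()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE months (
    month_ending TEXT,
    category     TEXT,
    sum_of_input INTEGER,
    UNIQUE (month_ending, category))""")

def add_value(d: date, category: str, value: int) -> None:
    # One row per (month, category); repeat inserts accumulate into sum_of_input.
    conn.execute(
        """INSERT INTO months (month_ending, category, sum_of_input)
           VALUES (?, ?, ?)
           ON CONFLICT (month_ending, category)
           DO UPDATE SET sum_of_input = sum_of_input + excluded.sum_of_input""",
        (last_day(d), category, value))

add_value(date(2017, 9, 5), "clicks", 10)
add_value(date(2017, 9, 20), "clicks", 7)   # same month: accumulates
add_value(date(2017, 10, 1), "clicks", 3)   # new month: new row
```

The unique index does the heavy lifting: the first insert for a month creates the row, and every later insert for that month collides and is turned into an in-place addition.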
In my database, a table has a column with an integer that needs to increment every day, counting the days that have passed from a date.
Is there any way I can do this?
I know Auto Increment exists, but I don't know if it fits for this occasion.
I found a solution using mysql events, but now I'm having trouble with the syntax.
phpMyAdmin gives me a form to complete.
https://imgur.com/Lhru1ZJ
I'm having trouble because I don't know what information to put into it.
The best way to do this is to compute the elapsed days in a query, not to update the table every day.
For example, suppose you have a table with columns id and start_date.
This query gives you the elapsed days.
SELECT id,
DATEDIFF(CURDATE(), start_date) elapsed
FROM tbl
Doing it this way is better than changing the table every day, for several reasons.
It always works even if the event doesn't fire for some reason.
Updating an entire table can get more and more expensive as the table grows.
The computational cost of computing the elapsed days is far less than the computational cost of updating every day.
If you happen to use incremental backups, updating the whole table defeats that.
It's a good practice to avoid representing the same information multiple times in the database.
You can also add a generated (virtual) column to the table or use a VIEW.
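A minimal sketch of the compute-on-read approach, using Python's sqlite3 (table and column names follow the answer; julianday stands in for MySQL's DATEDIFF, and a fixed "today" is used so the result is reproducible):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, start_date TEXT)")
conn.execute("INSERT INTO tbl VALUES (1, '2020-01-01'), (2, '2020-01-11')")

# Elapsed days are derived at query time; nothing in the table is ever updated.
rows = conn.execute(
    """SELECT id,
              CAST(julianday('2020-01-31') - julianday(start_date) AS INTEGER) AS elapsed
       FROM tbl""").fetchall()
```

In real use you would substitute the current date for the literal '2020-01-31' (in MySQL, CURDATE()); the point is that the counter never has to be stored or maintained.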
You should have a look at the Event Scheduler in MySQL; you could use it to run a job that increments your values once a day.
MySQL Event Scheduler
The following example creates a recurring event that updates a row in a table.
First, create a new table named counter.
CREATE TABLE counter (
id INT PRIMARY KEY AUTO_INCREMENT,
counts INT NOT NULL,
created_at DATETIME NOT NULL
);
Second, create an event using the CREATE EVENT statement:
CREATE EVENT IF NOT EXISTS count_event
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_TIMESTAMP
DO
UPDATE counter SET counts = counts + YOUR_COUNT WHERE id = YOUR_ID;
Replace the schedule interval, YOUR_COUNT, and YOUR_ID with real values.
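Since a scheduled event is just an UPDATE that fires once per interval, its effect can be sketched in Python with sqlite3 (the loop stands in for the scheduler; table and names follow the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE counter (
    id         INTEGER PRIMARY KEY,
    counts     INTEGER NOT NULL,
    created_at TEXT NOT NULL)""")
conn.execute("INSERT INTO counter VALUES (1, 0, '2020-01-01')")

def run_daily_event(increment: int = 1, row_id: int = 1) -> None:
    # Body of the event: what MySQL's scheduler would run once per interval.
    conn.execute("UPDATE counter SET counts = counts + ? WHERE id = ?",
                 (increment, row_id))

for _ in range(3):      # simulate three "days"
    run_daily_event()
```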
As an example, let's say I want to save the lines of code I've written on specific days of different months in a MySQL database using Hibernate. On the 5th of January, I write 100 lines of code. I have not previously written any lines of code on this day, so it simply inserts the data.
A year passes and it's again the 5th of January. I write 50 lines of code this day, and I try to insert it into the database. The ID of day 5 and month 1 already exists, so it can't insert the data. In this case I would like it to simply add the lines of code to the already existing row, so it now says:
DAY  MONTH  LOC
5    1      150
What I'm currently doing is that every time I want to insert new data, I'm first making a SELECT query to check if the row already exists. If it does I create an UPDATE statement, otherwise I just create an INSERT statement. I'm working with a lot of data and have noticed that the operations involving my database are starting to take a long time. Therefore, I was wondering if there is a better, more efficient way to do this? Help is much appreciated. Thank you.
You seem to be looking for the insert ... on duplicate key update syntax. This would look like:
insert into mytable(day, month, loc)
values(:day, :month, :loc)
on duplicate key update loc = loc + values(loc)
:day, :month and :loc represent the parameters that you give for insert.
For this to work, you need a unique constraint on (day, month) (or make that tuple of columns the primary key of your table).
When an insert occurs that would violate the unique constraint, MySQL goes to the on duplicate key clause, where we add the new loc value to the one already there in the existing row.
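The same pattern, sketched with Python's sqlite3 (its ON CONFLICT clause mirrors MySQL's ON DUPLICATE KEY UPDATE; table and column names follow the question). One round trip replaces the SELECT-then-INSERT-or-UPDATE dance:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE loc_log (
    day INTEGER, month INTEGER, loc INTEGER,
    PRIMARY KEY (day, month))""")

def record_loc(day: int, month: int, loc: int) -> None:
    # Insert a new (day, month) row, or add loc to the existing one.
    conn.execute(
        """INSERT INTO loc_log (day, month, loc) VALUES (?, ?, ?)
           ON CONFLICT (day, month) DO UPDATE SET loc = loc + excluded.loc""",
        (day, month, loc))

record_loc(5, 1, 100)   # first January 5th: plain insert
record_loc(5, 1, 50)    # next year's January 5th: accumulates to 150
```

Because the database resolves the collision itself, there is no race window between the existence check and the write, which also makes this safe under concurrent inserts.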
I have a table with a column containing unix time. I wish to create a new column that contains the day of the week for this time. For example, 1436160600 would be a Monday in this column.
I have created a new column, entitled "day_of_week"
alter table master add column day_of_week varchar(20);
I now wish to update this new column with the appropriate values.
I found the MySQL Unixtimestamp() function (http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_unix-timestamp)
I then attempted the following
update master set day_of_week = _sent_time_stamp(from_unixtime(unix_timestamp, %W));
where _sent_time_stamp is the column containing the Unix time values
But this results in an Error 1064.
Can anyone advise?
Solution: first convert the epoch to a datetime:
alter table master add column test_date datetime ;
update master set test_date = from_unixtime(_sent_time_stamp) ;
Then convert the datetime to a day of week using the DAYNAME function:
alter table master add column test_day varchar(20) ;
update master set test_day = dayname(test_date) ;
I know this post is old, but the accepted answer is sadly wasteful, and I hope that future people seeking this answer may be more enlightened.
No need to add a new column to the table just for some temporary value. To achieve what you requested, you can simply do this:
UPDATE master
SET test_day = dayname(from_unixtime(_sent_time_stamp)) ;
However, even the goal is wasteful in that we're simply storing two representations of the same data. What you can do instead is create a view:
CREATE VIEW master_vw AS
(SELECT mstr.*, DAYNAME(FROM_UNIXTIME(mstr._sent_time_stamp)) AS test_day
FROM master mstr) ;
Now, you can SELECT from this view anytime you like, and the value of test_day will always be in sync with the value of _sent_time_stamp. And no new column to maintain and whatnot.
There is a use case for actually storing the test_day column - execution of the view will take a minuscule amount of additional processing versus selecting from a table. And you cannot index the virtual test_day column like you could in a table. So if you have millions of rows and you need to quickly get the ones that are (say) 'Saturday', then perhaps the table approach is more ideal.
But in cases where the column is just a convenience, or where it is simply a different representation of data that already exists, you'll do well to consider the View approach.
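The conversion itself is easy to check in a few lines of Python; the question's sample timestamp, 1436160600, falls on 2015-07-06 UTC:

```python
from datetime import datetime, timezone

def day_of_week(epoch: int) -> str:
    # Equivalent of DAYNAME(FROM_UNIXTIME(epoch)), evaluated in UTC.
    # (MySQL's FROM_UNIXTIME uses the session time zone, so results can
    # differ near midnight if the server is not on UTC.)
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%A")
```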
I archive last year's table and create a new table at the beginning of each year. I'd like to find a way to have one multi-year table so I don't have to manually change anything each year. Columns are: row (unique), date (primary), col1, col2, col3... Users will type data (every column) into a form. I wanted to have a 'years' column that would be populated from YEAR(date), with a composite primary key (row, years). I also need primary key (row, year(date)). So each year we could start with row 1 and a new year. I even looked at and tried insert/update triggers, but I don't think that's the answer. What do you think? Is this too vague?
I did something similar in one of my Firebird databases (you didn't say which db you are using). First of all, you should have a primary key which is generated automatically - this will make life simpler. Secondly, you need one generator per year - or a table which has one row per year.
In this model, you have a table with two fields: year (primary key) and 'curvalue'. Every year you insert a new tuple such as (2012,0) or (2013, 0). You query this table to find the current value of the 'curvalue' field for the given year, then increment the curvalue field and use the incremented value in the tuple which you are inserting.
Using generators guarantees that you will always receive a unique number; using a table as described above could cause problems if one process starts the read-increment-write sequence and another arrives before the first commits.
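The counter-table variant can be sketched like this (Python with sqlite3; names are illustrative). The increment and the read-back are kept inside one transaction, which is exactly the race the answer warns about with plain tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE year_counter (year INTEGER PRIMARY KEY, curvalue INTEGER NOT NULL)")
conn.executemany("INSERT INTO year_counter VALUES (?, 0)", [(2012,), (2013,)])

def next_row_number(year: int) -> int:
    # Increment first, then read back, all inside one transaction, so two
    # concurrent callers can never be handed the same row number.
    with conn:
        conn.execute(
            "UPDATE year_counter SET curvalue = curvalue + 1 WHERE year = ?", (year,))
        return conn.execute(
            "SELECT curvalue FROM year_counter WHERE year = ?", (year,)).fetchone()[0]
```

Each year gets its own counter row, so numbering restarts at 1 when a (year, 0) tuple is inserted for the new year.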
I have a large table containing hourly statistical data broken down across a number of dimensions. It's now large enough that I need to start aggregating the data to make queries faster. The table looks something like:
customer INT
campaign INT
start_time TIMESTAMP
end_time TIMESTAMP
time_period ENUM('hour', 'day', 'week')
clicks INT
I was thinking that I could, for example, insert a row into the table where campaign is null, and the clicks value would be the sum of all clicks for that customer and time period. Similarly, I could set the time period to "day" and this would be the sum of all of the hours in that day.
I'm sure this is a fairly common thing to do, so I'm wondering what the best way to achieve this in MySql? I'm assuming an INSERT INTO combined with a SELECT statement (like with a materialized view) - however since new data is constantly being added to this table, how do I avoid re-calculating aggregate data that I've previously calculated?
I've done something similar, and here are the problems I had to deal with:
You can use FLOOR(start_time/86400)*86400 in the GROUP BY to bucket all entries from the same day together; rounding would push afternoon entries into the next day. (For weeks it's almost the same.)
The SQL will look like:
insert into the_table
select
customer,
NULL,
floor(start_time/86400)*86400,
floor(start_time/86400)*86400 + 86400,
'day',
sum(clicks)
from the_table
where time_period = 'hour' and start_time between <A> and <B>
group by customer, floor(start_time/86400)*86400;
delete from the_table
where time_period = 'hour' and start_time between <A> and <B>;
If you are going to insert a summary from a table back into itself, MySQL will use a temporary table (meaning part of the table's data is copied aside and then dropped, for each statement). So you must be very careful with the indexes and the size of the data returned by the inner select.
When you are constantly inserting and deleting rows, you will get fragmentation issues sooner or later, and they will slow you down dramatically. The solution is to use partitioning and to drop old partitions from time to time. Alternatively you can run an OPTIMIZE TABLE statement, but it will stop your work for a relatively long time (maybe minutes).
To avoid a mess with duplicate data, you may want to clone the table for each aggregation period (hour_table, day_table, ...).
If you're trying to make the table smaller, you'll be deleting the detailed rows after you make the summary row, right? Transactions are your friend. Start one, compute the rollup, insert the rollup, delete the detailed rows, end the transaction.
If you happen to add more rows for an older time period (who does that??), you can run the rollup again - it will combine your previous rollup entry with your extra data into a new, more powerful, rollup entry.
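The rollup-then-delete step inside a single transaction can be sketched with Python's sqlite3 (epoch-second day bucketing as in the answer; the table layout is illustrative and start_time is stored as an integer epoch):

```python
import sqlite3

DAY = 86400
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE the_table (
    customer INTEGER, campaign INTEGER,
    start_time INTEGER, end_time INTEGER,
    time_period TEXT, clicks INTEGER)""")

# Two hourly detail rows for customer 1 on the same day.
conn.executemany("INSERT INTO the_table VALUES (?,?,?,?,?,?)",
                 [(1, 7, 3600, 7200, 'hour', 10),
                  (1, 8, 7200, 10800, 'hour', 5)])

def roll_up_day(lo: int, hi: int) -> None:
    # One transaction: insert the summary row, then delete the detail rows,
    # so a crash can never lose or double-count clicks.
    with conn:
        conn.execute(
            """INSERT INTO the_table
               SELECT customer, NULL,
                      (start_time / ?) * ?,
                      (start_time / ?) * ? + ?,
                      'day', SUM(clicks)
               FROM the_table
               WHERE time_period = 'hour' AND start_time BETWEEN ? AND ?
               GROUP BY customer, start_time / ?""",
            (DAY, DAY, DAY, DAY, DAY, lo, hi, DAY))
        conn.execute(
            "DELETE FROM the_table WHERE time_period = 'hour' "
            "AND start_time BETWEEN ? AND ?",
            (lo, hi))

roll_up_day(0, DAY - 1)
```

After the rollup, only the 'day' row remains and its clicks equal the sum of the deleted hourly rows; re-running the rollup over a window that already holds a summary row would fold it and any late-arriving hourly rows into a fresh summary, as the last paragraph describes.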