How to create summary tables efficiently - MySQL

Over the course of a month, a process inserts a large number of rows (~1M) into some database tables.
This happens daily and the whole process takes ~40 minutes. That is fine.
I created some "summary tables" from these inserts so as to query the data fast. This works fine.
Problem: I keep inserting data into the summary tables, so the time to build the summary ("cache") tables matches the process that inserts the actual data, and this is good. But if data inserted on previous days has changed (due to any updates), I need to "recalculate" those days. To handle that, instead of creating only today's summary data each day, I would have to change my process to recreate the summary data from the beginning of each month, which would increase my running time substantially.
Is there a standard way to deal with this problem?

We had a similar problem in our system, which we solved by generating a summary table holding each day's summary.
Whenever an UPDATE/INSERT changes the base tables, the summary table is updated. This of course slows down those operations, but it keeps the summary table completely up to date.
This can be done using TRIGGERs, but since all the operations are in one place, we just do it manually in a TRANSACTION.
One advantage of this approach is that there is no need to run a cron job to refresh/create the summary table.
I understand that this may not be applicable/feasible for your situation.
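For illustration, a minimal sketch of the transaction-based approach described above, assuming a hypothetical orders base table and a daily_summary table with a unique key on summary_date (all names and columns here are made up):

START TRANSACTION;

-- the normal write to the base table
INSERT INTO orders (order_date, amount) VALUES (CURDATE(), 19.99);

-- keep the per-day summary row in sync inside the same transaction
INSERT INTO daily_summary (summary_date, order_count, total_amount)
VALUES (CURDATE(), 1, 19.99)
ON DUPLICATE KEY UPDATE
  order_count  = order_count + 1,
  total_amount = total_amount + VALUES(total_amount);

COMMIT;

An UPDATE to a base row from a previous day would be handled the same way: adjust the affected day's summary row in the same transaction, so no recalculation from the start of the month is needed.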

Related

How to achieve zero downtime in ETL

I have an ETL process which takes data from a transactional DB and, after processing, stores the data in another DB. While storing the data we truncate the old data and insert the new data, because that performs better: an UPDATE takes a lot longer than a TRUNCATE plus INSERT. As a result, for a short time (2-3 minutes) queries see counts of 0 or wrong data. We run the ETL every 8 hours.
So how can we avoid this problem? How can we achieve zero downtime?
One way we used in the past was to prepare the production data in a table named temp. Then, when finished (and checked; that was the lengthy part of our process), drop prod and rename temp to prod.
This takes almost no time, and the process was successful even when other users were locking the table.
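Roughly, that swap could look like this (prod and temp are placeholder names; a single RENAME TABLE can swap both names atomically, which avoids the brief moment where no prod table exists between a DROP and a RENAME):

-- build and verify the new data off to the side
CREATE TABLE temp LIKE prod;
-- ... load and check temp here ...

-- swap: readers always see a complete table named prod
RENAME TABLE prod TO prod_old, temp TO prod;
DROP TABLE prod_old;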

MySQL "pileup" when importing rows

I have the following cron process running every hour to update global game stats:
Create temporary table
For each statistic, insert rows into the temporary table (stat key, user, score, rank)
Truncate main stats table
Copy data from temporary table to main table
The last step causes massive backlog in queries. Looking at SHOW PROCESSLIST I see a bunch of updating-status queries that are stuck until the copy completes (which may take up to a minute).
However, I did notice that it's not like consecutive query IDs pile up; many queries complete just fine. So it almost seems like a "thread" gets stuck or something. Also of note: the stuck updates have nothing in common with the ongoing copy (different tables, etc.).
So:
Can I have cron connect to MySQL on a dedicated "thread" such that its disk activity (or whatever it is) doesn't lock other updates, OR
Am I misinterpreting what's going on, and if so how can I find out what the actual case is?
Let me know if you need any more info.
MySQL threads are not perfectly named. If you're a Java dev, for example, you might make some untrue assumptions about MySQL threads based on your Java knowledge.
For some reason that's hard to diagnose from a distance, your copy step is blocking some queries from completing. If you're curious about which ones, try running
SHOW FULL PROCESSLIST
and try to make sense of the result.
In the meantime, you might consider a slightly different approach to refreshing these hourly stats.
create a new, non-temporary table, calling it something like stats_11 for the 11am update. If a table with that name already exists, drop the old one first.
populate that table as needed.
add the indexes it needs. Sometimes populating the table is faster if the indexes aren't in place while you're doing it.
create or replace view stats as select * from stats_11
Next hour, do the same with stats_12. The idea is to have your stats view pointing to a valid stats table almost always.
This should reduce your exposure time to the stats-table building operation.
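A minimal sketch of one such hourly cycle (the 11am run), with made-up table and column names:

DROP TABLE IF EXISTS stats_11;
CREATE TABLE stats_11 (stat_key VARCHAR(64), user_id INT, score INT, rank_no INT);

-- populate first, then add indexes (often faster than inserting into an already-indexed table)
INSERT INTO stats_11 (stat_key, user_id, score, rank_no)
SELECT stat_key, user_id, score, rank_no FROM source_stats;
ALTER TABLE stats_11 ADD INDEX idx_stat_user (stat_key, user_id);

-- point readers at the freshly built table
CREATE OR REPLACE VIEW stats AS SELECT * FROM stats_11;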
If the task is to completely rebuild the table, this is the best:
CREATE TABLE new_stats LIKE stats;
... fill up new_stats by whatever means ...
RENAME TABLE stats TO old_stats, new_stats TO stats;
DROP TABLE old_stats;
There is zero interference because the stats table is always available and always has a complete set of rows. (OK, the RENAME does take a minuscule amount of time.)
No VIEWs, no TEMPORARY table, no copying the data over, no need for 24 tables.
You could consider doing the task "continually", rather than hourly. This becomes especially beneficial if the table gets so big that the hourly cron job takes more than one hour!

Inserting 1 million records in mysql

I have two tables, and into both tables I get 1 million records. I am using a cron job every night to insert the records. In the first table I truncate the table first and then insert the records; in the second table I update and insert records according to the primary key. I am using MySQL as my database. My problem is that I need to do this task every day, but I am unable to insert all the data. What could be a possible solution to this problem?
The important thing is to turn off all the actions and checks MySQL wants to perform while inserting the data, like autocommit, index maintenance, etc.
https://dev.mysql.com/doc/refman/5.7/en/optimizing-innodb-bulk-data-loading.html
If you do not do this, MySQL does a lot of work after every record added, and it adds up as the process proceeds, resulting in very slow processing and importing towards the end; the job may not complete in one day.
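As a rough sketch of what that page suggests for InnoDB (session-level settings; re-enable the checks and commit once the load is done):

SET autocommit = 0;          -- commit once at the end instead of after every row
SET unique_checks = 0;       -- skip per-row unique checks during the load
SET foreign_key_checks = 0;  -- skip foreign key checks during the load

-- ... run the bulk INSERTs here ...

COMMIT;
SET unique_checks = 1;
SET foreign_key_checks = 1;
SET autocommit = 1;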
If you must use MySQL: for the first table, disable the indexes, do the inserts, then enable the indexes again. This will work faster.
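For the truncate-and-reload table, that could look roughly like this (first_table is a placeholder; note that ALTER TABLE ... DISABLE KEYS only affects non-unique indexes and mainly helps MyISAM tables, while for InnoDB the usual equivalent is to drop the secondary indexes and re-create them after the load):

TRUNCATE TABLE first_table;
ALTER TABLE first_table DISABLE KEYS;
-- ... bulk INSERT the rows here ...
ALTER TABLE first_table ENABLE KEYS;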
Alternatively, MongoDB would be faster, and Redis is very fast.

SSAS Tabular Refreshing only new data

I have a tabular cube which takes a long time to process. My idea is to process only new data every hour and do a full process during the night. Is there a way to do that with SSIS and a SQL Job?
Assuming your "new rows" are inserts to your fact table rather than updates or deletes you can do a ProcessAdd operation. ProcessAdd will take a SQL query you provide that returns the new rows and add them to your table in SSAS Tabular.
There are several ways to automate this, all of which could be run from SSIS. This article walks through the options well.
If you have updates and deletes, then you need to partition your table inside SSAS. For example, partition by week, then only reprocess (ProcessData) the partitions where rows have been inserted/updated/deleted.

Delete data from mysql innodb tables after one month is passed

Currently I am using cron for this. I thought perhaps it is possible to implement some procedure that will remove all data from the database that is older than one month, but I am not sure that this is the best way.
The problem is that we have many servers with many cron processes, controlled by a very small number of staff, and we need to keep this clear and easy to manage; that's why I don't want to have such a cron process.
The data in the table I want to delete is statistics; a huge amount of this data is inserted every day, and if it is not deleted the database will become unbelievably huge (about ~500 MB every day, which for us is quite a big amount: 500 MB * 365 days is 182.5 GB per year).
Is it possible to delete data using some procedure in MySQL (perhaps after a new row is added), and is that a good idea?
If you're intending on moving away from cron jobs, you could always create an event that runs at a scheduled frequency.
Whatever you do, it's a very bad idea to delete data every time a new row is added, as it will slow down your inserts and it's more likely to fragment your tables.
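A minimal example of such a scheduled event (table and column names are placeholders, and the event scheduler must be enabled, e.g. with SET GLOBAL event_scheduler = ON):

CREATE EVENT purge_old_statistics
  ON SCHEDULE EVERY 1 DAY
  DO
    DELETE FROM statistics_log
    WHERE created_at < NOW() - INTERVAL 1 MONTH;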