Prevent table corruption with historical data - MariaDB / MySQL

I am working on a MariaDB database in which there is a table called event that can easily accumulate 6-8 million records per month. The rows themselves do not carry much information, since they are simple records of events that occur, composed of 3 columns.
I would therefore like to prevent possible corruption of the table. My initial idea is, at the end of each month, to create a new historical table named in the format event_month_year and dump the data for the corresponding month into it. Example:
event_jan_2022
event_feb_2022
event_mar_2022
This way, if I want to access the information for the current month I simply query the event table, but if I want information from previous dates I query the historical table by its name in the format just described (event_month_year).
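For illustration, a minimal sketch of that monthly rollover, assuming a hypothetical created_at DATETIME column on event:

-- Hypothetical rollover at the end of January 2022; the created_at
-- column is an assumption, not part of the original description.
CREATE TABLE event_jan_2022 LIKE event;
INSERT INTO event_jan_2022
  SELECT * FROM event
  WHERE created_at >= '2022-01-01' AND created_at < '2022-02-01';
DELETE FROM event
  WHERE created_at >= '2022-01-01' AND created_at < '2022-02-01';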
I don't know whether this approach is the most correct, optimal, or elegant. Surely there are better ways to do it, so I'd like to hear opinions.
Thanks in advance.

Related

best approach to exchanging data dumps between organizations

I am working on a project where I will receive student data dumps once a month. The data will be imported into my system. The initial import will be around 7k records; after that, I don't anticipate more than a few hundred a month. However, there will also be existing records that will be updated as the student changes grades, etc.
I am trying to determine the best way to keep track of what has been received, imported, and updated over time.
I was thinking of setting up a hosted MySQL database with a script that imports the SFTP dump into a table that includes a creation_date and a modification_date field. My thought was that the person performing the extraction could connect to the MySQL db and run a query on the imported table each month to get the differences before the next extraction.
Another thought I had was to create a new received table every month for each data dump, and then perform the differences query against it.
Note: The importing system is legacy and will only accept imports using a utility and unique CSV-type files. So that probably rules out options like XML.
Thank you in advance for any advice.
I'm going to assume you're tracking students' grades in a course over time.
I would recommend a two table approach:
Table 1: transaction level data. Add-only. New information is simply appended on. Sammy got a 75 on this week's quiz, Beth did 5 points extra credit, etc. Each row is a single transaction. Presumably it has the student's name/id, the value being added, maybe the max possible value or some weighting factor, and of course the timestamp added.
All of this just keeps adding to a never-ending (in theory) table.
Table 2: summary table, rebuilt at some interval. This table does a simple aggregation on the first table, processing the transactional scores into a global one. Maybe it's a simple sum, maybe it's a weighted average, maybe you have something more complex in mind.
This table has one row per student (per course?). You probably want it rebuilt nightly. If you're lazy, you just DROP/CREATE/INSERT. If you're worried about data loss, you just INSERT and add a timestamp so you can keep snapshots going back.
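A minimal sketch of that two-table layout, with assumed table and column names and a plain SUM as the aggregation:

-- Table 1: add-only transaction log (all names are assumptions).
CREATE TABLE grade_transactions (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  student_id INT NOT NULL,
  points DECIMAL(6,2) NOT NULL,
  max_points DECIMAL(6,2),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Table 2: summary, rebuilt nightly; the lazy DROP/CREATE/INSERT variant.
DROP TABLE IF EXISTS grade_summary;
CREATE TABLE grade_summary AS
  SELECT student_id, SUM(points) AS total_points, NOW() AS built_at
  FROM grade_transactions
  GROUP BY student_id;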

Should a session table be cleared off from the records after a user logs out?

I am using a MySQL table to store a session record for the currently logged-in user. Once the user logs off, I update a few fields in the same record and flag it as revoked so that it cannot be used again. So for every login a new record is created. This serves my purpose, but it turns out that the table is going to grow huge.
What should be the standard approach for storing sessions? Should the revoked ones be stored in a separate table, should they be deleted, or should they be left in the same table?
I am considering leaving the data in the same session table. When querying for a particular record, I query on two fields: idPeople (not unique) and revoked (0 or 1), for example SELECT * FROM session WHERE idPeople = 'someValue' AND revoked = 0, and then update the record if needed while the user is logged in or logging out. Will the huge size of the table affect this, or will MySQL handle it? And what other ramifications am I unable to see?
First, it may be a good idea to add a unique field to your table (e.g. SESSION_ID, which could be a running auto-increment number), define this field as a unique ID, and use it to quickly find the record to be updated (i.e. set revoked = 1).
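For example, assuming the table is named session and has no primary key yet:

-- The table name "session" is taken from the question; the assumption
-- is that no primary key exists yet, so this ALTER is allowed.
ALTER TABLE session
  ADD COLUMN session_id BIGINT AUTO_INCREMENT PRIMARY KEY FIRST;

-- Revoking then touches exactly one row via its unique id:
UPDATE session SET revoked = 1 WHERE session_id = 12345;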
Second, this type of table always triggers the question you are asking, and the best answer can only be given after you assess and answer some preliminary questions, for instance:
When you wish to check the activities of a user, how far into the past does it make sense to go? One month? One year?
What is the longest period for which you may wish to keep this information available (even via non-routine queries)?
What types of questions (queries) do you expect to run against this table?
Once you answer those questions, you can consider the following options:
Have a routine process that runs once a day (at midnight or any other time your system can afford) and deletes rows whose timestamp is older than, say, one month (or any other period suiting your needs), OR
Same as above, but first copy those records to a "history" table (see the sketch below), OR
Change the structure of your table to a more efficient one by adding some fields (as suggested above) and indices that serve your SELECT needs well.
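A minimal sketch of the second option, assuming the table names session and session_history and a login_time column (none of which are confirmed by the question):

-- Run once a day, e.g. from cron or a MySQL EVENT.
-- session_history is assumed to mirror session's structure,
-- e.g. created once via: CREATE TABLE session_history LIKE session;
INSERT INTO session_history
  SELECT * FROM session
  WHERE login_time < NOW() - INTERVAL 1 MONTH;

DELETE FROM session
  WHERE login_time < NOW() - INTERVAL 1 MONTH;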

MySQL Partitioning, Delete old data from multiple related tables

I am new to MySQL partitioning, therefore any example will be appreciated.
I am trying to create a sort of ageing mechanism for data that is distributed between several MyISAM tables.
My question will actually include several sub-questions.
The relevant tables are:
The first table contains raw data arriving at high frequency (each record carries an auto-incremented id).
The second table contains processed results; there is one result record per raw data record (the result record stores the source id, i.e. the auto-incremented field of the raw data record).
Questions:
I need to be able to partition the raw data table and the result data table in the same way, so that each partition in both holds only 10 weeks of data (each raw data record contains a unixtimestamp field). How do I do this? Can someone write a small example case for two such tables?
I want to be able to change the 10 weeks constraint on the fly.
I want that whenever the current partition fills up or a new partition is created, the previous (10-weeks-old) partition is deleted automatically.
I don't want the auto-increment id integer to overflow. As far as I understand, the ids are unique per partition only, so if I am not wrong the auto-increment id will start from zero for the next partition? But if the previous partition still exists, will I have duplicate ids, and how do I know to reference only the latest id when I present a result record?
I want to load raw data using LOAD DATA INFILE ... instead of multiple inserts; is MySQL's partitioning functionality affected?
And the last question: would you suggest some other approach for implementing an ageing mechanism? (I am writing a Java product that processes around 1 GB of raw data per day and stores the results in MySQL.)
It's hard to give a real answer on this question since it depends on your data. But let me give you some things to think about.
I assume we're talking about some kind of logs with recent data (so not spanning multiple years). You can partition by range. You could add one field to your table with the year/week number (i.e. 201201, 201202, etc.). If this question is related to your question about importing into multiple tables, you can easily do this in that import script.
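A minimal sketch of such a table, with assumed column names and 10-week ranges:

-- Assumed schema; yearweek holds values like 201201, 201202, ...
-- Every unique key (including the PK) must contain the partitioning column.
CREATE TABLE raw_data (
  id BIGINT NOT NULL AUTO_INCREMENT,
  yearweek INT NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id, yearweek)
) ENGINE=MyISAM
PARTITION BY RANGE (yearweek) (
  PARTITION p0 VALUES LESS THAN (201211),   -- weeks 01-10 of 2012
  PARTITION p1 VALUES LESS THAN (201221),   -- weeks 11-20 of 2012
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
-- The result table would repeat the same PARTITION BY RANGE clause,
-- carrying the raw record's id plus the same yearweek column.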
"On the fly" as in repartitioning your data on the fly (70GB?)? I would not recommend it. But you could do it if you had the week number in there. If you later want to change it to 12 days, you could add a column for the date and partition by that.
Well, it won't be deleted automatically, but a cron job can handle that, right? Just check how many partitions there are, and if there are 3(?), delete the first one.
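For example (the partition name p0 comes from the sketch above):

-- Dropping a partition is near-instant compared to a bulk DELETE.
ALTER TABLE raw_data DROP PARTITION p0;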
The partition needs to have a primary index on the field that you partition by (if you want to use auto-increment). Therefore you can never fully rely on the auto-increment id alone. I don't see a way around this.
I'm not sure what you mean.
If your data is just some logs in chronological order, then you might just use separate tables for each period. Then, before you start the new period (at 00:00), check the last id of the last table, create a new table, and set its auto-increment to that value + 1. Your import then decides when a new period begins, so it can easily be changed. The import script can use a small table in which it stores the next period.
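A sketch of that rollover, with a hypothetical logs_<year>w<week> naming scheme:

-- At 00:00 before the new period starts (all names are assumptions):
SELECT MAX(id) FROM logs_2012w09;                 -- suppose this returns 123456
CREATE TABLE logs_2012w10 LIKE logs_2012w09;
ALTER TABLE logs_2012w10 AUTO_INCREMENT = 123457; -- last id + 1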
LOAD DATA is really quite fast. I would just have two steps (in no particular order): LOAD DATA, and then DELETE .. WHERE the date is older than 10 weeks. Auto-increment will go on for as long as the datatype you're using allows. If you wanted to be super careful you could push it back to zero periodically.
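For example (the file path and the ts column are assumptions; ts stands in for the unixtimestamp field mentioned in the question):

LOAD DATA INFILE '/tmp/raw_batch.csv'
  INTO TABLE raw_data
  FIELDS TERMINATED BY ',';

DELETE FROM raw_data
  WHERE ts < UNIX_TIMESTAMP(NOW() - INTERVAL 10 WEEK);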
Once the data is in the 'raw' table, run your routine to create the 'processed' table. We use a very similar process where I work. We keep a separate table that has 'write' and 'parse' pointers to all of our 'raw' tables. As new data comes in and gets parsed, the appropriate row pointers get set. If the 'raw' table gets truncated you can reset the 'write' pointer but leave the 'parse' pointer. (We store the offset in another table when this happens, just to be sure.)
And if I may recommend: creating an index on each of the related columns can also enhance the performance of deleting old data from multiple related tables, since the comparison is then made on index numbers rather than strings.
I wonder whether your tables are sorted or not.

MySQL Database Design Questions

I am currently working on a web service that stores and displays currency data.
I have two MySQL tables, CurrencyTable and CurrencyValueTable.
The CurrencyTable holds the names of the currencies as well as their description and so forth, like so:
CREATE TABLE CurrencyTable ( name VARCHAR(20), description TEXT, .... );
The CurrencyValueTable holds the values of the currencies during the day - a new value is inserted every 2 minutes when the market is open. The table looks like this:
CREATE TABLE CurrencyValueTable ( currency_name VARCHAR(20), value FLOAT, `datetime` DATETIME, ....);
I have two questions regarding this design:
1) I have more than 200 currencies. Is it better to have a separate CurrencyValueTable for each currency or hold them all in one table?
2) I need to be able to show the current (latest) value of the currency. Is it better to just add such a field to the CurrencyTable and update it every two minutes, or is it better to use a statement like:
SELECT value FROM CurrencyValueTable ORDER BY `datetime` DESC LIMIT 1
The second option seems slower... I am leaning towards the first one (which is also easier to implement).
Any input would be greatly appreciated!!
p.s. - please ignore SQL syntax / other errors, I typed it off the top of my head..
Thanks!
To your questions:
I would use one table. Especially if you need to report on or compare data across multiple currencies, those queries become far simpler and faster with a single table.
If you don't have a need to track the history of each currency's value, then go ahead and just update a single value -- but in that case, why even have a separate table? You can just add "latest value" as a field in the currency table and update it there. If you do need to track history, then you will need the two tables and the SQL you posted will work.
As an aside, instead of FLOAT I would use DECIMAL(10,2). As of MySQL 5.0, DECIMAL values are stored and computed exactly, which avoids the rounding surprises FLOAT causes with currency values.
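A sketch of the value table with DECIMAL plus an index that keeps the latest-value query cheap (the table and column names come from the question; the index and the 'EUR' literal are assumptions):

CREATE TABLE CurrencyValueTable (
  currency_name VARCHAR(20) NOT NULL,
  value DECIMAL(10,2) NOT NULL,
  `datetime` DATETIME NOT NULL,
  KEY idx_currency_time (currency_name, `datetime`)
);

-- Latest value for one currency; the composite index lets MySQL
-- read the newest row directly instead of sorting the whole table.
SELECT value FROM CurrencyValueTable
WHERE currency_name = 'EUR'
ORDER BY `datetime` DESC
LIMIT 1;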
It is better to have one table holding all currencies.
If there is need for historical prices, then the table needs to hold them. A reasonable compromise in many situations is to split the price table into a full list of historical prices and another table which only has the current prices.
Using the FLOAT data type can be troublesome. Please be sure you know what you are doing. If not, use the database's exact decimal (currency-style) data type instead.
Since your web service is transactional, it is better to access fewer tables per request. Since you will be reading and writing a lot, I would suggest having a single table.
It is better to add a field to the CurrencyTable and update it, rather than hitting two tables for a single request.

mysql optimization: current & previous orders should be on different tables, or same table with a flag column?

I want to know the most optimized way to work with MySQL.
I have quite a large database for orders. I have another table for previous orders; whenever an order is completed, it is erased from orders and moved to previous orders.
Should I keep using this method, or should I put them all in the same table and add a column that flags whether an order is current or previous?
kfir
In general, moving data around between tables is a sensitive process; data can easily get lost or corrupted. The question is how you query this table: if you search and filter through it frequently, you want to keep the table relatively small; if you only access it to read specific rows (via a direct primary key), the size of the table is less crucial, and then I would advise keeping a flag.
A third option you might want to consider is having 3 tables: one for ongoing orders, one for historic orders, and one with the order details. That third table can be long and static, while you query the first two tables.
Lastly, at some point you might want to move the historic data out of the table altogether. Maybe you would keep a cron job that runs once a month and moves data older than 6 months to a different, remote database, as sketched below.
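A sketch of the flag-column variant together with that monthly move-out job (the status values, the completed_at column, and the orders_archive table are all assumptions):

-- Single orders table with a status flag instead of two tables.
ALTER TABLE orders
  ADD COLUMN status ENUM('current','completed') NOT NULL DEFAULT 'current',
  ADD INDEX idx_orders_status (status);

-- Monthly cron job: move completed orders older than 6 months to a
-- separate archive table (assumed to mirror orders' structure).
INSERT INTO orders_archive
  SELECT * FROM orders
  WHERE status = 'completed'
    AND completed_at < NOW() - INTERVAL 6 MONTH;
DELETE FROM orders
  WHERE status = 'completed'
    AND completed_at < NOW() - INTERVAL 6 MONTH;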