We're working on a project that involves an archaic on-board computer (OBC) and a proprietary database. The current idea is to use MySQL on the desktop/website, but when an OBC needs to be brought up to date, we send it the proprietary database files it needs. That is, we don't send it a new copy of the files, just files containing the changes, and the OBC updates its own instance of the proprietary database.
At the moment we are using that proprietary database on the desktop as well, but we're trying to move away from it and over to MySQL. The problem is that the OBC is so old, and has had so much invested in it, that we can't move it away from the proprietary version.
My question boils down to this: is there a way to search MySQL for every row that has been altered since a given date (not searching for a datetime stored in one of our own columns, but the datetime at which the row itself was last altered), or would we have to keep track of every change made to the database ourselves (there won't be all that many, a thousand or so)?
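To make the requirement concrete, the closest mechanism we are aware of is a per-row last-modified column that MySQL maintains by itself, roughly like this (the table and column names are made up, and ideally we would not have to add such a column to every table by hand):

-- hypothetical table, only to illustrate the kind of tracking we mean
CREATE TABLE example_table (
    id          INT PRIMARY KEY,
    payload     VARCHAR(255),
    modified_at TIMESTAMP NOT NULL
                DEFAULT CURRENT_TIMESTAMP
                ON UPDATE CURRENT_TIMESTAMP
);

-- "everything altered since a given date" would then be
SELECT * FROM example_table
WHERE modified_at >= '2015-06-01 00:00:00';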
Our product has been growing steadily over the last few years, and we are now at a turning point in terms of data size for some of our tables: we expect those tables to double or triple in size in the next few months, and grow even more in the next few years. We are talking in the range of 1.4M rows now, so over 3M by the end of the summer and (since we expect growth to be exponential) around 10M by the end of the year (M meaning million rows).
The table we are talking about is a sort of logging table. The application receives data files (CSV/XLS) on a daily basis and the data is transferred into that table. It is then used in the application for a certain amount of time - a couple of weeks or months - after which it becomes rather redundant. That is, if all goes well: if there is some problem down the road, the data in those rows can be useful to inspect for troubleshooting.
What we would like to do is periodically clean up the table, removing any number of rows based on certain requirements, but instead of actually deleting the rows, move them 'somewhere else'.
We currently use MySQL as a database and the 'somewhere else' could be MySQL too, but it can be anything. For other projects we have a Master/Slave setup where the whole database is involved, but that's not what we want or need here. It's just some tables, where the Master table would need to become shorter and the Slave only bigger; not a one-to-one sync.
The main requirement for the secondary store is that the data should be easy to inspect/query when needed, either with SQL or another DSL, or just visual tooling. So we are not interested in backing up the data to one or more CSV files or another plain-text format, since that is not as easy to inspect. The logs would then end up somewhere on S3, and we would need to download them, grep/sed/awk through them... We'd much rather have something database-like that we can consult.
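To make it concrete, the simplest form of what we have in mind, within MySQL itself, would be something along these lines (table and column names are placeholders, and it assumes an archive table with the same structure as the live one):

-- copy the old rows to the archive table, then remove them from the live table
INSERT INTO import_log_archive
SELECT * FROM import_log
WHERE imported_at < NOW() - INTERVAL 3 MONTH;

DELETE FROM import_log
WHERE imported_at < NOW() - INTERVAL 3 MONTH;

But if there is a better 'somewhere else' than a second MySQL table, we are open to it.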
I hope the problem is clear?
For the record: while the solution can be anything, we prefer the simplest solution possible. It's not that we don't want Apache Kafka (as an example), but then we'd have to learn it, install it, and maintain it. Every new piece of technology adds to our stack, and the lighter the stack stays, the better we like it ;).
Thanks!
PS: we are not just being lazy here; we have done some research, but we thought it would be a good idea to get some more insight into the problem.
I am working on a project with a node.js server (not Express) and a MySQL database. When a user clicks a button on the page, it uploads two values (say SpecificName and Yes/No). These values get inserted into the MySQL database through the node server. Later, MySQL checks for the specificName column (if it finds none, it creates a column with that name) and updates the second value in it.
Now I would like to keep every update of the second value that the user makes through the website (i.e. yes) in the MySQL database for 5 minutes, after which it automatically updates that specific location with another value (say cancel). I've managed to solve everything except this 5-minute puzzle. Also, I'm keeping 15-20 of these so-called specificName columns in which the value (say yes/no) is being updated, and at the same time there are more than 1000 rows being worked on simultaneously, so there are lots of 5-minute timers running for these values. Is there a way to store a value temporarily in MySQL, after which it is destroyed automatically?
I came across:
node-crons (too complex, and I don't even know if it's the right choice)
MySQL events (I'm not sure how to use them with node)
TIMESTAMP (can't create more than one, and I guess I need one for each column)
DATETIME (haven't tested it yet), and other things like
(DELETE FROM table WHERE timestamp < DATE_SUB(NOW(), INTERVAL 5 MINUTE), to delete values older than 5 minutes).
Now I have no idea what to use or how to resolve this dilemma.
Any help would be appreciated.
Per my conversation with Sammy on kik, I'm pretty sure you don't want to do this. This doesn't sound like a use case that fits MySQL. I also worry that your MySQL knowledge is super limited, in which case you should take the time to do more research on MySQL. Without a better understanding of the larger goal(s) your application is trying to accomplish, I can't suggest better alternatives. If you can think of a way to explain the application's behavior without compromising the product idea, that would go a long way toward helping us solve your problem.
General things I want to make clear before giving you potential answers:
You should not be altering columns from your application. This is one of my issues with the Node/Mongo world: doing so is fine in non-relational systems like Mongo or Cassandra, but traditional relational databases don't like frequently changing table definitions. It's a quick way to a painful day. The application should only be inserting, updating, and deleting rows. Ye hath been warned.
I'm not sure you want to put data into MySQL that has a short expiration date. You probably want some sort of caching solution like memcache or redis. Now, you can make MySQL behave like a cache, but this is not its intended use; there are better solutions. If you insist on using MySQL for this, I recommend investigating the MEMORY storage engine for faster reads/writes, at the cost of losing the data if the system suddenly shuts down.
Here are some potential solutions:
MySQL Events - Have a timestamp column and an event scheduled to run, say, every minute or so. If the event finds that a row has lived for more than 5 minutes, delete it (see the sketch after this list).
NodeJS setTimeout - From the application, after inserting the record(s), set a timeout for 5 minutes to go and delete said records. You'll probably want to ensure you have some sort of id or timestamp column for supahfast reference of the values.
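As a sketch only of the first option, assuming a table named responses with a created_at column (names are made up) and that the event scheduler is enabled:

SET GLOBAL event_scheduler = ON;

CREATE EVENT purge_expired_responses
ON SCHEDULE EVERY 1 MINUTE
DO
  DELETE FROM responses
  WHERE created_at < NOW() - INTERVAL 5 MINUTE;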
Those are the two best solutions that come to mind for me. Again, if you're comfortable revealing what your application does that requires an unusual solution like this, we can likely help you arrive at a better one.
OK, so I guess I figured it out myself. I'm posting this answer for anyone who still has to deal with this problem. I used a DATETIME column in MySQL that I created for each specificName column, so alongside every specificName column there is another specificName_TIME column that stores the time at which the value (yes/no) was updated. The reason I didn't use TIMESTAMP is that it's not possible to create any number of timestamp columns in MySQL versions lower than 5.6. I add 5 minutes to the current time before storing it in the database. Then I run two chained functions. The first checks whether the datetime in the database is earlier than the current time (SELECT specificName FROM TABLE WHERE specificName_TIME < NOW()); if that turns out to be true it shows me the value, otherwise it returns null. Then I run the second function to update the value: if the check was true, the whole process continues again, and if not, it continues anyway, updating the last value with null.
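Roughly, the queries look like this (the table and column names here are just placeholders):

-- store the value together with an expiry time 5 minutes in the future
UPDATE status_table
SET specificName = 'yes',
    specificName_TIME = NOW() + INTERVAL 5 MINUTE
WHERE user_id = 42;

-- first function: find values whose expiry time has passed
SELECT user_id, specificName
FROM status_table
WHERE specificName_TIME < NOW();

-- second function: replace the expired value (the question uses 'cancel')
UPDATE status_table
SET specificName = 'cancel'
WHERE specificName_TIME < NOW();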
Hope this helps.
I would love to hear some opinions or thoughts on a MySQL database design.
Basically, I have a Tomcat server which receives different types of data from about 1000 systems out in the field. Each of these systems is unique and will be reporting unique data.
The data sent can be categorized as frequent and infrequent data. The infrequent data is only sent about once a day and doesn't change much - it is basically just configuration-based data.
Frequent data is sent every 2-3 minutes while the system is turned on, and represents the current state of the system.
This data needs to be stored in the database for each system and be accessible at any given time from a PHP page. Essentially, for any system in the field, a PHP page needs to be able to access all the data on that client system and display it. In other words, the database needs to reflect the state of the system.
The information itself is all text-based, and there is a lot of it. The config data (which doesn't change much) consists of key-value pairs, and there are currently about 100 of them.
My idea for the design was to have 100+ columns and 1 row for each system to hold the config data. But I am worried about having that many columns, mainly because it isn't very future-proof if I need to add columns later. I am also worried about insert speed if I do it that way. This might blow out to a 2000-row x 200-column table that gets accessed about 100 times a second, so I need to cater for this in my initial design.
I am also wondering if there are any design philosophies out there that cater for frequently changing and seldom-changing data based on the storage engine. This would make sense, as I want to keep INSERT/UPDATE time low, and I don't care too much about the SELECT time from PHP.
I would also love to know how to split up the data. I.e. if frequently changing data can be categorised in a few different ways, should I have a bunch of tables representing the data and join them on selects? I am worried about this because I will probably have to make a report to show common properties across all systems (i.e. show all systems with a certain condition).
I hope I have provided enough information here for someone to point me in the right direction; any help on the matter would be great. Or if someone has done something similar and can offer advice, I would be very appreciative. Thanks heaps :)
~ Dan
I've posted some questions in a comment. It's hard to give you advice about your rapidly changing data without knowing more about what you're trying to do.
For your configuration data, don't use a 100-column table. Wide tables are notoriously hard to handle in production. Instead, use a four-column table containing these columns:
SYSTEM_ID VARCHAR System identifier
POSTTIME DATETIME The time the information was posted
NAME VARCHAR The name of the parameter
VALUE VARCHAR The value of the parameter
The first three of these columns are your composite primary key.
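A sketch of that table in MySQL, with types and sizes guessed (adjust them to your data), might look like this:

CREATE TABLE system_config (
    system_id VARCHAR(64)  NOT NULL,   -- system identifier
    posttime  DATETIME     NOT NULL,   -- the time the information was posted
    name      VARCHAR(128) NOT NULL,   -- the name of the parameter
    value     VARCHAR(255),            -- the value of the parameter
    PRIMARY KEY (system_id, posttime, name)
);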
This design has the advantage that it grows (or shrinks) as you add to (or subtract from) your configuration parameter set. It also lets you store historical data. That means new data points can be INSERTed rather than UPDATEd, which is faster. You can run a daily or weekly job to delete any history you're no longer interested in keeping.
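The clean-up job can be as simple as this (the 90-day retention period is only an example):

DELETE FROM system_config
WHERE posttime < NOW() - INTERVAL 90 DAY;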
(Edit: if you really don't need history, get rid of the POSTTIME column and use MySQL's nice extension feature INSERT ... ON DUPLICATE KEY UPDATE when you post stuff. See http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html.)
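In that variant the primary key shrinks to (SYSTEM_ID, NAME), and an upsert looks roughly like this (sample values are made up):

INSERT INTO system_config (system_id, name, value)
VALUES ('sys-001', 'sample_rate', '48000')
ON DUPLICATE KEY UPDATE value = VALUES(value);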
If your rapidly changing data is similar in form (name/value pairs) to your configuration data, you can use a similar schema to store it.
You may want to create a "current data" table using the MEMORY storage engine for this stuff. MEMORY tables are very fast to read and write because the data is all in RAM in your MySQL server. The downside is that a MySQL crash and restart will give you an empty table, with the previous contents lost. (MySQL servers crash very infrequently, but when they do, they lose MEMORY table contents.)
You can run an occasional job (every few minutes or hours) to copy the contents of your MEMORY table to an on-disk table if you need to save history.
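As a sketch, again with guessed names and types and assuming the same name/value layout:

CREATE TABLE current_data (
    system_id  VARCHAR(64)  NOT NULL,
    name       VARCHAR(128) NOT NULL,
    value      VARCHAR(255),
    updated_at DATETIME     NOT NULL,
    PRIMARY KEY (system_id, name)
) ENGINE=MEMORY;

-- periodic snapshot into an ordinary on-disk table with the same columns
INSERT INTO current_data_history (system_id, name, value, updated_at)
SELECT system_id, name, value, updated_at FROM current_data;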
(Edit: You might consider adding memcached http://memcached.org/ to your web application system in the future to handle a high read rate, rather than constructing a database design for version 1 that handles a high read rate. That way you can see which parts of your overall app design have trouble scaling. I wish somebody had convinced me to do this in the past, rather than overdesigning for early versions.)
Context
I'm currently developing a tool for managing orders and communication between technicians and services. The industrial context is broadcast and TV. Multiple clients expect media files, each made to their own specs, which implies widely varying workflows even within the restricted scope of a single client's orders.
One client can ask one day for a single SD file and the next for a full-blown HD package containing up to fourteen files... In a MySQL db I am trying to store accurate information about all the small tasks composing the workflow, in multiple forms:
DATETIME values every time a task is accomplished, for accurate tracking
paths to the newly created files in the company's file system in VARCHARs
archiving background info in TEXT values (info such as user comments, e.g. when an incident happens and prevents moving forward, they can comment about it in this feed)
Multiply that by 30 different file types and this is way too much for a single table. So I thought I'd break it up by client: one table per client, so that any order only ever requires the use of that one table, which doesn't manipulate more than 15 fields. Still, this is a pretty rigid solution when a client has 9 different transcoding specs and a particular order only requires one. I figure I'd need to add flag fields for each transcoding field to indicate which ones are required for that particular order.
Concept
I then had this crazy idea that maybe I could create a temporary table that lasts while the order is running (which can range from about 1 day to 1 month). We rarely have more than 25 orders running simultaneously, so it wouldn't get too crowded.
The idea is to make a table tailored for each order, eliminating the need for flags and unnecessary forever-empty fields. Once the order is complete, the table would get flushed, JSON-encoded, into a TEXT or BLOB so it can be restored later if changes need to be made.
Do you have experience with DBMSs (MySQL in particular) struggling under such practices, if this has ever been done? Does this sound like a viable option? I am happy to try (and have already started), and I am seeking advice on whether to keep going or stop right here.
Thanks for your input!
Well, of course that is possible to do. However, you cannot use MySQL temporary tables for such long-term storage; you will have to use "normal" tables and have some clean-up routine...
However, I do not see why that amount of data would be too much for a single table. If your queries start to run slowly because of the amount of data, you should add some indexes to your database. I also think there is another con: it will be much harder to build reports later on; when you have 25 tables with the same kind of data, you will have to run 25 queries and merge the data.
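To illustrate the indexing point (table and column names here are hypothetical): keep all orders in one table and index the columns you filter on, for example:

ALTER TABLE order_tasks
    ADD INDEX idx_client_order (client_id, order_id),
    ADD INDEX idx_completed_at (completed_at);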
I do not see the point, really. The same kinds of data should be in the same table.
I have read through the solutions to similar problems, but they all seem to involve scripts and extra tools. I'm hoping my problem is simple enough to avoid that.
So the user uploads a csv of next week's data. It gets inserted into the DB, no problem.
BUT
an hour later he gets feedback from everyone, and must make updates accordingly. He updates the csv and goes to upload it to the DB.
Right now, the system I'm using checks to see if the data for that week is already there, and if it is, it pulls all of that data from the DB, a script finds the differences and sends them out, and after all of this, the old data is deleted and replaced with the new data.
Obviously, it is a lot easier to just wipe it clean and re-enter the data, but that's not the best method, especially if there are lots of changes or tons of data. I have to know WHAT changes have been made in order to send out alerts, but I don't want a transaction log, as the alerts only need to be sent out the one time and after that, the old data is useless.
So!
Is there a smart way to compare the new data to the already existing data, get only the rows that are changed/deleted/added, and make those changes? Right now it seems like I could do an update, but then I won't get any response on what has changed...
Thanks!
Quick Edit:
No foreign keys are currently in use. This will soon change, but it shouldn't make a difference, because the foreign keys will only point to who the data affects and thus won't need to be changed. As far as primary keys go, that does present a bit of a dilemma:
The data in question is everyone's work schedule, so it would be nice (for specific applications of this schedule beyond simple output) for each shift to have a key. But here's the problem: let's say that user1 was late on Monday. The tardiness is recorded in a separate table and is tied to the shift using the shift key. If on Tuesday there is some need to make changes to the week already in progress, my fear is that it will become too difficult to ensure that all entries in the DB that have already happened (and thus may have associations that shouldn't be broken) don't get re-keyed in the process. Unfortunately, it is not as simple as only updating all events occurring AFTER the current time, as this would add work (and thus make it less marketable) for the people who do the uploading. Basically, they make the schedule in one program, export it to a CSV, and then upload it on a web page for all of the webapps that need that data. So it is simply much easier for them (and less stressful for everyone involved) to do the same routine every time: export the entire week and upload it.
So my biggest concern is making the upload script as smart as possible on both ends: it shouldn't get bloated trying to find the changes, it should find the changes no matter the input, AND none of the data that is unchanged should risk getting re-keyed.
Here's a related question:
Suppose Joe User was scheduled to wash dishes from 7:00 PM to 8:00 PM, but the new data has him working 6:45 PM to 8:30 PM. Has the shift been changed? Or has the old one been deleted and a new one added?
And another one:
Say Jane was scheduled to work 1:00 PM to 3:00 PM, but now everyone has a mandatory staff meeting from 2:00 PM to 3:00 PM. Has she lost one shift and gained two? Or has one shift changed and she gained one?
I'm really interested in knowing how this kind of data is typically handled/approached, more than specific answers to the above.
Again, thank you.
Right now, the system I'm using checks to see if the data for that week is already there, and if it is, it pulls all of that data from the DB, a script finds the differences and sends them out, and after all of this, the old data is deleted and replaced with the new data.
So your script knows the differences, right? And you don't want to use any extra tools, apart from your script and MySQL, right?
I'm quite convinced that MySQL doesn't offer any 'diff' tool by itself, so the best you can achieve is making a new CSV file for updates only. I mean, it should contain only the changed rows. Updating would be quicker, and all the changed data would be easily available.
If you have a unique key on one of the fields, you can use:
LOAD DATA LOCAL INFILE '/path/to/data.csv' REPLACE INTO TABLE table_name
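For a typical comma-separated file with a header row, the full statement might look something like this (the FIELDS/LINES options are assumptions about your CSV format, so adjust them to match your file):

LOAD DATA LOCAL INFILE '/path/to/data.csv'
REPLACE INTO TABLE table_name
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

Note that REPLACE only overwrites rows whose unique key already exists and adds new ones; rows that were removed from the CSV stay in the table, so deletions would still need to be handled separately.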