I need a concept for archiving all data in D-Pool that is older than one year.
At the moment we have more than 3 million records in the D-Pool.
Because of this huge volume of data, searches and filters over the database take far too long: most searches run over the whole D-Pool, but in most cases I am only interested in current data.
So I want to regularly archive all data from D-Pool that is not needed for current work and evaluations.
But for some functions it should still be possible to search the whole D-Pool, both the current and the archived data.
Could you suggest some approaches to this problem?
This describes the typical data warehousing solution. Most owners of large datasets that change daily have a transactional database and a historical or reporting database. The historical or reporting database allows users to mine for insight against everything - except the data added since the last extract. That's usually sufficient.
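As a rough illustration of the extract step, here is a minimal SQL sketch (MySQL-style syntax; the names d_pool, d_pool_archive and created_at are assumptions, not your actual schema):

    -- Compute the cutoff once so both statements see the same boundary.
    SET @cutoff = NOW() - INTERVAL 1 YEAR;

    -- Extract: copy everything older than one year into the archive table ...
    INSERT INTO d_pool_archive
    SELECT * FROM d_pool WHERE created_at < @cutoff;

    -- ... then remove it from the live table (run both inside a transaction).
    DELETE FROM d_pool WHERE created_at < @cutoff;

    -- For the functions that must still search everything, a view can present
    -- current and archived data together:
    CREATE VIEW d_pool_all AS
        SELECT * FROM d_pool
        UNION ALL
        SELECT * FROM d_pool_archive;

Run the extract on whatever schedule keeps the live table small enough, e.g. nightly or monthly.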
I've been tasked with transferring our information out of an antiquated system and back into Access.
I have not personally used Access before, and it appears that in order to transfer the data I have to compact the file, because the data set was created in an older version of Access.
After hours of searching, I can't find a function that actually does this. Has anyone had success with it?
I'm wondering if it would be easier to keep two data sets: a searchable Excel file for the old data, and a new Access database for all new items, with anything that needs to change in the old data set transferred on an "as needed" basis.
We track Serial Numbers, Warranty & Purchase dates, Purchase Order Numbers, along with customer names. So it's not a large amount of information, but it needs to be clear and efficient.
Any wise words greatly appreciated.
I have many IoT devices currently sending data to a MySQL database.
I want to move it to some other database, which should be open source and provide me with:
JSON support
Scalability
Flexibility to add columns automatically based on the payload
Python and PHP Support
Extremely fast reads and writes
Ability to export at least 6 months of data in CSV format
Please respond soon.
Any help will be appreciated.
Thanks
Shaping your database around the input data is a mistake. Consider that tomorrow your data may arrive as CSV or XML, or in a slightly different format. Design your database based on your abstract data model, normalize it, and map the incoming data onto that model. Shape your structure based on what input you have and what output you plan to get. If you only ever retrieve the same content you stored, storing the data in files is sufficient; you don't need a database.
Also, you don't want to store "raw" records in the database. Even if your database can compose a data record out of the raw payload at run time, you cannot run a selection on an extracted element without visiting all the records.
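As a rough sketch of the difference (MySQL-flavoured; the table and column names are assumptions for illustration):

    -- Anti-pattern: one opaque payload column; any filter has to visit every row.
    CREATE TABLE readings_raw (
        id      BIGINT PRIMARY KEY,
        payload TEXT              -- raw JSON exactly as received
    );

    -- Better: extract the elements you query on into typed, indexable columns.
    CREATE TABLE readings (
        id          BIGINT PRIMARY KEY,
        device_id   INT NOT NULL,
        recorded_at DATETIME NOT NULL,
        temperature DECIMAL(5,2),
        humidity    DECIMAL(5,2),
        payload     TEXT,         -- optionally keep the original for auditing
        INDEX idx_device_time (device_id, recorded_at)
    );

    -- Now a selection on an extracted element can use an index:
    SELECT device_id, temperature
    FROM readings
    WHERE device_id = 42
      AND recorded_at >= '2024-01-01';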
Most databases allow you to connect from anywhere (there is no such thing as PostgreSQL being better supported in Java than in Python, although the quality and level of standardization of the drivers may vary). The question is which features your driver must support. For example, you may require support for bulk import (so that you don't issue huge INSERT sets to the database).
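For example (a MySQL-flavoured sketch; the file path and table name are assumptions), a bulk-load path is usually far cheaper than thousands of individual statements:

    -- Slow: one statement and one round trip per row.
    INSERT INTO readings (device_id, recorded_at, temperature)
    VALUES (1, '2024-01-01 00:00:00', 21.5);
    -- ... repeated thousands of times ...

    -- Faster: let the server bulk-load a file, or use your driver's batch API.
    LOAD DATA LOCAL INFILE '/tmp/readings.csv'
    INTO TABLE readings
    FIELDS TERMINATED BY ','
    IGNORE 1 LINES
    (device_id, recorded_at, temperature);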
What you actually need to look at is:
scalability: can your database grow with your data? Would the DB benefit from additional CPUs? (MySQL in particular doesn't for large queries.) Can you shard your database across multiple instances? (Again, MySQL does not handle that well.)
does your model look like a snowflake? If yes, you may consider NoSQL; otherwise stay away from it. If you manage to model it as a snowflake (which means you are open to compromises), you can use anything from Lucene-based search products to Mongo, Cassandra, etc. The fact that you have time series does not by itself qualify you for NoSQL. For example, you may have 10K devices issuing 5K message types, with specific data recorded redundantly at the device level and at the message-type level; in that case, because of the n:m relation, you no longer have a snowflake (see the sketch after this list).
why do you store the data? What queries are you going to issue?
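To illustrate the n:m situation from the snowflake point above (a hedged sketch; all names are invented for the example): once every message links one of many devices to one of many message types, the fact table sits between two dimensions and the model is no longer a clean snowflake.

    -- Two separate dimensions ...
    CREATE TABLE devices (
        device_id INT PRIMARY KEY,
        location  VARCHAR(100)
    );

    CREATE TABLE message_types (
        message_type_id INT PRIMARY KEY,
        name            VARCHAR(100)
    );

    -- ... and every message ties one device to one message type: an n:m
    -- relation between the two.
    CREATE TABLE messages (
        message_id      BIGINT PRIMARY KEY,
        device_id       INT NOT NULL,
        message_type_id INT NOT NULL,
        recorded_at     DATETIME NOT NULL,
        payload_value   DECIMAL(10,3),
        FOREIGN KEY (device_id) REFERENCES devices(device_id),
        FOREIGN KEY (message_type_id) REFERENCES message_types(message_type_id)
    );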
Why do you want to move away from MySQL? It is open source and can meet all of the criteria you listed above. This is a very subjective question, so it's hard to give a good answer, but MySQL is not a bad option.
I am trying to realize a project where people can log in to a site and find a personal calendar. In this calendar, people should be able to leave timestamps. Since a year has around 365 days (and more than one timestamp per day is possible), there will be a lot of timestamps to save.
I need a way to save those timestamps in some sort of database. I am new to this and want to know whether a JSON file or a MySQL database is the better way to store them.
Background story:
I am working on a project where a microcontroller does certain things at the timestamps given by the user. The user leaves the timestamps in a calendar in an iOS app, so it also has to be compatible with Swift/iOS.
Any ideas?
Databases have a few ways to store timestamps; for example, the TIMESTAMP and DATETIME data types.
If you do it in a database, you have the ability to sync it across devices.
To do it in JSON, I'll refer you to this question on StackOverflow:
The "right" JSON date format
EDIT: After reviewing the comments, you most likely want a database. I have an example here where you have a table for users and a table for events that can be joined to get each event for each user, even though each user has all their events in the same table.
I created this because I used to not know what databases were good for, so I came here and someone pointed me in the right direction. Databases are VERY powerful and fast. Maintaining a JSON file of events for every user would be a nightmare. I 100% recommend a database for your situation.
Play around with this sample DB I created: http://sqlfiddle.com/#!9/523e2d/5
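A minimal sketch of that kind of schema (illustrative names only, not the exact contents of the fiddle):

    CREATE TABLE users (
        user_id INT PRIMARY KEY,
        name    VARCHAR(100)
    );

    CREATE TABLE events (
        event_id  INT PRIMARY KEY,
        user_id   INT NOT NULL,        -- which user the timestamp belongs to
        starts_at DATETIME NOT NULL,   -- the timestamp left in the calendar
        FOREIGN KEY (user_id) REFERENCES users(user_id)
    );

    -- All events for one user, even though everyone's events share one table:
    SELECT u.name, e.starts_at
    FROM users u
    JOIN events e ON e.user_id = u.user_id
    WHERE u.user_id = 1
    ORDER BY e.starts_at;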
If you have only a few users with a few timestamps and not much else going on, then you could indeed store each user’s data in a text file. That file could store JSON if you want, or XML, or more simply tab-delimited or comma-delimited text. If you just need the next timestamp, without keeping history, this might well be the best approach with one file per user. Beware of risks such as your own code inadvertently overwriting a file, or crashing during a write so the file is ruined and data lost.
A step-up from text-in-a-file is SQLite, to help you manage the data so that you need not do the chore of parsing. While SQLite is a valuable product when used appropriately, it is not meant to compete with serious database products.
But for more heavy-duty needs, such as much more data or multiple apps accessing the data, use a database. That is what they are for – storing large amounts of structured data in a way that can handle concurrent access by more than one user and more than one app.
While MySQL is more famous, for a serious enterprise-quality database where preserving your data is critical, I recommend Postgres. Postgres has the best date-time handling of almost any database, both in data types and in functions. Postgres also has the best documentation of any SQL database.
Another pair of options, both built in pure Java and both free-of-cost/open-source, are H2 and Derby. Or commercial products like Microsoft SQL Server or Oracle.
You have not really given enough details to make a specific recommendation. Furthermore, this site is not meant for software recommendations. For that, see the sister site: Software Recommendations Stack Exchange.
If you choose a database product, be sure to learn about date-time handling. Database products vary widely in this respect, and the SQL standard barely touches upon the subject. Quick tips: store values in UTC; indeed, a database such as Postgres is likely to adjust incoming values to UTC. Apply a time zone only for presentation to users. Avoid the data types that ignore the issue of time zone; for standard SQL, this means using the TIMESTAMP WITH TIME ZONE data type rather than TIMESTAMP WITHOUT TIME ZONE. For Java work, use only the java.time classes rather than the troublesome old date-time classes; for your purposes, use the Instant and ZonedDateTime classes but not the LocalDateTime class, which lacks any offset or time zone.
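A minimal Postgres-flavoured sketch of those tips (the table and column names are assumptions):

    -- TIMESTAMP WITH TIME ZONE stores a point on the timeline; Postgres
    -- normalizes incoming values to UTC.
    CREATE TABLE calendar_events (
        event_id  BIGSERIAL PRIMARY KEY,
        user_id   INT NOT NULL,
        starts_at TIMESTAMP WITH TIME ZONE NOT NULL
    );

    -- A value arriving with an explicit offset is stored as that instant in UTC.
    INSERT INTO calendar_events (user_id, starts_at)
    VALUES (1, '2024-03-01 09:00:00+01:00');

    -- Apply a time zone only when presenting the value to a user:
    SELECT starts_at AT TIME ZONE 'Europe/Berlin' AS local_time
    FROM calendar_events;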
So let's say I have a site with approximately 40,000 articles.
What I'm hoping to do is record the number of page visits for each article over time.
Basically the end goal is to be able to visualize via graph the number of lookups for any article between any period of time.
Here's an example: https://books.google.com/ngrams
I've begun thinking about a MySQL data structure, but my gut tells me it's probably not the right task for MySQL. It almost seems like I'd need some specific NoSQL analytics solution.
Could anyone advise which DB is the right fit for this job?
SQL is fine. It supports UPDATE statements that guarantee your count is correct, rather than merely eventually consistent.
Although most people will just use a log file and process it on demand. Unless you are at Google scale, that will be fast enough.
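For example, a minimal sketch (MySQL-style; the names are assumptions) of a per-article, per-day counter that is updated atomically:

    CREATE TABLE article_views (
        article_id INT  NOT NULL,
        view_date  DATE NOT NULL,
        views      INT  NOT NULL DEFAULT 0,
        PRIMARY KEY (article_id, view_date)
    );

    -- One statement per page hit; the counter is updated atomically.
    INSERT INTO article_views (article_id, view_date, views)
    VALUES (123, CURDATE(), 1)
    ON DUPLICATE KEY UPDATE views = views + 1;

    -- Data for the graph: one article's daily views over a chosen period.
    SELECT view_date, views
    FROM article_views
    WHERE article_id = 123
      AND view_date BETWEEN '2024-01-01' AND '2024-06-30'
    ORDER BY view_date;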
There exist many tools for this, often including some very efficient specialized data structures such as RDDs that you won't find in any database. Why don't you just use them?
What do you think is an appropriate data store for sensor data such as temperature, humidity, velocity, etc.? Users will have different sets of fields, and they need to be able to run basic queries against the data.
Relational databases like MySQL are not flexible in terms of schema, so I was looking into NoSQL approaches, but I'm not sure whether any particular project is more suitable for sensor data. Most NoSQL databases seem to be geared toward log output.
Any feedback is appreciated.
Thanks!
I still think I would use an RDBMS, simply because of the flexibility you have to query the data. I am currently developing applications that log approximately one thousand remote sensors to SQL Server, and though I've had some growing pains due to the "inflexible" schema, I've been able to expand it in ways that provide for multiple customers. I'm providing enough columns of data that, collectively, they can handle a vast assortment of sensor values, and each user just queries the fields that interest them (or that their sensor has features for).
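For illustration only, a rough sketch of that kind of generic-columns layout (the names are invented for the example, not the actual production schema):

    -- One wide readings table: generic value columns that different sensor
    -- types populate (or leave NULL) as needed.
    CREATE TABLE sensor_readings (
        reading_id  BIGINT PRIMARY KEY,
        customer_id INT NOT NULL,
        sensor_id   INT NOT NULL,
        recorded_at DATETIME NOT NULL,
        temperature DECIMAL(6,2) NULL,
        humidity    DECIMAL(6,2) NULL,
        velocity    DECIMAL(8,3) NULL,
        extra_1     DECIMAL(10,3) NULL,  -- spare columns for sensor-specific values
        extra_2     DECIMAL(10,3) NULL
    );

    -- Each user queries only the fields their sensors actually provide:
    SELECT recorded_at, temperature, humidity
    FROM sensor_readings
    WHERE sensor_id = 7
      AND recorded_at >= '2024-01-01'
    ORDER BY recorded_at;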
That said, I originally began this project by writing to a comma separated values (CSV) file, and writing an app that would parse it for graphing, etc. Each sensor stored its data in a separate CSV, but by opening multiple files I was able to perform comparisons between two or more sets of sensor data. Also CSV is highly compressible, opens in major office applications, and is fairly user-editable (such as trimming sections that you don't need).