How should I set up my database with an unstable connection? - mysql

To log values and work on race strategy for a student race car, I am building a system that logs sensor values and stores them in a database. We would like to read these values over a WiFi connection, but that connection is not very stable. Besides that, we really want to be sure that the computer in the car has no downtime and that it is energy efficient.
Our current design idea consists of a Raspberry Pi (or BeagleBone or equivalent) in the car and a server in the pits. In the car we log the data, and whenever we have a connection it should be synchronized to the server in the pits. In the pits there are about 10 laptops that connect to this server, on which we would like to see real-time data when available and historical data otherwise.
MySQL replication looks like the way to go, with the car as the master and the server in the pits as the slave. The downside is that both computers then need to hold the same data (correct?). We would like the car to keep only today's data, so the database on the Raspberry Pi stays small. On the other side, we want a complete archive of all racing days on the server in the pits, where we should be able to easily select data from the past 8 days and plot it.
I think we have the following options:
Complete replication: synchronize all data from the car to the pit lane. Easy to implement, but heavy for the Raspberry Pi and not feasible for a lot of data.
Replication per day: replicate the data between the car and the server. On the server we keep an archive; at the end of the day we copy the data from the replication database into the archive database. This makes it difficult to select data across days.
Custom replication: a script on the pit-lane server connects to the car and compares its data with the pit-lane database. Entries with a newer timestamp are copied into the archive (sketched below). Custom scripts are less reliable than built-in functions, and this may be more intensive for the Raspberry Pi, which could result in downtime.
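To make option 3 concrete, here is a minimal sketch of what the pit-lane server could run, assuming the car's log table is exposed through MySQL's FEDERATED engine (which must be enabled); the names car_log, archive, carlog/log and the columns are made up for illustration:
-- One-time setup on the pit-lane server: a FEDERATED table pointing at the car.
CREATE TABLE car_log (
    id        BIGINT UNSIGNED NOT NULL,
    logged_at DATETIME(3)     NOT NULL,   -- fractional seconds need MySQL 5.6.4+
    sensor_id SMALLINT        NOT NULL,
    value     DOUBLE          NOT NULL,
    PRIMARY KEY (id),
    KEY (logged_at)
) ENGINE=FEDERATED
  CONNECTION='mysql://sync_user:secret@car-pi:3306/carlog/log';
-- Run periodically (e.g. from cron): pull everything newer than the archive already has.
SET @last_seen := (SELECT COALESCE(MAX(logged_at), '1970-01-01') FROM archive);
INSERT IGNORE INTO archive (id, logged_at, sensor_id, value)
SELECT id, logged_at, sensor_id, value
FROM car_log
WHERE logged_at > @last_seen;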
Is there a better way to do this? For example, could the server in the car automatically clear its database once synchronisation is complete, while the auto-increment IDs keep counting?
Thank you for thinking with me!
Bart
P.S. To give an idea of the data: we have 70 values at 1 Hz, 20 values at 5 Hz, and 15 values at 20 Hz (DOUBLE or INT).

Related

How to have a centralized MySQL database?

I am trying to set up a MySQL database that takes data from 3 other MySQL databases. The data to be copied is the result of a query that standardizes the data format. The copy would need to run either daily as a script or be synced in real time; either method would be fine for this project.
For example:
The query from source DB:
SELECT order_id, rate, quantity
FROM orders
WHERE date_order_placed = CURDATE()
Then I want the results of that query to be inserted into a destination DB.
The databases are on separate hosts.
I have tried creating scripts that run CSV and SQL exports/imports, without success. I have also tried the Python pymysql library, but it seemed like overkill. I'm pretty lost haha.
Thanks :)
Plan A:
Connect to source. SELECT ... INTO OUTFILE.
Connect to destination. LOAD DATA INFILE from the output above.
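A hedged sketch of Plan A, reusing the (hypothetical) orders query above; the file path and the destination table name orders_combined are assumptions:
-- On the source server: write today's standardized rows to a CSV file.
SELECT order_id, rate, quantity
INTO OUTFILE '/var/lib/mysql-files/orders_today.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
FROM orders
WHERE date_order_placed = CURDATE();
-- Copy the file to the destination host, then load it there:
LOAD DATA INFILE '/var/lib/mysql-files/orders_today.csv'
INTO TABLE orders_combined
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
(order_id, rate, quantity);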
Plan B (both MySQL):
Set up replication from the source (as a Master) to the destination (as a Slave).
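A rough sketch of the slave-side commands for Plan B; the host, user, password, and binlog coordinates are placeholders, the source must already have binary logging and a replication user, and MySQL 8.0.23+ spells these CHANGE REPLICATION SOURCE TO / START REPLICA:
CHANGE MASTER TO
  MASTER_HOST = 'source-db.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
SHOW SLAVE STATUS\G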
Plan C (3 MySQL servers):
Multi-source replication to allow gathering data from multiple sources into a single, combined destination.
I think MariaDB 10.0 is when they introduced multi-source replication. Caution: MariaDB's GTIDs are different from MySQL's, but I think there is a way to make the replication you seek work. (It may be as simple as turning off GTIDs??)
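For Plan C on MariaDB 10.0+, multi-source replication uses named connections; a hedged sketch (host names and credentials are placeholders, and plain binlog coordinates are used instead of GTIDs, in line with the caution above):
CHANGE MASTER 'src1' TO
  MASTER_HOST = 'source1.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
CHANGE MASTER 'src2' TO
  MASTER_HOST = 'source2.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START ALL SLAVES;
SHOW ALL SLAVES STATUS\G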
Plan D (as mentioned):
Some ETL software.
Please ponder which Plan you would like to pursue, then ask for help in focusing on one. Meanwhile, your question is too broad.

MySQL MyISAM table crashes with lots of inserts and selects, what solution should I pick?

I have the following situation:
A MySQL MyISAM database on an Amazon EC2 instance, with PHP on an Apache web server. We need to store incoming packages (JSON) in MySQL. For this I use a staging table: a cron job runs every minute and moves older data (selected with a WHERE on DateTime relative to 'now - 2 min') to another table (named stage2).
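In SQL terms, that minute-by-minute move is roughly the following sketch (the real column list and the exact cut-off are assumptions; a row that crosses the 2-minute boundary between the two statements could be dropped):
-- Archive rows older than ~2 minutes, then remove them from the staging table.
INSERT INTO stage2
SELECT * FROM stage1
WHERE DateTime < NOW() - INTERVAL 2 MINUTE;
DELETE FROM stage1
WHERE DateTime < NOW() - INTERVAL 2 MINUTE;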
The stage1 table holds only current information and contains about 35k rows normally, up to 100k when it's busy. We can reach 50k new rows a minute, which comes down to about 10k insert queries. The insert looks like this:
INSERT DELAYED IGNORE INTO stage1 VALUES ( ... ), (....), (....), (....), (...)
Then we have 4 scripts, each running roughly every 10 seconds, doing the following:
grab the max RowID from stage1 (the primary key)
export the data from the previous max RowID up to that RowID
a) 2 scripts are in bash and use the mysql command-line client to export
b) 1 script is in Node.js and exports with INTO OUTFILE
c) 1 script is in PHP, using a plain MySQL SELECT and looping through each row
send the data to the external client
write the last send time and last RowID to a MySQL table so it knows where to continue next time
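A hedged sketch of one such export cycle in SQL (the bookkeeping table export_state and its columns are made-up names):
-- 1) Grab the previous high-water mark and the current max RowID.
SET @from := (SELECT last_rowid FROM export_state WHERE script = 'json_feed');
SET @to   := (SELECT MAX(RowID) FROM stage1);
-- 2) Export the new slice (the script then sends it on to the external client).
SELECT * FROM stage1
WHERE RowID > @from AND RowID <= @to;
-- 3) Remember where we got to for the next run.
UPDATE export_state
SET last_rowid = @to, last_sent = NOW()
WHERE script = 'json_feed';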
Then we have the one cron job per minute moving old data from stage1 to stage2, as described above.
Everything worked well for a long time, but we are now gaining users, and during rush hours the stage1 table crashes now and then. We can easily repair it, but that is not a real fix because we are down for a while each time. Memory and CPU are fine during rush hours, but when stage1 crashes, everything crashes.
Also worth saying: I don't mind losing some rows in a failure, so I don't need any special backup plan in case something goes wrong.
What I have done so far:
Added DELAYED and IGNORE to the insert statements.
Tried switching to InnoDB, but this was even worse, mainly because of the large amount of memory it needed. My EC2 instance is currently a t2.medium, which has 4 GB memory and 2 vCPUs with burst capacity. Following https://dba.stackexchange.com/questions/27328/how-large-should-be-mysql-innodb-buffer-pool-size and running this query:
SELECT CEILING(Total_InnoDB_Bytes*1.6/POWER(1024,3)) RIBPS
FROM (SELECT SUM(data_length+index_length) Total_InnoDB_Bytes
      FROM information_schema.tables
      WHERE engine='InnoDB') A;
it returned 11 GB. I tried 3 GB, which is the maximum for my instance (80%), but since it was even less stable I switched every table back to MyISAM yesterday.
Recreated the stage1 table structure.
What are my limitations?
I cannot merge all 4 scripts into one export because the output to each client is different; for example, some use JSON, others XML.
Options I'm considering:
An m3.xlarge instance with 15 GB memory is 5 times more expensive, but if that's what it takes I'm willing to pay for it. Then switch to InnoDB again and see if it's stable?
Can I just move stage1 to InnoDB and run it with a 3 GB buffer pool, leaving the rest as MyISAM? (See the sketch after this list.)
Try a NoSQL database or an in-memory database. Would that work?
Queue the packages in memory, have the 4 scripts read the data from memory, and save everything to MySQL later. Is there some kind of tool for this?
Move stage1 to an RDS instance running InnoDB.
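Regarding the mixed-engine option above: storage engines can be mixed per table, so a hedged sketch of that route could look like this (the 3 GB figure is from the question; SET GLOBAL for the buffer pool only works from MySQL 5.7 on, older versions need innodb_buffer_pool_size in my.cnf and a restart):
-- Convert only the hot staging table; everything else stays MyISAM.
ALTER TABLE stage1 ENGINE=InnoDB;
-- Verify which tables now use which engine.
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = DATABASE();
-- Size the buffer pool for the InnoDB part only (MySQL 5.7+; otherwise my.cnf + restart).
SET GLOBAL innodb_buffer_pool_size = 3 * 1024 * 1024 * 1024;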
I'd love to get some advice and help on this! Perhaps I'm missing an easy answer? Or which options should I not consider?
Thanks,
Sjoerd Perfors
Today we fixed these issues with the following setup:
An AWS load balancer in front of a t2.small "Worker" instance, where Apache and PHP handle the requests and send them on to an EC2 instance running MySQL, called the "Main".
When CPU of the t2.small instance goes above 50%, new instances are launched automatically and attached to the load balancer.
The "Main" EC2 instance runs MySQL with InnoDB.
Everything updated to Apache 2.4 and PHP 5.5 with performance tweaks.
Fixed one script so it runs a lot faster.
InnoDB now has 6 GB.
Things we tried that didn't work:
- Setting up DynamoDB, but sending to that DB took almost 5 seconds.
Things we are considering:
- Removing the stage2 table and doing backups directly from stage1. It seems that having this many rows isn't bad for performance.

SSIS to insert non-matching data on non-linked server

This is regarding SQL Server 2008 R2 and SSIS.
I need to update dozens of history tables on one server with new data from production tables on another server.
The two servers are not, and will not be, linked.
Some of the history tables have 100's of millions of rows and some of the production tables have dozens of millions of rows.
I currently have a process in place for each table that uses the following data flow components:
OLEDB Source task to pull the appropriate production data.
Lookup task to check whether the production data's key already exists in the history table, using "Redirect to error output" so that non-matching rows carry on down the data flow.
OLEDB Destination that receives those missing rows and writes them to the history table.
The process is too slow for the large tables. There has to be a better way. Can someone help?
I know if the servers were linked a single set based query could accomplish the task easily and efficiently, but the servers are not linked.
Segment your problem into smaller problems. That's the only way you're going to solve this.
Let's examine the problems.
You're inserting and/or updating existing data. At a database level, rows are packed into pages. Rarely is it an exact fit and there's usually some amount of free space left in a page. When you update a row, pretend the Name field went from "bob" to "Robert Michael Stuckenschneider III". That row needs more room to live, and while there's some room left on the page, there's not enough. Other rows might get shuffled down to the next page just to give this one some elbow room. That's going to cause lots of disk activity. Yes, it's inevitable given that you are adding more data, but it's important to understand how your data is going to grow and ensure your database itself is ready for that growth.
Maybe you have some non-clustered indexes on a target table. Disabling/dropping them should improve insert/update performance. If you still have your database and log set to grow at 10% or 1MB or whatever the default values are, the storage engine is going to spend all of its time trying to grow files and won't have time to actually write data.
Takeaway: ensure your system is poised to receive lots of data. Work with your DBA, LAN and SAN team(s).
You have tens of millions of rows in your OLTP system and hundreds of millions in your archive system. Starting with the OLTP data, you need to identify what does not exist in your historical system. Given your data volumes, I would plan for this package to hiccup in processing, so it needs to be "restartable." I would have a package with a data flow that selects only the business keys from the OLTP that are used to make a match against the target table. Write those keys into a table that lives on the OLTP server (ToBeTransfered).
Have a second package that uses a subset of those keys (N rows) joined back to the original table as the Source. It's wired directly to the Destination, so no lookup is required. That fat data row flows over the network only once. Then have an Execute SQL Task go in and delete the batch you just sent to the Archive server. This batching method allows you to run the second package on multiple servers. The SSIS team describes it better in their paper: We loaded 1TB in 30 minutes
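A rough T-SQL sketch of the batching half of that pattern (ToBeTransfered, MyTable, key1/key2, and the batch size of 10000 are illustrative; the missing-key detection itself stays in the SSIS Lookup, since the servers are not linked):
-- Source query of the second package: take the next batch of staged keys and
-- join them back to the OLTP table, so each fat row crosses the network once.
SELECT TOP (10000) m.*
FROM dbo.ToBeTransfered AS t
JOIN dbo.MyTable AS m
  ON m.key1 = t.key1 AND m.key2 = t.key2
ORDER BY t.key1, t.key2;
-- Execute SQL Task after a successful data flow: drop the batch just sent.
-- (Same TOP and ORDER BY, so it removes the same rows, assuming no concurrent inserts.)
DELETE t
FROM dbo.ToBeTransfered AS t
JOIN (SELECT TOP (10000) key1, key2
      FROM dbo.ToBeTransfered
      ORDER BY key1, key2) AS b
  ON b.key1 = t.key1 AND b.key2 = t.key2;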
Ensure the Lookup is a query of the form SELECT key1, key2 FROM MyTable. Better yet, can you provide a filter to the lookup, e.g. WHERE ProcessingYear = 2013, as there's no need to waste cache on 2012 if the OLTP only contains 2013 data?
You might need to modify your PacketSize on your Connection Manager and have a network person set up Jumbo frames.
Look at your queries. Are you getting good plans? Are your tables over-indexed? Remember, each index is going to result in an increase in the number of writes performed. If you can dump them and recreate after the processing is completed, you'll think your SAN admins bought you some FusionIO drives. I know I did when I dropped 14 NC indexes from a billion row table that only had 10 total columns.
If you're still having performance issues, establish a theoretical baseline (under ideal conditions that will never occur in the real world, I can push 1GB from A to B in N units of time) and work your way from there to what your actual is. You must have a limiting factor (IO, CPU, Memory or Network). Find the culprit and throw more money at it or restructure the solution until it's no longer the lagging metric.
Step 1. Incremental bulk import of the appropriate production data to the new server.
Ref: Importing Data from a Single Client (or Stream) into a Non-Empty Table
http://msdn.microsoft.com/en-us/library/ms177445(v=sql.105).aspx
Step 2. Use a MERGE statement to identify new/existing records and operate on them (sketched below).
I realize that it will take a significant amount of disk space on the new server, but the process would run faster.
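A hedged sketch of what the step-2 MERGE could look like, with illustrative table and column names:
MERGE dbo.OrderHistory AS target
USING dbo.OrderStaging AS source
   ON target.OrderID = source.OrderID
WHEN MATCHED THEN
  UPDATE SET target.Rate = source.Rate,
             target.Quantity = source.Quantity
WHEN NOT MATCHED BY TARGET THEN
  INSERT (OrderID, Rate, Quantity)
  VALUES (source.OrderID, source.Rate, source.Quantity);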

MySQL: Database Plan For maintaining EOD data of stocks

Broader View: Database plan to maintain Stock EOD data.
Tools at hand: I am planning to use MySQL 4.1+ along with PHP. A custom DBAL (based on mysqli) is implemented in PHP to work with MySQL. However, I am open to other database engines, provided they are free and work with SQL statements :P
Problem Domain: I need to plan out a database for my project to maintain EOD (end-of-day) data for stocks. Since the number of stocks maintained in the database is going to be huge, the EOD update for all of them is going to be a pretty heavy process at the end of each day.
Needless to say, I am on shared hosting and have to avoid MySQL performance bottlenecks, at least during the initial phase. I may move to a VPS later.
Questions:
1. Can a normalized schema take this heavy update process without creating performance issues?
2. Analysis based on popular indicators like MACD and CMF has to be done on the EOD data to spot trends in the stocks, and the analysis results also have to be stored for further reference. The analysis data will be calculated once the EOD data has been updated for the day. Is a normalized schema fine here as well, keeping performance in view? Also, I would need to fetch both the EOD and the analysis data quite often!
Elaboration:
1 INSERT statement (to insert EOD data) + 1 INSERT statement (to insert analysis data)
= 2 INSERT statements * 1500 stocks (for startup)
= 3000 INSERT statements done back to back!
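For a sense of what that load could look like in practice, here is a hedged sketch of batching each day's rows into one multi-row INSERT per table instead of 3000 single statements (the table and column names and the values are made up for illustration):
-- One batched statement for the EOD rows (and a similar one for the analysis rows),
-- ideally split into chunks of a few hundred rows each.
INSERT INTO eod_quote (stock_id, trade_date, open_price, high_price, low_price, close_price, volume)
VALUES
  (1, '2013-06-14', 12.10, 12.45, 11.98, 12.30, 1500000),
  (2, '2013-06-14', 45.00, 45.80, 44.60, 45.25,  820000),
  (3, '2013-06-14',  7.85,  8.02,  7.70,  7.95,  310000);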
I am further planning to add more stocks as the project grows, so I am looking here at scalability as well.
Although I don't know much about the concept of a DW (data warehouse, I've only heard of it), if it is more viable than OLTP from a performance point of view, I am ready to give it a shot.

How do I combine two MySQL databases from different points in time?

I recently switched to a new hosting provider for my application. My employees used the old site until the new site went live; however, the database backup from the old site was taken two days before the new site went live. So in the midst of transferring, records were being entered into the old site's database that the new site knows nothing about (hence my two-day time lag). How do I merge the two databases to reflect the changes?
A couple of things to note: the primary keys might be duplicated for some tables, and only a few tables have timestamps. I would do a 'diff' or something of the sort, but the tables are dumped in different formats.
Any thoughts?
This is something where you'll need to actually understand your database schema. You'll need to create a program that can look at both versions of the database, identify which records are shared, which are not, and which have conflicting primary keys (vs ones which were updated with the same keys). It then needs to copy over changes, possibly replacing the value of primary keys (including the values in other rows that refer to the row being renumbered!) This isn't easy, and it's not an exact science - you'll be writing heuristics, and expect to do some manual repairs as well.
Next time shut that database down when you grab the final backup :)
You don't need to create any additional programs. All you need is to set up replication from the old DB to the new one.
All your data from the old DB will automatically transfer to the new DB. During this period you should keep using your old DB as the main data source. As soon as all data has been copied to the new location, you just break the replica connection and change the DB address in your code (or the DNS pointer) to the new one.
1. oldDB ===> replication ===> newDB
   (R/W operations still go to oldDB)
2. oldDB ==/= break ==/= newDB
   (R/W operations now go to newDB)
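A minimal sketch of the "break" step in phase 2, run on the new DB once it has caught up (pre-8.0 command names; on MySQL 8.0.22+ these are SHOW REPLICA STATUS / STOP REPLICA / RESET REPLICA ALL):
SHOW SLAVE STATUS\G      -- wait until Seconds_Behind_Master reaches 0
STOP SLAVE;
RESET SLAVE ALL;         -- the new DB forgets the old master and becomes standalone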
MySQL Doc: 15.1.1. How to Set Up Replication