I set up Google Cloud MySQL; I store just one user there (email, password, address) and I query it quite often for testing purposes on my website. I set up minimal zone availability, the lowest SSD storage, 3.75 GB memory, 1 vCPU, and automatic backups disabled, but running that database for the last 6 days has cost me £15... How can I decrease the costs of having a MySQL database in the cloud? I'm pretty sure paying that amount is way too much. Where is my mistake?
I suggest using the Google Pricing Calculator to check the different configurations and pricing you could have for a MySQL database in Cloud SQL.
Choosing Instance type
As you've said in your question, you're currently using the lowest standard instance, which is based on CPU and memory pricing.
As you're currently using your database for testing purposes, I would suggest configuring it with the lowest Shared-Core machine type, which is db-f1-micro, as shown here. But note that:
The db-f1-micro and db-g1-small machine types are not included in the Cloud SQL SLA. These machine types are designed to provide low-cost test and development instances only. Do not use them for production instances.
Choosing Storage type
As you have already selected the lowest allowed disk space, you could lower costs by changing the storage type to HDD instead of SSD, if you haven't done so already, as stated in the documentation:
Choosing SSD, the default value, provides your instance with SSD storage. SSDs provide lower latency and higher data throughput. If you do not need high-performance access to your data, for example for long-term storage or rarely accessed data, you can reduce your costs by choosing HDD.
Note that the storage type can only be selected when you're creating the instance and cannot be changed later, as stated in the message shown when creating your instance:
Choice is permanent. Storage type affects performance.
Stop the instance when it is not in use
Finally, you could lower costs by stopping the database instance when it is not in use, as pointed out in the documentation:
Stopping an instance suspends instance charges. The instance data is unaffected, and charges for storage and IP addresses continue to apply.
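For reference, a minimal pair of gcloud commands to stop and restart an instance (assuming an instance named my-instance; setting the activation policy is the documented way to stop a Cloud SQL instance, but double-check the flags against your SDK version):

# Stop the instance (suspends instance charges; storage and IP charges continue)
gcloud sql instances patch my-instance --activation-policy=NEVER
# Start it again when you need it
gcloud sql instances patch my-instance --activation-policy=ALWAYS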
Using Google Pricing Calculator
The following information is presented as a calculation exercise based on the Google Pricing Calculator:
The estimated fees provided by Google Cloud Pricing Calculator are for discussion purposes only and are not binding on either you or Google. Your actual fees may be higher or lower than the estimate. A more detailed and specific list of fees will be provided at time of sign up
Following the suggestions above, you could get a monthly estimate of 6.41 GBP, based on an instance running 24 hours a day, 7 days a week.
Using an SSD instead, this increases to 7.01 GBP. As said before, the only way to change the storage type would be to create a new instance and load your data into it.
And it could drop to 2.04 GBP if you only run the instance 8 hours a day, 5 days a week, on HDD.
I'm looking for some general strategies for synchronizing data on a central server with client applications that are not always online.
In my particular case, I have an android phone application with an sqlite database and a PHP web application with a MySQL database.
Users will be able to add and edit information on the phone application and on the web application. I need to make sure that changes made one place are reflected everywhere even when the phone is not able to immediately communicate with the server.
I am not concerned with how to transfer data from the phone to the server or vice versa. I'm mentioning my particular technologies only because I cannot use, for example, the replication features available to MySQL.
I know that the client-server data synchronization problem has been around for a long, long time and would like information - articles, books, advice, etc - about patterns for handling the problem. I'd like to know about general strategies for dealing with synchronization to compare strengths, weaknesses and trade-offs.
The first thing you have to decide is a general policy about which side is considered "authoritative" in case of conflicting changes.
For example: suppose Record #125 is changed on the server on January 5th at 10pm, and the same record is changed on one of the phones (let's call it Client A) on January 5th at 11pm.
Last synch was on Jan 3rd. Then the user reconnects on, say, January 8th.
Identifying what needs to be changed is "easy" in the sense that both the client and the server know the date of the last synch, so anything created or updated (see below for more on this) since the last synch needs to be reconciled.
So, suppose that the only changed record is #125.
You either decide that one of the two automatically "wins" and overwrites the other, or you need to support a reconcile phase where a user can decide which version (server or client) is the correct one, overwriting the other.
This decision is extremely important and you must weigh the "role" of the clients, especially if there is a potential conflict not only between client and server, but also when different clients can change the same record(s).
[Assuming that #125 can be modified by a second client (Client B) there is a chance that Client B, which hasn't synched yet, will provide yet another version of the same record, making the previous conflict resolution moot]
Regarding the "created or updated" point above... how can you properly identify a record if it has been originated on one of the clients (assuming this makes sense in your problem domain)?
Let's suppose your app manages a list of business contacts. If Client A says you have to add a newly created John Smith, and the server has a John Smith created yesterday by Client D... do you create two records because you cannot be certain that they aren't different persons? Will you ask the user to reconcile this conflict too?
Do clients have "ownership" of a subset of data? I.e. if Client B is set up to be the "authority" on data for Area #5, can Client A modify/create records for Area #5 or not? (This would make some conflict resolution easier, but may prove unfeasible for your situation.)
To sum up, the main problems are:
How to define "identity" considering that detached clients may not have accessed the server before creating a new record.
The previous situation, no matter how sophisticated the solution, may result in data duplication, so you must foresee how to periodically resolve these duplicates and how to inform the clients that what they considered "Record #675" has actually been merged with/superseded by Record #543.
Decide whether conflicts will be resolved by fiat (e.g. "the server version always trumps the client's if the former has been updated since the last synch") or by manual intervention (see the sketch below).
In case of fiat, especially if you decide that the client takes precedence, you must also take care of how to deal with other, not-yet-synched clients that may have some more changes coming.
The previous items don't take into account the granularity of your data (in order to keep the description simple). Suffice it to say that instead of reasoning at the "Record" level, as in my example, you may find it more appropriate to record changes at the field level instead, or to work on a set of records (e.g. Person record + Address record + Contacts record) at a time, treating their aggregate as a sort of "Meta Record".
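To make the "fiat" option above concrete, here is a minimal sketch in Python of a last-write-wins reconciliation where the server keeps its record on ties (the Record shape and the in-memory "databases" are hypothetical placeholders; a real implementation would also need to handle deletes and the duplication problems described above):

from dataclasses import dataclass

@dataclass
class Record:
    id: str            # a client-generated UUID sidesteps the offline identity problem
    payload: dict
    updated_at: float  # seconds since epoch; assumes reasonably synchronized clocks

def reconcile(server_db: dict, client_changes: list) -> list:
    """Merge client changes into the server copy, last-write-wins.
    Returns the records the client must apply locally (where the server won)."""
    client_must_apply = []
    for rec in client_changes:
        current = server_db.get(rec.id)
        if current is None or rec.updated_at > current.updated_at:
            server_db[rec.id] = rec               # client wins: overwrite
        else:
            client_must_apply.append(current)     # server wins (or tie): push back
    return client_must_apply

Note that last-write-wins silently discards one side's edit; if that is unacceptable, collect the conflicting pairs instead and hand them to the user for manual reconciliation.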
Bibliography:
More on this, of course, on Wikipedia.
A simple synchronization algorithm by the author of Vdirsyncer
OBJC article on data synch
SyncML®: Synchronizing and Managing Your Mobile Data (Book on O'Reilly Safari)
Conflict-free Replicated Data Types
Optimistic Replication, Yasushi Saito (HP Laboratories) and Marc Shapiro (Microsoft Research Ltd.), ACM Computing Surveys, Vol. V, No. N, March 2005.
Alexander Traud, Juergen Nagler-Ihlein, Frank Kargl, and Michael Weber. 2008. Cyclic Data Synchronization through Reusing SyncML. In Proceedings of the The Ninth International Conference on Mobile Data Management (MDM '08). IEEE Computer Society, Washington, DC, USA, 165-172. DOI=10.1109/MDM.2008.10 http://dx.doi.org/10.1109/MDM.2008.10
Lam, F., Lam, N., and Wong, R. 2002. Efficient synchronization for mobile XML data. In Proceedings of the Eleventh International Conference on Information and Knowledge Management (McLean, Virginia, USA, November 04-09, 2002). CIKM '02. ACM, New York, NY, 153-160. DOI= http://doi.acm.org/10.1145/584792.584820
Cunha, P. R. and Maibaum, T. S. 1981. Resource = abstract data type + synchronization - A methodology for message oriented programming. In Proceedings of the 5th International Conference on Software Engineering (San Diego, California, United States, March 09-12, 1981). IEEE Press, Piscataway, NJ, 263-272.
(The last three are from the ACM digital library, no idea if you are a member or if you can get those through other channels).
From the Dr.Dobbs site:
Creating Apps with SQL Server CE and SQL RDA by Bill Wagner May 19, 2004 (Best practices for designing an application for both the desktop and mobile PC - Windows/.NET)
From arxiv.org:
A Conflict-Free Replicated JSON Datatype - the paper describes a JSON CRDT implementation (Conflict-free replicated datatypes - CRDTs - are a family of data structures that support concurrent modification and that guarantee convergence of such concurrent updates).
I would recommend having a timestamp column in every table, and every time you insert or update, updating the timestamp value of each affected row. Then you iterate over all tables, checking whether the timestamp is newer than the one you have in the destination database. If it's newer, you check whether you have to insert or update.
Observation 1: be aware of physical deletes. Since rows are deleted from the source db, you have to do the same at the server db. You can solve this by avoiding physical deletes, or by logging every delete in a table with timestamps, something like: DeletedRows = (id, table_name, pk_column, pk_column_value, timestamp). Then you read all the new rows of the DeletedRows table and execute a delete at the server using table_name, pk_column and pk_column_value (see the sketch below).
Observation 2: be aware of foreign keys. Inserting data into a table that's related to another table could fail, so you may need to deactivate foreign key checks during data synchronization.
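As a concrete illustration of the timestamp column plus the DeletedRows tombstone table, here is a minimal SQLite sketch in Python (table and column names are only examples):

import sqlite3, time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (
    id INTEGER PRIMARY KEY,
    name TEXT,
    updated_at REAL          -- bumped on every insert/update
);
CREATE TABLE deleted_rows (  -- tombstones instead of silent physical deletes
    id INTEGER PRIMARY KEY,
    table_name TEXT,
    pk_column TEXT,
    pk_value TEXT,
    deleted_at REAL
);
""")

def delete_contact(contact_id):
    """Physically delete, but log a tombstone so the other side can replay it."""
    conn.execute("DELETE FROM contacts WHERE id = ?", (contact_id,))
    conn.execute(
        "INSERT INTO deleted_rows (table_name, pk_column, pk_value, deleted_at)"
        " VALUES (?, ?, ?, ?)",
        ("contacts", "id", str(contact_id), time.time()))
    conn.commit()

def changes_since(last_sync):
    """Rows to upsert, plus tombstones to replay, on the destination database."""
    upserts = conn.execute(
        "SELECT * FROM contacts WHERE updated_at > ?", (last_sync,)).fetchall()
    deletes = conn.execute(
        "SELECT * FROM deleted_rows WHERE deleted_at > ?", (last_sync,)).fetchall()
    return upserts, deletes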
If anyone is dealing with a similar design issue and needs to synchronize changes across multiple Android devices, I recommend checking out Google Cloud Messaging for Android (GCM).
I am working on one solution where changes made on one client must be propagated to the other clients. I just implemented a proof-of-concept (server & client) and it works like a charm.
Basically, each client sends delta changes to the server. E.g. resource id ABCD1234 has changed from value 100 to 99.
Server validates these delta changes against its database and either approves the change (client is in sync) and updates its database or rejects the change (client is out of sync).
If the change is approved, the server notifies the other clients (excluding the one that sent the delta) via a GCM multicast message carrying the same delta change. Clients process this message and update their databases.
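As a sketch of what that server-side validation might look like (the message format and field names below are my assumptions, not anything defined by GCM):

def handle_delta(db, delta):
    """Approve a delta change only if the client's "old" value matches ours.
    delta example: {"id": "ABCD1234", "old": 100, "new": 99}
    Returns True if the client was in sync and the change was applied."""
    if db.get(delta["id"]) != delta["old"]:
        return False                 # client out of sync: reject, force a full sync
    db[delta["id"]] = delta["new"]   # approve and persist
    # ...then multicast the same delta to the other clients via GCM...
    return True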
The cool thing is that these changes are propagated almost instantly when the devices are online, and I don't need to implement any polling mechanism on the clients.
Keep in mind that if a device is offline too long and there are more than 100 messages waiting in the GCM queue for delivery, GCM will discard those messages and send a special message when the device gets back online. In that case the client must do a full sync with the server.
Check out this tutorial as well to get started with a GCM client implementation.
This answers developers who are using the Xamarin framework (see https://stackoverflow.com/questions/40156342/sync-online-offline-data).
A very simple way to achieve this with the Xamarin framework is to use Azure's Offline Data Sync, as it allows you to push and pull data from the server on demand. Read operations are done locally, and write operations are pushed on demand; if the network connection breaks, the write operations are queued until the connection is restored, then executed.
The implementation is rather simple:
1) create a Mobile app in azure portal (you can try it for free here https://tryappservice.azure.com/)
2) connect your client to the mobile app.
https://azure.microsoft.com/en-us/documentation/articles/app-service-mobile-xamarin-forms-get-started/
3) the code to set up your local repository:
const string path = "localrepository.db";
//Create our azure mobile app client
this.MobileService = new MobileServiceClient("the api address as setup on Mobile app services in azure");
//setup our local sqlite store and initialize a table
var repository = new MobileServiceSQLiteStore(path);
// initialize a Foo table
repository.DefineTable<Foo>();
// init repository synchronisation
await this.MobileService.SyncContext.InitializeAsync(repository);
var fooTable = this.MobileService.GetSyncTable<Foo>();
4) then push and pull your data to ensure you have the latest changes:
await this.MobileService.SyncContext.PushAsync();
await fooTable.PullAsync("allFoos", fooTable.CreateQuery());
https://azure.microsoft.com/en-us/documentation/articles/app-service-mobile-xamarin-forms-get-started-offline-data/
I suggest you also take a look at SymmetricDS. It is a SQLite replication library available for Android systems; you can use it to synchronize your client and server databases. I also suggest having separate databases on the server for each client: trying to hold the data of all users in one MySQL database is not always the best idea, especially if the user data is going to grow fast.
Let's call it the CUDR sync problem (I don't like CRUD, because Create/Update/Delete are writes and should be paired together).
The problem may also be looked at from a write-offline-first or write-online-first perspective. The write-offline-first approach has a problem with unique-identifier conflicts, and the multiple network calls for the same transaction increase risk (or cost)...
I personally find the write-online-first approach easier to manage (the server becomes the single source of truth, from which everything else is synced). This approach requires not letting users write offline first; they write to the local store only after getting an OK response from the online write.
The user may read offline first and, as soon as the network is available, fetch the data from online, update the local database, and then update the UI...
One way to avoid unique-identifier conflicts would be to use a combination of unique user id + table name (or table id) + row id (generated by SQLite), together with a synced boolean flag column (see the sketch below). Registration still has to be done online first, to get the unique user id on which all the other ids will be based. There is also an issue if clocks are not synced, as someone mentioned above...
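A minimal sketch of that composite-identifier idea (the exact format is just an example; the user id is assumed to have been assigned by the server at registration time):

def composite_id(user_id, table_name, local_row_id):
    """Globally unique id: server-assigned user id + table + local SQLite rowid.
    Because user_id comes from online registration, two offline clients
    can never mint the same composite id."""
    return f"{user_id}:{table_name}:{local_row_id}"

# e.g. composite_id(42, "contacts", 7) -> "42:contacts:7"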
Good morning, everyone,
As part of migrating a thick-client project to a client/server application (currently evaluating DataSnap XE10.2), in order to transfer and retrieve information from the server on an ad hoc basis, we would like some feedback on other available technologies, their durability, and their ease of adaptation.
Here is the profile of our application:
The client connects to a remote server that can be hosted elsewhere.
There can be up to 300 clients connected at the same time over a period of 3 days.
These 300 clients can send data at variable intervals (every 1 to 2 hours), and in different patterns depending on the time of day (different countries).
Each connection can transmit up to 5,000 records, so with 300 clients that is about 1,500,000 records over a period of one month.
For the moment we have chosen the DataSnap solution because it is already used in medical applications, especially for its ease of migration from a Delphi thick-client project to this type of architecture, and also for its longevity with Delphi.
Our questions: what do you think?
What arguments and intermediate or other solutions do you propose? As far as RAD Server is concerned, it has a cost per license, but do examples exist of migrating a DataSnap application to RAD Server?
What are your experiences in these different areas? (concrete case in point)
On our side we will run a simulation of 300 clients transmitting 5,000 JSON REST requests each to our DataSnap server, which will insert each of these requests into a 40 GB MySQL database; each insertion will return an acknowledgement of receipt and of the write (a simple boolean).
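For what it's worth, a minimal sketch of such a load test in Python (the endpoint URL and payload shape are placeholders for whatever the DataSnap REST server actually exposes):

import time
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://your-datasnap-server/datasnap/rest/insert"  # placeholder endpoint

def one_client(client_id):
    """Send 5000 JSON records and count the acknowledged inserts."""
    ok = 0
    for seq in range(5000):
        payload = {"client": client_id, "seq": seq, "ts": time.time()}
        r = requests.post(URL, json=payload, timeout=30)
        ok += 1 if r.ok and r.json().get("ack") is True else 0
    return ok

# 300 concurrent clients, as in the scenario described above
with ThreadPoolExecutor(max_workers=300) as pool:
    results = list(pool.map(one_client, range(300)))
print("acknowledged:", sum(results), "of", 300 * 5000)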
Thank you for your feedback; on our side, we will publish the results of our tests.
There are several solutions, but I recommend our Open Source mORMot framework.
Its SOA is based on interface type definitions, it is REST/JSON from the ground up, and it has been reported to have very good performance and stability, especially compared to DataSnap. It is Open Source and works with both Delphi and FPC (also under Linux), so it could be considered a safer solution for the middle/long term. DataSnap hasn't evolved much in years, and I don't understand the RAD Server "black box" approach.
About migrating an existing database or system, check this blog article which shows some basic steps with mORMot.
Other building blocks are available too, like an ORM, an MVC layer for dynamic web site generation, logging, interface stubbing, a high-performance database layer, cross-platform clients, exhaustive documentation, and a lot of other features.
I have a massive table that records events happening on our website. It has tens of millions of rows.
I've already tried adding indexing and other optimizations.
However, it's still very taxing on our server (even though we have quite a powerful one), and some large graph/chart queries take 20 seconds; so long, in fact, that our daemon often intervenes to kill them.
Currently we have a Google Compute instance on the frontend and a Google SQL instance on the backend.
So my question is this: is there some better way of storing and querying time series data using the Google Cloud?
I mean, do they have some specialist server or storage engine?
I need something I can connect to my php application.
Elasticsearch is awesome for time series data.
You can run it on compute engine, or they have a hosted version.
It is accessed via an HTTP JSON API, and there are several PHP clients (although I tend to make the API calls directly, as I find that a better way to understand the query language).
https://www.elastic.co
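For example, a typical time-series query is a date_histogram aggregation. Here is a minimal sketch using Python's requests library (the index name and field name are assumptions about your schema, and older Elasticsearch versions use "interval" instead of "calendar_interval"):

import requests

# Count events per day over the last week ("events" index and "@timestamp" field are examples)
query = {
    "size": 0,  # we only want the aggregation, not the raw hits
    "query": {"range": {"@timestamp": {"gte": "now-7d/d"}}},
    "aggs": {
        "events_per_day": {
            "date_histogram": {"field": "@timestamp", "calendar_interval": "day"}
        }
    },
}

resp = requests.post("http://localhost:9200/events/_search", json=query)
for bucket in resp.json()["aggregations"]["events_per_day"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])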
They also have an automated graphing interface for time series data. It's called Kibana.
Enjoy!!
Update: I missed the important part of the question "using the Google Cloud?" My answer does not use any specialized GC services or infrastructure.
I have used Elasticsearch for storing events and profiling information from a web site. I even wrote a statsd backend for storing stat information in Elasticsearch.
After Elasticsearch changed Kibana from version 3 to 4, I found the interface extremely bad for looking at stats. You can only chart one metric per query, so if you want to chart time, average time, and average time of the 90th percentile, you must do 3 queries instead of 1 that returns 3 values. (The same issue existed in version 3; version 4 just looked uglier and was more confusing to my users.)
My recommendation is to choose a time series database that is supported by Grafana, a time series charting front end. OpenTSDB stores information in a Hadoop-like format, so it will be able to scale out massively. Most of the others store events similar to row-based information.
For capturing statistics, you can either use statsd or Riemann (or Riemann and then statsd). Riemann can add alerting and monitoring before events are sent to your stats database; statsd merely collates, averages, and flushes stats to a DB.
http://docs.grafana.org/
https://github.com/markkimsal/statsd-elasticsearch-backend
https://github.com/etsy/statsd
http://riemann.io/
I have 3 BeagleBoards that need to share data with each other as fast as possible. They are running Debian (with a real-time kernel) and are connected to each other via WLAN.
All BeagleBoards have different sensors attached, and each needs the sensor data from the others in real time (or at least as fast as possible; the data is used in control algorithms for actuators).
The system is meant to demonstrate a concept and does not need to be 100% fault-tolerant, but it should be as close as possible.
What is the best way to design such a system?
Ideas:
Design the program around UDP broadcast, with some SQL server or just an object/class on the receiving end.
Embedded MySQL/High Performance MySQL with replication or cluster.
SQLite - would it need some add-ons?
Any other solutions might be better, I have never designed such a system before. Any help is much appreciated.
If "as fast as possible" is your requirement you need to do the sharing of data yourself and use the database just to store shared data.
You can implement a publisher/subscriber mechanism. One of your nodes becomes the master, and each of the other nodes subscribes to it at startup. The master node then duplicates and routes messages from the subscribers to everyone else.
Another (faster) option is implementing the publisher/subscriber mechanism without a master node: each node registers itself with the other nodes. This is similar to the broadcasting you mentioned (see the sketch below).
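As a starting point for the broadcast idea, here is a minimal UDP-broadcast sketch in Python (the port and JSON payload format are arbitrary choices; a real setup would add sequence numbers and sensor-specific encoding):

import json, socket

PORT = 5005  # arbitrary choice

def make_sender():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return s

def publish(sock, sensor_data):
    """Broadcast one sensor reading to every node on the WLAN segment."""
    sock.sendto(json.dumps(sensor_data).encode(), ("<broadcast>", PORT))

def listen():
    """Each node runs this to receive the other nodes' readings."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", PORT))
    while True:
        data, addr = s.recvfrom(4096)
        reading = json.loads(data)
        print(addr, reading)  # feed `reading` into the control algorithm here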