Can we use the Ethereum network just like a database to store data? What issues might occur if it is used as a database?
Yes, it's possible. Just write a smart contract to store and retrieve your data.
Google the term "Solidity CRUD" for articles and tutorials on storing data on Ethereum.
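As a rough illustration, once such a contract is deployed you can read and write it from Python with web3.py. Everything below -- the provider URL, contract address, ABI, and the store/retrieve function names -- is a placeholder for whatever your own contract actually exposes.

```python
# Minimal sketch of talking to a hypothetical "Solidity CRUD" contract with web3.py.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://mainnet.infura.io/v3/<project-id>"))  # placeholder node URL

CONTRACT_ADDRESS = "0x0000000000000000000000000000000000000000"  # placeholder address
ABI = [...]  # paste the ABI emitted when you compile your storage contract

contract = w3.eth.contract(address=CONTRACT_ADDRESS, abi=ABI)

# Writing costs gas, so it has to go out as a transaction from a funded account.
tx_hash = contract.functions.store("user:42", "some value").transact({"from": w3.eth.accounts[0]})
w3.eth.wait_for_transaction_receipt(tx_hash)

# Reading is a free call and creates no transaction.
print(contract.functions.retrieve("user:42").call())
```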
The downsides are:
Speed - Blockchains are slow to write and not particularly fast to read. Ethereum will never be able to compete with even a low-performance database like SQLite, much less Postgres, Oracle or MongoDB.
Cost - Reading from Ethereum is free but writes cost Ether. The exact cost depends on the size of the data you want to store. For small amounts of data this does not matter much. For services you can even make this part of the API so that your users pay for the writes (such as buying a ticket from you), so it doesn't cost you anything. But if you have gigabytes of legacy data, migrating it to the blockchain can be very expensive.
On top of that, large data transfers to the blockchain will make demand for transactions spike, which increases the cost per transaction. This is not just theoretical; it has happened before - when the CryptoKitties smart contract launched, the game suddenly became so popular that transactions went from less than one cent each to tens of dollars each (USD).
In general you'd want to store only the core data that needs to be secured on Ethereum and link it to other data sources (for example, store a URL and a hash of the object on-chain, but keep the object itself on Amazon S3 or Azure Storage).
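A minimal sketch of that pattern, assuming S3 via boto3 and invented bucket/key names: hash the object, store the bytes off-chain, and keep only the URL and hash for the on-chain record.

```python
# Sketch of the "anchor on-chain, store off-chain" pattern: the object lives in S3,
# while only its URL and SHA-256 hash would be written to the contract.
import hashlib
import boto3

def store_off_chain(payload: bytes, bucket: str, key: str) -> tuple[str, str]:
    digest = hashlib.sha256(payload).hexdigest()                    # fingerprint to anchor on-chain
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=payload)
    return f"https://{bucket}.s3.amazonaws.com/{key}", digest

url, digest = store_off_chain(b"large invoice PDF bytes...", "my-app-bucket", "invoices/42.pdf")
# Store (url, digest) on Ethereum; anyone can later download the object,
# re-hash it, and verify it has not been tampered with.
```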
I set up Google Cloud MySQL, where I store just one user (email, password, address), and I query it quite often for testing purposes on my website. I set up minimal zone availability, the lowest SSD storage, 3.75 GB memory, 1 vCPU, and automatic backups disabled, but running that database for the last 6 days has cost me £15... How can I decrease the cost of having a MySQL database in the cloud? I'm pretty sure paying that amount is way too much. Where is my mistake?
I suggest using the Google Pricing Calculator to check the different configurations and pricing you could have for a MySQL database in Cloud SQL.
Choosing Instance type
As you've said in your question, you're currently using the lowest standard instance, which is based on CPU and memory pricing.
As you're currently using your database for testing purposes, I would suggest configuring it with the lowest shared-core machine type, which is db-f1-micro, as shown here. But note that:
The db-f1-micro and db-g1-small machine types are not included in the Cloud SQL SLA. These machine types are designed to provide low-cost test and development instances only. Do not use them for production instances.
Choosing Storage type
As you have already selected the lowest allowed disk space, you could lower costs by changing the storage type to HDD instead of SSD if you haven't done so, as stated in the documentation:
Choosing SSD, the default value, provides your instance with SSD storage. SSDs provide lower latency and higher data throughput. If you do not need high-performance access to your data, for example for long-term storage or rarely accessed data, you can reduce your costs by choosing HDD.
Note that the storage type can only be selected when you're creating the instance and cannot be changed later, as stated in the message shown when creating your instance:
Choice is permanent. Storage type affects performance.
Stop the instance when it is not in use
Finally, you could lower costs by stopping the database instance when it is not in use, as noted in the documentation:
Stopping an instance suspends instance charges. The instance data is unaffected, and charges for storage and IP addresses continue to apply.
Using Google Pricing Calculator
The following information is presented as a calculation exercise based on the Google Pricing Calculator.
The estimated fees provided by Google Cloud Pricing Calculator are for discussion purposes only and are not binding on either you or Google. Your actual fees may be higher or lower than the estimate. A more detailed and specific list of fees will be provided at time of sign up
Following the suggestions above, you could get a monthly estimate of 6.41 GBP, based on an instance running 24 hours a day, 7 days a week.
Using an SSD, this increases to 7.01 GBP. As said before, the only way to change the storage type would be to create a new instance and load your data into it.
And this could drop to 2.04 GBP if you only run the instance 8 hours a day, 5 days a week, on HDD.
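A back-of-the-envelope sketch (not official pricing) of why the part-time schedule is so much cheaper: the instance-hour portion of the bill scales with uptime, while storage and IP charges continue even when the instance is stopped.

```python
# Rough illustration only: compare weekly uptime under the two schedules above.
always_on_hours = 24 * 7   # 168 hours per week
business_hours = 8 * 5     # 40 hours per week

print(business_hours / always_on_hours)   # ~0.24 -> roughly a quarter of the instance-hours
# Storage and IP charges keep accruing while the instance is stopped, which is
# why the 8h/5d estimate (2.04 GBP) is not simply a quarter of the 24/7 one (6.41 GBP).
```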
Are there any tools or a proper way to handle more than 2,000 requests per second (mostly write requests) to a MySQL database without reaching the queue limit?
There are a few different ways to handle massive amounts of requests to a MySQL (or any other relational/RDB) database. Starting out with growing traffic you can employ replication, which allows additional machines to serve read-only queries (no INSERTs, UPDATEs, DELETEs, etc.) while all writes go to a single "master" machine (the read replicas copy the written data from the master, or write-allowed instance, but may lag slightly behind the latest writes for a short period of time). Oracle (owner of the MySQL project) has a good article about it (and scaling PHP) here: http://www.oracle.com/technetwork/articles/dsl/white-php-part1-355135.html
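A minimal sketch of how application code typically splits traffic under this setup; hostnames and credentials are placeholders, and PyMySQL stands in for whichever MySQL client you actually use.

```python
# Sketch: send writes to the master, reads to a replica.
import pymysql

master = pymysql.connect(host="db-master.internal", user="app", password="...", database="shop")
replica = pymysql.connect(host="db-replica1.internal", user="app", password="...", database="shop")

def record_order(order_id: int, amount: float) -> None:
    with master.cursor() as cur:                  # writes always hit the master
        cur.execute("INSERT INTO orders (id, amount) VALUES (%s, %s)", (order_id, amount))
    master.commit()

def get_order(order_id: int):
    with replica.cursor() as cur:                 # reads can go to a replica
        cur.execute("SELECT id, amount FROM orders WHERE id = %s", (order_id,))
        return cur.fetchone()                     # may briefly lag behind the master
```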
Once your app begins taking on requests on a truly massive scale (like Facebook, Google, etc. level) you will want to consider other strategies such as clustering, utilizing NoSQL (for certain functions such as search, analytics, logging, monitoring, etc.), splitting tables and databases based on geographic regions (if it makes sense). There is a starter white paper here: https://www.mysql.com/why-mysql/white-papers/guide-to-scaling-web-databases-with-mysql-cluster/
You can also conduct generic searches for "scaling MySQL" which deliver even more results.
MariaDB 10+ comes with Galera Cluster, which allows you to have multiple master servers, and you can load balance either by IP or through a dedicated device.
Also, the number of requests/second you can sustain depends on how fast a write completes. If you have a simple atomic raw write, you can turn off indexes on the receiving table, so it's as fast as your server can handle. That raw table can be MyISAM rather than InnoDB, which is usually up to 10x faster for writes. Have another process read the raw data in bulk into another table with proper indexes. We've had success with up to 10K transactions/second this way.
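Roughly what that raw-ingest pattern might look like in practice; the table and column names are invented, and PyMySQL is just one possible client.

```python
# Sketch: an index-free MyISAM table absorbs raw writes, and a periodic job
# bulk-copies the rows into a properly indexed table.
import pymysql

conn = pymysql.connect(host="db-master.internal", user="app", password="...", database="shop")

with conn.cursor() as cur:
    # One-time setup: stop maintaining secondary indexes on the ingest table (MyISAM only).
    cur.execute("ALTER TABLE raw_events DISABLE KEYS")

def ingest(payload: str) -> None:
    with conn.cursor() as cur:
        cur.execute("INSERT INTO raw_events (payload) VALUES (%s)", (payload,))
    conn.commit()

def drain_to_indexed_table() -> None:
    # Periodic job: move the accumulated raw rows into the indexed table in bulk.
    with conn.cursor() as cur:
        cur.execute("INSERT INTO events_indexed (payload) SELECT payload FROM raw_events")
        cur.execute("TRUNCATE TABLE raw_events")
    conn.commit()
```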
We are building a SaaS backend for restaurants using Rails. We integrate directly with POS systems, so each POS keeps sending customer orders that we store for later processing. We have this POS integration running at about 1,000 locations, which send us about 3 million individual customer orders per month.
For this write-heavy app, we store all orders in Redis, which is working beautifully. We are growing at an incredible pace; we keep adding new restaurants with hundreds of locations that send us a huge amount of data. Except there is one problem: Redis keeps running out of memory every month, because everything, even data that doesn't have to be in memory, is in memory.
This is why we are contemplating a switch to MySQL, as we really don't need to keep all the data in memory. Here are the numbers for the current Redis database:
used_memory_human:39.83G
dbsize: 34706870
Here is what we store in Redis as a hash:
id - integer
location_id - integer
stored_at - timestamp
token - string
transaction_no - integer
menu_items - string (comma-separated list of all menu items the customer ordered, along with their price & quantity)
order_amount - decimal
order_subtotal_amount - decimal
order_amount_payable - decimal
order_datetime - timestamp
employee_id - integer
employee_name - string
pos_type - string
post_version - string
restaurant_id - integer
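For concreteness, an order like this could be written as a Redis hash roughly as follows; the field values are made-up sample data and redis-py is assumed as the client.

```python
# Sketch of one order stored as a Redis hash with redis-py.
import redis

r = redis.Redis(host="localhost", port=6379)

order = {
    "id": 1001,
    "location_id": 42,
    "token": "abc123",
    "transaction_no": 777,
    "menu_items": "burger:5.99:1,fries:2.49:2",
    "order_amount": "14.97",
    "order_datetime": "2015-06-01T12:34:56Z",
    "employee_id": 7,
    "restaurant_id": 3,
}
r.hset("order:1001", mapping=order)   # one hash per order
print(r.hgetall("order:1001"))
```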
So, looking for some advice on:
Is moving from Redis to MySQL a good idea? How will it affect us in the long run, given that we will need to keep updating our indexes & partition scheme to cater to huge demand?
What other databases (relational or non-relational) would be better suited to this use case than Redis?
Or are we wrong altogether, and Redis is made for storing exactly this type of data, so we should just keep using Redis and keep upgrading our machines every month?
Data on the web is bound to grow. Any long-term project should anticipate this, and have a strategy for scaling.
As your volume of data or volume of traffic increases, you will find that approximately every order of magnitude growth requires changes to your architecture to handle it. Maybe you can be ahead of the curve a bit, but not forever. And you can't predict where your bottlenecks will be very far in advance.
It's common for a small subset of your data to be important for minute-to-minute work of your app, and you can keep this subset in Redis to take advantage of your current code. Then the rest of the data can be available in another data store, perhaps a bit slower to access, but much easier to handle growth.
You could scrap your current code and move everything to MySQL or another datastore, but keep two things in mind:
There is no database that will allow you to neglect having a scaling strategy. You could use MySQL, or PostgreSQL, or MongoDB, or Hadoop, or anything else, and you will still have the problem that your data is growing faster than a single database on a single server can handle.
It's generally not cost-effective to rewrite your app from the ground up for internal reasons of more efficient development or operations (read Things You Should Never Do, Part I by Joel Spolsky).
I'd recommend keeping your Redis app, but try to move historical data to another datastore.
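One way such an archival pass could look, as a sketch: a periodic job that copies orders older than some cutoff from Redis into MySQL and then deletes them from Redis. The key layout, table name, and columns are assumptions for illustration.

```python
# Sketch of moving "cold" orders out of Redis into a MySQL archive table.
import datetime
import pymysql
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
db = pymysql.connect(host="db-master.internal", user="app", password="...", database="orders")

CUTOFF = datetime.datetime.utcnow() - datetime.timedelta(days=30)

def archive_cold_orders() -> None:
    for key in r.scan_iter(match="order:*"):      # SCAN iterates without blocking Redis
        order = r.hgetall(key)
        placed = datetime.datetime.fromisoformat(order["order_datetime"].rstrip("Z"))
        if placed >= CUTOFF:
            continue                              # still "hot": leave it in Redis
        with db.cursor() as cur:
            cur.execute(
                "INSERT INTO orders_archive (id, location_id, order_amount, order_datetime) "
                "VALUES (%s, %s, %s, %s)",
                (order["id"], order["location_id"], order["order_amount"], order["order_datetime"]),
            )
        db.commit()
        r.delete(key)                             # reclaim memory once the row is safely archived
```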
I think MySQL is a fine choice, and I'm sure it would be capable of handling your data. I work with clients regularly who keep terabytes of data in MySQL and handle tens of thousands of transactions per second. But since you haven't given any details about your usage of the data, I can't offer an opinion about whether MySQL is the best choice. It could be that Hadoop would have advantages, for example.
Is moving from Redis to MySQL a good idea? How will it affect us in the long run, given that we will need to keep updating our indexes & partition scheme to cater to huge demand?
My vote is that moving off of Redis is probably a good idea if you're concerned about hosting costs, given the necessity of keeping all data in memory. This doesn't have to involve moving all the data off of Redis; perhaps just move the historical "colder" data where you care less about latency. The other advantage of moving the cold data off Redis is that any bugs found during the migration are likely to have a less significant impact.
What other databases (relational or non-relational) would be better suited to this use case than Redis?
This is a tough question to answer without better understanding your use case. That said I think any number of scalable relational DBs are probably good enough for your workload. A key requirement in my mind would be the ability to easily add/remove machines to scale as needed. A personal favorite is CitusDB but there are various options.
One trade-off to be aware of when moving to a relational database is that you'll potentially have more work to do when managing structured data than you would with Redis' key/value store. For example, adding new fields could involve schema changes. PostgreSQL (and CitusDB) have support for some semi-structured data types which make this easier, and I'm sure there are other relational databases with similar features.
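A sketch of that semi-structured escape hatch, assuming PostgreSQL's JSONB type via psycopg2: keep stable fields as ordinary columns and push rarely-queried or still-evolving fields into a JSONB column, so adding a field doesn't require a schema change. All names here are illustrative.

```python
# Sketch: fixed columns for the stable fields, a JSONB column for everything else.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect(host="localhost", dbname="orders", user="app", password="...")

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id           bigint PRIMARY KEY,
            location_id  integer NOT NULL,
            order_amount numeric NOT NULL,
            extras       jsonb            -- pos_type, employee_name, any future fields, ...
        )
    """)
    cur.execute(
        "INSERT INTO orders (id, location_id, order_amount, extras) VALUES (%s, %s, %s, %s)",
        (1001, 42, 14.97, Json({"pos_type": "acme", "employee_name": "Sam"})),
    )
conn.commit()
```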
If MySQL (or any other traditional database) would suffice, why did you go for Redis in the first place?
"we store for later processing" is vague. Can you please elaborate on this? I assume, this later processing is an Analysis kind of activity for which latency doesn't really matter and only throughput matters, right? If that's the case Redis was an overkill don't you think?
Have you considered compressing the data before dumping it into Redis?
From what I understood from your question, your data is always structured, your reads are not real-time, and durability matters to you more than latency. If all of these assumptions are correct, MySQL is a safe choice. If you ever hit a write bottleneck you can think about sharding, as in the sketch below.
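For the sharding case, a minimal sketch of hash-based routing: each order goes to a shard chosen from its location_id. The hostnames are placeholders and PyMySQL is an assumed client.

```python
# Sketch: route each order to a MySQL shard based on its location_id.
import pymysql

SHARD_HOSTS = ["mysql-shard-0.internal", "mysql-shard-1.internal", "mysql-shard-2.internal"]

def connect_to_shard(location_id: int):
    host = SHARD_HOSTS[location_id % len(SHARD_HOSTS)]   # stable location -> shard mapping
    return pymysql.connect(host=host, user="app", password="...", database="orders")

def insert_order(order: dict) -> None:
    conn = connect_to_shard(order["location_id"])
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO orders (id, location_id, order_amount) VALUES (%s, %s, %s)",
                (order["id"], order["location_id"], order["order_amount"]),
            )
        conn.commit()
    finally:
        conn.close()
```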
This thread will give you a fair idea.
Can redis fully replace mysql?
Always keep in mind that most NoSQL solutions (including Redis) are fast because they trade ACID properties for speed. But in your case, from what I understood, ACID properties matter more.
With the upcoming 3.0 release of Redis, the cluster functionality will be ready for production. Have a look at http://redis.io/topics/cluster-tutorial to get an overview. This will not directly help with the growing data volumes, but I assume it could make scaling/sharding easier for your setup.
What you also could consider is to move "old" data from Redis to another system, for example ElasticSearch with the help of a Redis River:
https://github.com/leeadkins/elasticsearch-redis-river
Compression using MessagePack could also be an option:
http://msgpack.org/
http://ruby.msgpack.org/
Storing a MessagePacked hash in Redis
http://redis.io/commands/EVAL
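A sketch of the MessagePack idea, assuming the msgpack and redis-py packages: serialize the whole order and store it as a single packed value instead of a Redis hash with one field per attribute. This usually shrinks the memory footprint, at the cost of losing per-field access from within Redis.

```python
# Sketch: store a MessagePack-encoded order under a plain key.
import msgpack
import redis

r = redis.Redis(host="localhost", port=6379)

order = {"id": 1001, "location_id": 42, "order_amount": "14.97", "menu_items": "burger:5.99:1"}

r.set("order:1001", msgpack.packb(order))        # packed binary blob
restored = msgpack.unpackb(r.get("order:1001"))  # back to a dict
print(restored)
```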
What performance considerations should I keep in mind when planning an SQL Azure application? Azure Storage and the worker and web roles look very scalable, but if in the end they are all using one database... it looks like the bottleneck.
I was trying to find numbers about:
How many concurrent connections does SQL Azure support?
What is the bandwidth?
But no luck.
For example, I'm planning an application with a very high rate of inserts, but I need to return the result of an aggregate function each time (e.g. the sum of all records with the same key in a column), so I cannot go with Table Storage.
Batching is an option, but response time is critical as well, so I'm afraid the database will be bloated with lots of connections.
Sharding is another option, but even though the number of inserts is massive, the amount of data is very small: 4 to 6 columns with one PK and no FK. So even a 1 GB DB would be overkill (and an overpay :D) for a partition.
What performance factors should I keep in mind when facing this kind of application?
Cheers.
Achieving both scalability and performance can be very difficult, even in the cloud. Your question was primarily about scalability, so you may want to design your application in such a way that your data becomes "eventually" consistent, using queues for example. A worker role would listen for incoming insert requests and would perform the insert asynchronously.
To minimize the number of roundtrips to the database and optimize connection pooling make sure to batch your inserts as well. So you could send 100 inserts in one shot. Also keep in mind that SQL Azure now supports MARS (multiple active recordsets) so that you can return multiple SELECTs in a single batch back to the calling code. The use of batching and MARS should reduce the number of database connections to a minimum.
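A sketch of what batched inserts might look like from Python, with pyodbc standing in for whichever data access layer you actually use; the connection string values are placeholders, and fast_executemany simply asks the driver to ship the whole batch in one round trip.

```python
# Sketch: send ~100 inserts to SQL Azure in a single batch.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;Uid=appuser;Pwd=...;Encrypt=yes;"
)

rows = [(i, "key-%d" % (i % 10), i * 1.5) for i in range(100)]   # 100 rows in one shot

cursor = conn.cursor()
cursor.fast_executemany = True
cursor.executemany("INSERT INTO readings (id, k, val) VALUES (?, ?, ?)", rows)
conn.commit()
```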
Sharding usually helps for Read operations; not so much for inserts (although I never benchmarked inserts with sharding). So I am not sure sharding will help you that much for your requirements.
Remember that the Azure offering is designed first for scalability and reasonable performance in a multitenancy environment, where your database is shared with others on the same server. So if you need strong performance with guaranteed response time you may need to reevaluate your hosting choices or indeed test the performance boundaries of Azure for your needs as suggested by tijmenvdk.
SQL Azure will throttle your connections if any form of resource contention occurs (this includes heavy load but might also occur when your database is physically moved around). Throttling is non-deterministic, meaning that you cannot predict if and when this happens. When throttling, SQL Azure will drop your connection, requiring you to perform a retry. Number of connections supported and bandwidth is not published "by design" due to the flexible nature of the underlying infrastructure. Having said that, the setup is optimized for high availability, not high throughput.
If the bursts happen at a known time, you might consider sharding just during those bursts and consolidating the data after the burst has happened. Another way to handle this, is to start queueing/batching writes if and only if throttling occurs. You can use an Azure Queue for that plus a worker role to empty the queue later. This "overflow mechanism" has the advantage of automatically engaging if throttling occurs.
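A sketch of that overflow mechanism, assuming the azure-storage-queue Python package and a placeholder connection string: the normal path writes straight to the database, and only a throttled write falls back to the queue for a worker to drain later.

```python
# Sketch: queue writes only when the database throttles them; a worker drains the queue later.
import json
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string("<storage-connection-string>", "insert-overflow")

def save_or_defer(row: dict, try_insert) -> None:
    try:
        try_insert(row)                          # normal path: write straight to SQL Azure
    except Exception:                            # e.g. a throttling error / dropped connection
        queue.send_message(json.dumps(row))      # overflow path: park it on the queue

def drain_queue(try_insert) -> None:
    for msg in queue.receive_messages():         # worker role empties the queue later
        try_insert(json.loads(msg.content))
        queue.delete_message(msg)
```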
As an alternative you could use Azure Table Storage and keep a separate table of running totals that you can report back instead of performing an aggregation over the data to return the required sum of all records (this might be tricky due to the lack of locking on the tables though).
Apologies for stating the obvious, but the first step would be to test if you run into throttling at all in your scenario. I would give the overflow solution a try.
INFORMIX-SQL 7.32 (SE) Linux Pawnshop App.
I have some users who own several pawnshops within a 100-mile radius. Each pawnshop app runs with SE. The only functionality these owners need is the ability to remotely log in to any store in order to view transactions and running totals, and to consolidate daily totals at the end of the business day. This can be accomplished with dial-up modems, as the app doesn't have any need to display BLOBs. At end of day, each store's totals are unloaded to a flat file and transferred to the owner's system.
What would my owners gain by converting to distributed DBs?.. The ability to find out if a store's customer has conducted business in another store, or if another store has a desired inventory item for sale? (Not important; it seldom happens.) Most customers will usually do business with the same store, and if it doesn't have a desired item for sale, they will visit the closest competitor's pawnshop. What gains would distributed DBs offer to accomplish the same functionality as described in the first paragraph?.. Pawnshop owners absolutely refuse to connect their production systems via the internet! They don't trust its security, even using VPN, Cisco, etc., or its reliability! In this part of the world, ISPs have a bad track record for uptime. I know of several apps which have converted from web to dial-up because of comm problems!
Distributed DBs, more precisely Informix XPS and IDS, don't have just one advantage. If you care just about getting data from different places, you can accomplish it with just a design strategy. If you add a "branch_id", or something like that, you're done.
Distributed DBs have a lot of advantages, from availability to scalability. You must review all these things first.
Sorry for this kind of answer, but it is really difficult to give you a straight answer on this topic.
CouchDB is a peer based distributed database system. Any number of CouchDB hosts (servers and offline-clients) can have independent “replica copies” of the same database, where applications have full database interactivity (query, add, edit, delete). When back online or on a schedule, database changes are replicated bi-directionally.
CouchDB has built-in conflict detection and management and the replication process is incremental and fast, copying only documents and individual fields changed since the previous replication. Most applications require no special planning to take advantage of distributed updates and replication.
Unlike cumbersome attempts to bolt distributed features on top of the same legacy models and databases, it is the result of careful ground-up design, engineering and integration. The document, view, security and replication models, the special purpose query language, the efficient and robust disk layout are all carefully integrated for a reliable and efficient system.
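As a sketch of what triggering that replication can look like, using CouchDB's HTTP `_replicate` endpoint from Python with the requests library; the hostnames and database names are invented.

```python
# Sketch: kick off replication in both directions between a store and head office.
import requests

def replicate(source: str, target: str) -> None:
    resp = requests.post(
        "http://hq-server:5984/_replicate",
        json={"source": source, "target": target, "create_target": True},
    )
    resp.raise_for_status()

replicate("http://store1:5984/pawnshop", "http://hq-server:5984/pawnshop")
replicate("http://hq-server:5984/pawnshop", "http://store1:5984/pawnshop")
```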
If you are not going to have general 90%+ uptime connection between the databases, then there isn't any benefit to distributed databases.
One main benefit is to give large businesses a 'failover' when one machine goes down or is unavailable. If they have the database distributed over three or four machines, then the loss of one doesn't impact their ability to do business.
A second major benefit is when a database is simply too big for one server to cope with. 'Internet scale' databases (Amazon, Twitter, etc) have that level of traffic. Walmart would have that level of traffic. A couple of storefront operations wouldn't.
I think that this is a context where there is little to gain from distributed database operation.
If you were to go towards distributed operation, I'd probably look towards using a simple ER topology, with the 'head office' store being the primary (root) node and the other shops being leaf nodes. You would then have changes to the individual store databases replicated to the HQ node; you might or might not also propagate the data back to the other stores. Especially with just two stores, you might in fact simply replicate all the information to both stores; this gives you an automatic off-site backup of the database. (You'd probably configure all nodes as root nodes in this case - at least, until a chain grew to, say, five or six nodes.)
This would give you some resiliency for disaster recovery. It would also allow the HQ (in particular) to see what is going on at each store.
My impression is that you are probably not discussing 'transactions per second' on average; the rate of transactions at a single store is probably a few transactions per minute, with 'few' possibly being less than one TPM. Consequently, the network bandwidth is unlikely to be a bottleneck at any point, even with dial-up speeds (though that might be borderline).