Pros and Cons of the MySQL Archive Storage Engine? - mysql

For a website I am developing, I am going to store user activity information (like Facebook) and HTTP requests to the server into a MySQL database.
I was going to use InnoDB as the storage engine, however, I briefly read about the Archive storage engine and this sounds like the way forward for information which is only Inserted or Selected and never Updated or Deleted.
Before I proceed, I would like to know any pros and cons to help me decide whether to use it or not.

We are not currently using the Archive storage engine, but have done some research on it. One requirement of our application is that it stores lots of historical information for long periods of time. That data is never updated, but needs to be selected occasionally.
So ... with that in mind:
PROs
Very small footprint. Uses zlib for data compression.
More efficient/quicker backups due to small data file size.
CONs
Does not allow indexes on any column (apart from the AUTO_INCREMENT column); SELECTs require a full table scan.
Does not support spatial data.
--
It is likely that we will not use this storage engine due to the indexing limitations.
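For reference, choosing ARCHIVE is just a table option at creation time. A minimal sketch of the kind of append-only log the asker describes (table and column names are illustrative, not from the question):

```sql
-- Hypothetical activity log on the ARCHIVE engine: rows are zlib-compressed,
-- and only INSERT and SELECT are supported (UPDATE/DELETE statements fail).
CREATE TABLE user_activity_log (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    INT UNSIGNED NOT NULL,
    action     VARCHAR(64)  NOT NULL,
    created_at DATETIME     NOT NULL,
    KEY (id)   -- ARCHIVE permits an index only on the AUTO_INCREMENT column
) ENGINE=ARCHIVE;

INSERT INTO user_activity_log (user_id, action, created_at)
VALUES (42, 'page_view', NOW());

-- Any SELECT filtering on a non-indexed column is a full table scan:
SELECT COUNT(*) FROM user_activity_log WHERE user_id = 42;
```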

Related

Should I switch to InnoDB for my tables

I have a PHP-based API that runs on shared hosting and uses MySQL. I've been reading about InnoDB vs MyISAM and wanted to paste some specifics about my API's database to make sure it makes sense to move to InnoDB. MyISAM was set by default for these tables, so I didn't deliberately pick that storage engine.
My workload is somewhat more writes than reads (roughly 70% writes, I'd say). Reads/lookups are always by a "foreign key" (userid). I understand MyISAM doesn't enforce these constraints, but it might be good to know that, since I could take advantage of them if I move.
I don't do full text searches
My data is important to me, and I recently learned MyISAM carries a risk of losing data. A few times in the past I've lost some data and just assumed it was my users' fault in how they interacted with the API. Perhaps not? I am confused about how anyone would be OK with losing data and still choose MyISAM, so perhaps I don't understand MyISAM well enough.
I'm on a shared host and they confirmed I don't have access to change settings in my.cnf, change buffers, threading, concurrency settings, etc.
I will probably switch to DigitalOcean or AWS in the future
My hosting company runs MySQL 5.6.34 (client version 14.14).
Based on these factors, my instinct is to switch all my tables to InnoDB and at least see if there are problems. If I hit an issue, I can just run the same statement but swap InnoDB with MyISAM to revert back.
Thanks so much.
Short answer: YES! MyISAM was MySQL's original storage engine, but InnoDB has been the preferred choice for many years, for many reasons. At a high level, your app will perform better because InnoDB has better lock management (row-level rather than table-level locking).
You can find a longer answer to your question here: Should I change my DB from MyISAM to InnoDB? (AWS notification). The following two articles cover migration from MyISAM to InnoDB:
https://dba.stackexchange.com/questions/167842/can-converting-myisam-to-innodb-cause-problems
https://kinsta.com/knowledgebase/convert-myisam-to-innodb/
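The conversion (and the revert the asker mentions) is a single statement per table; a sketch with a placeholder table name:

```sql
-- Convert one table to InnoDB. This rebuilds the table by copying it,
-- so writes are blocked for the duration on older MySQL versions.
ALTER TABLE my_api_table ENGINE=InnoDB;

-- Revert if something goes wrong:
ALTER TABLE my_api_table ENGINE=MyISAM;

-- List which tables in the current schema still use MyISAM:
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND ENGINE = 'MyISAM';
```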

Using Spider for mysql

I'm looking for solution to shard my data in mysql w/o changing application code and this project shows up pretty deep in google search result.
While there's not much documentation available about it, this seems to be a promising out-of-the-box solution for sharding your data across many databases.
This is their project description, spider for mysql:
The Spider storage engine enables tables on different MySQL instances to be treated like tables on a single instance. Because XA transactions and partitioning are supported, the data of one table can be distributed across two or more servers. The engine's main design goals are:
1: Scalability
2: Faster access
3: Data synchronization
4: Reduced cost
It's still quite an active project (it currently supports MySQL 5.5.14), but I don't see many results about it in search engines. Can anyone tell me why?
Since I don't have much knowledge in this field to assess it, I want to ask about the advantages and disadvantages of this kind of approach. Is the Spider server a single point of failure (SPOF)?
Can I run multiple Spider servers, and if so, does that affect transaction committing?
I need to consider this approach before making a decision to switch to MongoDB.
My application is a write intensive app (a social network project).
And it really needs perfect horizontal scaling in the future.
I'm really curious about Spider too...
As I understand it, your Spider server is just a kind of 'SQL router'. You define sharding rules with partition comments, and the server forwards queries to, and aggregates data from, the different shards.
Logically it seems to be a SPOF... but since Spider doesn't store any data, you could clone your Spider server as many times as you want to avoid the SPOF. You just have to keep all your Spider instances in sync...
Maybe you could do that with a replication scheme that keeps the Spider configuration synchronized...
As I said, I have never used this promising engine, but I'm very curious, and I hope you'll report back if you decide to use it.
Regards
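To make the 'SQL router' idea concrete: based on Spider's documentation, a setup looks roughly like the sketch below. The server names, hosts, credentials, and table are all made up for illustration, and the exact COMMENT syntax depends on the Spider version you install:

```sql
-- Register the remote data nodes (hosts and credentials are placeholders):
CREATE SERVER backend1 FOREIGN DATA WRAPPER mysql
    OPTIONS (HOST '10.0.0.1', DATABASE 'app', USER 'spider', PASSWORD 'secret');
CREATE SERVER backend2 FOREIGN DATA WRAPPER mysql
    OPTIONS (HOST '10.0.0.2', DATABASE 'app', USER 'spider', PASSWORD 'secret');

-- A Spider table whose partitions live on different backends. The Spider
-- node stores no row data; it routes queries by the partition rule.
CREATE TABLE messages (
    id      BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    body    TEXT,
    PRIMARY KEY (id, user_id)  -- partition column must be in every unique key
) ENGINE=SPIDER
  COMMENT='wrapper "mysql", table "messages"'
PARTITION BY KEY (user_id) (
    PARTITION p0 COMMENT = 'srv "backend1"',
    PARTITION p1 COMMENT = 'srv "backend2"'
);
```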

MySQL - Best way to keep database synchronized with multiple clients

There are hundreds of clients that will use a Windows application in which they manage their own information. That information is sent to a server for analysis (a web system).
Both the clients and the server will have the same database structure. Some tables will be updated through the Windows clients and other tables through the web system. A few of the tables can be updated through both.
The problem is that we need to keep those databases synchronized.
The current approach is to store all the INSERT/UPDATE/DELETE statements in a table and, once a day, update the clients (on open) and the server (on close).
I don't like the current approach, as I think it is not very secure (even though we are using strong encryption on the data), and I believe there is a better way to do it.
I just migrated to use MariaDB and I was reading this about the Aria Storage Engine:
Aria can replay almost everything from the log (including create/drop/rename/truncate tables). Therefore, you can make a backup of Aria by just copying the log.
Questions:
Do you think it is possible to use Aria logging to solve my problem (and how is that different from the current approach)?
Does anyone have experience with a similar case (multi-client synchronization)?
I only have experience with MyISAM and InnoDB... is there any other storage engine that would be better for this case?
### UPDATE (Jul 2nd) ###
I explored the possibility to use MySQL binlogs and its automatic replication. IMHO, that method is ideal for those cases in which the server data is exactly the same as in the clients. BLACKHOLE storage engine can be used to synchronize only determined tables (which may be very useful). In my case, the server and the clients have the same structure BUT they don't have the same data. In other words, the server contains all client's data, and each client has their own set of data (keeping the same structure).
Applying MySQL's automatic replication would require the server to have one database per client, which makes it more complicated.
I think I will stick to the original plan as it gives me the flexibility to easily query the changes per client (until I found a nicer way to do it).
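For what it's worth, the original plan ("store all the statements in a table") can be sketched like this; the table and column names are purely illustrative, not from the question:

```sql
-- One row per captured change, tagged with the originating client so the
-- server can replay, or query, the changes of a single client.
CREATE TABLE change_log (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    client_id   INT UNSIGNED    NOT NULL,
    executed_at DATETIME        NOT NULL,
    statement   TEXT            NOT NULL,   -- the INSERT/UPDATE/DELETE to replay
    applied     TINYINT(1)      NOT NULL DEFAULT 0,
    KEY (client_id, applied)
) ENGINE=InnoDB;

-- During the daily sync, fetch one client's pending statements, replay
-- them, then mark them as applied:
SELECT id, statement FROM change_log WHERE client_id = 7 AND applied = 0;
UPDATE change_log SET applied = 1 WHERE client_id = 7 AND applied = 0;
```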

MySQL MyISAM data loss possibilities?

Many sites and scripts still use MySQL instead of PostgreSQL. I have a couple of low-priority blogs and such that I don't want to migrate to another database, so I'm using MySQL.
Here's the problem: they're on a low-memory VPS. This means I can't enable InnoDB, since it uses about 80MB of memory just to be loaded. So I have to risk running MyISAM.
With that in mind, what kind of data loss am I looking at with MyISAM? If there was a power-outage as someone was saving a blog post, would I just lose that post, or the whole database?
On these low-end-boxes I'm fine with losing some recent comments or a blog post as long as the whole database isn't lost.
MyISAM isn't ACID compliant and therefore lacks durability. It really depends on what costs more: the memory to run InnoDB, or downtime. MyISAM is certainly a viable option, but what does your application require from the database layer? Using MyISAM can make life harder due to its limitations, but in certain scenarios it can be fine. Relying only on logical mysqldump backups will interrupt your service because of their locking behaviour. If you enable binary logging, you can back up the binlogs to get incremental backups that can be replayed to aid recovery should a MyISAM table become corrupted.
You might find the following MySQL Performance article of interest:
For me it is not only about table locks. Table locks are only one of the MyISAM limitations you need to consider when using it in production. Especially if you're coming from "traditional" databases, you're likely to be shocked by MyISAM behavior (and default MySQL behavior because of it): it will be corrupted by an improper shutdown, it will fail with partial statement execution if certain errors are discovered, etc.
http://www.mysqlperformanceblog.com/2006/06/17/using-myisam-in-production/
The MySQL manual points out the types of events that can corrupt your table and there is an article explaining how to use myisamchk to repair tables. You can even issue a query to fix it.
REPAIR TABLE tbl_name;
However, there is no information about whether some types of crashes might be "unfix-able". That is the type of data loss that I can't allow even if I'm doing backups.
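As a sketch of that repair workflow (the table name is a placeholder):

```sql
-- Check a suspect MyISAM table, then repair it if needed. Note that
-- REPAIR may discard rows it cannot recover, so back up first.
CHECK TABLE blog_posts;
REPAIR TABLE blog_posts;

-- A slower, more thorough repair that rebuilds the indexes row by row:
REPAIR TABLE blog_posts EXTENDED;
```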
With a server crash, your AUTO_INCREMENT primary key can get corrupted, so your blog post IDs can jump from 122, 123 to 75912371234, 75912371235 (where the server crashed after 123). I've seen it happen, and it's not pretty.
You could always get another host on the same VLAN that is slaved to your database as a backup, this would reduce the risk considerably. I believe the only other options you have are:
Get more RAM for your server or kill off some services.
See if your host has shared database hosting of any kind on the VLAN you can use for a small fee.
Make regular backups and be prepared for the worst.
In my humble opinion, there is no kind of data loss with MyISAM.
The risk of data loss from a power outage is due to the power outage, not the database storage mechanism.

Which database for a web crawler, and how do I use MySQL in a distributed environment?

Which database engine should I use for a web crawler, InnoDB or MyISAM? I have two PCs, each with a 1TB hard drive. If one fills up, I'd like writes to go to the other PC automatically, but reads should go to the correct PC; how do I do that?
As for the first part of your question, it rather depends on your precise implementation. If you are going to have a single crawler limited by network bandwidth, then MyISAM can be quicker. If you are using multiple crawlers, then InnoDB will give you advantages such as transactions, which may help.
AFAIK MySQL doesn't support the hardware configuration you are suggesting. If you need large storage, you may want to look at MySQL Cluster.
MyISAM is the first choice, because you will have write-only operations, and the crawlers (even when run in parallel) will presumably be configured to crawl different domains/URLs, so you do not need to worry about access conflicts.
When writing a lot of data (especially text!) to MySQL, avoid transactions, indexes, etc., because they will slow MySQL down drastically.
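Following that advice, a minimal sketch of a write-optimized crawl table (the table and column names are illustrative): keep the table index-free during the bulk load, use multi-row inserts, and build indexes afterwards in one pass.

```sql
-- Index-free MyISAM table for raw crawl results; you pay for indexes
-- once at the end instead of on every INSERT.
CREATE TABLE crawled_pages (
    url        VARCHAR(2048) NOT NULL,
    fetched_at DATETIME      NOT NULL,
    body       MEDIUMTEXT
) ENGINE=MyISAM;

-- Multi-row inserts amortize per-statement overhead:
INSERT INTO crawled_pages (url, fetched_at, body) VALUES
    ('http://example.com/a', NOW(), 'page body a'),
    ('http://example.com/b', NOW(), 'page body b');

-- When crawling is done, build the indexes you need (a prefix index,
-- since the full VARCHAR(2048) exceeds MyISAM's key length limit):
ALTER TABLE crawled_pages ADD INDEX (url(255));
```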