Does GeoMesa provide the ability to create HBase table snapshots? If so, how does that work with the primary and index tables? What does it do to ensure the index tables and the primary table stay in sync?
GeoMesa does not provide any snapshot mechanism of its own; however, the standard HBase snapshot mechanisms work fine. As long as you're not performing any administrative operations on GeoMesa while taking the snapshots, there won't be any issues keeping the GeoMesa metadata table and the index tables in sync.
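For example, using the standard HBase shell — the catalog and index table names below are made up; the point is that you snapshot each of the GeoMesa tables (metadata and every index table) as ordinary HBase tables:

```
hbase> snapshot 'mycatalog', 'mycatalog_snap'
hbase> snapshot 'mycatalog_gdelt_z3', 'gdelt_z3_snap'
hbase> list_snapshots
hbase> clone_snapshot 'mycatalog_snap', 'mycatalog_restored'
```

Since the snapshots are taken per table, taking them while no GeoMesa administrative operations (e.g. schema creation or deletion) are running is what keeps the set consistent.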
I have a general question regarding how to fill a database for the first time. Currently I work on "raw" datasets within R (dataframes I've built to explore the data and get insights quickly), but I now need to structure and load everything into a relational database.
For the DB design, everything is OK (conceptual and logical models, 3NF). The result is a fairly "complex" (it's all relative) data model with many junction tables and foreign keys within tables.
My question is: what is the easiest way for me to populate this DB?
My approach would be to generate a .csv for each table from my "raw" dataframes in R and then load them into the DB table by table. Is that a good way to do it, or is there an easier method? Another point: how do I avoid struggling with FK constraints while populating?
Thank you very much for the answers. I realize these are very "methodological" questions, but I can't find any related tutorial/thread.
Notes: I work with R (dplyr, etc.) and MySQL
A serious relational database, such as Postgres, will offer features for populating a large database.
Bulk loading
Look for commands that read external data directly into a table with a matching field structure. The data moves straight from a file in the OS's file system into the table. This is vastly faster than loading individual rows with the usual SQL INSERT. Such commands are not standardized, so you must look for the proprietary command in your particular database engine.
In Postgres that would be the COPY command.
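As a minimal sketch (table and file names are hypothetical), in both Postgres and MySQL, the asker's database:

```sql
-- Postgres: load a CSV straight into a table with a matching structure
COPY customer FROM '/tmp/customer.csv' WITH (FORMAT csv, HEADER true);

-- MySQL equivalent: LOAD DATA
LOAD DATA INFILE '/tmp/customer.csv'
INTO TABLE customer
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
IGNORE 1 LINES;  -- skip the header row
```

Load parent tables before child tables so that foreign key values already exist when the referencing rows arrive.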
Temporarily disabling referential-integrity
Look for commands that defer enforcing the foreign key relationship rules until after the data is loaded.
In Postgres, use SET CONSTRAINTS … DEFERRED to not check constraints during each statement, and instead wait until the end of the transaction.
Alternatively, if your database lacks such a feature, you could drop your constraints before the mass import and re-create them afterwards. But beware: this affects all other transactions in all other database connections. If you know the database has no other users, this may be workable.
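As a sketch (table names hypothetical) — note that in Postgres a constraint must have been declared DEFERRABLE for SET CONSTRAINTS to affect it:

```sql
-- Postgres: declare the FK deferrable, then defer checks inside the load transaction
ALTER TABLE order_item
  ADD CONSTRAINT order_item_order_fk
  FOREIGN KEY (order_id) REFERENCES orders (id)
  DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS ALL DEFERRED;  -- FK checks postponed until COMMIT
COPY orders     FROM '/tmp/orders.csv'     WITH (FORMAT csv, HEADER true);
COPY order_item FROM '/tmp/order_item.csv' WITH (FORMAT csv, HEADER true);
COMMIT;                        -- constraints verified here, once

-- MySQL: disable FK checks for the session while loading
SET FOREIGN_KEY_CHECKS = 0;
-- ... LOAD DATA / INSERT statements ...
SET FOREIGN_KEY_CHECKS = 1;
```

The MySQL variant simply skips the checks rather than deferring them, so it relies on the source data actually being consistent.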
Other issues
For other issues to consider, see the Populating a Database in the Postgres documentation (whether you use Postgres or not).
Disable Autocommit
Use COPY (for mass import, mentioned above)
Remove Indexes
Remove Foreign Key Constraints (mentioned above)
Increase maintenance_work_mem (changing the memory allocation of your database engine)
Increase max_wal_size (changing the configuration of your database engine’s write-ahead log)
Disable WAL Archival and Streaming Replication (consider moving a copy of your database to replicant server(s) rather than letting replication move the mass data)
Run ANALYZE Afterwards (remind your database engine to survey the new state of the data, for use by its query planner)
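A sketch of applying some of those Postgres settings around a bulk load (the values are illustrative, not recommendations):

```sql
ALTER SYSTEM SET maintenance_work_mem = '1GB';  -- speeds up index builds
ALTER SYSTEM SET max_wal_size = '8GB';          -- fewer checkpoints during the load
SELECT pg_reload_conf();

-- ... perform the bulk load here ...

ANALYZE;  -- refresh planner statistics over the newly loaded data
```

Remember to revert the temporary settings once the load is finished.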
Database migration
By the way, you will likely find a database migration tool helpful in creating the tables and columns, and possibly in loading the data. Consider tools such as Flyway or Liquibase.
I'm trying to import a large SQL file that was generated by mysqldump for an InnoDB table, but it is taking a very long time even after adjusting some parameters in my.cnf and disabling AUTOCOMMIT (as well as FOREIGN_KEY_CHECKS and UNIQUE_CHECKS, though the table doesn't have any foreign or unique keys). I'm wondering if it's taking so long because of the several indexes on the table.
Looking at the SQL file, it appears that the indexes are being created in the CREATE TABLE statement, prior to inserting all the data. Based on my (limited) research and personal experience, I've found that it's faster to add the indexes after inserting all the data. Does it not have to check the indexes for every INSERT? I know that mysqldump does have a --disable-keys option which does exactly that – disable the keys prior to inserting, but apparently this only works with MyISAM tables and not InnoDB.
But why couldn't mysqldump omit the keys from the CREATE TABLE statement for InnoDB tables, then run an ALTER TABLE after all the data is inserted? Or does InnoDB work differently, so there is no speed difference?
Thanks!
I experimented with this concept a bit at a past job, where we needed a fast method of copying schemas between MySQL servers.
There is indeed a performance overhead when you insert to tables that have secondary indexes. Inserts need to update the clustered index (aka the table), and also update secondary indexes. The more indexes a table has, the more overhead it causes for inserts.
InnoDB has a feature called the change buffer which helps a bit by postponing index updates, but they have to get merged eventually.
Inserts to a table with no secondary indexes are faster, so it's tempting to try to defer index creation until after your data is loaded, as you describe.
Percona Server, a branch of MySQL, experimented with a mysqldump --optimize-keys option. When you use this option, it changes the output of mysqldump to have CREATE TABLE with no indexes, then INSERT all data, then ALTER TABLE to add the indexes after the data is loaded. See https://www.percona.com/doc/percona-server/LATEST/management/innodb_expanded_fast_index_creation.html
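Schematically, the rewritten dump looks like this (table and index names are made up):

```sql
CREATE TABLE t (
  id   BIGINT NOT NULL,
  col1 VARCHAR(64),
  PRIMARY KEY (id)   -- the clustered index stays; secondary keys are omitted
) ENGINE=InnoDB;

INSERT INTO t VALUES (1, 'a'), (2, 'b') /* ... all the rows ... */;

ALTER TABLE t ADD INDEX idx_col1 (col1);  -- secondary indexes built after the load
```

The primary key cannot be deferred this way, because in InnoDB the primary key *is* the table.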
But in my experience, the net improvement in performance was small. It still takes a while to insert a lot of rows, even for tables with no indexes. Then the restore needs to run an ALTER TABLE to build the indexes, which takes a while for a large table. When you count the time of the INSERTs plus the extra time to build indexes, it's only a few (low single-digit) percent faster than inserting the traditional way, into a table with indexes.
Another benefit of this post-processing index creation is that the indexes are stored more compactly, so if you need to save disk space, that's a better reason to use this technique.
I found it much more beneficial to performance to restore by loading several tables in parallel.
The mysqlpump tool (introduced in MySQL 5.7) supports multi-threaded dump.
The open-source tool mydumper supports multi-threaded dump, and also has a multi-threaded restore tool called myloader. The worst downside of mydumper/myloader is that the documentation is virtually non-existent, so you need to be an intrepid power user to figure out how to run it.
Another strategy is to use mysqldump --tab to dump tab-delimited data files instead of SQL scripts. Bulk-loading those files is much faster than executing SQL scripts to restore the data. It creates separate files for each table: an SQL file for the table definition and a tab-delimited file holding the data. You have to recreate the tables by loading all the SQL files (this is quick), and then use mysqlimport to load the data files. The mysqlimport tool even has a --use-threads option for parallel execution.
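A sketch of that workflow (database name, paths, and thread count are hypothetical):

```
# Dump each table as table.sql (definition) + table.txt (tab-separated data)
mysqldump --tab=/tmp/dump mydb

# Recreate the tables from the definition files (quick)
for f in /tmp/dump/*.sql; do mysql mydb < "$f"; done

# Bulk-load the data files, several tables in parallel
mysqlimport --use-threads=4 mydb /tmp/dump/*.txt
```

Note that --tab requires the dump directory to be writable by the MySQL server itself, since the server writes the data files.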
Test carefully with different numbers of parallel threads. My experience is that 4 threads is the best. With greater parallelism, InnoDB becomes a bottleneck. But your experience may be different, depending on the version of MySQL and your server hardware's performance capacity.
The fastest restore method of all is to use a physical backup tool, the most popular being Percona XtraBackup. This allows for fast backups and even faster restores. The backed-up files are literally ready to be copied into place and used as live tablespace files. The downside is that you must shut down your MySQL Server to perform the restore.
I am planning a centralized log server, which will receive syslog from many devices (up to 2000).
It should be able to query & sort events.
I have read Mysql storage engine for log table
Since MyISAM is better at selecting, while InnoDB is better at writing, how about using a hybrid-engine approach to get the benefits of both?
Use InnoDB for writing, to benefit from row-level locking.
Use MyISAM for read-only ancient logs.
Use MERGE to split one large table into many smaller tables.
Here are the steps:
Create an InnoDB table A; syslog-ng inserts a row into it whenever it receives a log message.
Create a MyISAM table named 'yyyymmdd' every midnight and move the rows from table A into it. That data is kept permanently.
Use a table with the MERGE engine to union the 'yyyymmdd' tables for query operations.
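The steps above might be sketched like this (schema and table names are hypothetical; earlier day tables such as log_20120719 are assumed to have been created the same way, and MERGE requires all underlying tables to be identically defined MyISAM tables):

```sql
-- 1) Hot InnoDB table that syslog-ng writes into
CREATE TABLE log_current (
  ts   DATETIME NOT NULL,
  host VARCHAR(64),
  msg  TEXT,
  KEY (ts)
) ENGINE=InnoDB;

-- 2) At midnight: create the day table as MyISAM and move yesterday's rows
CREATE TABLE log_20120720 (
  ts   DATETIME NOT NULL,
  host VARCHAR(64),
  msg  TEXT,
  KEY (ts)
) ENGINE=MyISAM;

INSERT INTO log_20120720 SELECT * FROM log_current WHERE ts < CURDATE();
DELETE FROM log_current WHERE ts < CURDATE();

-- 3) MERGE table unioning the daily MyISAM tables, for querying
CREATE TABLE log_all (
  ts   DATETIME NOT NULL,
  host VARCHAR(64),
  msg  TEXT,
  KEY (ts)
) ENGINE=MERGE UNION=(log_20120719, log_20120720) INSERT_METHOD=NO;
```

The midnight job would re-create log_all with the new day table appended to the UNION list.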
Considering both querying and writing, is this a more efficient strategy than using a single InnoDB/MyISAM table?
Environment: JSF2, persistence with Hibernate, MySQL
I have a database that is rapidly filling up because of a table with image data. That data is never searched, only accessed directly by id. The problem is that it stays in the database and so enlarges the backups and the runtime memory usage of the database.
I'm thinking that there could possibly be multiple solutions:
Tell MySQL that the table should not be cached and/or kept in memory.
Don't use MySQL at all for that table. Just let persistence know that this should be stored on disk directly.
???
But I haven't found a way to do either. Please advise.
Thanks,
Milo van der Zee
In MySQL, where a table is stored depends on its storage engine. Only tables using the MEMORY storage engine are stored in RAM; all others are stored on disk.
In SELECT queries you can use SELECT SQL_NO_CACHE to tell MySQL not to cache the result in the MySQL query cache.
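For example (table and column names are hypothetical):

```sql
SELECT SQL_NO_CACHE image_data
FROM images
WHERE id = 42;  -- the result is not stored in the query cache
```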
You can also partition the table by defining partitions on it; queries that filter on the partition key can then prune the partitions they don't need, which makes inserts and selects faster.
You can also create day-wise tables like table_name_2012_07_20 and archive the tables with old dates. To store the data in compressed form, you can either use the ARCHIVE storage engine or, if you are using the MyISAM storage engine, run myisampack to compress the table and save disk space on the hard drive.
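A sketch of compressing such an archived MyISAM day table (paths are hypothetical; run this while the table is not in use, and FLUSH TABLES afterwards):

```
# Compress the data file, then rebuild the indexes for the packed table
myisampack /var/lib/mysql/logs/table_name_2012_07_20.MYI
myisamchk -rq /var/lib/mysql/logs/table_name_2012_07_20.MYI
```

Packed tables become read-only, which fits the archived-log use case.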
Due to internal reasons (framework structure) I save many images in a table with a MEDIUMBLOB column.
Considering that the queries retrieving these images are sent at a pretty low rate, is there a way to tell MySQL to keep this table out of memory? I don't want a 2 GB table sitting in memory when it is only used once in a while.
Is there any way to optimize this?
(Note: if this helps I can move this table in a new database containing only this table)
Thanks
MySQL won't use an in-memory temporary table when BLOB types are involved, as the MEMORY storage engine doesn't support them.
http://dev.mysql.com/doc/refman/5.0/en/internal-temporary-tables.html
"Some conditions prevent the use of an in-memory temporary table, in which case the server uses an on-disk table instead:
Presence of a BLOB or TEXT column in the table"
Which means you should put the BLOB into a different table and leave the other useful data in a BLOB-less table, so that that table stays small and cache-friendly.
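A sketch of that split (table and column names are hypothetical), with the two tables joined one-to-one by id:

```sql
-- Metadata stays in a small, cache-friendly table
CREATE TABLE image_meta (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title      VARCHAR(255),
  created_at DATETIME
) ENGINE=InnoDB;

-- The bytes live in their own table, touched only on direct access
CREATE TABLE image_data (
  id   INT UNSIGNED NOT NULL PRIMARY KEY,  -- same id as image_meta
  data MEDIUMBLOB NOT NULL,
  FOREIGN KEY (id) REFERENCES image_meta (id)
) ENGINE=InnoDB;

-- Queries that don't need the image bytes never read image_data:
SELECT title, created_at FROM image_meta WHERE id = 42;
```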
Moving that sort of table into a separate database sounds like a perfectly valid approach to me. At work we have one database server for our operational content (orders, catalogue etc) and one for logs (web logs and copies of emails) and media (images, videos and other binaries). Even running separate instances on the same machine can be worthwhile since, as you mentioned, it partitions the buffer cache (or whatever your storage engine's equivalent is).