Modeshape: how to configure JDBC storage option for the indexes - configuration

I am using ModeShape 3.6.0.Final with JBoss EAP 6.
According to https://github.com/ModeShape/modeshape/blob/master/deploy/jbossas/modeshape-jbossas-subsystem/src/main/resources/schema/modeshape_1_0.xsd, the element
cache-index-storage
that used to configure index storage (via JDBC, among other options) has been removed.
I found some old configurations, and this is exactly what I would like to achieve:
<cache-index-storage cache-container-jndi-name="index_cache_container"
                     lock-cache-name="index_locks"
                     data-cache-name="index_data"
                     metadata-cache-name="index_metadata"/>
and the corresponding cache container:
<!-- index storage, metadata and locks -->
<cache-container name="index_cache_container" default-cache="index_data">
    <local-cache name="index_locks">
        <transaction mode="NON_XA"/>
        <string-keyed-jdbc-store datasource="java:jboss/datasources/MySqlDS" passivation="false" purge="false">
            <property name="databaseType">MYSQL</property>
            <property name="createTableOnStart">true</property>
            <string-keyed-table prefix="stringbased">
                <id-column name="id_column" type="VARCHAR(255)"/>
                <data-column name="data_column" type="BLOB"/>
                <timestamp-column name="timestamp_column" type="BIGINT"/>
            </string-keyed-table>
        </string-keyed-jdbc-store>
    </local-cache>
    <local-cache name="index_data" batching="true">
        <transaction mode="NON_XA"/>
        <string-keyed-jdbc-store datasource="java:jboss/datasources/MySqlDS" passivation="false" purge="false">
            <property name="databaseType">MYSQL</property>
            <property name="createTableOnStart">true</property>
            <string-keyed-table prefix="stringbased">
                <id-column name="id_column" type="VARCHAR(255)"/>
                <data-column name="data_column" type="BLOB"/>
                <timestamp-column name="timestamp_column" type="BIGINT"/>
            </string-keyed-table>
        </string-keyed-jdbc-store>
    </local-cache>
    <local-cache name="index_metadata">
        <transaction mode="NON_XA"/>
        <string-keyed-jdbc-store datasource="java:jboss/datasources/MySqlDS" passivation="false" purge="false">
            <property name="databaseType">MYSQL</property>
            <property name="createTableOnStart">true</property>
            <string-keyed-table prefix="stringbased">
                <id-column name="id_column" type="VARCHAR(255)"/>
                <data-column name="data_column" type="BLOB"/>
                <timestamp-column name="timestamp_column" type="BIGINT"/>
            </string-keyed-table>
        </string-keyed-jdbc-store>
    </local-cache>
</cache-container>
Can anyone give me a hint on how to configure the index caches for ModeShape 3.6.0.Final such that they are stored in a database?
Thanks in advance for your help.

The ModeShape community removed support for storing Lucene indexes inside a database because it performed horribly, especially for writes: concurrent changes to content on different processes all compete for writes to the database. Even in a non-clustered topology, storing indexes in a database is simply not recommended, for performance reasons.
It is much better to have each process in the cluster maintain its own complete copy of the index. Yes, this does add work to each write (since the write has to be done on each process), but it significantly improves query performance and eliminates the potential for cluster-wide write conflicts, thus increasing the update throughput of the system.
Of course it is still possible to store indexes in Infinispan. ModeShape kept this option because Infinispan can be configured in ways that don't use shared storage (essentially giving each process independent indexes), and in those configurations there is no cluster-wide index write conflict. But when the indexes are stored in a shared database, cluster-wide index write conflicts again become likely.
You can try it out if you want, and if you do, make sure that each of the three caches is stored in a separate database table (using a unique value for the "prefix" attribute).
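If you do try it, the three stores might use distinct prefixes like the following (the prefix values are hypothetical, and "..." stands for the same id/data/timestamp column definitions already shown in your configuration):

```xml
<!-- in the index_locks cache -->
<string-keyed-table prefix="idx_locks">...</string-keyed-table>
<!-- in the index_data cache -->
<string-keyed-table prefix="idx_data">...</string-keyed-table>
<!-- in the index_metadata cache -->
<string-keyed-table prefix="idx_meta">...</string-keyed-table>
```

With unique prefixes, each cache gets its own table and the three stores no longer collide in a single "stringbased" table.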
However, I would strongly encourage you to not store your indexes in a relational database or another location shared by multiple processes in the cluster.

Related

How to fill for the first time a SQL database with multiple tables

I have a general question regarding how to fill a database for the first time. Currently I work on "raw" datasets within R (dataframes that I've built to explore the data and get insights quickly), but I now need to structure everything and load it into a relational database.
For the DB design, everything is OK (conceptual and logical models, 3NF). The result is a fairly complex (it's all relative) data model with many junction tables and foreign keys between tables.
My question is: what is the easiest way for me to populate this DB?
My approach would be to generate a .csv file for each table from my "raw" dataframes in R and then load them table by table into the DB. Is this a good way to do it, or do you have an easier method? Another point: how do I avoid struggling with FK constraints while populating?
Thank you very much for the answers. I realize these are very "methodological" questions, but I can't find any related tutorial/thread.
Notes : I work with R (dplyr, etc.) and MySQL
A serious relational database, such as Postgres for example, will offer features for populating a large database.
Bulk loading
Look for commands that read external data into a table with a matching field structure. The data moves directly from a file in the OS's file system into the table. This is vastly faster than loading individual rows with the usual SQL INSERT. Such commands are not standardized, so you must look for the proprietary command in your particular database engine.
In Postgres that would be the COPY command.
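A hedged sketch of what that looks like in Postgres (the table, column, and file names here are hypothetical; use \copy in psql instead if the CSV file lives on the client machine rather than the server):

```sql
-- Server-side bulk load of one CSV file into a matching table.
COPY customer (id, name, created_at)
FROM '/tmp/customer.csv'
WITH (FORMAT csv, HEADER true);
```

One COPY per table, run in dependency order (parents before children), maps directly onto the "one .csv per table" plan from the question.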
Temporarily disabling referential-integrity
Look for commands that defer enforcing the foreign key relationship rules until after the data is loaded.
In Postgres, use SET CONSTRAINTS … DEFERRED to not check constraints during each statement, and instead wait until the end of the transaction.
Alternatively, if your database lacks such a feature, you could, as part of your mass-import routine, drop your constraints first and then re-establish them afterwards. But beware: this may affect all other transactions in all other database connections. If you know the database has no other users, then perhaps this is workable.
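In Postgres the deferred approach might look like the following sketch (hypothetical table names; note that SET CONSTRAINTS only affects constraints that were declared DEFERRABLE):

```sql
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
-- Rows can now be loaded in any order; FK checks run only at COMMIT.
INSERT INTO order_line (order_id, sku) VALUES (42, 'ABC-1');
INSERT INTO orders (id) VALUES (42);
COMMIT;
```

If any foreign key is still unsatisfied at COMMIT, the whole transaction rolls back, so the database never holds inconsistent data outside the load transaction.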
Other issues
For other issues to consider, see the Populating a Database in the Postgres documentation (whether you use Postgres or not).
Disable Autocommit
Use COPY (for mass import, mentioned above)
Remove Indexes
Remove Foreign Key Constraints (mentioned above)
Increase maintenance_work_mem (changing the memory allocation of your database engine)
Increase max_wal_size (changing the configuration of your database engine’s write-ahead log)
Disable WAL Archival and Streaming Replication (consider moving a copy of your database to replica server(s) rather than letting replication move the mass data)
Run ANALYZE Afterwards (remind your database engine to survey the new state of the data, for use by its query planner)
Database migration
By the way, you will likely find a database migration tool helpful in creating the tables and columns, and possibly in loading the data. Consider tools such as Flyway or Liquibase.

how to determine c3p0 max_statements

I'm wondering how to properly determine what value to use for c3p0 max_statements. I've experienced some caching deadlocks which seem to point back to my max_statements configuration, based on all the SO Q&A I've read.
I'm using MySQL, and the deadlock appears to happen when I'm doing some multithreading with 4 active threads.
My configuration
<property name="hibernate.connection.provider_class">org.hibernate.service.jdbc.connections.internal.C3P0ConnectionProvider</property>
<property name="hibernate.c3p0.min_size">10</property>
<property name="hibernate.c3p0.max_statements">50</property>
<property name="hibernate.c3p0.max_size">50</property>
<property name="hibernate.c3p0.idle_test_period">1800</property>
<property name="hibernate.c3p0.timeout">3600</property>
The exception
[WARN] async.ThreadPoolAsynchronousRunner com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector#72df1587 -- APPARENT DEADLOCK!!! Complete Status:
Managed Threads: 3
Active Threads: 3
Active Tasks:
com.mchange.v2.c3p0.stmt.GooGooStatementCache$StatementDestructionManager$1UncheckedStatementCloseTask#e877a61
on thread: C3P0PooledConnectionPoolManager[identityToken->1hge0wd9a1o1iea71i8u346|1a799bb]-HelperThread-#2
com.mchange.v2.c3p0.stmt.GooGooStatementCache$StatementDestructionManager$1UncheckedStatementCloseTask#109b1150
on thread: C3P0PooledConnectionPoolManager[identityToken->1hge0wd9a1o1iea71i8u346|1a799bb]-HelperThread-#0
com.mchange.v2.c3p0.stmt.GooGooStatementCache$StatementDestructionManager$1UncheckedStatementCloseTask#3eb42946
on thread: C3P0PooledConnectionPoolManager[identityToken->1hge0wd9a1o1iea71i8u346|1a799bb]-HelperThread-#1
Pending Tasks:
com.mchange.v2.c3p0.stmt.GooGooStatementCache$1StmtAcquireTask#52729f95
Pool thread stack traces:
Thread[C3P0PooledConnectionPoolManager[identityToken->1hge0wd9a1o1iea71i8u346|1a799bb]-HelperThread-#0,5,main]
com.mysql.jdbc.PreparedStatement.realClose(PreparedStatement.java:2765)
com.mysql.jdbc.StatementImpl.close(StatementImpl.java:541)
com.mchange.v1.db.sql.StatementUtils.attemptClose(StatementUtils.java:53)
com.mchange.v2.c3p0.stmt.GooGooStatementCache$StatementDestructionManager$1UncheckedStatementCloseTask.run(GooGooStatementCache.java:938)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:648)
Thread[C3P0PooledConnectionPoolManager[identityToken->1hge0wd9a1o1iea71i8u346|1a799bb]-HelperThread-#1,5,main]
com.mysql.jdbc.PreparedStatement.realClose(PreparedStatement.java:2765)
com.mysql.jdbc.StatementImpl.close(StatementImpl.java:541)
com.mchange.v1.db.sql.StatementUtils.attemptClose(StatementUtils.java:53)
com.mchange.v2.c3p0.stmt.GooGooStatementCache$StatementDestructionManager$1UncheckedStatementCloseTask.run(GooGooStatementCache.java:938)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:648)
Thread[C3P0PooledConnectionPoolManager[identityToken->1hge0wd9a1o1iea71i8u346|1a799bb]-HelperThread-#2,5,main]
com.mysql.jdbc.PreparedStatement.realClose(PreparedStatement.java:2765)
com.mysql.jdbc.StatementImpl.close(StatementImpl.java:541)
com.mchange.v1.db.sql.StatementUtils.attemptClose(StatementUtils.java:53)
com.mchange.v2.c3p0.stmt.GooGooStatementCache$StatementDestructionManager$1UncheckedStatementCloseTask.run(GooGooStatementCache.java:938)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:648)
So my question is: how do I determine what these values should be? I'm sure there must be a way to do this without guessing.
Articles I've read
should i activate c3p0 statement pooling?
How to trace and prevent the deadlock appeared in c3po which is running in seperate processes?
To resolve deadlocks associated with Statement caching under Oracle / jTDS / mySQL, please make sure you are using a recent c3p0 (0.9.5.1 is the current version), and please see statementCacheNumDeferredCloseThreads and Configuring Statement Pooling.
TL;DR set config param
<property name="hibernate.c3p0.statementCacheNumDeferredCloseThreads">1</property>
The exact value of max_statements is only incidentally associated with this issue. If max_statements is too small, you will churn through statements unnecessarily, and this issue, associated with the fragility of PreparedStatement.close(), in some drivers will appear more frequently.
However, your value for hibernate.c3p0.max_statements is too small for a pool of maxPoolSize 50. Even after you fix the deadlock issue, churning through statements will diminish or kill any performance benefit from the statement cache. To compute a good value for hibernate.c3p0.max_statements (which maps to c3p0.maxStatements), count the number of distinct PreparedStatements that are used frequently in your application and multiply that by maxPoolSize (or in your case hibernate.c3p0.max_size). Or, alternatively, just set hibernate.c3p0.maxStatementsPerConnection to the number of distinct PreparedStatements used frequently by your application.
Please see maxStatements, maxStatementsPerConnection, and Configuring Statement Pooling.
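As a hedged illustration of that arithmetic: if your application uses, say, roughly 20 distinct PreparedStatements (a hypothetical count; measure your own application), then with max_size of 50 a global cache needs about 20 × 50 = 1000 entries, i.e.:

```xml
<property name="hibernate.c3p0.max_statements">1000</property>
<!-- or, equivalently, cap per connection instead of globally: -->
<property name="hibernate.c3p0.maxStatementsPerConnection">20</property>
```

The per-connection form is often easier to reason about, since it doesn't need to be recomputed when you change the pool size.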

MySQL cloning aggregated database from an existing database

We have a MySQL database based on InnoDB. We are looking to build an analytics system for this data. We are thinking of creating a cloned database that denormalizes the data to avoid joins, and uses MyISAM for faster querying. This second database will also avoid putting extra load on the main database, to which the data is written.
Apart from this, we are also creating some extra tables that will store aggregated numbers to avoid recalculation.
I am wondering how I can sync these tables once a day to keep them updated. It looks similar to the master-slave config of MySQL, which uses the binary log. But in our case, the second database is not an exact slave. Are there any reliable open-source tools, or any other ideas I can use to write an 'update mechanism'?
Thanks in advance.

Transfer MySQL data from machineX to machineY

I want to collect MySQL data from 10 different machines and aggregate it into one big MySQL db on a different machine. All machines are Linux based.
What is the "mysqldump" syntax if I want to do this periodically to collect only the "delta" data?
Are there any other ways to achieve this?
This isn't natively supported in MySQL. You could use replication, but a replica can have only a single master, not 10 masters. I know of two workable options:
1) is to script something up that switches the replica between masters in a round-robin fashion. You might wish to refer to http://code.google.com/p/mysql-mmre/ or http://thenoyes.com/littlenoise/?p=117.
2) is to use an ETL tool.
If you get stuck, we (Percona) can help you. This is a common request, but not an easy one, because each case is different.
mysqldump can't generate incremental backups, as it doesn't have any way of determining which rows (or what parts of the schema!) have changed since the last backup, or indeed even when the last backup was. For that you'd need something which could read the MySQL binlog and convert it into a bunch of INSERT/UPDATE/DELETE statements; I'm not aware of anything that exists quite like that.
The current "state of the art" in MySQL backups is generally considered to be Percona XtraBackup.
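Since mysqldump can only produce full dumps, a periodic collection job is usually a full refresh per source machine. A rough sketch (the host names, credentials, and per-machine target schemas are all hypothetical placeholders):

```shell
# Full dump from each source machine, loaded into a per-machine
# schema on the aggregate host so the datasets don't overwrite each other.
for host in machine1 machine2 machine3; do
    mysqldump --host="$host" --user=backup --password='...' \
              --single-transaction mydb |
        mysql --host=aggregate --user=loader --password='...' "mydb_${host}"
done
```

--single-transaction gives a consistent snapshot for InnoDB tables without locking the source; true incremental transfer would instead require replication or binlog shipping, as described above.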
Multiple master replication? Have each of the 10 be a master, and make the aggregate a slave to all 10. This assumes that the data you are aggregating is different on each of the 10 machines. If the data is the same (or similar) on all 10 and you want to interleave it as well as integrate it, then this won't work.

How do you identify unused indexes in a MySQL database?

I have recently completely re-written a large project. In doing so, I have consolidated a great number of random MySQL queries. I do remember that over the course of developing the previous codebase, I created indexes on a whim, and I'm sure there are a great number that aren't used anymore.
Is there a way to monitor MySQL's index usage to determine which indexes are being used, and which ones are not?
I don't think this information is available in a stock MySQL installation.
Percona makes tools and patches for MySQL to collect index usage data.
See:
User Statistics (and Index Statistics)
How expensive is USER_STATISTICS?
pt-index-usage
See also:
New INDEX_STATISTICS table in the information_schema
check-unused-keys: A tool to interact with INDEX_STATISTICS
New table_io_waits_summary_by_index_usage table in performance_schema in MySQL 5.6
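With MySQL 5.6+, that performance_schema table can be queried directly for indexes that have not been touched since server start. A sketch (adjust the schema filter to your environment, and remember the counters reset on restart, so let the server run a representative workload first):

```sql
SELECT object_schema, object_name, index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
  AND index_name != 'PRIMARY'
  AND count_star = 0
  AND object_schema NOT IN ('mysql', 'performance_schema')
ORDER BY object_schema, object_name;
```

An index that shows zero I/O over a full business cycle (including month-end reports and the like) is a reasonable candidate for removal.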
You may also be interested in a Java app called MySQLIndexAnalyzer, which helps to find redundant indexes. But this tool doesn't have any idea which indexes are unused.