Can we use JCR API over MySQL?

Apache Jackrabbit (or the JCR API) helps you separate the data store from the data management system. This would mean that every data store provider would have to implement the JCR API for their own data store. The question is: is JCR implemented for MySQL? Can we use the JCR API over MySQL? I want to truly abstract away where I store my content, so that tomorrow, if I don't want to use a relational DB, I can swap it out for the file system with ease.

You could try ModeShape, which is a JCR implementation that can store its data in a variety of systems, including MySQL (or almost any other relational database), data grids (such as Infinispan), file systems, version control systems (e.g., SVN), etc. You can even create a single JCR repository backed by multiple federated systems. ModeShape does this through an extensible connector library (that is much, much simpler than implementing the full JCR API), so you can use the JCR API to get at your data in other systems, too.

Apache Jackrabbit can be configured to use MySQL for storage; the discussion at http://markmail.org/message/fbkw5vey2mme4uxe is a good starting point.
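For reference, configuring Jackrabbit against MySQL mostly comes down to pointing its persistence manager at a JDBC URL. A minimal sketch of the relevant repository.xml fragment, assuming a local MySQL database named jackrabbit and made-up credentials (check the Jackrabbit documentation for the full parameter set):

```xml
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
  <!-- hypothetical database name and credentials; adjust to your setup -->
  <param name="url" value="jdbc:mysql://localhost:3306/jackrabbit"/>
  <param name="user" value="jcr"/>
  <param name="password" value="secret"/>
  <param name="schemaObjectPrefix" value="ws_"/>
</PersistenceManager>
```

The rest of your code keeps talking plain JCR (sessions, nodes, properties); only this fragment knows MySQL is underneath, which is exactly the swap-out-later property the question asks about.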

"ModeShape isn't your father's JCR" covers all this in more detail, as does the Reference Guide on the project site.

So is it correct to say that ModeShape and Teiid are kind of the same thing, apart from the fact that one gives you a relational view and the other a hierarchical (or tree-based) view of the various data sources?

Related

How to store JSON in DB without schema

I have a requirement to design an app that stores JSON via a REST API. I don't want to put limitations on the JSON's size (number of keys, etc.). I see that MySQL supports storing JSON, but we have to create a table/schema first and then store the records.
Is there any way to store JSON in any type of DB and query the data by its keys?
EDIT: I don't want to use any in-memory DB like Redis.
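Whichever store you choose, the underlying pattern is the same: keep each raw JSON document in a single text column and query it by key path. A minimal sketch of that pattern using Python's built-in sqlite3 module with the JSON1 functions (an assumption: your SQLite build has JSON1 enabled, which modern Python builds do; MySQL's JSON type and JSON_EXTRACT() work analogously):

```python
import json
import sqlite3

# Store arbitrary JSON documents in a single TEXT column -- no per-document schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

docs = [
    {"user": "alice", "tags": ["a", "b"], "meta": {"age": 30}},
    {"user": "bob", "score": 12},
]
db.executemany("INSERT INTO docs (body) VALUES (?)",
               [(json.dumps(d),) for d in docs])

# Query by key path with the built-in json_extract() function.
rows = db.execute(
    "SELECT body FROM docs WHERE json_extract(body, '$.meta.age') = 30"
).fetchall()
print(json.loads(rows[0][0])["user"])  # alice
```

In MySQL the query line would be `WHERE JSON_EXTRACT(body, '$.meta.age') = 30` against a JSON column; the table still needs to be created once, but the documents themselves stay schema-less.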
Use Elasticsearch. In addition to schema-less JSON, it supports fast search.
The tagline of Elasticsearch is "You know, for search".
It is built on top of the text-indexing library Apache Lucene.
The advantages of using Elasticsearch are:
Scales to clusters holding petabytes of data.
Fully open source, at no cost.
Enterprise support is available with a Platinum license.
Comes with additional benefits, such as analytics via Kibana.
I believe a NoSQL store is the best solution, e.g., MongoDB. I have tested MongoDB; it looks good and has a Python module for easy interaction. For a quick overview of the pros: https://www.studytonight.com/mongodb/advantages-of-mongodb
I've had great results with Elasticsearch, so I second this approach as well. One question to ask yourself is how you plan to access the JSON data once it is in a repository like Elasticsearch. Will you simply store the JSON doc, or will you attempt to flatten out the properties so that they can be aggregated individually? But yes, it is indeed fully scalable: you can increase compute capacity via instance size, expand your disk space, or implement index sharding if you have billions of records in a single index.

Which best fits my needs: MongoDB, CouchDB, or MySQL. Criteria defined in question

Our website needs a content management type system. For example, admins want to create promotion pages on the fly. They'll supply some text and images for the page and the url that the page needs to be on. We need a data store for this. The criteria for the data store are simple and defined below. I am not familiar with CouchDB or MongoDB, but think that they may be a better fit for this than MySQL, but am looking for someone with more knowledge of MongoDB and CouchDB to chime in.
On a scale of 1 to 10 how would you rate MongoDB, CouchDB, and MySQL for the following:
Java client
Track web clicks
CMS like system
Store uploaded files
Easy to setup failover
Support
Documentation
Which would you choose under these circumstances?
Each one is suitable for different use cases, but for low-traffic sites MySQL/PostgreSQL is the better fit.
Java client: all of them have clients.
Track web clicks: Mongo and Cassandra are more suitable for this high-write situation.
Store uploaded files: Mongo with GridFS is suitable. Cassandra can store up to 2 GB per column, split into 1 MB chunks. MySQL is not suitable. For Cassandra and MySQL, storing only the file location in the DB and keeping the file itself in the filesystem is preferred.
Easy to set up failover: Cassandra is the best, Mongo second.
Support: all have good support; MySQL has the largest community, Mongo is second.
Documentation: 1st MySQL, 2nd Mongo.
I prefer MongoDB for analytics (web clicks, counters, logs; you need a 64-bit system) and MySQL or PostgreSQL for the main data. On the "companies using Mongo" page on the Mongo website, you can see most of them are using Mongo for analytics. Mongo can also be suitable for main data after version 1.8. The problem with Cassandra is its poor querying capabilities (not suitable for a CMS), and the problem with MySQL is that it is not as easily scalable and highly available as Cassandra and Mongo, and it is also slower, especially on writes. I don't recommend CouchDB; it's the slowest one.
All the best,
Serdar Irmak
Here are some quick answers based on my experience with Mongo.
Java client
Not sure, but it does exist and it is well supported. Lots of docs, even several POJO wrappers to make it easy.
Track web clicks
8 or 9. It's really easy to do both inserts and updates thanks to "fire and forget". MongoDB has built-in tools to map-reduce the data and easy tools to export the data to SQL for analysis (if Mongo isn't good enough).
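The counter pattern described here is easy to sketch. The version below uses Python's sqlite3 so it runs anywhere; in MongoDB the equivalent would be an update with the $inc operator and an unacknowledged write for the "fire and forget" effect (the schema and names here are hypothetical):

```python
import sqlite3

# Minimal click counter: one row per URL, incremented on every hit.
# In MongoDB this is update({"url": url}, {"$inc": {"hits": 1}}, upsert=True).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clicks (url TEXT PRIMARY KEY, hits INTEGER NOT NULL)")

def track(url):
    # Try to bump an existing counter; fall back to inserting the first hit.
    cur = db.execute("UPDATE clicks SET hits = hits + 1 WHERE url = ?", (url,))
    if cur.rowcount == 0:  # first click on this URL
        db.execute("INSERT INTO clicks (url, hits) VALUES (?, 1)", (url,))

for _ in range(3):
    track("/promo")
track("/home")

print(dict(db.execute("SELECT url, hits FROM clicks")))  # {'/promo': 3, '/home': 1}
```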
CMS like system
8 or 9. It's easy to store the whole web page content. It's really easy to "hook on" extra columns. This is really Mongo's "bread and butter".
Store uploaded files
There's a learning curve here, but Mongo has a GridFS system designed specifically for both saving and serving binary data.
Easy to set up failover
Start your primary server: ./mongod --bind_ip 1.2.3.4 --dbpath /my/data/files --master
Start your slave: ./mongod --bind_ip 1.2.3.5 --dbpath /my/data/files --slave --source 1.2.3.4
Support
10gen has a mailing list: http://groups.google.com/group/mongodb-user. They also have paid support.
Their response time generally ranks somewhere between excellent and awesome.
Documentation
Average. It's all there, but it is still a little disorganized. Chalk it up to a lot of new development in recent months.
My take on CouchDB:
Java client: great; use Ektorp, which is pretty easy and provides complete object mapping. In any case, the whole API is just JSON over HTTP, so it is all easy.
Track web clicks: maybe Redis is a better tool for this; CouchDB is not the best option here.
CMS-like system: it is great, as you can easily combine templates, dynamic forms, data, etc. and collate them using views.
Store uploaded files: any document in CouchDB can have arbitrary attachments, so it's a natural fit.
Easy to set up failover: master/master replication makes sure you are always ready to go, and the database never gets corrupted, so in case of failure it's only a matter of starting CouchDB again; it will take over where it stopped (minimal downtime) and replication will catch up on the changes.
Support: there is a mailing list and paid support.
Documentation: use the open book http://guide.couchdb.org and the wiki.
I think there are plenty of other posts related to this topic. However, I'll chime in, since I've moved off MySQL and onto MongoDB. It's fast, very fast, but that doesn't mean it's perfect. My advice: use what you're comfortable with. If it takes you longer to refactor code to make it fit with Mongo or Couch, then stick with MySQL if that's what you're familiar with. If this is something you want to pick up as a skill set, then by all means learn MongoDB or CouchDB.
For me, I went with MongoDB for a couple of reasons: file storage via GridFS and geolocation. Yeah, I could've used MySQL, but I wanted to see what all the fuss was about. I must say I'm impressed, and I still have a ways to go before I can say I'm comfortable with Mongo.
With what you've listed, I can tell you that mongo will fit most of your needs.
I don't see anything here like "must handle millions of req/s" that would indicate rolling your own would be better than using something off the shelf like Drupal.

How to store spatial files in MySQL

What is the better way to store spatial data (say, tracks) in MySQL:
internally, or as references to external flat files?
MySQL has spatial extensions for storing geographic objects (objects with geometric attributes). More detail is available in the MySQL documentation.
I would recommend against MySQL if you want to store it as explicitly spatial information. Instead, I would recommend PostgreSQL/PostGIS if you want to stay with an open-source DB. MySQL barely implements any of its spatial functionality; if you read the docs closely, most spatial functions are yet to be implemented.
If you don't care about explicitly spatial information, then go ahead and store it directly in the DB.
If you give some more background on what you want to do, we might be able to help more.
The "better way" to store the data depends on several factors, which you yourself need to consider:
Are the files ridiculously large, say 50 MB or more? MySQL can time out on long transactions.
Are you working in a closed network environment where the file system is secure and controlled?
Do you plan only to serve the raw files? There's no point in processing them into MySQL format only to re-process them on the way out.
Is it expected that 'non-technical' people are going to want to access this data? 'Non-technical' people generally don't like obfuscated information.
Do you have the capability in your application (if you have an application) to read the spatial data in the format that MySQL stores it in? There's no point in processing and storing a .gpx or .shp file in MySQL format if you can't read it back from there.
Do you have a system/service that will control the addition/removal/modification of the file structure and the corresponding database records? Keeping a database and a file system in sync is not an easy task, especially when you consider the involvement of 'non-technical' people.
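To make the "store it internally" option concrete, here is a minimal sketch that stores a track as one row per point and answers a bounding-box query with plain SQL. Table and column names are made up, and it uses Python's built-in sqlite3 to stay self-contained; MySQL's spatial extension adds real geometry types and spatial indexes on top of this basic idea.

```python
import sqlite3

# One row per GPS point; a bounding-box query then needs no spatial
# extension at all (MySQL would let you use POINT columns instead).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE track_points (
    track_id INTEGER, seq INTEGER, lat REAL, lon REAL,
    PRIMARY KEY (track_id, seq))""")

points = [
    (1, 0, 52.5200, 13.4050),  # Berlin
    (1, 1, 52.5205, 13.4061),  # Berlin, a few metres on
    (1, 2, 48.8566, 2.3522),   # Paris
]
db.executemany("INSERT INTO track_points VALUES (?, ?, ?, ?)", points)

# All points of track 1 inside a bounding box around Berlin:
inside = db.execute(
    """SELECT seq FROM track_points
       WHERE track_id = 1
         AND lat BETWEEN 52.0 AND 53.0
         AND lon BETWEEN 13.0 AND 14.0
       ORDER BY seq""").fetchall()
print([s for (s,) in inside])  # [0, 1]
```

The flat-file alternative from the question would instead store only a path to a .gpx file per track, trading queryability for simplicity.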

AOP and user-specific data storage

I have been using AOP for "classic" things like logging and security for a while and am starting to take it further.
One problem I come across frequently with desktop applications is the need to store user-specific data locally. To that end, I have built a component that works well for me that stores data as XML in an application-specific subfolder of the LocalApplicationData folder (on Windows, but the concept applies to any OS).
Each application needs to store its own data, but I have also built a code library where several components also need to store data.
One approach I could take is to tightly couple each of my components that need the local storage service to my implementation of local storage. However, a change to the interface of that local storage engine would be expensive.
Is this problem domain well-suited to AOP? Are there better approaches? Are there pitfalls that I'm not seeing?
I really don't see the cross-cutting concern that would make this a problem for AOP to solve.
Simply define a small API for storing the information. Make your library implement that interface. Put everything together with Spring, any DI tool, or manual glue code, and you are done.
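A sketch of that "little API" approach in Python (the names are hypothetical, and JSON stands in for the XML from the question for brevity): components depend only on the interface, so swapping the file-based implementation for something else touches no component code.

```python
import json
import os
import tempfile

# The small storage API: components depend only on this interface,
# never on where or how the data actually lands.
class LocalStore:
    def save(self, key, data):
        raise NotImplementedError

    def load(self, key):
        raise NotImplementedError

class JsonFileStore(LocalStore):
    """Stores each key as a JSON file under an app-specific folder."""
    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key + ".json")

    def save(self, key, data):
        with open(self._path(key), "w") as f:
            json.dump(data, f)

    def load(self, key):
        with open(self._path(key)) as f:
            return json.load(f)

# A component receives the store via its constructor (plain dependency
# injection); a DI container like Spring does the same wiring for you.
class WindowSettings:
    def __init__(self, store):
        self.store = store

    def remember(self, width, height):
        self.store.save("window", {"width": width, "height": height})

    def recall(self):
        return self.store.load("window")

root = tempfile.mkdtemp()  # stands in for the LocalApplicationData folder
settings = WindowSettings(JsonFileStore(root))
settings.remember(800, 600)
print(settings.recall())  # {'width': 800, 'height': 600}
```

A change to the storage engine now means writing one new LocalStore subclass, which is the cheap-interface property the answer is arguing for.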

Rails and CouchDB - Architectural Concerns

I am working on a project that is going to use CouchDB for flexible storage of documents. The requirements of my system are a neat match for CouchDB for storage.
BUT
My question really boils down to this:
Should I keep using ActiveRecord and MySQL as well? There is a raft of handy plugins readily available for use with ActiveRecord (such as authentication and access control). I'm just wondering whether the advantage of leveraging existing plugins is worth the extra management overhead and possible integration issues (working across disparate datastores).
It is not uncommon to have to deal with several persistent stores in a single application. A very common approach is to use a relational database that stores paths pointing to files that are stored in a file system.
So you might think of CouchDB as a special "file system" for a special part of your data model.
Also, in larger applications, multiple stores and complex physical architectures are quite common, so don't be shy of using more than one persistent store for your models.
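The path-in-the-database pattern mentioned above can be sketched in a few lines, here with Python's sqlite3 and a temp directory standing in for the file system (the schema and names are made up):

```python
import os
import sqlite3
import tempfile

# The classic split: metadata and a file path live in the relational
# store, while the bytes themselves live on the file system (or,
# analogously, in a document store like CouchDB).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE attachments (id INTEGER PRIMARY KEY, name TEXT, path TEXT)")

blob_dir = tempfile.mkdtemp()  # stands in for the external store

def store(name, payload):
    path = os.path.join(blob_dir, name)
    with open(path, "wb") as f:
        f.write(payload)
    cur = db.execute("INSERT INTO attachments (name, path) VALUES (?, ?)",
                     (name, path))
    return cur.lastrowid

def fetch(att_id):
    (path,) = db.execute("SELECT path FROM attachments WHERE id = ?",
                         (att_id,)).fetchone()
    with open(path, "rb") as f:
        return f.read()

att = store("promo.png", b"\x89PNG...")
print(fetch(att))  # b'\x89PNG...'
```

In the Rails case, ActiveRecord would own the relational half and a CouchDB client the document half; the join point is nothing more than an identifier stored on the ActiveRecord side.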
You can use both; Some models can still be ActiveRecord, and others can be CouchDB.