Problem:
A table in MySQL has a few normal fields and one text field that holds XML.
I need to use the Solr DataImportHandler to process this table into a Solr index. However, the XML field needs to be parsed into several separate Solr fields.
Question:
Is it possible to do this without having to write a custom Transformer? If yes, how? Can I use XPathEntityProcessor with my SQL DB as the data source?
If I write a custom Transformer, how exactly do I configure it in dataConfig?
I am using an older version of Solr (1.4.1), so can I just drop a new jar with the new class into my Solr web app?
The thing I am quite unsure about is how to configure data-config.xml to do this. If anyone has any examples, please share! Thanks.
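For illustration, this is roughly what I imagine it could look like (just a sketch on my part; I am assuming DIH's FieldReaderDataSource can feed an XPathEntityProcessor this way, and all table, column, and tag names are made up):

    <dataConfig>
      <!-- JDBC source for the table itself -->
      <dataSource name="db" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
      <!-- reads the XML out of a column of the parent entity -->
      <dataSource name="xmlField" type="FieldReaderDataSource"/>
      <document>
        <entity name="row" dataSource="db"
                query="SELECT id, title, payload_xml FROM my_table">
          <field column="id" name="id"/>
          <field column="title" name="title"/>
          <!-- parse the XML column into extra Solr fields -->
          <entity name="payload" dataSource="xmlField" dataField="row.payload_xml"
                  processor="XPathEntityProcessor" forEach="/record">
            <field column="author" xpath="/record/author"/>
            <field column="category" xpath="/record/category"/>
          </entity>
        </entity>
      </document>
    </dataConfig>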
My suggestion is to write a program which selects the data from the database, parses the XML field, and then inserts the entire document into the Solr index.
The SolrJ Java APIs are really easy to use. The hardest part of this is parsing the XML, but that is a far easier problem and easier to test.
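As a minimal sketch of that approach (assuming the Solr 1.4-era SolrJ client; the table, column, and Solr field names here are hypothetical):

    import java.io.ByteArrayInputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.w3c.dom.Document;

    public class DbToSolrIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/mydb", "user", "pass");
            Statement stmt = conn.createStatement();
            // hypothetical table and column names
            ResultSet rs = stmt.executeQuery(
                    "SELECT id, title, payload_xml FROM my_table");
            DocumentBuilder db =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", rs.getString("id"));
                doc.addField("title", rs.getString("title"));
                // parse the XML column and map its elements to extra Solr fields
                Document xml = db.parse(
                        new ByteArrayInputStream(rs.getBytes("payload_xml")));
                doc.addField("author", xml.getElementsByTagName("author")
                        .item(0).getTextContent());
                solr.add(doc);
            }
            solr.commit();
            conn.close();
        }
    }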
I am using MongoDB for the first time. Is there a good way to transfer bulk data from MySQL into MongoDB? I tried searching in different ways but did not find one.
You would have to map all your MySQL tables to documents in MongoDB yourself, but you can use an already developed tool.
You can try: Mongify (http://mongify.com/)
It's a super simple way to transform your data from MySQL to MongoDB. It has a ton of support for changing your existing schema into a schema that will work better with MongoDB.
Mongify will read your MySQL database and build a translation file for you; all you have to do is map how you want your data transformed.
It supports:
Updating internal IDs (to BSON ObjectID)
Updating referencing IDs
Typecasting values
Embedding Tables into other documents
Before filters (to change data manually before import)
and much much more...
There is also a short 5 min video on the homepage that shows you how easy it is.
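For a flavor of what such a translation file looks like (a sketch from memory of Mongify's documented Ruby DSL; the table and columns are hypothetical):

    # maps a MySQL `users` table to a MongoDB collection
    table "users" do
      column "id", :key
      column "first_name", :string
      column "last_name", :string
      column "created_at", :datetime
    end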
Try it out and tell me what you think.
I am creating a Spring Boot application which will interact with Elasticsearch using Spring Data. But the problem is that my data in Elasticsearch is unpredictable: there can be slight changes in the fields, such as additional fields or entirely new fields coming in the JSON. Please guide me to a solution to address that. Using a normal repository does not seem to work because I don't have a defined JSON format. Your guidance will be highly appreciated.
You need to provide a bit more data on your case.
Normally, when you use @Field annotations, or introduce or drop a new simple or object field, this should not be a problem at all, since spring-data-elasticsearch updates the mapping when you save to an ElasticsearchRepository. In some cases, e.g. introducing a parent-child relationship, you would need to drop and recreate the index, but this can also be done programmatically, if needed.
If you need advanced mapping that also changes dynamically, then you need to build and execute a mapping update request from your code on save (a custom repository).
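For the simple case, a minimal sketch of an annotated entity (assuming a recent spring-data-elasticsearch; the index and fields here are hypothetical):

    import org.springframework.data.annotation.Id;
    import org.springframework.data.elasticsearch.annotations.Document;
    import org.springframework.data.elasticsearch.annotations.Field;
    import org.springframework.data.elasticsearch.annotations.FieldType;

    // Hypothetical entity; saving through an ElasticsearchRepository<Article, String>
    // writes the mapping, so adding another simple @Field later usually just works.
    @Document(indexName = "articles")
    public class Article {

        @Id
        private String id;

        @Field(type = FieldType.Text)
        private String title;

        // a later-added simple field: the mapping is extended on save
        @Field(type = FieldType.Keyword)
        private String category;

        // getters and setters omitted for brevity
    }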
I have a few XML files containing data for a research project which I need to run some statistics on. The amount of data is close to 100GB.
The structure is not so complex (it could be mapped to perhaps 10 tables in a relational model), and given the nature of the problem, this data will never be updated again; I only need it available in a place where it's easy to run queries on it.
I've read about XML databases and the possibility of running XPath-style queries on them, but I have never used them and I'm not so comfortable with that. Having the data in a relational database would be my preferred choice.
So, I'm looking for a way to convert the data stored in XML into a relational database (think of a big .sql file similar to the one generated by mysqldump, but anything else would do).
The ultimate goal is to be able to run SQL queries for crunching the data.
After some research I'm almost convinced I have to write it on my own.
But I feel this is a common problem, and therefore there should be a tool which already does that.
So, do you know of any tool that would transform XML data into a relational database?
PS1:
My idea would be something like (it can work differently, but just to make sure you get my point):
Analyse the data structure (based on the XML files themselves, or on an XSD)
Build the relational database (tables, keys) based on that structure
Generate SQL statements to create the database
Generate SQL statements to fill in the data
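To make that concrete with a made-up example: an input element like <person><name>Alice</name><city>Oslo</city></person> would produce something along the lines of:

    CREATE TABLE person (
      id   INT AUTO_INCREMENT PRIMARY KEY,
      name VARCHAR(255),
      city VARCHAR(255)
    );
    INSERT INTO person (name, city) VALUES ('Alice', 'Oslo');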
PS2:
I've seen some posts here on SO, but I still couldn't find a solution.
Microsoft's "Xml Bulk Load" tool seems to do something in that direction, but I don't have a MS SQL Server.
Databases are not the only way to search data. I can highly recommend Apache Solr.
Strategies to implement search on XML files:
Keep your raw data as XML and search it using a Solr index.
Importing XML files of the right format into a MySQL database is easy:
https://dev.mysql.com/doc/refman/5.6/en/load-xml.html
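For example (with a hypothetical person table and file), LOAD XML expects one element per row, with tags or attributes matching the column names:

    <!-- persons.xml: one <person> element per row -->
    <persons>
      <person><id>1</id><name>Alice</name><city>Oslo</city></person>
      <person><id>2</id><name>Bob</name><city>Bergen</city></person>
    </persons>

    LOAD XML LOCAL INFILE 'persons.xml'
    INTO TABLE person
    ROWS IDENTIFIED BY '<person>';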
This means you typically have to transform your XML data into that kind of format. How you do this depends on the complexity of the transformation, which programming languages you know, and whether you want to use XSLT (which is most probably a good idea).
From your former answers it seems you know Python, so http://xmlsoft.org/XSLT/python.html may be the right thing for you to start with.
Take a look at StAX instead of XSD for analyzing and extracting the data. It's stream-based and can deal with huge XML files.
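A minimal sketch of that approach (file and element names are hypothetical); in practice each assembled record would be flushed to MySQL, e.g. with a JDBC batch insert:

    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class StaxExtract {
        public static void main(String[] args) throws Exception {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // streams the file, so memory use stays constant even at 100GB
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new FileInputStream("data.xml"));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        text.setLength(0); // start collecting a new element's text
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        // pick out the leaf elements you care about; in practice,
                        // collect a whole record and write it out as one row
                        if ("name".equals(reader.getLocalName())) {
                            System.out.println("name = " + text);
                        }
                        break;
                }
            }
            reader.close();
        }
    }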
If you feel comfortable with Perl, I've had pretty good luck with the XML::Twig module for processing really big XML files.
Basically, all you need is to set up a few twig handlers and import your data into MySQL using DBI/DBD::mysql.
There is a pretty good example on xmltwig.org.
If you're comfortable with commercial products, you might want to have a look at Data Wizard for MySQL by the SQL Maestro Group.
This application is targeted especially at exporting and, of course, importing data from/to MySQL databases, which also includes XML import. You can download a 30-day trial to check whether this is what you are looking for.
I have to admit that I have not used their MySQL product line yet, but I had a good user experience with their Firebird Maestro and SQLite Maestro products.
I am aware of the batch LOAD XML technique, e.g. Load XML Update Table--MySQL.
Can MySQL insert/replace rows directly from XML? I'd like to pass an XML string to MySQL.
Something like REPLACE INTO user XML VALUES, maybe even using AS to map the tags to the column names?
The primary thing is that I don't want to parse the XML in my code; I'd like MySQL to handle this. I don't have a file, I have the XML as a string.
I have looked and found there are some XML Functions:
12.11. XML Functions
The XML functions can do XPath, but I think this is a little fiddly, as I have a 1:1 mapping from the XML to the table structure, so I'd just like to be able to say: hey MySQL, insert the values in the XML string into the table.
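For example, the closest I can see with those functions is something like this (a sketch; the table and tag names are made up), where every column has to be picked out by hand:

    -- hypothetical table and tags; each column must be extracted separately
    SET @xml = '<user><id>1</id><name>Alice</name></user>';

    REPLACE INTO user (id, name)
    VALUES (ExtractValue(@xml, '/user/id'),
            ExtractValue(@xml, '/user/name'));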
Is this possible?
In a nutshell, No.
What you're looking for is an XML storage engine for MySQL. There has never been one created officially, and I have never seen a third-party one either (but feel free to google).
If you really want to achieve this, then the closest you would get is to look for an alternative (R)DBMS, but that might not support the type of queries you wish to perform, may require a bit of a learning curve, would no doubt require a server with superuser access, and could potentially mean refactoring a lot of your code.
I have an existing MySQL database; I would like to import the schema into Xcode and create a Core Data data model.
Is there a way (tool, process) to import the CREATE statements so I don't have to build the models "by hand"?
As an intermediary step I could convert to SQLite. I'm not worried about the relationships, foreign keys, etc., just auto-generating the Entities (tables) and Properties (columns).
Actually, I needed this feature so badly that I decided to write an OS X utility to do it. But then I found a utility in the Mac App Store that (partially) solves this problem (it was free for some time; I do not know its current state). It's called JSONModeler, and what it does is parse a JSON tree and generate the Core Data model and all derived NSManagedObject subclasses automatically. So a typical workflow would be:
Export the tables from MySQL to XML
Convert the XML to JSON
Feed the utility that JSON and get your Core Data model
Now, for a more complicated scenario (relationships etc.), I guess you would have to tweak your XML so it reflects a valid object tree. Then JSONModeler will be able to recreate that tree and export it for Core Data.
The problem here is that entities are not tables and properties are not columns. Core Data is an object graph management system, not a database system. The difference is subtle but important. Core Data really doesn't have anything to do with SQL; it just sometimes uses SQL as one of its persistence options.
Core Data does use a proprietary SQLite schema, and in principle you can duplicate it, but I don't know of anyone who has succeeded in a robust manner except for very simple SQL databases. Even when they do, it's a lot of work. Further, doing so is unsupported and the schema might break somewhere down the line.
The easiest and most robust solution is to write a utility app that reads in the existing DB and creates the object graph as it goes. You only have to run it once, and you have to create the data model anyway, so it doesn't take much time.