Query MySQL database while indexing Solr documents

I need to update my Solr documents with detailed information I can grab from a MySQL database.
Example:
Solr field "city" --> "London" (read from an XML source with the post.jar tool)
At update time (the /update requestHandler is already configured with a custom plugin to do other stuff), Solr should query MySQL for more information about "London" (or whatever was just read)
Solr then updates the fields of that document with the query result
I've been trying with a JDBC plugin and with a DIH handler (which I can only use by calling /dataimport full-import... and I can't in my specific case), and so far no success :(
Has anyone had the same problem? How did you solve it? Thanks!
Edit: I forgot, for the DIH configuration I tried following this guide: http://www.cabotsolutions.com/2009/05/using-solr-lucene-for-full-text-search-with-mysql-db/

Please do include the full output of /dataimport/full-import when you access it in your browser. Solr error messages can get cryptic.
Have you considered uploading documents by XML? http://wiki.apache.org/solr/UpdateXmlMessages . It's more powerful, allowing you to use your own logic when uploading documents.
Read each row from SQL and compose an XML update message (string) with each document under its own <doc> tag.
Post the entire XML string to /update. Don't forget to set the MIME type header to text/xml. And make sure to check your servlet container's (Tomcat, Jetty) upload limit on POSTs (Tomcat has a 2 MB limit, if I recall right).
Don't forget the commit and optimize commands.
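A minimal sketch of that flow, assuming a local MySQL database and an older-style Solr /update endpoint; the table, columns, and Solr field names are made up for illustration:

# Sketch: read rows from MySQL, build a Solr <add> XML message, POST it to /update.
# Host, credentials, table and field names are assumptions -- adjust to your schema.
import mysql.connector
import requests
from xml.sax.saxutils import escape

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="geo")
cur = conn.cursor()
cur.execute("SELECT id, city, country, population FROM cities")

docs = []
for doc_id, city, country, population in cur.fetchall():
    docs.append(
        "<doc>"
        f'<field name="id">{escape(str(doc_id))}</field>'
        f'<field name="city">{escape(city)}</field>'
        f'<field name="country">{escape(country)}</field>'
        f'<field name="population">{population}</field>'
        "</doc>"
    )

xml_body = "<add>" + "".join(docs) + "</add>"

solr_update = "http://localhost:8983/solr/update"
headers = {"Content-Type": "text/xml"}  # the MIME type mentioned above

# Post the documents, then commit.
requests.post(solr_update, data=xml_body.encode("utf-8"), headers=headers).raise_for_status()
requests.post(solr_update, data="<commit/>", headers=headers).raise_for_status()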

Related

How to upload XML to MySQL in React with Axios and Node.js

I am trying to upload an XML file to a MySQL server.
I have a React web app, and I am using Axios and Node.js.
I was using the following statement to import the XML file into the product table directly from MySQL Workbench:
LOAD XML INFILE "C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/products.xml" INTO TABLE product ROWS IDENTIFIED BY '<Product>';
It worked fine.
Now I want to have a button that will upload a new XML file and replace the existing data in the table.
What I have tried so far is using the HTML file input element, grabbing the file from event.target.files[0] and sending the file object to the server with a POST request.
I am not really sure how to go on from here; I can't find a statement that can take the data out of the file object and import it into the SQL table.
Any ideas? What is the best way to go about it?
I figured out my problem: my site was deployed to Heroku.
Apparently ClearDB, Heroku's add-on SQL database, does not allow the use of LOAD XML INFILE / LOAD DATA INFILE, as explained here: https://getsatisfaction.com/cleardb/topics/load-data-local-infile.
What I ended up doing was converting the XML file to a JS object.
That solution presented a new problem: my XML file was around 3 MB, which added up to over 12,000 rows to insert into the database.
MySQL does not allow inserting more than 1,000 rows in a single query.
I had to split the object into several chunks and loop through them, uploading each one by itself.
This process takes some time to execute and I am sure there are better ways of doing it.
If anyone can shed some light on how best to go about it, or provide an alternative, I would appreciate it.
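For illustration, the chunking idea sketched in Python with mysql-connector-python rather than Node (the table, columns, and the 1,000-row batch size are assumptions):

# Sketch: split a large list of rows into batches and insert each batch separately.
import mysql.connector

BATCH_SIZE = 1000  # assumed per-query row limit mentioned above

def chunked(rows, size):
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="shop")
cur = conn.cursor()

# `rows` would come from the parsed XML: a list of (name, price, sku) tuples.
rows = [("Widget", 9.99, "W-001"), ("Gadget", 19.99, "G-002")]  # placeholder data

sql = "INSERT INTO product (name, price, sku) VALUES (%s, %s, %s)"
for batch in chunked(rows, BATCH_SIZE):
    cur.executemany(sql, batch)  # insert the whole batch in one call
    conn.commit()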

How do I use Pentaho Spoon to push data from a MySQL database to a Facebook page

1) I have already made a transformation mapping for getting data from a specific MySQL table (Table Input) and converting it to a Text File output.
2) I have also created a Facebook developer account page and am trying to figure out how the Facebook API works to push data from MySQL to Facebook.
3) I would appreciate it if a transformation mapping could be provided. Also, I would rather not use XML; instead I would like to use JSON.
The MySQL table is already converted to a CSV file, but I am not sure how to post the CSV file to Facebook, or whether there is a way to connect the MySQL table to Facebook directly. Please share your ideas or transformation mapping. Thanks.
I would assume you are familiar with the Facebook developer API and how to do actions like post, get and so on.
You have a step called "REST Client" in Pentaho.
You will have an API URL to post the data that you want from MySQL. There are several methods: GET, PUT, POST, DELETE.
Also set the Application Format to JSON (it accepts XML, JSON, etc.).
I used to read data from FB with the REST Client step using the GET method, as a workaround.
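Outside of Pentaho, the same idea sketched in plain Python (the Graph API page-feed endpoint needs a page access token; the page ID, token, table and column names below are placeholders, and you should check the current Graph API version and required permissions):

# Sketch: read rows from MySQL and post each one as a message to a Facebook page feed.
# PAGE_ID and ACCESS_TOKEN are placeholders; verify the Graph API version and
# permissions your app actually needs.
import mysql.connector
import requests

PAGE_ID = "your-page-id"
ACCESS_TOKEN = "your-page-access-token"
FEED_URL = f"https://graph.facebook.com/{PAGE_ID}/feed"

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="reports")
cur = conn.cursor()
cur.execute("SELECT title, body FROM posts")

for title, body in cur.fetchall():
    resp = requests.post(FEED_URL, data={
        "message": f"{title}\n\n{body}",
        "access_token": ACCESS_TOKEN,
    })
    resp.raise_for_status()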

Indexing flat XML files in Elasticsearch

I'm working on a project where data provided by external providers is to be indexed in our Elasticsearch engine.
The data is provided as flat XML files.
The idea here is to script something that reads each file, parses it, and launches as many HTTP POSTs as needed for each one of them.
Is there a simpler way to do this? Something like uploading the XML file so that it gets indexed automatically, without any script?
You can use Logstash with an xml filter to do this. It takes a bit of work to get set up the first time, but it's the most straightforward way to do it.
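If you do end up scripting it yourself as described above, a minimal sketch could look like this (Python; the directory, index name, and local Elasticsearch URL are assumptions, and the field names depend on your XML):

# Sketch: walk a directory of flat XML files, parse each one, and POST a JSON
# document per file to Elasticsearch. Adjust the index name and XML tags to your data.
import pathlib
import xml.etree.ElementTree as ET
import requests

ES_URL = "http://localhost:9200/provider-data/_doc"  # assumed index name

for path in pathlib.Path("incoming_xml").glob("*.xml"):
    root = ET.parse(path).getroot()
    doc = {child.tag: (child.text or "").strip() for child in root}
    resp = requests.post(ES_URL, json=doc)
    resp.raise_for_status()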

How do I index HTML files into Apache Solr?

By default Solr accepts XML files; I want to perform search on millions of crawled URLs (HTML).
Usually, as a first step, I would recommend rolling your own application using SolrJ or similar to handle the indexing, and not doing it directly with the DataImportHandler.
Just write your application and have it output the contents of those web pages as a field in a SolrInputDocument. I recommend stripping the HTML in that application, because it gives you greater control. Besides, you probably want to get at some of the data inside that page, such as <title>, and index it to a different field. An alternative is to use HTMLStripTransformer on one of your fields to make sure it strips HTML out of anything that you send to that field.
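The answer above suggests SolrJ (Java); purely as an illustration of the same flow (strip the markup in your own application, pull out <title>, and send title and body text to separate fields), here is a rough Python sketch posting an XML update message, with the Solr URL and field names as assumptions:

# Illustration only: strip HTML in the indexing application and index <title> to its
# own field. The field names (id, title, content) and the Solr URL are assumptions.
from html.parser import HTMLParser
from xml.sax.saxutils import escape
import requests

class TextExtractor(HTMLParser):
    """Collects the page title and the visible text, dropping all tags."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        else:
            self.text.append(data)

def index_page(url, html):
    parser = TextExtractor()
    parser.feed(html)
    body = " ".join(chunk.strip() for chunk in parser.text if chunk.strip())
    doc = (
        "<add><doc>"
        f'<field name="id">{escape(url)}</field>'
        f'<field name="title">{escape(parser.title.strip())}</field>'
        f'<field name="content">{escape(body)}</field>'
        "</doc></add>"
    )
    requests.post("http://localhost:8983/solr/update?commit=true",
                  data=doc.encode("utf-8"),
                  headers={"Content-Type": "text/xml"}).raise_for_status()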
How are you crawling all this data? If you're using something like Apache Nutch it should already take care of most of this for you, allowing you to just plug in the connection details of your Solr server.
Solr Cell can accept HTML and index it for full-text search: http://wiki.apache.org/solr/ExtractingRequestHandler
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@tutorial.html"
You can index downloaded HTML files with Solr very well.
This is the fastest way that I did my indexing:
curl "http://localhost:8080/solr/update/extract?stream.file=/home/index.html&literal.id=www.google.com"
Here stream.file is the local path of your HTML file and literal.id is the URL that index.html came from.

Importing an XML file to MySQL via phpMyAdmin

I'm trying to import an XML file via phpMyAdmin and map each of the child elements to their corresponding fields within a MySQL table. XML sample:
<event>
<date>1992</date>
<title>Event Title</title>
<description>Event description goes here.</description>
</event>
I have MySQL fields within the table with names identical to the child elements listed above; however, when I import my XML file, I get a message that says "0 queries executed," and of course nothing gets imported.
I tried looking this up via the phpMyAdmin documentation, but I couldn't find anything but a modest description of XML as an import method. How is this supposed to be done?
phpMyAdmin doesn't provide a feature to map XML elements to corresponding table fields. Sorry.
You need to write a program to do this trick. Or enhance phpMyAdmin.
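For example, a small program along those lines could parse the sample above and insert each <event> into the table (a Python sketch; the file name, connection details, and the assumption that the table and columns are named after the XML elements are all illustrative):

# Sketch: map each <event> element's children onto columns of an `event` table.
import xml.etree.ElementTree as ET
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="mydb")
cur = conn.cursor()

tree = ET.parse("events.xml")  # assumed file name
for event in tree.getroot().iter("event"):
    cur.execute(
        "INSERT INTO event (date, title, description) VALUES (%s, %s, %s)",
        (event.findtext("date"),
         event.findtext("title"),
         event.findtext("description")),
    )
conn.commit()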
Sorry, but did you try standard ISO XML? Why don't you try including the header and all the things that come with standard XML? If you can't, try CSV. The way phpMyAdmin parses these files is very specific.