Error importing MySQL table rows into Solr using DataImportHandler

I want to index a MySQL table with Solr 4.0, row by row. I have installed the necessary Java. My database is called 'twitter_db' and the table I want to index is called "tweets".
I log in as user root with no password.
I added the following data import configuration:
<dataConfig>
<dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:mysql://localhost/twitter_db" user="root" password="" />
<document name="tweet">
<entity name="tweet" query="select * from tweets">
<field column="tweet_id" name="tweet_id" />
<field column="text" name="text" />
<field column="user" name="user" />
<field column="tweet_time" name="tweet_time" />
<field column="topic_kw" name="topic_kw" />
<field column="timestamp" name="timestamp" />
</entity>
</document>
</dataConfig>
The change to solrconfig is:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">db-data-config.xml</str>
</lst>
</requestHandler>
When I request [root]:8983/solr/db/dataimport?command=full-import to run a full import, it fails. The error message in the GUI is:
Indexing failed. Rolled back all changes.
Part of the error message in the log is:
SEVERE: Exception while processing: tweet document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from tweets Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:252)
at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:209)
I wonder whether the database or the table is being selected incorrectly.
I looked at a similar question describing the same problem, but I didn't find the answer there.

Check driver="org.hsqldb.jdbcDriver": that is the HSQLDB driver, but it should point to the MySQL driver class, since the URL is a jdbc:mysql one. Update the driver attribute to the appropriate MySQL driver. You can also run the import in debug mode to see more detail.
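As a minimal sketch of that fix, the data config from the question could look like the following once the driver is switched. It assumes MySQL Connector/J 5.x, whose driver class is com.mysql.jdbc.Driver, and that the connector JAR is on Solr's classpath. Everything else is copied from the question unchanged.

```xml
<dataConfig>
  <!-- Assumption: Connector/J 5.x; the driver was org.hsqldb.jdbcDriver in the question -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/twitter_db"
              user="root"
              password="" />
  <document name="tweet">
    <entity name="tweet" query="select * from tweets">
      <field column="tweet_id" name="tweet_id" />
      <field column="text" name="text" />
      <field column="user" name="user" />
      <field column="tweet_time" name="tweet_time" />
      <field column="topic_kw" name="topic_kw" />
      <field column="timestamp" name="timestamp" />
    </entity>
  </document>
</dataConfig>
```

If the import still fails after this change, the stack trace below the "Unable to execute query" line usually names the underlying JDBC error.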

Related

How can we use solr with both MongoDB and MySQL?

I want to use Solr with MongoDB and MySQL together, combining both in a single core.
For example, I have a MongoDB collection that depends on one MySQL table.
I tried each with a separate Solr core and it works fine, but I want them in a single core. I don't know whether that is possible, and if it is, how to do it.
Updated
Here are my DIH (Data Import Handler) configurations:
- Solr with MySQL
<dataConfig>
<dataSource
name="MySQl"
type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/test"
user="root" password="root"
batchSize="-1"/>
<document>
<entity
query="select * from master_table"
name="master">
</entity>
</document>
</dataConfig>
- Solr with MongoDB
<dataConfig>
<dataSource
name="MyMongo"
type="MongoDataSource"
database="test" />
<document>
<entity
processor="MongoEntityProcessor"
query=""
collection="MarketCity"
datasource="MyMongo"
transformer="MongoMapperTransformer"
name="sample_entity">
<field column="_id" name="id" mongoField="_id" />
<field column="keyName" name="keyName" mongoField="keyName"/>
</entity>
</document>
</dataConfig>
So I want to do this with a single core.
You can read the data from MySQL and MongoDB, merge the matching records into a single record, and then index that record into Solr.
To get the data from MySQL, fetch it with any programming language; for example, you can use Java.
Apply the same logic to MongoDB: fetch all the required records from MongoDB using Java.
Then use the SolrJ APIs to create a SolrInputDocument. The SolrJ documentation covers SolrInputDocument and the related APIs.
Once you have created the SolrInputDocument instance, add the data you fetched from MySQL and MongoDB to it with the method below.
addField(String name, Object value)
This adds a field to the document.
Get a Solr client instance of HttpSolrClient, prepare the document, and index it, something like this:
// Assumed URL and core name; HttpSolrClient.Builder is available in SolrJ 6 and later
HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
SolrInputDocument document = new SolrInputDocument();
// Field values here would come from the merged MySQL and MongoDB record
document.addField("id", "123456");
document.addField("name", "Kevin Ross");
document.addField("price", "100.00");
solr.add(document);
solr.commit();

solr data import handler working on localhost and not on server

I have been trying to configure Solr DIH on a server that has about 5 million documents, and it is not working, although it works well on my localhost with 100,000 documents. What could be the problem?
This is the log I am getting:
Exception while processing: product_master document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT cs_product_id,title FROM product_master Processing Document # 1
16:10:56 SEVERE DataImporter Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT cs_product_id,title
My data-config is:
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://www.mysite.com/mydb" user="myusername" password="mypwd" batchSize="-1"/>
<document>
<entity name="product_master" pk="cs_product_id"
query="SELECT cs_product_id,title FROM product_master"
deltaImportQuery=" SELECT cs_product_id,title FROM product_master WHERE cs_product_id = '${dataimporter.delta.cs_product_id}'"
deltaQuery=" SELECT cs_product_id FROM product_master WHERE update_timestamp > '${dataimporter.last_index_time}'">
<field column="cs_product_id" name="cs_product_id"/>
<field column="title" name="title"/>
</entity>
</document>
</dataConfig>
There shouldn't be any difference between the two environments, so I suggest you do the following:
- Set the log level of the DIH components to DEBUG.
- Create a simple class with a simple main method that does something trivial with that database, using that exact connection URL, username, and password.
I think one of these two tricks will reveal the problem (which, I guess, has nothing to do with Solr).

How to properly set up DataImportHandler for MySQL database with large number of records?

I have set up Solr's data import handler as instructed in the manual. Solr reads the records from a MySQL database. The database has a large number of records (milliards/billions are expected).
I have read that batch size does not work for MySQL because the JDBC driver does not support it. I have tried setting it to -1. In this case, Solr performs one select, gets all records from the DB, and indexes them.
Now I have a problem: a timeout occurred while indexing and caused it to stop. I see that Solr hasn't written any id value to the properties file after the exception occurred, and I am not sure how to proceed with indexing the rest of the records.
Can anyone suggest how to set up Solr with MySQL for a proper data import?
Below is the data config I am currently using.
<dataConfig>
<dataSource type="JdbcDataSource" name="ds-2" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/myowndb" batchSize="-1" />
<document name="statuses">
<entity name="status" query="select s.*, ti.id2, ti.value2 from tblTable1 s inner join tblTable2 ti on s.table2Id = ti.id;">
<field column="id" name="id" />
<field column="statusID" name="statusId" />
<field column="type" name="type" />
<field column="date" name="date" />
<field name="id2" column="id2" />
<field name="value2" column="value2" />
</entity>
</document>
</dataConfig>
EDIT:
Based on my tests today, it looks like batchSize is working. If batchSize is set to -1, it makes a single request to MySQL, retrieving all rows at once. If it is set to some value greater than 0, it puts every record in memory before processing.
The new question is: how do I set up the data import handler so it can index in batches? That is, not only perform a batched select from the database, but also index each collected set before collecting the next one.
EDIT: Specified question
A new question that came up from reading: is it possible to mark a row in the database as processed? There are only two events available in DIH, onImportStart and onImportEnd.
My current line of thinking leads me to implement a custom EntityProcessor. If it were possible to know when a row has been indexed, it would also be easy to set an isIndexed flag in the database for that row.
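One possible way to import in chunks, offered only as a sketch and not something confirmed in this thread, is to pass paging values as request parameters, which DIH exposes inside queries as ${dataimporter.request.<name>}. The parameter names offset and limit are hypothetical, and so is the assumption that s.id gives a stable ordering for tblTable1.

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" name="ds-2" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/myowndb" batchSize="-1" />
  <document name="statuses">
    <!-- offset and limit are hypothetical request parameters supplied on the import URL -->
    <entity name="status"
            query="select s.*, ti.id2, ti.value2
                   from tblTable1 s inner join tblTable2 ti on s.table2Id = ti.id
                   order by s.id
                   limit ${dataimporter.request.limit} offset ${dataimporter.request.offset}">
      <field column="id" name="id" />
      <field column="statusID" name="statusId" />
      <field column="type" name="type" />
      <field column="date" name="date" />
      <field name="id2" column="id2" />
      <field name="value2" column="value2" />
    </entity>
  </document>
</dataConfig>
```

Each chunk would then be requested separately, for example /dataimport?command=full-import&clean=false&offset=0&limit=100000, followed by offset=100000, and so on. clean=false keeps the documents indexed by earlier chunks. With very large offsets MySQL still has to skip the preceding rows, so this may become slow on billions of records.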

HTTP ERROR 500 problem accessing /solr/dataimport, unable to configure Solr

Thanks, I edited my data-config.xml file.
It now looks like this:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/lol"
user="root"
password="n"/>
<document name="content">
<entity name="id">
query="SELECT id from foo"
</entity>
</document>
</dataConfig>
When I run
http://localhost:8983/solr/dataimport?command=full-import
in the browser, I get this:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</lst>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
<str name="Time Elapsed">0:0:6.299</str>
<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">0</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2013-06-28 11:17:34</str>
<str name="">Indexing failed. Rolled back all changes.</str>
<str name="Rolledback">2013-06-28 11:17:34</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
I believe the configuration should look more like this (note that query should be an attribute of the entity element, not text inside it):
<document name="content">
<entity name="id" query="SELECT id from foo">
<!--I assume you have a field in Solr and a column in MySQL, both of which are named "id"-->
</entity>
</document>
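Putting that together with the dataSource from the question, the whole data-config.xml might look like the sketch below. The explicit field mapping is an assumption: it presumes the Solr schema has a field named id, as the comment in the answer also assumes.

```xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/lol"
              user="root"
              password="n"/>
  <document name="content">
    <!-- query is now an attribute of entity rather than text inside it -->
    <entity name="id" query="SELECT id from foo">
      <!-- Assumption: the Solr schema defines a field named "id" -->
      <field column="id" name="id"/>
    </entity>
  </document>
</dataConfig>
```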

Solr DataImportHandler MySQL 5.0 SELECT CONCAT query results are base64 encoded

I have Solr 4.0 up and running and am using DataImportHandler to import data from MySQL.
I have noticed that if I point DataImportHandler at a MySQL 5.5 data source, everything works as expected. However, when using exactly the same Solr/DataImportHandler config and exactly the same database but running on MySQL 5.0, certain fields come back base64 encoded.
Relevant entries in data-config.xml
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
name="DB-SOURCE"
url="jdbc:mysql://dbhost/dbname"
user="user"
password="password"
/>
<document name="articles">
<entity name="article_ph" transformer="HTMLStripTransformer" dataSource="DB-SOURCE" pk="article_id"
query="SELECT 'Politics Home' AS article_site,
CONCAT('ph-article-', article_id) AS article_id,
article_title,
article_text_plain AS article_content,
article_articletype_id,
article_datetime AS article_date,
'Uncategorised' AS article_section,
'Non Member' AS article_source
FROM articles
WHERE
article_datetime!='0000-00-00 00:00:00'
AND article_datetime is NOT NULL
AND article_live=1
AND article_text_plain!=''
AND article_text_plain IS NOT NULL
AND article_title is NOT NULL
AND article_title !=''">
<field column="ARTICLE_SITE" name="article_site" />
<field column="ARTICLE_ID" name="article_id" />
<field column="ARTICLE_TITLE" name="article_title" />
<field column="ARTICLE_CONTENT" name="article_content" stripHTML="true" />
<field column="ARTICLE_DATE" name="article_date" />
<field column="ARTICLE_SECTION" name="article_section" />
<field column="ARTICLE_SOURCE" name="article_source" />
<entity name="articletype_name" dataSource="DB-SOURCE"
query="SELECT
articletype_name
FROM articletypes
WHERE articletype_id='${article_ph.article_articletype_id}'">
<field column="articletype_name" name="article_type"/>
</entity>
</entity>
When I run the import pointing at MySQL 5.5, I get:
<arr name="article_id"><str>ph-article-124</str></arr>
When I run the import pointing at MySQL 5.0, I get articles with base64-encoded IDs:
<arr name="article_id"><str>cGgtYXJ0aWNsZS0xMjQ=</str></arr>
All other fields come back correctly.
Collation and character sets on both DBs are the same.
Any help appreciated.
Try casting the ID to a character string inside the CONCAT:
CONCAT('ph-article-', CAST(article_id AS CHAR(50)))
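For context, here is how that expression might sit in the entity from the question. This is a trimmed sketch: the column list and WHERE clause are shortened for readability, not a suggestion to drop the other columns. A likely explanation, going by the MySQL manual, is that older servers convert a numeric CONCAT argument to a binary string, so the JDBC driver hands Solr bytes rather than text.

```xml
<entity name="article_ph" transformer="HTMLStripTransformer" dataSource="DB-SOURCE" pk="article_id"
        query="SELECT CONCAT('ph-article-', CAST(article_id AS CHAR(50))) AS article_id,
                      article_title
               FROM articles
               WHERE article_live=1">
  <!-- CAST turns the numeric id into a character string before CONCAT sees it -->
  <field column="ARTICLE_ID" name="article_id" />
  <field column="ARTICLE_TITLE" name="article_title" />
</entity>
```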