I'm using Solr 4.4; my config.xml file is as given below.
The first time, I do a full import of 40,000 rows and they get indexed.
Now my application adds one more row, so the total count comes to 40,001. Do I need to do a full import or a delta import?
I know that the delta import is applied to rows that are already indexed.
What is the approach when a new row is added to MySQL? Do we need a full import of all 40,001 rows?
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/mydb" user="uname" password="pwd" batchSize="1" />
<document name="resource">
<entity name="resource" query="SELECT * FROM resource"
deltaImportQuery="SELECT * FROM resource WHERE ref = '${dataimporter.delta.ref}'"
deltaQuery="SELECT * FROM resource WHERE last_modified > '${dataimporter.last_index_time}'" transformer="RegexTransformer">
<field column="ref" name="ref"/>
<field column="name" name="name"/>
................
............
</entity>
</document>
</dataConfig>
The documentation has a good article describing an efficient "delta import via full import" approach. Have a look at it here: https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport. Hope this helps :)
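The core idea from that article is to fold the delta logic into the full-import query itself, so an incremental run is just a full-import with clean=false. Here is a sketch adapted to the resource entity above (the `${dataimporter.request.clean}` parameter comes from the wiki page; treat this as a starting point, not a drop-in config):

```xml
<entity name="resource" pk="ref" transformer="RegexTransformer"
        query="SELECT * FROM resource
               WHERE '${dataimporter.request.clean}' != 'false'
                  OR last_modified > '${dataimporter.last_index_time}'">
    <field column="ref" name="ref"/>
    <field column="name" name="name"/>
</entity>
```

With this in place, calling the dataimport handler with command=full-import&clean=false imports only rows modified since the last run, while a plain full-import still rebuilds everything.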
Related
I want to use Solr with MongoDB and MySQL together and need to combine them in a single core.
For example, I have a MongoDB collection that depends on one of MySQL's tables.
I tried both with separate Solr cores and it works fine, but I want it in a single core. I don't know whether that's possible; if it is, how can it be done?
Updated
Here are my DIHs (Data Import Handler configurations):
- Solr with MySQL
<dataConfig>
<dataSource
name="MySQl"
type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/test"
user="root" password="root"
batchSize="-1"/>
<document>
<entity
query="select * from master_table"
name="master">
</entity>
</document>
</dataConfig>
- Solr with MongoDB
<dataConfig>
<dataSource
name="MyMongo"
type="MongoDataSource"
database="test" />
<document>
<entity
processor="MongoEntityProcessor"
query=""
collection="MarketCity"
datasource="MyMongo"
transformer="MongoMapperTransformer"
name="sample_entity">
<field column="_id" name="id" mongoField="_id" />
<field column="keyName" name="keyName" mongoField="keyName"/>
</entity>
</document>
</dataConfig>
So I want to do this with a single core.
You can read the data from MySQL and MongoDB, merge the records into a single record, and then index that record into Solr.
To get the data from MySQL, use any programming language and fetch the data; for example, you can use Java.
Apply the same logic to MongoDB: get all the required records from MongoDB using Java.
Now, using the SolrJ APIs, create the SolrInputDocument. Read more about SolrInputDocument and the other APIs here.
Once you have created the SolrInputDocument instance, add the data you fetched from MySQL and MongoDB into it using the method below.
addField(String name, Object value)
This adds a field to the document.
You can prepare the document something like this:
SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
SolrInputDocument document = new SolrInputDocument();
document.addField("id", "123456");
document.addField("name", "Kevin Ross");
document.addField("price", "100.00");
solr.add(document);
solr.commit();
Get a Solr client instance via HttpSolrClient (adjust the URL to your core). Once the SolrInputDocument is ready, index it into Solr as shown above.
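To make the merge step concrete, here is a minimal sketch. The field names and the idea of representing each fetched row as a Map are assumptions for illustration; the actual JDBC/MongoDB fetch code is omitted:

```java
import java.util.HashMap;
import java.util.Map;

public class RecordMerger {
    // Merge one MySQL row and one MongoDB document (both already
    // fetched into maps) into a single flat record for indexing.
    // On key collisions the MongoDB value wins here; pick whatever
    // rule fits your schema.
    public static Map<String, Object> merge(Map<String, Object> mysqlRow,
                                            Map<String, Object> mongoDoc) {
        Map<String, Object> merged = new HashMap<>(mysqlRow);
        merged.putAll(mongoDoc);
        return merged;
    }
}
```

Each merged map can then be copied field by field into a SolrInputDocument with addField and sent to Solr in a single add call.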
I have been trying to configure the Solr DIH on a server that has about 5 million documents, and it is not working, although it works well on my localhost with 100,000 documents. What could be the problem?
This is the log I am getting:
Exception while processing: product_master document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT cs_product_id,title FROM product_master Processing Document # 1
16:10:56
SEVERE
DataImporter
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT cs_product_id,title
Here is my data-config:
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://www.mysite.com/mydb" user="myusername" password="mypwd" batchSize="-1"/>
<document>
<entity name="product_master" pk="cs_product_id"
query="SELECT cs_product_id,title FROM product_master"
deltaImportQuery=" SELECT cs_product_id,title FROM product_master WHERE cs_product_id = '${dataimporter.delta.cs_product_id}'"
deltaQuery=" SELECT cs_product_id FROM product_master WHERE update_timestamp > '${dataimporter.last_index_time}'">
<field column="cs_product_id" name="cs_product_id"/>
<field column="title" name="title"/>
</entity>
</document>
</dataConfig>
There shouldn't be any difference, so I suggest you do the following:
- set the log level of the DIH components to DEBUG
- create a simple class with a main method that does something trivial with that database, using that exact connection URL, username, and password
I think one of the two tricks above will get you to the point (which, I guess, has nothing to do with Solr).
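A minimal sketch of such a connectivity check follows. The URL and credentials are placeholders taken from the question's config, and the MySQL JDBC driver must be on the classpath for the connection to actually succeed:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DbCheck {
    // Try to open a connection with the exact URL/credentials from
    // data-config.xml; returns false (and prints why) on any failure,
    // including a missing driver or an unreachable host.
    static boolean canConnect(String url, String user, String password) {
        try (Connection c = DriverManager.getConnection(url, user, password)) {
            return c.isValid(5);
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canConnect(
                "jdbc:mysql://www.mysite.com/mydb", "myusername", "mypwd"));
    }
}
```

If this prints false with a timeout or access-denied message, the problem is on the database side (firewall, privileges, remote-access settings), not in Solr.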
I can't figure out how to import the correct datetime from MySQL into Solr via the DataImportHandler. After the import, two hours get subtracted from the datetime values:
MySQL "created_at 2013-04-05 15:04:21" turns up in Solr as "created_at":"2013-04-05T13:04:21Z"
MySQL @@global.time_zone and @@session.time_zone are both "system" and display the correct CET time.
Here is my data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/test"
user="+++" password="++++"/>
<document>
<entity name="id"
query="SELECT table.created_at, ... from table">
<field column="created_at" name="created_at"/>
</entity>
</document>
</dataConfig>
I tried to use the CONVERT_TZ function. In MySQL it worked, but with Solr I had no success; the created_at value is then not indexed at all.
<entity name="id"
query="SELECT CONVERT_TZ(table.created_at,'+00:00','+01:00'), ... from table">
<field column="created_at" name="created_at"/>
</entity>
Try this; I used it in my indexing. Note the AS created_at alias: without it, the computed column's name does not match any field, which is why the value was not indexed at all.
$query = "SELECT DATE_FORMAT(CONVERT_TZ(table.created_at,'+00:00','+01:00'),'%Y-%m-%dT%TZ') AS created_at, .. FROM table";
Solr DateField manual
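The two-hour offset itself is expected: Solr stores and returns all dates in UTC, and CEST (CET with daylight saving, which applies in April) is UTC+2. A small java.time sketch illustrating the same conversion (the Europe/Berlin zone is an assumption standing in for CET):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class TzDemo {
    // Show how a local CET/CEST timestamp maps to the UTC value
    // that Solr actually stores.
    static String toSolrUtc(String local) {
        LocalDateTime ldt = LocalDateTime.parse(local,
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
        ZonedDateTime utc = ldt.atZone(ZoneId.of("Europe/Berlin"))
                               .withZoneSameInstant(ZoneId.of("UTC"));
        return utc.format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'"));
    }

    public static void main(String[] args) {
        System.out.println(toSolrUtc("2013-04-05 15:04:21"));
        // prints 2013-04-05T13:04:21Z, matching what Solr indexed
    }
}
```

So the stored values are not wrong, just normalized to UTC; converting back to the local zone at display time is usually cleaner than shifting the data at import time.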
I have Solr 4.0 up and running and use the DataImportHandler to import data from MySQL.
I have noticed that if I point the DataImportHandler at a MySQL 5.5 data source, everything works as expected. However, when using exactly the same Solr/DataImportHandler config and exactly the same database, but running on MySQL 5.0, certain fields come back base64 encoded.
Relevant entries in data-config.xml
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
name="DB-SOURCE"
url="jdbc:mysql://dbhost/dbname"
user="user"
password="password"
/>
<document name="articles">
<entity name="article_ph" transformer="HTMLStripTransformer" dataSource="DB-SOURCE" pk="article_id"
query="SELECT 'Politics Home' AS article_site,
CONCAT('ph-article-', article_id) AS article_id,
article_title,
article_text_plain AS article_content,
article_articletype_id,
article_datetime AS article_date,
'Uncategorised' AS article_section,
'Non Member' AS article_source
FROM articles
WHERE
article_datetime!='0000-00-00 00:00:00'
AND article_datetime is NOT NULL
AND article_live=1
AND article_text_plain!=''
AND article_text_plain IS NOT NULL
AND article_title is NOT NULL
AND article_title !=''">
<field column="ARTICLE_SITE" name="article_site" />
<field column="ARTICLE_ID" name="article_id" />
<field column="ARTICLE_TITLE" name="article_title" />
<field column="ARTICLE_CONTENT" name="article_content" stripHTML="true" />
<field column="ARTICLE_DATE" name="article_date" />
<field column="ARTICLE_SECTION" name="article_section" />
<field column="ARTICLE_SOURCE" name="article_source" />
<entity name="articletype_name" dataSource="DB-SOURCE"
query="SELECT
articletype_name
FROM articletypes
WHERE articletype_id='${article_ph.article_articletype_id}'">
<field column="articletype_name" name="article_type"/>
</entity>
</entity>
When I run import pointing at MySQL 5.5 I get :
<arr name="article_id"><str>ph-article-124</str></arr>
When I run import pointing at MySQL 5.0 I get articles with base64 encode IDs :
<arr name="article_id"><str>cGgtYXJ0aWNsZS0xMjQ=</str></arr>
All other fields come back correctly.
Collation and character sets on both DBs are the same.
Any help appreciated.
Try converting it back to a string:
CONCAT('ph-article-', CAST(article_id AS CHAR(50)))
I'm trying to update my index, but I keep on getting the error:
org.apache.solr.handler.dataimport.DataImporter verifyWithSchema
INFO: UPC is a required field in SolrSchema. But not found in DataConfig
I can't figure out why it's complaining, since:
- the first time I ran the import, it worked fine, and the only thing I changed was adding a few fields (columns) to schema.xml
- the table I am querying does indeed have a UPC column.
Here is what my data-config.xml looks like:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/product"
user="root"
password="some_password"/>
<document>
<entity name="product"
query="select * from productdetails">
</entity>
</document>
</dataConfig>
But again, the interesting part is that the import worked a second ago, but fails on re-import. I'm hoping somebody has had this problem before. If not, maybe someone can suggest other things to check for?
The reason for this is that when the DataImportHandler starts, it checks its config against your loaded schema. It is not an error, merely a warning. To remove it, add to your import config a field whose name matches the required field.
This is not the cause of your failed reimport, as it is simply a warning.
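For example, declaring the field explicitly inside the entity should silence the warning (assuming, as stated, that productdetails has a UPC column and the Solr field is also named UPC):

```xml
<entity name="product"
        query="select * from productdetails">
    <field column="UPC" name="UPC"/>
</entity>
```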