I am trying to configure Solr to index an Oracle database. I have enabled leading wildcard searches and various other filters in my schema.xml:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
I have indexed all searchable columns in the database, using the same names for the Solr fields as the column names, so my select query in db-data-config.xml is simply select * from view;
We are receiving different results from the Solr and database queries.
For example, the SQL query below returns 1307 records:
select count(*) from VIEW
where (COLUMN1 = 'Value0' and COLUMN2='Value1')
OR COLUMN3 in ('Value2','Value3','Value4')
AND COLUMN4='Value5'
and the Solr query below returns only 7:
(COLUMN1:(Value0) AND COLUMN2:(Value1))
OR (COLUMN3:(Value2 OR Value3 OR Value4))
AND (COLUMN4:(Value5))
We are operating on only one view, which has all the relevant data. The Solr configuration indexes all columns in the view, so I would have expected the results to be similar. Can someone kindly explain the discrepancy?
EDIT
Adding values as asked. Solr query:
(COUNTRY_NAME:(United States) AND ELEMENT_CODE:(CT)) OR (UNIT_STATUS:(Operating OR Order OR Shipped)) AND (FRAME_GR_DISPLAY:(06A))
Database query:
select * from VIEW where
(COUNTRY_NAME = 'United States' AND ELEMENT_CODE='CT')
OR UNIT_STATUS in ('Operating','Order','Shipped')
AND FRAME_GR_DISPLAY='06A'
Generally speaking, if you want verbatim matches like SQL gives you, you should not use an analyzed field type. For example, in the default schema the string field type (class solr.StrField) is unanalyzed.
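For illustration, a minimal sketch of what that could look like in schema.xml, using a couple of the field names from the queries above (the indexed/stored flags are assumptions; adjust to your needs):

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="COUNTRY_NAME" type="string" indexed="true" stored="true"/>
<field name="ELEMENT_CODE" type="string" indexed="true" stored="true"/>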
If you get lost in analysis filters and tokenizers and don't quite understand what's going on, try adding debugQuery=on to your Solr query string, or use the Solr analysis tool.
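For example, assuming a core named mycore on the default port:

http://localhost:8983/solr/mycore/select?q=COUNTRY_NAME:%22United%20States%22&debugQuery=on

The parsedquery section of the debug output shows exactly what the analyzers turned your input into, which usually makes the discrepancy obvious.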
I am trying to migrate quote details using the KingswaySoft SSIS CRM migration package from CRM 2016 on-premise to D365.
To read the data I am using FetchXML on the quotedetail entity, and I want to link to the quote entity to retrieve quotenumber and revisionnumber.
My current query looks like the following:
<fetch mapping="logical">
  <entity name="quotedetail">
    <attribute name="baseamount" />
    <link-entity name="quote" from="quoteid" to="quoteid" link-type="inner" alias="quote">
      <attribute name="quotenumber" />
      <attribute name="revisionnumber" />
      <filter>
        <condition attribute="revisionnumber" operator="eq" value="1" />
      </filter>
    </link-entity>
  </entity>
</fetch>
Using the data viewer shows that all values in the linked entity are returned as null, and therefore I cannot use these as a lookup for the quoteid.
[Data Viewer results image]
Using this FetchXML statement in XrmToolBox, however, does show the fields correctly.
According to KingswaySoft's official "Known Limitations" page, this is a known issue with a workaround:
In the CDS/CRM Source Component, with the FetchXML option, if you are using linked entities in your FetchXML queries, the CRM component returns NULL values for the fields from linked entities. To avoid this, provide an alias for each linked entity.
In other words, you should provide an alias in the FetchXML to avoid this issue. I got it working in my case by providing the alias on each attribute, meaning you would have to do the following:
<fetch mapping="logical">
  <entity name="quotedetail">
    <attribute name="baseamount" />
    <link-entity name="quote" from="quoteid" to="quoteid" link-type="inner">
      <attribute name="quotenumber" alias="quote_quotenumber" />
      <attribute name="revisionnumber" alias="quote_revisionnumber" />
      <filter>
        <condition attribute="revisionnumber" operator="eq" value="1" />
      </filter>
    </link-entity>
  </entity>
</fetch>
Reference: https://www.kingswaysoft.com/products/ssis-integration-toolkit-for-microsoft-dynamics-365/help-manual/crm/Known-Limitations
I want to use Solr with MongoDB and MySQL together and need to combine them in a single core.
For example, I have a MongoDB collection which depends on one of MySQL's tables.
I tried both with separate Solr cores and it works fine, but I want it in a single core. I don't know whether that is possible; if it is, how can I do it?
Updated
Here are my DIH (Data Import Handler) configs:
- Solr with MySQL
<dataConfig>
  <dataSource
      name="MySQL"
      type="JdbcDataSource"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost/test"
      user="root" password="root"
      batchSize="-1"/>
  <document>
    <entity
        name="master"
        query="select * from master_table">
    </entity>
  </document>
</dataConfig>
- Solr with MongoDB
<dataConfig>
  <dataSource
      name="MyMongo"
      type="MongoDataSource"
      database="test"/>
  <document>
    <entity
        name="sample_entity"
        processor="MongoEntityProcessor"
        query=""
        collection="MarketCity"
        datasource="MyMongo"
        transformer="MongoMapperTransformer">
      <field column="_id" name="id" mongoField="_id"/>
      <field column="keyName" name="keyName" mongoField="keyName"/>
    </entity>
  </document>
</dataConfig>
So I want to do this with a single core.
You can read the data from MySQL and MongoDB, merge the records into a single record, and then index it into Solr.
To get the data from MySQL, use any programming language and fetch the data; for example, you can use Java.
Apply the same logic to MongoDB: get all the required records from MongoDB using Java.
Now, using the SolrJ APIs, create the Solr document. Read more about SolrInputDocument and the other APIs in the SolrJ documentation.
Once you create the instance of SolrInputDocument, add the data you fetched from MySQL and MongoDB to it using the method below, which adds a field to the document:
addField(String name, Object value)
You can prepare the document something like this. Get a Solr client instance of HttpSolrClient first; once the SolrInputDocument is ready, index it to Solr and commit:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Assumes a local Solr core named "combined"; adjust the URL to your setup
HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/combined").build();

SolrInputDocument document = new SolrInputDocument();
document.addField("id", "123456");
document.addField("name", "Kevin Ross");
document.addField("price", "100.00");
solr.add(document);
solr.commit();
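For completeness, here is a minimal end-to-end sketch of that flow. It assumes the master_table and MarketCity names from the question, illustrative column names (id, name), a keyName value shared between the two stores, and a local core named "combined"; treat it as an outline rather than a finished implementation.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class CombinedIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/combined").build();
        MongoClient mongo = MongoClients.create("mongodb://localhost:27017");
        MongoCollection<Document> marketCity = mongo.getDatabase("test").getCollection("MarketCity");

        try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/test", "root", "root");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("select id, name from master_table")) {
            while (rs.next()) {
                // One merged Solr document per MySQL row
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", rs.getString("id"));
                doc.addField("name", rs.getString("name"));
                // Pull the matching MongoDB record (assumes keyName links the two stores)
                Document m = marketCity.find(new Document("keyName", rs.getString("id"))).first();
                if (m != null) {
                    doc.addField("keyName", m.getString("keyName"));
                }
                solr.add(doc);
            }
            solr.commit();
        }
        mongo.close();
        solr.close();
    }
}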
This is the current query I am using for my Apache Solr full import:
<document>
  <entity name="id" query="select f_name as 'id', f_name, l_name, AES_DECRYPT(l_name,'key') as decp_col from table_name;"/>
</document>
This is the format I get in Apache Solr:
l_name":["4\u0007{¾Nen•øÕ+·$Õ\u0002"],
"f_name":["Jayde"],
"id":"Jayde",
"_version_":1650635178055303168},
I want the data to be imported in a way that gives me a decp_col containing the decrypted values.
I have tried multiple queries but have been unable to generate anything on the Solr side.
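For what it is worth, MySQL's AES_DECRYPT returns binary data, which is consistent with the raw bytes shown for l_name above. A sketch of the same entity with the decrypted bytes cast to text (the key, table, and column names are taken from the query above; decp_col must also be defined as a field in the Solr schema):

<document>
  <entity name="id" query="select f_name as 'id', f_name, l_name, CAST(AES_DECRYPT(l_name, 'key') AS CHAR) as decp_col from table_name;"/>
</document>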
I have set up Solr's data import handler as instructed in the manual. Solr reads the records from a MySQL database. The database has a very large number of records (billions are expected).
I have read that batch size does not work for MySQL because the JDBC driver does not support it, so I tried setting it to -1. In this case, Solr performs one select, gets all records from the DB, and indexes them.
Now I have a problem: a timeout occurred while indexing and caused it to stop. I see that Solr hasn't written any id value to the properties file after the exception occurred, so I am not sure how to proceed with indexing the rest of the records.
Can anyone suggest how to set up Solr with MySQL for a proper data import?
Below is the data config I am currently using:
<dataConfig>
  <dataSource type="JdbcDataSource" name="ds-2" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/myowndb" batchSize="-1"/>
  <document name="statuses">
    <entity name="status" query="select s.*, ti.id2, ti.value2 from tblTable1 s inner join tblTable2 ti on s.table2Id = ti.id;">
      <field column="id" name="id"/>
      <field column="statusID" name="statusId"/>
      <field column="type" name="type"/>
      <field column="date" name="date"/>
      <field column="id2" name="id2"/>
      <field column="value2" name="value2"/>
    </entity>
  </document>
</dataConfig>
EDIT:
Based on my tests today, it looks like batchSize is working. If batchSize is set to -1, it will make a single request to MySQL, retrieving all rows at once. If set to some value greater than 0, it will put every record in memory before processing.
The next question is: how do I set up the data import handler so it can index in batches? Not just performing a batched select from the database, but indexing the collected set before collecting the next one.
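One possible way to batch on the SQL side is to page the entity query with LIMIT/OFFSET and drive it through DIH request parameters. This is only a sketch, and the batchRows/batchOffset parameter names are made up for illustration:

<entity name="status"
        query="select s.*, ti.id2, ti.value2 from tblTable1 s
               inner join tblTable2 ti on s.table2Id = ti.id
               order by s.id
               limit ${dataimporter.request.batchRows}
               offset ${dataimporter.request.batchOffset}">
  <!-- field mappings as above -->
</entity>

Each batch would then be triggered with a call such as:

/dataimport?command=full-import&clean=false&commit=true&batchRows=100000&batchOffset=0

Note clean=false, so each batch adds to the index instead of wiping the previous ones.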
EDIT: more specific question
A new question that came up from reading: is it possible to mark a row in the database as processed? There are only two events available in DIH, onImportStart and onImportEnd.
My current thinking leads me to implement a custom EntityProcessor. If it were possible to know when a given row is indexed, it would also be easy to set an isIndexed flag in the database for that row. A rough sketch of that idea follows.
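A sketch of the flag idea using the onImportEnd event instead of a custom EntityProcessor (all names here are illustrative, and rows inserted while the import runs could be flagged prematurely, so treat it as a starting point). In data-config.xml the listener would be registered as <document name="statuses" onImportEnd="com.example.MarkIndexedListener">, the entity query would gain a where s.isIndexed = 0 clause, and the listener itself could look like:

package com.example;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

public class MarkIndexedListener implements EventListener {
    @Override
    public void onEvent(Context ctx) {
        // Flag every row the finished import selected (isIndexed was 0 at select time)
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/myowndb", "user", "password");
             Statement st = conn.createStatement()) {
            st.executeUpdate("update tblTable1 set isIndexed = 1 where isIndexed = 0");
        } catch (Exception e) {
            throw new RuntimeException("Failed to mark indexed rows", e);
        }
    }
}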
I want to index a MySQL table with Solr 4.0, row by row. I have installed the necessary Java libraries; my database is called 'twitter_db' and the table I want to index is called "tweets",
and I log in using user root and no password.
So I added the following to my db-data-config.xml:
<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:mysql://localhost/twitter_db" user="root" password=""/>
  <document name="tweet">
    <entity name="tweet" query="select * from tweets">
      <field column="tweet_id" name="tweet_id"/>
      <field column="text" name="text"/>
      <field column="user" name="user"/>
      <field column="tweet_time" name="tweet_time"/>
      <field column="topic_kw" name="topic_kw"/>
      <field column="timestamp" name="timestamp"/>
    </entity>
  </document>
</dataConfig>
and the solrconfig.xml changes are:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">db-data-config.xml</str>
</lst>
</requestHandler>
and when I hit [root]:8983/solr/db/dataimport?command=full-import
for a full import,
it fails. The error message in the GUI is:
Indexing failed. Rolled back all changes.
and the relevant part of the error message in the log is:
SEVERE: Exception while processing: tweet document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from tweets Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:252)
at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:209)
I wonder if I have something wrong in selecting the database or the table.
I tried a similar question with a similar problem, but I didn't find the answer there.
Check driver="org.hsqldb.jdbcDriver": it is the HSQLDB driver, whereas it should point to the MySQL driver class. Try updating the driver attribute to the appropriate MySQL driver, and you can run the import in debug mode.
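With the MySQL Connector/J jar on Solr's classpath, the dataSource line from the question would then look something like this:

<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/twitter_db" user="root" password="" />

and the import can be re-run with /dataimport?command=full-import&debug=true to see per-document details if it still fails.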