I am new to Solr, so I might have written the DIH incorrectly. I already have data in my Solr index, and I need to extend my DIH file so that it also indexes a CSV file containing additional information; the CSV rows and the existing Solr documents are mapped by a common id. What I have done so far is shown in the code below.
This is my CSV file:
Node_IP probe_name Global_ID
10.53.3.87 ILRNAPSUD03 4b44aaff1e09f3d793fe9
10.224.47.26 ILRNAPSUD03 47eebea2c2d485b59
Here is my DIH snippet:
<entity name="tmin"
        processor="XPathEntityProcessor"
        dataSource="FileDataSource"
        stream="true"
        url="${pickupdir.fileAbsolutePath}"
        onError="skip"
        forEach="/execution"
        transformer="script:makePair,script:makeLogPair,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
  <field column="jobid_t" xpath="/execution/@jobId" />
  <field column="destinationid_t" xpath="/execution/@destinationid" />
  <field column="id" template="${tmin.destinationid_t}" />
  <field column="log_param" xpath="/execution/log/@severity" />
  <field column="log" xpath="/execution/log" />
  <entity name="importcsv"
          processor="LineEntityProcessor"
          url="C:\Users\arpiagar\Desktop\IP Probe name_ILRNAPSUD01.csv"
          rootEntity="false"
          dataSource="FileDataSource"
          header="true"
          separator=","
          transformer="TemplateTransformer,RegexTransformer,script:mapcsv">
    <field column="rawLine" groupNames="Node_IP,probe_name,Global_ID"/>
    <field column="id" name="Global_ID" />
    <field column="probe_name" name="probe_name" />
  </entity>
</entity>
I need to map the id in the tmin entity to the id we get after indexing the CSV data, so that probe_name and Node_IP are indexed on that particular document.
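One way to join the two sources on the shared id (a sketch, not tested against this setup) is to let DIH cache the CSV entity and look up rows by the parent entity's id, using the `cacheImpl`/`cacheKey`/`cacheLookup` attributes available since Solr 3.6. The column names below come from the CSV header shown above; the regex assumes comma-separated lines and is an illustration only:

```xml
<entity name="importcsv"
        processor="LineEntityProcessor"
        url="C:\Users\arpiagar\Desktop\IP Probe name_ILRNAPSUD01.csv"
        rootEntity="false"
        dataSource="FileDataSource"
        transformer="RegexTransformer"
        cacheImpl="SortedMapBackedCache"
        cacheKey="Global_ID"
        cacheLookup="tmin.destinationid_t">
  <!-- split each raw CSV line into the three named columns -->
  <field column="rawLine" regex="^(.*?),(.*?),(.*)$"
         groupNames="Node_IP,probe_name,Global_ID"/>
</entity>
```

With the cache in place, DIH matches each tmin document's destinationid_t against the CSV's Global_ID and attaches probe_name and Node_IP to that document.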
Related
I'm new to Solr and having an issue with MySQL data extraction in Solr 8.8; despite the declaration below, the documents contain only the ID instead of the whole set of fields.
<document>
  <entity name="foobars"
          query="SELECT *, 'test' AS ENTITY FROM foobar"
          deltaQuery="SELECT ID FROM foobar WHERE updated >= '${dataimporter.last_index_time}'"
          deltaImportQuery="SELECT *, 'MAT' AS ENTITY FROM foobar WHERE ID = ${dataimporter.delta.id}">
    <field column="ENTITY" name="entity" />
    <field column="ID" name="id" />
    <field column="FOO" name="foo" />
    <field column="BAR" name="bar" />
    <field column="BAZ" name="baz" />
    <field column="UPDATED" name="updated" />
  </entity>
</document>
This is a sample of what was imported:
{
  "responseHeader":{
    "status":0,
    "QTime":9,
    "params":{
      "q":"*:*",
      "_":"1623166185835"}},
  "response":{"numFound":147,"start":0,"numFoundExact":true,"docs":[
    {
      "id":"214768.0",
      "_version_":1702016739810738176},
    {
      "id":"296594.0",
      "_version_":1702016739840098304},
    ...
Does anyone know what I'm missing here? Thanks for any help.
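For context, when only id and _version_ come back, the usual cause is that the other columns have no matching stored field in the schema. A sketch of what the missing field definitions might look like (the types are assumptions; pick whatever fits the actual data):

```xml
<field name="entity"  type="string" indexed="true" stored="true"/>
<field name="foo"     type="string" indexed="true" stored="true"/>
<field name="bar"     type="string" indexed="true" stored="true"/>
<field name="baz"     type="string" indexed="true" stored="true"/>
<field name="updated" type="pdate"  indexed="true" stored="true"/>
```

Without stored="true" (or with no field definition at all in a schema that lacks a catch-all dynamic field), the values are silently dropped at index time.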
I have two MySQL tables, book and author, with a many-to-many relationship via book_author_map, whose rows contain the columns book_id / author_id.
In Solr, I have a query to get a book list; for each book I need to get an array of author_id values.
Currently, I am thinking about using a multi-valued field to store the author ids.
My questions are:
How to define the field, and how to write the SQL in the DIH? It seems to need multiple SQL queries, right? Thanks.
If I want to get not just the author_id list, but also the author_name for each author_id, is that possible?
After reading the docs and some googling, I have more or less solved the problem.
Tables
book
author
book_author_map (this is the middle table for many-to-many relationship)
DIH config file
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test?characterEncoding=utf8&amp;zeroDateTimeBehavior=convertToNull"
              user="root"
              password="123456" />
  <document>
    <entity name="book" pk="id"
            query="SELECT * FROM book where status = 0 limit 200000;"
            deltaImportQuery="SELECT * FROM book where status = 0 and id='${dih.delta.id}' limit 200000;"
            deltaQuery="select id from book where status = 0 and CONVERT_TZ(`update_date`, @@session.time_zone, '+00:00') > '${dih.last_index_time}'">
      <entity name="author"
              query="SELECT au.cn_name as author_cn_name FROM author AS au JOIN book_author_map AS bam ON au.id = bam.author_id WHERE bam.book_id = ${book.id} limit 10;">
        <field name="authors" column="author_cn_name" />
      </entity>
    </entity>
  </document>
</dataConfig>
Field definition
<field name="cn_name" type="textComplex" indexed="true" stored="true" />
<field name="en_name" type="textComplex" indexed="true" stored="true" />
<field name="status" type="int" indexed="true" stored="true" />
<field name="authors" type="textComplex" indexed="true" stored="true" multiValued="true" />
TODOs
parentDeltaQuery: it gets the pk of the parent entity, but when is it called, and what does it do? Is it necessary?
Are deltaQuery and parentDeltaQuery necessary in the sub-entity?
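For the record, a sketch of what delta handling on the sub-entity might look like with the tables above (untested, and it assumes the author table also has an update_date column): parentDeltaQuery tells DIH which parent book rows to re-import when an author row changes.

```xml
<entity name="author"
        pk="author_id"
        query="SELECT au.cn_name AS author_cn_name, bam.author_id
               FROM author AS au JOIN book_author_map AS bam ON au.id = bam.author_id
               WHERE bam.book_id = ${book.id}"
        deltaQuery="SELECT id AS author_id FROM author
                    WHERE update_date > '${dih.last_index_time}'"
        parentDeltaQuery="SELECT book_id AS id FROM book_author_map
                          WHERE author_id = ${author.author_id}">
  <field name="authors" column="author_cn_name" />
</entity>
```

Neither attribute is needed if you only ever run full imports; they matter only for delta-import.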
I did a basic Solr setup, configured the DataImportHandler, and created a very simple data config file with two fields, then indexed it. It all worked fine. But now I am adding new fields there and doing a full import afterwards, and for some reason the new fields are just not showing up in the search results (using the Solr admin interface for search). I have tried restarting Solr and running a config reload, to no effect.
This is my data config file. Not sure what's wrong here.
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/msl4" user="root" password=""/>
  <document>
    <entity name="hub_contents" query="select * from hub_contents" deltaQuery="select * from hub_contents where last_modified > '${dataimporter.last_index_time}'">
      <field column="id_original" name="id" />
      <field column="title" name="title" />
      <field column="parent_id" name="parent_id" />
      <field column="item_type" name="item_type" />
      <field column="status" name="status" />
      <field column="updated_at" name="updated_at" />
    </entity>
  </document>
</dataConfig>
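As an aside, deltaQuery is normally expected to return only the primary keys and to be paired with a deltaImportQuery that fetches the full rows. A sketch of that shape, keeping the last_modified column used above:

```xml
<entity name="hub_contents"
        query="select * from hub_contents"
        deltaQuery="select id_original from hub_contents where last_modified > '${dataimporter.last_index_time}'"
        deltaImportQuery="select * from hub_contents where id_original = '${dataimporter.delta.id_original}'">
  <field column="id_original" name="id" />
  <!-- remaining field mappings unchanged -->
</entity>
```

This does not affect a full import, but without it a delta-import cannot fetch the changed rows.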
You can add the fields below to your schema.xml:
<field name="id" type="long" indexed="true" stored="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="parent_id" type="long" indexed="true" stored="true"/>
<field name="item_type" type="text_general" indexed="true" stored="true"/>
<field name="status" type="text_general" indexed="true" stored="true" />
<field name="updated_at" type="date" indexed="true" stored="true"/>
It is left to you which type (fieldType) you want to use, depending on your requirements.
indexed: true if this field should be indexed (searchable or sortable)
stored: true if this field should be retrievable
Also add the below tag:
<uniqueKey>id</uniqueKey>
This is used to determine and enforce document uniqueness.
I have to import some mdb files to Solr. Some of the mdb files are indexed well as documents, but others are not.
I use Solr 4.10.0 and UCanAccess ver. 2.0.9. The following is a screenshot from the log:
For some missing field values (in the screenshot's case, 6 fields) I have set onError="continue" in the dataimport config:
<document>
  <entity name="Book" dataSource="a" query="select bkid AS id, bkid AS BookID, bk AS BookTitle, betaka AS BookInfo, cat as cat from 0bok WHERE bkid = 29435">
    <field column="id" name="id"/>
    <field column="BookID" name="BookID"/>
    <field column="BookTitle" name="BookTitle"/>
    <field column="cat" name="cat"/>
    <entity name="Category" dataSource="a" query="select name as CatName, catord as CatWeight, Lvl as CatLevel from 0cat where id = ${Book.CAT}">
      <field column="CatName" name="CatName"/>
      <field column="CatWeight" name="CatWeight"/>
      <field column="CatLevel" name="CatLevel"/>
    </entity>
    <entity name="Pages" dataSource="a1" onError="continue" query="SELECT nass AS PageContent, page AS PageNum FROM book ORDER BY page">
      <field column="PageContent" name="PageContent"/>
      <field column="PageNum" name="PageNum"/>
      <entity name="Titles" dataSource="a1" onError="continue" query="SELECT * FROM title WHERE id = ${Pages.PAGENUM} ORDER BY sub">
        <field column="ID" name="TitleID"/>
        <field column="TIT" name="PageTitle"/>
        <field column="SUB" name="TitleWeight"/>
        <field column="LVL" name="TitleLevel"/>
      </entity>
    </entity>
  </entity>
</document>
This is a screenshot of the relevant table in the database with the 6 undefined data fields:
At the end of the data import for this mdb file I got the following response:
Last Update: 09:12:04 Requests: 31,952, Fetched: 78,980, Skipped: 0, Processed: 0 Started: 18 minutes ago
which shows that 0 documents were processed!
Other mdb files are processed (i.e. "1 processed" is reported in the response), but I get the following errors in the log:
10/7/2014 9:28:08 AM ERROR SolrWriter Exception while solr commit.
this writer hit an OutOfMemoryError; cannot commit...
and
SolrIndexWriter Error closing IndexWriter this writer hit an
OutOfMemoryError; cannot flush...
How can I solve this issue? And why does Solr request and fetch all these records and then process and index none of them?
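One avenue worth checking for the OutOfMemoryError, offered as an assumption rather than a confirmed fix: by default UCanAccess loads the whole Access database into memory, and its memory connection property can turn that off. The property can be passed in the JDBC URL of the DIH dataSource; the path below is a placeholder:

```xml
<dataSource name="a1" type="JdbcDataSource"
            driver="net.ucanaccess.jdbc.UcanaccessDriver"
            url="jdbc:ucanaccess://C:/path/to/book.mdb;memory=false"
            readOnly="true"/>
```

If the error persists, increasing the JVM heap for the Solr process is the other obvious lever.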
I am trying to use Solr for indexing data from my database.
After I index the data, when I query *:* I get just the id field in the result, not all the fields I had in my query.
My data-config.xml:
<document name="content">
  <entity name="documen" query="SELECT indexId, brand_id, category_id, product_name from Production">
    <field column="indexId" name="id" />
    <field column="category_id" name="categoryid" />
    <field column="brand_id" name="brandid" />
    <field column="product_name" name="id" />
  </entity>
</document>
My schema.xml looks like this:
<field name="id" type="int" indexed="true" stored="true" required="true"/>
<field name="categoryid" type="int" indexed="true" stored="true"/>
<field name="brandid" type="int" indexed="true" stored="true" />
<field name="productname" type="string" indexed="true" stored="true"/>
When I query using *:* I get:
<doc>
  <str name="id">1</str>
  <long name="_version_">1426653005792411648</long>
</doc>
<doc>
  <str name="id">2</str>
  <long name="_version_">1426653005793460224</long>
</doc>
...
I get only the "id" field in the result.
Actually, whatever field is in the uniqueKey tag is returned as the query result.
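One detail worth pointing out in the config above: product_name is mapped to id rather than to the productname field declared in the schema, so it overwrites the unique key instead of populating its own field. A corrected mapping might look like this (a sketch based on the schema shown above; the other columns also need stored="true" fields, which the schema already has):

```xml
<field column="indexId" name="id" />
<field column="category_id" name="categoryid" />
<field column="brand_id" name="brandid" />
<field column="product_name" name="productname" />
```

After fixing the mapping, a full re-import is needed for the change to show up in query results.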