I am new to Solr, so I might have written the DIH incorrectly. I already have data in my Solr index, and I need to extend my DIH file so that it also indexes a CSV file containing additional information; the CSV rows and the existing Solr documents are mapped by a common id. What I have done so far is shown in the code below.
This is my CSV file:
Node_IP probe_name Global_ID
10.53.3.87 ILRNAPSUD03 4b44aaff1e09f3d793fe9
10.224.47.26 ILRNAPSUD03 47eebea2c2d485b59
Here is my DIH snippet:
<entity name="tmin"
        processor="XPathEntityProcessor"
        dataSource="FileDataSource"
        stream="true"
        url="${pickupdir.fileAbsolutePath}"
        onError="skip"
        forEach="/execution"
        transformer="script:makePair,script:makeLogPair,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
  <field column="jobid_t" xpath="/execution/@jobId" />
  <field column="destinationid_t" xpath="/execution/@destinationid" />
  <field column="id" template="${tmin.destinationid_t}" />
  <field column="log_param" xpath="/execution/log/@severity" />
  <field column="log" xpath="/execution/log" />
  <entity name="importcsv"
          processor="LineEntityProcessor"
          url="C:\Users\arpiagar\Desktop\IP Probe name_ILRNAPSUD01.csv"
          rootEntity="false"
          dataSource="FileDataSource"
          header="true"
          separator=","
          transformer="TemplateTransformer,RegexTransformer,script:mapcsv">
    <field column="rawLine" groupNames="Node_IP,probe_name,Global_ID"/>
    <field column="id" name="Global_ID" />
    <field column="probe_name" name="probe_name" />
  </entity>
</entity>
I need to map the id in the tmin entity to the id we get after indexing the CSV data, so that probe_name and Node_IP are indexed on that particular document.
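One way to join the two sources on the shared id (a sketch, not tested against this setup) is to let DIH cache the CSV entity and look up rows by the parent entity's id, using the `cacheImpl`/`cacheKey`/`cacheLookup` attributes available since Solr 3.6. The column names below come from the CSV header shown above; the regex assumes comma-separated lines and is an illustration only:

```xml
<entity name="importcsv"
        processor="LineEntityProcessor"
        url="C:\Users\arpiagar\Desktop\IP Probe name_ILRNAPSUD01.csv"
        rootEntity="false"
        dataSource="FileDataSource"
        transformer="RegexTransformer"
        cacheImpl="SortedMapBackedCache"
        cacheKey="Global_ID"
        cacheLookup="tmin.destinationid_t">
  <!-- split each raw CSV line into the three named columns -->
  <field column="rawLine" regex="^(.*?),(.*?),(.*)$"
         groupNames="Node_IP,probe_name,Global_ID"/>
</entity>
```

With the cache in place, DIH matches each tmin document's destinationid_t against the CSV's Global_ID and attaches probe_name and Node_IP to that document.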
Related
I'm new to Solr and having an issue with MySQL data extraction in Solr 8.8; despite the declaration below, the documents contain only the ID instead of the whole set of fields.
<document>
  <entity name="foobars"
          query="SELECT *, 'test' AS ENTITY FROM foobar"
          deltaQuery="SELECT ID FROM foobar WHERE updated >= '${dataimporter.last_index_time}'"
          deltaImportQuery="SELECT *, 'MAT' AS ENTITY FROM foobar WHERE ID = ${dataimporter.delta.id}">
    <field column="ENTITY" name="entity" />
    <field column="ID" name="id" />
    <field column="FOO" name="foo" />
    <field column="BAR" name="bar" />
    <field column="BAZ" name="baz" />
    <field column="UPDATED" name="updated" />
  </entity>
</document>
This is a sample of what was imported:
{
  "responseHeader":{
    "status":0,
    "QTime":9,
    "params":{
      "q":"*:*",
      "_":"1623166185835"}},
  "response":{"numFound":147,"start":0,"numFoundExact":true,"docs":[
    {
      "id":"214768.0",
      "_version_":1702016739810738176},
    {
      "id":"296594.0",
      "_version_":1702016739840098304},
    ...
Does anyone know what I'm missing here? Thanks for any help.
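For context, when only id and _version_ come back, the usual cause is that the other columns have no matching stored field in the schema. A sketch of what the missing field definitions might look like (the types are assumptions; pick whatever fits the actual data):

```xml
<field name="entity"  type="string" indexed="true" stored="true"/>
<field name="foo"     type="string" indexed="true" stored="true"/>
<field name="bar"     type="string" indexed="true" stored="true"/>
<field name="baz"     type="string" indexed="true" stored="true"/>
<field name="updated" type="pdate"  indexed="true" stored="true"/>
```

Without stored="true" (or with no field definition at all in a schema that lacks a catch-all dynamic field), the values are silently dropped at index time.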
I have two MySQL tables, book and author, with a many-to-many relationship via book_author_map, whose rows contain the columns book_id / author_id.
In Solr, I have a query to get a book list; for each book I need to get an array of author_id values.
Currently, I am thinking about using a multi-valued field to store the author ids.
My questions are:
How to define the field, and how to write the SQL in the DIH? It seems to need multiple SQL queries, right? Thanks.
If I want to get not just the author_id list, but also the author_name for each author_id, is that possible?
After reading the docs and some googling, I have more or less solved the problem.
Tables
book
author
book_author_map (this is the middle table for many-to-many relationship)
DIH config file
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test?characterEncoding=utf8&amp;zeroDateTimeBehavior=convertToNull"
              user="root"
              password="123456" />
  <document>
    <entity name="book" pk="id"
            query="SELECT * FROM book where status = 0 limit 200000;"
            deltaImportQuery="SELECT * FROM book where status = 0 and id='${dih.delta.id}' limit 200000;"
            deltaQuery="select id from book where status = 0 and CONVERT_TZ(`update_date`, @@session.time_zone, '+00:00') > '${dih.last_index_time}'">
      <entity name="author"
              query="SELECT au.cn_name as author_cn_name FROM author AS au JOIN book_author_map AS bam ON au.id = bam.author_id WHERE bam.book_id = ${book.id} limit 10;">
        <field name="authors" column="author_cn_name" />
      </entity>
    </entity>
  </document>
</dataConfig>
Field definition
<field name="cn_name" type="textComplex" indexed="true" stored="true" />
<field name="en_name" type="textComplex" indexed="true" stored="true" />
<field name="status" type="int" indexed="true" stored="true" />
<field name="authors" type="textComplex" indexed="true" stored="true" multiValued="true" />
TODOs
parentDeltaQuery: it gets the pk of the parent entity, but when is it called, and what does it do? Is it necessary?
Are deltaQuery and parentDeltaQuery necessary in the sub-entity?
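For the record, a sketch of what delta handling on the sub-entity might look like with the tables above (untested, and it assumes the author table also has an update_date column): parentDeltaQuery tells DIH which parent book rows to re-import when an author row changes.

```xml
<entity name="author"
        pk="author_id"
        query="SELECT au.cn_name AS author_cn_name, bam.author_id
               FROM author AS au JOIN book_author_map AS bam ON au.id = bam.author_id
               WHERE bam.book_id = ${book.id}"
        deltaQuery="SELECT id AS author_id FROM author
                    WHERE update_date > '${dih.last_index_time}'"
        parentDeltaQuery="SELECT book_id AS id FROM book_author_map
                          WHERE author_id = ${author.author_id}">
  <field name="authors" column="author_cn_name" />
</entity>
```

Neither attribute is needed if you only ever run full imports; they matter only for delta-import.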
I did a basic Solr setup, configured the DataImportHandler, and created a very simple data config file with two fields, then indexed it. It all worked fine. But now I am adding new fields there and doing a full import afterwards, and for some reason the new fields are just not showing up in the search results (using the Solr admin interface for search). I have tried restarting Solr and running a config reload, to no effect.
This is my data config file. Not sure what's wrong here.
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/msl4" user="root" password=""/>
  <document>
    <entity name="hub_contents" query="select * from hub_contents" deltaQuery="select * from hub_contents where last_modified > '${dataimporter.last_index_time}'">
      <field column="id_original" name="id" />
      <field column="title" name="title" />
      <field column="parent_id" name="parent_id" />
      <field column="item_type" name="item_type" />
      <field column="status" name="status" />
      <field column="updated_at" name="updated_at" />
    </entity>
  </document>
</dataConfig>
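As an aside, deltaQuery is normally expected to return only the primary keys and to be paired with a deltaImportQuery that fetches the full rows. A sketch of that shape, keeping the last_modified column used above:

```xml
<entity name="hub_contents"
        query="select * from hub_contents"
        deltaQuery="select id_original from hub_contents where last_modified > '${dataimporter.last_index_time}'"
        deltaImportQuery="select * from hub_contents where id_original = '${dataimporter.delta.id_original}'">
  <field column="id_original" name="id" />
  <!-- remaining field mappings unchanged -->
</entity>
```

This does not affect a full import, but without it a delta-import cannot fetch the changed rows.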
You can add the fields below to your schema.xml:
<field name="id" type="long" indexed="true" stored="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="parent_id" type="long" indexed="true" stored="true"/>
<field name="item_type" type="text_general" indexed="true" stored="true"/>
<field name="status" type="text_general" indexed="true" stored="true" />
<field name="updated_at" type="date" indexed="true" stored="true"/>
It is left to you which type (fieldType) you want to use, depending on your requirements.
indexed: true if this field should be indexed (searchable or sortable)
stored: true if this field should be retrievable
Also add the below tag:
<uniqueKey>id</uniqueKey>
This is used to determine and enforce document uniqueness.
I have to import some mdb files to Solr. Some of the mdb files are indexed well as documents, but others are not.
I use Solr 4.10.0 and UCanAccess ver. 2.0.9. The following is a screenshot from the log:
For some missing field values (in the screenshot's case, 6 fields) I have set onError="continue" in the dataimport config:
<document>
  <entity name="Book" dataSource="a" query="select bkid AS id, bkid AS BookID, bk AS BookTitle, betaka AS BookInfo, cat as cat from 0bok WHERE bkid = 29435">
    <field column="id" name="id"/>
    <field column="BookID" name="BookID"/>
    <field column="BookTitle" name="BookTitle"/>
    <field column="cat" name="cat"/>
    <entity name="Category" dataSource="a" query="select name as CatName, catord as CatWeight, Lvl as CatLevel from 0cat where id = ${Book.CAT}">
      <field column="CatName" name="CatName"/>
      <field column="CatWeight" name="CatWeight"/>
      <field column="CatLevel" name="CatLevel"/>
    </entity>
    <entity name="Pages" dataSource="a1" onError="continue" query="SELECT nass AS PageContent, page AS PageNum FROM book ORDER BY page">
      <field column="PageContent" name="PageContent"/>
      <field column="PageNum" name="PageNum"/>
      <entity name="Titles" dataSource="a1" onError="continue" query="SELECT * FROM title WHERE id = ${Pages.PAGENUM} ORDER BY sub">
        <field column="ID" name="TitleID"/>
        <field column="TIT" name="PageTitle"/>
        <field column="SUB" name="TitleWeight"/>
        <field column="LVL" name="TitleLevel"/>
      </entity>
    </entity>
  </entity>
</document>
This is a screenshot of the relevant table in the database with the 6 undefined data fields:
At the end of the data import for this mdb file I got the following response:
Last Update: 09:12:04 Requests: 31,952, Fetched: 78,980, Skipped: 0, Processed: 0 Started: 18 minutes ago
which shows that 0 documents were processed!
Other mdb files are processed (i.e. "1 processed" is reported in the response), but I get the following errors in the log:
10/7/2014 9:28:08 AM ERROR SolrWriter Exception while solr commit.
this writer hit an OutOfMemoryError; cannot commit...
and
SolrIndexWriter Error closing IndexWriter this writer hit an
OutOfMemoryError; cannot flush...
How can I solve this issue? And why does Solr request and fetch all these records and then process and index none of them?
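One avenue worth checking for the OutOfMemoryError, offered as an assumption rather than a confirmed fix: by default UCanAccess loads the whole Access database into memory, and its memory connection property can turn that off. The property can be passed in the JDBC URL of the DIH dataSource; the path below is a placeholder:

```xml
<dataSource name="a1" type="JdbcDataSource"
            driver="net.ucanaccess.jdbc.UcanaccessDriver"
            url="jdbc:ucanaccess://C:/path/to/book.mdb;memory=false"
            readOnly="true"/>
```

If the error persists, increasing the JVM heap for the Solr process is the other obvious lever.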
I am trying to use Solr for indexing data from my database.
After I index the data, when I query *:* I get just the id field in the result, not all the fields I had in my query.
My data-config.xml:
<document name="content">
  <entity name="documen" query="SELECT indexId, brand_id, category_id, product_name from Production">
    <field column="indexId" name="id" />
    <field column="category_id" name="categoryid" />
    <field column="brand_id" name="brandid" />
    <field column="product_name" name="id" />
  </entity>
</document>
My schema.xml looks like this:
<field name="id" type="int" indexed="true" stored="true" required="true"/>
<field name="categoryid" type="int" indexed="true" stored="true"/>
<field name="brandid" type="int" indexed="true" stored="true" />
<field name="productname" type="string" indexed="true" stored="true"/>
When I query using *:* I get:
<doc>
  <str name="id">1</str>
  <long name="_version_">1426653005792411648</long>
</doc>
<doc>
  <str name="id">2</str>
  <long name="_version_">1426653005793460224</long>
</doc>
...
I get only the "id" field in the result.
Actually, whatever field is in the uniqueKey tag is returned as the query result.
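One detail worth pointing out in the config above: product_name is mapped to id rather than to the productname field declared in the schema, so it overwrites the unique key instead of populating its own field. A corrected mapping might look like this (a sketch based on the schema shown above; the other columns also need stored="true" fields, which the schema already has):

```xml
<field column="indexId" name="id" />
<field column="category_id" name="categoryid" />
<field column="brand_id" name="brandid" />
<field column="product_name" name="productname" />
```

After fixing the mapping, a full re-import is needed for the change to show up in query results.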