I need to pass a userID parameter in Apache Solr.
Example:
http://localhost.com:8983/solr/collection1/select?q=abc&wt=json&indent=true&userID=THIS-PARAMETER-NEEDS-TO-BE-PASSED
<dataConfig>
  <dataSource type="JdbcDataSource" name="ds-1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/mydatabase" user="root" password="root"/>
  <document name="t">
    <entity name="act" dataSource="ds-1" query="SELECT * FROM mytable WHERE UserID='THIS-PARAMETER-NEEDS-TO-BE-PASSED'">
      <field column="Ac" name="acid"/>
      <field column="UserID" name="userid"/>
      <field column="Comment" name="comment"/>
      <entity name="m"
              query="SELECT * FROM `table2` WHERE `tid` = '${act.tid}'">
        <field column="Title" name="title"/>
      </entity>
    </entity>
  </document>
</dataConfig>
The example you give is a bit mixed up: the URL you show suggests a search request, but the configuration shows that you want to access a request parameter within the DataImportHandler.
Your concrete parameter can be accessed as ${dataimporter.request.userID}. Referring to the wiki, you would need to alter your data config like this:
<entity name="act" dataSource="ds-1" query="SELECT * FROM mytable WHERE UserID='${dataimporter.request.userID}'">
Related
I'm new to Solr and having an issue with MySQL data extraction within Solr 8.8; despite the declaration below, the documents only end up with an ID instead of the whole set of fields.
<document>
  <entity name="foobars"
          query="SELECT *, 'test' AS ENTITY FROM foobar"
          deltaQuery="SELECT ID FROM foobar WHERE updated >= '${dataimporter.last_index_time}'"
          deltaImportQuery="SELECT *, 'MAT' AS ENTITY FROM foobar WHERE ID = ${dataimporter.delta.ID}">
    <field column="ENTITY" name="entity" />
    <field column="ID" name="id" />
    <field column="FOO" name="foo" />
    <field column="BAR" name="bar" />
    <field column="BAZ" name="baz" />
    <field column="UPDATED" name="updated" />
  </entity>
</document>
This is a sample of what was imported:
{
  "responseHeader":{
    "status":0,
    "QTime":9,
    "params":{
      "q":"*:*",
      "_":"1623166185835"}},
  "response":{"numFound":147,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"214768.0",
        "_version_":1702016739810738176},
      {
        "id":"296594.0",
        "_version_":1702016739840098304},
      ...
Does anyone know what I'm missing here? Thanks for any help.
I have two MySQL tables, book and author, with a many-to-many relationship between them via book_author_map, whose rows contain the columns book_id / author_id.
In Solr, I have a query to get the book list; for each book I need to get an array of author_id values for that book.
Currently, I am thinking about using a multi-valued field to store the author IDs.
My questions are:
How to define the field, and how to write the SQL in the DIH? It seems to need multiple SQL queries, right? Thx.
If I want to get not just the author_id list but also the author_name for each author_id, is that possible?
After viewing the docs & googling, I have kind of solved the problem.
Tables
book
author
book_author_map (the join table for the many-to-many relationship)
DIH config file
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test?characterEncoding=utf8&amp;zeroDateTimeBehavior=convertToNull" user="root"
              password="123456" />
  <document>
    <entity name="book" pk="id"
            query="SELECT * FROM book where status = 0 limit 200000;"
            deltaImportQuery="SELECT * FROM book where status = 0 and id='${dih.delta.id}' limit 200000;"
            deltaQuery="select id from book where status = 0 and CONVERT_TZ(`update_date`, @@session.time_zone, '+00:00') > '${dih.last_index_time}'"
    >
      <entity name="author"
              query="SELECT au.cn_name as author_cn_name FROM author AS au JOIN book_author_map AS bam ON au.id = bam.author_id WHERE bam.book_id = ${book.id} limit 10;"
      >
        <field name="authors" column="author_cn_name" />
      </entity>
    </entity>
  </document>
</dataConfig>
Field definition
<field name="cn_name" type="textComplex" indexed="true" stored="true" />
<field name="en_name" type="textComplex" indexed="true" stored="true" />
<field name="status" type="int" indexed="true" stored="true" />
<field name="authors" type="textComplex" indexed="true" stored="true" multiValued="true" />
TODOs
parentDeltaQuery: it gets the pk of the parent entity, but when is it called, and what does it do? Is it necessary? (See the sketch below.)
Are deltaQuery and parentDeltaQuery necessary in a sub-entity?
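As far as I understand the wiki, deltaQuery and parentDeltaQuery on a sub-entity matter only for delta imports: the sub-entity's deltaQuery finds changed child rows, and parentDeltaQuery maps each changed child row back to the pk of the parent documents that must be re-indexed. Neither is used during a full import. A sketch for the author sub-entity (the update_date column on author is an assumption, not from the schema above):
<entity name="author"
        query="SELECT au.cn_name as author_cn_name FROM author AS au JOIN book_author_map AS bam ON au.id = bam.author_id WHERE bam.book_id = ${book.id} limit 10;"
        deltaQuery="SELECT id FROM author WHERE update_date > '${dih.last_index_time}'"
        parentDeltaQuery="SELECT bam.book_id AS id FROM book_author_map AS bam WHERE bam.author_id = ${author.id}">
    <field name="authors" column="author_cn_name" />
</entity>
So they are only necessary if a change to an author row alone, with no change to the book row, should trigger re-indexing of the affected books.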
Solr, or more specifically Tika, is having some problems finding my file, whose filepath is retrieved from a database. Whenever I go to index, it logs errors saying that it can't find the file.
I'm basically doing what is shown in this forum question: taking a file path from a database and using TikaEntityProcessor to analyze the document.
His problem was version issues with Tika, but I'm using a version that is about five years newer, so I'm not sure if it's still an issue with the current version of Tika or if I'm missing something extremely obvious (which is possible; I'm extremely new to Solr). This is my data configuration. TextContentURL is the filepath!
<dataConfig>
<dataSource name="ds-db" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/EDMS_Metadata" user="root" password="**************" />
<dataSource name="ds-file" type="BinFileDataSource"/>
<document name="doc1">
<entity name="db-data" dataSource="ds-db" query="select TextContentURL as 'id',ID,Title,AuthorCreator from MasterIndex" >
<field column="TextContentURL" name="id" />
<field column="Title" name="title" />
</entity>
<entity name="file" dataSource="ds-file" processor="TikaEntityProcessor" url="${db-data.TextContentURL}" format="text">
<field column="text" name="text" />
</entity>
</document>
</dataConfig>
I'd like to note that when I delete the second entity and just run the database part, it works fine. I can run a query, and I get this output when I run a faceted search:
"response": {
"numFound": 283,
"start": 0,
"docs": [
{
"id": "/home/paden/Documents/LWP_Files/BIGDATA/6220106.pdf",
"title": "ENGINEERING INITIATION",
},
This means that it is pulling the document filepath JUST FINE; the id is the correct filepath. But when I re-add the second entity, it logs errors saying it can't find the file. Am I missing something obvious?
Solr is logging these errors:
WARN FileDataSource FileDataSource.basePath is empty. Resolving to: /home/paden/Downloads/solr-5.1.0/server/.
ERROR DocBuilder
Exception while processing: file document : SolrInputDocument(fields: []):org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: (resolved to: /home/paden/Downloads/solr-5.1.0/server/.
ERROR DataImporter
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: (resolved to: /home/paden/Downloads/solr-5.1.0/server/.
Try with the below data-config. The file entity is nested inside the db-data entity so that ${db-data.TextContentURL} resolves for each database row, TextContentURL is no longer aliased away in the SQL, and onError="continue" lets the import survive individual missing files.
<dataConfig>
  <dataSource name="ds-db" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/EDMS_Metadata" user="root" password="**************" />
  <dataSource name="ds-file" type="BinFileDataSource"/>
  <document name="doc1">
    <entity name="db-data" dataSource="ds-db" query="select TextContentURL,ID,Title,AuthorCreator from MasterIndex" >
      <field column="TextContentURL" name="TextContentURL" />
      <field column="Title" name="title" />
      <entity name="file" dataSource="ds-file" processor="TikaEntityProcessor" url="${db-data.TextContentURL}" format="text" onError="continue">
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>
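If the TextContentURL values stored in the database were relative paths, it should also be possible to anchor them by setting basePath on the file data source (the directory shown is just an example):
<dataSource name="ds-file" type="BinFileDataSource" basePath="/home/paden/Documents/LWP_Files/BIGDATA/"/>
Since your paths are already absolute, the nesting shown above is the actual fix; basePath only matters for relative paths.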
I have to import some mdb files into Solr. Some of the mdb files are indexed well as documents, but others are not.
I use Solr 4.10.0 and UCanAccess ver. 2.0.9. The following is a screenshot from the log:
For some missing field values (in the screenshot's case, 6 fields) I have set onError="continue" in the dataimport config:
<document>
  <entity name="Book" dataSource="a" query="select bkid AS id, bkid AS BookID, bk AS BookTitle, betaka AS BookInfo, cat as cat from 0bok WHERE bkid = 29435">
    <field column="id" name="id"/>
    <field column="BookID" name="BookID"/>
    <field column="BookTitle" name="BookTitle"/>
    <field column="cat" name="cat"/>
    <entity name="Category" dataSource="a" query="select name as CatName, catord as CatWeight, Lvl as CatLevel from 0cat where id = ${Book.CAT}">
      <field column="CatName" name="CatName"/>
      <field column="CatWeight" name="CatWeight"/>
      <field column="CatLevel" name="CatLevel"/>
    </entity>
    <entity name="Pages" dataSource="a1" onError="continue" query="SELECT nass AS PageContent, page AS PageNum FROM book ORDER BY page">
      <field column="PageContent" name="PageContent"/>
      <field column="PageNum" name="PageNum"/>
      <entity name="Titles" dataSource="a1" onError="continue" query="SELECT * FROM title WHERE id = ${Pages.PAGENUM} ORDER BY sub">
        <field column="ID" name="TitleID"/>
        <field column="TIT" name="PageTitle"/>
        <field column="SUB" name="TitleWeight"/>
        <field column="LVL" name="TitleLevel"/>
      </entity>
    </entity>
  </entity>
</document>
This is a screenshot of the table concerned in the database, with the 6 undefined data fields:
At the end of the data import for this mdb file I got the following response:
Last Update: 09:12:04 Requests: 31,952, Fetched: 78,980, Skipped: 0,
Processed: 0 Started: 18 minutes ago
which shows that 0 were processed!
There are other mdb files that are processed, i.e. "1 processed" is reported in the response, but I have got the following errors in the log:
10/7/2014 9:28:08 AM ERROR SolrWriter Exception while solr commit.
this writer hit an OutOfMemoryError; cannot commit...
and
SolrIndexWriter Error closing IndexWriter this writer hit an
OutOfMemoryError; cannot flush...
How can I solve this issue? And why does Solr request and fetch all these records and then process and index none of them?!
I am using Solr 4.6.0, indexing about 10,000 elements at a time, and I suffer from bad import performance: importing those 10,000 documents takes about 10 minutes. Of course I know that this depends heavily on the server hardware, but I would still like to know what performance boosts could be applied and which of them are actually useful in real-world situations (joins etc.). I am also very thankful for precise examples, not just links to the official documentation.
Here is the data-config.xml:
<dataConfig>
<dataSource name="mysql" type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://xxxx"
batchSize="-1"
user="xxxx" password="xxxx" />
<document name="publications">
<entity name="publication" transformer="RegexTransformer" pk="id" query="
SELECT
sm_publications.id AS p_id,
CONCAT(sm_publications.title, ' ', sm_publications.abstract) AS p_text,
sm_publications.year AS p_year,
sm_publications.doi AS p_doi,
sm_conferences.full_name AS c_fullname,
sm_journals.full_name AS j_fullname,
GROUP_CONCAT(DISTINCT sm_query_publications.query_id SEPARATOR '_-_-_-_-_') AS q_id
FROM sm_publications
LEFT JOIN sm_conferences ON sm_conferences.id = sm_publications.conference_id
LEFT JOIN sm_journals ON sm_journals.id = sm_publications.journal_id
INNER JOIN sm_query_publications ON sm_query_publications.publication_id = sm_publications.id
WHERE '${dataimporter.request.clean}' != 'false' OR
sm_publications.modified > '${dataimporter.last_index_time}' GROUP BY sm_publications.id">
<field column="p_id" name="id" />
<field column="p_text" name="text" />
<field column="p_text" name="text_tv" />
<field column="p_year" name="year" />
<field column="p_doi" name="doi" />
<field column="c_fullname" name="conference" />
<field column="j_fullname" name="journal" />
<field column="q_id" name="queries" splitBy="_-_-_-_-_" />
<entity name="publication_authors" query="
SELECT
CONCAT(
IF(sm_authors.first_name != '',sm_authors.first_name,''),
IF(sm_authors.middle_name != '',CONCAT(' ',sm_authors.middle_name),''),
IF(sm_authors.last_name != '',CONCAT(' ',sm_authors.last_name),'')
) AS a_name,
sm_affiliations.display_name AS aa_display_name,
CONCAT(sm_affiliations.latitude, ',', sm_affiliations.longitude) AS aa_geo,
sm_affiliations.country_name AS aa_country_name
FROM sm_publication_authors
INNER JOIN sm_authors ON sm_authors.id = sm_publication_authors.author_id
LEFT JOIN sm_affiliations ON sm_affiliations.id = sm_authors.affiliation_id
WHERE sm_publication_authors.publication_id = '${publication.p_id}'">
<field column="a_name" name="authors" />
<field column="aa_display_name" name="affiliations" />
<field column="aa_geo" name="geo" />
<field column="aa_country_name" name="countries" />
</entity>
<entity name="publication_keywords" query="
SELECT sm_keywords.name FROM sm_publication_keywords
INNER JOIN sm_keywords ON sm_keywords.id = sm_publication_keywords.keyword_id
WHERE sm_publication_keywords.publication_id = '${publication.p_id}'">
<field column="name" name="keywords" />
</entity>
</entity>
</document>
</dataConfig>
By query caching, I meant the CachedSqlEntityProcessor. I favor the merged solution as in your other question, MySQL GROUP_CONCAT duplicate entries. But CachedSqlEntityProcessor will help too if the same p_id is looked up over and over by the publication_authors sub-entity, and you have less concern about the extra memory usage.
Update: It looks like your two other questions have been solved, so you can probably go either way. I'm posting the short example/pointer you requested anyway, in case others find it handy:
<entity name="x" query="select * from x">
<entity name="y" query="select * from y" processor="CachedSqlEntityProcessor" where="xid=x.id">
</entity>
<entity>
This example was taken from the wiki. A lookup for "xid=x.id" still happens once per id from the main query "select * from x", but "select * from y" is only sent to the database once and its rows are cached in memory, so the same query is not executed repeatedly.
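Applied to your data config, the publication_keywords sub-entity could look something like this (a sketch only, untested against your schema):
<entity name="publication_keywords" processor="CachedSqlEntityProcessor"
        where="publication_id=publication.p_id"
        query="SELECT sm_publication_keywords.publication_id, sm_keywords.name
               FROM sm_publication_keywords
               INNER JOIN sm_keywords ON sm_keywords.id = sm_publication_keywords.keyword_id">
    <field column="name" name="keywords" />
</entity>
Note that the WHERE clause moves out of the SQL and into the where attribute: the whole join is fetched from MySQL once, cached keyed on publication_id, and each publication then reads its keywords from the cache instead of issuing its own query.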