How to speed up solr DIH with subqueries - mysql

I would like to speed up the DIH for a Solr configuration that has the following structure:
a user entity (mapped to the user table)
the user entity has 1..n values, each mapped to a field of the user entity, so n additional fields
every field is gathered through a subquery on the value table.
example:
entity:user (select * from user)
user has the following fields:
value_1: (select * from value where uid=user.id and category=1)
value_2: (select * from value where uid=user.id and category=2)
value_3: (select * from value where uid=user.id and category=3)
As there are many subqueries, the import takes too long.
What's the best approach to this using Solr and DIH (MySQL)?

You can speed things up using SortedMapBackedCache; see the Entity Processors section of the DIH documentation:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors

I have dealt with this exact same issue, and at heart the problem is that neither MySQL nor Solr's SQL DIH has the built-in capacity to use a field value to help name a MySQL result column or a Solr field.
Ideally, you could do something like this (THIS DOESN'T WORK!):
<entity name="value" query="select myfield, category from t1 where uid=${user.id}">
    <field column="myfield" name="value_${value.category}"/>
</entity>
Without this wonderful, non-existent feature, there are several ways to get similar functionality with less convenience.
This page shows a great analysis of two different methods for creating this functionality: either with the ScriptTransformer (which he found simple to implement, but which slowed down the import badly), or with the TemplateTransformer (which requires you to compile a very short Java snippet, but is apparently much more efficient).
Again, this is likely the solution you want.
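For illustration, the compile-a-short-Java-snippet approach might look roughly like the custom DIH transformer sketched below. This is only an assumption about what that approach amounts to: the class and package names are made up, the row keys follow the hypothetical config above, and the resulting value_<category> fields would still need to be declared (or matched by a dynamic field) in the schema.

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

import java.util.Map;

// Hypothetical custom transformer: renames the generic "myfield" column to a
// category-specific column (value_1, value_2, ...) before DIH maps it to a field.
public class CategoryFieldTransformer extends Transformer {

    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Object category = row.get("category");
        Object value = row.get("myfield");
        if (category != null && value != null) {
            row.put("value_" + category, value);
            row.remove("myfield");
        }
        return row;
    }
}

You would then reference it on the child entity via the transformer attribute, e.g. transformer="com.example.CategoryFieldTransformer" (package name assumed).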
In my own case, I hadn't found this solution, and instead wrote a short Java program to make the SQL requests, build the SolrInputDocuments, and submit them in batches to Solr. Later, the whole thing was made irrelevant when we decided to put all of the values into Solr as a single JSON-encoded field.
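In case it helps anyone going the same route, here is a rough sketch of what such a standalone indexer could look like, using JDBC plus SolrJ. The connection URL, core name, and the name column are assumptions; the value table and its uid/category/myfield columns follow the question's example, and a single join replaces the per-category subqueries.

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class UserIndexer {

    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/mydb", "dbuser", "secret");   // placeholder credentials
             HttpSolrClient solr = new HttpSolrClient.Builder(
                     "http://localhost:8983/solr/users").build();               // assumed core name
             Statement st = db.createStatement();
             // one join instead of n subqueries per user; rows arrive grouped by user id
             ResultSet rs = st.executeQuery(
                     "SELECT u.id, u.name, v.category, v.myfield "
                   + "FROM user u LEFT JOIN `value` v ON v.uid = u.id "
                   + "ORDER BY u.id")) {

            List<SolrInputDocument> batch = new ArrayList<>();
            SolrInputDocument doc = null;
            long currentId = -1;

            while (rs.next()) {
                long id = rs.getLong("id");
                if (id != currentId) {
                    // flush a full batch before starting a new document
                    if (batch.size() >= 1000) {
                        solr.add(batch);
                        batch.clear();
                    }
                    doc = new SolrInputDocument();
                    doc.addField("id", id);
                    doc.addField("name", rs.getString("name"));   // "name" column is an assumption
                    batch.add(doc);
                    currentId = id;
                }
                if (rs.getObject("category") != null) {
                    // produces value_1, value_2, value_3, ... as in the question
                    doc.addField("value_" + rs.getInt("category"), rs.getString("myfield"));
                }
            }
            if (!batch.isEmpty()) {
                solr.add(batch);
            }
            solr.commit();
        }
    }
}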
Good luck!

Related

Sorting ResultSet obtained from SpringJpa ExampleMatcher from most matching to least to be used in an Advanced Search

I am in the process of writing an advanced search function using Spring Boot and MySQL for a book management system.
My Book object contains various information such as material id, book name, author, publisher, description, and product type (as in a story book or a reference material, etc.).
I managed to write an ExampleMatcher as follows:
ExampleMatcher exampleMatcher = ExampleMatcher.matchingAny()
        .withIgnoreCase()
        .withIgnorePaths("material_id")
        .withStringMatcher(ExampleMatcher.StringMatcher.CONTAINING)
        .withStringMatcher(ExampleMatcher.StringMatcher.STARTING) // note: this replaces the CONTAINING matcher above
        .withIgnoreNullValues();
Example<Book> example = Example.of(book, exampleMatcher);
List<Book> all = bookRepository.findAll(example);
But when I get the result set, the results are sorted according to the material id. Records that have attributes matching almost all the fields are there too, but they are also just sorted according to the id.
Is there a way for me to sort the results so that the most-matching records come first in the list, followed by the other records? That is, to sort from most matching to least matching?
As far as I understood, JpaSort allows ascending and descending sorting, and we can also apply specific sorting to specific attributes.
But in the advanced search, the searching is done dynamically according to the attributes that the user fills in. Therefore, I cannot hard-code which fields of the table to sort on, right? For example, if I program the book name field to be sorted in ascending order and the user did not specify any value for that particular field, then sorting on that field is useless, right?
That is why I want to know if there is any way to dynamically sort the results from most matching to least matching. Any way of achieving this task is much appreciated. Thank you.
After two whole days of reading more than 50-70 articles and posts on the Internet, I was able to implement the advanced search in a more optimized manner.
I was not able to find how to sort the results obtained from most matching to least matching, as I originally asked in the question. So if someone can still answer my original question, I am happy to accept.
The workaround I used is as follows.
From an idea I got to dynamically generate the SQL query, I was able to find a lead and referred to articles on that.
In Dynamic Query in Spring Boot, the author has used the Java Reflection API to manually go through the non-null fields of the entity class and generate the SQL query. But when you are using Spring Boot, and all the configuration is done for you by Spring Boot, I don't think it is really effective to bring in the Hibernate dependency explicitly, manage sessions yourself, and run your own SQL query. The HibernateJpaSessionFactoryBean used in that article is now deprecated. I referred to various articles and the Spring Data JPA documentation, but could not resolve the error that kept saying Spring Boot cannot find the entityManagerFactory bean.
Therefore, I searched for ways to dynamically generate queries using Spring Data JPA itself, without using Hibernate directly and facing the hassle of session management. Dynamic Queries with Spring Data JPA Specifications and Using Spring Data JPA Specification have enough information on how to implement JpaSpecification in order to generate queries dynamically in Spring Boot.
So in the end, I used information from all three articles cited here to come up with my implementation. I used Java reflection to create a Specification according to the class type of each non-null field in my entity object.
The new part I added myself was that I grouped all the separate Specifications together into a List, and wrote a loop to dynamically generate the final Specification to be used in retrieving the data. It is as follows.
List<BookSpecification> bookSpecifications = createDynamicQuery(book);
if (bookSpecifications.size() != 0) {
    Specification<Book> dynamicQuery = Specification.where(bookSpecifications.get(0));
    for (int i = 1; i < bookSpecifications.size(); i++) {
        dynamicQuery = dynamicQuery.or(bookSpecifications.get(i));
    }
    List<Book> all = bookRepository.findAll(dynamicQuery);
    all.forEach(System.out::println);
    return all;
}
The createDynamicQuery() method above, which I wrote in my own way, is inspired by the information in the cited articles; a rough sketch of what it might look like is included below.
Using this approach, I was able to obtain much more accurate advanced search results than with ExampleMatcher for the same advanced search criteria. And since I am searching by specific field names, the search results were also sorted in an accurate way.
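For anyone curious what the reflection part can look like, here is a minimal sketch of a createDynamicQuery()-style builder, written directly against Specification<Book> (the BookSpecification type in my code wraps essentially the same idea). The Book class and its mix of String and other field types are assumptions.

import org.springframework.data.jpa.domain.Specification;

import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class BookSpecificationBuilder {

    // Builds one Specification per non-null field of the probe object.
    public static List<Specification<Book>> createDynamicQuery(Book book) throws IllegalAccessException {
        List<Specification<Book>> specs = new ArrayList<>();
        for (Field field : Book.class.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            field.setAccessible(true);
            Object value = field.get(book);
            if (value == null) {
                continue;   // the user did not fill in this attribute
            }
            String name = field.getName();
            if (value instanceof String) {
                // case-insensitive "contains" match for text attributes
                specs.add((root, query, cb) ->
                        cb.like(cb.lower(root.get(name).as(String.class)),
                                "%" + ((String) value).toLowerCase() + "%"));
            } else {
                // exact match for everything else (ids, numbers, enums)
                specs.add((root, query, cb) -> cb.equal(root.get(name), value));
            }
        }
        return specs;
    }
}

The loop shown earlier then combines these with or(), which mirrors the matchingAny() behaviour of the original ExampleMatcher.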

LUIS to MySQL query - Azure Chatbot

How can I generate MySQL queries with LUIS and fetch data from the DB hosted in Azure?
It should turn a natural-language query into a MySQL query.
e.g.
How much beer was drunken on the oktoberfest 2018?
--> GET amountOfBeer FROM Oktoberfest WHERE Year ==2018;
Does anyone have an idea how to get this to work?
I have already created small intents in LUIS, e.g. GetAmountOfBeer.
I don't know how to generate the MySQL statements and how to get the data from the DB.
Thanks.
You should be able to achieve this, or something similar, using intents and entities. How successful this can be depends on how many and how diverse your queries need to be. First, let's start with the phrase you mentioned: "How much beer was drunken on the oktoberfest 2018". You can easily (as you've done) add this as an utterance for an intent, GetAmountOfBeer. Though I'm a fan of intent names that you can read as "I want to GetAmountOfBeer", here you may want to name the intent amountOfBeer so you can use it in your query directly.
Next you need to set up your entities. For year (or datetime rather) that should be easy, as I believe there are some predefined entities for this. I think you need to use a datetime recognizer to parse out the right attribute (like year), but I haven't tried to do this before. Next, Oktoberfest seems to be a specific holiday or event in your DB, so you could create a list entity of all the events you have.
What you are left with is something like (pseudocode) GET topIntent FROM eventEntity WHERE Year == datetime.Year.
If your query set is more complex, you might have to have multiple GET statements, but you could put those in a switch statement by topIntent so that, no matter what the intent is, you can parse out the correct values. You also might want to build this into a dialog where you can check if the entities exist, and if not, you can prompt the user for the missing data.
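To make the switch-by-intent idea concrete, here is a rough sketch in plain Java/JDBC. The table and column names (events, amountOfBeer, eventName, eventYear) are invented to match the pseudocode above, and the LUIS result is assumed to have already been reduced to the top intent plus the resolved entities.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class IntentToSql {

    // topIntent, eventName and year are assumed to come from the LUIS recognizer result.
    public String answer(Connection conn, String topIntent, String eventName, int year) throws Exception {
        String sql;
        switch (topIntent) {
            case "GetAmountOfBeer":
                sql = "SELECT amountOfBeer FROM events WHERE eventName = ? AND eventYear = ?";
                break;
            // one case per intent your LUIS app recognizes
            default:
                return "Sorry, I can't answer that yet.";
        }
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, eventName);   // list entity, e.g. "Oktoberfest"
            ps.setInt(2, year);           // year resolved from the datetime entity
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : "No data found.";
            }
        }
    }
}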

Solr index multiple tables from MySQL

I have the following MySQL tables:
1. user(user_id,email)
2. tweets(tweet_id,user_id,tweet)
3. tags(tag_id,tag)
4. tweets_tags(tweet_id,tag_id)
I want to show the current user's tweets under a "My Tweets" tab in the application. I want to get the following data from Solr:
user_id
email
tweet where user_id=x
tags where tweet_id=xx
How do I index those MySQL tables in Solr? I only want to know the code for schema.xml and data-config.xml for full/delta import.
Note: I am not asking about the MySQL connector etc.; I have done that already.
The use case you've described doesn't seem to justify using Solr. You would just make sure you have proper keys and indexes and do it in MySQL directly.
If for some reason you MUST use Solr, you could probably prepare all the data and feed it to Solr in a tag/tweet/user structure like this:
user1 - tweet1 - tag1
user1 - tweet1 - tag2
user1 - tweet2 - tag1
and so on.
Then, from Solr, you query by user, and sort and group by tweet and then tag.
However, I must state again that the solution I just described is implemented much more safely, and with higher confidence in the result, using plain SQL.
Should you provide more details on your desired outcome, I'd be happy to suggest the database structure along with the necessary foreign keys and indexes and the queries you need to get your data out.
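If you do go the plain-MySQL route, the query is a straightforward join over the four tables from the question; only the JDBC wiring below is assumed.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class MyTweetsQuery {

    // Prints the flattened user - tweet - tag rows described above for one user.
    public static void printTweets(Connection conn, long userId) throws Exception {
        String sql = "SELECT u.user_id, u.email, t.tweet_id, t.tweet, tg.tag "
                   + "FROM user u "
                   + "JOIN tweets t ON t.user_id = u.user_id "
                   + "LEFT JOIN tweets_tags tt ON tt.tweet_id = t.tweet_id "
                   + "LEFT JOIN tags tg ON tg.tag_id = tt.tag_id "
                   + "WHERE u.user_id = ? "
                   + "ORDER BY t.tweet_id, tg.tag";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // e.g. "user1 - tweet1 - tag1"
                    System.out.println(rs.getString("email") + " - "
                            + rs.getString("tweet") + " - " + rs.getString("tag"));
                }
            }
        }
    }
}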
If you are using DIH (the DataImportHandler), I guess this link should be the solution for you:
Import with sub entities
If you have problems writing the exact configuration, please let me know; I can assist you.

MySql - Select * from 2 tables, but Prefix Table Names in the Resultset?

I'd like to select * from 2 tables, but have each table's column names be prefixed with a string, to avoid duplicate column name collisions.
For example, I'd like to have a view like so:
CREATE VIEW view_user_info AS (
  SELECT
    u.*,
    ux.*
  FROM
    user u,
    user_ex ux
);
where the results all had each column prefixed with the name of the table:
e.g.
user_ID
user_EMAIL
user_ex_ID
user_ex_TITLE
user_ex_SIN
etc.
I've put a sql fiddle here that has the concept, but not the correct syntax of course (if it's even possible).
I'm using MySql, but would welcome generic solutions if they exist!
EDIT: I am aware that I could alias each of the fields, as mentioned in one of the comments. That's what I'm currently doing, but I find that at the start of a project I keep having to sync up my tables and views as they change. I like the views to have everything in them from each table, and then I manually select out what I need. Kind of a lazy approach, but it lets me iterate more quickly and only optimize when it's needed.
I find at the start of a project I keep having to sync up my tables and views as they change.
Since the thing you're trying to do is not really supported by standard SQL, and you keep modifying database structures in development, I wonder if your best approach would be to write a little script that recreates that SELECT statement for you. Maybe wrap it in a method call in the development language of your choice?
Essentially you'd need to query INFORMATION_SCHEMA for the tables and columns of interest, probably via a join, and write the results out in SQL style.
Then just run the script every time you make database structural changes that are important to you, and watch your code magically keep up.
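As a rough illustration, such a generator script could look like the following in Java/JDBC: it reads INFORMATION_SCHEMA for the two tables from the question and prints the aliased select list. The connection details are placeholders, and you would still adjust the FROM/JOIN clause to whatever relationship the tables actually have.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.StringJoiner;

public class ViewGenerator {

    public static void main(String[] args) throws Exception {
        String[] tables = {"user", "user_ex"};
        StringJoiner columns = new StringJoiner(",\n  ");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "dbuser", "secret")) {   // placeholder credentials
            String sql = "SELECT TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS "
                       + "WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = ? "
                       + "ORDER BY ORDINAL_POSITION";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (String table : tables) {
                    ps.setString(1, table);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            String t = rs.getString("TABLE_NAME");
                            String c = rs.getString("COLUMN_NAME");
                            // e.g. user.ID AS user_ID, user_ex.TITLE AS user_ex_TITLE, ...
                            columns.add(t + "." + c + " AS " + t + "_" + c);
                        }
                    }
                }
            }
        }
        System.out.println("CREATE OR REPLACE VIEW view_user_info AS\nSELECT\n  "
                + columns + "\nFROM user, user_ex;");   // replace with the join you actually want
    }
}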

Replace MySQL values in over 120 tables

We have redirected a huge internet service from a .de domain to a .com domain - it is a discussion board (vBulletin). Now we need to change all phrases like "domainname.de" to "domainname.com":
Over 120 tables (posts, threads)
A lot of MySQL fields
Does anyone have a suggestion how to do something like this? We need to replace the string "domainname.de" with "domainname.com" - everywhere.
What you want to do sounds dangerous, as it could hit some false-positives, and change things unintentionally. Suppose your old domain is 'acme.de' and the new one 'acme.com', and some random visitor posted the following (this is an over-simplified example):
I enjoy working with Acme.Depending on my mood.
It would be very easy to convert this to:
I enjoy working with Acme.compending on my mood.
Therefore, my suggestions, in order of preference:
Don't update the DB at all, just configure your web server to redirect the .de traffic to the .com traffic. You're less likely to make mistakes this way.
If you must update the discussion board, do it in your display logic rather than in the database; then you'll have no chance of making irrevocable mistakes.
Write a script in Perl, or your favorite text-processing language, which does a regex replacement on every table/field (a sketch of the replacement step follows the list). I suggest the following strategy:
a. Do a SELECT id,<field name> FROM <table> WHERE <field name> LIKE '%domain.de%'
b. Store output in a CSV, or other format that is easy to parse.
c. Run your Regex script to change domain.de to domain.com
d. Check the output.
e. Do an UPDATE <table> SET <field>=? WHERE id=?, with the output of your script.
Do it in small chunks at first (a few tens or hundreds of posts at a time), and visually check the results before committing your changes to the database, to make sure you haven't made mistakes.
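For step (c), something as small as the following usually does the job. It uses the acme.de example from above, and the negative lookahead is there precisely to avoid the "Acme.Depending" false positive; treat it as a starting point rather than a complete rule set.

import java.util.regex.Pattern;

public class DomainRewriter {

    // Match "acme.de" only when it is NOT followed by another letter, so
    // "acme.de/forum" and "acme.de." are rewritten but "Acme.Depending" is not.
    private static final Pattern OLD_DOMAIN = Pattern.compile("(?i)acme\\.de(?![a-z])");

    public static String rewrite(String text) {
        return OLD_DOMAIN.matcher(text).replaceAll("acme.com");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Visit http://acme.de/forum today"));                 // rewritten
        System.out.println(rewrite("I enjoy working with Acme.Depending on my mood."));  // left alone
    }
}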
I don't know whether this will work for you, but have a look at this.
The following query will give you the list of tables (the 120 tables):
SELECT DISTINCT TABLE_NAME
FROM information_schema.`COLUMNS`
WHERE TABLE_SCHEMA = 'schema_name';
Next, you can use an UPDATE query for each table. You can achieve this using cursors.
I am not good with cursors, but I think they will help in this situation; a JDBC-based sketch of the same idea follows below.
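If you would rather avoid writing stored-procedure cursors, the same idea works as a small JDBC program: list every text column in the schema and emit one REPLACE()-based UPDATE per column. The schema name, connection details, and domains are placeholders, and a blanket REPLACE() still has the false-positive risk described in the other answer, so review the generated statements before running them.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ReplaceStatementGenerator {

    public static void main(String[] args) throws Exception {
        String oldText = "domainname.de";
        String newText = "domainname.com";
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/forum", "dbuser", "secret")) {   // placeholder credentials
            // every text-like column in the schema
            String sql = "SELECT TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS "
                       + "WHERE TABLE_SCHEMA = 'schema_name' "
                       + "AND DATA_TYPE IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext')";
            try (PreparedStatement ps = conn.prepareStatement(sql);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String table = rs.getString("TABLE_NAME");
                    String column = rs.getString("COLUMN_NAME");
                    // print one UPDATE per column; run them only after reviewing the output
                    System.out.printf(
                        "UPDATE `%s` SET `%s` = REPLACE(`%s`, '%s', '%s') WHERE `%s` LIKE '%%%s%%';%n",
                        table, column, column, oldText, newText, column, oldText);
                }
            }
        }
    }
}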