How do I speed up this SELECT DISTINCT query in MySQL?

I am implementing an autocomplete field on a website, and my endpoint queries a MySQL database to get the data. The values for this particular field can only be retrieved from a table 'list', which has about 600,000 entries, of which only about 4,000 are unique. Once a day or so, a new unique list item is added.
I'm using a call like this, where query is whatever the user has started typing in the autocomplete box:
SELECT DISTINCT item FROM list WHERE item like '%query%' LIMIT 10;
This query takes about .8 milliseconds, and I believe the majority of the time is due to the SELECT DISTINCT. Is there any way to improve the performance of this task?
I'm fine to create new tables/views, but I'm not sure what will help.

When you use the % sign at the beginning of a LIKE pattern, no index can be used, and every single row of your table will be scanned, no matter what your indexes look like.
Think of an address book with letter tabs: if I'm looking for 'Mads%', I can put my finger on the M tab and read from there, but if I'm looking for '%dse%', I have to read every entry.
If you can, drop the percent sign at the beginning of the string; then an index on item will be used (be sure to add one; you can also index just a prefix of the column in case it's a long field).
Otherwise, you should use full-text search instead.
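Another option, given that only about 4,000 of the 600,000 rows are unique: maintain a small side table holding just the distinct items and run the autocomplete against that, so DISTINCT disappears from the hot query entirely. A minimal sketch using sqlite3 as a stand-in for MySQL (the table and column names match the question; the once-a-day refresh strategy is an assumption):

```python
import sqlite3  # stand-in for MySQL; the same SQL shape works in both

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema matching the question: many rows, few unique items.
cur.execute("CREATE TABLE list (item TEXT)")
cur.executemany("INSERT INTO list (item) VALUES (?)",
                [("apple",), ("apple",), ("banana",), ("banana",), ("cherry",)])

# Small side table holding only the distinct items.
cur.execute("CREATE TABLE list_items (item TEXT PRIMARY KEY)")
cur.execute("INSERT OR IGNORE INTO list_items SELECT DISTINCT item FROM list")
# In MySQL: INSERT IGNORE INTO list_items SELECT DISTINCT item FROM list;
# refreshed once a day, or kept current by a trigger on INSERT.

# The autocomplete now scans ~4,000 rows instead of 600,000,
# and no longer needs DISTINCT at all.
rows = cur.execute(
    "SELECT item FROM list_items WHERE item LIKE ? LIMIT 10", ("%an%",)
).fetchall()
print([r[0] for r in rows])  # ['banana']
```

The leading-wildcard scan still happens, but over two orders of magnitude fewer rows, which is usually enough for an autocomplete box.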

Related

Deduplication of records without sorting in a mainframe sequential dataset with already sorted data

This question is about deduplicating an already-sorted mainframe dataset without re-sorting it.
The input sequential dataset has the following structure. 'KEYn' in the first 4 bytes represents the key and the remainder of each row represents the rest of the record's data. There are records in which the same key is repeated though the remaining data is different in each record. The records are already sorted on 'KEYn'.
KEY1aaaaaa
KEY1bbbbbb
KEY2cccccc
KEY3xxxxxx
KEY3yyyyyy
KEY3zzzzzz
KEY3wwwwww
KEY4uuuuuu
KEY5hhhhhh
KEY5ffffff
My requirement is to pick up the first record of each key and drop the remaining 'duplicates'. So the output file for the above input should look like this:
KEY1aaaaaa
KEY2cccccc
KEY3xxxxxx
KEY4uuuuuu
KEY5hhhhhh
Since the data is already sorted, I don't want to use the SORT utility with SUM FIELDS=NONE, or ICETOOL with the SELECT - FIRST operand, since both of these will actually end up re-sorting the data on the deduplication key (KEYn). Also, the actual dataset I am referring to is huge (1.6 billion records, AVGRLEN 900, VB), and a job actually ran out of sort work space trying to sort it in one go.
My query is: Is there any option available in JCL based utilities to do this deduplication without resorting and using sort work space? I am trying to avoid writing a COBOL/Assembler program to do this.
Try this (untested). OPTION COPY avoids sorting entirely. Since the input is VB, INREC inserts a 3-byte ZD sequence number right after the RDW, restarting at 1 whenever the 4-byte key (at position 5 of a VB record) changes; OUTFIL then keeps only the records whose sequence number is 1 and rebuilds the original layout, dropping the inserted field:
OPTION COPY
INREC BUILD=(1,4,SEQNUM,3,ZD,RESTART=(5,4),5)
OUTFIL INCLUDE=(5,3,ZD,EQ,1),BUILD=(1,4,8)
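The same first-record-per-key logic, shown as a plain Python sketch on the sample data (purely to illustrate what the control cards do, not as a replacement for the JCL):

```python
from itertools import groupby

# The sample records from the question, already sorted on the 4-byte key.
records = [
    "KEY1aaaaaa", "KEY1bbbbbb", "KEY2cccccc",
    "KEY3xxxxxx", "KEY3yyyyyy", "KEY3zzzzzz", "KEY3wwwwww",
    "KEY4uuuuuu", "KEY5hhhhhh", "KEY5ffffff",
]

# Because the input is already sorted on the key, a single sequential
# pass suffices: group consecutive records by the first 4 bytes and
# keep only the first record of each group. No sort work space needed.
deduped = [next(group) for _, group in groupby(records, key=lambda r: r[:4])]
print(deduped)
```

This is exactly why the copy-only approach works: sorted input means duplicates are always adjacent, so one pass with no working storage beyond the current key is enough.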

What is the best approach to find duplicates in my DB table?

In my app the user can select multiple filter options. I store this in a DB table.
For example
User 1 can select filters A^B
User 2 can select filters AORC^D
and so forth.
The way it is stored in Db is
user filter_selected
user1 A^B
user2 AORC^D
Now the criterion is that no two users can have the same filters selected. So if user 3 comes along and selects A^B or B^A, it should throw an error.
I am trying to come up with smart logic to validate this in JavaScript.
One approach is to go through all the users in the DB (there can be many), sort each filter string alphabetically, and compare. In our example, both A^B and B^A become AB^, so I can check them against each other. Is there a better approach, maybe using a MySQL command itself?
You can sort the characters of the filter string before inserting it. For example, B^A becomes AB^; when you need to check a new filter, sort it the same way and search for the result.
If you want to keep the original filter and you care more about speed than database size, store the original in a second column alongside the sorted form. If size matters more, store only the original filter; when searching, select the rows with the same length as your filter and sort each candidate alphabetically before comparing. You could also store the position of every character (e.g. turn A^B into AB^|021), but that needs about as much extra space as a second column, so I don't suggest this method. Finally, if your filters are always short, instead of fetching and comparing every record you could generate every permutation of the new filter (for A^B: AB^, A^B, B^A, BA^, ^AB, ^BA) and search for those directly. Be careful, though: this creates n! strings, so it is only reasonable for very short filters, where it can pay off when the table has many records.
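A minimal sketch of the sort-then-compare idea (Python for illustration; in MySQL you could store this canonical form in its own column with a UNIQUE index so the database enforces the rule for you):

```python
def canonical(filter_str: str) -> str:
    """Canonical form of a filter: its characters sorted, so that
    order-insensitive duplicates (A^B vs B^A) collapse to one value."""
    return "".join(sorted(filter_str))

# Canonical forms of the filters already stored (A^B and AORC^D
# from the question); in practice this would be a DB lookup or a
# UNIQUE-indexed column instead of an in-memory set.
existing = {canonical("A^B"), canonical("AORC^D")}

def is_duplicate(new_filter: str) -> bool:
    return canonical(new_filter) in existing

print(is_duplicate("B^A"))   # True: same characters as A^B
print(is_duplicate("A^C"))   # False
```

Storing the canonical string with a UNIQUE constraint moves the race-free check into the database: a duplicate insert simply fails, with no need to scan all users in application code.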

MySQL query performance for paginating

Okay, so what is the best practice when it comes to paginating in MySQL? Let me make it clearer: say that at a given time I have 2,000 records, with more being inserted, and I am displaying 25 at a time. I know I have to use LIMIT to paginate through the records, but what am I supposed to do about the total count of my records? Do I count the records every time a user clicks to request the next 25? Please don't tell me the answer straight up, but rather point me in the right direction. Thanks!
The simplest solution would be to just continue working with the result set normally as new records are inserted. Presumably, each page you display will use a query looking something like the following:
SELECT *
FROM yourTable
ORDER BY someCol
LIMIT 25
OFFSET 100
As the user pages back and forth, if new data were to come in it is possible that a page could change from what it was previously. From a logical point of view, this isn't so bad. For example, if you had an alphabetical list of products and a new product appeared, then the user would receive this information in a fairly nice way.
As for counting, your code can allow moving to the next page so long as data is there to support a new page being added. Having new records added might mean more pages required to cover the entire table, but it should not affect your logic used to determine when to stop allowing pages.
If your table has a date or timestamp column representing when a record was added, then you might actually be able to restrict the entire result set to a snapshot in time. In this case, you could prevent new data from entering over a given session.
Three suggestions:
1. Refresh only the data grid when the next button is clicked (via AJAX), or store the count in the session for the chosen search parameters.
2. Use memcached, which is more advanced and can be shared across all users. Generate a unique key based on the filter parameters and cache the count under it, so you don't hit the database. When a new record gets added, clear the existing memcached key. This requires a memcached instance to be running.
3. Create an index and hit the database for the count alone; there won't be much impact on performance.
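To make the counting concrete, here is the simplest baseline before any caching: issue one cheap COUNT(*) alongside each LIMIT/OFFSET page. A sketch with Python's sqlite3 standing in for MySQL (table and column names follow the answer's example query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE yourTable (someCol INTEGER)")
cur.executemany("INSERT INTO yourTable VALUES (?)", [(i,) for i in range(60)])

PAGE_SIZE = 25

def fetch_page(page: int):
    """Fetch one page of rows plus the current total row count.
    With an index, COUNT(*) on a small-to-medium table is cheap;
    for large tables, cache this number per the suggestions above."""
    total = cur.execute("SELECT COUNT(*) FROM yourTable").fetchone()[0]
    rows = cur.execute(
        "SELECT someCol FROM yourTable ORDER BY someCol LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()
    return rows, total

rows, total = fetch_page(2)   # third page (pages are 0-indexed here)
print(len(rows), total)       # 10 60  (60 rows split as 25, 25, 10)
```

Recomputing the total on every request is the correct-but-slowest baseline; the session or memcached approaches above trade a little staleness for fewer COUNT queries.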

Creating a global variable in Talend to use as a filter in another component

I have job in Talend that is designed to bring together some data from different databases: one is a MySQL database and the other a MSSQL database.
What I want to do is match a selection of loan numbers from the MySQL database (about 82,000 loan numbers) to the corresponding information we have housed in the MSSQL database.
However, the tables in MSSQL to which I am joining the data from MySQL are much larger (~ 2 million rows), are quite wide, and thus cost much more time to query. Ideally I could perform an inner join between the two tables based on the loan number, but since they are in different databases this is not possible. The inner join that is performed inside a tMap occurs after the Lookup input has already returned its data set, which is quite large (especially since this particular MSSQL query will execute a user-defined function for each loan number).
Is there any way to create a global variable out of the output from the MySQL query (namely, the loan numbers selected by the MySQL query) and use that global variable as an IN clause in the MSSQL query?
This should be possible. I'm not working in MySQL but I have something roughly equivalent here that I think you should be able to adapt to your needs.
I've never actually answered a Stack Overflow question before, and I can't include all the screenshots here, so I'll describe the steps in words and post the whole thing, complete with illustrations, on my blog in case you need more info (quite likely, I should think!).
In my job, data comes out of the source table and gets filtered by tFilterRow_1 down to only the rows I'm interested in.
The next step is to limit the flow to just the field I want to use in the variable. I've used tMap_3 rather than a tFilterColumns because the field I'm using is a string and I wanted to be able to concatenate single quotes around it, but if you're using an integer you might not need to do that. And if you have a lot of repetition, you might also want to add a tUniqRow in there to drop the unnecessary duplicates.
The next step is the one that does the magic. I've got a list like this:
'A1'
'A2'
'B1'
'B2'
etc, and I want to turn it into 'A1','A2','B1','B2' so I can slot it into my where clause. For this, I've used tAggregateRow_1, selecting "list" as the aggregate function to use.
Next up, we want to take this list and put it into a context variable (I've already created the context variable in the metadata; you know how to do that, right?). Use another tMap component, feeding into a tContextLoad component. tContextLoad always has two columns in its schema, so map the output of tAggregateRow_1 to the "value" column and enter the name of the variable in the "key" column. In this example, my context variable is called MyList.
Now your list is loaded as a text string and stored in the context variable, ready for retrieval. So open up a new database input component and embed the variable in the SQL like this:
"SELECT DISTINCT MY_COLUMN
FROM MY_SECOND_TABLE
WHERE the_selected_row IN (" + context.MyList + ")"
It should be as easy as that, and when I whipped it up it worked first time, but let me know if you have any trouble and I'll see what I can do.
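What the tAggregateRow "list" step produces can be sketched in a few lines (Python, purely illustrative; the loan numbers are hypothetical):

```python
def build_in_list(values):
    """Quote each value and join with commas, giving a string ready
    to splice into an IN (...) clause -- the same shape the
    tAggregateRow 'list' aggregate produces for the context variable."""
    return ",".join("'{}'".format(v) for v in values)

loan_numbers = ["A1", "A2", "B1", "B2"]   # hypothetical, quotes not yet added
my_list = build_in_list(loan_numbers)
print(my_list)                            # 'A1','A2','B1','B2'

# Spliced into the lookup query, as in the Talend input component above:
query = ("SELECT DISTINCT MY_COLUMN FROM MY_SECOND_TABLE "
         "WHERE the_selected_row IN (" + my_list + ")")
```

One caveat for the question's scale: with ~82,000 loan numbers, check that your MSSQL driver accepts an IN list that long; if not, loading the keys into a temp table on the MSSQL side and joining against it tends to scale better.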

Pass select value and retrieve data from MySQL

I'm working on dynamic pages. The scenario goes like this.
Page 1 consists of a drop down menu for me to select.
The selected menu value will pass to the next page URL.
Then this value is retrieved by that page and compared against a field of the MySQL table to filter it.
The table consists of 4 main categories which was the option given to the user to select in page 1.
For each category there will be 30 words stored.
When the user searches for a certain word, the page is supposed to display the drop-down, check whether the selected value matches the table's category id field, and then use it to narrow the search to just that field instead of all 4 categories.
How do I do this? I have been working on it for a few weeks but can't solve it.
$ret1 = "SELECT * FROM $db_tb_name WHERE MATCH
('categoryid'='.$selectvalueid.') AND AGAINST ('%$str%' IN BOOLEAN MODE)";
This is the piece of code I have, but its syntax is wrong.
Please someone help me with some example.
So let me get this straight: at the time you issue your request, you have both a category id $selectvalueid (resulting from the drop down) and a search term $str (resulting from a free text entry?). And you want those rows where the categoryid column of your table exactly matches the $selectvalueid string from your script, while some other column (with what name?) contains the $str value. Is that correct so far?
If it is, your query might look like this:
SELECT * FROM $db_tb_name
WHERE categoryid = '$selectvalueid'
AND wordcolumn LIKE '%$str%'
Note that simply pasting user input variables into SQL queries most likely will make your site vulnerable to SQL injection attacks. So a better way would be using queries with placeholders, and using prepared statements to fill in those placeholders in a safe way when executing the query. How to do that depends on the application language.
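A minimal sketch of the placeholder approach, using Python's sqlite3 for illustration (in PHP you would do the same with PDO or mysqli prepared statements; the table and column names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE words (categoryid TEXT, wordcolumn TEXT)")
cur.executemany("INSERT INTO words VALUES (?, ?)",
                [("cat1", "hello"), ("cat1", "help"), ("cat2", "hello")])

selectvalueid = "cat1"   # value from the drop-down
user_str = "hel"         # free-text search term from the user

# Placeholders let the driver escape the values safely; only the LIKE
# wildcards are appended in code, and no raw user input ever reaches
# the SQL string itself.
rows = cur.execute(
    "SELECT wordcolumn FROM words "
    "WHERE categoryid = ? AND wordcolumn LIKE ? ORDER BY wordcolumn",
    (selectvalueid, "%" + user_str + "%"),
).fetchall()
print([r[0] for r in rows])  # ['hello', 'help']
```

The query shape is exactly the answer's corrected one; only the variable interpolation is replaced by bound parameters.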