Question: how do I rank keywords that have been used in search queries in my web application, based on time and number of searches?
A user types a search query into the text box. Via AJAX I need to return some suggestions. These suggestions are based on the number of searches done for that keyword, and should be sorted by most recently searched.
For example, if a user enters the search term "hang", the suggestions should be in this order: "hangover part 2", "hangover".
How should I design the database to store the search queries? How should I write the SQL query to get the suggestions?
A good way to do query suggestion is to count the number of occurrences of each search query (it is probably better not to count repeated queries made by the same user). You'll have a file/table/something (query, count) like this:
"britney spears" 12
"lady xyz" 5
"billy joel" 27
"query abcdef" 2
"lady gaga" 39
...
Then you can sort by descending order of occurrence:
"lady gaga" 39
"billy joel" 27
"britney spears" 12
"lady xyz" 5
"query abcdef" 2
...
Then when someone is searching "lady", for example, do a prefix search on all strings from the top of the file/table/something to the bottom. If you only want K suggestions you'll go only until you find the Top-K suggestions.
You could implement this using a simple file, or you can also have a counting query table and do a query similar to:
SELECT query FROM search_queries WHERE query LIKE 'prefix%' ORDER BY query_count DESC LIMIT K
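As a rough sketch of the counting-table approach, here is the same prefix lookup run from Python against an in-memory SQLite database (standing in for MySQL). The table and column names follow the query above; the suggest helper and the sample rows are made up for illustration.

```python
import sqlite3

def suggest(conn, prefix, k):
    """Return up to k stored queries that start with prefix, most searched first."""
    # Escape LIKE wildcards so user input such as "50%" is matched literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT query FROM search_queries "
        "WHERE query LIKE ? ESCAPE '\\' "
        "ORDER BY query_count DESC LIMIT ?",
        (escaped + "%", k),
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_queries (query TEXT PRIMARY KEY, query_count INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO search_queries VALUES (?, ?)",
    [
        ("britney spears", 12),
        ("lady xyz", 5),
        ("billy joel", 27),
        ("query abcdef", 2),
        ("lady gaga", 39),
    ],
)

suggestions = suggest(conn, "lady", 2)
print(suggestions)
```

Passing the prefix as a bound parameter, rather than concatenating it into the SQL string, also avoids SQL injection from the search box.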
Two notes:
There are better (and more difficult) ways of doing this. Amazon, for example, has a pretty nice query suggestion.
The provided solution will only suggest queries that start with the user's query. Like:
"lady" => ["lady gaga", "lady xyz"]
Query "lady" won't match "gaga lady". For them to match you will need query indexing, through the Full-Text Search support of your database or an external library such as Lucene.
Ideally, you'd sort on something like the following:
order by sum(# of searches / (how long ago that search was performed + 1))
This would have to be modified so that "how long ago" is measured in an appropriate unit of time. For example, if you want searches to count as half after a week, you'd make a week = 1.
This will clearly be inefficient, because calculating how long ago each search was performed for all search results will be time consuming. Thus, you might want to keep a running total for each search and multiply the totals by a certain value each time period. For example, if you want searches to count as half after a week, you would add one to that column for every search. Then, you would have a process that multiplies the search column by .5 every week. Then you just sort on that column.
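A minimal sketch of the running-total idea, using Python with an in-memory SQLite table. The search_scores table, the function names and the 0.5 weekly factor are assumptions for illustration; in practice weekly_decay would be run from a scheduled job.

```python
import sqlite3

DECAY = 0.5  # assumed factor: a search counts half as much after one period

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_scores (query TEXT PRIMARY KEY, score REAL NOT NULL)")

def record_search(conn, query):
    """Add one to the running total for this query, creating the row if needed."""
    cur = conn.execute(
        "UPDATE search_scores SET score = score + 1 WHERE query = ?", (query,)
    )
    if cur.rowcount == 0:
        conn.execute("INSERT INTO search_scores (query, score) VALUES (?, 1)", (query,))

def weekly_decay(conn):
    """Run once per period: shrink every running total by the decay factor."""
    conn.execute("UPDATE search_scores SET score = score * ?", (DECAY,))

def top_queries(conn, k):
    """Return the k queries with the highest decayed score."""
    rows = conn.execute(
        "SELECT query FROM search_scores ORDER BY score DESC LIMIT ?", (k,)
    )
    return [row[0] for row in rows]

# Three older searches, one decay step, then two fresh searches.
for _ in range(3):
    record_search(conn, "hangover")
weekly_decay(conn)  # "hangover" drops from 3.0 to 1.5
for _ in range(2):
    record_search(conn, "hangover part 2")

ranking = top_queries(conn, 2)
print(ranking)
```

Here the two recent searches outrank the three older ones, which is the ordering the question asks for.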
Do you need something like autosuggestion? There is a jQuery plugin called autocomplete, which looks for similar words as the user types. However, if you want suggestions based on the number of times a keyword has been searched by users, you need to store the keywords in a separate table and fetch them later for other users.
I want to improve my search results, so I want substrings to be matched as well.
For example:
In a MySQL query, when I search for bag in the product_name column I get 50 results, but when I search for bags I get only 20. I want 50 results in the second case too.
SELECT * FROM table WHERE product_name LIKE '%bag%'
SELECT * FROM table WHERE product_name LIKE '%bags%'
My question may be a duplicate, but I haven't found a solution yet.
If you need to return all results whether the user enters the plural or the singular, you can strip a trailing s or es from the keyword before searching. This is not accurate, though; to do it properly you would need a more complex function that covers all the pluralization rules. The best solution is to steer users toward a single form at insertion time: show an autocomplete of values entered before, so that the same word is not stored in your database in two forms.
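To illustrate the "strip s or es" idea, here is a deliberately naive sketch in Python. The function names and suffix rules are my own assumptions, and it will get irregular plurals and words like "cases" wrong, which is exactly the inaccuracy mentioned above.

```python
def singularize(word):
    """Very rough English singularization; irregular plurals are not handled."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"           # batteries -> battery
    if w.endswith(("ses", "xes", "zes", "ches", "shes")):
        return w[:-2]                 # boxes -> box
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]                 # bags -> bag
    return w                          # bag, glass, bus are left alone

def like_pattern(keyword):
    """Build the LIKE pattern to bind as a query parameter."""
    return "%" + singularize(keyword.strip()) + "%"

print(like_pattern("bags"))
```

With this, both "bag" and "bags" produce the same pattern, so both searches return the same rows.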
I am using full-text searching on a website I am building, to order the results of a user's search query by relevance. This is working great, with one problem: when a search term appears in more than 50% of the rows of my table, it is ignored. I am looking for a way to NOT ignore words that are in more than 50% of the rows of a table.
For example, in my "items" table, it may look something like this:
item_name
---------
poster 1
poster 2
poster 3
poster 4
another item
If a user searches for "poster", the query returns 0 results because it appears too many times in the table. How can I stop this from happening?
I've tried using IN BOOLEAN MODE, but that takes away the functionality I need (which is ordering by relevance).
Here's an example of my SQL:
SELECT item_title
FROM items
WHERE MATCH(item_title, tags, item_category) AGAINST('poster')
You have to recompile MySQL to change this. From the documentation on Fine-Tuning MySQL Full-Text Search:
The 50% threshold for natural language searches is determined by the particular weighting scheme chosen. To disable it, look for the following line in storage/myisam/ftdefs.h:
#define GWS_IN_USE GWS_PROB
Change that line to this:
#define GWS_IN_USE GWS_FREQ
Then recompile MySQL. There is no need to rebuild the indexes in this case.
I want to run a fairly complicated query on a MySQL table. The table stores user info such as IP, country, event_id and other statistics, including date_start and date_end for specific events.
An event starts at date_start, and when the user ends it a time() value is written to the date_end column.
I want a query that returns the ids of all suspicious users. These are the rules that define a suspicious user:
The user_id has rows showing connections from multiple countries, i.e. the country column has different values for that user.
For a specific event_id, the SUM of (date_end - date_start) is much larger, for example 50% larger, than the corresponding SUM for the other events. In simple words, the query should report the user_ids that have spent too much time on some events while not spending much time on the others. The percentage should be configurable.
I know it sounds crazy; I tried to do it and failed. I did it in PHP, but that is slow and I'm sure it can be done with queries.
I hope that makes sense.
Thank you
This problem is too big. Figure out how to find the users who have come from multiple countries. Then figure out how to get statistics on event durations. Then figure out how to identify outliers. Then, finally, try to merge all three solutions.
In general, use SQL to filter the data down to a manageable size, then PHP to do any further processing.
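A sketch of how that split might look, in Python with an in-memory SQLite table. The user_events table name and sample rows are invented, and the outlier rule (a user's largest per-event total exceeds the average of that user's other events by more than pct) is only one possible reading of the question.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_events (user_id INTEGER, country TEXT, event_id INTEGER, "
    "date_start INTEGER, date_end INTEGER)"
)
conn.executemany(
    "INSERT INTO user_events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "US", 10, 0, 100), (1, "US", 11, 0, 110),
        (2, "US", 10, 0, 100), (2, "FR", 11, 0, 90),
        (3, "DE", 10, 0, 100), (3, "DE", 11, 0, 100), (3, "DE", 12, 0, 900),
    ],
)

def multi_country_users(conn):
    """Step 1, pure SQL: users seen from more than one country."""
    rows = conn.execute(
        "SELECT user_id FROM user_events "
        "GROUP BY user_id HAVING COUNT(DISTINCT country) > 1 ORDER BY user_id"
    )
    return [row[0] for row in rows]

def lopsided_users(conn, pct=0.5):
    """Steps 2 and 3: SQL computes per-event totals, Python picks the outliers."""
    totals = defaultdict(list)
    query = (
        "SELECT user_id, SUM(date_end - date_start) FROM user_events "
        "WHERE date_end IS NOT NULL GROUP BY user_id, event_id"
    )
    for user_id, total in conn.execute(query):
        totals[user_id].append(total)
    flagged = []
    for user_id, times in sorted(totals.items()):
        if len(times) < 2:
            continue  # nothing to compare against
        top = max(times)
        rest_avg = (sum(times) - top) / (len(times) - 1)
        if top > rest_avg * (1 + pct):
            flagged.append(user_id)
    return flagged

suspicious = sorted(set(multi_country_users(conn)) | set(lopsided_users(conn, pct=0.5)))
print(suspicious)
```

The database only has to group and sum, which it is good at, and the configurable percentage lives in the application code.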
I am a website developer and I need help with a design question. My (future) website is more or less a villa directory. People can add their villas there. Each villa will be stored in the database.
I need to show 15 villas per page, but I want a "turn over" (not sure it's the correct word in English) of the villas: every hour the villa that appears first on the first page becomes the last villa of the last page (so every villa moves up one rank, except the first one, which becomes the last). I want every villa to have the same chance (more or less) of appearing on the first page. I don't want a totally random system.
I need help on how to make a simple system that would not take a lot of resources (should be working with a few millions of records).
Note: I don’t want to use the ID of the villa because if a person posts 3 different villas at the same time, they will be all shown next to each other.
My proposition:
I create a field (INTEGER) called "random_order" for each villa, fill it with a random number between 0 and Max(INTEGER), and create an index on the "random_order" column.
Then, to get the records in the order I want, I store (dunno where yet) a variable that points to a record in the index. Every hour, I increase this variable by 1 (with a modulo).
I'm not an expert on indexes, so I'm not really sure whether this is possible or how to do it. I don't know if there is a better way to do it either…
Could you please tell me if this is correct or if you have better ideas?
Thank you
Another thing you could do is store a count variable, from 0 to MAX, and update it every hour. Then query the server for the top 15 villas (using ORDER BY ASC/DESC) on (random_order + count). This removes the need to update the column every hour; only the count variable needs to be updated.
EDIT:
First you would get the count (from where you have stored it) and store it in a variable - count.
Then execute a query like
SELECT *, (random_order + <count>)%MAX_VAL AS villa_order
FROM villa_table
ORDER BY villa_order ASC
LIMIT 15
This will prevent constant unnecessary updates to your indexed column.
EDIT 2:
OK, after further analysis, this is how I would do it.
Execute a simple select query
SELECT * FROM villa_table
WHERE random_order > <count>
ORDER BY random_order
LIMIT 15
If the number of rows in the result set is < 15, then fill in the remaining records from the beginning using:
SELECT *
FROM villa_table
ORDER BY random_order ASC
LIMIT <number of rows to be filled>
Even on 20m rows on an indexed column this takes < .5s.
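The two-query wrap-around can be sketched like this in Python with an in-memory SQLite table. The villas table layout, the page_of_villas helper and the sample values are assumptions; I also use >= for the first query and add a random_order < ? condition to the second, so that a villa is never returned twice when the table holds fewer rows than a page.

```python
import sqlite3

def page_of_villas(conn, offset, page_size=15):
    """First page of the rotation: start at offset, wrap to the beginning if short."""
    rows = conn.execute(
        "SELECT id FROM villas WHERE random_order >= ? "
        "ORDER BY random_order LIMIT ?",
        (offset, page_size),
    ).fetchall()
    if len(rows) < page_size:
        # Not enough rows after the offset: fill up from the start of the index.
        rows += conn.execute(
            "SELECT id FROM villas WHERE random_order < ? "
            "ORDER BY random_order LIMIT ?",
            (offset, page_size - len(rows)),
        ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE villas (id INTEGER PRIMARY KEY, random_order INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_villas_random_order ON villas (random_order)")
conn.executemany(
    "INSERT INTO villas VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)],
)

first_page = page_of_villas(conn, offset=35, page_size=3)
print(first_page)
```

Both queries are range scans on the indexed column, so neither has to compute an expression for every row.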
I'm trying to query a WordPress database and get the post titles to sort in the correct order.
The titles are formatted like this: Title 1, Title 2.. I need to sort them in ascending order, how can I do this? If I just sort them ascending they will come out like: 1,10,11...
Right now my order by statement is this but it does nothing:
ORDER BY CONVERT(p.post_title,SIGNED) ASC;
Per-row functions are a bad idea in any database that you want to scale well. That's because they have to perform the calculation on every row you retrieve every time you do a select.
The intelligent DBA's way of doing this is to create a whole new column containing the computed sort key, and use an insert/update trigger to ensure it's set correctly. This means the calculation is performed only when needed, and its cost is amortised across all selects.
This is one of the few cases where it's okay to revert from third normal form, since the use of triggers prevents data inconsistency. Hardly anyone complains about the disk space taken up by their databases; the vast majority of questions concern speed.
And, by using this method and indexing the new column, your queries will absolutely scream along.
So basically, you create another column called natural_title mapped as follows:
title         natural_title
-----         -------------
title 1       title 00001
title 2       title 00002
title 10      title 00010
title 1024    title 01024
ensuring that the mapping function used in the trigger allows for the maximum value allowed. Then you use a query like:
select title from articles
order by natural_title asc
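The mapping itself might look like the following Python sketch, where every run of digits is zero-padded to a fixed width. The natural_title function name and the default width of 5 are assumptions; a real trigger would implement the same rule in the database's own language, and numbers longer than the width are left unchanged.

```python
import re

def natural_title(title, width=5):
    """Zero-pad every run of digits so plain string ordering matches numeric ordering."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), title)

titles = ["Title 10", "Title 2", "Title 1024", "Title 1"]
ordered = sorted(titles, key=natural_title)
print(ordered)
```

The width has to cover the largest number you expect, which is the "maximum value allowed" caveat above.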
If the # is always at the end like that you can do some string manipulation to make it work:
SELECT *, CAST(RIGHT(p.post_title,2) AS UNSIGNED) AS TITLE_INDEX
FROM wp_posts p
ORDER BY TITLE_INDEX asc
You might have to tweak it a bit if your numbers can reach 100+ or 1000+, since RIGHT(p.post_title,2) only reads the last two characters.