Search with relevance ranking using containstable and freetext - sql-server-2008

I've read that you can rank the results of a search using CONTAINSTABLE along with CONTAINS and FREETEXT in SQL Server 2008. I've only recently used FREETEXT for the first time. FREETEXT takes the words of the search string separately and compares each against the indexed column. I want to be able to search for phrases first and then single words.
Let's say the description column is indexed. I'm using a stored procedure query like this:
SELECT id, description, item FROM table WHERE FREETEXT(description, @strsearch)
For example, if 3 rows contain the word 'apple' and I search for 'apple cake', the row with id2 should come first, followed by the other two:
id1 apple pie 4/01/2012
id2 apple cake 2/29/2011
id3 candy apple 5/9/2011
For example, if 4 rows contain food-related words and I search for 'fast food restaurant', the row with id3 should come first, followed by id1 (not an exact match, but it has 'fast food' in the column), and then the other two:
id1 McDonalds fast food
id2 healthy food
id3 fast food restaurant
id4 Italian restaurant

Does this article help?
MSDN : Limiting Ranked Result Sets (Full-Text Search)
It implies, in part, that using an additional parameter will allow you to limit the result to the ones with the greatest relevance (which you can influence using WEIGHT) and also order by that relevance (RANK).
top_n_by_rank is an integer value, n, that specifies that only the n
highest ranked matches are to be returned, in descending order.
The doc doesn't have an example for FREETEXT; it only references CONTAINSTABLE. But it definitely implies that CONTAINSTABLE outputs a RANK column that you could use to ORDER BY.
I don't know if there is any way to enforce your own definition of relevance. It may make sense to pull out the top 10 relevant matches according to FTS, then apply your own ranking on the output, e.g. you can split up the search terms using a function, and order by how many of the words matched. For simplicity and easy repro in the following example I am not using Full-Text in the subquery but you can replace it with whatever you're actually doing. First create the function:
IF OBJECT_ID('dbo.SplitStrings') IS NOT NULL
DROP FUNCTION dbo.SplitStrings;
GO
CREATE FUNCTION dbo.SplitStrings(@List NVARCHAR(MAX))
RETURNS TABLE
AS
RETURN ( SELECT Item FROM
( SELECT Item = x.i.value('(./text())[1]', 'nvarchar(max)')
FROM ( SELECT [XML] = CONVERT(XML, '<i>'
+ REPLACE(@List, ' ', '</i><i>') + '</i>').query('.')
) AS a CROSS APPLY [XML].nodes('i') AS x(i) ) AS y
WHERE Item IS NOT NULL
);
GO
Then a simple script that shows how to perform the matching:
DECLARE @foo TABLE
(
    id INT,
    [description] NVARCHAR(450)
);
INSERT @foo VALUES
(1,N'McDonalds fast food'),
(2,N'healthy food'),
(3,N'fast food restaurant'),
(4,N'Italian restaurant'),
(5,N'Spike''s Junkyard Dogs');
DECLARE @searchstring NVARCHAR(255) = N'fast food restaurant';
SELECT x.id, x.[description]--, MatchCount = COUNT(s.Item)
FROM
(
    SELECT f.id, f.[description]
    FROM @foo AS f
    -- pretend this actually does full-text search:
    --where (FREETEXT(description,@strsearch))
    -- and ignore how I actually matched:
    INNER JOIN dbo.SplitStrings(@searchstring) AS s
    ON CHARINDEX(s.Item, f.[description]) > 0
    GROUP BY f.id, f.[description]
) AS x
INNER JOIN dbo.SplitStrings(@searchstring) AS s
ON CHARINDEX(s.Item, x.[description]) > 0
GROUP BY x.id, x.[description]
ORDER BY COUNT(s.Item) DESC, [description];
Results:
id description
-- -----------
3 fast food restaurant
1 McDonalds fast food
2 healthy food
4 Italian restaurant
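The "apply your own ranking on the output" idea can also be sketched outside the database. The following is a minimal Python sketch, not the answerer's method: it assumes the candidate rows have already been fetched (for example by FREETEXT), and the function name `rank_rows` and its scoring rule are hypothetical. It scores each row by the longest run of consecutive search words found as a phrase, then by the number of search words matched. Like the CHARINDEX example above, it uses plain substring matching, so 'fast' would also match 'breakfast'.

```python
def rank_rows(rows, search):
    """Order (id, description) rows by longest matched phrase, then matched word count."""
    words = search.lower().split()

    def score(description):
        text = description.lower()
        # Longest run of consecutive search words that appears verbatim in the text.
        longest = 0
        for start in range(len(words)):
            for end in range(start + 1, len(words) + 1):
                if " ".join(words[start:end]) in text:
                    longest = max(longest, end - start)
        # Number of individual search words found anywhere in the text.
        matched = sum(1 for w in words if w in text)
        return longest, matched

    scored = [(score(desc), row_id, desc) for row_id, desc in rows]
    # Drop rows that match no search word at all.
    hits = [item for item in scored if item[0][1] > 0]
    # Best phrase first, then most words, then case-insensitive description.
    hits.sort(key=lambda item: (-item[0][0], -item[0][1], item[2].lower()))
    return [(row_id, desc) for _, row_id, desc in hits]


rows = [
    (1, "McDonalds fast food"),
    (2, "healthy food"),
    (3, "fast food restaurant"),
    (4, "Italian restaurant"),
    (5, "Spike's Junkyard Dogs"),
]
ranked = rank_rows(rows, "fast food restaurant")
```

With the sample rows, `ranked` lists id 3 first, then id 1, then ids 2 and 4, which is the order the question asks for.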

Related

SQL: is there a single query to get "all rows with X, or if none, all rows with Y"?

I'm wondering how to get "all rows where col='X'; if there are none, all rows where col='Y'"
Simplified database:
CREATE TABLE CHARACTER_NAMES(CHARACTER_ID INT, LANG VARCHAR(3), NAME VARCHAR(50));
INSERT INTO CHARACTER_NAMES VALUES (1, 'ENG', 'DONALD DUCK');
INSERT INTO CHARACTER_NAMES VALUES (1, 'ENG', 'GOOD OL'' DONALD');
INSERT INTO CHARACTER_NAMES VALUES (1, 'SWE', 'KALLE ANKA');
INSERT INTO CHARACTER_NAMES VALUES (1, 'SWE', 'KALLEN');
INSERT INTO CHARACTER_NAMES VALUES (2, 'ENG', 'MICKEY MOUSE');
INSERT INTO CHARACTER_NAMES VALUES (2, 'SWE', 'MUSSE PIGG');
INSERT INTO CHARACTER_NAMES VALUES (2, 'SWE', 'MUSEN');
INSERT INTO CHARACTER_NAMES VALUES (3, 'ENG', 'GOOFY');
INSERT INTO CHARACTER_NAMES VALUES (3, 'NOR', 'FEDTMULE');
(It's a bit forced that the characters have several names in the same language, but that's what the real database looks like. Also, CHARACTER_ID is a foreign key to the CHARACTER table, but that's not part of the problem, so it is omitted.)
The user has a language setting, and when there is a database query for a specific character, the query should return the names in the selected language, or the names in English if the selected language has no results. In the above example, if the setting was "Swedish" and the user selected character 3 (Goofy), the name search should return "Goofy", as there is no Swedish name registered. If the user selected Mickey Mouse, the search should return 2 rows: "Musse Pigg" and "Musen".
I wonder if this is possible to express in an SQL query.
If I just wanted the FIRST name in the selected language, or the English one if there is none, I could use:
SELECT NAME
FROM CHARACTER_NAMES
WHERE CHARACTER_ID=?
ORDER BY CASE
WHEN LANG='NOR' THEN 1
WHEN LANG='ENG' THEN 2
END
LIMIT 1;
But as I can't know how many names there will be in the selected language, I have to let this LIMIT vary, and I don't really know how to do that in a nice and proper way.
I'm wondering how to get "all rows where col='X'; if there are none, all rows where col='Y'"
SELECT *
FROM table
WHERE col='X'
UNION ALL
SELECT *
FROM table
WHERE col='Y'
AND NOT EXISTS ( SELECT NULL
FROM table
WHERE col='X' )
If one or more rows with col='X' exist, then NOT EXISTS evaluates to FALSE and the second query returns nothing, so only the output of the first query is returned.
Conversely, if there is no row with col='X', then the first query returns no rows, but NOT EXISTS evaluates to TRUE and the second query returns all rows with col='Y', so only the output of the second query is returned.
Or you may use
SELECT *
FROM table
WHERE col = CASE WHEN EXISTS ( SELECT NULL
FROM table
WHERE col='X' )
THEN 'X'
ELSE 'Y'
END;
There are more variants, of course...
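As a quick way to check the UNION ALL / NOT EXISTS pattern against the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for the asker's database, and the helper name `names_for` and the parameter names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE character_names (character_id INTEGER, lang TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO character_names VALUES (?, ?, ?)",
    [
        (1, "ENG", "DONALD DUCK"),
        (1, "ENG", "GOOD OL' DONALD"),
        (1, "SWE", "KALLE ANKA"),
        (1, "SWE", "KALLEN"),
        (2, "ENG", "MICKEY MOUSE"),
        (2, "SWE", "MUSSE PIGG"),
        (2, "SWE", "MUSEN"),
        (3, "ENG", "GOOFY"),
        (3, "NOR", "FEDTMULE"),
    ],
)

# First branch: names in the preferred language.
# Second branch: fallback names, only when the first branch would be empty.
FALLBACK_QUERY = """
SELECT name FROM character_names
WHERE character_id = :id AND lang = :preferred
UNION ALL
SELECT name FROM character_names
WHERE character_id = :id AND lang = :fallback
  AND NOT EXISTS (SELECT 1 FROM character_names
                  WHERE character_id = :id AND lang = :preferred)
"""


def names_for(character_id, preferred, fallback="ENG"):
    """Return the sorted names in the preferred language, else in the fallback."""
    params = {"id": character_id, "preferred": preferred, "fallback": fallback}
    return sorted(row[0] for row in conn.execute(FALLBACK_QUERY, params))
```

Here `names_for(3, "SWE")` falls back to the English name for Goofy, while `names_for(2, "SWE")` returns only the two Swedish names for Mickey Mouse.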
In MySQL 8 you can use CASE expression with RANK to get the desired result:
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY character_id ORDER BY CASE
WHEN lang = 'NOR' THEN 1
WHEN lang = 'ENG' THEN 2
ELSE 3
END) AS rnk
FROM character_names
)
SELECT *
FROM cte
WHERE rnk = 1
Identical result could be achieved with NOT EXISTS:
SELECT *
FROM character_names AS t1
WHERE lang = 'NOR'
OR lang = 'ENG' AND NOT EXISTS (
SELECT *
FROM character_names AS t2
WHERE t2.character_id = t1.character_id
AND t2.lang = 'NOR'
)
Result:
character_id lang name            rnk
------------ ---- --------------- ---
1            ENG  DONALD DUCK     1
1            ENG  GOOD OL' DONALD 1
2            ENG  MICKEY MOUSE    1
3            NOR  FEDTMULE        1

What is the proper MySQL way to take data from 4 rows, 1 column, and separate into 9 columns?

I've studied and tried days worth of SQL queries to find "something" that will work. I have a table, apj32_facileforms_subrecords, that uses 7 columns. All the data I want to display is in 1 column - "value". The "record" displays the number of the entry. The "title" is what I would like to appear in the header row, but that's not as important as "value" to display in 1 row based upon "record" number.
I've tried a lot of CONCAT and various Pivot queries, but nothing seems to do more than "get close" to what I'd like as the end result.
Here's a screen shot of the table:
The output "should" be linear, so that 1 row contains 9 columns:
Project; Zipcode; First Name; Last Name; Address; City; Phone; E-mail; Trade (in that order). And the values in the 9 columns come from "value" as they relate to the "record" number.
I know there are a LOT of examples that are similar, but nothing I've found covers taking all the values from "value" and combining them with CONCAT into 1 row.
This works to get all the data I want - SELECT record,value FROM apj32_facileforms_subrecords WHERE (record IN (record,value)) ORDER BY record
But the values are still in multiple rows. I can play with that query to get just the values, but I'm still at a loss to get them into 1 row. I'll keep playing with that query to see if I can figure it out before one of the experts here shows me how simple it is to do that.
Any help would be appreciated.
Using SQL to flatten an EAV model representation into a relational representation can be somewhat convoluted, and not very efficient.
Two commonly used approaches are conditional aggregation and correlated subqueries in the SELECT list. Both approaches call out for careful indexing for suitable performance with large sets.
correlated subqueries example
Here's an example of the correlated subquery approach, to get one value of the "zipcode" attribute for some records
SELECT r.id
, ( SELECT v1.value
FROM `apj32_facileforms_subrecords` v1
WHERE v1.record = r.id
AND v1.name = 'zipcode'
ORDER BY v1.value LIMIT 0,1
) AS `Zipcode`
FROM ( SELECT 1 AS id ) r
Extending that, we repeat the correlated subquery, changing the attribute identifier ('firstname' in place of 'zipcode'). It looks like we could also reference it by element, e.g. v2.element = 2.
SELECT r.id
, ( SELECT v1.value
FROM `apj32_facileforms_subrecords` v1
WHERE v1.record = r.id
AND v1.name = 'zipcode'
ORDER BY v1.value LIMIT 0,1
) AS `Zipcode`
, ( SELECT v2.value
FROM `apj32_facileforms_subrecords` v2
WHERE v2.record = r.id
AND v2.name = 'firstname'
ORDER BY v2.value LIMIT 0,1
) AS `First Name`
, ( SELECT v3.value
FROM `apj32_facileforms_subrecords` v3
WHERE v3.record = r.id
AND v3.name = 'lastname'
ORDER BY v3.value LIMIT 0,1
) AS `Last Name`
FROM ( SELECT 1 AS id UNION ALL SELECT 2 ) r
returns something like
id Zipcode First Name Last Name
-- ------- ---------- ---------
1 98228 David Bacon
2 98228 David Bacon
conditional aggregation approach example
We can use GROUP BY to collapse multiple rows into one row per entity, and use conditional tests in expressions to "pick out" attribute values with aggregate functions.
SELECT r.id
, MIN(IF(v.name = 'zipcode' ,v.value,NULL)) AS `Zip Code`
, MIN(IF(v.name = 'firstname' ,v.value,NULL)) AS `First Name`
, MIN(IF(v.name = 'lastname' ,v.value,NULL)) AS `Last Name`
FROM ( SELECT 1 AS id UNION ALL SELECT 2 ) r
LEFT
JOIN `apj32_facileforms_subrecords` v
ON v.record = r.id
GROUP
BY r.id
For more portable syntax, we can replace MySQL IF() function with more ANSI standard CASE expression, e.g.
, MIN(CASE v.name WHEN 'zipcode' THEN v.value END) AS `Zip Code`
Note that MySQL does not support SQL Server PIVOT syntax, or Oracle MODEL syntax, or Postgres CROSSTAB or FILTER syntax.
To extend either of these approaches to be dynamic, that is, to return a resultset with a variable number of columns and a variety of column names, is not possible in the context of a single SQL statement. We could separately execute SQL statements to retrieve information that would allow us to dynamically construct a SQL statement of the form shown above, with an explicit set of columns to be returned.
The approaches outlined above return a more traditional relational model (individual columns, each with a value).
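To try the conditional aggregation approach without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module with the portable CASE form. The table name `subrecords`, its reduced column set, and the second record with a missing last name are assumptions made for illustration, not the asker's real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down stand-in for apj32_facileforms_subrecords.
conn.execute("CREATE TABLE subrecords (record INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO subrecords VALUES (?, ?, ?)",
    [
        (1, "zipcode", "98228"),
        (1, "firstname", "David"),
        (1, "lastname", "Bacon"),
        (2, "zipcode", "98228"),
        (2, "firstname", "David"),  # record 2 deliberately has no lastname row
    ],
)

# One output row per record; each CASE picks out a single attribute,
# and MIN() collapses the group while ignoring the NULLs from other attributes.
PIVOT = """
SELECT record,
       MIN(CASE name WHEN 'zipcode'   THEN value END) AS zipcode,
       MIN(CASE name WHEN 'firstname' THEN value END) AS firstname,
       MIN(CASE name WHEN 'lastname'  THEN value END) AS lastname
FROM subrecords
GROUP BY record
ORDER BY record
"""
flat = conn.execute(PIVOT).fetchall()
```

Each tuple in `flat` is one flattened record; an attribute with no row for that record comes back as `None` (NULL) rather than dropping the record.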
non-relational munge of attributes and values into a single string
If we have some special delimiters, we could munge together a representation of the data using GROUP_CONCAT function
As a rudimentary example:
SELECT r.id
, GROUP_CONCAT(v.title,'=',v.value ORDER BY v.name) AS vals
FROM ( SELECT 1 AS id ) r
LEFT
JOIN `apj32_facileforms_subrecords` v
ON v.record = r.id
AND v.name in ('zipcode','firstname','lastname')
GROUP
BY r.id
To return two columns, something like
id vals
-- ---------------------------------------------------
1 First Name=David,Last Name=Bacon,Zip Code=98228
We need to be aware that the return from GROUP_CONCAT is limited to group_concat_max_len bytes. And here we have just squeezed the balloon, moving the problem to some later processing, to parse the resulting string. If we have any equal signs or commas that appear in the values, it's going to make a mess of parsing the result string. So we will have to properly escape any delimiters that appear in the data, so that GROUP_CONCAT expression is going to get more involved.

MySQL lookup based on round(1 + rand() * x) produces NULL and multiple results

I'm trying to select first names from a lookup table at random in MySQL to build a test dataset. I have a table with 200 first names, genders and a row id going from 1 to 200. Something like this:
id firstname gender
1 Aaron m
2 Adam m
3 Alan m
etc...
I'm selecting from this table using a random generator with the following query:
SELECT id, firstname FROM firstname WHERE id = round(1 + (rand() * 199));
I am expecting the random number to tally up with exactly one id from the lookup table, thus producing a single result like
id firstname
43 Jason
Running the code again and again instead gives me a selection of
single rows (as above)
or multiple rows like
id firstname
29 Ethan
147 Jean
or no results (just NULL in both fields).
If I run the random generator on its own, it will always generate a number between 1 and 200. As you can see below, the id field is INT, and the query behaves the same way if I cast the result as SIGNED. I have also tried to use FLOOR instead of ROUND, just to see if that worked any differently - alas, no.
Can anyone tell me why this anomaly occurs? What am I missing?
Here is some code to create the first 20 rows of the original table for testing purposes:
-- First Name --
drop table if exists firstname;
CREATE TABLE firstname (
id INT NOT NULL,
firstname VARCHAR(20) NOT NULL,
gender VARCHAR(1) NOT NULL,
PRIMARY KEY (id),
UNIQUE (firstname)
);
INSERT INTO firstname
(id,firstname,gender)
VALUES
(1,"Aaron","m"),
(2,"Adam","m"),
(3,"Alan","m"),
(4,"Albert","m"),
(5,"Alexander","m"),
(6,"Andrew","m"),
(7,"Anthony","m"),
(8,"Arthur","m"),
(9,"Austin","m"),
(10,"Benjamin","m"),
(11,"Billy","m"),
(12,"Bobby","m"),
(13,"Brandon","m"),
(14,"Brian","m"),
(15,"Bruce","m"),
(16,"Bryan","m"),
(17,"Carl","m"),
(18,"Charles","m"),
(19,"Christian","m"),
(20,"Christopher","m");
Since RAND() is non-deterministic, the WHERE condition is evaluated once for each row, with a fresh random number each time. Thus each row independently has a chance of roughly 1/199 of being selected, which is why the query can return zero, one, or several rows. You can use a subquery in the FROM clause (a derived table) instead to generate exactly one random number:
SELECT f.id, f.firstname
FROM firstname f
JOIN (SELECT floor(rand()*200)+1 as rnd) r ON r.rnd = f.id
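The difference between drawing a random number per row and drawing it once can be illustrated with a small simulation. This is a plain Python sketch of the behaviour described above, not MySQL itself; the function names are hypothetical and the seed is arbitrary.

```python
import random


def per_row_filter(ids, rng):
    # Mimics WHERE id = ROUND(1 + RAND() * 199): a fresh draw for every row,
    # so any number of rows (including none) can pass the test.
    return [i for i in ids if i == round(1 + rng.random() * (len(ids) - 1))]


def draw_once_filter(ids, rng):
    # Mimics the derived-table fix: FLOOR(RAND() * 200) + 1 is drawn once,
    # then every row is compared against that single value.
    target = int(rng.random() * len(ids)) + 1
    return [i for i in ids if i == target]


rng = random.Random(0)
ids = list(range(1, 201))

# Distinct result sizes observed over many simulated queries.
per_row_counts = {len(per_row_filter(ids, rng)) for _ in range(1000)}
draw_once_counts = {len(draw_once_filter(ids, rng)) for _ in range(1000)}
```

`draw_once_counts` contains only the value 1, whereas `per_row_counts` contains several different sizes, matching the mix of empty, single-row, and multi-row results seen in the question.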

Searching for data in SQL

Please take a look at the following table:
I am building a search engine which returns card_id values, based on search of category_id and value_id values.
To better explain the search mechanism, imagine that we are trying to find a car (card_id) by specifying which part (value_id) the car should have in each category (category_id).
For example, we may want to find a car (card_id) where the category "Fuel Type" (category_id) has the value "Diesel" (value_id), and the category "Gearbox" (category_id) has the value "Manual" (value_id).
My problem is that I don't know how to build a query that returns the card_ids that match more than one pair of category_id and value_id.
For example, if I want to search a car with diesel engine, I could build a query like this:
SELECT card_id FROM cars WHERE category_id=1 AND value_id=2
where category_id = 1 is a category "Fuel Type" and value_id = 2 is "Diesel".
My question is, how can I build a query, which will look for more category-value pairs? For example, I want to look for diesel cars with manual gearbox.
Any help will be very appreciated. Thank you in advance.
You can do this using aggregation and a having clause:
SELECT card_id
FROM cars
GROUP BY card_id
HAVING SUM(category_id = 1 AND value_id = 2) > 0 AND
SUM(category_id = 3 and value_id = 43) > 0;
Each condition in the having clause counts the number of rows that match a given condition. You can add as many conditions as you like. The first, for instance, says that there is at least one row where the category is 1 and the value is 2.
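To see the HAVING approach run with an arbitrary number of category/value pairs, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL here (both treat a boolean expression as 0 or 1 inside SUM); the sample rows, the meaning attached to the ids, and the helper name `find_cards` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (card_id INTEGER, category_id INTEGER, value_id INTEGER)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        (10, 1, 2), (10, 3, 43),  # hypothetical: diesel, manual gearbox
        (11, 1, 2), (11, 3, 44),  # hypothetical: diesel, other gearbox
        (12, 1, 5), (12, 3, 43),  # hypothetical: other fuel, manual gearbox
    ],
)


def find_cards(pairs):
    """Return card_ids that have every (category_id, value_id) pair in `pairs`."""
    if not pairs:
        raise ValueError("at least one (category_id, value_id) pair is required")
    # One HAVING condition per pair; only placeholders are interpolated,
    # the actual values are passed as bound parameters.
    having = " AND ".join(
        "SUM(category_id = ? AND value_id = ?) > 0" for _ in pairs
    )
    sql = f"SELECT card_id FROM cars GROUP BY card_id HAVING {having} ORDER BY card_id"
    params = [value for pair in pairs for value in pair]
    return [row[0] for row in conn.execute(sql, params)]
```

With this sample data, `find_cards([(1, 2), (3, 43)])` returns only card 10, while `find_cards([(1, 2)])` returns cards 10 and 11.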
Another approach is to create a user defined function that takes a table of attribute/value pairs and returns a table of matching cars. This has the advantage of allowing an arbitrary number of attribute/value pairs without resorting to dynamic SQL.
--Declare a "sample" table for proof of concept, replace this with your real data table
DECLARE @T TABLE(PID int, Attr Int, Val int)
--Populate the data table
INSERT INTO @T(PID , Attr , Val) VALUES (1,1,1), (1,3,5),(1,7,9),(2,1,2),(2,3,5),(2,7,9),(3,1,1),(3,3,5), (3,7,9)
--Declare this as a User Defined Table Type, the function would take this as an input
DECLARE @C TABLE(Attr Int, Val int)
--This would be populated by the code that calls the function
INSERT INTO @C (Attr , Val) VALUES (1,1),(7,9)
--The function (or stored procedure) body begins here
--Get a list of IDs for which there is not a requested attribute that doesn't have a matching value for that ID
SELECT DISTINCT PID
FROM @T as T
WHERE NOT EXISTS (SELECT C.ATTR FROM @C as C
    WHERE NOT EXISTS (SELECT * FROM @T as I
        WHERE I.Attr = C.Attr and I.Val = C.Val and I.PID = T.PID ))
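The double NOT EXISTS query above can be checked with the same sample data in Python's built-in sqlite3 module. This is only a sketch: SQLite replaces SQL Server, and ordinary tables named `t` and `c` (hypothetical names) replace the table variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pid INTEGER, attr INTEGER, val INTEGER)")
conn.execute("CREATE TABLE c (attr INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 3, 5), (1, 7, 9),
     (2, 1, 2), (2, 3, 5), (2, 7, 9),
     (3, 1, 1), (3, 3, 5), (3, 7, 9)],
)
# The requested attribute/value pairs.
conn.executemany("INSERT INTO c VALUES (?, ?)", [(1, 1), (7, 9)])

# Keep a pid only if no requested pair is missing for that pid.
MATCH_ALL = """
SELECT DISTINCT t.pid
FROM t
WHERE NOT EXISTS (
    SELECT 1 FROM c
    WHERE NOT EXISTS (
        SELECT 1 FROM t AS i
        WHERE i.attr = c.attr AND i.val = c.val AND i.pid = t.pid
    )
)
ORDER BY t.pid
"""
matching = [row[0] for row in conn.execute(MATCH_ALL)]
```

PIDs 1 and 3 satisfy both requested pairs; PID 2 is excluded because its value for attribute 1 is 2 rather than 1.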

How to find the next record after a specified one in SQL?

I'd like to use a single SQL query (in MySQL) to find the record which comes after one that I specify.
I.e., if the table has:
id, fruit
-- -----
1 apples
2 pears
3 oranges
I'd like to be able to do a query like:
SELECT * FROM table where previous_record has id=1 order by id;
(clearly that's not real SQL syntax, I'm just using pseudo-SQL to illustrate what I'm trying to achieve)
which would return:
2, pears
My current solution is just to fetch all the records, and look through them in PHP, but that's slower than I'd like. Is there a quicker way to do it?
I'd be happy with something that returned two rows -- i.e. the one with the specified value and the following row.
EDIT: Sorry, my question was badly worded. Unfortunately, my definition of "next" is not based on ID, but on alphabetical order of fruit name. Hence, my example above is wrong, and should return oranges, as it comes alphabetically next after apples. Is there a way to do the comparison on strings instead of ids?
After the question's edit and the simplification below, we can change it to
SELECT id FROM table WHERE fruit > 'apples' ORDER BY fruit LIMIT 1
SELECT * FROM table WHERE id > 1 ORDER BY id LIMIT 1
Even simpler
UPDATE:
SELECT * FROM table WHERE fruit > 'apples' ORDER BY fruit LIMIT 1
So simple, and no gymnastics required
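For a quick check of the "next in alphabetical order" query, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL. The table name `fruits` and the helper name `next_fruit` are assumptions made for illustration; note that SQLite compares text case-sensitively by default, which does not matter for this all-lowercase sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (id INTEGER PRIMARY KEY, fruit TEXT)")
conn.executemany(
    "INSERT INTO fruits VALUES (?, ?)",
    [(1, "apples"), (2, "pears"), (3, "oranges")],
)


def next_fruit(current):
    """Return the (id, fruit) row that follows `current` alphabetically, or None."""
    return conn.execute(
        "SELECT id, fruit FROM fruits WHERE fruit > ? ORDER BY fruit LIMIT 1",
        (current,),
    ).fetchone()
```

Here `next_fruit("apples")` returns the oranges row, as the edited question expects, and `next_fruit("pears")` returns `None` because nothing sorts after it.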
Select * from Table
where id =
(Select Min(id) from Table
where id > @Id)
or, based on the string @fruitName = 'apples', or 'oranges' etc...
Select * from Table
where id =
(Select Min(id) from Table
where id > (Select id from Table
Where fruit = @fruitName))
I'm not familiar with the MySQL syntax, but with SQL Server you can do something with "top", for example:
SELECT TOP 1 * FROM table WHERE id > 1 ORDER BY id;
This assumes that the id field is unique. If it is not unique (say, a foreign key), you can do something similar and then join back against the same table.
Since I don't use MySQL, I am not sure of the syntax, but would imagine it to be similar.
Unless you specify a sort order, I don't believe the concepts of "previous" or "next" are available to you in SQL. You aren't guaranteed a particular order by the RDBMS by default. If you can sort by some column into ascending or descending order that's another matter.
This should work. The string 'apples' will need to be a parameter.
Fill in that parameter with a string, and this query will return the entire record for the first fruit after that item, in alphabetical order.
Unlike the LIMIT 1 approach, this should be platform-independent.
--STEP THREE: Get the full record w/the ID we found in step 2
select *
from
fruits fr
,(
--STEP TWO: Get the ID # of the name we found in step 1
select
min(vendor_id) min_id
from
fruits fr1
,(
--STEP ONE: Get the next name after "apples"
select min(name) next_name
from fruits frx
where frx.name > 'apples'
) minval
where fr1.name = minval.next_name
) x
where fr.vendor_id = x.min_id;
The equivalent to the LIMIT 1 approach in Oracle (just for reference) would be this:
select *
from
(
select *
from fruits frx
where frx.name > 'apples'
order by name
)
where rownum = 1
I don't know MySQL's SQL dialect, but I'll try anyway:
select n.id
from fruit n
, fruit p
where n.id = p.id + 1;
edit:
select n.id, n.fruitname
from fruits n
, fruits p
where n.id = p.id + 1;
edit two:
Jason Lepack has said that that doesn't work when there are gaps and that is true and I should read the question better.
I should have used analytics to sort the results on fruitname
select id
, fruitname
, lead(id) over (order by fruitname) id_next
, lead(fruitname) over (order by fruitname) fruitname_next
from fruits;
If you are using MS SQL Server 2008 (not sure if available for previous versions)...
In the event that you are trying to find the next record and you do not have a unique ID to reference in an applicable manner, try using ROW_NUMBER(). See this link
Depending on how savvy your T-SQL skill is, you can create row numbers based on your sorting order. Then you can find more than just the previous and next record. Utilize it in views or sub-queries to find another record relative to the current record's row number.
SELECT cur.id as id, nxt.id as nextId, prev.id as prevId FROM video as cur
LEFT JOIN video as nxt ON nxt.id > cur.id
LEFT JOIN video as prev ON prev.id < cur.id
WHERE cur.id = 12
ORDER BY prev.id DESC, nxt.id ASC
LIMIT 1
If you want the item with previous and next item this query lets you do just that.
This also allows you to have gaps in the data!
How about this:
Select * from table where id = 1 + 1