Search comma separated string in Column t-sql - sql-server-2008

In a property management system, I'm saving buyers based on their preferences. Say a person is interested in houses that have between 2 and 4 bedrooms, so I saved it as 2,3,4. Please see the attachment.
When someone searches for buyers who are interested in houses with more than 2 bedrooms, how should I write the SELECT statement to check the bedroom column?
If someone searches for buyers who are interested in houses with more than 2 bathrooms, what would the SELECT statement be?

I still think a min/max is the much better table structure, but if you can't change it, try this.
First, let's come up with some base rules. If any of these is violated, then the final answer will need modification (and will probably be more complicated).
If a string contains + anywhere in it, the maximum is infinity.
The + can only occur at the end of the string.
The list will always be comma separated integers.
The smallest number will always be first in the list.
The largest number will always be last in the list.
If all those are true, then after a LOT of work, I think I have something you can use. The basic idea was to come up with a pair of functions that will get the min/max values out of your strings. Once you have these functions, you can use them in WHERE clauses.
See this SQL Fiddle for starters. Function definitions on the left, a sample query to give you the gist of how they work on the right.
CREATE FUNCTION dbo.list_min(@list_str AS VARCHAR(MAX))
RETURNS INT
WITH RETURNS NULL ON NULL INPUT
AS BEGIN
    DECLARE @comma_index INT;
    SET @comma_index = CHARINDEX(',', @list_str);
    DECLARE @result INT;
    IF (0 < @comma_index)
        SET @result = CONVERT(INT, LEFT(@list_str, @comma_index - 1));
    ELSE
        SET @result = CONVERT(INT, REPLACE(@list_str, '+', ''));
    RETURN @result;
END;
and
CREATE FUNCTION dbo.list_max(@list_str AS VARCHAR(MAX))
RETURNS INT
WITH RETURNS NULL ON NULL INPUT
AS BEGIN
    IF (@list_str LIKE '%+')
        RETURN 2147483647; -- Max INT
    DECLARE @comma_index INT;
    SET @comma_index = CHARINDEX(',', REVERSE(@list_str));
    DECLARE @result INT;
    IF (0 < @comma_index)
        SET @result = CONVERT(INT, RIGHT(@list_str, @comma_index - 1));
    ELSE
        SET @result = CONVERT(INT, @list_str);
    RETURN @result;
END;
(If anyone can think of a way to get rid of that ridiculous result variable, please let me know. I was getting errors about the last statement "must be a return statement" when putting the return inside IF/ELSE, and I couldn't get a CASE syntax working.)
With these in hand, you can do queries like this:
SELECT *
FROM stuff
WHERE dbo.list_min(carspace) <= 2 AND 2 <= dbo.list_max(carspace)
which will only select your second row. (SQL Fiddle of this query.)
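To check the base rules against edge cases without a SQL Server instance, the same min/max logic can be sketched in Python. This is only an illustration of the rules listed above, not a replacement for the T-SQL functions; the sample rows are made up.

```python
INT_MAX = 2147483647  # same sentinel the T-SQL list_max returns for a trailing "+"

def list_min(list_str):
    """Smallest value: the first item, relying on the 'smallest first' rule."""
    if list_str is None:
        return None
    first = list_str.split(",")[0]
    return int(first.replace("+", ""))

def list_max(list_str):
    """Largest value: 'infinity' if the string ends in '+', else the last item."""
    if list_str is None:
        return None
    if list_str.endswith("+"):
        return INT_MAX
    return int(list_str.split(",")[-1])

def matches(list_str, wanted):
    """Mirror of: list_min(col) <= wanted AND wanted <= list_max(col)."""
    return list_min(list_str) <= wanted <= list_max(list_str)

rows = ["1", "2,3,4", "3+", "4,5"]  # hypothetical column values
print([r for r in rows if matches(r, 2)])  # ['2,3,4']
```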
A third function you might find useful is one that gives you the max in the list but ignores the +. It's essentially the list_max function without the IF block that checks for +. To get that functionality, you might want to just remove the + check from list_max, and create another function that checks for + and calls list_max if there's no +.
I'm not sure about the performance characteristics here. I imagine they aren't great. You might want to consider some function based indexing if you have a large amount of data to search through.
Good luck. Hope this helps.

Wouldn't it make more sense to store a min and max and then query with <= and >=?
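A minimal sketch of that min/max design, using Python's built-in sqlite3 module only so that it runs anywhere. The table name, column names and the use of NULL for "no upper limit" are assumptions for illustration; the WHERE clause itself is plain SQL that should carry over to SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE buyers (name TEXT, min_bedrooms INTEGER, max_bedrooms INTEGER)"
)
conn.executemany(
    "INSERT INTO buyers VALUES (?, ?, ?)",
    [("A", 1, 1), ("B", 2, 4), ("C", 3, None)],  # None (NULL) = no upper limit
)

def buyers_for(bedrooms):
    """Buyers whose stored range contains the given bedroom count."""
    rows = conn.execute(
        "SELECT name FROM buyers "
        "WHERE min_bedrooms <= ? "
        "AND (max_bedrooms IS NULL OR max_bedrooms >= ?) "
        "ORDER BY name",
        (bedrooms, bedrooms),
    ).fetchall()
    return [r[0] for r in rows]

print(buyers_for(2))  # ['B']
print(buyers_for(5))  # ['C']
```

Because both columns are plain integers, an ordinary index on them can be used, which is not the case for a function applied to a comma-separated string.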

Your table design is not optimal. You can simply store the number of bedrooms as an integer. Instead of "1,2,3,4", you would be looking at just 4 in that case.
To answer the particular question though, you can do the replace trick to count the number of commas in the column as such:
SELECT * FROM myTable WHERE LEN(col) - LEN(REPLACE(col, ',','')) >= someNumber

Related

In MySQL is there some way to calculate average with a list of numbers?

I would like to do something like the following:
SELECT AVG([1,2,3]);
This would of course return 2. The closest I've got is the following which is less than ideal but can be put in one line:
drop table nums;
create temporary table nums (num int);
insert into nums(num) values(1),(2),(3);
select avg(num) from nums;
If there's a way I would assume this would also be possible with other functions such as variance() and others.
Edit: This idea is out of curiosity not a real problem I need to solve.
AVG can only have 1 argument. You'd need to do SELECT AVG(num) FROM nums
You could also do SELECT SUM(num) / COUNT(num) FROM nums
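Note that in MySQL the / operator returns a decimal result even when both operands are integers, so the result is not truncated. In engines where integer division truncates (SQL Server, for example), cast one operand to a decimal type first.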
You are using the wrong tools to solve your problem.
If you want to calculate the variance of a list, use some kind of scripting language, be it PHP, Python, etc. If you want to store the data first and only then calculate the variance, then of course use something like MySQL.
I also think that MySQL may not be the right tool (since you didn't want to store the numbers), but the answer to your question is: yes, you can do it with MySQL without creating tables if you want.
I didn't find any built-in functions/structures for this, but I came up with the following solution. My idea is to create a custom function which accepts the numbers in a delimited string, then it splits the string and calculates the average of the numbers. Here's an implementation which works with integers. The input should be num,num,num and so on, it should end with a number (see the examples at the end).
DROP FUNCTION IF EXISTS AVGS;
DELIMITER $$
CREATE FUNCTION AVGS(s LONGTEXT) RETURNS DOUBLE
DETERMINISTIC
BEGIN
DECLARE sum BIGINT DEFAULT 0;
DECLARE count BIGINT DEFAULT 0;
DECLARE pos BIGINT DEFAULT 0;
DECLARE lft TEXT DEFAULT '';
-- can we split?
SET pos = LOCATE(',', s);
WHILE 0 < pos DO -- while we can split
SET lft = LEFT(s, pos - 1); -- get the first number
SET s = SUBSTR(s FROM pos + 1); -- store the rest
SET sum = sum + CAST(lft AS SIGNED);
SET count = count + 1;
SET pos = LOCATE(',', s); -- split again
END WHILE;
-- handle last number
SET sum = sum + CAST(s AS SIGNED);
SET count = count + 1;
RETURN sum / count;
END $$
DELIMITER ;
SELECT AVGS("1"); -- prints: 1
SELECT AVGS("1,2"); -- prints: 1.5
SELECT AVGS("1,2,3"); -- prints: 2
See the live working demo here.
Variance may be much more complex, but I hope you get the idea.
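If the numbers are known when the query is written, a derived table built from UNION ALL avoids both the temporary table and the custom function. The sketch below runs the statement through Python's sqlite3 module purely so it is executable here; the alias nums is arbitrary, and the same SELECT should be accepted by MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The inline list 1, 2, 3 becomes a three-row derived table.
query = """
SELECT AVG(num)
FROM (SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3) AS nums
"""
avg = conn.execute(query).fetchone()[0]
print(avg)  # 2.0
```

Any other aggregate can be applied to the same derived table in place of AVG.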

Max of Max on MYSQL

I'm making an Acyclic Graph database.
TABLE Material (id_item,id_collection,...)
PRIMARY KEY(id_item,id_collection)
(item can be collection itself, item can be collection of collection)
My constraint is id_collection > id_item (to prevent some cycle - 1st step)
So before inserting I need to know "Max(Max(id_item), Max(id_collection))"
I can get the two values with the query below, but I can't get the max of them:
SELECT max(id_collection)
FROM material
UNION
SELECT max(id_item)
FROM Material
I tried this as well:
DELIMITER $$
CREATE PROCEDURE `findmax`
(
)
BEGIN
DECLARE max_item SMALLINT;
DECLARE max_collection SMALLINT;
DECLARE max_of_both SMALLINT;
SELECT MAX(id_item)
INTO max_item
FROM material
SELECT MAX(id_collection)
INTO max_collection
FROM material
SET max_of_both = MAX(max_item, max_collection)
END$$
DELIMITER ;
I'm running out of Gas. Anyone got an idea plz?
Best regards,
Falt
N.B. 2 useful sources about acyclic graph :
Database Soup : Trigger prevent cycles in PostgreSQL
CodeProject : Acyclic Graph Modelisation
You should be able to use the GREATEST() function in MySQL.
Try this:
SELECT GREATEST(
(SELECT MAX(id_item) FROM material),
(SELECT MAX(id_collection) FROM material));
This will select the largest item, whether that's the MAX(id_item) or MAX(id_collection).
EDIT
Something that may look a little cleaner. GREATEST() returns the largest of the arguments it is passed, so used by itself on the two columns it would return one row per row in the table, each holding whichever of id_item or id_collection is larger. You can therefore wrap GREATEST inside MAX() to get a single value:
SELECT MAX(GREATEST(id_item, id_collection))
FROM material;
Here is an SQL Fiddle example with both.
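For comparison, the UNION attempt from the question can also be completed without GREATEST() by taking MAX over the two-row result. This is a sketch run through Python's sqlite3 module so it can be executed as-is; the sample rows and the alias both_maxima are made up, and the SELECT should be valid in MySQL too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE material (id_item INTEGER, id_collection INTEGER, "
    "PRIMARY KEY (id_item, id_collection))"
)
# Hypothetical rows that respect id_collection > id_item.
conn.executemany(
    "INSERT INTO material VALUES (?, ?)", [(1, 5), (2, 5), (5, 9), (7, 8)]
)

# Inner query: one row per column maximum. Outer query: the larger of the two.
query = """
SELECT MAX(v)
FROM (
    SELECT MAX(id_item) AS v FROM material
    UNION ALL
    SELECT MAX(id_collection) FROM material
) AS both_maxima
"""
max_of_both = conn.execute(query).fetchone()[0]
print(max_of_both)  # 9
```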

Save data from table into a variable and use it inside a function (make a data set)

Basically I want to make a data set like in PHP, where I can store the return of a select statement in a variable and then use it to do logical decisions.
here is what I am trying:
DROP FUNCTION cc_get_balance(date);
CREATE OR REPLACE FUNCTION cc_get_balance(theDate date) RETURNS TABLE(balance numeric(20,10), rate numeric(20,10), final_balance numeric(20,10)) AS $$
DECLARE
currency1_to_EUR numeric(20,10);
currency2_to_EUR numeric(20,10);
table_ret record;
BEGIN
currency1_to_EUR := (SELECT rate FROM cc_getbalancesfordatewitheurs(theDate) WHERE from_currency = 'currency1' AND to_currency = 'EUR');
currency2_to_EUR := (SELECT rate FROM cc_getbalancesfordatewitheurs(theDate) WHERE from_currency = 'currency2' AND to_currency = 'EUR');
SELECT * INTO table_ret FROM cc_getbalancesfordatewitheurs(theDate);
END;
$$ LANGUAGE 'plpgsql';
SELECT * FROM cc_get_balance('2014-02-15'::date);
I don't know if this is right. I want to be able to use table_ret as a data set like:
select * from table_ret ...
So I don't have to make a lot queries to the database. I have looked for examples doing this and have not found anything like what I need or want to do.
The version is 9.3.4. cc_getbalancesfordatewitheurs() returns a table with columns from_currency, to_currency, rate, exchange, balance and converted_amount, with around 30 rows. I need to run through the to_currency column and run some other conversions based on the currency list in the column, and I do not want to query the database 30 times for the conversions. All the data I need is collected together in the table returned by cc_getbalancesfordatewitheurs().
cc_get_balance() should return all the rows found in the table from the other function along with a column that does a final conversion of the to_currency into EUR
Generally, there is no "table variable". You could use a cursor or a temporary table.
Better yet, use the implicit cursor of a FOR loop.
Even better, still, if possible, do it all in a single set-based operation. A query.
Related example:
Cursor based records in PostgreSQL

Counting occurrences of a word in a single row

I have a search query that is able to sort results by relevance according to how many of the words from the query actually show up.
SELECT id,
thesis
FROM activity p
WHERE p.discriminator = 'opinion'
AND ( thesis LIKE '%gun%'
OR thesis LIKE '%crucial%' )
ORDER BY ( ( CASE
WHEN thesis LIKE '%gun%' THEN 1
ELSE 0
end )
+ ( CASE
WHEN thesis LIKE '%crucial%' THEN 1
ELSE 0
end ) )
DESC
This query, however, does not sort according to how many times 'gun' or 'crucial' show up. I want records with more occurrences of 'gun' to show up above records with fewer occurrences (i.e., add a point for every time 'gun' shows up rather than a single point because it shows up at least once).
I might be wrong, but without stored procedures or a UDF you won't be able to count string occurrences. Here's a sample stored function that counts substrings:
drop function if exists str_count;
delimiter |
create function str_count(sub varchar(255), str varchar(255)) RETURNS INTEGER
DETERMINISTIC NO SQL
BEGIN
    DECLARE count INT;
    DECLARE cur INT;
    SET count = 0;
    SET cur = 0;
    REPEAT
        -- find the next occurrence after the previous match
        SET cur = LOCATE(sub, str, cur+1);
        SET count = count + (cur > 0);
    UNTIL (cur = 0)
    END REPEAT;
    RETURN(count);
END|
delimiter ;
You might want to change varchar(255) to a larger VARCHAR (the maximum is 65,535 bytes) or TEXT. You can now use it in the ORDER BY of the query:
SELECT id,
thesis
FROM activity p
WHERE p.discriminator = 'opinion'
AND ( thesis LIKE '%gun%'
OR thesis LIKE '%crucial%' )
ORDER BY STR_COUNT('gun',thesis) + STR_COUNT('crucial', thesis) DESC
If your dataset is large and performance is important to you, I suggest writing a custom UDF in C.
Depending on how your database is set up, you may find MySQL's full text indexing to be a better fit for your use case. It allows you to index fields and search for words in them, ordering the results by relevance related to the number of occurrences.
See the documentation here: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
This is a useful question that gives some examples, and may help: How can I manipulate MySQL fulltext search relevance to make one field more 'valuable' than another?
Finally, if full text searches aren't an option for you, the comment posted by Andrew Hanna on the string functions reference may do the trick: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html (search the page for "Andrew Hanna"). They create a function on the server which can count the number of times a string occurs.
Hope this helps.
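A common way to count occurrences without any stored function is to compare the string's length before and after removing the word with REPLACE. The sketch below runs it through Python's sqlite3 module so it is executable here; the sample rows are invented. MySQL has the same LENGTH and REPLACE functions, but its REPLACE is case-sensitive, so treat this as a starting point rather than a drop-in answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (id INTEGER PRIMARY KEY, discriminator TEXT, thesis TEXT)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, 'opinion', ?)",
    [
        (1, "gun control is crucial"),
        (2, "a gun is a gun is a gun"),
        (3, "nothing relevant here"),
    ],
)

# (chars removed by REPLACE) / (length of the word) = number of occurrences
query = """
SELECT id,
       (LENGTH(thesis) - LENGTH(REPLACE(thesis, 'gun', ''))) / LENGTH('gun')
     + (LENGTH(thesis) - LENGTH(REPLACE(thesis, 'crucial', ''))) / LENGTH('crucial')
       AS hits
FROM activity
WHERE discriminator = 'opinion'
  AND (thesis LIKE '%gun%' OR thesis LIKE '%crucial%')
ORDER BY hits DESC
"""
ranked = conn.execute(query).fetchall()
print(ranked)  # [(2, 3), (1, 2)]
```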

Using SQL LIKE and IN together

Is there a way to use LIKE and IN together?
I want to achieve something like this.
SELECT * FROM tablename WHERE column IN ('M510%', 'M615%', 'M515%', 'M612%');
So basically I want to be able to match the column with a bunch of different strings. Is there another way to do this with one query or will I have to loop over the array of strings I am looking for?
How about using a substring with IN.
select * from tablename where substring(column,1,4) IN ('M510','M615','M515','M612')
You can do it in one query by stringing together the individual LIKEs with ORs:
SELECT * FROM tablename
WHERE column LIKE 'M510%'
OR column LIKE 'M615%'
OR column LIKE 'M515%'
OR column LIKE 'M612%';
Just be aware that things like LIKE and per-row functions don't always scale that well. If your table is likely to grow large, you may want to consider adding another column to your table to store the first four characters of the field independently.
This duplicates data but you can guarantee it stays consistent by using insert and update triggers. Then put an index on that new column and your queries become:
SELECT * FROM tablename WHERE newcolumn IN ('M510','M615','M515','M612');
This moves the cost-of-calculation to the point where it's necessary (when the data changes), not every single time you read it. In fact, you could go even further and have your new column as a boolean indicating that it was one of the four special types (if that group of specials will change infrequently). Then the query would be an even faster:
SELECT * FROM tablename WHERE is_special = 1;
This tradeoff of storage requirement for speed is a useful trick for larger databases - generally, disk space is cheap, CPU grunt is precious, and data is read far more often than written. By moving the cost-of-calculation to the write stage, you amortise the cost across all the reads.
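A small sketch of that extra-column idea, using Python's sqlite3 module so it can be run as-is. The trigger and index names are made up, and trigger syntax differs between database engines, so this shows the shape of the approach rather than code to copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (id INTEGER PRIMARY KEY, col TEXT, newcolumn TEXT);
CREATE INDEX idx_newcolumn ON tablename (newcolumn);

-- Keep newcolumn equal to the first four characters of col on every write.
CREATE TRIGGER trg_prefix_ins AFTER INSERT ON tablename
BEGIN
    UPDATE tablename SET newcolumn = substr(NEW.col, 1, 4) WHERE id = NEW.id;
END;
CREATE TRIGGER trg_prefix_upd AFTER UPDATE OF col ON tablename
BEGIN
    UPDATE tablename SET newcolumn = substr(NEW.col, 1, 4) WHERE id = NEW.id;
END;
""")

conn.executemany(
    "INSERT INTO tablename (col) VALUES (?)",
    [("M510MM",), ("M999ZZ",), ("M612",)],
)
conn.execute("UPDATE tablename SET col = 'M615NN' WHERE col = 'M999ZZ'")

rows = conn.execute(
    "SELECT col FROM tablename "
    "WHERE newcolumn IN ('M510','M615','M515','M612') ORDER BY id"
).fetchall()
print([r[0] for r in rows])  # ['M510MM', 'M615NN', 'M612']
```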
You'll need to use multiple LIKE terms, joined by OR.
Use the longer version of IN which is a bunch of OR.
SELECT * FROM tablename
WHERE column LIKE 'M510%'
OR column LIKE 'M615%'
OR column LIKE 'M515%'
OR column LIKE 'M612%';
SELECT * FROM tablename
WHERE column IN
(select column from tablename
where column like 'M510%'
or column like 'M615%'
OR column like 'M515%'
or column like 'M612%'
)
substr([column name],
[desired starting position (numeric)],
[# characters to include (numeric)]) in ([complete as usual])
Example
substr([column name],1,4) in ('M510','M615', 'M515', 'M612')
I tried another way
Say the table has values
1 M510
2 M615
3 M515
4 M612
5 M510MM
6 M615NN
7 M515OO
8 M612PP
9 A
10 B
11 C
12 D
Here rows 1 to 8 are valid while the rest are invalid.
SELECT COL_VAL
FROM SO_LIKE_TABLE SLT
WHERE (SELECT DECODE(SUM(CASE
WHEN INSTR(SLT.COL_VAL, COLUMN_VALUE) > 0 THEN
1
ELSE
0
END),
0,
'FALSE',
'TRUE')
FROM TABLE(SYS.DBMS_DEBUG_VC2COLL('M510', 'M615', 'M515', 'M612'))) =
'TRUE'
Using the INSTR function, I check whether the table value contains any of the input values. If it does, INSTR returns the position of the match, which is greater than zero; if not, it returns zero. The CASE turns each match into 1 and the SUM adds them up, so a non-zero total indicates a successful match.
It seems to be working.
Hope it helps.
You can use a sub-query with wildcards:
SELECT 'Valid Expression'
WHERE 'Source Column' LIKE (SELECT '%Column' --FROM TABLE)
Or you can use a single string:
SELECT 'Valid Expression'
WHERE 'Source Column' LIKE ('%Source%' + '%Column%')
You can even try this:
Function
CREATE FUNCTION [dbo].[fn_Split](@text varchar(8000), @delimiter varchar(20))
RETURNS @Strings TABLE
(
    position int IDENTITY PRIMARY KEY,
    value varchar(8000)
)
AS
BEGIN
    DECLARE @index int
    SET @index = -1
    WHILE (LEN(@text) > 0)
    BEGIN
        SET @index = CHARINDEX(@delimiter , @text)
        IF (@index = 0) AND (LEN(@text) > 0)
        BEGIN
            INSERT INTO @Strings VALUES (@text)
            BREAK
        END
        IF (@index > 1)
        BEGIN
            INSERT INTO @Strings VALUES (LEFT(@text, @index - 1))
            SET @text = RIGHT(@text, (LEN(@text) - @index))
        END
        ELSE
            SET @text = RIGHT(@text, (LEN(@text) - @index))
    END
    RETURN
END
Query
select * from my_table inner join (select value from dbo.fn_Split('M510,M615,M515,M612', ','))
as split_table on my_table.column_name like '%'+split_table.value+'%';
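The join-on-LIKE idea does not depend on the split function: the search strings can just as well live in a small table. A sketch with Python's sqlite3 module follows; the prefixes table and the sample rows are assumptions, and it matches on prefix only (pattern || '%') as the question asked. SQL Server concatenates with + instead of ||.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, column_name TEXT)")
conn.execute("CREATE TABLE prefixes (value TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "M510"), (2, "M615NN"), (3, "A"), (4, "XM515")],
)
conn.executemany(
    "INSERT INTO prefixes VALUES (?)",
    [("M510",), ("M615",), ("M515",), ("M612",)],
)

# EXISTS rather than JOIN, so a row matching two prefixes is returned once.
query = """
SELECT t.id, t.column_name
FROM my_table AS t
WHERE EXISTS (
    SELECT 1 FROM prefixes AS p
    WHERE t.column_name LIKE p.value || '%'
)
ORDER BY t.id
"""
matched = conn.execute(query).fetchall()
print(matched)  # [(1, 'M510'), (2, 'M615NN')]
```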
For a perfectly dynamic solution, this is achievable by combining a cursor and a temp table. With this solution you do not need to know the starting position nor the length, and it is expandable without having to add any OR's to your SQL query.
For this example, let's say you want to select the ID, Details & creation date from a table where a certain list of text is inside 'Details'.
First create a table FilterTable with the search strings in a column called Search.
As the question starter requested:
insert into [DATABASE].dbo.FilterTable
select 'M510' union
select 'M615' union
select 'M515' union
select 'M612'
Then you can filter your data as following:
DECLARE @DATA NVARCHAR(MAX)
CREATE TABLE #Result (ID uniqueIdentifier, Details nvarchar(MAX), Created datetime)
DECLARE DataCursor CURSOR local forward_only FOR
SELECT '%' + Search + '%'
FROM [DATABASE].dbo.FilterTable
OPEN DataCursor
FETCH NEXT FROM DataCursor INTO @DATA
WHILE @@FETCH_STATUS = 0
BEGIN
    insert into #Result
    select ID, Details, Created
    from [DATABASE].dbo.Table (nolock)
    where Details like @DATA
    FETCH NEXT FROM DataCursor INTO @DATA
END
CLOSE DataCursor
DEALLOCATE DataCursor
select * from #Result
drop table #Result
Hope this helps.
select *
from tablename
where regexp_like (column, '^M510|M615|^M515|^M612')
Note: in this pattern only M615 lacks the ^ anchor, so M615 matches even when it occurs in the middle of the column, while the other codes match only when the column starts with them.
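The effect of the anchors is easy to check outside the database. The sketch below uses Python's re module as a stand-in for regexp_like, on the assumption that ^ and | behave the same way for this simple pattern; the sample strings are made up.

```python
import re

# Same pattern as in the query: only M615 is missing the ^ anchor.
pattern = re.compile(r"^M510|M615|^M515|^M612")
samples = ["M510MM", "XXM510", "XXM615", "M612PP", "A"]

mixed = [s for s in samples if pattern.search(s)]
print(mixed)  # ['M510MM', 'XXM615', 'M612PP']

# Grouping the alternatives anchors all four codes to the start.
anchored = re.compile(r"^(M510|M615|M515|M612)")
prefix_only = [s for s in samples if anchored.search(s)]
print(prefix_only)  # ['M510MM', 'M612PP']
```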