I want to find the rows in a table whose column value is contained in a given string.
For example, I have these rows in a column named 'atext' in a table named 'testing':
test
a
cool
another
Now I want to select the rows that contain a word from the string 'this is a test', using SQL like this:
select * from testing where instr(atext, 'this is a test') >0;
But this does not select any rows.
Reverse the arguments to INSTR.
WHERE INSTR('this is a test', atext)
With a full-text index:
select * from anti_spam where match (atext) against ("this is a test" in boolean mode);
This is a 'reversed' like:
select * from testing where 'this is a test' LIKE CONCAT('%',atext,'%');
It can be slow on tables with a lot of records.
This returns the rows where the value of the atext column can be found in the given string
(for example, it matches when atext = 'is a t', because that substring occurs in the given string).
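The reversed LIKE can be tried out quickly with Python's built-in sqlite3 module. This is only a sketch under assumptions: it runs on SQLite rather than MySQL, so the concatenation is written with || instead of CONCAT(), and the extra row 'is a t' is added just to show the substring behaviour described above. Note that a stored value containing % or _ would act as a wildcard in this comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testing (atext TEXT)")
conn.executemany(
    "INSERT INTO testing (atext) VALUES (?)",
    [("test",), ("a",), ("cool",), ("another",), ("is a t",)],
)

sentence = "this is a test"

# Reversed LIKE: the sentence is the haystack, the column value is the needle.
# SQLite concatenates with ||, where MySQL would use CONCAT().
rows = conn.execute(
    "SELECT atext FROM testing WHERE ? LIKE '%' || atext || '%' ORDER BY atext",
    (sentence,),
).fetchall()
matched = [r[0] for r in rows]
print(matched)  # ['a', 'is a t', 'test']
```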
Or you can write a regex.
select * from testing where atext REGEXP '^(this|is|a|test)$';
This matches all rows whose atext is exactly one of the specified words.
In your scripting or programming language, you only need to replace the spaces with |, wrap the result in parentheses, and add ^ to the beginning and $ to the end of the string. Then compare with REGEXP, not with =.
("this is a test" -> ^(this|is|a|test)$ )
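As a rough illustration of building that pattern in application code, here is a sketch in Python. The helper name words_to_pattern is made up for this example, and the matching is checked with Python's own re module, whose syntax is close to but not identical to MySQL's REGEXP. It also shows why the parentheses matter: without them, ^ and $ apply only to the first and last alternative.

```python
import re

def words_to_pattern(text):
    """Build an anchored alternation such as ^(this|is|a|test)$ from a sentence."""
    words = [re.escape(w) for w in text.split()]
    return "^(" + "|".join(words) + ")$"

pattern = words_to_pattern("this is a test")
print(pattern)  # ^(this|is|a|test)$

candidates = ["test", "a", "cool", "another"]
exact = [c for c in candidates if re.search(pattern, c)]
print(exact)  # ['test', 'a']

# Without the parentheses, ^ and $ bind only to the first and last alternative,
# so any value that merely contains "is" or "a" also matches.
loose = [c for c in candidates if re.search("^this|is|a|test$", c)]
print(loose)  # ['test', 'a', 'another']
```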
If you have a lot of records in the table, these queries can be slow, because the SQL engine does not use indexes for REGEXP queries.
So if you have a lot of rows in your table and no more than 4 000 000 words, I recommend building an indexing table. Example:
originalTable:
tid | atext (text)
1 | this is
2 | a word
3 | a this
4 | this word
5 | a is
....
indexTable:
wid | word (varchar)
1 | this
2 | is
3 | a
4 | word
switchTable:
tid | wid
1 | 1
1 | 2
2 | 3
2 | 4
3 | 1
3 | 3
...
You should index the tid, wid and word fields.
Then the query is:
-- DISTINCT keeps a row that matches several words from being returned once per word
SELECT DISTINCT o.*
FROM originalTable AS o
JOIN switchTable AS s ON o.tid = s.tid
JOIN indexTable AS i ON i.wid = s.wid
WHERE i.word IN ('this', 'is', 'a', 'test')
This query can be much faster if your originalTable has a lot of records, because here the SQL engine can use indexed searches. But there is a bit more work on insert: when you insert a row into the original table, you must also insert into the other two tables.
The difference in runtime between the 3 queries depends on the size of your table, and on whether you want to optimize for insertions or selections (the ratio of insert/update to select queries).
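To make the extra insert work concrete, here is a minimal sketch of the three-table layout using Python's sqlite3 module. It assumes SQLite instead of MySQL, and the helpers insert_text and find_rows are hypothetical names invented for the example. The generated wid values need not match the ones in the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE originalTable (tid INTEGER PRIMARY KEY, atext TEXT);
    CREATE TABLE indexTable (wid INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE switchTable (tid INTEGER, wid INTEGER, PRIMARY KEY (tid, wid));
    CREATE INDEX switch_wid ON switchTable (wid);
""")

def insert_text(tid, atext):
    """Insert one row and keep the word index in sync (hypothetical helper)."""
    conn.execute("INSERT INTO originalTable (tid, atext) VALUES (?, ?)", (tid, atext))
    for word in set(atext.split()):
        # Add the word only if it is not indexed yet, then link it to the row.
        conn.execute("INSERT OR IGNORE INTO indexTable (word) VALUES (?)", (word,))
        wid = conn.execute("SELECT wid FROM indexTable WHERE word = ?", (word,)).fetchone()[0]
        conn.execute("INSERT INTO switchTable (tid, wid) VALUES (?, ?)", (tid, wid))

for tid, atext in enumerate(["this is", "a word", "a this", "this word", "a is"], start=1):
    insert_text(tid, atext)

def find_rows(sentence):
    """Return the tids of rows containing at least one word of the sentence."""
    words = sentence.split()
    placeholders = ", ".join("?" for _ in words)
    sql = f"""
        SELECT DISTINCT o.tid
        FROM originalTable AS o
        JOIN switchTable AS s ON o.tid = s.tid
        JOIN indexTable AS i ON i.wid = s.wid
        WHERE i.word IN ({placeholders})
        ORDER BY o.tid
    """
    return [row[0] for row in conn.execute(sql, words)]

print(find_rows("is test"))  # [1, 5]
```

Because the lookup is an exact match on indexed words, 'is' does not match the rows that only contain 'this'.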
Related
I have a table with ORC Serde in Athena. The table contains a string column named greeting_message. It can contain null values as well. I want to find how many rows in the table have a particular text as the pattern.
Let's say my sample data looks like below:
|greeting_message |
|-----------------|
|hello world |
|What's up |
| |
|hello Sam |
| |
|hello Ram |
|good morning, hello |
| |
|the above row has null |
| Good morning Sir |
Now, in the above table there are a total of 10 rows. 7 of them have non-null values and 3 of them have null/empty values.
I want to know what percentage of rows contain a specific word.
For example, consider the word hello. It is present in 4 rows, so the percentage of such rows is 4/10 which is 40 %.
Another example: the word morning is present in 2 messages. So the percentage of such rows is 2/10 which is 20 %.
Note that I am considering null also in the count of the denominator.
SELECT SUM(greeting_message LIKE '%hello%') / COUNT(*) AS hello_percentage,
SUM(greeting_message LIKE '%morning%') / COUNT(*) AS morning_percentage
FROM tablename
The syntax of PrestoDB (the Amazon Athena engine) is different from MySQL's. The following example defines a common table expression with WITH greetings AS and then selects from it:
WITH greetings AS
(SELECT 'hello world' as greeting_message UNION ALL
SELECT 'Whats up' UNION ALL
SELECT '' UNION ALL
SELECT 'hello Sam' UNION ALL
SELECT '' UNION ALL
SELECT 'hello Ram' UNION ALL
SELECT 'good morning, hello' UNION ALL
SELECT '' UNION ALL
SELECT 'the above row has null' UNION ALL
SELECT 'Good morning Sir')
SELECT count_if(regexp_like(greeting_message, '.*hello.*')) / cast(COUNT(1) as real) AS hello_percentage,
count_if(regexp_like(greeting_message, '.*morning.*')) / cast(COUNT(1) as real) AS morning_percentage
FROM greetings
will give the following results:
hello_percentage | morning_percentage
-----------------|-------------------
0.4 | 0.2
The regexp_like function supports many regex options, including whitespace (\s) and other string matching requirements.
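The arithmetic can be checked locally with a small sketch in Python's sqlite3 module. This is neither Athena nor MySQL: it assumes SQLite, stores the three empty rows as NULL, and uses a hypothetical helper share_of_rows. As in Presto, dividing two integers in SQLite truncates, which is why the count is multiplied by 1.0 before the division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE greetings (greeting_message TEXT)")
messages = [
    "hello world", "What's up", None, "hello Sam", None, "hello Ram",
    "good morning, hello", None, "the above row has null", "Good morning Sir",
]
conn.executemany("INSERT INTO greetings VALUES (?)", [(m,) for m in messages])

def share_of_rows(word):
    """Fraction of all rows (NULL rows included) whose message contains word."""
    sql = """
        SELECT SUM(CASE WHEN greeting_message LIKE ? THEN 1 ELSE 0 END) * 1.0
               / COUNT(*)
        FROM greetings
    """
    return conn.execute(sql, (f"%{word}%",)).fetchone()[0]

hello_share = share_of_rows("hello")
morning_share = share_of_rows("morning")
print(hello_share, morning_share)  # 0.4 0.2
```

COUNT(*) counts every row, so the NULL messages stay in the denominator as the question requires.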
Let's say we have:
table 1
a (int) | b (int)
--------|--------
1 | 4
2 | 4
table 2
c (text) | d (text)
---------|---------
hoi | hi
Query:
SELECT * FROM table1
UNION
SELECT * FROM table2
yields
a | b
------|--------
1 | 4
2 | 4
hoi | hi
At least, that is what the query I just ran on MySQL returned.
I'd expect (1, 4, NULL, NULL). Why doesn't this give an error?
UNION just appends the rows of one query to the rows of the other (and removes duplicate rows, unlike UNION ALL). As long as the two queries return the same number of columns, there's no error. The column names always come from the first query. If the datatypes are different, it finds a common type that they can all be converted to; in your example, it converts the int columns to text. (MySQL is loose about this; some other databases require that you use explicit CAST() calls to get everything to the same type.)
Since your queries each return two columns, the result contains two columns, using the column names from table1.
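The column-count and column-name behaviour is easy to reproduce. The following sketch assumes SQLite through Python's sqlite3 module, not MySQL; SQLite keeps each value's own type instead of converting the integers to text, but the shape of the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER, b INTEGER);
    CREATE TABLE table2 (c TEXT, d TEXT);
    INSERT INTO table1 VALUES (1, 4), (2, 4);
    INSERT INTO table2 VALUES ('hoi', 'hi');
""")

cursor = conn.execute("SELECT * FROM table1 UNION SELECT * FROM table2")
# The result columns are named after the first SELECT.
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)  # ['a', 'b']
print(len(rows))     # 3
```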
This is a bit long for a comment.
I just tested this on MySQL 8.0 and SQLite and it returns:
a b
1 4
2 4
hoi hi
I find this surprising. I would expect the columns to be given an integer type and for there to be either a type conversion error or 0 for the third row. Well, actually the SQLite result isn't that strange, because types are much more fungible in SQLite.
SQL Server and Postgres give errors that I would expect -- type conversion errors that cause the query to fail.
Consider this SQL table
id | name | numbers
------------------------
1 | bob | 1 3 5
2 | joe | 7 2 15
This query returns the whole table as its result:
SELECT * FROM table WHERE numbers LIKE '%5%'
Is there an SQL operator so that it only returns row 1 (only rows that contain the number 5 itself)?
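Use REGEXP with word boundaries (but you should ideally follow Gordon's comment). The [[:<:]] and [[:>:]] markers work in MySQL before 8.0.4; from 8.0.4 on, MySQL uses ICU regular expressions, where a word boundary is written \\b instead.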
where numbers REGEXP '[[:<:]]5[[:>:]]'
It's a pity that you are not using the comma as a separator in your numbers column, because it would be possible to use the FIND_IN_SET function, but you can use it together with REPLACE, like this:
SELECT * FROM table WHERE FIND_IN_SET(5, REPLACE(numbers, ' ', ','));
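For engines that have neither FIND_IN_SET nor word-boundary regexes, a different and more portable technique is to pad both the list and the search value with the separator and use a plain LIKE. The sketch below assumes SQLite through Python's sqlite3 module; the table name people and the helper rows_with_number are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, numbers TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "bob", "1 3 5"), (2, "joe", "7 2 15")],
)

def rows_with_number(number):
    """Names of rows whose space-separated list contains number as a whole token."""
    # Padding with spaces turns '1 3 5' into ' 1 3 5 ', so ' 5 ' can only
    # match a complete token and never the tail of '15'.
    sql = "SELECT name FROM people WHERE ' ' || numbers || ' ' LIKE ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (f"% {number} %",))]

print(rows_with_number(5))   # ['bob']
print(rows_with_number(15))  # ['joe']
```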
I have a user table containing a column (say interests) with comma-separated interest ids as its value.
e.g.
user interests
A 12,13,15
B 10,11,12,15
C 9,13
D 10,12
Now, I have a string with comma-separated values: "13,15".
I want to fetch the users who have interest 13 or 15 from the above table. That means it should return users A, B and C: user A has both interests (13, 15), user B matches on interest 15, and user C matches on interest 13.
What would the SQL be, given that I have a lot of users in my table?
It can be done with regexp, as @1000111 said, but with a more complicated regexp. Look at this, for example:
(^|,)(13|15)(,|$)
This will not match 13 from 135, or 1 from 13, and so on. For example, for the number 13 this will match the following strings:
1,13,2
13,1,2
1,13
13,2
13
But will not match these
1,135,2
131,2
1,113
And this is the query:
SET @search = '13,15';
SELECT *
FROM test
WHERE interests REGEXP CONCAT('(^|,)(', REPLACE(@search, ',', '|'), ')(,|$)')
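The same pattern can be exercised outside MySQL. This sketch assumes SQLite through Python's sqlite3 module; SQLite has no built-in REGEXP, so a user function backed by Python's re module is registered first. The helper users_with_any and the extra rows D and E are illustrative additions.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no built-in REGEXP; "X REGEXP Y" calls a user function regexp(Y, X).
conn.create_function(
    "regexp", 2, lambda pattern, value: re.search(pattern, value) is not None
)

conn.execute("CREATE TABLE test (user TEXT, interests TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [("A", "12,13,15"), ("B", "10,11,12,15"), ("C", "9,13"),
     ("D", "10,12"), ("E", "135,151")],
)

def users_with_any(search):
    """Users having at least one of the comma-separated ids as a whole list item."""
    ids = [re.escape(part) for part in search.split(",")]
    pattern = "(^|,)(" + "|".join(ids) + ")(,|$)"
    sql = "SELECT user FROM test WHERE interests REGEXP ? ORDER BY user"
    return [row[0] for row in conn.execute(sql, (pattern,))]

print(users_with_any("13,15"))  # ['A', 'B', 'C']
```

User E, whose interests are 135 and 151, is not returned, because the pattern requires a comma or a string boundary on both sides of each id.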
If you want results based on loose matching, you can use this query:
Loose matching means interests like 135,151 would also appear when searching for '13,15'.
SET @inputInterest := "13,15";
SELECT
*
FROM userinterests
WHERE interests REGEXP REPLACE(@inputInterest,',','|');
For the given data you will get an output like below:
| ID | user | interests |
|----|------|-------------|
| 1 | A | 12,13,15 |
| 2 | B | 10,11,12,15 |
| 3 | C | 9,13 |
EDIT:
If you want results based on having at least one of the interests exactly, you can use a regex, as @Andrew mentioned in his answer.
I've modified my query based on his insight:
SET @inputInterest := "13,15";
SELECT
*
FROM userinterests
WHERE interests REGEXP CONCAT('(^|,)(', REPLACE(@inputInterest, ',', '|'), ')(,|$)')
Note:
You need to replace the @inputInterest variable with your input string.
Suggestion:
Is storing a delimited list in a database column really that bad?
Is it possible to always satisfy an expected-number-of-elements constraint by filling the remainder of a SQL result set with the previously returned rows, keeping them in order? Using MySQL?
Edit
In a web store, I always want to show n elements. I update the shown elements every w seconds and I want to loop indefinitely.
For example, using table myTable:
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+----+
Something like
SELECT id FROM myTable WHERE id > 3 ORDER BY id ALWAYS_RETURN_THIS_NUMBER_OF_ELEMENTS 5
would actually return (where ALWAYS_RETURN_THIS_NUMBER_OF_ELEMENTS doesn't exist)
+----+
| id |
+----+
| 4 |
| 5 |
| 4 |
| 5 |
| 4 |
+----+
This is a very strange need. Here is a method:
select id
from (SELECT id
FROM myTable
WHERE id > 3
ORDER BY id
LIMIT 5
) t cross join
(select 1 as n union all select 2 union all select 3 union all select 4 union all select 5
) n
order by n.n, id
limit 5;
You may need to extend the list of numbers in n to be sure you have enough rows for the final limit.
No, that's not what LIMIT does. The LIMIT clause is applied as the last step in the statement execution, after aggregation, after the HAVING clause, and after ordering.
I can't fathom a use case that would require the type of functionality you describe.
FOLLOWUP
The query that Gordon Linoff provided will return the specified result, as long as there is at least one row in myTable that satisfies the predicate. Otherwise, it will return zero rows.
Here's the EXPLAIN output for Gordon's query:
id select_type table type key rows Extra
-- ------------ ---------------- ----- ------- ---- -------------------------------
1 PRIMARY <derived2> ALL 5 Using temporary; Using filesort
1 PRIMARY <derived3> ALL 5 Using join buffer
3 DERIVED No tables used
4 UNION No tables used
5 UNION No tables used
6 UNION No tables used
7 UNION No tables used
UNION RESULT <union3,4,5,6,7> ALL
2 DERIVED myTable range PRIMARY 10 Using where; Using index
Here's the EXPLAIN output for the original query:
id select_type table type key rows Extra
-- ----------- ----------------- ----- ------- ---- -------------------------------
1 SIMPLE myTable range PRIMARY 10 Using where; Using index
It just seems like it would be a whole lot more efficient to reprocess the resultset from the original query, if that resultset contains fewer than five (and more than zero) rows. (When that number of rows goes from 5 to 1,000 or 150,000, it would be even stranger.)
The code to get multiple copies of rows from a resultset is quite simple: fetch the rows, and if the end of the result set is reached before you've fetched five (or N) rows, then just reset the row pointer back to the first row, so the next fetch will return the first row again. In PHP using mysqli, for example, you could use:
$result->data_seek(0);
Or, for those still using the deprecated mysql_ interface:
mysql_data_seek($result,0);
But if you're returning only five rows, it's likely you aren't even looping through the result at all, and you already stuffed all the rows into an array. Just loop back through the beginning of the array.
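The "loop back through the array" idea can be sketched in a few lines of application code. The example is in Python rather than PHP, the helper name fill_to_length is invented here, and the list fetched_ids stands in for the rows already fetched from the original query.

```python
from itertools import cycle, islice

def fill_to_length(rows, n):
    """Return exactly n items by repeating rows in order; empty input stays empty."""
    if not rows:
        return []
    return list(islice(cycle(rows), n))

# Stand-in for the result of: SELECT id FROM myTable WHERE id > 3 ORDER BY id
fetched_ids = [4, 5]
shown = fill_to_length(fetched_ids, 5)
print(shown)  # [4, 5, 4, 5, 4]
```

The database returns each row once, and the repetition happens only in memory on the client.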
For MySQL interfaces that don't support a scrollable cursor, we'd just store the whole resultset and process it multiple times. With Perl DBI we could use fetchall_arrayref; with JDBC (which is going to store the whole result set in memory anyway, unless special settings are applied to the connection), we'd store the resultset as an object.
Bottom line, squeezing this requirement (to produce a resultset of exactly five rows) back to the database server, and pulling back duplicate copies of a row and/or storing duplicate copies of a row in memory just seems like the wrong way to satisfy the use case. (If there's rationale for storing duplicate copies of a row in memory, then that can be achieved without pulling duplicate copies of rows back from the database.)
It's just very odd that you say you're using/implementing a "circular buffer", but that you choose not to "circle" back around to the beginning of a resultset which contains fewer than five rows, and instead need to have MySQL return you duplicate rows. Just very, very strange.