I have a mysql database table with rows like this
id | values
1 | 5,6,8,1,9
2 | 12,22,5,20
3 | 18,55,3,2
I need help with a SELECT statement
that selects rows containing the number 1 OR 2
without selecting rows that contain numbers like 12 or 22.
SELECT * FROM test WHERE values REGEXP '/(^[,])?(1)(^[,])?/';
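This is the regex you should use: (^|,)[12]($|,). MySQL takes the pattern without surrounding / delimiters, and values is a reserved word, so the column name needs backticks:
SELECT * FROM test WHERE `values` REGEXP '(^|,)[12]($|,)';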
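To see which sample rows the pattern (^|,)[12]($|,) accepts, here is a minimal sketch that applies it outside the database with Python's re module. The dictionary of rows is copied from the question. Python's regex engine stands in for MySQL's here, which is an assumption, although this pattern uses only syntax both engines share.

```python
import re

# Sample rows from the question: id -> comma-separated values.
rows = {1: "5,6,8,1,9", 2: "12,22,5,20", 3: "18,55,3,2"}

# (^|,) requires the start of the string or a comma before the digit,
# ($|,) requires the end of the string or a comma after it,
# so the 1 in 12 or the 2 in 22 cannot match on its own.
pattern = re.compile(r"(^|,)[12]($|,)")

matching_ids = [row_id for row_id, values in rows.items() if pattern.search(values)]
print(matching_ids)
```

Rows 1 and 3 are selected because they contain a standalone 1 and 2 respectively. Row 2 is skipped.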
Related
I have a table with ORC Serde in Athena. The table contains a string column named greeting_message. It can contain null values as well. I want to find how many rows in the table have a particular text as the pattern.
Let's say my sample data looks like below:
|greeting_message |
|-----------------|
|hello world |
|What's up |
| |
|hello Sam |
| |
|hello Ram |
|good morning, hello |
| |
|the above row has null |
| Good morning Sir |
In the table above there are 10 rows in total: 7 of them have non-null values and 3 of them are null/empty.
I want to know what percentage of rows contain a specific word.
For example, consider the word hello. It is present in 4 rows, so the percentage of such rows is 4/10 which is 40 %.
Another example: the word morning is present in 2 messages. So the percentage of such rows is 2/10 which is 20 %.
Note that I am considering null also in the count of the denominator.
SELECT SUM(greeting_message LIKE '%hello%') / COUNT(*) AS hello_percentage,
SUM(greeting_message LIKE '%morning%') / COUNT(*) AS morning_percentage
FROM tablename
The syntax of PrestoDB (the Amazon Athena engine) differs from MySQL. The following example defines a common table expression with WITH greetings AS and then SELECTs from it:
WITH greetings AS
(SELECT 'hello world' as greeting_message UNION ALL
SELECT 'Whats up' UNION ALL
SELECT '' UNION ALL
SELECT 'hello Sam' UNION ALL
SELECT '' UNION ALL
SELECT 'hello Ram' UNION ALL
SELECT 'good morning, hello' UNION ALL
SELECT '' UNION ALL
SELECT 'the above row has null' UNION ALL
SELECT 'Good morning Sir')
SELECT count_if(regexp_like(greeting_message, '.*hello.*')) / cast(COUNT(1) as real) AS hello_percentage,
count_if(regexp_like(greeting_message, '.*morning.*')) / cast(COUNT(1) as real) AS morning_percentage
FROM greetings
will give the following results:
| hello_percentage | morning_percentage |
| ---------------- | ------------------ |
| 0.4 | 0.2 |
The regexp_like function supports many regex features, including whitespace (\s), to cover other string matching requirements.
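The arithmetic behind these percentages can be checked with a small sketch: count the rows matching the pattern and divide by the total row count, with null rows included in the denominator. The list below reuses the sample data from the question and uses None for the null rows. The helper name share_matching is hypothetical, and Python's re module only approximates Presto's regexp_like.

```python
import re

# Sample column from the question; None stands for a NULL greeting_message.
messages = [
    "hello world", "What's up", None, "hello Sam", None,
    "hello Ram", "good morning, hello", None,
    "the above row has null", "Good morning Sir",
]

def share_matching(rows, pattern):
    """Fraction of all rows (NULLs included) whose text contains the pattern."""
    regex = re.compile(pattern)
    hits = sum(1 for text in rows if text is not None and regex.search(text))
    return hits / len(rows)

hello_percentage = share_matching(messages, "hello")
morning_percentage = share_matching(messages, "morning")
print(hello_percentage, morning_percentage)
```

Null rows never count as a match, but they still count toward the total, which is the behavior the question asks for.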
If I have a string:
'#name#user#user2#laugh#cry'
I would like to print:
name
user
user2
laugh
cry
All the strings are different and have a different number of '#'.
I have tried using Regex but it's not working. What logic has to be applied for this query?
The first thing to say is that storing a delimited list of values in a text column is, in many ways, not a good database design. You should basically rework your database structure, or prepare for a potential world of pain.
A quick and dirty solution is to use a numbers table, or an inline subquery, and to join it with the table; REGEXP_SUBSTR() (available in MySQL 8.0) lets you select a given occurrence of a particular pattern.
Here is a query that will extract up to 10 values from the column:
SELECT
REGEXP_SUBSTR(t.val, '[^#]+', 1, numbers.n) name
FROM
mytable t
INNER JOIN (
SELECT 1 n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10
) numbers
ON REGEXP_SUBSTR(t.val, '[^#]+', 1, numbers.n) IS NOT NULL
Regexp [^#]+ means: as many consecutive characters as possible other than #.
This demo on DB Fiddle, when given the input string '#name#user#user2#laugh#cry', returns:
| name |
| ----- |
| name |
| user |
| user2 |
| laugh |
| cry |
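The idea of asking for the n-th occurrence of [^#]+ for n from 1 to 10 and discarding the misses can be sketched outside the database as follows. The helper nth_token is hypothetical. It only imitates the occurrence argument of REGEXP_SUBSTR() and is not MySQL's implementation.

```python
import re

def nth_token(text, n, pattern=r"[^#]+"):
    """Return the n-th (1-based) match of pattern in text, or None if absent."""
    matches = re.findall(pattern, text)
    return matches[n - 1] if 1 <= n <= len(matches) else None

val = "#name#user#user2#laugh#cry"

# Mirror the join against the numbers 1..10: keep only occurrences that exist.
tokens = [t for t in (nth_token(val, n) for n in range(1, 11)) if t is not None]
print(tokens)
```

As in the SQL version, a string with more than 10 tokens would be truncated unless the range of numbers is extended.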
I have a table with columns. I'm storing numbers in a VARCHAR(245) column. The numbers change all the time. For example, the number can be 42 or 5 or whatever. It can also have multiple numbers, like 42,5,20 and so on.
I want to select if one of the numbers exists and not all. For example, if the numbers are 42,5,20, I want to select if the number 42 exists in the column, or select if the number 4 or the number 5 appear.
I currently have a query that will select only if there's only one number:
SELECT COUNT(*) FROM TABLE WHERE COLUMN1='42' AND COLUMN2='1';
When there are multiple numbers, the query can't find it.
You should check with LIKE and cover every position the value can take: an exact match, the value in the middle, the value at the end, and the value at the start.
SELECT COUNT(*) FROM TABLE WHERE (COLUMN1='42' OR COLUMN1 LIKE'%,42,%' OR COLUMN1 LIKE'%,42' OR COLUMN1 LIKE '42,%') AND COLUMN2='1';
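To check that the four conditions cover every position without producing partial matches, here is a small sketch that restates them as Python string tests. The function name contains_number is hypothetical and the test strings are made up for illustration.

```python
def contains_number(column_value, number):
    """Python restatement of the four checks in the query above."""
    target = str(number)
    return (
        column_value == target                      # COLUMN1 = '42'
        or f",{target}," in column_value            # COLUMN1 LIKE '%,42,%'
        or column_value.endswith(f",{target}")      # COLUMN1 LIKE '%,42'
        or column_value.startswith(f"{target},")    # COLUMN1 LIKE '42,%'
    )

print(contains_number("42,5,20", 42))  # value at the start
print(contains_number("42,5,20", 4))   # only a partial match of 42
```

Because every check includes the neighbouring comma or the string boundary, 4 is not found inside 42.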
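You can use a regular expression to solve this. If you check for word boundaries, you avoid partial matches. (The [[:<:]] and [[:>:]] markers work in MySQL versions before 8.0.4; later versions use the ICU regex library, where a word boundary is written \\b instead.)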
e.g.
mysql> select '42,5,20' REGEXP '[[:<:]]42[[:>:]]' AS 'Found';
+-------+
| Found |
+-------+
| 1 |
+-------+
But it does not find a partial match:
mysql> select '42,5,20' REGEXP '[[:<:]]2[[:>:]]' AS 'Found';
+-------+
| Found |
+-------+
| 0 |
+-------+
This would make your query
SELECT COUNT(*)
FROM TABLE
WHERE COLUMN1 REGEXP '[[:<:]]42[[:>:]]'
AND COLUMN2 = '1';
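The same word-boundary check can be tried outside MySQL. The sketch below uses Python's \b, which plays the role of both [[:<:]] and [[:>:]]. The helper name has_number is hypothetical.

```python
import re

def has_number(csv_text, number):
    """True if number appears as a whole token, not as part of a longer one."""
    return re.search(rf"\b{number}\b", csv_text) is not None

print(has_number("42,5,20", 42))  # whole number present
print(has_number("42,5,20", 2))   # 2 only occurs inside 42 and 20
```

A comma is a non-word character, so each number in the list is bounded on both sides and digits inside a longer number are never matched on their own.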
Consider this SQL table
id | name | numbers
------------------------
1 | bob | 1 3 5
2 | joe | 7 2 15
This query returns the whole table as its result:
SELECT * FROM table WHERE numbers LIKE '%5%'
Is there an SQL operator so that it only returns row 1 (only rows with the number 5)?
Use regexp with word boundaries. (But you should ideally follow Gordon's comment)
where numbers REGEXP '[[:<:]]5[[:>:]]'
It's a pity that you are not using the comma as a separator in your numbers column, because it would be possible to use the FIND_IN_SET function, but you can use it together with REPLACE, like this:
SELECT * FROM table WHERE FIND_IN_SET(5, REPLACE(numbers, ' ', ','));
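As a rough illustration of why the REPLACE plus FIND_IN_SET combination avoids the partial match on 15, here is a sketch that imitates both steps in Python. The function find_in_set is a hypothetical stand-in that follows the documented behavior of returning a 1-based position or 0. The rows are the two from the question.

```python
def find_in_set(needle, set_text):
    """Imitation of FIND_IN_SET: 1-based position in a comma list, or 0."""
    items = set_text.split(",")
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0

table = [(1, "bob", "1 3 5"), (2, "joe", "7 2 15")]

# REPLACE(numbers, ' ', ',') turns the space-separated list into a comma list.
selected_ids = [
    row_id for row_id, _name, numbers in table
    if find_in_set(5, numbers.replace(" ", ",")) > 0
]
print(selected_ids)
```

Only bob's row is selected, because 15 is compared as a whole item and never equals 5.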
I want to find the rows in a table whose value is contained in a given string.
For example, I have these rows in a column named 'atext' in a table named 'testing':
test
a
cool
another
Now I want to select the rows that hold a word from the string 'this is a test', using SQL:
select * from testing where instr(atext, 'this is a test') >0;
But this does not select any rows.
Reverse the arguments to INSTR.
WHERE INSTR('this is a test', atext)
with full text index -
select * from anti_spam where match (atext) against ("this is a test" in boolean mode);
This is a 'reversed' like:
select * from testing where 'this is a test' LIKE CONCAT('%',atext,'%');
It can be slow on tables having a lot of records.
This returns the rows, where the value of the atext column can be found in the given string.
(for example matches when atext = 'is a t' because it can be found in the given string)
Or you can write a regex.
select * from testing where atext REGEXP '^(this|is|a|test)$';
This matches all rows that contain exactly one of the specified words.
In your scripting or programming language, you only need to replace the spaces with |, wrap the result in ^( and )$, and use REGEXP rather than an equality comparison.
("this is a test" -> ^(this|is|a|test)$ )
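Building the pattern from the phrase can be sketched like this. The function words_to_pattern is a hypothetical helper. It also escapes each word, an extra precaution not mentioned above, so that characters such as . or + in a word are not treated as regex syntax. The sample rows are the ones from the question.

```python
import re

def words_to_pattern(phrase):
    """Turn 'this is a test' into the anchored alternation ^(this|is|a|test)$."""
    alternatives = "|".join(re.escape(word) for word in phrase.split())
    return f"^({alternatives})$"

pattern = words_to_pattern("this is a test")
atext_rows = ["test", "a", "cool", "another"]

# With the group, ^ and $ apply to every alternative.
matched = [row for row in atext_rows if re.search(pattern, row)]

# Without the group, ^ binds only to "this" and $ only to "test",
# so a bare "a" anywhere in the value is enough to match.
ungrouped = [row for row in atext_rows if re.search("^this|is|a|test$", row)]

print(pattern, matched, ungrouped)
```

The parentheses matter: the ungrouped form also accepts another, because that value contains the letter a.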
If you have a lot of records in the table, these queries can be slow, because the SQL engine does not use indexes for REGEXP queries.
So if you have a lot of rows in your table and no more than 4 000 000 words, I recommend building an indexing table. Example:
originalTable:
tid | atext (text)
1 | this is
2 | a word
3 | a this
4 | this word
5 | a is
....
indexTable:
wid | word (varchar)
1 | this
2 | is
3 | a
4 | word
switchTable:
tid | wid
1 | 1
1 | 2
2 | 3
2 | 4
3 | 1
3 | 3
...
You should put indexes on the tid, wid and word fields.
Then the query is:
SELECT o.*
FROM originalTable as o
JOIN switchTable as s ON o.tid = s.tid
JOIN indexTable as i on i.wid=s.wid
WHERE i.word = 'this' or i.word='is' or i.word='a' or i.word='test'
This query can be much faster if your originalTable has a lot of records, because here the SQL engine can use indexed searches. But there is a bit more work on insert: when you insert a row into the original table, you must also insert into the other two tables.
The relative runtime of the 3 queries depends on your database table size, and on whether you want to optimize for insertions or selections (the ratio of insert/update to select queries).
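The three-table layout and the extra work on insert can be tried end to end with an in-memory SQLite database. This is only a sketch under assumptions: SQLite stands in for MySQL, the helper insert_text is hypothetical, words are split on whitespace, the sixth row ('cool word') is made up to show a non-match, and DISTINCT is added so that a row matching several words is returned once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE originalTable (tid INTEGER PRIMARY KEY, atext TEXT);
    CREATE TABLE indexTable (wid INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE switchTable (tid INTEGER, wid INTEGER, PRIMARY KEY (tid, wid));
""")

def insert_text(tid, atext):
    """Insert one text row and keep the word index and link table in sync."""
    conn.execute("INSERT INTO originalTable VALUES (?, ?)", (tid, atext))
    for word in set(atext.split()):
        # Add the word only if it is new, then look up its id.
        conn.execute("INSERT OR IGNORE INTO indexTable (word) VALUES (?)", (word,))
        (wid,) = conn.execute(
            "SELECT wid FROM indexTable WHERE word = ?", (word,)
        ).fetchone()
        conn.execute("INSERT INTO switchTable VALUES (?, ?)", (tid, wid))

# Rows 1-5 come from the example above; row 6 is a hypothetical non-match.
for tid, atext in enumerate(
    ["this is", "a word", "a this", "this word", "a is", "cool word"], start=1
):
    insert_text(tid, atext)

rows = conn.execute("""
    SELECT DISTINCT o.tid, o.atext
    FROM originalTable AS o
    JOIN switchTable AS s ON o.tid = s.tid
    JOIN indexTable AS i ON i.wid = s.wid
    WHERE i.word IN ('this', 'is', 'a', 'test')
    ORDER BY o.tid
""").fetchall()
print(rows)
```

The primary keys and the UNIQUE constraint give SQLite the indexes on tid, wid and word, so the lookup on i.word does not have to scan the text column.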