How to extract strings occurring after a certain character in MySQL? - mysql

If, I have a string:
'#name#user#user2#laugh#cry'
I would like to print,
name
user
user2
laugh
cry
All the strings are different and have a different number of '#'.
I have tried using Regex but it's not working. What logic has to be applied for this query?

The first thing to say is that storing delimited list of values in text columns is, in many ways, not a good database design. You should basically rework your database structure, or prepare for a potential world of pain.
A quick and dirty solution is to use a numbers table, or an inline suquery, and to cross join it with the table ; REGEXP_SUBSTR() (available in MySQL 8.0), lets you select a given occurence of a particular pattern.
Here is a query that will extract up to 10 values from the column:
SELECT
REGEXP_SUBSTR(t.val, '[^#]+', 1, numbers.n) name
FROM
mytable t
INNER JOIN (
SELECT 1 n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10
) numbers
ON REGEXP_SUBSTR(t.val, '[^#]+', 1, numbers.n) IS NOT NULL
Regexp [^#]+ means: as many consecutive characters as possible other than #.
Ths demo on DB Fiddle, when given input string '#name#user#user2#laugh#cry', returns:
| name |
| ----- |
| name |
| user |
| user2 |
| laugh |
| cry |

Related

contains function for String in presto Athena

I have a table with ORC Serde in Athena. The table contains a string column named greeting_message. It can contain null values as well. I want to find how many rows in the table have a particular text as the pattern.
Let's say my sample data looks like below:
|greeting_message |
|-----------------|
|hello world |
|What's up |
| |
|hello Sam |
| |
|hello Ram |
|good morning, hello |
| |
|the above row has null |
| Good morning Sir |
Now for the above table, if we see there are a total of 10 rows. 7 of them are having not null values and 3 of them just has null/empty value.
I want to know what percentage of rows contain a specific word.
For example, consider the word hello. It is present in 4 rows, so the percentage of such rows is 4/10 which is 40 %.
Another example: the word morning is present in 2 messages. So the percentage of such rows is 2/10 which is 20 %.
Note that I am considering null also in the count of the denominator.
SELECT SUM(greeting_message LIKE '%hello%') / COUNT(*) AS hello_percentage,
SUM(greeting_message LIKE '%morning%') / COUNT(*) AS morning_percentage
FROM tablename
The syntax of prestoDB (Amazon Athena engine) is different than MySQL. The following example is creating a temp table WITH greetings AS and then SELECT from that table:
WITH greetings AS
(SELECT 'hello world' as greeting_message UNION ALL
SELECT 'Whats up' UNION ALL
SELECT '' UNION ALL
SELECT 'hello Sam' UNION ALL
SELECT '' UNION ALL
SELECT 'hello Ram' UNION ALL
SELECT 'good morning, hello' UNION ALL
SELECT '' UNION ALL
SELECT 'the above row has null' UNION ALL
SELECT 'Good morning Sir')
SELECT count_if(regexp_like(greeting_message, '.*hello.*')) / cast(COUNT(1) as real) AS hello_percentage,
count_if(regexp_like(greeting_message, '.*morning.*')) / cast(COUNT(1) as real) AS morning_percentage
FROM greetings
will give the following results
hello_percentage
morning_percentage
0.4
0.2
The regex_like function can support many regex options including spaces (\s) and other string matching requirements.

Mysql-> Group after rand()

I have the following table in Mysql
Name Age Group
abel 7 A
joe 6 A
Rick 7 A
Diana 5 B
Billy 6 B
Pat 5 B
I want to randomize the rows, but they should still remain grouped by the Group column.
For exmaple i want my result to look something like this.
Name Age Group
joe 6 A
abel 7 A
Rick 7 A
Billy 6 B
Pat 5 B
Diana 5 B
What query should i use to get this result? The entire table should be randomised and then grouped by "Group" column.
What you describe in your question as GROUPing is more correctly described as sorting. This is a particular issue when talking about SQL databases where "GROUP" means something quite different and determines the scope of aggregation operations.
Indeed "group" is a reserved word in SQL, so although mysql and some other SQL databases can work around this, it is a poor choice as an attribute name.
SELECT *
FROM yourtable
ORDER BY `group`
Using random values also has a lot of semantic confusion. A truly random number would have a different value every time it is retrieved - which would make any sorting impossible (and databases do a lot of sorting which is normally invisible to the user). As long as the implementation uses a finite time algorithm such as quicksort that shouldn't be a problem - but a bubble sort would never finish, and a merge sort could get very confused.
There are also degrees of randomness. There are different algorithms for generating random numbers. For encryption it's critical than the random numbers be evenly distributed and completely unpredictable - often these will use hardware events (sometimes even dedicated hardware) but I don't expect you would need that. But do you want the ordering to be repeatable across invocations?
SELECT *
FROM yourtable
ORDER BY `group`, RAND()
...will give different results each time.
OTOH
SELECT
FROM yourtable
ORDER BY `group`, MD5(CONCAT(age, name, `group`))
...would give the results always sorted in the same order. While
SELECT
FROM yourtable
ORDER BY `group`, MD5(CONCAT(DATE(), age, name, `group`))
...will give different results on different days.
DROP TABLE my_table;
CREATE TABLE my_table
(name VARCHAR(12) NOT NULL
,age INT NOT NULL
,my_group CHAR(1) NOT NULL
);
INSERT INTO my_table VALUES
('Abel',7,'A'),
('Joe',6,'A'),
('Rick',7,'A'),
('Diana',5,'B'),
('Billy',6,'B'),
('Pat',5,'B');
SELECT * FROM my_table ORDER BY my_group,RAND();
+-------+-----+----------+
| name | age | my_group |
+-------+-----+----------+
| Joe | 6 | A |
| Abel | 7 | A |
| Rick | 7 | A |
| Pat | 5 | B |
| Diana | 5 | B |
| Billy | 6 | B |
+-------+-----+----------+
Do the random first then sort by column group.
select Name, Age, Group
from (
select *
FROM yourtable
order by RAND()
) t
order by Group
Try this:
SELECT * FROM table order by Group,rand()

ASCII sum of all the all the characters in column Mysql

I have a table users but i have shown only 2 columns I want to sum all the characters of name column.
+----+-------+
| id | name |
+----+-------+
| 0 | user |
| 1 | admin |
| 3 | edit |
+----+-------+
for example ascii sum of user will be
sum(user)=117+115+101+114=447
i have tired this
SELECT ASCII(Substr(name, 1,1)) + ASCII(Substr(name, 2, 1)) FROM user
but it only sums 2.
You are going to have to fetch one character at a time to do the sum. One method is to write a function with a while loop. You can do this with a SELECT, if you know the longest string:
SELECT name, SUM(ASCII(SUBSTR(name, n, 1)))
FROM user u JOIN
(SELECT 1 as n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
SELECT 4 UNION ALL SELECT 5 -- sufficient for your examples
) n
ON LENGTH(name) <= n.n
GROUP BY name;
If your goal is to turn the string as something that can be easily compared or a fixed length, then you might consider the encryption functions in MySQL. Adding up the ASCII values is not a particularly good hash function (because strings with the same characters in different orders produce the same value). At the very least, multiplying each ASCII value by the position is a bit better.

Generating "Fake" Records Within A Query

I have a very basic statement, e.g.:
SELECT pet, animal_type, number_of_legs
FROM table
However, where table currently is, I want to insert some fake data, along the lines of:
rufus cat 3
franklin turtle 1
norm dog 5
Is it possible to "generate" these fake records, associating each value with the corresponding field, from within a query so that they are returned as the result of the query?
SELECT pet, animal_type, number_of_legs FROM table
union select 'rufus', 'cat', 3
union select 'franklin', 'turtle', 1
union select 'norm', 'dog', 5
This gives you the content of table plus the 3 records you want, avoiding duplicates, if duplicates are OK, then replace union with union all
edit: per your comment, for tsql, you can do:
select top 110 'franklin', 'turtle', 1
from sysobjects a, sysobjects b -- this cross join gives n^2 records
Be sure to chose a table where n^2 is greater than the needed records or cross join again and again
I'm not entirely sure what you're trying to do, but MySQL is perfectly capable of selecting "mock" data and printing it in a table:
SELECT "Rufus" AS "Name", "Cat" as "Animal", "3" as "Number of Legs"
UNION
SELECT "Franklin", "Turtle", "1"
UNION
SELECT "Norm", "Dog", "5";
Which would result in:
+----------+--------+----------------+
| Name | Animal | Number of Legs |
+----------+--------+----------------+
| Rufus | Cat | 3 |
| Franklin | Turtle | 1 |
| Norm | Dog | 5 |
+----------+--------+----------------+
Doing this query this way prevents actually having to save information in a temporary table, but I'm not sure if it's the correct way of doing things.

SQL NOT IN [list of ids] (performance)

I'm just wondering if the amount of id's in a list will influence query performance.
query example:
SELECT * FROM foos WHERE foos.ID NOT IN (2, 4, 5, 6, 7)
Where (2, 4, 5, 6, 7) is an indefinitely long list.
And how many is too many (in context of order)?
UPDATE: The reason why i'm asking it because i have two db. On of it (read-only) is the source of items and another one contain items that is processed by operator. Every time when operator asking for new item from read-only db I want to exclude item that is already processed.
Yes, the amount of IDs in the list will impact performance. A network packet is only so big, for example, and the database has to parse all that noise and turn it into a series of:
WHERE foo.ID <> 2
AND foo.ID <> 4
AND foo.ID <> 5
AND ...
You should consider other ways to let your query know about this set.
Here is wacky rewrite of that query that might perform a little better
SELECT * FROM foos
LEFT JOIN
(
SELECT 2 id UNION
SELECT 4 UNION
SELECT 5 UNION
SELECT 6 UNION
SELECT 7
) NOT_IDS
USING (id) WHERE NOT_IDS.id IS NULL;
The NOT_IDS subquery does work as shown by the following:
mysql> SELECT * FROM
-> (
-> SELECT 2 id UNION
-> SELECT 4 UNION
-> SELECT 5 UNION
-> SELECT 6 UNION
-> SELECT 7
-> ) NOT_IDS;
+----+
| id |
+----+
| 2 |
| 4 |
| 5 |
| 6 |
| 7 |
+----+
5 rows in set (0.00 sec)
mysql>
Just for fun, and given your update, I'm going to suggest a different strategy:
You could join across tables like so ...
insert into db1.foos (cols)
select cols
from db2.foos src
left join db1.foos dst
on src.pk = dst.pk
where dst.othercolumn is null
I'm not sure how the optimizer will handle this or if it's going to be faster (depends on your indexing strategy, I guess) than what you're doing.
The db's are in the same server? If yes you can make a multi-db query with a left join and take the null ones. (here an example: Querying multiple databases at once ) . Otherwise you can make a stored procedure, pass the id's with a string, and split them inside with a regular expression. I have a similar problem, but within an in-memory db and a postgres db. Luckly my situation is (In...)