How to select all distinct filename extensions from table of filenames? - mysql

I have a table of ~20k filenames. How do I select a list of the distinct extensions? A filename extension can be considered the case insensitive string after the last .

You can use substring_index:
SELECT DISTINCT substring_index(column_containing_file_names,'.',-1) FROM table
A count of -1 returns everything to the right of the last '.', because a negative count makes MySQL count delimiters from the right side.

MySQL, like many other databases, lets you use regular expression syntax when selecting data. For example:
SELECT something FROM table WHERE column REGEXP 'regexp'
See http://www.tech-recipes.com/rx/484/use-regular-expressions-in-mysql-select-statements/
So you can write a pattern to select what you want.

The answer given by @bnvdarklord is right, but its result set would also include file names that have no extension. If you want only actual extensions, use the query below.
SELECT DISTINCT substring_index(column_containing_file_names,'.',-1) FROM table where column_containing_file_names like '%.%';
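To see why the extra LIKE '%.%' filter matters, here is a minimal Python sketch of the same logic. The helper names and sample file names are hypothetical, and lowercasing is an assumption added to meet the question's case-insensitive requirement. It only mimics what the query does and is not MySQL code.

```python
def last_segment(name: str) -> str:
    """Mimic substring_index(name, '.', -1): text after the last '.'.

    Like the SQL function, it returns the whole string when there is no '.'.
    """
    return name.rsplit(".", 1)[-1]


def distinct_extensions(filenames):
    """Collect distinct, lowercased extensions, skipping names without a dot."""
    extensions = set()
    for name in filenames:
        if "." not in name:  # plays the role of: LIKE '%.%'
            continue
        extensions.add(last_segment(name).lower())
    return sorted(extensions)


# Hypothetical sample data, not taken from the original table.
names = ["report.PDF", "notes.txt", "archive.tar.gz", "README", "photo.pdf"]

print(last_segment("README"))      # the whole name comes back: no extension
print(distinct_extensions(names))
```

Without the dot check, "README" would show up in the result as if it were an extension, which is the problem the second query avoids.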

Related

Excluding records using regex

I'm trying to exclude any email_id that has a name and ends with either #abcd.in or #abcd.live, and include only the emails that have mobile numbers, but I'm not sure whether the regex I'm using is correct. Can you help?
The statement I'm using to filter is below:
(NOT(lower(`table`.`user_email`) like '[a-z].*#Abcd.in$'|'[a-z].*#Abcd.live$')
If you want to do filtering based on a regular expression, you should be using REGEXP or REGEXP_LIKE (both are synonyms). Assuming you just want to exclude the two domains mentioned, you could use:
SELECT *
FROM yourTable
WHERE email NOT REGEXP '[a-z]+#Abcd\\.(in|live)$';  -- backslash doubled so the regex sees a literal dot
Assuming you wanted to enhance the above by also whitelisting certain email patterns, you could make another call to REGEXP.
I'd probably do something like this:
SELECT *
FROM mytable
WHERE SUBSTRING_INDEX(user_email,'#',-1) IN ('Abcd.live','Abcd.in')
AND SUBSTRING_INDEX(user_email,'#',1) REGEXP '[0-9]'
This uses SUBSTRING_INDEX() with # as the delimiter to separate the email name from the domain. The first condition filters the domain with IN, so any domain other than the ones listed is omitted. The second condition uses REGEXP to check whether the email name contains any digits.
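The two conditions can be sketched in Python as follows. This is only an illustration of the split-then-check idea. The function name, the sample addresses, and the lowercasing of the domain are assumptions, and the '#' delimiter is kept exactly as it appears in the query.

```python
import re

# Domains from the IN (...) list, lowercased (an assumption, to compare
# without regard to case).
ALLOWED_DOMAINS = {"abcd.live", "abcd.in"}


def keep_email(email: str, delim: str = "#") -> bool:
    """Return True when the domain is allowed and the name part has a digit."""
    local, sep, domain = email.rpartition(delim)
    if not sep:  # no delimiter at all, so there is no domain to check
        return False
    domain_ok = domain.lower() in ALLOWED_DOMAINS
    has_digit = re.search(r"[0-9]", local) is not None
    return domain_ok and has_digit


# Hypothetical sample addresses.
emails = ["9876543210#Abcd.in", "john#Abcd.live", "5551234#other.com", "nodelimiter"]
print([e for e in emails if keep_email(e)])
```

Splitting first keeps each test simple: a set lookup for the domain and a one-character-class regex for the name.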

Select rows in SQL partially matching an input

I would like to select the rows in my table (I'm using Google Sheets for that purpose) whose content is included in the input string.
For example, rows included in table called Jobportal, column Test:
How to find work
Work permit
Jobs
Temporary jobs
I want to select all the rows that contain any word of my input, so if I write "i'm looking for a job", I need to select rows Jobs and Temporary jobs. If I write "where is my work?", I need to select How to find work and Work permit.
I've tried this query, but it's returning wrong/unexpected results.
select * from Jobportal where 'im looking for a job' LIKE CONCAT('%',Test,'%');
You can use regular expressions. Assuming that what the user types does not have special characters:
where test regexp replace('im looking for a job', ' ', '|')
That said, for performance you might want to consider using full text search capabilities.
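The replace-spaces-with-'|' idea can be tried outside the database with a short Python sketch. The function names are hypothetical, and escaping each word with re.escape is an addition that drops the "no special characters" assumption. The rows are the four from the question.

```python
import re

ROWS = ["How to find work", "Work permit", "Jobs", "Temporary jobs"]


def build_pattern(user_input: str):
    """Turn the input words into one alternation, e.g. 'im|looking|for|a|job'."""
    words = re.findall(r"\w+", user_input)
    if not words:
        return None
    return re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)


def matching_rows(user_input: str, rows=ROWS):
    """Return the rows that contain at least one word of the input."""
    pattern = build_pattern(user_input)
    if pattern is None:
        return []
    return [row for row in rows if pattern.search(row)]


print(matching_rows("im looking for a job"))
print(matching_rows("where is my work?"))
```

Very short words match almost anything: typing "i'm" instead of "im" splits into "i" and "m", and then every row in this sample matches. Filtering out short or common words before building the pattern is one way to limit that.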

passing a variable to SQL statement in KNIME

Using KNIME, I would like to analyze data in a specific subset of columns in my database
but without using limiting SQL queries such as
Select *
From table
Where name like 'PAIN%'
Is there a way to do this in KNIME?
Try finding the specific values within the column of your choice by using:
Select distinct(column_name) from table;
You can then pick a value from the result to filter your data:
Select * from table where column_name like 'result_one';
This assumes column_name has a character data type.
To filter columns, use the "Column Filter" node. You can filter the columns specifically, by RegEx on the column name, or by column type (int, double, etc.). To filter rows based on content, use the "Row Filter" node, select the column to test, and choose "filter based on collection elements" using pattern matching. This can also use RegEx. For multiple columns, use multiple nodes.
KNIME did not support LIKE at the time, so I used the MySQL LOCATE or FIND_IN_SET function:
SELECT id FROM address where LOCATE($street_Arr[0]$,street) > 0
SELECT id FROM address where FIND_IN_SET($street_Arr[0]$,street) > 0
However, in the same situation you might get much faster results with KNIME joins.

MySQL: Select regex group [duplicate]

How do I reference a capture group in a regex in MySQL?
I tried:
REGEXP '^(.)\1$'
but it does not work.
How to do this?
(Old question, but top search result)
For MySQL 8:
SELECT REGEXP_REPLACE('stackoverflow','(.{5})(.*)','$2$1');
-- "overflowstack"
You can create capture groups with (), and you can refer to them using $1, $2, etc.
For MariaDB, capturing is done in REGEXP_REPLACE with \\1, \\2, etc. respectively.
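For comparison, the same group swap can be tried in Python's re module, which writes replacement references as \1 and \2 rather than $1 and $2. This is a small sketch to illustrate capture groups in general, not MySQL or MariaDB syntax.

```python
import re

# Group 1 is the first five characters, group 2 is the rest;
# the replacement emits them in swapped order.
swapped = re.sub(r"(.{5})(.*)", r"\2\1", "stackoverflow")
print(swapped)

# A backreference inside the pattern itself: one character followed by
# the same character again, as in the question's '^(.)\1$'.
print(bool(re.fullmatch(r"(.)\1", "aa")))
print(bool(re.fullmatch(r"(.)\1", "ab")))
```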
You can't; before MySQL 8 there is no way to reference regex capturing groups in MySQL.
You can solve this problem by nesting the function calls in your query. Say you have this string in your column:
'100 SOME ST,THE VILLAGES,FL 32163,USA'
and you want to capture the city name. A Capture Group like this would work if MySQL supported it (but it doesn't):
'^[0-9A-Z\s]+,\s*([a-zA-Z\s]*)'
You CAN nest function calls to strip off the part you don't want, and then grab the part you DO want like this:
SELECT REGEXP_SUBSTR(REGEXP_REPLACE(column_name, '^[0-9\\sA-Z]+,', ''), '^[0-9\\sA-Z]+') FROM table_name;
THE VILLAGES
...
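The strip-then-grab nesting can be checked step by step in Python, next to the capture-group version it stands in for. The variable names are hypothetical, and the address is the one from the answer.

```python
import re

address = "100 SOME ST,THE VILLAGES,FL 32163,USA"

# Step 1 (inner call): strip the leading street part up to the first comma.
without_street = re.sub(r"^[0-9\sA-Z]+,", "", address)

# Step 2 (outer call): grab the leading run of letters, digits and spaces.
city_nested = re.match(r"^[0-9\sA-Z]+", without_street).group(0)

# The same result with a capture group, for engines that support one.
city_group = re.match(r"^[0-9A-Z\s]+,\s*([a-zA-Z\s]*)", address).group(1)

print(without_street)
print(city_nested)
print(city_group)
```

Both routes give the same city here. The nested form only works because the unwanted prefix can be described by its own pattern.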

MySQL regex only returns a single row

I have been writing a REGEX in MySQL to identify those domains that have a .com TLD. The URLs are usually of the form
http://example.com/
The regex I came up with looks like this:
REGEXP '[[.colon.]][[.slash.]][[.slash.]]([:alnum:]+)[[...]]com[[./.]]'
The reason we match the :// is so that we don't pick up URLs such as http://example.com/error.com/wrong.com
Therefore my query is
SELECT DISTINCT name
FROM table
WHERE name REGEXP '[[.colon.]][[.slash.]][[.slash.]]([:alnum:]+)[[...]]com[[./.]]'"
However, this is returning only a single row when it should really be returning many more (upwards of a thousand). What mistake am I making with the query?
Not sure if that's the problem, but it should be [[:alnum:]], not [:alnum:]
Your current query only matches names that end with .com/ rather than .com followed by anything that starts with a slash. Try the following:
SELECT DISTINCT name
FROM table
WHERE name REGEXP '[[.colon.]][[.slash.]][[.slash.]]([[:alnum:]]+)[[...]]com([[./.]].*)?'
It might be clearer to split the URL rather than regexing it
SELECT DISTINCT name FROM table
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(name,'/',3),'.',-1)='com';
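The nested SUBSTRING_INDEX calls can be traced with a small Python emulation. The substring_index helper below is a hypothetical re-implementation of the MySQL function written for this sketch, and the sample URLs are made up apart from the two mentioned in the question.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Emulate MySQL SUBSTRING_INDEX for a non-empty delimiter.

    Positive count: everything before the count-th delimiter from the left.
    Negative count: everything after the count-th delimiter from the right.
    """
    if count == 0 or not delim:
        return ""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def host_tld(url: str) -> str:
    """Keep 'scheme://host' (up to the third '/'), then take the last label."""
    return substring_index(substring_index(url, "/", 3), ".", -1)


urls = [
    "http://example.com/",
    "http://example.com/error.com/wrong.com",
    "http://example.org/error.com",
    "http://example.net/",
]
print([u for u in urls if host_tld(u) == "com"])
```

Cutting at the third slash first means a ".com" that appears later in the path, as in the example.org URL, cannot produce a false match.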