Find matching substrings within table row using regex - mysql

I have a table with two columns: one holds an id and the other holds webpage content that contains href links. I would like to write an SQL query using regex that finds all href links within each row and strips all other characters. I am currently stuck with the code below.
SELECT id,web_data FROM web_data_table WHERE web_data REGEXP 'href'
Current output:
+----+----------------------------------------------------------------+
| id | web_data |
+----+----------------------------------------------------------------+
| 1 | random txt,href="link1" |
| 2 | random txt, random txt, href="link2", href="link3", random txt |
+----+----------------------------------------------------------------+
Desired output:
+----+---------------------------+
| id | web_data |
+----+---------------------------+
| 1 | href="link1" |
| 2 | href="link2" href="link3" |
+----+---------------------------+
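As a point of comparison, the extraction step itself can be sketched outside the database. The following is a minimal Python sketch, not a MySQL solution: it assumes the rows have already been fetched by the query above, and the names `rows`, `HREF_PATTERN` and `extract_hrefs` are hypothetical, introduced only for illustration. It also assumes every link is written as `href="..."` with double quotes, as in the sample data.

```python
import re

# Hypothetical rows, as they might come back from:
#   SELECT id, web_data FROM web_data_table WHERE web_data REGEXP 'href'
rows = [
    (1, 'random txt,href="link1"'),
    (2, 'random txt, random txt, href="link2", href="link3", random txt'),
]

# Match href="..." where the value is any run of characters except a double quote.
HREF_PATTERN = re.compile(r'href="[^"]*"')


def extract_hrefs(text):
    """Return every href="..." substring in text, joined by single spaces."""
    return " ".join(HREF_PATTERN.findall(text))


result = [(row_id, extract_hrefs(web_data)) for row_id, web_data in rows]
for row_id, hrefs in result:
    print(row_id, hrefs)
```

With the sample rows this keeps only the `href="..."` parts, which matches the desired output table above.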

Related

Mysql - BIGINT value is out of range in error using substring_index

select substring_index(SUBSTRING_INDEX(title, ' ', title+1), ' ',-1) as word ,
COUNT(*) AS counter
from feed_collections
group by word
ORDER BY counter DESC;
The table has 1785123 rows and I think this is the problem.
This is the error (1690): BIGINT value is out of range in '(feed_collections.title + 1)' and I don't know how to fix it.
The query worked until around 1500000 rows.
The table contains 3 columns: title (text), url (text), date (datetime).
The code finds the most common words in the title column.
Example:
Table
+----------------------------------+-----------------+
| title | url |
+----------------------------------+-----------------+
| the world of ukraine | www.ab |
| count the days until christmas | www.abc.com |
| EU and NATO wants to use bombs | www.abcd.com |
| Ukraine needs help from NATO | www.abce.com |
+----------------------------------+-----------------+
Result
+------+-------+
| word | total |
+------+-------+
| nato | 5 |
| of | 14 |
| and | 11 |
| To | 9 |
| that | 7 |
| ukraine | 2 |
| EU | 1 |
+------+-------+
I adapted the code from here:
How to find most popular word occurrences in MySQL?
This works with small data. There seems to be a problem when it tries to filter large data.
What I'm trying to achieve in the future is to find the most common phrases in the title column, grouped by 1, 2, 3, 4, 5, 6 or 7 words.
There will be a select box to choose how many words to use.
Example:
I select 4 to find the most common four-word phrases.
Titles: 1. Nato is using force, 2. Eu and Nato is using force.
Results with 4 words:
'nato is using force' found 2 times in title.
Any idea how to fix this, or how to write a query for it?
I'm working with Laravel; one solution would be to create a PHP method...
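The application-side method mentioned above could look roughly like the following. This is a minimal Python sketch rather than PHP or SQL, and it assumes the titles have already been loaded from `feed_collections`; the function name `count_phrases` is hypothetical. It slides a window of `n` words over each title and counts every phrase, ignoring case.

```python
from collections import Counter


def count_phrases(titles, n):
    """Count every run of n consecutive words across all titles, ignoring case."""
    counts = Counter()
    for title in titles:
        words = title.lower().split()
        # Slide a window of n words over the title; titles shorter than n add nothing.
        for start in range(len(words) - n + 1):
            counts[" ".join(words[start:start + n])] += 1
    return counts


# The two example titles from the question.
titles = ["Nato is using force", "Eu and Nato is using force"]
phrase_counts = count_phrases(titles, 4)
print(phrase_counts["nato is using force"])
```

Calling `count_phrases(titles, 1)` gives plain word counts, so the same function would cover every value of the select box. For a table of this size the titles would probably need to be streamed in chunks rather than loaded at once.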

Populating a table with the results of a FREETEXT search

I am new to text searching etc and would appreciate any help or guidance with the following problem.
I have set up a Full-Text catalog with specific search words (about 500).
This is an example of my tables
id | Course | cat
===|========|====
1  | ACE    | 2
2  | CCE    | 3
3  | CCFP   | 2
4  | GIAC   | 2
5  | CDFE   | 3
6  | CFCE   | 1
I have a second table that I store descriptions and documents in:
id | Descr  | Document
===|========|=========
1  | Advert | html
2  | Book   | html
3  | Report | html
4  | Report | html
5  | Book   | html
6  | Report | html
The Document field is currently a blob and stores pdf, docx and html files. (I can change it if necessary.)
How would I get the FREETEXT search to search for the words in the catalog which are in the Document field and place the results in a separate table like this:
DocID |DocTerm | Status |CatTermID| Descr
======|========|=============|=========|================
1 | CHFI | Notfound | |
2 | CCFP | Exact Match | 3 |
3 | ACE | Exact Match | 1 |
3 | ACEF | Notfound | |
1 | CDFE | Exact Match | 5 |
3 | ACE | Notfound | 1 |
I would really appreciate your suggestions.
Thanks

Select a record n times which n = times of occurrences

I want to select a record n times in which n is the number of times a string has occurred in a field.
Example:
mytable:
+--------+------------------------------------+
| id | content |
+--------+------------------------------------+
| 1 | This string contains two strings. |
| 2 | This is a string. |
| 3 | This does not contain our keyword. |
+--------+------------------------------------+
Now I want the result of such a hypothetical query to be like the following result:
/* hypothetical: this won't yield the desired result obviously */
SELECT * FROM mytable WHERE content LIKE "%string%";
+--------+------------------------------------+
| id | content |
+--------+------------------------------------+
| 1 | This string contains two strings. |
| 1 | This string contains two strings. |
| 2 | This is a string. |
+--------+------------------------------------+
Is this even possible?
Thanks
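To make the intended result concrete, here is a minimal Python sketch of the row-repetition logic. It is not an SQL answer: it assumes the rows of `mytable` have already been fetched, and `repeat_by_occurrences` is a hypothetical helper. Note that `str.count` is case-sensitive and counts non-overlapping occurrences, which may differ from how `LIKE` compares under a case-insensitive collation.

```python
# The sample rows of mytable from the question.
rows = [
    (1, "This string contains two strings."),
    (2, "This is a string."),
    (3, "This does not contain our keyword."),
]


def repeat_by_occurrences(rows, keyword):
    """Yield each (id, content) row once per non-overlapping occurrence of keyword."""
    for row_id, content in rows:
        for _ in range(content.count(keyword)):
            yield (row_id, content)


repeated = list(repeat_by_occurrences(rows, "string"))
for row in repeated:
    print(row)
```

Row 1 is emitted twice because "string" occurs in both "string" and "strings", row 2 once, and row 3 not at all.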

Optimize SQL-Query that is using REGEXP in a JOIN

I have the following situation:
Table Words:
| ID | WORD |
|----|--------|
| 1 | us |
| 2 | to |
| 3 | belong |
| 4 | are |
| 5 | base |
| 6 | your |
| 7 | all |
| 8 | is |
| 9 | yours |
Table Sentence:
| ID | SENTENCE |
|----|-------------------------------------------|
| 1 | <<7>> <<6>> <<5>> <<4>> <<3>> <<2>> <<1>> |
| 2 | <<7>> <<8>> <<9>> |
I want to replace each <<(\d)>> placeholder with the equivalent word from the Words table.
So the result should be
| ID | SENTENCE |
|----|--------------------------------|
| 1 | all your base are belong to us |
| 2 | all is yours |
What I came up with is the following SQL code:
SELECT id, GROUP_CONCAT(word ORDER BY pos SEPARATOR ' ') AS sentence FROM (
SELECT sentence.id, words.word, LOCATE(words.id, sentence.sentence) AS pos
FROM sentence
LEFT JOIN words
ON (sentence.sentence REGEXP CONCAT('<<',words.id,'>>'))
) AS TEMP
GROUP BY id
I made a sqlfiddle for this:
http://sqlfiddle.com/#!2/634b8/4
The code basically works, but I'd like to ask you pros whether there is a way to do this without a derived table or without a filesort in the execution plan.

You should make a table with one entry per word, so your sentense (sic) can be made by joining on that table. It would look something like this
SentenceId, wordId, location
2, 7, 1
2, 8, 2
2, 9, 3
The way you have it set up, you are not taking advantage of your database: you are basically putting several points of data in one table field.
The location field (it is tempting to call it "order", but as this is an SQL keyword, don't do it, you'll hate yourself) can be used to 'sort' the sentence.
(and you might want to rename sentense to sentence?)
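The one-row-per-word layout described in this answer can be tried out with a small self-contained sketch. The example below uses Python's built-in `sqlite3` module with an in-memory database instead of MySQL, purely so that it runs anywhere; the table name `sentence_words` and its column names are assumptions made for illustration, and the words are joined in Python rather than with `GROUP_CONCAT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT NOT NULL);
    -- Hypothetical junction table: one row per word position in a sentence.
    CREATE TABLE sentence_words (
        sentence_id INTEGER NOT NULL,
        word_id     INTEGER NOT NULL REFERENCES words(id),
        location    INTEGER NOT NULL,
        PRIMARY KEY (sentence_id, location)
    );
""")
conn.executemany("INSERT INTO words VALUES (?, ?)", [
    (1, "us"), (2, "to"), (3, "belong"), (4, "are"), (5, "base"),
    (6, "your"), (7, "all"), (8, "is"), (9, "yours"),
])
conn.executemany("INSERT INTO sentence_words VALUES (?, ?, ?)", [
    (1, 7, 1), (1, 6, 2), (1, 5, 3), (1, 4, 4), (1, 3, 5), (1, 2, 6), (1, 1, 7),
    (2, 7, 1), (2, 8, 2), (2, 9, 3),
])

# A plain equality join replaces the REGEXP join; location gives the word order.
query = """
    SELECT sw.sentence_id, w.word
    FROM sentence_words AS sw
    JOIN words AS w ON w.id = sw.word_id
    ORDER BY sw.sentence_id, sw.location
"""
sentences = {}
for sentence_id, word in conn.execute(query):
    sentences.setdefault(sentence_id, []).append(word)
sentences = {sid: " ".join(words) for sid, words in sentences.items()}
print(sentences)
```

Because the join condition is a simple equality on `word_id`, an index can serve it, which is not the case for a join on a `REGEXP` expression.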

get first 3 alphanumeric characters (only numbers or letters)

I have a table with a field, title, and I need to get the first 3 alphanumeric characters of each title. Some of the values of title have ", ', \t, \n, or whitespace prepended; these should be ignored.
+--------+-----------------------------------------+---------------------+
| id | title | desired output |
+--------+-----------------------------------------+---------------------+
| 1 | "abcd" | abc |
| 2 | 'lostworld | los |
| 3 | \tsonof | son |
| 4 | 12amrt | 12a |
+--------+-----------------------------------------+---------------------+
desired output is the output I am looking for. If anyone can suggest a generic query that handles all cases, that would be great.
Looking for a solution using MySQL only.

Your best bet is to use a regex user-defined function.
In MySQL versions before 8.0, the built-in regexp functions only support matching, not the string replacing you want here. (MySQL 8.0 added REGEXP_REPLACE.)
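The question asks for MySQL only, so the following is not a solution in itself. It is a minimal Python sketch of the logic a regex-replace function would have to apply: delete every character that is not an ASCII letter or digit, then keep the first three characters. The function name `first_three_alnum` is hypothetical, and the third sample title is assumed to start with a real tab character.

```python
import re


def first_three_alnum(title):
    """Drop everything except ASCII letters and digits, then keep the first 3 characters."""
    return re.sub(r"[^A-Za-z0-9]", "", title)[:3]


# Sample titles from the question; "\t" here is an actual tab character.
titles = ['"abcd"', "'lostworld", "\tsonof", "12amrt"]
prefixes = [first_three_alnum(title) for title in titles]
print(prefixes)
```

Note that this removes non-alphanumeric characters anywhere in the title, not only at the start, which matches the stated goal of taking the first 3 alphanumeric characters.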
