Comparing two url-slugs to find count of same words - mysql

I want to find similar posts on my website based on the URL slug.
Say I have the following five slugs:
i-am-a-slug /*the slug i want to compare*/
i-am-another-slug /* 3 same words */
i-am-an-ant /* 2 same words */
the-slug-life /* 1 same word */
foo-bar /* 0 same words */
At the moment I am using the following code to find out whether there are any similar words in the compared slug:
SELECT *
FROM News
WHERE News.slug != "i-am-a-slug"
ORDER BY CASE
WHEN News.slug REGEXP "i|am|a|slug" THEN 1
ELSE 2
END
It doesn't work very well, though: a short word like the a in the example matches nearly every slug in my database, because REGEXP matches substrings rather than whole words. In the example, even the slug foo-bar would be returned (the a matches inside bar).
I can't figure out how to select a variable same-words-count that counts the words each tested slug shares with the compared slug (see the comments in the first code block for the results I would like to get), so that I could write:
SELECT *
FROM News
WHERE News.slug != "i-am-a-slug"
ORDER BY CASE
WHEN same-words-count > 2 THEN 1
WHEN same-words-count = 2 THEN 2
WHEN same-words-count = 1 THEN 3
ELSE 4
END
Or is there an even better way to do this?
Thank you very much in advance. Sorry, my MySQL is a bit rusty lately.

What you are looking for is called an inverted index: http://en.wikipedia.org/wiki/Inverted_index
Depending on the amount of data and what you are actually going to do with the result (do you need to keep it up to date, do you need to show it on the site, etc.), you might want to solve the problem using a programming language of your choice or use a full-blown full-text search solution. Plain SQL just isn't the right tool for this.
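As a minimal sketch of the "programming language of your choice" route, here is how the shared-word count could be computed in Python with a small in-memory inverted index. The function names build_inverted_index and similar_slugs are made up for illustration, and the slugs are the five from the question; a real site would load them from the News table.

```python
from collections import Counter, defaultdict


def build_inverted_index(slugs):
    """Map each word to the set of slugs that contain it."""
    index = defaultdict(set)
    for slug in slugs:
        for word in set(slug.split("-")):
            index[word].add(slug)
    return index


def similar_slugs(target, index):
    """Return (slug, shared_word_count) pairs, highest count first."""
    counts = Counter()
    for word in set(target.split("-")):
        # .get avoids creating empty entries for unknown words
        for slug in index.get(word, ()):
            if slug != target:
                counts[slug] += 1
    return counts.most_common()


slugs = [
    "i-am-a-slug",
    "i-am-another-slug",
    "i-am-an-ant",
    "the-slug-life",
    "foo-bar",
]
index = build_inverted_index(slugs)
ranking = similar_slugs("i-am-a-slug", index)
print(ranking)
```

With the example slugs this ranks i-am-another-slug first (3 shared words), then i-am-an-ant (2), then the-slug-life (1). foo-bar shares no whole word with the target, so it is left out, which also avoids the substring problem of the REGEXP approach.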

What does the "ved" parameter in a google search refer to?

I've spent like two hours or more trying to figure out what a "ved" parameter on a Google search means. A curious person I am.
My findings so far:
The ved value changes with:
1 - every different search (different keywords)
2 - every different result block (the URL blocks/boxes on the Google results page, though the values are quite similar, as I'll show below)
3 - perhaps every different geolocation
Consider these tests or lookups:
1-
Diff keywords, but first block/position in list:
&ved=2ahUKEwidsaSd4M_1AhVlk_0HHUxOCQYQFnoECAsQAg
&ved=2ahUKEwj2pZyN5s_1AhVRmuYKHZ5IB5EQFnoECAcQAg
I thought the "ved" value refers to the block/position of a url in the result list, but no.
2-
Three different URLs: the first and second from the 1st and 2nd blocks of the first page, then the third from a "much farther on the list" block:
ved=2ahUKEwjq1-Wb1s_1AhW6SWwGHZwpBMwQFnoECD8QAQ
ved=2ahUKEwjq1-Wb1s_1AhW6SWwGHZwpBMwQFnoECCAQAQ
ved=2ahUKEwiZ2NDe1s_1AhVaTmwGHThIA5U4PBAWegQIGRAB
3-
The same website URL, from different countries (not considering blocks or position in list):
&ved=2ahUKEwiopK2X08_1AhUgxzgGHQEbDkcQFnoECBIQAQ
&ved=2ahUKEwjpueqC1M_1AhWJq3IEHYEDAfc4FBAWegQIDBAB
&ved=2ahUKEwih09Wz08_1AhUY7WEKHQYdBB8QFnoECEIQAQ
Very similar they are.
I'd really love to know what they mean. Any ideas are appreciated too!
I found an interesting article explaining the subject : https://moz.com/blog/inside-googles-ved-parameter
TL;DR:
A ved code contains up to five separate parameters, each of which tells you something about the link that was clicked on:
1st (parameter 1: Link index) gives you an idea of where the link was on the page.
2nd (parameter 2: Link type) is a number that corresponds to the 'type' of the link that was clicked.
3rd (parameter 7: Start result position) is the cumulative result position of the first result on the page.
4th (parameter 6: Result position) indicates the position of your page in the search results.
5th (parameter 5: Sub-result position) is like parameter 6, except that it tells you the position in a list of sub-results, such as breadcrumbs or one-page sitelinks.
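For the newer-style values like the ones in the question, a rough way to look at these numbered parameters is to decode the value yourself. The sketch below assumes (this is not confirmed by Google) that everything after the leading version digit is URL-safe base64 wrapping a protobuf-style message, and that the field numbers line up with the parameter numbers above. decode_ved and read_varint are hypothetical helper names.

```python
import base64


def read_varint(data, pos):
    """Read one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_ved(ved):
    """Return {field_number: value} for the top-level fields of a ved value.

    Assumption: the first character is a version digit and the rest is
    URL-safe base64 of a protobuf-style message.
    """
    payload = ved[1:]
    payload += "=" * (-len(payload) % 4)  # restore stripped padding
    data = base64.urlsafe_b64decode(payload)

    fields = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 7
        if wire_type == 0:  # plain integer
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited blob, kept as raw bytes
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        fields[field_number] = value
    return fields


first = decode_ved("2ahUKEwidsaSd4M_1AhVlk_0HHUxOCQYQFnoECAsQAg")
far_down = decode_ved("2ahUKEwiZ2NDe1s_1AhVaTmwGHThIA5U4PBAWegQIGRAB")
print(first.get(2), far_down.get(7))
```

For the first value in the question this yields 22 for field 2, and for the "much farther on the list" value it yields 60 for field 7, which would fit a start result position, but treat that reading as a guess.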

Extract text from column in select of MySql query

I have a table named sentEmails where the body column contains the body text of an email.
In the body text, there is a substring like:
some link: <a href="https://somelink@somesite.com/somePage.php?someVar=someVal&sentby=agent">Random link text
Using MySQL, I need to extract the URL from this column, like https://somelink@somesite.com/somePage.php?someVar=someVal&sentby=agent
I was thinking something like the below would work by finding the starting location and returning the next 150 chars, of course it actually just returns the first 150 chars.
SELECT LEFT(body, LOCATE('some link: <a href="', body)+150) AS link
FROM sentEmails
WHERE sent between date_sub(now(),INTERVAL 1 WEEK) and now()
AND body like '%some link:%'
AND toEmail = 'email@gmail.com'
Additional info:
the link will always be preceded by the text some link:
Random link text at the end will change
I can live with getting a bit more of the text than I need if I have to; for example, getting https://somelink@somesite.com/somePage.php">Random link text would be acceptable
the text shown above is a substring of the full body column which contains much more text
This isn't something I'm going to be doing often. I'm researching an issue and I need the links from 40-50 of these rows; I'm just hoping to avoid having to pull the link manually from each row.
I can only use MySQL Query Browser to access this DB. If I could connect with PHP, this would be trivial
The url in question, can have 6-25 parameters in it
The url in question will always end with this parameter &sentby=agent
If you had two unique delimiters around the URL, then you could just use SUBSTRING_INDEX() to isolate it. One approach would be to replace the two sides of the URL in the anchor tag with a delimiter:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(
REPLACE(REPLACE(body, '<a href="', '~'), '&sentby=agent">', '&sentby=agent~'), '~', -2),
'~', 1)
FROM sentEmails
WHERE sent BETWEEN DATE_SUB(NOW(), INTERVAL 1 WEEK) AND NOW() AND
body LIKE '%some link:%' AND
toEmail = 'email@gmail.com'
I replaced <a href=" and "> with ~. If ~ does not occur anywhere in the body column, and if you only have one HTML tag in the body, then this should work.
If the body column is just a big chunk of HTML, then you should consider using xpath and handling this in your app layer.
If you're just trying to extract the link, you can use the INSTR() and MID() functions, something like this:
select mid(body, instr(body,'="') + 2, instr(body,'">') - instr(body,'="') - 2) from email...
instr(body,'="') is the position of the =" that precedes the link, so adding 2 gives the link's starting position; instr(body,'">') is the position just after the end of the link.
The MID function takes (str, pos, len), and len = end position - starting position.
Thanks to Tim's help, I was able to get this working with the below query:
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(body, 'some link: <a href="', -1) , 'sentby=agent">', 1) AS link
FROM sentEmails
where sent between date_sub(now(),INTERVAL 1 WEEK) and now()
AND body like '%some link:%'
AND toEmail = 'email@gmail.com'
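To see why the nested SUBSTRING_INDEX calls isolate the link, here is a small Python sketch that mimics the function on a made-up body string. substring_index is a hypothetical helper written for illustration, and the sample URL is invented; it only approximates MySQL's behaviour for a non-empty delimiter.

```python
def substring_index(text, delim, count):
    """Rough Python equivalent of MySQL's SUBSTRING_INDEX(text, delim, count).

    count > 0: everything left of the count-th delimiter from the left.
    count < 0: everything right of the count-th delimiter from the right.
    """
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


body = (
    'Hi, some link: <a href="https://somesite.com/somePage.php'
    '?someVar=someVal&sentby=agent">Random link text</a> Bye'
)

# Inner call: keep everything after the opening marker.
tail = substring_index(body, 'some link: <a href="', -1)
# Outer call: keep everything before the closing marker.
link = substring_index(tail, 'sentby=agent">', 1)
print(link)
```

Note that, as in the query, the trailing sentby=agent is cut off because it is part of the delimiter, so the result ends in "&". Using just '">' as the second delimiter would keep the full URL.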
Doing this kind of search is not convenient. As the table with emails grows in size, the query will be less and less performant.
If this is a new application you're building, you're better off keeping a separate table with the list of URLs used in each sent email. You'd write the URLs to the DB as you send the emails.
The reasoning behind this is that the app will search the DB more often than it sends emails. Therefore, by doing a little extra work when sending emails, you help a lot with the most expensive use of the feature, which is the search.
If you still decide to keep the current approach, you'll want to have an index containing the columns (toEmail, sent) in this order.
Other than that, your approach makes sense and will work. Did you actually try it? Does it work for you?

Printing a table with sage math

The assignment is to construct a two-column table that starts at x= -4 and ends with x= 5 with one unit increments between consecutive x values. It should have column headings ‘x’ and ‘f(x)’. I can't find anything helpful on html.table(), which is what we're supposed to use.
This is what I have so far. I just have no idea what to put into the html.table function.
x = var('x')
f(x) = (5 * x^2) - (9 * x) + 4
html.table()
You might want to have a look at sage's reference documentation page on html.table
It contains the following valuable information :
table(x, header=False)
Print a nested list as a HTML table. Strings of html will be parsed for math inside dollar and double-dollar signs. 2D graphics will be displayed in the cells. Expressions will be latexed.
INPUT:
x – a list of lists (i.e., a list of table rows)
header – a row of headers. If True, then the first row of the table is taken to be the header.
There is also an example for sin (instead of f) with x in 0..3 instead of -4..5, that you can probably adapt pretty easily :
html.table([(x,sin(x)) for x in [0..3]], header = ["$x$", "$\sin(x)$"])
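Adapting that to the assignment mostly means building the list of rows. Here is a sketch in plain Python (so it also runs outside Sage); the names f and rows are just illustrative, and the final html.table call is shown only as a comment because it needs a Sage session.

```python
def f(x):
    """The polynomial from the question: 5x^2 - 9x + 4."""
    return 5 * x**2 - 9 * x + 4


# One row per integer x from -4 to 5 inclusive (range excludes the end value).
rows = [(x, f(x)) for x in range(-4, 6)]

for x, y in rows:
    print(x, y)

# In a Sage session, following the signature quoted above, the rows could
# then be passed along with the column headings:
# html.table(rows, header=["$x$", "$f(x)$"])
```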
@Cimbali has a great answer. For completeness, I'll point out that you should be able to get this information with
html.table?
or, in fact,
table?
since I would say we want to advocate the more general table function, which has a lot of good potential for you.

SQL query - Replace/Move some parts of content

I need to update about 2000 records in MySQL
I have a column 'my_content' in table 'my_table' with the following value
Title: some title here<br />Author: John Smith<br />Size: 2MB<br />
I have created 3 new columns (my_title, my_author and my_size) and now I need to separate the content of 'my_content' like this
'my_title'
some title here
'my_author'
John Smith
'my_size'
2MB
As you can imagine the title, author and size are always different for each row.
What I'm thinking is to run something like the following, but I'm not great at SQL and I'm not sure what the actual query would look like.
This is what I'm trying to do:
Within 'my_content' find everything that starts with "title:..." and ends with "...<br />au" and move it to 'my_title'
Within 'my_content' find everything that starts with "thor:..." and ends with "...<br />s" and move it to 'my_author'
Within 'my_content' find everything that starts with "ize:..." and ends with "...<br />" and move it to 'my_size'
I just don't know how to write a query to do this.
Once all the content is in the new columns, I can just find and delete the content that's not needed any more, for example 'thor:' , etc.
You can use INSTR to find the index of your delimiters and SUBSTRING to select out the part you want. So, for instance, the author would be
SUBSTR(my_content,
INSTR(my_content, "Author: ") + 8,
INSTR(my_content, "Size: ") - INSTR(my_content, "Author: ") - 8)
You'd need a bit more work to trim the <br/> and any surrounding whitespace.
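The index arithmetic can be checked outside the database. The sketch below re-creates INSTR and SUBSTR with 1-based positions in Python (instr and substr are illustrative stand-ins, not MySQL itself) and then does the extra trimming on the sample value from the question.

```python
def instr(text, sub):
    """1-based position of sub in text, 0 if absent (like MySQL INSTR)."""
    return text.find(sub) + 1


def substr(text, pos, length):
    """Substring starting at 1-based pos with the given length."""
    return text[pos - 1:pos - 1 + length]


content = "Title: some title here<br />Author: John Smith<br />Size: 2MB<br />"

start = instr(content, "Author: ") + 8
length = instr(content, "Size: ") - instr(content, "Author: ") - 8
raw_author = substr(content, start, length)

# The raw slice still carries the trailing <br />, so strip it off.
author = raw_author.replace("<br />", "").strip()
print(repr(raw_author), repr(author))
```

The raw slice is "John Smith<br />", which is why the extra trimming step is needed before writing the value to my_author.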
Please try the below:
SELECT SUBSTRING(SUBSTRING_INDEX(my_content,'<br />',1),LOCATE('Title: ',my_content)+7) as my_title,
SUBSTRING(SUBSTRING_INDEX(my_content,'<br />',2),LOCATE('Author: ',my_content)+8) as my_author,
SUBSTRING(SUBSTRING_INDEX(my_content,'<br />',3),LOCATE('Size: ',my_content)+6) as my_size
FROM my_table;
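To sanity-check the splitting logic before touching 2000 rows, the same extraction can be sketched in Python. This uses a regular expression rather than SUBSTRING_INDEX, and split_content is a hypothetical helper name; it assumes every row follows the Title/Author/Size layout from the question.

```python
import re

# Non-greedy groups stop at the first <br /> after each label.
PATTERN = re.compile(
    r"Title:\s*(?P<title>.*?)<br />\s*"
    r"Author:\s*(?P<author>.*?)<br />\s*"
    r"Size:\s*(?P<size>.*?)<br />"
)


def split_content(content):
    """Return a dict with title, author and size, or None if the layout differs."""
    match = PATTERN.search(content)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


sample = "Title: some title here<br />Author: John Smith<br />Size: 2MB<br />"
parsed = split_content(sample)
print(parsed)
```

Rows for which this returns None would be the ones to inspect by hand, since the SQL version would silently produce odd values for them.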

Optimize 2 mysql queries into one

Using php and mysql 5.x. I currently load a banner image in a certain section of my site like so:
SELECT * FROM banners WHERE section = 1 AND pageid = 2
But if no results found I run a second query:
SELECT * FROM banners WHERE section = 1 AND pageid = 0
Basically, I'm trying to find banner images assigned to that section for that page. If no results are found, I look for any default banner images with the second query. Is there a better way to do this in one query?
EDIT
To clarify a little more: I want to check whether there are any banners assigned to the page and, if not, see whether there are any banners assigned to 0 (default). I don't want a mix of both: either it shows all banners assigned to that page, or it shows all banners assigned to pageid 0. Multiple rows could be returned, not just one.
ADDITIONAL EDIT
To better explain what this is for. In an admin tool I allow someone to assign a banner image to section on the website. In the admin tool they can select the section and the page they want the banner image to show. They can also set the default banner image(s) for that section. So if there were no banner images assigned to a section by default it will load the banner image(s) assigned to 0 for that section throughout the website. So instead of assigning a default banner image to 50 different pages they can just do it one time and it will load the default banner image or images for that section. Just trying to find a way to do this in a more optimal way, instead of 2 queries could it be done in one?
The condition (pageid = 2 OR pageid = 0) matches rows of both kinds, so the query can find page-specific and default banners alike. Since the results are ordered by pageid DESC and 2 is bigger than 0, I think a row with pageid = 2 will always sort first when one exists, and LIMIT 1 then returns just that row. Note that this returns a single banner only.
SELECT * FROM banners WHERE section = 1 AND (pageid = 2 OR pageid = 0) ORDER BY pageid DESC LIMIT 1
EDIT: I'm not positive if the ORDER BY is necessary, I'd love to see some comments on that
Maybe it's not the most elegant way, but you can try this:
SELECT *
FROM banners
WHERE section = 1
AND pageid = IF(
(select count(*) from banners where section = 1 and pageid = 2) = 0,
0, 2
);
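Here is a small runnable sketch of that single-query fallback, for checking the behaviour with multiple rows. It makes two substitutions that are assumptions for the sake of a self-contained demo: Python's built-in sqlite3 stands in for MySQL, and CASE WHEN replaces IF(), which should behave the same way. The banners_for helper, the id and image columns, and the sample rows are invented.

```python
import sqlite3

# Pick the page's own banners if any exist, otherwise the defaults (pageid 0).
QUERY = """
SELECT id, pageid, image
FROM banners
WHERE section = :section
  AND pageid = CASE
        WHEN (SELECT COUNT(*) FROM banners
              WHERE section = :section AND pageid = :pageid) = 0
        THEN 0
        ELSE :pageid
      END
ORDER BY id
"""


def banners_for(conn, section, pageid):
    """Return all banner rows for the page, falling back to the defaults."""
    params = {"section": section, "pageid": pageid}
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE banners ("
    "id INTEGER PRIMARY KEY, section INTEGER, pageid INTEGER, image TEXT)"
)
conn.executemany(
    "INSERT INTO banners (section, pageid, image) VALUES (?, ?, ?)",
    [
        (1, 0, "default-a.png"),
        (1, 0, "default-b.png"),
        (1, 2, "page2-a.png"),
        (1, 2, "page2-b.png"),
    ],
)

print(banners_for(conn, 1, 2))  # page 2 has its own banners
print(banners_for(conn, 1, 7))  # page 7 has none, so the defaults are used
```

Either all of the page's banners or all of the default banners come back, never a mix, which is the behaviour the question's edit asks for.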