Relevancy MySQL query to replace stored procedure - mysql

I've taken over a project built by my predecessor. That project contains a stored procedure originally taken from here:
https://stackoverflow.com/a/9100182/439925
Currently this is preventing us from updating MySQL, so I have been attempting to remove it and replace the calls to subStringCount() with an adaptation of the top answer on that question.
(( Round (( Char_length(`title`) - Char_length(REPLACE(`title`, 'info', "")) ) / Char_length('info')) * 30 )) AS `title_score`,
The queries count the number of times a search string occurs in a number of fields and then order by the weighted total. Unfortunately I can't get the new query's results to match the old one's.
The 2 full queries are as follows:
Old WITH the stored Proc:
SELECT SQL_CALC_FOUND_ROWS `temp`.*,
( `title_score` + `source_score`
+ `abstract_score` + `authors_score`
+ `drugs_score` + `uploader_score`
+ `area_score`
+ Ifnull(`document_content_score`, 0) ) AS
`relevance`
FROM (SELECT `kb_uploads`.*,
(( Substringcount(`title`, 'info') * 30 )) AS `title_score`,
(( Substringcount(`source`, 'info') * 15 )) AS `source_score`,
(( Substringcount(`abstract`, 'info') * 20 )) AS `abstract_score`,
(( Substringcount(`authors`, 'info') * 30 )) AS `authors_score`,
(( Substringcount(`drugs`, 'info') * 20 )) AS `drugs_score`,
(( Substringcount(`kb_users`.`name`, 'info') * 20 )) AS `uploader_score`,
(( Substringcount(`kb_upload_areas`.`name`, 'info') * 20 )) AS `area_score`,
( `content_tbl`.`index_score` * 1 ) AS `document_content_score`
FROM `kb_uploads`
LEFT JOIN `kb_users`
ON `kb_users`.`id` = `kb_uploads`.`uploader`
LEFT JOIN `kb_upload_areas`
ON `kb_upload_areas`.`id` = `kb_uploads`.`area`
LEFT JOIN (SELECT `upload`,
Sum(`weighting`) AS `index_score`
FROM `kb_search_index`
WHERE `word` = 'info'
GROUP BY `upload`) AS `content_tbl`
ON `content_tbl`.`upload` = `kb_uploads`.`id`) AS `temp`
WHERE `is_deleted` = 0
HAVING `relevance` > 0
ORDER BY `relevance` DESC
LIMIT 10 OFFSET 0
New WITHOUT the stored Proc:
SELECT SQL_CALC_FOUND_ROWS `temp`.*,
( `title_score` + `source_score`
+ `abstract_score` + `authors_score`
+ `drugs_score` + `uploader_score`
+ `area_score`
+ Ifnull(`document_content_score`, 0) ) AS
`relevance`
FROM (SELECT `kb_uploads`.*,
(( Round (( Char_length(`title`) - Char_length(REPLACE(`title`, 'info', "")) ) / Char_length('info')) * 30 )) AS `title_score`,
(( Round (( Char_length(`source`) - Char_length(REPLACE(`source`,'info' , "")) ) / Char_length('info')) * 15 )) AS `source_score`,
(( Round (( Char_length(`abstract`) - Char_length(REPLACE(`abstract`, 'info', "" ) ) ) / Char_length('info')) * 20 )) AS `abstract_score`,
(( Round (( Char_length(`authors`) - Char_length(REPLACE(`authors`,'info', "")))/Char_length('info')) * 30 )) AS `authors_score`,
(( Round (( Char_length(`drugs`) - Char_length(REPLACE(`drugs`,'info',"")) ) / Char_length('info')) * 20 )) AS `drugs_score`,
(( Round (( Char_length(`kb_users`.`name`) - Char_length(REPLACE(`kb_users`.`name`,'info',""))) / Char_length('info')) * 20 )) AS `uploader_score`,
(( Round (( Char_length(`kb_upload_areas`.`name`) - Char_length(REPLACE(`kb_upload_areas`.`name`,'info', "")) ) / Char_length('info'))* 20 )) AS`area_score`,
( `content_tbl`.`index_score` * 1 ) AS `document_content_score`
FROM `kb_uploads`
LEFT JOIN `kb_users`
ON `kb_users`.`id` = `kb_uploads`.`uploader`
LEFT JOIN `kb_upload_areas`
ON `kb_upload_areas`.`id` = `kb_uploads`.`area`
LEFT JOIN (SELECT `upload`,
Sum(`weighting`) AS `index_score`
FROM `kb_search_index`
WHERE `word` = 'info'
GROUP BY `upload`) AS `content_tbl`
ON `content_tbl`.`upload` = `kb_uploads`.`id`) AS `temp`
WHERE `is_deleted` = 0
HAVING `relevance` > 0
ORDER BY `relevance` DESC
LIMIT 10 OFFSET 0
There are 2 main issues.
1) IFNULL is not working in the new query. The column contains mostly NULL instead of 0.
2) The relevancy numbers in the new query don't match the numbers in the old one, possibly because IFNULL is not working.
The full queries are constructed in PHP. I have left that code out because the logic hasn't changed, only the string concatenation that replaces the stored procedure call.
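For reference, the counting trick in the new query can be sketched outside MySQL. The Python below is an illustration only, not the original subStringCount() function; the helper names `count_occurrences`, `sql_add` and `ifnull` are hypothetical. It mirrors two documented behaviours of the SQL expression: REPLACE() matches case-sensitively, and arithmetic involving NULL yields NULL, so a single NULL score makes the whole `relevance` sum NULL unless each term is guarded. Whether this explains the mismatch depends on how the old function treats NULL and letter case, which is not shown here.

```python
from typing import Optional


def count_occurrences(text: Optional[str], needle: str) -> Optional[int]:
    """Mirror (CHAR_LENGTH(t) - CHAR_LENGTH(REPLACE(t, n, ''))) / CHAR_LENGTH(n)."""
    if text is None:
        # In SQL, REPLACE(NULL, ...) is NULL, so the whole expression is NULL.
        return None
    removed = text.replace(needle, "")  # case-sensitive, like MySQL REPLACE()
    return round((len(text) - len(removed)) / len(needle))


def sql_add(*terms: Optional[int]) -> Optional[int]:
    """SQL-style addition: the result is NULL (None) if any term is NULL."""
    if any(t is None for t in terms):
        return None
    return sum(terms)


def ifnull(value: Optional[int], default: int) -> int:
    """Counterpart of IFNULL(value, default)."""
    return default if value is None else value


# "Info" with a capital I is not counted, and a NULL column gives a NULL score.
title_score = count_occurrences("info about Info and info", "info")
source_score = count_occurrences(None, "info")

unguarded = sql_add(title_score, source_score)
guarded = sql_add(ifnull(title_score, 0), ifnull(source_score, 0))
```

In the query itself, the equivalent of `guarded` would be wrapping every score term, not just `document_content_score`, in IFNULL.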

Related

Better choice instead of functions for multiple inserts

In my SQL database I have a procedure that needs to insert a large number of randomly generated records into a table.
Something like this:
insert into table_AAA
SELECT
, round(X + rand() * 10 - rand() * 10 )
, round(Y + rand() * 10 - rand() * 10 )
FROM db_numbers d /*very big table containing just 1 column with numbers from 1 to 1M*/
limit 100000;
It takes 3 seconds, which sounds reasonable compared to the other procedures on my servers.
Then I needed to ensure that the result of round() is >= 0.
To do that I wrote a function:
CREATE FUNCTION `fn_normalize`(`p_value` INT) RETURNS int(11)
BEGIN
declare v_output INT;
IF p_value < 0 THEN
SET v_output = 0 ;
ELSE SET v_output = p_value ;
END IF;
RETURN (v_output);
END
and my insert became:
insert into table_AAA
SELECT
, fn_normalize(round(X + rand() * 10 - rand() * 10 ))
, fn_normalize(round(Y + rand() * 10 - rand() * 10 ))
FROM db_numbers d /*very big table containing just 1 column with numbers from 1 to 1M*/
limit 100000;
This is very slow, 10x the original, probably because the function is called on each value individually.
I thought to use CASE WHEN:
insert into table_AAA
SELECT
, case when(round(X + rand() * 10 - rand() * 10 )) < 0 then 0 else (round(X + rand() * 10 - rand() * 10 )) end
, case when(round(Y + rand() * 10 - rand() * 10 )) < 0 then 0 else (round(Y + rand() * 10 - rand() * 10 )) end
FROM db_numbers d /*very big table containing just 1 column with numbers from 1 to 1M*/
limit 100000;
But the ELSE branch re-runs the rand() function, so I cannot be sure the result is non-negative. Using rand(x) is not an option, because I need the values to be as random as possible.
Running an UPDATE after the insert is even worse.
Am I missing some obvious alternative?
Thanks a lot
You can use a subquery:
insert into table_AAA
SELECT CASE WHEN v.x<0 THEN 0 ELSE v.x END,
CASE WHEN v.y<0 THEN 0 ELSE v.y END
FROM (
SELECT
round(X + rand() * 10 - rand() * 10 ) AS x
, round(Y + rand() * 10 - rand() * 10 ) AS y
FROM db_numbers d /*very big table containing just 1 column with numbers from 1 to 1M*/
limit 100000) v;
Much simpler (and faster) function body:
RETURN GREATEST(0, p_value);
Or, applying that to davide's solution:
insert into table_AAA
SELECT
GREATEST(0, round(X + rand() * 10 - rand() * 10 )) AS x,
GREATEST(0, round(Y + rand() * 10 - rand() * 10 )) AS y
FROM db_numbers d /* table of numbers from 1 to 1M*/
limit 100000;
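The subquery and GREATEST versions work because the random expression is evaluated once per row and only then clamped, whereas the CASE version evaluates it twice. A minimal Python sketch of that difference follows; the function names and the seeded generator are illustrative assumptions, not part of the original answers.

```python
import random


def noisy(base: int, rng: random.Random) -> int:
    """Counterpart of round(X + rand() * 10 - rand() * 10)."""
    return round(base + rng.random() * 10 - rng.random() * 10)


def clamp_once(base: int, rng: random.Random) -> int:
    """Like GREATEST(0, expr): the random value is drawn once, then clamped."""
    return max(0, noisy(base, rng))


def clamp_twice(base: int, rng: random.Random) -> int:
    """Like CASE WHEN expr < 0 THEN 0 ELSE expr END with expr written twice:
    the ELSE branch draws a fresh value that may itself be negative."""
    return 0 if noisy(base, rng) < 0 else noisy(base, rng)


rng = random.Random(42)  # seeded only to make the sketch repeatable
safe = [clamp_once(1, rng) for _ in range(10_000)]
unsafe = [clamp_twice(1, rng) for _ in range(10_000)]
```

Every value in `safe` is non-negative, while `unsafe` still contains negative values.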

UPDATE statement using the same table in subquery

SELECT
vl1.phone_number,
vl1.first_name,
CONCAT(
SUBSTRING(
(
SELECT
vl2.phone_number
FROM
list as vl2
WHERE
vl2.phone_number LIKE CONCAT( SUBSTRING( vl1.phone_number FROM 1 FOR 3 ), "%" )
ORDER BY
RAND( )
LIMIT 1
)
FROM
1 FOR 6
),
FLOOR( RAND( ) * ( 8999 ) ) + 1000
) AS autogenNumber
FROM
list as vl1
LIMIT 1
The results I get are:
phone_number | first_name | autogenNumber
The autogenNumber is generated by first searching for other numbers that share the first three digits. Then the first 6 digits of that number are taken and another 4 random digits are appended to the end.
The above SQL query generates the autogen number exactly as I need it.
However, a problem arises when I try to update the column security_phrase in this table using the similar query below.
UPDATE list as vl1
SET vl1.security_phrase = (
CONCAT(
SUBSTRING(
(
SELECT
vl2.phone_number
FROM
list AS vl2
WHERE
vl2.phone_number LIKE CONCAT( SUBSTRING(phone_number FROM 1 FOR 3 ), "%" )
ORDER BY
RAND( )
LIMIT 1
)
FROM
1 FOR 6
),
FLOOR( RAND( ) * ( 8999 ) ) + 1000
)
)
LIMIT 10
This gives me an error:
1093 - Table 'vl1' is specified twice, both as a target for 'UPDATE'
and as a separate source for data
I have also tried
UPDATE list AS vl1
JOIN list AS vl2
SET vl1.security_phrase = (
CONCAT( SUBSTRING( vl2.phone_number FROM 1 FOR 6 ), FLOOR( RAND( ) * ( 8999 ) ) + 1000 )
)
WHERE
vl2.phone_number LIKE CONCAT( SUBSTRING( vl1.phone_number FROM 1 FOR 3 ), "%" )
This does not work and does not give the intended results.
Any help would be appreciated.
MySQL does not allow the table being updated to be referenced again in a subquery, unless that subquery is in the FROM clause (a derived table).
In your particular case, we need to put the complete SELECT query block in a derived table. As discussed in chat, lead_id is your primary key, so we join back on the PK to update the rows accordingly.
UPDATE list AS t1
JOIN
(
SELECT
vl1.lead_id,
CONCAT(
SUBSTRING(
(
SELECT
vl2.phone_number
FROM
list as vl2
WHERE
vl2.phone_number LIKE CONCAT( SUBSTRING( vl1.phone_number FROM 1 FOR 3 ), "%" )
ORDER BY
RAND( )
LIMIT 1
)
FROM
1 FOR 6
),
FLOOR( RAND( ) * ( 8999 ) ) + 1000
) AS autogenNumber
FROM
list as vl1
) AS dt
ON dt.lead_id = t1.lead_id
SET t1.security_phrase = dt.autogenNumber
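The value that the derived table computes for each row can be sketched in Python as follows. This only illustrates the logic described in the question; the function name `autogen_number` and the sample phone numbers are made up.

```python
import random
from typing import List


def autogen_number(phone: str, all_phones: List[str], rng: random.Random) -> str:
    """Pick a random number sharing the first 3 digits, keep its first
    6 digits, then append 4 random digits (as the SQL does)."""
    prefix = phone[:3]
    # Counterpart of: WHERE phone_number LIKE CONCAT(prefix, '%')
    #                 ORDER BY RAND() LIMIT 1
    candidates = [p for p in all_phones if p.startswith(prefix)]
    donor = rng.choice(candidates)  # non-empty as long as `phone` is in the list
    # Counterpart of: FLOOR(RAND() * 8999) + 1000, which yields 1000..9998
    suffix = int(rng.random() * 8999) + 1000
    return donor[:6] + str(suffix)


phones = ["5551230001", "5559870002", "4441230003"]  # hypothetical sample data
rng = random.Random(0)
result = autogen_number(phones[0], phones, rng)
```

The result is always 10 characters long here: 6 digits from a number with the same 3-digit prefix, followed by a 4-digit random suffix.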

Combining SQL Google Search with own SQL query

I am pretty new to SQL queries.
I have a Google SQL search example:
SELECT cID,
(6371 * acos
(
cos(radians(51.455643))
* cos(radians(latCord))
* cos(radians(longCord) - radians(7.011555))
+ sin(radians(51.455643))
* sin(radians(latCord))
)
) AS distance
FROM breitengrade
HAVING distance < 50
ORDER BY distance
LIMIT 0, 20
and my own SQL query:
SELECT breitengrade.cID
,breitengrade.latCord
,breitengrade.longCord
,Pages.cIsActive
FROM breitengrade
INNER JOIN Pages ON breitengrade.cID = Pages.cID
WHERE cIsActive = '1'
How can I combine these 2 queries into one so that I can get one single result set?
SELECT breitengrade.cID,
breitengrade.latCord,
breitengrade.longCord,
Pages.cIsActive,
(6371 * acos
(
cos(radians(51.455643))
* cos(radians(latCord))
* cos(radians(longCord) - radians(7.011555))
+ sin(radians(51.455643))
* sin(radians(latCord))
)
) AS distance
FROM breitengrade
INNER JOIN Pages ON breitengrade.cID = Pages.cID
WHERE cIsActive = '1'
HAVING distance < 50
ORDER BY distance
LIMIT 0, 20
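The distance expression in these queries is the spherical law of cosines with an Earth radius of 6371 km. A small Python sketch of the same calculation can be useful for checking results outside the database. The function name is illustrative, and the clamp before `acos` is an added safeguard against rounding error, not something the SQL does.

```python
import math


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance via the spherical law of cosines (radius 6371 km)."""
    cos_angle = (
        math.cos(math.radians(lat1))
        * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon2) - math.radians(lon1))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    # Rounding can push the value slightly outside [-1, 1]; clamp before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return 6371 * math.acos(cos_angle)


origin = (51.455643, 7.011555)  # the search centre used in the query
one_degree_north = distance_km(*origin, origin[0] + 1.0, origin[1])
same_point = distance_km(*origin, *origin)
```

One degree of latitude comes out at roughly 111 km, which is a quick sanity check on the formula.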

How to use value from two different rows of a table in another table

I have a MySQL table with the following structure and data:
Increments
id emp_id starting_salary increment_rate increment_frequency
2 340 5000 250 1
3 340 5000 250 4
I need to have aliases, a and b which will hold some value based on the following formula:
starting_salary + (increment_rate * increment_frequency)
To be precise, I want a = 5250 (based on a = (5000 + (250 * 1))) and b = 6000 (based on b = (5000 + (250 * 4)))
Now I have another table with the following data:
PaySlips
id employee_id salary_month arrear
173824 340 '2015-06-01' 2350
I want to join a and b that I got from the table Increments with table PaySlips. And I want to use a and b in the following way:
((a * 8) / 30 + (b * 22) / 30)
My alias will be basic_salary. So basic_salary will hold this value from the above calculation:
basic_salary = ((a * 8) / 30 + (b * 22) / 30)
= ((5250 * 8) / 30 + (6000 *22) / 30)
= (1400 + 4400)
= 5800
I've got no idea how to do this. Can anyone please help me?
All I got are the common columns in both tables - emp_id and employee_id and I can join both tables using these columns. I just can't figure out how I can store the above values and organize the calculation inside my query.
Sample query:
SELECT x.id, x.employee_id,
(*my calculation using a and b from table Increments*) AS basic_salary,
x.salary_month, x.arrear
FROM PaySlips x
JOIN Increments y
ON x.employee_id = y.emp_id
For determining a:
SELECT
(
starting_salary +
(increment_rate * increment_frequency)
) AS a
FROM Increments
WHERE id = 2
And for determining b:
SELECT
(
starting_salary +
(increment_rate * increment_frequency)
) AS b
FROM Increments
WHERE id = 3
MySQL version: 5.2
I'm not clear on all the details, for example what should happen if there are three rows for one employee in increments? Anyhow, here's a sketch to start with:
select employee_id
, ((a * 8) / 30 + (b * 22) / 30) as basic_salary
from (
select x.employee_id
, min(starting_salary + (increment_rate * increment_frequency)) as a
, max(starting_salary + (increment_rate * increment_frequency)) as b
, x.salary_month, x.arrear
from payslips x
join increments y
on x.employee_id = y.emp_id
group by x.employee_id, x.salary_month, x.arrear
) as t
If id 2 and 3 are the criteria (I assumed they are examples) you can use a case statement like:
select employee_id
, ((a * 8) / 30 + (b * 22) / 30) as basic_salary
from (
select x.employee_id
, max(starting_salary + (increment_rate * case when y.id = 2 then increment_frequency end )) as a
, max(starting_salary + (increment_rate * case when y.id = 3 then increment_frequency end)) as b
, x.salary_month
, x.arrear
from payslips x
join increments y
on x.employee_id = y.emp_id
group by x.employee_id, x.salary_month, x.arrear
) as t;
In this case it does not matter what aggregate you use, it is to get rid of the row that contains null.
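The conditional-aggregation idea can be sketched in Python using the sample rows from the question. A CASE without an ELSE yields NULL for non-matching rows, and the aggregate simply discards those NULLs. The helper names below are illustrative.

```python
from typing import List, Optional

# Sample rows from the Increments table in the question.
increments = [
    {"id": 2, "emp_id": 340, "starting_salary": 5000,
     "increment_rate": 250, "increment_frequency": 1},
    {"id": 3, "emp_id": 340, "starting_salary": 5000,
     "increment_rate": 250, "increment_frequency": 4},
]


def value_for(row: dict, wanted_id: int) -> Optional[int]:
    """starting_salary + rate * CASE WHEN id = wanted_id THEN frequency END."""
    if row["id"] != wanted_id:
        return None  # CASE without ELSE gives NULL, and NULL propagates
    return row["starting_salary"] + row["increment_rate"] * row["increment_frequency"]


def sql_max(values: List[Optional[int]]) -> Optional[int]:
    """MAX() ignores NULLs; it returns NULL only if every value is NULL."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


a = sql_max([value_for(r, 2) for r in increments])
b = sql_max([value_for(r, 3) for r in increments])
basic_salary = (a * 8) / 30 + (b * 22) / 30
```

With the sample data this reproduces the figures from the question: a = 5250, b = 6000 and basic_salary = 5800.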
Based on the requirements you added, I think something like this will solve your problem:
SELECT PS.id, PS.employee_id, ((A.value * 8) / 30 + (B.value * 22) / 30) AS basic_salary
FROM PaySlips AS PS
JOIN (
SELECT I.emp_id, I.starting_salary + (increment_rate * increment_frequency) AS value
FROM Increments AS I
WHERE I.id = 2
) AS A
ON A.emp_id = PS.employee_id
JOIN (
SELECT I.emp_id, I.starting_salary + (increment_rate * increment_frequency) AS value
FROM Increments AS I
WHERE I.id = 3
) AS B
ON B.emp_id = PS.employee_id
This version might need some alteration if it's not working on your real scenario, but please feel free to tell if anything else needs amending.
Hope it helps.
For determining and setting the @a variable:
SET @a := (SELECT
(
starting_salary +
(increment_rate * increment_frequency)
) AS a
FROM Increments
WHERE id = 2);
And for determining and setting the @b variable:
SET @b := (SELECT
(
starting_salary +
(increment_rate * increment_frequency)
) AS b
FROM Increments
WHERE id = 3);
Then you can use @a and @b in your main query.
You can test them simply with:
SELECT @a as a;
SELECT @b as b;
SELECT
x.id,
x.employee_id,
(y.a * 8) / 30 + (y.b * 22) / 30 as basic_salary,
x.salary_month,
x.arrear
FROM PaySlips x
JOIN (
select t1.emp_id, t1.a, t2.b
from (
select
emp_id,
starting_salary + increment_rate * increment_frequency as a
from Increments
where id = 2
) as t1
join (
select
emp_id,
starting_salary + increment_rate * increment_frequency as b
from Increments
where id = 3
) as t2
on t1.emp_id = t2.emp_id
) as y
ON x.employee_id = y.emp_id

Slow location based search result query

I have a query that I use to find results that are ordered by location. Results also have to account for VAT so this is also in the query. The query can unfortunately take 4+ seconds to run when not cached. Can anyone spot any glaringly obvious issues or suggest anything I can do to improve it?
Just to clarify what is happening in the query:
The distance calculation is a Euclidean distance using lat/long
The incvat fields are used to show the price when VAT is included
The CASE ... WHEN / THEN expression is used to put prices of 0 at the very bottom
The query:
SELECT * , ROUND( SQRT( POW( ( 69.1 * ( company_branch_lat - 52.4862 ) ) , 2 ) + POW( ( 53 * ( company_branch_lng - - 1.8905 ) ) , 2 ) ) , 1 ) AS distance,
hire_car_day + ( hire_car_day * 0.2 * ! hire_car_incvat ) AS hire_car_day_incvat,
hire_car_addday + ( hire_car_addday * 0.2 * ! hire_car_incvat ) AS hire_car_addday_incvat,
hire_car_week + ( hire_car_week * 0.2 * ! hire_car_incvat ) AS hire_car_week_incvat,
hire_car_weekend + ( hire_car_weekend * 0.2 * ! hire_car_incvat ) AS hire_car_weekend_incvat
FROM hire_car
LEFT JOIN company_branch ON company_branch_id = hire_car_branchid
LEFT JOIN hire_cartypelink ON hire_cartypelink_carhireid = hire_car_id
LEFT JOIN users ON company_branch_userid = user_id
WHERE 1
GROUP BY hire_car_id
HAVING distance <=30
ORDER BY CASE hire_car_day_incvat
WHEN 0
THEN 40000
ELSE hire_car_day_incvat
END , distance ASC
LIMIT 0 , 30
You can use the MySQL spatial extension: store the latitude and longitude as a POINT datatype and put a spatial index on it. That way the coordinates can be ordered along a curve, reducing the dimension while preserving spatial information. You can use the spatial index with a bounding box to filter the query and then use the haversine formula to pick the optimal results. Your bounding box should be bigger than the radius of the great circle. MySQL uses an R-tree for its spatial index; my example was about a Z curve or a Hilbert curve: https://softwareengineering.stackexchange.com/questions/113256/what-is-the-difference-between-btree-and-rtree-indexing.
Then you can insert a geocoordinate directly into a point column: http://dev.mysql.com/doc/refman/5.0/en/creating-spatial-values.html. Or you can use a geometry datatype: http://markmaunder.com/2009/10/10/mysql-gis-extensions-quick-start/. Then you can use MBRcontains function like so: http://dev.mysql.com/doc/refman/4.1/en/relations-on-geometry-mbr.html or any other functions: http://dev.mysql.com/doc/refman/5.5/en/functions-for-testing-spatial-relations-between-geometric-objects.html. Hence you need a bounding box.
Here are some examples:
Storing Lat Lng values in MySQL using Spatial Point Type
https://gis.stackexchange.com/questions/28333/how-to-speed-up-this-simple-mysql-points-in-the-box-query
Here is a simple example with point datatype:
CREATE SPATIAL INDEX sx_place_location ON place (location);
SELECT * FROM mytable
WHERE MBRContains
(
LineString
(
Point($x - $radius, $y - $radius),
Point($x + $radius, $y + $radius)
),
location
)
AND Distance(Point($x, $y), location) <= $radius
MySQL latitude and Longitude table setup.
I'm not sure whether it works, because it uses a radius variable with a bounding-box function. It seems to me that MBRWithin is a bit simpler, because it doesn't need any argument: Mysql: Optimizing finding super node in nested set tree.
You are using GROUP BY together with HAVING, although I don't see any aggregate functions anywhere in the query. I recommend rewriting the query like this to see whether it makes any difference:
SELECT * , ROUND( SQRT( POW( ( 69.1 * ( company_branch_lat - 52.4862 ) ) , 2 ) + POW( ( 53 * ( company_branch_lng - - 1.8905 ) ) , 2 ) ) , 1 ) AS distance,
hire_car_day + ( hire_car_day * 0.2 * ! hire_car_incvat ) AS hire_car_day_incvat,
hire_car_addday + ( hire_car_addday * 0.2 * ! hire_car_incvat ) AS hire_car_addday_incvat,
hire_car_week + ( hire_car_week * 0.2 * ! hire_car_incvat ) AS hire_car_week_incvat,
hire_car_weekend + ( hire_car_weekend * 0.2 * ! hire_car_incvat ) AS hire_car_weekend_incvat
FROM hire_car
LEFT JOIN company_branch ON company_branch_id = hire_car_branchid
LEFT JOIN hire_cartypelink ON hire_cartypelink_carhireid = hire_car_id
LEFT JOIN users ON company_branch_userid = user_id
WHERE ROUND( SQRT( POW( ( 69.1 * ( company_branch_lat - 52.4862 ) ) , 2 ) + POW( ( 53 * ( company_branch_lng - - 1.8905 ) ) , 2 ) ) , 1 ) <= 30
ORDER BY CASE hire_car_day_incvat
WHEN 0
THEN 40000
ELSE hire_car_day_incvat
END , distance ASC
LIMIT 0 , 30
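For checking the query's arithmetic outside the database, the distance approximation, the VAT adjustment and the sort key can be sketched in Python. The function names and sample rows are illustrative; the constants (69.1, 53, 0.2, 40000 and the centre point) are taken from the query.

```python
import math

CENTRE_LAT, CENTRE_LNG = 52.4862, -1.8905  # search centre from the query


def approx_distance(lat: float, lng: float) -> float:
    """Flat-plane approximation used in the query, rounded to 1 decimal."""
    return round(math.sqrt((69.1 * (lat - CENTRE_LAT)) ** 2
                           + (53 * (lng - CENTRE_LNG)) ** 2), 1)


def price_inc_vat(price: float, already_includes_vat: bool) -> float:
    """price + price * 0.2 * !incvat: add 20% only when VAT is not included."""
    return price + price * 0.2 * (0 if already_includes_vat else 1)


def sort_key(row: dict) -> tuple:
    """CASE price WHEN 0 THEN 40000 ELSE price END, then distance."""
    price = row["day_incvat"]
    return (40000 if price == 0 else price, row["distance"])


rows = [  # hypothetical rows, already filtered to distance <= 30
    {"name": "free", "day_incvat": 0, "distance": 1.0},
    {"name": "cheap", "day_incvat": price_inc_vat(20, False), "distance": 5.0},
    {"name": "dear", "day_incvat": price_inc_vat(30, True), "distance": 2.0},
]
ordered = [r["name"] for r in sorted(rows, key=sort_key)]
```

Rows with a price of 0 sort last regardless of distance, which matches the intent of the CASE expression in the ORDER BY.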