Sort Array results based on variable Algorithm - MYSQL & PHP - mysql

I have an array nested within a PHP while loop that outputs a set of forum posts a number of times. I want to sort the array results based on an algorithm - however I do not want to hardcode the algorithm so I can test different variables at a later date. NB - I'm not looking to sort the items within the array, but rather the final output which when looped will output the array 20+ times.
Currently I have 2 Tables - the Forum table with loads of rows (3000 +):
id | name | date_add | votes | ... |
1 | Test Name | 1234567890 | 2 | ... |
... | ... | ... | ... | ... |
The other table contains the Algorithm variables that I want to pass through to the calculation and has only 1 row:
id | vote_reduction | time_variable | gravity |
1 | 1 | 2 | 1.8 |
The specific algorithm I'm using sorts the information based on how long it has been live (in hours) and how many votes it has; the gravity factor makes it more sensitive to time. In full:
(votes - vote_reduction)/((Hours Live + time_variable) ^ gravity)
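To check the SQL against known values, the formula can first be written out in plain Python. This is only a sketch: the function name `rank_score` is hypothetical, and the default arguments are simply the single row of the algorithm table shown above.

```python
def rank_score(votes, hours_live, vote_reduction=1, time_variable=2, gravity=1.8):
    """(votes - vote_reduction) / ((hours_live + time_variable) ^ gravity).

    hours_live is assumed to be non-negative; a negative base raised to a
    fractional power has no real-valued result.
    """
    return (votes - vote_reduction) / (hours_live + time_variable) ** gravity


# A post with the same votes scores lower the longer it has been live.
fresh = rank_score(votes=5, hours_live=1)
stale = rank_score(votes=5, hours_live=10)
```

Because the parameters are ordinary arguments, different values for vote_reduction, time_variable and gravity can be tried without touching the formula itself.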
So far I've managed to get this far, and something is going wrong but I can't quite figure it out:
SELECT forum.*,
((forum.votes - algorithm.vote_reduction)/POW(((TIMESTAMPDIFF(HOUR, SYSDATE(), forum.date_add)) + algorithm.time_variable),algorithm.gravity)) AS algorithm.al,
forum.name, forum.id
FROM forum as forum
LEFT JOIN algorithm AS algorithm ON (algorithm.id='1')
ORDER BY algorithm.al
Any ideas?

I haven't tested the results of the algorithm, but the query returns a result for al if you just remove algorithm. from algorithm.al. I don't think you can make a column alias that acts like it's part of a table. What's confusing me is that you say that it's running on your machine. It's not running on SQL Fiddle and is throwing an error.
SELECT forum.*,
((forum.votes - algorithm.vote_reduction)/POW(((TIMESTAMPDIFF(HOUR, SYSDATE(), forum.date_add)) + algorithm.time_variable),algorithm.gravity)) AS al
FROM forum AS forum
LEFT JOIN algorithm AS algorithm ON (algorithm.id='1')
ORDER BY al
Link to SQL fiddle

There are a few errors in the code, as follows:
- Making an alias with the name "algorithm" clashes with a MySQL keyword that is also called ALGORITHM.
- The calculation (at least the way it is edited above) passes too many values to POW().
- Quoting declared aliases makes the code more foolproof, but MySQL identifiers are quoted with backticks. ' ' creates a string literal, which is why the ORDER BY clause doesn't like quotation marks (so remove them there).
- SYSDATE() and forum.date_add are in different formats, the latter being a Unix timestamp.
To fix:
SELECT forum.*,
-- date_add holds a Unix timestamp, so convert it before taking the difference
TIMESTAMPDIFF(HOUR, FROM_UNIXTIME(forum.date_add), NOW()) AS timedif,
-- a column alias cannot be reused in the same SELECT list, so the expression is repeated
((forum.votes - alg.vote_reduction)/POW((TIMESTAMPDIFF(HOUR, FROM_UNIXTIME(forum.date_add), NOW()) + alg.time_variable),alg.gravity)) AS al
FROM forum AS forum
LEFT JOIN algorithm AS alg ON (alg.id='1')
ORDER BY al

Related

How to store and evaluate dynamic expressions in MySQL(or any other SQL)

Best way to store a dynamic expression in a table for each row for a searching module.
The expression is dynamic and can have multiple fields which are being compared.
I considered creating a separate column for each type of field and flattening out complex nested logic by getting all possible combinations using DNF and storing them in my table. The disadvantage of doing that is that every new piece of logic or expression needs a new column, which would lead to a large table with too many NULLs in it; adding a new column would also take time and refactoring (we are talking about more than 800 columns here).
The alternative approach, which I think would work better, is below.
I want to discuss whether there is a better way to do this and, if not, how we can improve and achieve the approach suggested below.
| id | expression | diagnosis |
|------|------------------------------------------------|-------------|
| 1 |`p.age>12 and p.gender==Male` | diseaseA |
| 2 |`p.age>50 and p.bp>20` | diseaseB |
| 3 |`p.age<20 and p.bp<20` | diseaseC |
| 4 |`p.age<30 and p.age>20 and (p.bp<30 or p.bp>50)`| diseaseD |
I want to search in this table, for a patient p with certain properties (age=*something*,bp=*something*,etc).
The resulting rows should return all rows which satisfy the expression and also rows which partially match the expression(i.e the rows which are using properties not supplied in the search criteria).
For example for a search for patient p(age=22,bp=15), the search result should be
| id | disease |
|------|-------------|
| 1 | diseaseA |
| 3 | diseaseC |
| 4 | diseaseD |
Since I am new to SQL, the (newbie) way I think I can do this is:
1. First get all the rows (in-memory would be costly; let's discuss the best possible way to execute the functionality in point 2 row by row).
2. Then, row by row, transform the expression into a logical executable expression (which is later executed using eval) using regex matching and replacement (I hope there is a better way than this) for the search criteria, i.e. substituting the patient details. [In my example, for the 2nd row, the expression p.age>50 and p.bp>20 gets converted to "22>50 && 15>20".]
3. All the rows for which the result of transforming and executing the expression was true (or partially matched) should be returned.
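One possible alternative to regex substitution plus eval, sketched in Python under some assumptions: the stored expressions happen to be valid Python syntax, so the standard ast module can parse them, and a small evaluator can return True, False, or None for "cannot tell because a property was not supplied". The function names and the rule that a bare word such as Male is a string are assumptions of this sketch, not part of the question.

```python
import ast
import operator

_OPS = {ast.Gt: operator.gt, ast.Lt: operator.lt, ast.GtE: operator.ge,
        ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne}
_MISSING = object()  # marks a property that was not supplied in the search


def _value(node, patient):
    # p.<name> looks the property up in the search criteria.
    if (isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)
            and node.value.id == "p"):
        return patient.get(node.attr, _MISSING)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return node.id  # assumption: a bare word such as Male is a string
    raise ValueError("unsupported operand")


def _eval(node, patient):
    if isinstance(node, ast.BoolOp):
        results = [_eval(v, patient) for v in node.values]
        if isinstance(node.op, ast.And):
            if any(r is False for r in results):
                return False
            return None if any(r is None for r in results) else True
        if any(r is True for r in results):
            return True
        return None if any(r is None for r in results) else False
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        left = _value(node.left, patient)
        right = _value(node.comparators[0], patient)
        if left is _MISSING or right is _MISSING:
            return None  # partial match: the outcome is unknown
        return _OPS[type(node.ops[0])](left, right)
    raise ValueError("unsupported expression")


def evaluate(expression, patient):
    """Return True, False, or None (unknown) for one stored expression."""
    return _eval(ast.parse(expression, mode="eval").body, patient)


def matching_diagnoses(rows, patient):
    """Keep every row that is not definitely false (full or partial match)."""
    return [diag for expr, diag in rows if evaluate(expr, patient) is not False]
```

Run against the four example rows with p(age=22, bp=15), this sketch returns diseaseA and diseaseD; row 3 evaluates to false under these rules because 22<20 does not hold. Nothing is ever passed to eval, so a stored expression can only use the comparisons and and/or handled here.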
The language is not an issue as I would be starting this project from scratch and can use any language
I can answer for MySQL.
First of all, you'll have to write all of your SQL code inside a stored procedure.
Generally, you are interested in dynamic SQL:
https://dev.mysql.com/doc/refman/5.7/en/sql-syntax-prepared-statements.html
So a straightforward approach is to open a cursor for your table with expressions and, for each expression, replace p.age with its actual value and then execute dynamic SQL (select 22 > 50 and 15 > 20).
Another approach is to loop through the expression table (open a cursor for it) and, as you probably have the patient id (not only its field values), just generate normal SQL that selects from the patient table (select patient_id from patients where [expression_from_expression_table] and patient_id = [your_known_patient_id]).
And the third one that I can imagine is generating one big query from the whole expression table:
select group_concat(concat('if(', expression, ',"', diagnosis, '", "") as ', diagnosis) separator ',') from expressions into somevar;
and then doing replace of p.* with actual values and executing second query:
set somevar = replace(somevar, 'p.age', '15');
...
set @qry = concat('select ', somevar);
PREPARE qry FROM @qry;
EXECUTE qry;
The third approach is the fastest to my mind, but it will require additional work on the client, as you will receive the diagnoses as columns, not as rows.
But hope you get the general idea.

Compare two DNA-like strings with MySQL

I'm trying to find a way to compare two DNA-like strings with MySQL; stored functions are no problem. The string format may also be changed, but it needs to follow the pattern [code][id]-[value], like C1-4. (The - may be changed as well.)
Example of the string:
C1-4,C2-5,C3-9,S5-2,S8-3,L2-4
If a value does not exist in the other string, for example S3-1, it scores 10 (the max value). If the asked string has C1-4 and the given string has C1-5, the score has to be 4 - 5 = -1, and if the asked string has C1-4 and the given string has C1-2, the score has to be 4 - 2 = 2.
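Written out in Python, the scoring rule looks roughly like the sketch below. Two points are assumptions, because the description does not settle them: only the bits of the asked string are scored (extra bits in the given string are ignored), and the per-bit scores are returned individually rather than combined. The names `parse_dna` and `dna_scores` are hypothetical.

```python
MISSING_SCORE = 10  # max value, used when a bit is absent from the given string


def parse_dna(dna):
    """Turn 'C1-4,C2-5' into {'C1': 4, 'C2': 5}."""
    bits = {}
    for part in dna.split(","):
        key, _, value = part.strip().partition("-")
        bits[key] = int(value)
    return bits


def dna_scores(asked, given):
    """Score every bit of the asked string against the given string."""
    given_bits = parse_dna(given)
    scores = {}
    for key, value in parse_dna(asked).items():
        if key in given_bits:
            scores[key] = value - given_bits[key]  # asked minus given
        else:
            scores[key] = MISSING_SCORE
    return scores
```

For example, `dna_scores("C1-4,S3-1", "C1-5")` gives -1 for C1 and 10 for S3. The same per-bit logic maps onto a join between two bit tables if the 1-n table layout is used instead of strings.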
The reason for this is that my real-time algorithm is getting slow with 10,000 results (it is already optimized with stored functions, indexes and query optimizations), because 10,000 small and quick queries still add up.
And the score has to be calculated before I can order my query and get the right limit.
Thanks and if you have any questions let me know by comment.
** EDIT **
I'm thinking that it's also possible to not use a string but a table where the DNA-bits are stored as a 1-n relation table.
ID | CODE | ID | VALUE
----------------------
1. | C... | 2. | 4....

Database design and query optimization/general efficiency when joining 6 tables in mySQL

I have 6 tables. These are simplified for this example.
user_items
ID | user_id | item_name | version
-------------------------------------
1 | 123 | test | 1
data
ID | name | version | info
----------------------------
1 | test | 1 | info
data_emails
ID | name | version | email_id
------------------------
1 | test | 1 | 1
2 | test | 1 | 2
emails
ID | email
-------------------
1 | email@address.com
2 | second@email.com
data_ips
ID | name | version | ip_id
----------------------------
1 | test | 1 | 1
2 | test | 1 | 2
ips
ID | ip
--------
1 | 1.2.3.4
2 | 2.3.4.5
What I am looking to achieve is the following.
The user (123) has the item with name 'test'. This is the basic information we need for a given entry.
There is data in our 'data' table and the current version is 1; as such, the version in our user_items table is also 1. The two tables are linked together by the name and version. The setup is like this because a user could have an item for which we don't have data; likewise, there could be an item for which we have data but which no user owns.
For each item there are also 0 or more emails and ips associated. These can be the same for many items so rather than duplicate the actual email varchar over and over we have the data_emails and data_ips tables which link to the emails and ips table respectively based on the email_id/ip_id and the respective ID columns.
The emails and ips are associated with the data version again through the item name and version number.
My first query is: is this a good/well optimized database setup?
My next query and my main question is joining this complex data structure.
What I had was:
PHP
- get all the user items
- loop through them and get the most recent data entry (if any)
- if there is one get the respective emails
- get the respective ips
Does that count as 3 queries or essentially infinite depending on the number of user items?
I was made to believe that the above was inefficient and as such I wanted to condense my setup into using one query to get the same data.
I have achieved that with the following code
SELECT user_items.name,GROUP_CONCAT( emails.email SEPARATOR ',' ) as emails, x.ip
FROM user_items
JOIN data AS data ON (data.name = user_items.name AND data.version = user_items.version)
LEFT JOIN data_emails AS data_emails ON (data_emails.name = user_items.name AND data_emails.version = user_items.version)
LEFT JOIN emails AS emails ON (data_emails.email_id = emails.ID)
LEFT JOIN
(SELECT name,version,GROUP_CONCAT( the_ips.ip SEPARATOR ',' ) as ip FROM data_ips
LEFT JOIN ips as the_ips ON data_ips.ip_id = the_ips.ID )
x ON (x.name = data.name AND x.version = user_items.version)
I have done loads of reading to get to this point and worked tirelessly to get here.
This works as I require - this question seeks to clarify what are the benefits of using this instead?
I have had to use a subquery (I believe?) to get the ips as previously it was multiplying results (I believe based on the complex joins). How this subquery works I suppose is my main confusion.
Summary of questions.
-Is my database setup well setup for my usage? Any improvements would be appreciated. And any useful resources to help me expand my knowledge would be great.
-How does the subquery in my sql actually work - what is the query doing?
-Am I correct to keep using left joins - I want to return the user item, and null values if applicable to the right.
-Am I essentially replacing a potentially infinite number of queries with 2? Does this make a REAL difference? Can the above be improved?
-Given that when I update a version of an item in my data table I now have to update the version in the user_items table, I now have a few more update queries to do. Is the tradeoff of this setup worthwhile in practice?
Thanks to anyone who contributes to helping me get a better grasp of this !!
Given your data layout and your objective, the query is correct. If you've only got a small amount of data it shouldn't be a performance problem, but that will change quickly as the amount of data grows. However, when you have a large amount of data there are very few circumstances where you should ever see all your data in one go, which implies that the results will be filtered in some way. Exactly how they are filtered has a huge impact on the structure of the query.
How does the subquery in my sql actually work
Currently it doesn't work properly - there is no GROUP BY
Is the tradeoff off of this setup in practice worthwhile?
No - it implies that your schema is too normalized.
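To make the GROUP BY point concrete, here is a small sketch that runs the ips subquery with and without grouping. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption of this example: SQLite writes the separator as a second argument, GROUP_CONCAT(ip, ','), where MySQL uses SEPARATOR, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ips (ID INTEGER PRIMARY KEY, ip TEXT);
    CREATE TABLE data_ips (ID INTEGER PRIMARY KEY, name TEXT,
                           version INTEGER, ip_id INTEGER);
    INSERT INTO ips VALUES (1, '1.2.3.4'), (2, '2.3.4.5'), (3, '9.9.9.9');
    INSERT INTO data_ips (name, version, ip_id)
    VALUES ('test', 1, 1), ('test', 1, 2), ('other', 1, 3);
""")

# Without GROUP BY the aggregate collapses the whole table into one row,
# so every item would be joined to the same list of all ips.
UNGROUPED = """
    SELECT name, version, GROUP_CONCAT(the_ips.ip, ',') AS ip
    FROM data_ips
    LEFT JOIN ips AS the_ips ON data_ips.ip_id = the_ips.ID
"""

# With GROUP BY there is one row, and one ip list, per (name, version).
GROUPED = UNGROUPED + " GROUP BY name, version"

ungrouped_rows = conn.execute(UNGROUPED).fetchall()
grouped_rows = conn.execute(GROUPED).fetchall()
ip_lists = {(name, version): ip for name, version, ip in grouped_rows}
```

The grouped derived table yields exactly one row per item version, so joining it to user_items no longer multiplies the email rows.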

HowTo: Query MySQL to retrieve search data, while limiting the results and sorting by a field.

I have two simple Mysql tables:
SYMBOL
| id | symbol |
(INT(primary) - varchar)
PRICE
| id | id_symbol | date | price |
(INT(primary), INT(index), date, double)
I have to pass two symbols to get something like:
DATE A B
2001-01-01 | 100.25 | 25.26
2001-01-02 | 100.23 | 25.25
2001-01-03 | 100.24 | 25.24
2001-01-04 | 100.25 | 25.26
2001-01-05 | 100.26 | 25.28
2001-01-06 | 100.27 | 30.29
Where A and B are the symbols I need to search and the date is the date of the prices (because I need the same date to compare the symbols).
If one symbol doesn't have a date that the other has, I have to skip it. I only need to retrieve the last N prices of those symbols.
ORDER: from the earliest date to latest (example the last 100 prices of both)
How could I implement this query?
Thank you
Implementing these steps should bring you the desired result:
1. Get dates and prices for symbol A. (Inner join PRICE with SYMBOL to obtain the necessary rows.)
2. Similarly, get dates and prices for symbol B.
3. Inner join the two result sets on the date column and pull the price from the first result set as the A column and the other one as B.
This should be simple if you know how to join tables.
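The steps above could look roughly like the following, sketched with Python's built-in sqlite3 module so that it can be run without a MySQL server. The query only uses joins, ORDER BY and LIMIT, which MySQL also accepts, but the lowercase table names, the function name `last_common_prices` and the `?` placeholder style (MySQL drivers typically use `%s`) are assumptions of this sketch.

```python
import sqlite3


def last_common_prices(conn, symbol_a, symbol_b, n):
    """Last n dates on which both symbols have a price, oldest first."""
    sql = """
        SELECT date, a, b
        FROM (
            SELECT pa.date AS date, pa.price AS a, pb.price AS b
            FROM price AS pa
            JOIN symbol AS sa ON sa.id = pa.id_symbol AND sa.symbol = ?
            JOIN price AS pb ON pb.date = pa.date      -- same date only
            JOIN symbol AS sb ON sb.id = pb.id_symbol AND sb.symbol = ?
            ORDER BY pa.date DESC                      -- newest first ...
            LIMIT ?                                    -- ... keep the last n
        ) AS latest
        ORDER BY date ASC                              -- then oldest to latest
    """
    return conn.execute(sql, (symbol_a, symbol_b, n)).fetchall()
```

The inner joins drop any date that only one of the symbols has, and the outer query restores the earliest-to-latest order after the LIMIT has picked the most recent rows.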
I think you should update your question to resolve any of the mistakes you made in representing your data. I'm having a hard time following the details. However, I think based on what I am seeing there are four MySQL concepts you need to solve your problem.
The first is JOINs: you would use a join to put two tables together so that you can select related data using the key that you describe as "id_symbol".
The second is LIMIT, which allows you to specify the number of records to return: if you wanted one record you would use LIMIT 1, or if you wanted a hundred records, LIMIT 100.
The third is the WHERE clause, which allows you to search for a specific value in one of the fields of the table you are querying.
The last is ORDER BY, which allows you to specify a field to sort your returned records by and the direction you want them sorted, ASC or DESC.
An Example:
SELECT *
FROM table1
JOIN table2 ON table1.id = table2.table1_id
WHERE table1.searchfield = 'search string'
ORDER BY table1.orderfield DESC
LIMIT 100
(This is pseudo code so this query may not actually work but is close and should provide you with the correct idea.)
I suggest referencing the MySQL documentation found here it should provide everything you need to keep going.

creating a series of time periods as rows

I want to write a query that, for any given start date in the past, has as each row a week-long date interval up to the present.
For instance, given the start date of Nov 13th 2010, and the present date of 12-16-2010, I want a result set like
+------------+------------+
| Start | End |
+------------+------------+
| 2010-11-15 | 2010-11-21 |
+------------+------------+
| 2010-11-22 | 2010-11-28 |
+------------+------------+
| 2010-11-29 | 2010-12-05 |
+------------+------------+
| 2010-12-06 | 2010-12-12 |
+------------+------------+
It doesn't go past 12 because the week-long period that the present date occurs in isn't complete.
I can't get a foothold on how I would even start to write this query.
Can I do this in a single query? Or should I use code for looping, and do multiple queries?
It's quite difficult (but not impossible) to create such a result set dynamically in MySQL, as it doesn't support CONNECT BY or generate_series, and recursive CTEs are only available from MySQL 8.0 onwards; those are what I would use to do this in other databases.
Here's an alternative approach you can use.
Create and prepopulate a table containing all the possible rows from some date far in the past to some date far in the future. Then you can easily generate the result you need by querying this table with a WHERE clause, using an index to make the query efficient.
The drawbacks of this approach are quite obvious:
- It takes up storage space unnecessarily.
- If you query outside of the range that you populated your table with, you won't get any results, which means that you will either have to populate the table with enough dates to last the lifetime of your application or else you need a script to add more dates every so often.
See also this related question:
How do I make a row generator in MySQL
Beware this is just a concept idea: I do not have a mysql installation right here, so that I cannot test it.
However I would base myself on a table containing the integers, in order to emulate a series.
Something like :
CREATE TABLE integers_table
(
id integer primary key
);
Followed by (warning, this is pseudo code)
INSERT INTO integers_table(0…32767);
(that should be enough weeks for the rest of our lives :-)
Then
FirstMondayInUnixTimeStamp_Utc= 3600 * 24 * 4
SecondPerday=3600 * 24
(since 1 jan 1970 was a thursday. Beware I did not cross check! I might be off a few hours!)
And then
CREATE VIEW weeks
AS
-- 345600 = FirstMondayInUnixTimeStamp_Utc (3600 * 24 * 4); 86400 = seconds per day.
-- The constants are inlined because a MySQL view cannot reference variables.
SELECT id AS week_id,
FROM_UNIXTIME(345600 + id * 86400 * 7) AS week_start,
FROM_UNIXTIME(345600 + id * 86400 * 7 + 86400 * 6) AS week_end
FROM integers_table;
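If looping in application code is acceptable, the same Monday-to-Sunday intervals can be produced without any helper table. The sketch below is in Python and assumes that a week counts as complete only once its Sunday lies strictly before the present date; the function name `complete_weeks` is hypothetical.

```python
from datetime import date, timedelta


def complete_weeks(start, today):
    """Monday-to-Sunday intervals from start up to the last complete week."""
    # Move forward to the first Monday on or after the start date.
    week_start = start + timedelta(days=(7 - start.weekday()) % 7)
    weeks = []
    # Assumption: a week is complete only when its Sunday is before today.
    while week_start + timedelta(days=6) < today:
        weeks.append((week_start, week_start + timedelta(days=6)))
        week_start += timedelta(days=7)
    return weeks


weeks = complete_weeks(date(2010, 11, 13), date(2010, 12, 16))
```

With the dates from the question, the first interval starts on 2010-11-15 and the last one ends on 2010-12-12, matching the result set shown there.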