REGEXP in a WHERE statement? - mysql

I have a table with a field called 'user_car'. It consists of a cat'd underscore separated value (user's id _ car's id)
user_car rating
-----------------------------
11_56748 4
13_23939 2
1_56748 1
2001_56748 5
163_23939 1
I need to get the average rating for any "car". In my example table, there are only 2 cars listed: 56748 and 23939. So say I want to get the average rating for the car: 56748, so far I have this SQL, but I need the correct regex. If I'm totally off-base, let me know. Thanks!
$sql = "
SELECT AVG 'rating' FROM 'car_ratings'
WHERE 'user_car' REGEXP '';
";

You can extract the car id using:
substring(user_car from (locate('_', user_car) + 1))
this will allow you to do:
select substring(user_car from (locate('_', user_car) + 1)) as car_id,
avg(rating)
from car_ratings
group by car_id
But, this is a bad idea. You would be much better off splitting user_car into user_id and car_id.

I don't see why you need to use REGEXes ...
SELECT AVG(`rating`) FROM `car_ratings` WHERE `user_car` LIKE '%_56748'
Regexes are slow and can pretty easily shoot you in the foot. I learned to avoid them in MySQL whenever I could.

Related

MySQL - How can I group music together when the names are similar?

I would like to be able to return a single line when the name of some musics are the same or similar, as for example this case:
music with similar names
You can see that the names are the same with an extension like " - JP Ver." or something like that, I would like to be able to group them in one row with the first column incrementing the whole.
My current request to return these lines is as follows:
select count(id) number, name, sec_to_time(floor(sum(duration) / 1000)) time
from track
where user_id = 'value'
group by name, duration
order by number desc, time desc;
I would like to get a result like this
Thank you for reading and responding! I wish you all a good day!
Try:
SELECT COUNT(name) no,
TRIM(SUBSTRING_INDEX(name, '-', 1)) namee
FROM track
GROUP BY namee
Example: https://onecompiler.com/mysql/3xt3bfev6
Use GROUP_CONCAT
Here is a proof of concept script. You can add your other columns. I have grouped by the first 4 letters. You will probably want to use more.
CREATE TABLE track (
idd INT,
nam CHAR(50),
tim INT
);
INSERT INTO track VALUES (1,'Abba 1',5);
INSERT INTO track VALUES (2,'Abba 2',6);
INSERT INTO track VALUES (3,'Beta 1',12);
INSERT INTO track VALUES (4,'Beta 4',8);
SELECT
LEFT(nam,4) AS 'Group',
COUNT(idd) AS 'Number',
GROUP_CONCAT(DISTINCT idd ORDER BY idd ASC SEPARATOR ' & ') AS IDs,
GROUP_CONCAT(DISTINCT nam ORDER BY nam ASC SEPARATOR ', ') AS 'track names',
SUM(tim) AS 'total time'
FROM track
GROUP BY LEFT(nam,4);
DROP TABLE track;
Output
Group Number IDs track names total time
Abba 2 1 & 2 Abba 1, Abba 2 11
Beta 2 3 & 4 Beta 1, Beta 4 20

SQL:Show column only if has data in it

Suppose I have a table like this:
name | age | experience
abc | 18 | 0
def | 19 | 0
efg | 20 | 0
I want to select the experience column only if any one value is greater than zero.
In this case, my SQL query should return only name and age and not experience.
If experience of lets say "efg" is greater than 0, then query should return name, age and experience.
I have tried following query
SELECT EXISTS (SELECT name,age,experience FROM emp_info )
AND NOT
EXISTS (SELECT experience FROM emp_info WHERE experience=0 );
But it is not working.
Try this:-
IF ((select max(experience)FROM emp_info) > 0)
SELECT name,age,experience FROM emp_info
ELSE
SELECT name,age FROM emp_info
In almost all relational databases, your queries have to return fixed numbers of columns, i.e the same number of columns for all rows. So what you are asking for isn't reasonable. You could probably get something like this to work on Informix due to jagged tables support, but that's the only one I can think of.
Other options you have include serializing to JSON in your query, or generating XML but that's a bit advanced for this and it is not clear this is what you want.
Normally we handle this on the front end, not in the database query.
SELECT name, age, NULLIF(experience, 0) from emp_info
Your question is kind of tricky because returning different set of column depending on the result is maybe not what you want to do, you could do it directly from your code, not from your SQL projection.
This may work :
if(experience > 0)
begin
SELECT name,age,experience FROM emp_info
end
else
begin
SELECT name,age FROM emp_info
end

How to do this query against MySQL database table?

I was given a task to show the CPU usage trend as part of a building process which also do regression test.
Each individual test case run has a record in the table RegrCaseResult. The RegrCaseResult table looks something like this:
id projectName ProjectType returnCode startTime endTime totalMetrics
1 'first' 'someType' 16 'someTime' 'someOtherTime' 222
The RegrCaseResult.totalMetrics is a special key which links to another table called ThreadMetrics through ThreadMetrics.id.
Here is how ThreadMetrics will look like:
id componentType componentName cpuTime linkId
1 'Job Totals' 'Job Totals' 'totalTime' 34223
2 'parser1' 'parser1' 'time1' null
3 'parser2' 'generator1' 'time2' null
4 'generator1' 'generator1' 'time3' null
------------------------------------------------------
5 'Job Totals' 'Jot Totals' 'totalTime' 9899
...
The rows with the compnentName 'Job Totals' is what the totalMetrics from RegrCaseResult table will link to and the 'totalTime' is what I am really want to get given a certain projectType. The 'Job Totals' is actually a summation of the other records - in the above example, the summation of time1 through time3. The linkId at the end of table ThreadMetrics can link back to RegrCaseResult.id.
The requirements also states I should have a way to enforce the condition which only includes those projects which have a consistent return code during certain period. That's where my initial question comes from as follows:
I created the following simple table to show what I am trying to achieve:
id projectName returnCode
1 'first' 16
2 'second' 16
3 'third' 8
4 'first' 16
5 'second' 8
6 'first' 16
Basically I want to get all the projects which have a consistent returnCode no matter what the returnCode values are. In the above sample, I should only get one project which is "first". I think this would be simple but I am bad when it comes to database. Any help would be great.
I tried my best to make it clear. Hope I have achieved my goal.
Here is an easy way:
select projectname
from table t
group by projectname
having min(returncode) = max(returncode);
If the min() and max() values are the same, then all the values are the same (unless you have NULL values).
EDIT:
To keep 'third' out, you need some other rule, such as having more than one return code. So, you can do this:
select projectname
from table t
group by projectname
having min(returncode) = max(returncode) and count(*) > 1;
select projectName from projects
group by projectName having count(distinct(returnCode)) = 1)
This would also return projects which has only one entry.
How do you want to handle them?
Working example: http://www.sqlfiddle.com/#!2/e7338/8
This should do it:
SELECT COUNT(ProjectName) AS numCount, ProjectName FROM (
SELECT ProjectName FROM Foo
GROUP BY ProjectName, ReturnCode
) AS Inside
GROUP BY Inside.ProjectName
HAVING numCount = 1
This groups all the ProjectNames by their names and return codes, then selects those that only have a single return code listed.
SQLFiddle Link: http://sqlfiddle.com/#!2/c52b6/11/0
You can try something like this with Not Exists:
Select Distinct ProjectName
From Table A
Where Not Exists
(
Select 1
From Table B
Where B.ProjectName = A.ProjectName
And B.ReturnCode <> A.ReturnCode
)
I'm not sure exactly what you're selecting, so you can change the Select statement to what you need.

Selecting value corresponding with MAX value of a group

I am trying to get the OXSEOURL of my OXSEO table.
Structure:
oxobjectid | oxseourl | oxparams
Data:
http://imageshack.com/a/img268/7443/3xr4.png
http://imageshack.com/a/img42/315/8bdu.png
My deepest SEO URL always has the higher value in OXPARAMS field.
Only the numeric values, the others are never count..
Return should be:
http://imageshack.com/a/img29/8404/4jbv.png
I found a solution yesterday, but it was very slow, now I am trying to get a faster way to do it.
So I would like to get the oxseourl for the same oxobjectid with the max oxparams value.
I have more than 330.000 rows, so every ms counts..
I only have to select the urls for products staring with "tbproduct_" objectid.
My query:
SELECT seo2.oxseourl, seo2.oxobjectid, seo2.oxparams
FROM oxseo AS seo2
JOIN (
SELECT oxobjectid,
MAX(oxparams) AS maxparam
FROM oxseo
GROUP BY
oxobjectid
) AS usm
ON usm.maxparam = seo2.oxparams
WHERE seo2.oxobjectid LIKE '%tbproduct_%'
AND seo2.oxparams REGEXP '^-?[0-9]+$'
But this returns the same rows for the products.
Thanks for any help.
A bit optimized, and a lot faster:
SELECT seo.oxseourl, seo.oxobjectid, MAX(seo.oxparams)
FROM oxseo AS seo
WHERE seo.oxobjectid LIKE 'tbproduct_%' AND seo.oxparams REGEXP '^-?[0-9]+$'
GROUP BY seo.oxseourl, seo.oxobjectid

Need Help streamlining a SQL query to avoid redundant math operations in the WHERE and SELECT

*Hey everyone, I am working on a query and am unsure how to make it process as quickly as possible and with as little redundancy as possible. I am really hoping someone there can help me come up with a good way of doing this.
Thanks in advance for the help!*
Okay, so here is what I have as best I can explain it. I have simplified the tables and math to just get across what I am trying to understand.
Basically I have a smallish table that never changes and will always only have 50k records like this:
Values_Table
ID Value1 Value2
1 2 7
2 2 7.2
3 3 7.5
4 33 10
….50000 44 17.2
And a couple tables that constantly change and are rather large, eg a potential of up to 5 million records:
Flags_Table
Index Flag1 Type
1 0 0
2 0 1
3 1 0
4 1 1
….5,000,000 1 1
Users_Table
Index Name ASSOCIATED_ID
1 John 1
2 John 1
3 Paul 3
4 Paul 3
….5,000,000 Richard 2
I need to tie all 3 tables together. The most results that are likely to ever be returned from the small table is somewhere in the neighborhood of 100 results. The large tables are joined on the index and these are then joined to the Values_Table ON Values_Table.ID = Users_Table.ASSOCIATED_ID …. That part is easy enough.
Where it gets tricky for me is that I need to return, as quickly as possible, a list limited to 10 results where value1 and value2 are mathematically operated on to return a new_ value where that new_value is less than 10 and the result is sorted by that new_value and any other where statements I need can be applied to the flags. I do need to be able to move along the limit. EG LIMIT 0,10 / 11,10 / 21,10 etc...
In a subsequent (or the same if possible) query I need to get the top 10 count of all types that matched that criteria before the limit was applied.
So for example I want to join all of these and return anything where Value1 + Value2 < 10 AND I also need the count.
So what I want is:
Index Name Flag1 New_Value
1 John 0 9
2 John 0 9
5000000 Richard 1 9.2
The second response would be:
ID (not index) Count
1 2
2 1
I tried this a few ways and ultimately came up with the following somewhat ugly query:
SELECT INDEX, NAME, Flag1, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value1) < 10
ORDER BY New_Value
LIMIT 0,10
And then for the count:
SELECT ID, COUNT(TYPE) as Count, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value1) < 10
GROUP BY TYPE
ORDER BY New_Value
LIMIT 0,10
Being able to filter on the different flags and such in my WHERE clause is important; that may sound stupid to comment on but I mention that because from what I could see a quicker method would have been to use the HAVING statement but I don't believe that will work in certain instance depending on what I want to use my WHERE clause to filter against.
And when filtering using the flags table :
SELECT INDEX, NAME, Flag1, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value1) < 10 AND Flag1 = 0
ORDER BY New_Value
LIMIT 0,10
...filtered count:
SELECT ID, COUNT(TYPE) as Count, (Value1 * some_variable + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE (Value1 * some_variable + Value1) < 10 AND Flag1 = 0
GROUP BY TYPE
ORDER BY New_Value
LIMIT 0,10
That works fine but has to run the math multiple times for each row, and I get the nagging feeling that it is also running the math multiple times on the same row in the Values_table table. My thought was that I should just get only the valid responses from the Values_table first and then join those to the other tables to cut down on the processing; with how SQL optimizes things though I wasn't sure if it might not already be doing that. I know I could use a HAVING clause to only run the math once if I did it that way but I am uncertain how I would then best join things.
My questions are:
Can I avoid running that math twice and still make the query work
(or I suppose if there is a good way
to make the first one work as well
that would be great)
What is the fastest way to do this
as this is something that will
be running very often.
It seems like this should be painfully simple but I am just missing something stupid.
I contemplated pulling into a temp table then joining that table to itself but that seems like I would trade math for iterations against the table and still end up slow.
Thank you all for your help in this and please let me know if I need to clarify anything here!
** To clarify on a question, I can't use a 3rd column with the values pre-calculated because in reality the math is much more complex then addition, I just simplified it for illustration's sake.
Do you have a benchmark query to compare against? Usually it doesn't work to try to outsmart the optimizer. If you have acceptable performance from a starting query, then you can see where extra work is being expended (indicated by disk reads, cache consumption, etc.) and focus on that.
Avoid the temptation to break it into pieces and solve those. That's an antipattern. That includes temp tables especially.
Redundant math is usually ok - what hurts is disk activity. I've never seen a query that needed CPU work reduction on pure calculations.
Gather your results and put them in a temp table
SELECT * into TempTable FROM (SELECT INDEX, NAME, Type, ID, Flag1, (Value1 + Value2) as New_Value
FROM Values_Table
JOIN Users_Table ON ASSOCIATED_ID = ID
JOIN Flags_Table ON Flags_Table.Index = Users_Table.Index
WHERE New_Value < 10)
ORDER BY New_Value
LIMIT 0,10
Return Result for First Query
SELECT INDEX, NAME, Flag1, New_Value
FROM TempTable
Return Results for count of Types
Select ID, Count(Type)
FROM TempTable
GROUP BY TYPE
Is there any chance that you can add a third column to the values_table with the pre-calculated value? Even if the result of your calculation is dependent on other variables, you could run the calculation for the whole table but only when those variables change.