Is This A Dynamic Table? Or what - mysql

I came across this line of code in a MySQL script I'm trying to optimize (the script takes over 7 hours to run). I discovered that this line is responsible for over 60% of the execution time.
# Fill temp table
SELECT DISTINCT clv_temp(view01.u_mail, view01.u_phone) AS `Authentic`
FROM (
    SELECT DISTINCT u_mail, u_phone
    FROM Cust_orders
    ORDER BY order_date ASC
) view01;

The excessive runtime is presumably in the definition of the custom function clv_temp, so you will need to find that function's definition.
Note that this function is currently being run for every row returned by the sub-query - i.e. for every unique combination of u_mail and u_phone in the Cust_orders table. This is generally a very inefficient way of processing data; what you will probably need to do is implement the logic currently performed by clv_temp in a set-wise manner, rather than one row at a time.
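As an illustration only: if clv_temp were, say, totalling each customer's order history one row at a time, the same result could usually be produced in a single set-wise pass. The column order_total below is an assumption for the sake of the sketch; the real rewrite depends entirely on what clv_temp's body actually does.

```sql
-- Hypothetical set-wise rewrite: compute the per-customer value once
-- with an aggregate, instead of invoking a UDF for every row.
-- (order_total is an assumed column; adapt to clv_temp's real logic.)
SELECT u_mail, u_phone, SUM(order_total) AS Authentic
FROM Cust_orders
GROUP BY u_mail, u_phone;
```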

New MySQL user while loop questions

I have to use this for a project at work, and am running into some trouble. I have a large database (58 million rows) that I have figured out how to query down to what I want, and then write that row into a separate table. Here is my code so far:
insert into emissionfactors(pollutantID,fuelTypeID,sourceTypeID,emissionFactor)
select pollutantID,fuelTypeID,sourceTypeID,avg(ratePerDistance) as emissionFactor
from onroad_run_1.rateperdistance
where pollutantID=45
and fuelTypeID=2
and sourceTypeID=32;
I have about 60 different pollutant IDs, and currently I am manually changing the pollutantID number on line 5 and executing the script to write the row into my 'emissionfactors' table. Each run takes 45 seconds, and I have several other fuel types and source types to do, so this could take something like 8 hours of clicking every 45 seconds. I have some training in MATLAB and thought I could put a while loop around the above code, create an index, and have it loop through from 1 to 184 on the pollutant IDs, but I can't seem to get it to work.
Here are my goals:
- loop the pollutantID from 1 to 184.
-- not all integers in this range are present, so if the current index value is not found in the pollutantID column, simply add one to the index and check again.
-- if the index number is found in the pollutant ID column, execute my above code to write the data into my other table
You do not need a while loop. All you need is to change your WHERE clause to use a BETWEEN clause, and also tell it what you want to base the average on by adding a GROUP BY clause:
insert into emissionfactors(pollutantID,fuelTypeID,sourceTypeID,emissionFactor)
select pollutantID,fuelTypeID,sourceTypeID,avg(ratePerDistance) as emissionFactor
from onroad_run_1.rateperdistance
where pollutantID BETWEEN 1 AND 184
and fuelTypeID=2
and sourceTypeID=32
GROUP BY pollutantID , fuelTypeID, sourceTypeID;
If in fact you want the entire range of pollutantID, fuelTypeID and sourceTypeID values that exist, you can just remove the WHERE clause altogether.
insert into emissionfactors(pollutantID,fuelTypeID,sourceTypeID,emissionFactor)
select pollutantID,fuelTypeID,sourceTypeID,avg(ratePerDistance) as emissionFactor
from onroad_run_1.rateperdistance
GROUP BY pollutantID , fuelTypeID, sourceTypeID;
You also don't need to check whether a row exists before executing the query: if the SELECT returns no rows, nothing is inserted.
As to the speed issue, you will need to look at adding some indexes to your table to improve performance. In this case a composite index covering pollutantID, fuelTypeID and sourceTypeID would speed things up greatly.
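For example, such an index might be created like this (the index name is made up; the grouped/filtered columns come first, with ratePerDistance appended so the average can be computed from the index alone):

```sql
-- Hypothetical index supporting the WHERE, GROUP BY and AVG above.
CREATE INDEX idx_pollutant_fuel_source
    ON onroad_run_1.rateperdistance (pollutantID, fuelTypeID, sourceTypeID, ratePerDistance);
```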
My advice: ask for help at work. It is better to admit early that you do not know how to do something and get proper help. You also mention that there are other fuel types you want, but the details of that are missing from your question.

MySQL UPDATE with JOIN, SELECT and GROUP BY?

PRETEXT:
I'm currently trying to create detailed and correct call statistics for our customers. Previously, Asterisk stored call details in one row per call. One row is not enough to store all the possible things that can happen to a call, and the data would often be misleading or wrong. They now have a new table format that stores one row per event in the call. This table will easily contain millions of records for our larger customers.
We tried selecting directly from this table on a test server with only 200k records, but the more advanced queries ended up taking forever. We decided to create a summary table (so we're back to one row per call, but now with more details available). There are many things I'd like to do with this data, but if I can solve this "simple" problem, I'm sure I'll solve all the others as well.
PROBLEM:
The field linkedid is the same for all rows in one call. The field eventtype can have zero, one or multiple occurrences of the same event for one linkedid.
I fill the summary table with some data:
INSERT INTO astcel_summary
(linkedid, starttime, endtime, callfrom, callto, direction)
SELECT
linkedid, MIN(eventtime), MAX(eventtime), cid_num, exten, IF ((context = 'from-extension'), 1, 0)
FROM astcel
GROUP BY linkedid;
The BRIDGE_START event is especially important as it indicates that a person answered the call. The call can be unanswered, answered or even answered multiple times (transfer, conference). I want to UPDATE my summary table with several fields from the first (if any) BRIDGE_START event for each call.
I've been able to update one field at a time like this:
UPDATE astcel_summary, astcel
SET astcel_summary.answertime =
(
SELECT eventtime
FROM astcel
WHERE astcel.linkedid = astcel_summary.linkedid
AND astcel.eventtype = 'BRIDGE_START'
GROUP BY astcel.linkedid
)
WHERE astcel.linkedid = astcel_summary.linkedid
AND astcel.eventtype = 'BRIDGE_START';
I've tried many variations with different joins and subqueries to update multiple fields, but can't make it work. If this operation could also be merged into the original INSERT somehow, that would be amazing.
Even better would be a way to select this directly, without using summary tables and without it taking too long - for instance: the average time callers waited before being answered (plus several other similar pieces of data) during business hours last month, grouped by the numbers called.
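For illustration, one join-based variation of the multi-column UPDATE locates the first BRIDGE_START per call in a derived table and joins back to that event row. This is an untested sketch; answeredby is a stand-in for whatever extra summary columns are wanted.

```sql
-- Sketch: copy several columns from the earliest BRIDGE_START row
-- of each call into the summary table in one statement.
UPDATE astcel_summary s
JOIN (
    SELECT linkedid, MIN(eventtime) AS first_bridge
    FROM astcel
    WHERE eventtype = 'BRIDGE_START'
    GROUP BY linkedid
) fb ON fb.linkedid = s.linkedid
JOIN astcel a
    ON a.linkedid = fb.linkedid
   AND a.eventtime = fb.first_bridge
   AND a.eventtype = 'BRIDGE_START'
SET s.answertime = a.eventtime,
    s.answeredby = a.cid_num;   -- answeredby is a hypothetical column
```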

How to improve the performance of a query

I have this query:
SELECT PIXEL_X as 'X_Coord', PIXEL_Y as 'Y_Coord',
CONVERTWATTS2DBM_udf(SUM(L2_VALUE)/SUM(L3_VALUE)) as 'Pixel_Value'
FROM table
WHERE
('GSM 850/900' like CONCAT('%',FILTER2,'/%') OR
'GSM 850/900' like CONCAT('%/',FILTER2,'%') )
GROUP BY X_Coord, Y_Coord;
but it takes a long time. Could you help me improve its performance?
Thanks
A straightforward way to optimize would be this:
Build the filter value yourself, in whatever language you use to access the database.
Write one query for each option of 'GSM 850/900', then join them together using UNION, like so:
SELECT PIXEL_X as 'X_Coord', PIXEL_Y as 'Y_Coord',
CONVERTWATTS2DBM_udf(SUM(L2_VALUE)/SUM(L3_VALUE)) as 'Pixel_Value'
FROM table WHERE
'GSM 850/900' like '%YOURVALUE1%'
GROUP BY X_Coord, Y_Coord
UNION
SELECT PIXEL_X as 'X_Coord', PIXEL_Y as 'Y_Coord',
CONVERTWATTS2DBM_udf(SUM(L2_VALUE)/SUM(L3_VALUE)) as 'Pixel_Value'
FROM table WHERE
'GSM 850/900' like '%YOURVALUE2%'
GROUP BY X_Coord, Y_Coord
This should speed up the query.
Furthermore, it would speed the query up a lot if you precomputed the values you currently generate on the fly. You could add a column and populate it with the CONVERTWATTS2DBM_udf result in a BEFORE INSERT trigger. This removes the need to run the function on every row, on every run of the query.
Lastly, a composite index over PIXEL_X, PIXEL_Y and your newly created column could speed up the query further.
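A sketch of that trigger approach, using the table and UDF names from the question (the column and trigger names are made up; note that the original query applies the UDF to a ratio of sums, so a precomputed per-row value is only equivalent if the function can legitimately be applied per row):

```sql
-- Sketch: precompute the UDF result at insert time so queries read
-- a stored column instead of calling the function for every row.
ALTER TABLE `table` ADD COLUMN pixel_value_dbm DOUBLE;

CREATE TRIGGER trg_pixel_value_bi
BEFORE INSERT ON `table`
FOR EACH ROW
SET NEW.pixel_value_dbm = CONVERTWATTS2DBM_udf(NEW.L2_VALUE / NEW.L3_VALUE);
```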

remove duplicates in mysql database

I have a table with columns latitude and longitude. In most cases the value extends well past two decimal places, e.g. -81.7770051972473; on rare occasions a record only has two decimals, like -81.77.
How do I find duplicates and remove one of each duplicate pair, but only for the records that extend beyond two decimal places?
Using some creative substring, cast, and charindex logic, I came up with this:
delete l1
from
    latlong l1
    inner join (
        select
            id,
            substring(cast(latitude as char), 1, instr(cast(latitude as char), '.') + 2) as truncatedLat
        from
            latlong
    ) l2 on
        l1.id <> l2.id
        and l1.latitude = cast(l2.truncatedLat as decimal(10, 2))
Before running, try select * in lieu of delete l1 first to make sure you're deleting the right rows.
I should note that this worked on SQL Server using functions I know exist in MySQL, but I wasn't able to test it against a MySQL instance, so some small tweaks may be needed. For example, in SQL Server I used charindex instead of instr, but both should work similarly.
I'm not sure how to do that purely in SQL.
I have used scripting languages like PHP or CFML to solve similar needs by building a query to pull the records, then looping over the record set and performing some comparison. If it matches, then VERY CAREFULLY call another function, passing in the record ID, and delete the record. I would probably even leave the record in the table and just mark another column as isDeleted.
If you are more ambitious than I am, it looks like these threads are close to what you want:
Deleting Duplicates in MySQL
finding multi column duplicates mysql
Using an external programming language (Perl, PHP, Java, Assembly...):
Select * from database
For each row, select * from database where newLat >= round(oldLat,2) and newLat < round(oldLat,2) + .01, plus the same criteria for longitude.
Keep one of them based on whatever criteria you choose. If keeping the lowest primary key, sort by it and skip the first result.
Delete everything else.
Repeat from the second step, skipping any records you have already deleted.
If for some reason you want to identify everything with greater than two-digit precision:
select * from database where lat != round(lat,2) or long != round(long,2)

MySQL Query eliminate duplicates but only adjacent to each other

I have the following query..
SELECT Flights.flightno,
Flights.timestamp,
Flights.route
FROM Flights
WHERE Flights.adshex = '400662'
ORDER BY Flights.timestamp DESC
Which returns the following screenshot.
However, I cannot use a simple GROUP BY because, for example, BCS6515 also appears much later in the list, and I only want to "condense" rows that are identical and adjacent in this list.
An example of the output (note BCS6515 twice in this list as they were not adjacent in the first query)
Which is why a GROUP BY flightno will not work.
I don't think there's a good way to do this in SQL without a column to help you. At best, I'm thinking it would require a subquery that would be ugly and inefficient. You have two options that would probably end up with better performance.
One would be to code the logic yourself to prune the results. This can be done with a PROCEDURE clause on the SELECT statement, if you want to handle it on the database server side.
Another would be to either use other information in the table or add new information to the table for this purpose. Do you currently have something in your table that is a different value for each instance of a number of BCS6515 rows?
If not, and if I'm making correct assumptions about the data in your table, there will be only one flight with the same number per day, though the flight number is reused to denote a flight with the same start/end and times on other days (e.g. the 10 a.m. from NRT to DTW has the same flight number every day). If the timestamps were always on the same day, you could use DAY(timestamp) in the GROUP BY. However, that doesn't allow for overnight flights. Thus, you'll probably need something such as a departure date to group by, to identify all the rows belonging to the same physical flight.
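For completeness, the "ugly subquery" mentioned above can be approximated in older MySQL with user variables: number each run of adjacent identical flight numbers, then group on that run number. This is an untested sketch, and the behavior of assigning user variables inside a SELECT is not guaranteed by MySQL.

```sql
-- Sketch: collapse adjacent rows with the same flightno into one,
-- by tagging each contiguous run with a run number and grouping on it.
SELECT flightno, MAX(timestamp) AS timestamp, route
FROM (
    SELECT f.flightno, f.timestamp, f.route,
           @run  := IF(f.flightno = @prev, @run, @run + 1) AS run,
           @prev := f.flightno AS prev
    FROM Flights f
    CROSS JOIN (SELECT @prev := NULL, @run := 0) vars
    WHERE f.adshex = '400662'
    ORDER BY f.timestamp DESC
) t
GROUP BY run, flightno, route;
```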
GROUP BY does not work because the timestamp value differs between the two BCS6515 records.
It will only work if you leave timestamp out of the select list:
SELECT Flights.flightno,
Flights.route
FROM Flights
WHERE Flights.adshex = '400662'
GROUP BY (Flights.flightno)