New MySQL user while loop questions - mysql

I have to use this for a project at work, and am running into some trouble. I have a large database (58 million rows) that I have figured out how to query down to what I want, writing the resulting row into a separate table. Here is my code so far:
insert into emissionfactors(pollutantID,fuelTypeID,sourceTypeID,emissionFactor)
select pollutantID,fuelTypeID,sourceTypeID,avg(ratePerDistance) as emissionFactor
from onroad_run_1.rateperdistance
where pollutantID=45
and fuelTypeID=2
and sourceTypeID=32;
I have about 60 different pollutant IDs, and currently I am manually changing the pollutantID number on line 5 and executing the script to write the row into my 'emissionfactors' table. Each run takes 45 seconds, and I have several other fuel types and source types to do, so this could take something like 8 hours of clicking every 45 seconds. I have some training in MATLAB and thought I could put a while loop around the above code, create an index variable, and have it loop from 1 to 184 over the pollutant IDs, but I can't seem to get it to work.
Here are my goals:
- loop the pollutantID from 1 to 184.
-- not all integers in this range exist in the data, so if an index value is not found in the pollutantID column, it should simply add one to the index and check again.
-- if the index number is found in the pollutant ID column, execute my above code to write the data into my other table

You do not need a while loop. All you need is to change your WHERE clause to use BETWEEN, and to tell MySQL what to base the average on by adding a GROUP BY clause:
insert into emissionfactors(pollutantID,fuelTypeID,sourceTypeID,emissionFactor)
select pollutantID,fuelTypeID,sourceTypeID,avg(ratePerDistance) as emissionFactor
from onroad_run_1.rateperdistance
where pollutantID BETWEEN 1 AND 184
and fuelTypeID=2
and sourceTypeID=32
GROUP BY pollutantID , fuelTypeID, sourceTypeID;
If in fact you want the entire range of pollutantID, fuelTypeID and sourceTypeID values that exist, you can remove the WHERE clause altogether.
insert into emissionfactors(pollutantID,fuelTypeID,sourceTypeID,emissionFactor)
select pollutantID,fuelTypeID,sourceTypeID,avg(ratePerDistance) as emissionFactor
from onroad_run_1.rateperdistance
GROUP BY pollutantID , fuelTypeID, sourceTypeID;
You also don't need to check whether a row exists before executing the query: if the SELECT matches no rows, nothing is inserted.
As to the speed issue, you will need to look at adding some indexes to your table to improve performance. In this case an index covering pollutantID, fuelTypeID and sourceTypeID would speed things up greatly.
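For example, something along these lines (the index name is illustrative, so adjust it to your own conventions):
ALTER TABLE onroad_run_1.rateperdistance
ADD INDEX idx_pollutant_fuel_source (pollutantID, fuelTypeID, sourceTypeID);
With all three filter columns in one composite index, MySQL can satisfy both the WHERE and the GROUP BY from the index alone.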
My advice: ask for help at work. It is better to admit early that you do not know how to do something and to get proper help. You also mention that you have different fuel types to process, but those details are missing from your question.

Related

MySQL Layout needs validation

I would like to make sure, my table layout is production safe. Maybe you guys could give me some advice about the design. So my table looks like this:
AI_Index  PartName   String1    String2    ... String5  TINYINT1  TINYINT2  TINYINT3
1         Example L  somestuff  morestuff  ...          0         1         0
2         Example X  morestuff  andmore    ...          1         1         1
For clarification:
AI_Index auto-increments for each row added. PartName represents a filename. All the other strings and tinyints describe the value in String1. PartName will most likely not be unique; each name will appear about 5-10 times. A first estimate is about 1,000 different parts, so the database will have about 5,000-10,000 rows.
I'm connecting to the DB with VB.NET and the MySQL connector. If you open a part in SolidWorks, a query checks whether the active document is present in the PartName column. If so, a UserForm shows up, displaying all values from column String1 along with the values from String2, String3, String4, String5, Tinyint1, Tinyint2 and Tinyint3. There are about 40 people working with SolidWorks, changing active parts frequently. That means ~500 queries a minute just checking whether the part is present.
My questions are as follows:
Does it make sense to add an index on PartName? I have read many times that a bad index decision can make the database slower.
What could a powerful query look like? I suspect that if I create a view with a SELECT DISTINCT PartName, the query for the active part will be faster. Is this right?
Does it make sense to create a MySQL function that returns a TINYINT indicating whether the ActiveDoc is present in PartName? Would the view or the function be faster?
Could you just:
SELECT String1, String2, String3, String4, String5, Tinyint1, Tinyint2, Tinyint3
FROM parts  -- substitute your actual table name
WHERE PartName = 'Example';
If the result is empty, there is no need to show the form; otherwise you already have the data at hand to display. One query, one result set.
DISTINCT will make it slower. You would have to do some stress testing to see whether your server can handle 500 queries per minute, as there are a lot of factors involved besides just the index and the query.
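If you do decide to index PartName, it is a one-liner (again assuming the table is named parts):
ALTER TABLE parts ADD INDEX idx_partname (PartName);
With only 5,000-10,000 rows, that single index should make each lookup effectively instant.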

Can I optimize such a MySQL query without using an index?

A simplified version of my MySQL db looks like this:
Table books (ENGINE=MyISAM)
id <- KEY
publisher <- LONGTEXT
publisher_id <- INT <- This is a new field that is currently null for all records
Table publishers (ENGINE=MyISAM)
id <- KEY
name <- LONGTEXT
Currently books.publisher holds values that keep getting repeated, whereas publishers.name holds each value only once.
I want to get rid of books.publisher and instead populate the books.publisher_id field.
The straightforward SQL code that describes what I want done, is as follows:
UPDATE books
JOIN publishers ON books.publisher = publishers.name
SET books.publisher_id = publishers.id;
The problem is that I have a big number of records, and even though it works, it's taking forever.
Is there a faster solution than using something like this in advance?:
CREATE INDEX publisher ON books (publisher(20));
Your question title says ".. optimize ... query without using an index?"
What have you got against using an index?
You should always examine the execution plan when a query is running slowly. My guess is that it has to scan the publishers table for each row of books in order to find a match. It would make sense to have an index on publishers.name to speed up the lookup of an id.
You can drop the index later, but it wouldn't hurt to leave it in, since you say the process will have to run for a while until other changes are made. I imagine the publishers table doesn't get updated very frequently, so INSERT and UPDATE performance on that table should not be an issue.
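Since publishers.name is a LONGTEXT column, MySQL requires a prefix length when indexing it. A sketch (the 40-character prefix is an assumption; size it to your actual publisher names):
CREATE INDEX idx_publisher_name ON publishers (name(40));  -- prefix length required for TEXT columns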
There are a few problems here that might be helped by optimization.
First of all, a few thousand rows doesn't count as "big" ... that's "medium."
Second, in MySQL saying "I want to do this without indexes" is like saying "I want to drive my car to New York City, but my tires are flat and I don't want to pump them up. What's the best route to New York if I'm driving on my rims?"
Third, you're using a LONGTEXT column for your publisher. Is there some reason not to use a fully indexable datatype like VARCHAR(200)? If you do that, your WHERE clause will run faster, index or none. Large-scale library catalog systems limit the length of the publisher field, so your system can too.
Fourth, from one of your comments this looks like a routine data-maintenance update, not a one-time conversion. So you need to figure out how to avoid repeating the whole operation over and over. I am guessing here, but it looks like newly inserted rows in your books table have a publisher_id of zero, and your query updates that column to a valid value.
So here's what to do. First, put an index on books.publisher_id.
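For example (the index name is illustrative):
ALTER TABLE books ADD INDEX idx_publisher_id (publisher_id);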
Second, run this variant of your maintenance query:
UPDATE books
JOIN (
    SELECT b.id
    FROM books b
    JOIN publishers p ON b.publisher = p.name
    WHERE b.publisher_id = 0
    LIMIT 100
) batch ON books.id = batch.id
JOIN publishers ON books.publisher = publishers.name
SET books.publisher_id = publishers.id;
(MySQL does not allow LIMIT directly on a multi-table UPDATE, hence the derived table that picks 100 rows per batch.)
This will limit your update to rows that haven't yet been updated, 100 rows at a time. In your weekly data-maintenance job, re-issue this query until MySQL reports that zero rows were affected (look at mysqli::$affected_rows or the equivalent in your PHP-to-MySQL interface). That's a great way to monitor database update progress and keep your update operations from getting out of hand.
Your update query has invalid syntax but you can fix that later. The way to get it to run faster is to add a where clause so that you are only updating the necessary records.

Is This A Dynamic Table? Or what

I came across this line of code in a MySQL script I'm trying to optimize (the script takes over 7 hours to run). I discovered that this line is responsible for over 60% of the execution time.
# Fill temp table
SELECT DISTINCT
    clv_temp(view01.u_mail, view01.u_phone) AS `Authentic`
FROM (
    SELECT DISTINCT u_mail, u_phone
    FROM Cust_orders
    ORDER BY order_date ASC
) view01;
The excessive runtime is presumably inside the custom function clv_temp, so you will need to find its definition.
Note that this function is currently being run once for every row returned by the sub-query, i.e. for every unique combination of u_mail and u_phone in the Cust_orders table. This is generally a very inefficient way of processing data; what you will probably need to do is implement the logic currently performed by clv_temp in a set-wise manner, rather than one row at a time.
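For illustration only, since the body of clv_temp isn't shown: if the function were, say, counting each customer's orders, the per-row calls could be replaced by a single aggregate query over the whole table:
-- hypothetical set-based replacement for a per-row counting function
SELECT u_mail, u_phone, COUNT(*) AS order_count
FROM Cust_orders
GROUP BY u_mail, u_phone;
Whatever clv_temp actually computes, the goal is the same: express it as one query over all rows instead of one function call per row.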

Insert random number into table upon new record creation

I would like to store random numbers in one MySql table, randomly retrieve one and insert it into another table column each time a new record is created. I want to delete the retrieved number from the random number table as it is used.
The random numbers are 3 digits long, and there are 900 of them.
I have read several posts here that describe the problems using unique random numbers and triggering their insertion. I want to use this method as it seems to be reliable while generating few problems.
Can anyone here give me an example of an SQL query that will accomplish the above? (If an SQL query is not the recommended way to do this, please feel free to recommend a better method.)
Thank you for any help you can give.
I put together the two suggestions here and tried this trigger and query:
CREATE TRIGGER rand_num BEFORE INSERT ON uau3h_users
FOR EACH ROW
    INSERT INTO uau3h_users (member_number)
    SELECT random_number FROM uau3h_rand900
    WHERE random_number NOT IN (SELECT member_number FROM uau3h_users)
    ORDER BY random_number
    LIMIT 1;
But it seems that there is already a trigger attached to that table, so the new one caused a conflict; things stopped working until I removed it. Any ideas about how to accomplish the same thing using another method?
You are only dealing with 900 records, so performance is not a major issue.
If you are doing a single insert into a table, you can do something like the following:
insert into t(rand)
select rand
from rand900
where rand not in (select rand from t)
order by rand()
limit 1
In other words, you don't have to continually delete from one table and move to the other. You can just choose to insert values that don't already exist. If performance is a concern, then indexes will help in this case.
More than likely you need to take a look at triggers. They let you do things automatically, for instance right after inserting a record into a table. Refer to this link for more details:
http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html
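Beyond the linked manual page, here is a sketch of what such a trigger could look like (table and column names are taken from your post; note that a trigger cannot INSERT into the table it is defined on, which may also be why your earlier attempt failed, and that before MySQL 5.7 a table may have only one trigger per timing and event, so this logic would have to be merged into your existing trigger):
DELIMITER //
CREATE TRIGGER assign_member_number
BEFORE INSERT ON uau3h_users
FOR EACH ROW
BEGIN
    DECLARE num INT;
    -- pick one number at random from the pool
    SELECT random_number INTO num
    FROM uau3h_rand900
    ORDER BY RAND()
    LIMIT 1;
    -- assign it to the row being inserted
    SET NEW.member_number = num;
    -- remove it from the pool so it cannot be handed out twice
    DELETE FROM uau3h_rand900 WHERE random_number = num;
END//
DELIMITER ;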

Optimized SELECT query in MySQL

I have a very large number of rows in my table, table_1. Sometimes I just need to retrieve a particular row.
I assume that when I use a SELECT query with a WHERE clause, it scans from the very first row until it finds a match.
Is there any way to make the query jump to a particular row and then start from that row?
Example:
Suppose there are 50,000,000 rows and the id I want to search for is 53750. What I need is for the search to start from row 50,000, saving the time of searching through the first 49,999 rows.
I don't know the exact term for this, since I am no SQL expert!
You need to create an index : http://dev.mysql.com/doc/refman/5.1/en/create-index.html
ALTER TABLE table_1 ADD UNIQUE INDEX (id);
The way I understand it, you want to select a row with id 53750. If you have a field named id you could do this:
SELECT * FROM table_1 WHERE id = 53750
Along with an index on the id field, that's the fastest way to do it, as far as I know.
ALTER TABLE table_1 ADD UNIQUE INDEX (<column>);
Would be a great first step if it has not been generated automatically. You can also use:
EXPLAIN <your query here>
To see which kind of query works best in this case. Note that if you change the WHERE clause in the future and a particular column keeps recurring in it, it is a good idea to put an index on that column as well.
Create an index on the column you want to do the SELECT on:
CREATE INDEX index_1 ON table_1 (id);
Then, select the row just like you would before.
But also, please read up on databases, database design and optimization. Your question is full of false assumptions. Don't just copy and paste our answers verbatim. Get educated!
There are several things to know about optimizing SELECT queries, like range and WHERE clause optimization; the documentation is pretty informative about this issue, so read the section Optimizing SELECT Statements. Creating an index on the column you evaluate is very helpful for performance too.
One possible solution: you could create a view and then query the view. Here are details on creating a view and obtaining data from it:
http://www.w3schools.com/sql/sql_view.asp
You could then split that huge number of rows across many views (i.e. rows 1-10000 in one view, 10001-20000 in another) and query the appropriate view.
I am pretty sure that any self-respecting SQL database does not start looping from the first row to find the desired row, but I am not sure exactly how each one makes it work, so I can't give an exact answer.
You should check what's in your WHERE clause and how the table is indexed. Do you have a proper primary key, e.g. one using a numeric data type? Do you have indexes on the other columns used in your queries?
There is also a lot to consider when installing the database server, like where to put the data and log files, how much memory to give the server, and how to set the growth. There's a lot you can do to tune a server.
You could try splitting your table into partitions:
More about altering tables to add partitions
Selecting from a specific partition
In your case you could create a partition on id for every 50,000 rows, and when you want to skip the first 50,000 you just select from partition 2. How to do this is explained quite well in the MySQL documentation.
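A rough sketch of what that could look like (the partition ranges are illustrative, and this assumes id is the primary key):
ALTER TABLE table_1
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (50000),
    PARTITION p1 VALUES LESS THAN (100000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);
-- MySQL 5.6+ can then query one partition directly:
SELECT * FROM table_1 PARTITION (p1) WHERE id = 53750;
That said, with a proper index on id, partitioning is unnecessary for this kind of point lookup.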
You may try something as simple as this:
SELECT * FROM tblname LIMIT 50000, 10;
I just tried it with phpMyAdmin; the 50000 is the offset, i.e. how many rows to skip before the rows that are returned (the 10 is an arbitrary row count).
EDIT:
But if I were you I wouldn't use this, because it simply skips records 1-49999 instead of searching them.