How to tune the following MySQL query? - mysql

I am using the following MySQL query, which is working fine, I mean it gives me the desired output, but let's first look at the query:
select
    fl.file_ID,
    length(fl.filedesc) as l,
    case
        when fl.part_no is null
             and l > 60
            then concat(fl.fileno, ' ', left(fl.filedesc, 60), '...')
        when fl.part_no is null
             and length(fl.filedesc) <= 60
            then concat(fl.fileno, ' ', fl.filedesc)
        when fl.part_no is not null
             and length(fl.filedesc) > 60
            then concat(fl.fileno, '(', fl.part_no, ')', left(fl.filedesc, 60), '...')
        when fl.part_no is not null
             and length(fl.filedesc) <= 60
            then concat(fl.fileno, '(', fl.part_no, ')', fl.filedesc)
    end as filedesc
from filelist fl
I don't want to call the length function repeatedly because I guess it hits the database every time, causing a performance issue. Please suggest how I can store the length once and use it several times.

Once you have accessed a given row, what you do with its columns has only a small impact on performance. So repeated use of that length function doesn't "hit the database" nearly as hard as you think.
The analogy I would use is a postal carrier delivering mail to your house, which is miles outside of town. He drives for 20 minutes to your mailbox, and then he worries that it takes too much time to insert one letter at a time into your mailbox, instead of all the letters at once. The cost of that inefficiency is insignificant compared to the long drive.
That said, you can make the query more concise or easier to code or to look at. But this probably won't have a big benefit for performance.
select
    fl.file_ID,
    concat(fl.fileno,
           ifnull(concat('(', fl.part_no, ')'), ' '),
           left(fl.filedesc, 60),
           if(length(fl.filedesc) > 60, '...', '')
    ) as filedesc
from filelist fl
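If you still prefer to compute the length once and reuse it, you can push it into a derived table; a minimal sketch of that approach (the alias l mirrors the one in your query):

select
    t.file_ID,
    concat(t.fileno,
           ifnull(concat('(', t.part_no, ')'), ' '),
           left(t.filedesc, 60),
           if(t.l > 60, '...', '')
    ) as filedesc
from (
    select file_ID, fileno, part_no, filedesc,
           length(filedesc) as l
    from filelist
) t

Note, though, that per the mail-carrier analogy above this buys you clarity, not measurable speed.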

Related

Snowflake ST_POLYGON(TO_GEOGRAPHY(...)) Is Inefficient

I have a few queries that use geospatial conditions. These queries are running surprisingly slowly. Initially I thought it was the geospatial calculation itself, but after stripping everything down to just ST_POLYGON(TO_GEOGRAPHY(...)), it is still very slow. This would make sense if each row had its own polygon, but the condition uses a static polygon in the query:
SELECT
ST_POLYGON(TO_GEOGRAPHY('LINESTRING(-95.75122850074004 28.793166796020444,-95.68622920563344 30.207416499279063,-94.5162418937178 32.56537633083211,-90.94128066286225 34.24734047810797,-88.17881062083825 36.812423897251634,-86.13133282498448 38.15341651409619,-85.28634198860107 38.66275098353796,-84.37635185711038 38.789523129087826,-82.84886842210855 38.4848923369382,-82.32887406125734 37.820427257446994,-82.26387476615074 36.96838022284757,-82.03637723327772 36.00158943485101,-80.99638851157454 35.34155096040939,-78.52641529752944 34.62260477275565,-77.51892622337955 34.005211031324734,-78.26641811710381 31.1020568651834,-80.24889661785029 29.926151366059756,-83.59636031583283 28.793166796020444,-95.75122850074004 28.793166796020444)'))
FROM TABLE(GENERATOR(ROWCOUNT=>1000000))
Snowflake should be able to figure out that it only needs to calculate this polygon once for the entire query. Yet the more rows are added, the slower it gets. On an X-Small warehouse this query takes over a minute, whereas this query:
SELECT
'LINESTRING(-95.75122850074004 28.793166796020444,-95.68622920563344 30.207416499279063,-94.5162418937178 32.56537633083211,-90.94128066286225 34.24734047810797,-88.17881062083825 36.812423897251634,-86.13133282498448 38.15341651409619,-85.28634198860107 38.66275098353796,-84.37635185711038 38.789523129087826,-82.84886842210855 38.4848923369382,-82.32887406125734 37.820427257446994,-82.26387476615074 36.96838022284757,-82.03637723327772 36.00158943485101,-80.99638851157454 35.34155096040939,-78.52641529752944 34.62260477275565,-77.51892622337955 34.005211031324734,-78.26641811710381 31.1020568651834,-80.24889661785029 29.926151366059756,-83.59636031583283 28.793166796020444,-95.75122850074004 28.793166796020444)'
FROM TABLE(GENERATOR(ROWCOUNT=>3000000))
(with 2 million more rows added to match the byte count)
can complete in 2 seconds.
I tried "precomputing" the polygon myself with a WITH statement but SF figures out the WITH is redundant and drops it. I also tried setting a session variable, but you can't set a complex value like this one as a variable.
I believe this is a bug.
Geospatial functions are in preview for now, and the team is working hard on all kinds of optimizations.
For this case I want to note that making the polygon a single-row table helps, but I would still expect better performance as the team gets this feature out of beta.
Let me create a table with one row, the polygon:
create or replace temp table poly1
as
select ST_POLYGON(TO_GEOGRAPHY('LINESTRING(-95.75122850074004 28.793166796020444,-95.68622920563344 30.207416499279063,-94.5162418937178 32.56537633083211,-90.94128066286225 34.24734047810797,-88.17881062083825 36.812423897251634,-86.13133282498448 38.15341651409619,-85.28634198860107 38.66275098353796,-84.37635185711038 38.789523129087826,-82.84886842210855 38.4848923369382,-82.32887406125734 37.820427257446994,-82.26387476615074 36.96838022284757,-82.03637723327772 36.00158943485101,-80.99638851157454 35.34155096040939,-78.52641529752944 34.62260477275565,-77.51892622337955 34.005211031324734,-78.26641811710381 31.1020568651834,-80.24889661785029 29.926151366059756,-83.59636031583283 28.793166796020444,-95.75122850074004 28.793166796020444)'
)) polygon
;
To see if this would help, I tried a one-million-row cross join:
select *
from poly1, TABLE(GENERATOR(ROWCOUNT=>1000000));
It takes 14 seconds, and in the query profiler you can see most time was spent on an internal TO_OBJECT(GET_PATH(POLY1.POLYGON, '_shape')).
What's interesting to note is that the previous operation is mostly concerned with the ASCII representation of the polygon. Running operations over this polygon is much quicker:
select st_area(polygon)
from poly1, TABLE(GENERATOR(ROWCOUNT=>1000000));
This query should have taken longer (finding the area of a polygon sounds more complicated than just selecting it), but it turns out it took only 7 seconds (about half).
Thanks for the report, and the team will continue to optimize cases like this.
For anyone curious about the particular polygon in the question - plotted, it's a nice heart.

MySQL: using calculated counter column as key for sub-query

Sorry for the long back-story, but it is needed to clarify the question.
In my org the computers have names like CNT30[0-9]{3}[1-9a-z], for example cnt300021 or cnt30253a.
The last symbol is a "qualifier", so a single workplace may have identically numbered computers assigned to it, distinguished by this qualifier. For example, cnt300021 may mean the desktop computer at workplace #002, and cnt30002a may mean a notebook assigned to the same workplace. Workplaces are "virtual" and exist just for our (IT dept) convenience.
Each dept has its own unique range [0-9]{3}. For example, computers of the accounting dept have names from cnt302751 up to cnt30299z, which gives them 25 unique workplaces max, with up to 35 computers per workplace. (IRL most users have one desktop PC, far fewer have a desktop and a notebook, and only 2 or 3 technicians have more than one notebook at their disposal.)
Recently, doing some inventory of the computers' passports (unsure about the term: a document that is for a computer what a passport is for a human), I found that there are some holes in the sequential numbering. For example, we have cnt302531 and cnt302551, but no cnt302541, which means that there's no workplace #254.
What do I want to do? I want to find these gaps without manual searching. For this I need a loop from 1 to MaxComp=664 (no higher workplace numbers are assigned yet).
That's what I could write using some pseudo-SQL-BASIC:
for a=0 to MaxComp
a$="CNT30"+right(a+1000,3)
'comparing only 8 leftmost characters, ignoring 9th one - the qualifier
b$=(select name from table where left(name,8) like a$)
print a$;b$
next a
That code would give me two columns: possible names and existing ones.
But I can't figure out how to implement this in SQL-query. What I tried:
# because of the qualifier there may be several computers with the same
# 8 leftmost characters
select @cnum:=@cnum+1 as CompNum, group_concat(name separator ',')
# PCs are inventoried by the OCS-NG Inventory software
from hardware
cross join (select @cnum:=0) cnt
where left(hardware.name,8)=concat('CNT30',right(@cnum+1000,3))
limit 100
But this construct returns exactly one row. And I can't work out whether this is possible without stored procedures, and, if it is possible, what I did wrong.
I found a working path.
At first I tried a stored function:
CREATE FUNCTION `count_comps`(num smallint) RETURNS tinytext CHARSET utf8
BEGIN
return (select group_concat(name separator ',')
from hardware where left(hardware.name,8)=concat('CNT30',right(num+1000,3))
);
END
Then I tried hard to replicate the function's results in a subquery. And I did it! Note: the inner select returns exactly the same results as the function does.
# Starting point. May be INcreased to narrow the results list
set @cnum:=0;
select
@cnum:=@cnum+1 as CompNum,
concat('CNT30',right(@cnum+1000,3)) as CalcNum,
# this
count_comps(@cnum) as hwns,
# and this gives equal results
(select group_concat(name separator ',')
from hardware where left(name,8)=calcnum
) hwn2
from hardware
# no more dummy tables here
# Ending point. May be DEcreased to narrow the results list
where @cnum<665;
So the wrong part of the "classical" approach was the use of a dummy table, which turns out not to be necessary.
Partial results example (starting set @cnum:=479;, finishing where @cnum<530;):
CompNum, CalcNum, hwns, hwn2
'488', 'CNT30488', 'CNT304881', 'CNT304881'
'489', 'CNT30489', 'CNT304892', 'CNT304892'
'490', 'CNT30490', 'CNT304901,CNT304902,CNT304903', 'CNT304901,CNT304902,CNT304903'
'491', 'CNT30491', NULL, NULL
'492', 'CNT30492', NULL, NULL
'493', 'CNT30493', 'CNT304932', 'CNT304932'
'494', 'CNT30494', 'CNT304941', 'CNT304941'
I found that there are no workplaces #491 and #492. The next time PCs are added for the 'October Region' dept (range 480-529), at least two of the new PCs will get the names CNT304911 and CNT304921, filling this gap.
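As an aside: on MySQL 8.0+ the same gap search can be written without user variables at all, using a recursive CTE as the number generator. A minimal sketch against the same hardware table:

with recursive nums as (
    select 1 as n
    union all
    select n + 1 from nums where n < 664
)
select
    nums.n as CompNum,
    concat('CNT30', right(nums.n + 1000, 3)) as CalcNum,
    (select group_concat(name separator ',')
     from hardware
     where left(name, 8) = concat('CNT30', right(nums.n + 1000, 3))
    ) as hwns
from nums;

The rows where hwns is NULL are the gaps.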

MySQL order by problems

I have the following code:
echo "<form><center><input type=submit name=subs value='Submit'></center></form>";
$val=$_POST['resulta']; //this is from a textarea name='resulta'
if (isset($_POST['subs'])) //from submit name='subs'
{
$aa=mysql_query("select max(reservno) as 'maxr' from reservation") or die(mysql_error()); //select maximum reservno
$bb=mysql_fetch_array($aa);
$cc=$bb['maxr'];
$lines = explode("\n", $val);
foreach ($lines as $line) {
mysql_query("insert into location_list (reservno, location) values ('$cc', '$line')")
or die(mysql_error()); //insert value of textarea then save it separately in location_list if \n is found
}
If I input the following data on the textarea (assume that I have maximum reservno '00014' from reservation table),
Davao - Cebu
Cebu - Davao
then submit it, I'll have these data in my location_list table:
loc_id || reservno || location
00001 || 00014 || Davao - Cebu
00002 || 00014 || Cebu - Davao
Then this code:
$gg=mysql_query("SELECT GROUP_CONCAT(IF((@var_ctr := @var_ctr + 1) = @cnt,
                                        location,
                                        SUBSTRING_INDEX(location,' - ', 1)
                                     )
                       ORDER BY loc_id ASC
                       SEPARATOR ' - ') AS locations
                 FROM location_list,
                      (SELECT @cnt := COUNT(1), @var_ctr := 0
                       FROM location_list
                       WHERE reservno='$cc'
                      ) dummy
                 WHERE reservno='$cc'") or die(mysql_error()); //QUERY IN QUESTION
$hh=mysql_fetch_array($gg);
$ii=$hh['locations'];
mysql_query("update reservation set itinerary = '$ii' where reservno = '$cc'")
or die(mysql_error());
is supposed to update the reservation table with 'Davao - Cebu - Davao', but it's returning 'Davao - Cebu - Cebu' instead. I was previously helped by this forum to get this code working, but now I'm facing another difficulty. I just can't get it to work. Please help me. Thanks in advance!
I got it working (without ORDER BY loc_id ASC) as long as I set loc_id ascending in phpMyAdmin's operations tab. But whenever I delete all the data, it goes back to loc_id descending, so I have to reset it. It doesn't entirely solve the problem, but I guess this is as far as I can go. :)) I just have to make sure that the table column loc_id is always in ascending order. Thank you everyone for your help! I really appreciate it! But if you have any better answer, like how to keep the table column always in ascending order, a better query, etc., feel free to post it here. May God bless you all!
The database server is allowed to rewrite your query to optimize its execution. This might affect the order of the individual parts, in particular the order in which the various assignments are executed. I assume that some such reordering causes the result of the query to become undefined, in such a way that it works on sqlfiddle but not on your actual production system.
I can't put my finger on the exact location where things go wrong, but I believe the core of the problem is that SQL is intended to work on relations, while you are trying to abuse it for sequential programming. I suggest you retrieve the data from the database using portable SQL without any variable hackery, and then use PHP to perform any post-processing you might need. PHP is much better suited to express the ideas you're formulating, and no optimization or reordering of statements will get in your way there. And as your query currently only results in a single value, fetching multiple rows and combining them into a single value in the PHP code shouldn't increase complexity too much.
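A minimal sketch of that portable query (using the example reservno from above); the itinerary string is then assembled in PHP by taking the part before ' - ' from every row except the last, plus the last row in full:

SELECT location
FROM location_list
WHERE reservno = '00014'
ORDER BY loc_id ASC;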
Edit:
While discussing another answer using a similar technique (by Omesh as well, just as the answer your code is based upon), I found this in the MySQL manual:
As a general rule, you should never assign a value to a user variable
and read the value within the same statement. You might get the
results you expect, but this is not guaranteed. The order of
evaluation for expressions involving user variables is undefined and
may change based on the elements contained within a given statement;
in addition, this order is not guaranteed to be the same between
releases of the MySQL Server.
So there are no guarantees about the order in which these variable assignments are evaluated, and therefore no guarantees that the query does what you expect. It might work, but it might fail suddenly and unexpectedly. Therefore I strongly suggest you avoid this approach unless you have some reliable mechanism to check the validity of the results, or really don't care whether they are valid.

Geo IP database query problem

I'm running this query
SELECT
country,
countries.code,
countries.lat,
countries.lng,
countries.zoom,
worldip.start,
worldip.end
FROM countries, worldip
WHERE countries.code = worldip.code
AND
'91.113.120.5' BETWEEN worldip.start AND worldip.end
ORDER BY worldip.start DESC
on tables with these fields,
worldip           countries
--------------    ----------------
start             code
end               country
code              lat
country_name      lng
                  zoom
And sometimes I'm getting two results, in two different countries, for one IP. I understand why
'91.113.120.5' BETWEEN worldip.start AND worldip.end
would return two different results, since 10 is between 9 and 11, but also between 5 and 12. I would have thought including WHERE countries.code = worldip.code would have prevented this, or at least ensured I got the right country no matter how many results it returned, but it doesn't.
I also added ORDER BY worldip.start DESC, which seems to work, since the more specific an IP range is, the higher up the list it appears. You can see it working (or not) here. But that's a quick fix and I'd like to do it right.
SQL is a real weak point for me. Can anyone explain what I'm doing wrong?
Firstly, nice app. I was looking for flights - I would love price comparisons, and no #-based links please. You could try a free geolocation service instead of using your own GeoIP database. That aside: are your IP fields stored as integers in MySQL, allowing numeric comparison? That may get you the correct ordering. Otherwise the values are compared as strings, and problems arise where the lengths of the IPs differ, and so on.
With an integer representation of IPs you can use the <= and >= operators.
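A minimal sketch of that integer approach, assuming start and end are (re)stored as INT UNSIGNED and populated via INET_ATON:

SELECT country, countries.code
FROM countries
JOIN worldip ON countries.code = worldip.code
WHERE INET_ATON('91.113.120.5') BETWEEN worldip.start AND worldip.end;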

Practical limit to length of SQL query (specifically MySQL)

Is it particularly bad to have a very, very large SQL query with lots of (potentially redundant) WHERE clauses?
For example, here's a query I've generated from my web application with everything turned off, which should be the largest possible query for this program to generate:
SELECT *
FROM 4e_magic_items
INNER JOIN 4e_magic_item_levels
ON 4e_magic_items.id = 4e_magic_item_levels.itemid
INNER JOIN 4e_monster_sources
ON 4e_magic_items.source = 4e_monster_sources.id
WHERE (itemlevel BETWEEN 1 AND 30)
AND source!=16 AND source!=2 AND source!=5
AND source!=13 AND source!=15 AND source!=3
AND source!=4 AND source!=12 AND source!=7
AND source!=14 AND source!=11 AND source!=10
AND source!=8 AND source!=1 AND source!=6
AND source!=9 AND type!='Arms' AND type!='Feet'
AND type!='Hands' AND type!='Head'
AND type!='Neck' AND type!='Orb'
AND type!='Potion' AND type!='Ring'
AND type!='Rod' AND type!='Staff'
AND type!='Symbol' AND type!='Waist'
AND type!='Wand' AND type!='Wondrous Item'
AND type!='Alchemical Item' AND type!='Elixir'
AND type!='Reagent' AND type!='Whetstone'
AND type!='Other Consumable' AND type!='Companion'
AND type!='Mount' AND (type!='Armor' OR (false ))
AND (type!='Weapon' OR (false ))
ORDER BY type ASC, itemlevel ASC, name ASC
It seems to work well enough, but it's also not particularly high traffic (a few hundred hits a day or so), and I wonder if it would be worth the effort to try and optimize the queries to remove redundancies and such.
Reading your query makes me want to play an RPG.
This is definitely not too long. As long as they are well formatted, I'd say a practical limit is about 100 lines. After that, you're better off breaking subqueries into views just to keep your eyes from crossing.
I've worked with some queries that are 1000+ lines, and that's hard to debug.
By the way, may I suggest a reformatted version? This is mostly to demonstrate the importance of formatting; I trust this will be easier to understand.
select *
from
4e_magic_items mi
,4e_magic_item_levels mil
,4e_monster_sources ms
where mi.id = mil.itemid
and mi.source = ms.id
and itemlevel between 1 and 30
and source not in(16,2,5,13,15,3,4,12,7,14,11,10,8,1,6,9)
and type not in(
'Arms' ,'Feet' ,'Hands' ,'Head' ,'Neck' ,'Orb' ,
'Potion' ,'Ring' ,'Rod' ,'Staff' ,'Symbol' ,'Waist' ,
'Wand' ,'Wondrous Item' ,'Alchemical Item' ,'Elixir' ,
'Reagent' ,'Whetstone' ,'Other Consumable' ,'Companion' ,
'Mount'
)
and ((type != 'Armor') or (false))
and ((type != 'Weapon') or (false))
order by
type asc
,itemlevel asc
,name asc
/*
Some thoughts:
==============
0 - Formatting really matters, in SQL even more than most languages.
1 - consider selecting only the columns you need, not "*"
2 - use of table aliases makes it short & clear ("MI", "MIL" in my example)
3 - joins in the WHERE clause will un-clutter your FROM clause
4 - use NOT IN for long lists
5 - logically, the last two lines can be added to the "type not in" section.
I'm not sure why you have the "or false", but I'll assume some good reason
and leave them here.
*/
Default MySQL 5.0 server limitation is "1MB", configurable up to 1GB.
This is configured via the max_allowed_packet setting on both client and server, and the effective limitation is the lesser of the two.
Caveats:
It's likely that this "packet" limitation does not map directly to characters in a SQL statement (you'd want to take into account character encoding within the client, some packet metadata, etc.).
SELECT @@global.max_allowed_packet
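To raise the limit, you can set it globally (the 64 MB value here is just an example); new connections pick up the new value:

SET GLOBAL max_allowed_packet = 67108864;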
This is the only real limit, and it's adjustable on the server, so there is no real straight answer.
From a practical perspective, I generally consider any SELECT that ends up taking more than 10 lines to write (putting each clause/condition on a separate line) to be too long to easily maintain. At this point, it should probably be done as a stored procedure of some sort, or I should try to find a better way to express the same concept--possibly by creating an intermediate table to capture some relationship I seem to be frequently querying.
Your mileage may vary, and there are some exceptionally long queries that have a good reason to be. But my rule of thumb is 10 lines.
Example (mildly improper SQL):
SELECT x, y, z
FROM a, b
WHERE fiz = 1
AND foo = 2
AND a.x = b.y
AND b.z IN (SELECT q, r, s, t
FROM c, d, e
WHERE c.q = d.r
AND d.s = e.t
AND c.gar IS NOT NULL)
ORDER BY b.gonk
This is probably too large; optimizing, however, would depend largely on context.
Just remember, the longer and more complex the query, the harder it's going to be to maintain.
Most databases support stored procedures to avoid this issue. If your code is fast enough to execute and easy to read, you don't want to have to change it in order to get the compile time down.
An alternative is to use prepared statements, so you pay the parsing cost only once per client connection and then pass in only the parameters for each call.
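A minimal sketch of the server-side prepared-statement form, parameterizing just the item-level bounds (the rest of the WHERE clause is elided):

PREPARE stmt FROM
    'SELECT * FROM 4e_magic_items WHERE itemlevel BETWEEN ? AND ?';
SET @lo = 1, @hi = 30;
EXECUTE stmt USING @lo, @hi;
DEALLOCATE PREPARE stmt;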
I'm assuming that by 'turned off' you mean a field doesn't have a value?
Instead of checking that something is not this and also not that, etc., can't you just check whether the field is null? Or set the field to 'off' and check whether type (or whatever) equals 'off'.