mysql performance difference between where id = vs where id IN() - mysql

I am very new to mysql . I had a query, from this query i am getting some stock_ids.In the second query i need to pass those stock_ids. For this i am fetching all the stock_ids in a array.Now my question is in which way can i pass those stock_ids to the second query. For this i have two approaches.
First approach:
$array_cnt = count($stockid_array);
for($i = 0; $i<$array_cnt;$i++)
{
$sql = "select reciever_id,sender_identifier,unique_stock_id,vat_number,tax_number from stocktable where stock_id = '".$stockid_array[$i]."'";
// my while loop
}
Another approach is
$sql = "reciever_id,sender_identifier,unique_stock_id,vat_number,tax_number from stocktable where stock_id in ('1','2','3','4','5','6','7','8','9');
//while loop comes here.
So which approach gives good performance ? Please guide me.

MySQL has a nice optimization for constants in an in list -- it basically sorts the values and uses a binary search. That means that the in is going to be faster. Also, in can take advantage of an index on the column.
In addition, running a single query should be faster than running multiple queries.
So, the in version should be better than running multiple queries with =.

Related

Perl/MySQL Relationship Query

I have the following perl code that will eventually be a webpage:
my($dbh) = DBI->connect("DBI:mysql:host=dbsrv;database=database","my_sqlu","my_sqlp") or die "Canny Connect";
my($sql) = "SELECT * FROM hardware where srv_name = \"$srv_name\"";
my($sth) = $dbh->prepare($sql);
$sth->execute();
$sth->bind_col( 1, \my($db_id));
$sth->bind_col( 2, \my($db_srv_name));
$sth->bind_col( 5, \my($db_site));
$sth->fetchrow();
$sth->finish ();
my($sql) = "SELECT sites.\`site_code\`, sites.\`long_name\` FROM \`hardware\` JOIN \`sites\` ON \`sites\`.id=\`hardware\`.\`site\` where \`hardware\`.\`id\`=\'$db_id\'";
my($sth) = $dbh->prepare($sql);
$sth->execute();
$sth->bind_col( 1, \my($db_site_code));
$sth->bind_col( 2, \my($db_long_name));
$sth->fetchrow();
$sth->finish ();
$dbh->disconnect;
print "$db_site_code<br>$db_long_name";
The query above does work however what I'm trying to find out is there any way I can run one SQL query and get the db_site_code and db_long_name from the sites DB without running the 2nd query? The hardware DB has the foreign key 'id' in the sites Db.
When you read anything about relational DBs they all say it's by far the most efficient method of getting data from your database but I just can't see how this is any quicker than just running 2 select queries. What I've done above would surely take longer than "select from hardware where srv_name = $srv_name" then "select from sites where id = db_site_id"? Any comments are greatly appreciated.
Here's an example of how to do this with placeholders as well as a combined query. If I understand your DB correctly, you can just omit the first query and add the server name instead of the ID in the second query. I might be mistaken there, but my example will still be of value for the Perl suggestions.
use strict;
use warnings;
use DBI;
# Create DB connection
my $dbh = DBI->connect("DBI:mysql:host=dbsrv;database=database","my_sqlu","my_sqlp")
or die "Cannot connect to database";
# Create the statement handle
my $sth = $dbh->prepare(<<'SQLQUERY') or die $dbh->errstr;
SELECT s.site_code, s.long_name
FROM hardware h
JOIN sites s ON s.id=h.site
WHERE h.srv_name=?
SQLQUERY
$sth->execute('Server Name'); # There's the parameter
my $res = $sth->fetchrow_hashref; # $res now has a hash ref with the first row
print "$res->{'site_code'}<br>$res->{'long_name'}";
There were a few issues with your code I'd like to point out to you:
You should always use strict and use warnings. They make your life easier!
You can leave the parens ( and ) out with my. Saves you keystrokes and makes your code more readable.
You can (but do not have to, this is preference!) leave out the parens after method calls that do not have arguments. Decide this for yourself.
As was already pointed out, always use placeholders with DBI. They are very simple. Now you don't have to escape the " with backslashes. Instead, just use ?.
Once you've combined your query, you can put it in a heredoc (<<'SQLQUERY'). It's a string that lasts from the next line to the delimiter (SQLQUERY). That way, your query is easier to read.
You can use one of the ref-fetchrow-methods to get all your result's columns into one hash. I used $sth->fetchrow_hashref because I find it most convenient. You've got the complete row and all the columns are named hash keys.
If called in a small scope (like a short sub), you don't need to finish a statement handle. It will be finished and destroyed by Perl automatically once it goes out of scope.
Another thing about performance: If this is just run occasionally, don't worry about it. You can profile your queries with DBI::Profile to see which way it is faster, but you should only do that if you really need to.
In my experience, especially with very huge queries and a very busy database, two or three queries are a lot better than a single big one because they do not take over the servers resources. But again, that is something you need to profile and benchmark (if the need arises).
Aside from #tadman's recommendation to use placeholders, I'd tag this as a sql question as well, but your solution is to simply add
srv_name = \"$srv_name\"
to your second where clause, so that your statement is:
"SELECT sites.\`site_code\`, sites.\`long_name\` FROM \`hardware\` JOIN \`sites\` ON \`sites\`.id=\`hardware\`.\`site\` where \`hardware\`.\`id\`=\'$db_id\'";
I strongly second #tadman's suggestion though -- use prepared statements and/or placeholders whenever possible.

Way to get SQL statement to be quicker

Is there a way to speed up the performance of this query.
I have indexes on tswProjectID and tswWeekEdning.
This SQL was generated from my Linq statement which is
what I want to use in my C# code.
Is there a more efficient way to write this?
var qry = (from tsw in TimesheetWeeklies where tsw.TswProjectID == 8263 select tsw).OrderByDescending(x => x.TswWeekEnding).FirstOrDefault();
SELECT TOP (1) [t0].[tswID] AS [TswID]
FROM [TimesheetWeekly] AS [t0]
WHERE [t0].[tswProjectID] = 8263
ORDER BY [t0].[tswWeekEnding] DESC
Try create an index with both columns in it (tswProjectID, tswWeekEnding)
It wont make the query any faster but if you make it a compiled query you could possibly save some time that it takes to build the query if it's done more than once, more info here:
http://msdn.microsoft.com/en-us/library/bb399335.aspx

Can we control LINQ expression order with Skip(), Take() and OrderBy()

I'm using LINQ to Entities to display paged results. But I'm having issues with the combination of Skip(), Take() and OrderBy() calls.
Everything works fine, except that OrderBy() is assigned too late. It's executed after result set has been cut down by Skip() and Take().
So each page of results has items in order. But ordering is done on a page handful of data instead of ordering of the whole set and then limiting those records with Skip() and Take().
How do I set precedence with these statements?
My example (simplified)
var query = ctx.EntitySet.Where(/* filter */).OrderByDescending(e => e.ChangedDate);
int total = query.Count();
var result = query.Skip(n).Take(x).ToList();
One possible (but a bad) solution
One possible solution would be to apply clustered index to order by column, but this column changes frequently, which would slow database performance on inserts and updates. And I really don't want to do that.
EDIT
I ran ToTraceString() on my query where we can actually see when order by is applied to the result set. Unfortunately at the end. :(
SELECT
-- columns
FROM (SELECT
-- columns
FROM (SELECT -- columns
FROM ( SELECT
-- columns
FROM table1 AS Extent1
WHERE EXISTS (SELECT
-- single constant column
FROM table2 AS Extent2
WHERE (Extent1.ID = Extent2.ID) AND (Extent2.userId = :p__linq__4)
)
) AS Project2
limit 0,10 ) AS Limit1
LEFT OUTER JOIN (SELECT
-- columns
FROM table2 AS Extent3 ) AS Project3 ON Limit1.ID = Project3.ID
UNION ALL
SELECT
-- columns
FROM (SELECT -- columns
FROM ( SELECT
-- columns
FROM table1 AS Extent4
WHERE EXISTS (SELECT
-- single constant column
FROM table2 AS Extent5
WHERE (Extent4.ID = Extent5.ID) AND (Extent5.userId = :p__linq__4)
)
) AS Project6
limit 0,10 ) AS Limit2
INNER JOIN table3 AS Extent6 ON Limit2.ID = Extent6.ID) AS UnionAll1
ORDER BY UnionAll1.ChangedDate DESC, UnionAll1.ID ASC, UnionAll1.C1 ASC
My workaround solution
I've managed to workaround this problem. Don't get me wrong here. I haven't solved precedence issue as of yet, but I've mitigated it.
What I did?
This is the code I've used until I get an answer from Devart. If they won't be able to overcome this issue I'll have to use this code in the end.
// get ordered list of IDs
List<int> ids = ctx.MyEntitySet
.Include(/* Related entity set that is needed in where clause */)
.Where(/* filter */)
.OrderByDescending(e => e.ChangedDate)
.Select(e => e.Id)
.ToList();
// get total count
int total = ids.Count;
if (total > 0)
{
// get a single page of results
List<MyEntity> result = ctx.MyEntitySet
.Include(/* related entity set (as described above) */)
.Include(/* additional entity set that's neede in end results */)
.Where(string.Format("it.Id in {{{0}}}", string.Join(",", ids.ConvertAll(id => id.ToString()).Skip(pageSize * currentPageIndex).Take(pageSize).ToArray())))
.OrderByDescending(e => e.ChangedOn)
.ToList();
}
First of all I'm getting ordered IDs of my entities. Getting only IDs is well performant even with larger set of data. MySql query is quite simple and performs really well. In the second part I partition these IDs and use them to get actual entity instances.
Thinking of it, this should perform even better than the way I was doing it at the beginning (as described in my question), because getting total count is much much quicker due to simplified query. The second part is practically very very similar, except that my entities are returned rather by their IDs instead of partitioned using Skip and Take...
Hopefully someone may find this solution helpful.
I haven't worked directly with Linq to Entities, but it should have a way to hook specific stored procedures into certain locations when needed. (Linq to SQL did.) If so, you could turn this query into a stored procedure, doing exacly what is required, and doing it efficiently.
Assuming from you comment the persisting the values in a List is not acceptable:
There's no way to completely minimize the iterations, as you intended (and as I would have tried too, living in hope). Cutting the iterations down by one would be nice. Is it possible to just get the Count once and cache/session it? Then you could:
int total = ctx.EntitySet.Count; // Hopefully you can not repeat doing this.
var result = ctx.EntitySet.Where(/* filter */).OrderBy(/* expression */).Skip(n).Take(x).ToList();
Hopefully you can cache the Count somehow, or avoid needing it every time. Even if you can't, this is the best you can do.
Could you please create a sample illusrating the problem and send it to us (support * devart * com, subject "EF: Skip, Take, OrderBy")?
Hope we will be able to help you.
You can also contact us using our forums or contact form.
Are you absolutely certain the ordering is off? What does the SQL look like?
Can you reorder your code as follows and post the output?
// Redefine your queries.
var query = ctx.EntitySet.Where(/* filter */).OrderBy(e => e.ChangedDate);
var skipped = query.Skip(n).Take(x);
// let's look at the SQL, shall we?
var querySQL = query.ToTraceString();
var skippedSQL = skipped.ToTraceString();
// actual execution of the queries...
int total = query.Count();
var result = skipped.ToList();
Edit:
I'm absolutely certain. You can check my "edit" to see trace result of my query with skipped trace result that is imperative in this case. Count is not really important.
Yeah, I see it. Wow, that's a stumper. Might even be an outright bug. I note you're not using SQL Server... what DB are you using? Looks like it might be MySQl.
One way:
var query = ctx.EntitySet.Where(/* filter */).OrderBy(/* expression */).ToList();
int total = query.Count;
var result = query.Skip(n).Take(x).ToList();
Convert it to a List before skipping. It's not too efficient, mind you...

Mysql match...against vs. simple like "%term%"

What's wrong with:
$term = $_POST['search'];
function buildQuery($exploded,$count,$query)
{
if(count($exploded)>$count)
{
$query.= ' AND column LIKE "%'. $exploded[$count] .'%"';
return buildQuery($exploded,$count+1,$query);
}
return $query;
}
$exploded = explode(' ',$term);
$query = buildQuery($exploded,1,
'SELECT * FROM table WHERE column LIKE "%'. $exploded[0] .'%"');
and then query the db to retrieve the results in a certain order, instead of using the myIsam-only sql match...against?
Would it dawdle performance dramatically?
The difference is in the algorithm's that MySQL uses behind the scenes find your data. Fulltext searches also allow you sort based on relevancy. The LIKE search in most conditions is going to do a full table scan, so depending on the amount of data, you could see performance issues with it. The fulltext engine can also have performance issues when dealing with large row sets.
On a different note, one thing I would add to this code is something to escape the exploded values. Perhaps a call to mysql_real_escape_string()
You can check out my recent presentation I did for MySQL University:
http://forge.mysql.com/wiki/Practical_Full-Text_Search_in_MySQL
Slides are also here:
http://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql
In my test, using LIKE '%pattern%' was more than 300x slower than using a MySQL FULLTEXT index. My test data was 1.5 million posts from the StackOverflow October data dump.

How do I know how many rows a Perl DBI query returns?

I'm trying to basically do a search through the database with Perl to tell if there is an item with a certain ID. This search can return no rows, but it can also return one.
I have the following code:
my $th = $dbh->prepare(qq{SELECT bi_exim_id FROM bounce_info WHERE bi_exim_id = '$exid'});
$th->execute();
if ($th->fetch()->[0] != $exid) {
...
Basically, this tries to see if the ID was returned and if it's not, continue with the script. But it is throwing a Null array reference error on the $th->fetch()->[0] thing.
How can i just simply check to see if it returned rows or now?
The DBD::mysql driver has a the rows() method that can return the count of the results:
$sth = $dbh->prepare( ... );
$sth->execute;
$rows = $sth->rows;
This is database-driver specific, so it might not work in other drivers, or it might work differently in other drivers.
my $th = $dbh->prepare(qq{SELECT bi_exim_id FROM bounce_info WHERE bi_exim_id = '$exid'});
$th->execute();
my $found = 0;
while ($th->fetch())
{
$found = 1;
}
Your query won't return anything if the row doesn't exist, so you can't de-reference the fetch.
Update: you might want to re-write that as
my $found = $th->fetch();
Why don't you just "select count(*) ..."??
my ($got_id) = $dbh->selectrow_array("SELECT count(*) from FROM bounce_info WHERE bi_exim_id = '$exid'");
Or to thwart Little Bobby Tables:
my $q_exid = $dbh->quote($exid);
my ($got_id) = $dbh->selectrow_array("SELECT count(*) from FROM bounce_info WHERE bi_exim_id = $q_exid");
Or if you're going to execute this a lot:
my $sth = $dbh->prepare("SELECT count(*) from FROM bounce_info WHERE bi_exim_id = ?");
....save $sth (or use prepare_cached()) and then later
my ($got_id) = $dbh->selectrow_array($sth, undef, $exid);
Change select to always return something? This should work in Sybase, dunno about other DBs.
my $th = $dbh->prepare(qq{SELECT count(*) FROM bounce_info WHERE bi_exim_id = '$exid'});
$th->execute();
if ($th->fetch()->[0]) {
....
}
In general, I'm not sure why people are so afraid of exceptions. You catch them and move on.
my $sth = prepare ...
$sth->execute;
my $result = eval { $sth->fetchrow_arrayref->[1] };
if($result){ say "OH HAI. YOU HAVE A RESULT." }
else { say "0 row(s) returned." }
In this case, though, Paul's answer is best.
Also, $sth->rows doesn't usually work until you have fetched every row. If you want to know how many rows match, then you have to actually ask the database engine the question you want to know the answer to; namely select count(1) from foo where bar='baz'.
The only reliable way to find out how many rows are returned by a SELECT query is to fetch them all and count them. As stated in the DBI documentation:
Generally, you can only rely on a row
count after a non-SELECT execute (for
some specific operations like UPDATE
and DELETE), or after fetching all the
rows of a SELECT statement.
For SELECT
statements, it is generally not
possible to know how many rows will be
returned except by fetching them all.
Some drivers will return the number of
rows the application has fetched so
far, but others may return -1 until
all rows have been fetched. So use of
the rows method or $DBI::rows with
SELECT statements is not recommended.
However, when you get down to it, you almost never need to know that in advance anyhow. Just loop on while ($sth->fetch) to process each row or, for the special case of a query that will only return zero or one rows,
if ($sth->fetch) {
# do stuff for one row returned
} else {
# do stuff for no rows returned
}
This article may users of MySQL prior to version 8:
http://www.arraystudio.com/as-workshop/mysql-get-total-number-of-rows-when-using-limit.html
Luckily since MySQL 4.0.0 you can use SQL_CALC_FOUND_ROWS option in your query which will tell MySQL to count total number of rows disregarding LIMIT clause. You still need to execute a second query in order to retrieve row count, but it’s a simple query and not as complex as your query which retrieved the data.
Usage is pretty simple. In you main query you need to add SQL_CALC_FOUND_ROWS option just after SELECT and in second query you need to use FOUND_ROWS() function to get total number of rows. Queries would look like this:
SELECT SQL_CALC_FOUND_ROWS name, email FROM users WHERE name LIKE 'a%' LIMIT 10;
SELECT FOUND_ROWS();
The only limitation is that you must call second query immediately after the first one because SQL_CALC_FOUND_ROWS does not save number of rows anywhere.
Although this solution also requires two queries it’s much more faster, as you execute the main query only once.
You can read more about SQL_CALC_FOUND_ROWS and FOUND_ROWS() in MySQL docs.
Sadly, as of 8.0.17:
The SQL_CALC_FOUND_ROWS query modifier and accompanying FOUND_ROWS() function are deprecated as of MySQL 8.0.17 ...
Details here.