Where condition algorithm in SQL Select Query - mysql

I am working on a task which requires me to compare every column of a row. There are a number of ways to achieve this; I am asking because the number of rows is large. Let me explain with an example.
---------------------------------------------------------------------
ID[P_K] | Name | Address | City | Gender | College
---------------------------------------------------------------------
Above is a basic example of a table which holds data about students from multiple colleges. I am getting some data from an outside source and need to compare it with the data in my DB. Below are the possible ways to do it.
1. Do a select query with where Id = <id> and match the columns one by one in my code.
2. Do a select query with where ID = <id> and name = <name> and so on...
My preference is the 2nd option, because it is less complex.
Before going ahead, there is only one thing that is creating a conflict in my mind.
Question:
Complexity of these two queries compared to each other (considering ID as the primary key):
where Id = <id>
where ID = <id> and name = <name> and so on...
I know this depends entirely on the MySQL algorithm. I've searched a lot but didn't find the select algorithm of MySQL.
It would be helpful if someone could share the select algorithm.
Specific to Algorithm:
There are two ways this algorithm could work:
for number of rows {
if(whereCondition1 && whereCondition2 .... && whereCondition<N>) {
//Result filter according to all conditions
}
}
for number of rows {
if(whereCondition1){
//Result filter according whereCondition1
if(whereCondition2){
//Result filter according whereCondition2
.
.
and so on...
} else {
continue;
}
} else {
continue;
}
}
Now the complexity of the first one will be O(n). For the second, assuming ID[P_K], the complexity will be reduced. Right?
So which of the above algorithms is used? Or none of these?

Every RDBMS has its own select algorithm, but all of them are based on the ANSI SQL-99 standard.
What matters here is how the RDBMS parses and optimizes the query to achieve better performance. You don't need to worry about that; the only thing you have to worry about is whether your database is well designed, with proper indexes.
That is what will make the difference between using where Id = <id> and where ID = <id> and name = <name> and so on...
If the ID is the PK of that table and the external source you mentioned is synchronized with your data (meaning the same IDs refer to the same records), you just need to use where Id = <id>. If those IDs are not in sync, you should define what makes your records unique, build your SQL condition from that, and make sure you have proper indexes for it.
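As a rough illustration of the point about indexes, here is a minimal sketch using Python's built-in sqlite3 module instead of MySQL (an assumption made only to keep the example self-contained; the students table and its sample row are made up). It asks the engine for its query plan and suggests that adding more equality conditions still locates the row through the primary key first. In MySQL, EXPLAIN can be used for the same kind of check.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, name TEXT, address TEXT, "
    "city TEXT, gender TEXT, college TEXT)"
)
conn.execute(
    "INSERT INTO students VALUES (1, 'Ann', '1 Main St', 'Pune', 'F', 'ABC')"
)

def plan(sql, params):
    """Return the query-plan detail strings SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

by_id = plan("SELECT * FROM students WHERE id = ?", (1,))
by_all = plan(
    "SELECT * FROM students WHERE id = ? AND name = ? AND city = ?",
    (1, "Ann", "Pune"),
)

# Incoming record from the outside source, compared in a single query.
incoming = (1, "Ann", "1 Main St", "Pune", "F", "ABC")
match = conn.execute(
    "SELECT COUNT(*) FROM students WHERE id = ? AND name = ? AND address = ? "
    "AND city = ? AND gender = ? AND college = ?",
    incoming,
).fetchone()[0]

print(by_id)
print(by_all)
print("identical row found:", match == 1)
```

Both plans should mention the primary key, which is the behaviour the answer relies on: the extra conditions are checked only against the single row the key lookup returns.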

I don't know exactly how the algorithm works, but in general:
for(int i = 0, j = 0; i < N; ++i, ++j) {
if (i==j) {
/* do something */
}
}
This has complexity O(N).
for(int i = 0, j = 0; i < N; ++i, ++j) {
if (i==j && i!=k) {
/* do something */
}
}
This also has complexity O(N).
Ultimately, point 1 and point 2 have the same complexity.
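To make the comparison concrete, here is a small Python sketch (not MySQL's actual implementation; the rows and the Counter helper are invented for illustration) that counts how many comparisons the two loop shapes from the question perform over a full scan. Because a combined condition short-circuits at the first false operand, both shapes do the same work.

```python
class Counter:
    """Counts how many equality checks are evaluated."""
    def __init__(self):
        self.calls = 0

    def eq(self, a, b):
        self.calls += 1
        return a == b

# Made-up rows: 1000 records with unique ids.
rows = [{"id": i, "name": "n%d" % i} for i in range(1000)]

def scan_flat(rows, target_id, target_name, counter):
    # One combined condition; `and` skips the name check when the id differs.
    return [r for r in rows
            if counter.eq(r["id"], target_id)
            and counter.eq(r["name"], target_name)]

def scan_nested(rows, target_id, target_name, counter):
    # Nested conditions, as in the second pseudocode variant.
    hits = []
    for r in rows:
        if counter.eq(r["id"], target_id):
            if counter.eq(r["name"], target_name):
                hits.append(r)
    return hits

flat_counter, nested_counter = Counter(), Counter()
flat = scan_flat(rows, 42, "n42", flat_counter)
nested = scan_nested(rows, 42, "n42", nested_counter)
print(flat == nested, flat_counter.calls, nested_counter.calls)
```

Either way the scan is O(N); a real speed-up comes from an index on the primary key, which avoids the scan altogether.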


Implementing user notifications on the same table or on a separate related table (Laravel)

I have a users table and an invoices table, and I want to add notifications for them that users will be able to turn on or off.
My question is: should I add, for example, 5 columns to the users table and 3 to the invoices table to switch them on or off, or create 2 tables as below:
notification_list and notification_user, so that a user can turn the notifications for the user model and the invoice model on or off.
The problem here is that notification_user will become a massive table as the users table grows. For every user I need to add at least 5 records.
I am using Laravel with its tables and relations, so I can use morph relations, but I still don't know which is better: implementing it inside the existing tables or in a separate table. Thanks.
If this can be toggled globally for all invoices at once:
You definitely need notification_list, but instead of notification_user I'd add an extra column to the users table or create something like users_preferences to store multiple user preferences. Creating a separate table for each preference doesn't seem like a good solution.
If this should be done for each invoice separately:
You can't add extra columns to the invoices and users tables, as you would need one column for each user.
You have to use your solution no. 2. But bear in mind that you don't need to create any records in the table unless they differ from the default behavior. That means you only add a record when the user toggles the button. You can also delete these records once the invoice is no longer active, so this new table doesn't grow as fast.
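The "only store what differs from the default" idea can be sketched in a few lines of Python. This is an in-memory stand-in for the notification_user table, not Laravel code; the default value and the function names are assumptions made for illustration.

```python
DEFAULT_ENABLED = True  # assumed default: notifications are on

# (user_id, invoice_id) -> enabled; only entries that differ from the default
# are kept, mirroring rows in a notification_user-style table.
overrides = {}

def set_notification(user_id, invoice_id, enabled):
    key = (user_id, invoice_id)
    if enabled == DEFAULT_ENABLED:
        overrides.pop(key, None)   # back to the default: no row needed
    else:
        overrides[key] = enabled

def is_enabled(user_id, invoice_id):
    # A missing entry means "use the default".
    return overrides.get((user_id, invoice_id), DEFAULT_ENABLED)

set_notification(1, 100, False)    # user 1 mutes invoice 100
set_notification(2, 100, True)     # same as default, nothing stored
print(is_enabled(1, 100), is_enabled(2, 100), len(overrides))
```

With this shape, the table size follows the number of toggles users actually make rather than users times preferences.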
You can also use a single field for all of a user's notification preferences, but this solution requires good structuring.
It's based on the fact that all you need to track is the state of each preference (on or off),
so you can save them as one integer containing all the preferences of the user. Here is an example:
$preference1 = 1;
$preference2 = 0;
$preference3 = 0;
$preference4 = 1;
$preference5 = 0;
//the value to save is the integer result of the binary 01001 (decimal 9)
$preference = 1 * $preference1 + 2 * $preference2 + 4 * $preference3 + 8 * $preference4 + 16 * $preference5;
//or simply
$preference = bindec($preference5.$preference4.$preference3.$preference2.$preference1);
Now to check if a specific preference is enabled, use the bitwise comparison
if ($preference & 4) { //preference 3 has value 4
//preference 3 is on.
}
//you can also check multiple preferences at the same time
//(the parentheses are required: == binds tighter than & in PHP)
if (($preference & 5) == 5) { //this is preference 3 and preference 1
//user has both preferences enabled
}
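You can also use it in your database request (assuming the column is named preference; a quoted 'preference' would be treated as a string literal, not a column):
$users = User::whereRaw("BIT_COUNT(preference & 4)")->get();
//this will give you all the users having preference 3 enabled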
To add more structure to this solution, you should declare constants in your User model:
class User extends Authenticatable
{
const PREFERENCE1 = 1;
const PREFERENCE2 = 2;
const PREFERENCE3 = 4;
const PREFERENCE4 = 8;
const PREFERENCE5 = 16;
//...
}
//This way everything makes more sense:
$users = User::whereRaw("BIT_COUNT(preference & ".User::PREFERENCE3.")")->get();
if ($preference & User::PREFERENCE3) {
//preference 3 is on.
}
PS: Make sure the integer column in the database has at least as many bits as the number of preferences you need to store.
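For readers who want to try the bit-flag idea outside PHP, here is a minimal Python sketch of the same scheme using the standard enum.IntFlag type. The flag names simply mirror the constants above; nothing here is Laravel-specific.

```python
from enum import IntFlag

class Preference(IntFlag):
    PREFERENCE1 = 1
    PREFERENCE2 = 2
    PREFERENCE3 = 4
    PREFERENCE4 = 8
    PREFERENCE5 = 16

# Preferences 1 and 4 enabled: binary 01001, the integer stored in the column.
stored = int(Preference.PREFERENCE1 | Preference.PREFERENCE4)

prefs = Preference(stored)
has_3 = bool(prefs & Preference.PREFERENCE3)

# Check several preferences at once: mask, then compare with the mask.
both = Preference.PREFERENCE1 | Preference.PREFERENCE4
has_1_and_4 = (prefs & both) == both

# Turn preference 3 on and preference 1 off.
prefs |= Preference.PREFERENCE3
prefs &= ~Preference.PREFERENCE1
updated = int(prefs)

print(stored, has_3, has_1_and_4, updated)
```

Five flags need five bits, so the largest possible value is 31; any integer column type wide enough for that works.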

How to optimize the search for the next free "slot" in MySQL?

I have a problem and I can't find an easy solution.
I have a self-expanding structure built in this way:
database1 | table1
| table2
....
| table n
.
.
.
databaseN | table 1
table 2
table n
Each table has a structure like this:
id|value
Each time a number is generated, it is put into the right database/table/structure (it is divided in this way for scalability; it would be impossible to manage tables of billions of records in a fast way).
The problem is that N is not fixed; it is like a base for calculating numbers (to be precise, N is known: 62, but I can only use a subset of "digits" that could change over time).
For example, I can work only with 0, 1 and 2, and after a while (when I've used all the possibilities) I want to add 4 and so on (up to base 62).
I would like a simple way to find the first free slot in which to put the next randomly generated id, in a way that can be reverted.
Example:
I have 0 1 2 3 as the digits I want to use.
The element 2313 is put in database 2, table 3, and there will be 13|value in the table.
The element 1301 is put in database 1, table 3, and there will be 01|value in the table.
I would like to generate another number based on the next free slot.
I could test every slot starting from 0 up to the biggest number, but when there are millions of records in every database and table this will be impossible.
The next element of the 1st example would be 2323 (and not 2314, since I'm using only the digits 0 1 2 3).
I would like some sort of inverse code in MySQL that gives me slot 23 in table 3 of database 2, so that I can transform it into the number. I could randomly generate a number and try to find the nearest free slot above and below it, but since the digit set is variable that may not be a good choice.
I hope this is clear enough for you to give me some suggestions ;-)
Use
show databases like 'database%' and a loop to find non-existent databases
show tables like 'table%' and a loop for tables
select count(*) from tableN to see if a table is "full" or not.
To find a free slot, walk the database with count in chunks.
This untested PHP/MySQL implementation will first fill up all existing databases and tables to base N+1 before creating new tables or databases.
The if(!$base) part should be altered if another behaviour is wanted.
findFreeChunk can also be solved with iteration, but I leave that effort to you.
// uses the legacy mysql_* API (removed in PHP 7); port to mysqli or PDO on current PHP
define ('DB_PREFIX', 'database');
define ('TABLE_PREFIX', 'table');
define ('ID_LENGTH', 2);
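function findFreeChunk($base, $db, $table, $prefix='')
{
// NOTE: PHP's base_convert() only supports bases 2..36; digits beyond that need a custom encoder
for($i=-1; ++$i<$base;)
{
$tmp = $prefix. base_convert($i, 10, 62);
// a chunk with this prefix can hold at most this many ids
$maxRecordCount=$base**(ID_LENGTH-strlen($tmp));
list($n) = mysql_fetch_row(mysql_query(
"select count(*) from `$db`.`$table` where `id` like '$tmp%'"));
if($n<$maxRecordCount)
{
// full-length prefix with no record: this is the free slot
if(strlen($tmp)==ID_LENGTH) { return [$db, $table, $tmp]; }
// incomplete chunk found: recursion
if($ret = findFreeChunk($base, $db, $table, $tmp))
{ return $ret; }
}
}
// no free slot below this prefix
return false;
}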
function findFreeSlot($base=NULL)
{
// find current base if not given
if (!$base)
{
for($base=1; !$ret = findFreeSlot(++$base););
return $ret;
}
$maxRecordCount=$base**ID_LENGTH;
// walk existing DBs
$res = mysql_query("show databases like '". DB_PREFIX. "%'");
$dbs = array ();
while (list($db)=mysql_fetch_row($res))
{
// walk existing tables
$res2 = mysql_query("show tables in `$db` like '". TABLE_PREFIX. "%'");
$tables = array ();
while (list($table)=mysql_fetch_row($res2))
{
list($n) = mysql_fetch_row(mysql_query("select count(*) from `$db`.`$table`"));
if($n<$maxRecordCount) { return findFreeChunk($base, $db, $table); }
$tables[] = $table;
}
// no table with empty slot found: all available table names used?
if(count($tables)<$base)
{
for($i=-1;in_array($tmp=TABLE_PREFIX. base_convert(++$i,10,62),$tables););
if($i<$base) return [$db, $tmp, 0];
}
$dbs[] = $db;
}
// no database with empty slot found: all available database names used?
if(count($dbs)<$base)
{
for($i=-1;in_array($tmp=DB_PREFIX.base_convert(++$i,10,62),$dbs););
if($i<$base) return [$tmp, TABLE_PREFIX. 0, 0];
}
// none: return false
return false;
}
If you are not reusing your slots or not deleting anything, you can of course dump all this and simply remember the last ID to calculate the next one.
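The "remember the last ID and calculate the next one" shortcut can be sketched as plain positional counting over whatever digit subset is currently allowed. The function name and the sample values below are made up for illustration, and the sketch does not handle what happens to old IDs when the digit set grows.

```python
def next_id(current, digits):
    """Return the ID that follows `current` when counting with `digits`
    as the ordered alphabet, or None if every ID of this length is used."""
    index = {d: i for i, d in enumerate(digits)}
    chars = list(current)
    pos = len(chars) - 1
    while pos >= 0:
        i = index[chars[pos]]
        if i + 1 < len(digits):
            chars[pos] = digits[i + 1]   # bump this position and stop
            return "".join(chars)
        chars[pos] = digits[0]           # wrap around and carry to the left
        pos -= 1
    return None

print(next_id("0012", "0123"))  # 0013
print(next_id("0133", "0123"))  # 0200
print(next_id("3333", "0123"))  # None
```

With a four-character ID laid out as in the question, the first character would select the database, the second the table, and the last two the id inside the table.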

Possible multiple enumeration of IEnumerable when counting and skipping

I'm preparing data for a datatable in Linq2Sql
This code highlights as a 'Possible multiple enumeration of IEnumerable' (in Resharper)
// filtered is an IEnumerable or an IQueryable
var total = filtered.Count();
var displayed = filtered
.Skip(param.iDisplayStart)
.Take(param.iDisplayLength).ToList();
And I am 100% sure Resharper is right.
How do I rewrite this to avoid the warning
To clarify, I get that I can put a ToList on the end of filtered to only do one query to the Database eg.
var filteredAndRun = filtered.ToList();
var total = filteredAndRun.Count();
var displayed = filteredAndRun
.Skip(param.iDisplayStart)
.Take(param.iDisplayLength).ToList();
but this brings back a ton more data than I want to transport over the network.
I'm expecting that I can't have my cake and eat it too. :(
It sounds like you're more concerned with multiple enumeration of IQueryable<T> rather than IEnumerable<T>.
However, in your case, it doesn't matter.
The Count call should translate to a simple and very fast SQL count query. It's only the second query that actually brings back any records.
If it is an IEnumerable<T> then the data is in memory and it'll be super fast in any case.
I'd keep your code exactly the same as it is and only worry about performance tuning when you discover you have a significant performance issue. :-)
You could also do something like
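// Item stands for the element type of filtered
var count = 0;
var displayed = new List<Item>();
var iDisplayStop = param.iDisplayStart + param.iDisplayLength;
foreach (var element in filtered) {
// count is the zero-based index of element here, matching Skip/Take
if (count >= param.iDisplayStart && count < iDisplayStop)
displayed.Add(element);
++count;
}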
That's pseudocode, obviously, and I might be off-by-one in the edge conditions, but that algorithm gets you the count with only a single iteration and you have the list of displayed items only at the end.
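The same single-pass idea, written as a small runnable Python sketch (the function name is made up). It walks the sequence once, keeps a running total, and collects only the items that fall inside the requested page, so it also works for one-shot iterators.

```python
def count_and_page(items, start, length):
    """Return (total, page) in one pass; page matches skip(start).take(length)."""
    total = 0
    page = []
    stop = start + length
    for index, item in enumerate(items):
        total += 1
        if start <= index < stop:
            page.append(item)
    return total, page

# iter(...) makes the source single-use, like a lazily evaluated sequence.
total, page = count_and_page(iter(range(10)), 3, 4)
print(total, page)  # 10 [3, 4, 5, 6]
```

This only pays off for in-memory or streaming sources. For a database-backed query, the separate count query plus a paged query described in the first answer moves far less data.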

How to sort var length ids (composite string + numeric)?

I have a MySQL database whose keys are of this type:
A_10
A_10A
A_10B
A_101
QAb801
QAc5
QAc25
QAd2993
I would like them to sort first by the alpha portion, then by the numeric portion, just like above. I would like this to be the default sorting of this column.
1) how can I sort as specified above, i.e. write a MySQL function?
2) how can I set this column to use the sorting routine by default?
some constraints that might be helpful: the numeric portion of my ID's never exceeds 100,000. I use this fact in some javascript code to convert my ID's to strings concatenating the non-numeric portion with the (number + 1,000,000). (At the time I had not noticed the variations/subparts as above such as A_10A, A_10B, so I'll have to revamp that part of my code.)
The best way to achieve what you want is to store each part in its own column, and I would strongly recommend changing the table structure. If that's impossible, you can try the following:
Create 3 UDFs which return the prefix, the numeric part, and the postfix of your string. For better performance they should be native (MySQL, like any other RDBMS, is not really good at complex string parsing). Then you can call these functions in an ORDER BY clause or in the body of a trigger which validates your column. In any case, it will be slower than creating 3 columns.
No simple answer that I know of. I had something similar a while back but had to use jQuery to sort it. What I did was first get the output into a JavaScript array. Then you may want to zero-pad your numbers: separate the alpha parts from the numeric parts using a regex, pad the numeric parts, then reassemble each value:
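var zarr = [];
for(var i=0; i<val.length; i++){
// split into alternating non-digit and digit chunks
var chunks = val[i].match(/(\d+|[^\d]+)/g);
var padded = '';
for(var s=0; s<chunks.length; s++){
// pad only the numeric chunks, keep the others as they are
if(/^\d+$/.test(chunks[s]))
padded += zeroPad(chunks[s], 5);
else
padded += chunks[s];
}
// reassemble the value so it matches the list shown below
zarr.push(padded);
}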
function zeroPad(num,count){
var numZeropad = num + '';
while(numZeropad.length < count) {
numZeropad = "0" + numZeropad;
}
return numZeropad;
}
You'll end up with an array like this:
A_00010
QAb00801
QAc00005
QAc00025
QAd02993
Then you can do a natural sort. I know you may want to do it in straight MySQL, but I am not too sure whether it does natural sorting.
Good luck!
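As an alternative to zero-padding, the alpha/numeric split described in the answers above can be used directly as a sort key. The following Python sketch (client-side only, not a MySQL function; natural_key is a made-up name) splits each ID into text and number chunks and compares the numbers as integers.

```python
import re

def natural_key(value):
    """Split 'QAc25' into ['QAc', 25, ''] so digit runs compare numerically."""
    parts = re.split(r"(\d+)", value)
    # With a capturing group, odd positions are always the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

ids = ["QAc25", "A_101", "QAd2993", "A_10B", "QAc5", "A_10", "QAb801", "A_10A"]
ordered = sorted(ids, key=natural_key)
print(ordered)
```

For the sample IDs from the question this yields A_10, A_10A, A_10B, A_101, QAb801, QAc5, QAc25, QAd2993, which is the order the question asks for.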

MySQL - Perl: How to get array of zip codes within submitted "x" miles of submitted "zipcode" in Perl example

I have found many calculations here and some php examples and most are just over my head.
I found this example:
SELECT b.zip_code, b.state,
(3956 * (2 * ASIN(SQRT(
POWER(SIN(((a.lat-b.lat)*0.017453293)/2),2) +
COS(a.lat*0.017453293) *
COS(b.lat*0.017453293) *
POWER(SIN(((a.lng-b.lng)*0.017453293)/2),2))))) AS distance
FROM zips a, zips b
WHERE
a.zip_code = '90210' ## I would use the users submitted value
GROUP BY distance
having distance <= 5; ## I would use the users submitted value
But, I am having trouble understanding how to implement the query with my database.
It looks like that query has all I need.
However, I cannot even find/understand what b.zip_code actually is! (whats the b. and zips a, zips b?)
I also do not need the state in the query.
My mySQL db structure is like this:
ZIP | LAT | LONG
33416 | 26.6654 | -80.0929
I wrote this in attempt to return some kind of results (not based on above query) but, it only kicks out one zip code.
## Just for a test BUT, in reality I desire to SELECT a zip code WHERE ZIP = the users submitted zip code
## not by a submitted lat lon. I left off the $connect var, assume it's there.
my $set1 = (26.6654 - 0.20);
my $set2 = (26.6654 + 0.20);
my $set3 = (-80.0929 - 0.143);
my $set4 = (-80.0929 + 0.143);
my $test123 = $connect->prepare(qq{SELECT `ZIP` FROM `POSTAL`
WHERE `LAT` >= ? AND `LAT` <= ?
AND `LONG` >= ? AND `LONG` <= ?}) or die "$DBI::errstr";
$test123->execute("$set1","$set2","$set3","$set4") or die "$DBI::errstr";
my $cntr;
while(@zip = $test123->fetchrow_array()) {
print qq~$zip[$cntr]~;
push(@zips,$zip[$cntr]);
$cntr++;
}
As you can see, I am quite the novice so, I need some hand holding here with verbose explanation.
So, in Perl, how can I push zip codes into an array from a USER SUBMITTED ZIP CODE and user submitted DISTANCE in miles. Can be a square instead of a circle, not really that critical of a feature. Faster is better.
I'll tackle the small but crucial part of the question:
However, I cannot even find/understand what b.zip_code actually is! (whats the "b." and "zips a, zips b"?)
Basically, the query joins two tables. BUT, both tables being joined are in fact the same table, "zips" (in other words, it joins the "zips" table to itself). Since the rest of the query needs to know when you are referring to the first copy of the "zips" table and when to the second copy, you give a table alias to each copy: "a" and "b".
So, "b.xxx" means "column xxx from table zips, from the SECOND instance of that table being joined".
I don't see what's wrong with your first query. You have latitude and longitude in your database (if I'm understanding correctly, you're comparing a single entry to all others). You don't need to submit or return the state; that's just part of the example. Make the first query work like this:
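A tiny runnable illustration of the self-join and its aliases, using Python's built-in sqlite3 module rather than MySQL (an assumption made to keep it self-contained). The coordinates other than the 33416 row from the question are approximate, illustrative values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zips (zip_code TEXT PRIMARY KEY, lat REAL, lng REAL)")
conn.executemany(
    "INSERT INTO zips VALUES (?, ?, ?)",
    [
        ("33416", 26.6654, -80.0929),   # row from the question
        ("33401", 26.7153, -80.0534),   # illustrative coordinates
        ("90210", 34.0901, -118.4065),  # illustrative coordinates
    ],
)

# "a" is the copy holding the submitted zip code; "b" ranges over every row.
rows = conn.execute(
    "SELECT a.zip_code, b.zip_code "
    "FROM zips a, zips b "
    "WHERE a.zip_code = ? AND ABS(a.lat - b.lat) < 0.5 "
    "ORDER BY b.zip_code",
    ("33416",),
).fetchall()
print(rows)
```

Each result pairs the one "a" row with a "b" row that passed the filter, which is why the original query selects b.zip_code: those are the neighbours.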
my $query = "SELECT b.zip_code,
(3956 * (2 * ASIN(SQRT(
POWER(SIN(((a.lat-b.lat)*0.017453293)/2),2) +
COS(a.lat*0.017453293) *
COS(b.lat*0.017453293) *
POWER(SIN(((a.lng-b.lng)*0.017453293)/2),2))))) AS distance
FROM zips a, zips b WHERE
a.zip_code = ?
GROUP BY distance having distance <= ?";
my $sth = $dbh->prepare($query);
$sth->execute( $user_submitted_zip, $user_submitted_distance );
while( my ($zip, $distance) = $sth->fetchrow_array() ) {
# do something
}
This won't be that fast, but if you have a small record set ( less than 30k rows ) it should be fine. If you really want to go faster you should look into a search engine such as Sphinx which will do this for you.
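To show what the distance expression in the query computes, here is the same haversine formula as a short Python sketch, together with a brute-force radius filter. The table contents are illustrative (only the 33416 row comes from the question) and the function names are made up.

```python
import math

EARTH_RADIUS_MILES = 3956  # same constant as in the SQL above

def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two points given in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (lat1, lng1, lat2, lng2))
    a = (math.sin((lat1 - lat2) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng1 - lng2) / 2) ** 2)
    return EARTH_RADIUS_MILES * 2 * math.asin(math.sqrt(a))

def zips_within(origin_zip, miles, table):
    """Return the sorted zip codes whose distance from origin_zip is <= miles."""
    lat0, lng0 = table[origin_zip]
    return sorted(
        z for z, (lat, lng) in table.items()
        if haversine_miles(lat0, lng0, lat, lng) <= miles
    )

# zip -> (lat, lng); coordinates other than 33416 are illustrative.
table = {
    "33416": (26.6654, -80.0929),
    "33401": (26.7153, -80.0534),
    "90210": (34.0901, -118.4065),
}
print(zips_within("33416", 10, table))
```

The factor 0.017453293 in the SQL is pi/180, the degrees-to-radians conversion that math.radians performs here. On a large table, a cheap latitude/longitude bounding-box filter before the exact formula keeps the number of distance computations down.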
fetchrow_array returns a single row per call, as a list of field values (in your case there is only one field, or column, per row), and an empty list once all rows have been fetched. fetchall_arrayref, by contrast, returns a reference to an array of row references, essentially a two-dimensional array.
Calling while (@zip = $test123->fetchrow_array()) does not re-execute the query; it just fetches the next row on each pass. The reason your loop only kicks out one zip code is $cntr: each row has a single column, so the value is always in $zip[0], but $cntr grows on every pass and $zip[1], $zip[2], ... are undefined.
The zip code you are interested in is in the first (and only) column, so you could accumulate the results in an array like this:
my @zips = (); # for final results
while (my ($zip) = $test123->fetchrow_array()) {
push @zips, $zip;
}
or even more concisely with fetchall_arrayref and Perl's map function:
my @zips = map { $_->[0] } @{ $test123->fetchall_arrayref() };
which does the same thing.