MySQL - Perl: How to get array of zip codes within submitted "x" miles of submitted "zipcode" in Perl example - mysql

I have found many calculations here and some PHP examples, and most are just over my head.
I found this example:
SELECT b.zip_code, b.state,
(3956 * (2 * ASIN(SQRT(
POWER(SIN(((a.lat-b.lat)*0.017453293)/2),2) +
COS(a.lat*0.017453293) *
COS(b.lat*0.017453293) *
POWER(SIN(((a.lng-b.lng)*0.017453293)/2),2))))) AS distance
FROM zips a, zips b
WHERE
a.zip_code = '90210' ## I would use the users submitted value
GROUP BY distance
having distance <= 5; ## I would use the users submitted value
But, I am having trouble understanding how to implement the query with my database.
It looks like that query has all I need.
However, I cannot even find/understand what b.zip_code actually is! (What's the b., and what are zips a, zips b?)
I also do not need the state in the query.
My mySQL db structure is like this:
ZIP | LAT | LONG
33416 | 26.6654 | -80.0929
I wrote this in attempt to return some kind of results (not based on above query) but, it only kicks out one zip code.
## Just for a test BUT, in reality I desire to SELECT a zip code WHERE ZIP = the users submitted zip code
## not by a submitted lat lon. I left off the $connect var, assume it's there.
my $set1 = (26.6654 - 0.20);
my $set2 = (26.6654 + 0.20);
my $set3 = (-80.0929 - 0.143);
my $set4 = (-80.0929 + 0.143);
my $test123 = $connect->prepare(qq{SELECT `ZIP` FROM `POSTAL`
WHERE `LAT` >= ? AND `LAT` <= ?
AND `LONG` >= ? AND `LONG` <= ?}) or die "$DBI::errstr";
$test123->execute("$set1","$set2","$set3","$set4") or die "$DBI::errstr";
my $cntr;
while(@zip = $test123->fetchrow_array()) {
print qq~$zip[$cntr]~;
push(@zips,$zip[$cntr]);
$cntr++;
}
As you can see, I am quite the novice so, I need some hand holding here with verbose explanation.
So, in Perl, how can I push zip codes into an array from a USER SUBMITTED ZIP CODE and a user submitted DISTANCE in miles? It can be a square instead of a circle; that is not really a critical feature. Faster is better.

I'll tackle the small but crucial part of the question:
However, I cannot even find/understand what b.zip_code actually is! (whats the "b." and "zips a, zips b"?)
Basically, the query joins two tables. BUT both tables being joined are in fact the same table, "zips" (in other words, it joins the "zips" table to itself). Since the rest of the query needs to know when you are referring to the first copy of the "zips" table and when to the second copy, you give each copy a table alias: "a" and "b".
So, "b.xxx" means "column xxx from table zips, from the SECOND instance of that table being joined".
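To see the aliases at work, here is a minimal sketch in Python. It uses the standard-library sqlite3 module in place of MySQL purely so it runs without a server (the self-join syntax is the same), and the sample rows other than 33416 are made-up illustration data.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; only the join syntax matters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zips (zip_code TEXT, lat REAL, lng REAL)")
conn.executemany(
    "INSERT INTO zips VALUES (?, ?, ?)",
    [
        ("33416", 26.6654, -80.0929),   # row taken from the question
        ("90210", 34.0901, -118.4065),  # illustrative coordinates
        ("90211", 34.0652, -118.3830),  # illustrative coordinates
    ],
)

# Alias "a" is pinned to the submitted zip code by the WHERE clause;
# alias "b" ranges over every row of the very same table.
pairs = conn.execute(
    "SELECT a.zip_code, b.zip_code "
    "FROM zips a, zips b "
    "WHERE a.zip_code = ? "
    "ORDER BY b.zip_code",
    ("90210",),
).fetchall()

for a_zip, b_zip in pairs:
    print(a_zip, "paired with", b_zip)
```

Each output row pairs the one "a" row with one "b" row, which is what lets the distance expression compare a.lat/a.lng against b.lat/b.lng.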

I don't see what's wrong with your first query. You have latitude and longitude in your database (if I'm understanding correctly, you're comparing a single entry to all the others). You don't need to submit or return the state; that's just part of the example. Make the first query work like this, substituting your own table and column names (POSTAL, ZIP, LAT, LONG) for zips, zip_code, lat and lng:
my $query = "SELECT b.zip_code,
(3956 * (2 * ASIN(SQRT(
POWER(SIN(((a.lat-b.lat)*0.017453293)/2),2) +
COS(a.lat*0.017453293) *
COS(b.lat*0.017453293) *
POWER(SIN(((a.lng-b.lng)*0.017453293)/2),2))))) AS distance
FROM zips a, zips b WHERE
a.zip_code = ?
GROUP BY distance having distance <= ?";
my $sth = $dbh->prepare($query);
$sth->execute( $user_submitted_zip, $user_submitted_distance );
while( my ($zip, $distance) = $sth->fetchrow_array() ) {
# do something
}
This won't be that fast, but if you have a small record set ( less than 30k rows ) it should be fine. If you really want to go faster you should look into a search engine such as Sphinx which will do this for you.
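Since the question says a square is acceptable and faster is better, one common approach is to let the database cut the candidates down with a cheap latitude/longitude box and then apply the same haversine formula only to those rows. The following is a rough Python sketch of that idea, not the answer's method. It assumes the POSTAL/ZIP/LAT/LONG layout from the question, uses sqlite3 as a stand-in for MySQL, and the function names `haversine_miles` and `zips_within` are hypothetical. It also ignores wrap-around at the 180° meridian.

```python
import math
import sqlite3

EARTH_RADIUS_MILES = 3956.0  # same constant as in the SQL above


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return EARTH_RADIUS_MILES * 2 * math.asin(math.sqrt(a))


def zips_within(conn, zip_code, miles):
    """Return sorted zip codes within `miles` of `zip_code` (itself included)."""
    row = conn.execute(
        'SELECT LAT, "LONG" FROM POSTAL WHERE ZIP = ?', (zip_code,)
    ).fetchone()
    if row is None:
        return []  # unknown zip code
    lat, lng = row

    # Bounding box that fully contains the circle of the given radius.
    r = miles / EARTH_RADIUS_MILES  # angular radius in radians
    dlat = math.degrees(r)
    cos_lat = math.cos(math.radians(lat))
    if cos_lat <= math.sin(r):
        dlng = 180.0  # circle reaches a pole: every longitude qualifies
    else:
        dlng = math.degrees(math.asin(math.sin(r) / cos_lat))

    candidates = conn.execute(
        'SELECT ZIP, LAT, "LONG" FROM POSTAL '
        'WHERE LAT BETWEEN ? AND ? AND "LONG" BETWEEN ? AND ?',
        (lat - dlat, lat + dlat, lng - dlng, lng + dlng),
    )
    # Exact check on the few rows that survived the box filter.
    return sorted(
        z for z, zlat, zlng in candidates
        if haversine_miles(lat, lng, zlat, zlng) <= miles
    )
```

Calling `zips_within(conn, "33416", 5)` would return the list of zip codes to push into your array. With an index on the latitude and longitude columns the box filter can avoid scanning the whole table, and dropping the final haversine check gives the plain "square" variant.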

fetchrow_array returns the next row of the result set as a plain list of field values, one element per column in the SELECT (in your case, there is only one column per row), and it returns an empty list once no rows are left.
That makes while (my @row = $test123->fetchrow_array()) the usual idiom: the query runs once, in execute, and each pass through the loop fetches one more row until the result set is exhausted. The reason your loop prints only one zip code is the $cntr index. Each row holds a single column, so the zip is always $zip[0]; on the first pass $cntr is undefined and acts as 0, but on later passes $zip[1], $zip[2] and so on do not exist.
The zip code you are interested in is in the first (and only) column, so you could accumulate the results in an array like this:
my @zips = (); # for final results
while (my @row = $test123->fetchrow_array()) {
push @zips, $row[0];
}
or even more concisely with fetchall_arrayref, which returns a reference to an array of row references, and Perl's map function:
my @zips = map { $_->[0] } @{ $test123->fetchall_arrayref() };
which does the same thing.

Related

How optimize the research of next free "slot" in mysql?

I have a problem and I can't find an easy solution.
I have a self-expanding structure laid out like this:
database1 | table1
| table2
....
| table n
.
.
.
databaseN | table 1
table 2
table n
Each table has a structure like this:
id|value
Each time a number is generated, it is put into the right database/table/row (the data is divided this way for scalability; a single table with billions of records would be impossible to manage quickly).
The problem is that N is not fixed. It acts as the base for calculating the numbers (to be precise, the maximum N is known, 62, but I can only use a subset of the "digits", and that subset can change over time).
For example, I can work only with 0, 1 and 2, and after a while (when I've used up all the possibilities) I want to add 4 and so on (up to base 62).
I would like a simple way to find the first free slot in which to put the next randomly generated id, in a way that could be reverted.
Example:
I have 0 1 2 3 as the digits I want to use.
The element 2313 is put in database 2, table 3, and the table will contain 13|value.
The element 1301 is put in database 1, table 3, and the table will contain 01|value.
I would like to generate another number based on the next free slot.
I could test every slot from 0 up to the biggest number, but when there are millions of records in every database and table this will be impossible.
The next element of the first example would be 2323 (and not 2314, since I'm using only the digits 0 1 2 3).
I would like some sort of inverse code in MySQL that gives me slot 23 in table 3 of database 2, so that I can transform it into the number. I could randomly generate a number and try to find the nearest free slot above and below it, but since the digit set is variable that might not be a good choice.
I hope this is clear enough for you to offer suggestions ;-)
Use
show databases like 'database%' and a loop to find non-existent databases
show tables like 'table%' and a loop for tables
select count(*) from tableN to see if a table is "full" or not.
To find a free slot, walk the database with count in chunks.
This untested PHP/MySQL implementation will first fill up all existing databases and tables to base N+1 before creating new tables or databases.
The if(!$base) part should be altered if another behaviour is wanted.
The findFreeChunk can also be solved with iteration; but I leave that effort to You.
define('DB_PREFIX', 'database');
define('TABLE_PREFIX', 'table');
define('ID_LENGTH', 2);
function findFreeChunk($base, $db, $table, $prefix='')
{
$maxRecordCount=$base**(ID_LENGTH-strlen($prefix));
for($i=-1; ++$i<$base;)
{
list($n) = mysql_fetch_row(mysql_query(
"select count(*) from `$db`.`$table` where `id` like '"
. ($tmp = $prefix. base_convert($i, 10, 62))
. "%'"));
if($n<$maxRecordCount)
{
// incomplete chunk found: recursion
for($k=-1;++$k<$base;)
if($ret = findFreeChunk($base, $db, $table, $tmp))
{ return $ret; }
}
}
}
function findFreeSlot($base=NULL)
{
// find current base if not given
if (!$base)
{
for($base=1; !$ret = findFreeSlot(++$base););
return $ret;
}
$maxRecordCount=$base**ID_LENGTH;
// walk existing DBs
$res = mysql_query("show databases like '". DB_PREFIX. "%'");
$dbs = array ();
while (list($db)=mysql_fetch_row($res))
{
// walk existing tables
$res2 = mysql_query("show tables in `$db` like '". TABLE_PREFIX. "%'");
$tables = array ();
while (list($table)=mysql_fetch_row($res2))
{
list($n) = mysql_fetch_row(mysql_query("select count(*) from `$db`.`$table`"));
if($n<$maxRecordCount) { return findFreeChunk($base, $db, $table); }
$tables[] = $table;
}
// no table with empty slot found: all available table names used?
if(count($tables)<$base)
{
for($i=-1;in_array($tmp=TABLE_PREFIX. base_convert(++$i,10,62),$tables););
if($i<$base) return [$db, $tmp, 0];
}
$dbs[] = $db;
}
// no database with empty slot found: all available database names used?
if(count($dbs)<$base)
{
for($i=-1;in_array($tmp=DB_PREFIX.base_convert(++$i,10,62),$dbs););
if($i<$base) return [$tmp, TABLE_PREFIX. 0, 0];
}
// none: return false
return false;
}
If you are not reusing your slots or not deleting anything, you can of course dump all this and simply remember the last ID to calculate the next one.
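For that simpler case, "calculating the next one" can be sketched as a plain increment over the restricted digit set. The Python below is only an illustration under stated assumptions: IDs are fixed-width strings, the allowed digits are given in counting order, the first character names the database and the second the table (as in the question's 2313 example), and the helper names `next_id` and `split_id` are hypothetical. It uses ordinary positional counting, which may not match the ordering the asker has in mind.

```python
def next_id(current, alphabet):
    """Increment a fixed-width ID using only the digits in `alphabet`.

    Returns None when every position is already at the last allowed digit,
    i.e. the current digit set is exhausted and a new digit must be added.
    Raises ValueError if `current` contains a digit outside `alphabet`.
    """
    digits = list(current)
    for i in range(len(digits) - 1, -1, -1):
        pos = alphabet.index(digits[i])
        if pos + 1 < len(alphabet):
            digits[i] = alphabet[pos + 1]  # bump this position and stop
            return "".join(digits)
        digits[i] = alphabet[0]  # wrap around and carry to the left
    return None


def split_id(full_id):
    """Split an ID into (database suffix, table suffix, row id)."""
    return full_id[0], full_id[1], full_id[2:]


print(next_id("0133", "0123"))  # carries twice
print(split_id("2313"))         # database 2, table 3, row id 13
```

When `next_id` returns None, that is the point at which the digit set would be widened (for example from "0123" to "01234") before continuing.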

Ruby Geocoder near working unexpectedly

Using Alex Reisner's Ruby Geocoder, I have a model "Spot" which is geocoded using this gem
If I select the first spot into an object:
s = Spot.first
If I then try to find all the spots within a 10 mile radius (let's say there should be 12 of them), I would expect Spot.near(s,10) to return an ActiveRecord object containing 12 spots. Instead it returns something like this every time:
Spot.near(s1,10)
=> {:select=>"spots.*, AS distance, CAST(DEGREES(ATAN2( RADIANS(spots.longitude - -6.88671972656243), RADIANS(spots.latitude - 55.1729431175342))) + 360 AS decimal) % 360 AS bearing", :conditions=>["spots.latitude BETWEEN 55.028211334423354 AND 55.31767490064505 AND spots.longitude BETWEEN -7.140145500612388 AND -6.633293952512472 AND <= ?", 10], :order=>"distance ASC"}
Basically it's returning some SQL but no results.
What's going wrong here? Even if i try something a lot wider, like:
Spot.near([40,0],10000)
I still basically get no results back...
EDIT
I am able to execute the query using this (not pretty) addition...
q = Spot.near(s1,10)
spots = Spot.select(q[:select]).where(q[:conditions]).order(q[:order])

MySql: Best way to run high number of search queries on a table

I have two tables: one is a static table that I need to search in, and the other is dynamic and supplies the search terms for the first. Right now I have two separate queries. On page load, values from the second table are passed to the first one as search terms, and I am "capturing" the search result using cURL. This is very inefficient and probably the wrong way to do it, so I need help fixing it. Currently the page (HTML, front-end) takes 40 seconds to load.
Possible solutions: turn it into a function, though that still makes just as many calls. Load the table into memory, run the queries, and unload the cache once done. Use regexp to help speed up the query? Possibly a join? But I am a noob, so I can only imagine...
Search script:
require 'mysqlconnect.php';
$id = NULL;
if(isset($_GET['n'])) { $id = mysql_real_escape_string($_GET['n']); }
if(isset($_POST['n'])) { $id = mysql_real_escape_string($_POST['n']); }
if(!empty($id)){
$getdata = "SELECT id, first_name, last_name, published_name,
department, telephone FROM $table WHERE id = '$id' LIMIT 1";
$result = mysql_query($getdata) or die(mysql_error());
$num_rows = mysql_num_rows($result);
while($row = mysql_fetch_array($result, MYSQL_ASSOC))
{
echo <<<PRINTALL
{$row[id]}~~::~~{$row[first_name]}~~::~~{$row[last_name]}~~::~~{$row[p_name]}~~::~~{$row[dept]}~~::~~{$row[ph]}
PRINTALL;
}
}
HTML Page Script:
require 'mysqlconnect.php';
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$getdata = "SELECT * FROM $table WHERE $table.mid != '1' ORDER BY $table.$sortbyme $o LIMIT $offset, $rowsPerPage";
$result = mysql_query($getdata) or die(mysql_error());
while($row = mysql_fetch_array($result, MYSQL_ASSOC))
{
$idurl = 'http://mydomain.com/dir/file.php?n='.$row['id'].'';
$p_arr = explode('~~::~~',get_data($idurl));
$p_str = implode(' ',$p_arr);
//Use p_str and p_arr if they exist, otherwise just output the rest of the
//html code with second table values
}
As you can see, the second table may or may not have a valid id, hence no results in some cases, but the second table is quite large, and all in all I am reading and outputting 15k+ table cells. As you can probably see from the code, I have tried paging, but that solution doesn't fit my needs: I have to have all of the data on the client side in a single HTML page. So please advise.
Thanks!
EDIT
First table:
id_row id first_name last_name dept telephone
1 aaa12345 joe smith ANS 800 555 5555
2 bbb67890 sarah brown ITL 800 848 8848
Second_table:
id_row type model har status id date
1 ATX Hybrion 88-85-5d-id-ss y aaa12345 2011/08/12
2 BTX Savin none n aaa12345 2010/04/05
3 Full Hp 44-55-sd-qw-54 y ashley a 2011/07/25
4 ATX Delin none _ smith bon 2011/04/05
So the second table is the one that gets read and displayed; the first is read, and its info displayed, only if the ID is a positive match. ID is unique only in the first table; the second has multi-format input, so its id column may or may not hold a real ID, and IDs can be duplicated. I hope this gives a better understanding of what I need. Thanks again!
A few things:
Curl is completely unnecessary here.
ORDER BY can slow your queries down considerably if the sort column is not indexed.
I'd throw in an if is_numeric check on the ID.
Why are you using while and mysql_num_rows when you're limiting to 1 in the query?
Where are $table and these other things being set?
There is code missing.
If you give us the data structure for the two tables in question we can help you with the queries, but the way you have this set up now, I'm surprised it's even working at all.
What you're doing is this: for each row in $table where mid != 1, you're executing a curl call to a second page, which takes the ID and queries again. This is really, really bad, and much more convoluted than it needs to be. Let's see your table structures.
Basically you can do:
select first_name, last_name, published_name, department, telephone FROM $table1, $table2 WHERE $table1.id = $table2.id and $table2.mid != 1;
Get rid of the curl, get rid of the exploding/imploding.
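Going by the table layouts in the question's edit, where rows of the second table must be shown even when their id matches nothing in the first, a LEFT JOIN may fit better than the inner join above. The sketch below is a hedged illustration rather than a drop-in fix: it uses Python's sqlite3 in place of MySQL, and the table names `people` and `devices` are hypothetical stand-ins for the first and second table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "people" stands in for the first table, "devices" for the second.
conn.execute(
    "CREATE TABLE people (id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT, dept TEXT)"
)
conn.execute("CREATE TABLE devices (id_row INTEGER, type TEXT, model TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [("aaa12345", "joe", "smith", "ANS"), ("bbb67890", "sarah", "brown", "ITL")],
)
conn.executemany(
    "INSERT INTO devices VALUES (?, ?, ?, ?)",
    [
        (1, "ATX", "Hybrion", "aaa12345"),
        (2, "BTX", "Savin", "aaa12345"),
        (3, "Full", "Hp", "ashley a"),  # free-text value, matches no person
    ],
)

# One query replaces the per-row HTTP round trip: every device row comes back,
# with the person's columns filled in only where the id matches.
rows = conn.execute(
    "SELECT d.id_row, d.model, d.id, p.first_name, p.last_name "
    "FROM devices d LEFT JOIN people p ON p.id = d.id "
    "ORDER BY d.id_row"
).fetchall()

for row in rows:
    print(row)
```

Rows without a match carry NULL (None in Python) in the person columns, which corresponds to the "otherwise just output the second table values" branch in the original page script.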

Correlate 2 columns in SQL

SELECT ica.CORP_ID, ica.CORP_IDB, ica.ITEM_ID, ica.ITEM_IDB,
ica.EXP_ACCT_NO, ica.SUB_ACCT_NO, ica.PAT_CHRG_NO, ica.PAT_CHRG_PRICE,
ica.TAX_JUR_ID, ica.TAX_JUR_IDB, ITEM_PROFILE.COMDTY_NAME
FROM ITEM_CORP_ACCT ica
,ITEM_PROFILE
WHERE (ica.CORP_ID = 1000)
AND (ica.CORP_IDB = 4051)
AND (ica.ITEM_ID = 1000)
AND (ica.ITEM_IDB = 4051)
AND ica.EXP_ACCT_NO = ITEM_PROFILE.EXP_ACCT_NO
I'm basically trying to say that since the EXP account code is '801500', the name returned should be "Miscellaneous Medic...".
It seems as if what you are showing is not possible. Have you edited the data in the editor? You are joining on ica.EXP_ACCT_NO = ITEM_PROFILE.EXP_ACCT_NO. Therefore, every entry with EXP_ACCT_NO = 801500 should also have the same COMDTY_NAME.
However, it could be the case that your IDs are not actually numbers and that they are strings with whitespace (801500__ vs 801500 ). But since you are not performing a left-outer join, it would also mean you have an entry in ITEM_PROFILE with the same whitespace.
You also need to properly normalize your table data (unless this is a view) but it still means you have erroneous data.
Try to perform the same query, but using the TRIM function to remove whitespace: https://stackoverflow.com/a/6858168/1688441 .
Example:
SELECT ica.CORP_ID, ica.CORP_IDB, ica.ITEM_ID, ica.ITEM_IDB,
ica.EXP_ACCT_NO, ica.SUB_ACCT_NO, ica.PAT_CHRG_NO, ica.PAT_CHRG_PRICE,
ica.TAX_JUR_ID, ica.TAX_JUR_IDB, ITEM_PROFILE.COMDTY_NAME
FROM ITEM_CORP_ACCT ica
,ITEM_PROFILE
WHERE (ica.CORP_ID = 1000)
AND (ica.CORP_IDB = 4051)
AND (ica.ITEM_ID = 1000)
AND (ica.ITEM_IDB = 4051)
AND trim(ica.EXP_ACCT_NO) = trim(ITEM_PROFILE.EXP_ACCT_NO);
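The effect of stray whitespace on a join key can be reproduced in a few lines. This is a small sketch using Python's sqlite3 with made-up rows that reuse the column names above; whether padded values compare equal without TRIM depends on the database and the column type, so treat the "plain" result as SQLite's behaviour only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ITEM_CORP_ACCT (ITEM_ID INTEGER, EXP_ACCT_NO TEXT)")
conn.execute("CREATE TABLE ITEM_PROFILE (EXP_ACCT_NO TEXT, COMDTY_NAME TEXT)")
# The account number carries two trailing spaces on one side only.
conn.execute("INSERT INTO ITEM_CORP_ACCT VALUES (1000, '801500  ')")
conn.execute("INSERT INTO ITEM_PROFILE VALUES ('801500', 'Miscellaneous Medic...')")

# Plain equality: in SQLite the padded and unpadded strings differ.
plain = conn.execute(
    "SELECT p.COMDTY_NAME FROM ITEM_CORP_ACCT ica, ITEM_PROFILE p "
    "WHERE ica.EXP_ACCT_NO = p.EXP_ACCT_NO"
).fetchall()

# Trimming both sides makes the keys line up again.
trimmed = conn.execute(
    "SELECT p.COMDTY_NAME FROM ITEM_CORP_ACCT ica, ITEM_PROFILE p "
    "WHERE TRIM(ica.EXP_ACCT_NO) = TRIM(p.EXP_ACCT_NO)"
).fetchall()

print(plain)
print(trimmed)
```

Note that wrapping a join column in TRIM usually prevents an index on that column from being used, so cleaning the stored data is the better long-term fix.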

How expensive is ST_GeomFromText

In postgis, is the ST_GeomFromText call very expensive? I ask mostly because I have a frequently called query that attempts to find the point that is nearest another point that matches some criteria, and which is also within a certain distance of that other point, and the way I currently wrote it, it's doing the same ST_GeomFromText twice:
$findNearIDMatchStmt = $postconn->prepare(
"SELECT internalid " .
"FROM waypoint " .
"WHERE id = ? AND " .
" category = ? AND ".
" (b.category in (1, 3) OR type like ?) AND ".
" ST_DWithin(point, ST_GeomFromText(?," . SRID .
" ),". SMALL_EPSILON . ") " .
" ORDER BY ST_Distance(point, ST_GeomFromText(?," . SRID .
" )) " .
" LIMIT 1");
Is there a better way to re-write this?
Slightly OT: In the preview screen, all my underscores are being rendered as & # 9 5 ; - I hope that's not going to show up that way in the post.
I don't believe ST_GeomFromText() is particularly expensive, although in the past I've optimized PostGIS queries by creating a function, declaring a variable and then assigning the result of ST_GeomFromText to the variable.
Have you tried checking the execution plan for your query with a variety of different parameters? That should give you a definite idea of which bits of the query are taking the time.
I'm guessing most of the execution time will be in the calls to ST_DWithin() and ST_Distance(), although if the id and category columns aren't indexed then it might be doing some interesting table scanning.
@Ubiguch
It appears that ST_DWithin uses the spatial index, so that seems to cut down on the number of points to be queried pretty quickly.
navaid=> explain select internalid from waypoint where id != 'KROC' AND ST_DWithin(point, ST_GeomFromText('POINT(-77.6723888888889 43.1188611111111)',4326), 0.05) order by st_distance(point, st_geomfromtext('POINT(-77.6723888888889 43.1188611111111)',4326)) limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=8.37..8.38 rows=1 width=104)
-> Sort (cost=8.37..8.38 rows=1 width=104)
Sort Key: (st_distance(point, '0101000020E61000002FFE676B086B53C0847E44D7368F4540'::geometry))
-> Index Scan using waypoint_point_idx on waypoint (cost=0.00..8.36 rows=1 width=104)
Index Cond: (point && '0103000020E61000000100000005000000000000C03B6E53C000000060D0884540000000C03B6E53C0000000409D95454000000020D56753C0000000409D95454000000020D56753C000000060D0884540000000C03B6E53C000000060D0884540'::geometry)
Filter: (((id)::text <> 'KROC'::text) AND (point && '0103000020E61000000100000005000000000000C03B6E53C000000060D0884540000000C03B6E53C0000000409D95454000000020D56753C0000000409D95454000000020D56753C000000060D0884540000000C03B6E53C000000060D0884540'::geometry) AND ('0101000020E61000002FFE676B086B53C0847E44D7368F4540'::geometry && st_expand(point, 0.05::double precision)) AND (st_distance(point, '0101000020E61000002FFE676B086B53C0847E44D7368F4540'::geometry) < 0.05::double precision))
(6 rows)
Without the order by and the limit, it looks like a typical query is only returning 5-10 waypoints max. So I probably shouldn't worry about the additional cost of the filter that's applied to the points returned.