Perl with mysql, terribly slow, how to accelerate - mysql

I have three tables:
unit: id, fir_name, sec_name
author: id, name, unit_id
author_paper: id, author_id, paper_id
I want to unify authors (two rows are the "same author" when their names are the same and their units' fir_names are the same), and I have to update the author_paper table at the same time.
Here is what I do:
$conn->do('create index author_name on author (name)');
my $sqr = $conn->prepare("select name from author group by name having count(*) > 1");
$sqr->execute();
while (my @row = $sqr->fetchrow_array()) {
    my $dup_name = $row[0];
    $dup_name = formatHtml($dup_name);
    my $sqr2 = $conn->prepare("select id, unit_id from author where name = '$dup_name'");
    $sqr2->execute();
    my %fir_name_hash = ();
    while (my @row2 = $sqr2->fetchrow_array()) {
        my $author_id = $row2[0];
        my $unit_id = $row2[1];
        my $fir_name = getFirNameInUnit($conn, $unit_id);
        if (not exists $fir_name_hash{$fir_name}) {
            $fir_name_hash{$fir_name} = []; #anonymous arr reference
        }
        my $x = $fir_name_hash{$fir_name};
        push @$x, $author_id;
    }
    while (my ($fir_name, $author_id_arr) = each(%fir_name_hash)) {
        my $count = scalar @$author_id_arr;
        if ($count == 1) {next;}
        my $author_id = $author_id_arr->[0];
        for (my $i = 1; $i < $count; $i++) {
            #print "$author_id_arr->[$i] => $author_id\n";
            unifyAuthorAndAuthorPaperTable($conn, $author_id, $author_id_arr->[$i]); #just delete in author table, and update in author_paper table
        }
    }
}
select count(*) from author; #240,000
select count(distinct(name)) from author; #77,000
It is terribly slow! I have run it for 5 hours and it has only removed about 40,000 duplicate names.
How can I make it run faster? I am eager for your advice.

You should not prepare the second SQL statement inside the loop. Prepare it once, and make real use of the preparation by using the ? placeholder:
$conn->do('create index author_name on author (name)');
my $sqr = $conn->prepare('select name from author group by name having count(*) > 1');
# ? is the placeholder; the database driver knows whether it's an integer or a string
# and quotes the input if needed.
my $sqr2 = $conn->prepare('select id, unit_id from author where name = ?');
$sqr->execute();
while (my @row = $sqr->fetchrow_array()) {
    my $dup_name = $row[0];
    $dup_name = formatHtml($dup_name);
    # Now you can reuse the prepared handle with different input
    $sqr2->execute( $dup_name );
    my %fir_name_hash = ();
    while (my @row2 = $sqr2->fetchrow_array()) {
        my $author_id = $row2[0];
        my $unit_id = $row2[1];
        my $fir_name = getFirNameInUnit($conn, $unit_id);
        if (not exists $fir_name_hash{$fir_name}) {
            $fir_name_hash{$fir_name} = []; #anonymous arr reference
        }
        my $x = $fir_name_hash{$fir_name};
        push @$x, $author_id;
    }
    while (my ($fir_name, $author_id_arr) = each(%fir_name_hash)) {
        my $count = scalar @$author_id_arr;
        if ($count == 1) {next;}
        my $author_id = $author_id_arr->[0];
        for (my $i = 1; $i < $count; $i++) {
            #print "$author_id_arr->[$i] => $author_id\n";
            unifyAuthorAndAuthorPaperTable($conn, $author_id, $author_id_arr->[$i]); #just delete in author table, and update in author_paper table
        }
    }
}
This should speed things up.
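To illustrate the placeholder point outside Perl, here is a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in for MySQL, and the sample rows and the helper name ids_by_name are made up for the example. The statement text stays constant and only the bound value changes, so a name containing a quote needs no manual escaping.

```python
import sqlite3


def ids_by_name(conn, names):
    """Run the same parameterized query once per name and collect matching ids."""
    sql = "SELECT id FROM author WHERE name = ? ORDER BY id"  # identical text every time
    result = {}
    for name in names:
        # The driver binds the value; no string interpolation into the SQL.
        result[name] = [row[0] for row in conn.execute(sql, (name,))]
    return result


def demo():
    # Hypothetical sample data, only for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, unit_id INTEGER)")
    conn.executemany(
        "INSERT INTO author (id, name, unit_id) VALUES (?, ?, ?)",
        [(1, "O'Brien", 10), (2, "O'Brien", 11), (3, "Li Wei", 10)],
    )
    return ids_by_name(conn, ["O'Brien", "Li Wei", "nobody"])
```

With string interpolation, the name O'Brien would have broken the query from the question; with a bound parameter it is just data.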

The moment I see a query followed by a loop, I suspect a latency problem: you query to get a set of values and then iterate over the set to do something else. That's a LOT of latency if it means a network round trip to the database for each row in the set.
It'd be better if you could do it in a single query using an UPDATE and a sub-select, or if you could batch those requests and perform all of them in one round trip.
You'll get an additional speed-up if you use indexes wisely. Every column in a WHERE clause should have an index. Every foreign key should have an index.
I'd run EXPLAIN on your queries and see whether any full table scans are going on (type = ALL in MySQL's EXPLAIN output). If there are, you've got to index properly.
I wonder if a properly designed JOIN would come to your rescue?
A table of 240,000 rows with 77,000 distinct names isn't that large a database.
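A set-based version of the clean-up could look roughly like the sketch below. It is written in Python against sqlite3 as a stand-in for MySQL, so the exact SQL is an assumption. In MySQL the same steps could be expressed with multi-table UPDATE and DELETE statements. The function name merge_duplicate_authors, the temporary table author_map and the sample rows are hypothetical. The idea is to compute, in one query, which id survives for each (name, fir_name) group, then repoint author_paper and delete the duplicates with one statement each instead of one round trip per author.

```python
import sqlite3


def merge_duplicate_authors(conn):
    """Merge authors sharing (name, unit.fir_name) into the lowest id of each group."""
    cur = conn.cursor()
    # One row per duplicate: its id and the id that survives for its group.
    cur.execute("""
        CREATE TEMP TABLE author_map AS
        SELECT a.id AS old_id, k.keep_id
        FROM author a
        JOIN unit u ON u.id = a.unit_id
        JOIN (SELECT a2.name, u2.fir_name, MIN(a2.id) AS keep_id
              FROM author a2
              JOIN unit u2 ON u2.id = a2.unit_id
              GROUP BY a2.name, u2.fir_name) k
          ON k.name = a.name AND k.fir_name = u.fir_name
        WHERE a.id <> k.keep_id
    """)
    # Repoint every paper link of a duplicate to the surviving author.
    cur.execute("""
        UPDATE author_paper
        SET author_id = (SELECT keep_id FROM author_map
                         WHERE old_id = author_paper.author_id)
        WHERE author_id IN (SELECT old_id FROM author_map)
    """)
    # Remove the duplicates themselves.
    cur.execute("DELETE FROM author WHERE id IN (SELECT old_id FROM author_map)")
    removed = cur.rowcount
    cur.execute("DROP TABLE author_map")
    conn.commit()
    return removed


def demo():
    # Hypothetical sample data: authors 1 and 2 share name and fir_name.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE unit (id INTEGER PRIMARY KEY, fir_name TEXT, sec_name TEXT);
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, unit_id INTEGER);
        CREATE TABLE author_paper (id INTEGER PRIMARY KEY, author_id INTEGER, paper_id INTEGER);
        INSERT INTO unit VALUES (1, 'Alpha', 'CS'), (2, 'Alpha', 'EE'), (3, 'Beta', 'CS');
        INSERT INTO author VALUES (1, 'Ann', 1), (2, 'Ann', 2), (3, 'Ann', 3), (4, 'Bob', 1);
        INSERT INTO author_paper VALUES (1, 1, 100), (2, 2, 101), (3, 3, 102), (4, 4, 103);
    """)
    removed = merge_duplicate_authors(conn)
    authors = [r[0] for r in conn.execute("SELECT id FROM author ORDER BY id")]
    links = conn.execute(
        "SELECT author_id, paper_id FROM author_paper ORDER BY id").fetchall()
    return removed, authors, links
```

If author_paper has a unique constraint on (author_id, paper_id), the repointing step can collide when two duplicates are linked to the same paper, so those rows would need to be removed first.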

Related

MySQL optional filters for search query

I am working on a query that has an optional filter. Let's assume the table name is products and the filter is the id (primary key).
If the filter is not present I would do something like this:
SELECT * FROM products;
If the filter is present I would need to do something like this:
SELECT * FROM products WHERE id = ?;
I have found some potential solutions that combine the two in SQL rather than branching in the back-end code itself:
SELECT * FROM products WHERE id = IF(? = '', id, ?);
OR
SELECT * FROM products WHERE IF(? = '',1, id = ?);
I was just wondering which one would be faster (in the case of multiple filters or a very big table), or whether there is a better solution for this kind of situation.
A better approach is to construct the WHERE clause from the parameters available. This allows the Optimizer to do a much better job.
$wheres = array();
// Add on each filter that the user specified:
if (! empty($col)) {
    $s = $db->db_res->real_escape_string($col);
    $wheres[] = "collection = '$s'";
}
if (! empty($theme)) {
    $s = $db->db_res->real_escape_string($theme);
    $wheres[] = "theme = '$s'";
}
if (! empty($city)) {
    $s = $db->db_res->real_escape_string($city);
    $wheres[] = "city = '$s'";
}
if (! empty($tripday)) {
    $s = $db->db_res->real_escape_string($tripday);
    $wheres[] = "tripday = '$s'";
}
// Prefix with WHERE (unless nothing specified):
$where = empty($wheres) ? '' : 'WHERE ' . implode(' AND ', $wheres);
// Use the WHERE clause in the query:
$sql = "SELECT ...
    $where
    ...";
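The same idea can be combined with bound parameters instead of escaped strings. The Python sketch below is only an illustration: the function name build_query, the products table combined with these four columns, and the ? placeholder style are assumptions (some drivers use %s instead). Column names come from a fixed whitelist, so only values are ever supplied by the user.

```python
def build_query(filters):
    """Return (sql, params) containing only the filters that were actually given."""
    # Whitelist of filterable columns; user input never becomes part of the SQL text.
    allowed = ("collection", "theme", "city", "tripday")
    wheres, params = [], []
    for col in allowed:
        value = filters.get(col)
        if value not in (None, ""):
            wheres.append(f"{col} = ?")
            params.append(value)
    sql = "SELECT * FROM products"
    if wheres:
        # Prefix with WHERE only when at least one filter is present.
        sql += " WHERE " + " AND ".join(wheres)
    return sql, params
```

The returned pair can be passed straight to a driver call such as cursor.execute(sql, params). Each distinct combination of filters produces its own plain statement, which the optimizer can plan without any IF or OR logic in the WHERE clause.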
The simplest approach is OR:
SELECT *
FROM products
WHERE (? IS NULL OR id = ?);
Please note that as you add more and more conditions with AND, the generated plan is likely to be poor. There is no one-size-fits-all solution. If possible, you should build your query using conditional logic.
More info: The “Kitchen Sink” Procedure (written for SQL Server, but the idea is the same)

Selecting next related row in MySQL

I have a spatial dataset in MySQL 5.7 where I have the columns: id, deviceid, latlng_point, time. latlng_point is a geospatial Point.
What I'm trying to achieve is calculating the distance between consecutive points. I'm unsure how to approach this.
SELECT
ST_DISTANCE_SPHERE(latlng_point, i want the next latlng_point here) AS distance
FROM points
WHERE deviceid = 1
ORDER BY time DESC;
In PHP I would do something like this:
<?php
$conn = new mysqli($host, $user, $pass, $db);
$query = "SELECT latlng_point FROM points WHERE deviceid = 1...";
$latlng_array = array();
if ($result = $conn->query($query)) {
    while ($row = $result->fetch_assoc()) {
        $latlng_array[] = $row;
    }
}
$distance = 0;
for ($i = 0; $i < count($latlng_array) - 1; $i++) {
    $pt1 = $latlng_array[$i]['latlng_point'];
    $pt2 = $latlng_array[$i+1]['latlng_point'];
    $distance += haversine_function($pt1, $pt2);
}
echo "Distance: {$distance}";
?>
I'm trying to achieve something similar purely in MySQL.
Try this one:
SELECT SUM(ST_DISTANCE_SPHERE(p1.latlng_point, p2.latlng_point)) AS total_distance
FROM points p1
JOIN points p2 ON p2.id = (
    SELECT p.id
    FROM points p
    WHERE p.deviceid = p1.deviceid
      AND p.time > p1.time
    ORDER BY p.time ASC
    LIMIT 1
)
WHERE p1.deviceid = 1;
The (correlated) subquery should return the id of the next point (sorted by time).
I can't tell you whether it is really efficient, or whether it even works at all (I can't test it).
However, you should have an index on (deviceid, time), assuming that id is the primary key.
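The "join each row to its next row" pattern can be tried out with a small Python sketch. This is not MySQL: it uses sqlite3, stores the point as two plain columns lat and lng (SQLite has no Point type), and registers a Python haversine function in place of ST_DISTANCE_SPHERE. The column layout, the function names and the Earth radius of 6,371,000 m are all assumptions made for the example.

```python
import math
import sqlite3

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius; MySQL's default may differ slightly


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) pairs given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def total_distance(conn, deviceid):
    """Sum the distances between each point and the next point (by time) of one device."""
    conn.create_function("haversine_m", 4, haversine_m)
    row = conn.execute("""
        SELECT SUM(haversine_m(p1.lat, p1.lng, p2.lat, p2.lng))
        FROM points p1
        JOIN points p2 ON p2.id = (
            SELECT p.id FROM points p
            WHERE p.deviceid = p1.deviceid AND p.time > p1.time
            ORDER BY p.time ASC
            LIMIT 1)
        WHERE p1.deviceid = ?
    """, (deviceid,)).fetchone()
    return row[0] or 0.0  # SUM over no pairs is NULL


def make_demo_db():
    # Hypothetical data: device 1 moves along the equator, inserted out of time order.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, deviceid INTEGER, "
                 "lat REAL, lng REAL, time INTEGER)")
    conn.executemany("INSERT INTO points VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 0.0, 2.0, 3),
        (2, 1, 0.0, 0.0, 1),
        (3, 2, 50.0, 50.0, 1),
        (4, 1, 0.0, 1.0, 2),
    ])
    return conn
```

The last point of a device has no successor, so the inner join drops it and it contributes nothing to the sum, which matches the PHP loop that stops one element before the end.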

Mysql PDO (Getting total from all columns) [duplicate]

I'm new to PHP. I've searched for the past hour and read all the documentation I could find, and nothing is helping. I have a table with a bunch of rows of data. I'm trying to pick one column and add all of its values together. Here is what I have. All this tells me is how many rows match my query, not the total of the column I want. Any help is appreciated.
$res1 = $db->prepare('SELECT sum(distance) FROM trip_logs WHERE user_id = '. $user_id .' AND status = "2"');
$res1->execute();
$sum_miles = 0;
while($row1 = $res1->fetch(PDO::FETCH_ASSOC)) {
$sum_miles += $row1['distance'];
}
echo $sum_miles;
You're only returning one row in this instance. Modify your summed column to have an alias:
SELECT SUM(distance) AS totDistance FROM trip_logs ....
Now you can fetch the row:
$row = $res1->fetch(PDO::FETCH_ASSOC);
echo $row['totDistance'];
No need to loop.
You can use SUM() without explicitly grouping your rows, because a group function in a statement containing no GROUP BY clause is equivalent to grouping on all rows.
If, however, you want one sum per subset of rows rather than a single total, you have to group the rows so that SUM() operates on each subset.
For example, to get the distance for every user in a single statement, you need to group the rows explicitly:
$res1 = $db->prepare("
    SELECT
        SUM(distance) AS distance,
        user_id
    FROM trip_logs WHERE status = '2'
    GROUP BY user_id
");
$res1->execute();
while ($row = $res1->fetch(PDO::FETCH_ASSOC))
{
    echo "user $row[user_id] has run $row[distance] km.\n";
}
This will return the sum of distances by user, not for all users at once.
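The difference between the two forms of SUM() can be checked with a small Python sketch. It uses sqlite3 rather than MySQL with PDO, and the sample rows and the function names total_for_user and totals_by_user are made up, so treat it as an illustration under those assumptions.

```python
import sqlite3


def make_db():
    # Hypothetical trip data; status is stored as text, as in the question.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trip_logs (user_id INTEGER, status TEXT, distance REAL)")
    conn.executemany(
        "INSERT INTO trip_logs VALUES (?, ?, ?)",
        [(1, "2", 10.0), (1, "2", 5.5), (1, "1", 99.0), (2, "2", 3.0)],
    )
    return conn


def total_for_user(conn, user_id):
    """Without GROUP BY the aggregate yields exactly one row, so no loop is needed."""
    row = conn.execute(
        "SELECT SUM(distance) AS totDistance FROM trip_logs "
        "WHERE user_id = ? AND status = ?",
        (user_id, "2"),
    ).fetchone()
    return row[0]  # None when no row matched, because SUM over nothing is NULL


def totals_by_user(conn):
    """With GROUP BY there is one row per user, so the result is iterated."""
    rows = conn.execute(
        "SELECT user_id, SUM(distance) FROM trip_logs "
        "WHERE status = ? GROUP BY user_id",
        ("2",),
    )
    return dict(rows)
```

Note that the ungrouped query still returns a row when nothing matches. The sum in that row is NULL, which the calling code has to handle.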
Try this if you are using a class:
class Sample_class {
    private $db;
    public function __construct($database) {
        $this->db = $database;
    }
    public function GetDistant($user_id, $status) {
        $query = $this->db->prepare("SELECT sum(distance) FROM trip_logs WHERE user_id = ? AND status = ?");
        $query->bindValue(1, $user_id);
        $query->bindValue(2, $status);
        try {
            $query->execute();
            // The aggregate returns a single row; column 0 holds the sum.
            $rows = $query->fetch();
            return $rows[0];
        } catch (PDOException $e) {
            die($e->getMessage());
        }
    }
}
$dist = new Sample_class($db);
$user_id = 10;
$status = 2;
echo $dist->GetDistant($user_id, $status);

Which one is Faster, SUB-Query or IN-Query?

I want to select, for each thread, whether I am following it or not.
I have two approaches, but I don't know which one would be better in terms of performance and speed. Can anyone help me out?
Approach 1
$cui = $_SESSION['user_id'];
$data = mysqli_query($con, "
    SELECT t.*,
           (SELECT COUNT(follow_id) FROM follows
            WHERE object_id = t.thread_id AND object_type = 'thread' AND user_id = $cui) AS me_follow
    FROM threads t
");
while ($row = mysqli_fetch_assoc($data)) {
    /*
    $row['me_follow'] = 0 if I am not following
    $row['me_follow'] = 1 if I am following
    */
}
Approach 2
$cui = $_SESSION['user_id'];
$data = mysqli_query($con, "SELECT * FROM threads");
$ids = array();
while ($row = mysqli_fetch_assoc($data)) {
    $ids[] = $row['thread_id'];
}
// implode() takes the separator first; the reversed order no longer works in PHP 8.
$ids = implode(",", $ids);
$data = mysqli_query($con, "SELECT COUNT(*) FROM follows WHERE object_id IN($ids) AND user_id = $cui");
One round-trip wins over two. This is because there is some overhead in sending SQL to the server, having it parse it, execute it and send the results back. It is (usually) better to do everything in one round-trip.
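A single-round-trip query of this kind can be sketched as follows. The example is in Python with sqlite3 standing in for MySQL, and it uses EXISTS instead of the COUNT subquery from Approach 1, which is a variant and not something taken from the question. The function name and the sample rows are hypothetical.

```python
import sqlite3


def threads_with_follow_flag(conn, user_id):
    """Return (thread_id, me_follow) for every thread in one query."""
    return conn.execute("""
        SELECT t.thread_id,
               EXISTS (SELECT 1 FROM follows f
                       WHERE f.object_id = t.thread_id
                         AND f.object_type = 'thread'
                         AND f.user_id = ?) AS me_follow
        FROM threads t
        ORDER BY t.thread_id
    """, (user_id,)).fetchall()


def make_demo_db():
    # Hypothetical data: user 7 follows threads 1 and 3, plus a non-thread object with id 2.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE threads (thread_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE follows (follow_id INTEGER PRIMARY KEY, object_id INTEGER,
                              object_type TEXT, user_id INTEGER);
        INSERT INTO threads VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO follows VALUES (1, 1, 'thread', 7), (2, 3, 'thread', 7),
                                   (3, 2, 'thread', 8), (4, 2, 'post', 7);
    """)
    return conn
```

Binding the user id as a parameter also avoids interpolating the session value into the SQL string. An index on follows covering (user_id, object_type, object_id) would likely help the subquery, though that depends on the real data.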

Perl Script / MySQL query that counts existing array from mysql database query of a field

Hard for me to explain but I will try.
I have a mysql table with a field called "inventory" which is an array of items. I need to count the number of different items in this array. My problem is I have players on a game that are putting too many items into this array and it causes issues.
Here is an example:
[[["NVGoggles","Mk_48_DZ"],[3,10]],[["ItemBandage","Skin_Sniper1_DZ"],[2,34]],[["DZ_Backpack_EP1"],[1]]]
This array should count as 5 items. The number of each item doesn't matter in this case.
NVGoggles, Mk_48_DZ, ItemBandage, Skin_Sniper1_DZ, DZ_Backpack_EP1
Example of empty array:
[[[],[]],[[],[]],[[],[]]]
I need the script to:
1. SELECT inventory FROM instance_deployable where deployable_id = 16
2. Count the items in the inventory array for each row.
3. Update the inventory for a row if the count is greater than 50 different items.
I'm stumped on how to count the array with either MySQL or a Perl script.
use strict;
use warnings;
use DBI;
use JSON 'decode_json', 'encode_json';
use List::Util 'sum';
my $limit = 50;
my $deployable_id = 16;
my $dbh = DBI->connect('DBI:mysql:database=test;host=127.0.0.1;port=3306','user','pass',{RaiseError=>1,AutoCommit=>1});
my $sth = $dbh->prepare('select distinct inventory from instance_deployable where deployable_id=?');
$sth->execute($deployable_id);
# Collect every inventory whose total item count exceeds the limit.
my @bad_inventory;
while ( my ($inventory_col) = $sth->fetchrow_array ) {
    my $inventory = decode_json($inventory_col);
    if ( $limit < sum map scalar @{ $_->[0] }, @$inventory ) {
        push @bad_inventory, $inventory_col;
    }
}
$sth = $dbh->prepare('update instance_deployable set inventory=? where inventory=? and deployable_id=?');
for my $inventory_col (@bad_inventory) {
    my $inventory = decode_json($inventory_col);
    # Keep at most $limit items in total, trimming names and counts together.
    my $keep = $limit;
    for my $category (@$inventory) {
        if ( @{ $category->[0] } > $keep ) {
            splice @{ $category->[0] }, $keep;
            splice @{ $category->[1] }, $keep;
        }
        $keep -= @{ $category->[0] };
    }
    my $new_inventory_col = encode_json($inventory);
    $sth->execute($new_inventory_col, $inventory_col, $deployable_id);
}
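The counting and trimming logic can be checked on its own, without a database. Below is a minimal Python sketch of the same steps, assuming the inventory column is valid JSON shaped like the example in the question (a list of categories, each holding a list of names and a parallel list of counts). The function names count_items and trim_inventory are made up for the example.

```python
import json


def count_items(inventory_json):
    """Count the item names across all categories; quantities are ignored."""
    inventory = json.loads(inventory_json)
    return sum(len(category[0]) for category in inventory)


def trim_inventory(inventory_json, limit):
    """Keep at most `limit` item names in total, trimming names and counts together."""
    inventory = json.loads(inventory_json)
    keep = limit
    for names, counts in inventory:
        # Drop everything past the remaining allowance in this category.
        del names[keep:]
        del counts[keep:]
        keep -= len(names)
    return json.dumps(inventory, separators=(",", ":"))
```

For the example array from the question, count_items returns 5, and the empty array counts as 0.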