More efficient active record grouping - mysql

I'm trying to check 2.5-second intervals for records and add an object to an array based on the count. This way works, but it's far too slow. Thanks.
@tweets = Tweet.last(3000)
first_time = @tweets.first.created_at
last_time = @tweets.last.created_at
while first_time < last_time
  group = @tweets.where(created_at: (first_time)..(first_time + 2.5.seconds)).count
  if group == 0 || group.nil?
    puts "0 or nil"
    first_id += 1
    array << {tweets: 0}
  else
    first_id += group
    array << {tweets: group}
  end
  first_time += 2.5.seconds
end
return array.to_json
end

What you really need is the group_by method on the records you've retrieved:
grouped = @tweets.group_by do |tweet|
  # Convert from timestamp to 2.5s interval number
  (tweet.created_at.to_f / 2.5).to_i
end
That returns a hash with the key being the time interval, and the values being an array of tweets.
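From there, rebuilding the question's array of counts is just a walk over the interval numbers, filling in the empty ones (a sketch built on the grouped hash above):
first_slot, last_slot = grouped.keys.minmax
array = (first_slot..last_slot).map do |slot|
  # Intervals with no tweets are missing from the hash, so default to an empty array
  { tweets: (grouped[slot] || []).size }
end
array.to_json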
What you're doing in your example probably has the effect of making thousands of queries. Always watch log/development.log to see what's going on in the background.


Is it okay to do binary search on an indexed column to get data from a non-indexed column?

I have a large table users(id, inserttime, ...), with an index only on id. I would like to find the users who were inserted between a given start_date and finish_date:
User.where(inserttime: start_date..finish_date).find_each
This leads to a search that takes a lot of time, since the inserttime column is not indexed.
The solution I came up with is to find the user.id for start_date and finish_date separately by doing a binary search twice on the table, using the indexed id column.
Then do this to get all the users between start_id and finish_id:
User.where(id: start_id..finish_id).find_each
The binary search function I am using is something like this:
def find_user_id_by_date(date)
  low = User.select(:id, :inserttime).first
  high = User.select(:id, :inserttime).last
  low_id = low.id
  high_id = high.id
  low_date = low.inserttime
  high_date = high.inserttime
  while low_id <= high_id
    mid_id = low_id + ((high_id - low_id) / 2)
    mid = User.select(:id, :inserttime).find_by(id: mid_id)
    # sometimes there can be missing users. Ex: [1,2,8,9,10,16,17,..]
    while mid.nil?
      mid_id = mid_id + 1
      mid = User.select(:id, :inserttime).find_by(id: mid_id)
    end
    if mid.inserttime < date
      low_id = mid.id + 1
    elsif mid.inserttime > date
      high_id = mid.id - 1
    else
      return mid.id
    end
  end
  # when date = start_date
  return (low_id < high_id) ? low_id + 1 : high_id + 1
  # when date = finish_date (note: unreachable as written, since the line above always returns)
  return (low_id < high_id) ? low_id : high_id + 1
end
I am not sure if what I am doing is the right way to deal with this problem or even if my binary search function covers all the cases.
I think the best solution would be to add an index on the inserttime column, but that is sadly not possible.
This might not be the best way to do it, but if the IDs are numeric and sequential, you could write a query to find the users between the minimum and maximum user ID:
SELECT id
FROM users
WHERE id BETWEEN [low_id_here] AND [high_id_here];
In ActiveRecord:
low = User.select(:id, :inserttime).first
high = User.select(:id, :inserttime).last
low_id = low.id
high_id = high.id
User.where('id BETWEEN ? AND ?', low_id, high_id)
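Putting the two pieces together, usage would look something like this (a sketch, assuming the question's find_user_id_by_date returns the boundary ids as intended):
start_id = find_user_id_by_date(start_date)
finish_id = find_user_id_by_date(finish_date)
# Range scan on the indexed primary key instead of the unindexed inserttime
User.where(id: start_id..finish_id).find_each do |user|
  # process user
end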

optimize sql query inside foreach

I need help optimizing the queries below for a recurring calendar I've built.
The goal is to find the dates on which the user failed to accomplish all tasks.
This is the query I use inside a foreach that fetches all dates on which the current activity is active.
This is my current setup, which works, but is very slow.
The other variables, explained:
$today=date("Y-m-d");
$parts = explode($sepparator, $datespan);
$dayForDate2 = date("l", mktime(0, 0, 0, $parts[1], $parts[2], $parts[0]));
$week2 = strtotime($datespan);
$week2 = date("W", $week2);
if($week2&1) { $weektype2 = "3"; } # Odd week 1, 3, 5 ...
else { $weektype2 = "2"; } # Even week 2, 4, 6 ...
Query1:
$query1 = "SELECT date_from, date_to, bok_id, kommentar
FROM bokningar
WHERE bokningar.typ='2'
and date_from<'".$today."'";
The function that advances the foreach one day at a time:
function date_range($first, $last, $step = '+1 day', $output_format = 'Y-m-d')
{
    $dates = array();
    $current = strtotime($first);
    $last = strtotime($last);
    while ($current <= $last) {
        $dates[] = date($output_format, $current);
        $current = strtotime($step, $current);
    }
    return $dates;
}
The foreach:
foreach (date_range($row['date_from'], $row['date_to'], "+1 day", "Y-m-d") as $datespan)
    if ($datespan < $today)
Query 2:
$query2 = "
SELECT bok_id, kommentar
FROM bokningar b
WHERE b.typ='2'
AND b.bok_id='".$row['bok_id']."'
AND b.weektype = '1'
AND b.".$dayForDate2." = '1'
AND NOT EXISTS
(SELECT t.tilldelad, t.bok_id
FROM tilldelade t
WHERE t.tilldelad = '".$datespan."'
AND t.bok_id='".$row['bok_id']."')
OR b.typ='2'
AND b.bok_id='".$row['bok_id']."'
AND b.weektype = '".$weektype2."'
AND b.".$dayForDate2." = '1'
AND NOT EXISTS
(SELECT t.tilldelad, t.bok_id
FROM tilldelade t
WHERE t.tilldelad = '".$datespan."'
AND t.bok_id='".$row['bok_id']."')";
b.weektype is either 1, 2 or 3 (every week, every even week, every odd week)
bokningar needs INDEX(typ, date_from)
Instead of computing $today, you can do
and date_from < CURDATE()
Are you running $query2 for each date? How many days is that? You may be able to build a table of dates, then JOIN it to bokningar to do all the SELECTs in a single SELECT.
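For example (a sketch; the calendar table and its dt column are hypothetical, and the day-of-week/weektype filters from $query2 would still need to be folded in):
SELECT c.dt, b.bok_id, b.kommentar
FROM calendar c
JOIN bokningar b
  ON b.typ = '2'
 AND b.date_from <= c.dt
 AND b.date_to >= c.dt
WHERE c.dt < CURDATE()
  AND NOT EXISTS
      (SELECT 1
       FROM tilldelade t
       WHERE t.tilldelad = c.dt
         AND t.bok_id = b.bok_id);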
When doing x AND y OR x AND z, first add parentheses to make clear which binds tighter, AND or OR: (x AND y) OR (x AND z). Then use a simple rule of Boolean arithmetic to transform it into a more efficient expression: x AND (y OR z) (where the parens are necessary).
The usual pattern for EXISTS is EXISTS ( SELECT 1 FROM ... ); there is no need to list columns.
If I am reading it correctly, the only difference is in testing b.weektype. So the WHERE can be simply
WHERE b.weektype IN ('".$weektype2."', '1')
AND ...
There is no need for OR, since it is effectively an IN().
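Applying the IN() and EXISTS simplifications, $query2 collapses to something like this (a sketch that keeps the question's own string interpolation as-is):
$query2 = "
    SELECT bok_id, kommentar
    FROM bokningar b
    WHERE b.typ = '2'
      AND b.bok_id = '".$row['bok_id']."'
      AND b.weektype IN ('".$weektype2."', '1')
      AND b.".$dayForDate2." = '1'
      AND NOT EXISTS
          (SELECT 1
           FROM tilldelade t
           WHERE t.tilldelad = '".$datespan."'
             AND t.bok_id = '".$row['bok_id']."')";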
tilldelade needs INDEX(tilldelad, bok_id), in either order. This should make the EXISTS(...) run faster.
Finally, bokningar needs INDEX(typ, bok_id, weektype) in any order.
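In DDL form, the suggested indexes would be (index names are hypothetical):
ALTER TABLE bokningar  ADD INDEX idx_typ_datefrom (typ, date_from);
ALTER TABLE tilldelade ADD INDEX idx_tilldelad_bok (tilldelad, bok_id);
ALTER TABLE bokningar  ADD INDEX idx_typ_bok_week (typ, bok_id, weektype);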
That is a lot to change and test. See if you can get those things done. If it still does not run fast enough, start a new Question with the new code. Please include SHOW CREATE TABLE for both tables.

Mysql select sleep then return

I want to select X records from the database (in a PHP script), then sleep 60 seconds, then continue with the next 60 results...
So:
SELECT * FROM TABLE WHERE A = 'B' LIMIT 60
SELECT SLEEP(60);
....
SELECT * FROM TABLE WHERE A = 'B' LIMIT X  -- where X is the next 60 results, then
SELECT SLEEP(60);
and so on...
How can I achieve this?
There is no such thing as "the next 60 records". SQL tables represent unordered sets. Without an ORDER BY, a SQL statement can return a result set in any order -- and even in different orders on different executions.
Hence, you first need something to guarantee the ordering; that is, an ORDER BY with keys that uniquely identify each row.
You can then use offset/limit to accomplish what you want. Or, you could put the code into a stored procedure and use a while loop. Or, you could do this on the application side.
In PHP:
<?php
// Obtain the database connection; there's a heap of examples on the net,
// assuming you're using a library like mysqli
$offset = 0;
while (true) {
    if ($offset == 0) {
        // ORDER BY a unique key so the offset paging is stable (see the note above); id is assumed
        $res = $db->query("SELECT * FROM TABLE WHERE A = 'B' ORDER BY id LIMIT 60");
    } else {
        $res = $db->query("SELECT * FROM TABLE WHERE A = 'B' ORDER BY id LIMIT " . $offset . ",60");
    }
    $rows = $db->fetch_assoc($res);
    sleep(60);
    if ($offset >= $some_arbitrary_number) {
        break;
    }
    $offset += 60;
}
What you're doing is gradually incrementing the offset in the LIMIT clause by 60 until you reach a cap. The easiest way to do it is a while loop with true as the condition, breaking when you reach your stop condition.
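If the table has a unique integer key, a seek-based variant avoids rescanning all the skipped rows as the offset grows (a sketch; the id column is an assumption, using the same placeholder style as above):
SELECT * FROM TABLE
WHERE A = 'B' AND id > [last_seen_id]
ORDER BY id
LIMIT 60;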

Selecting Individual rows using LIMIT

I'm trying to figure out if there's any way to use GROUP_CONCAT to select rows based on these parameters.
What I'm trying to do is get the lowest time for every single style/zonegroup.
AKA:
Lowest Time for Style 0 ZoneGroup 0
Lowest Time for Style 0 ZoneGroup 1
Lowest Time for Style 0 ZoneGroup 2
Lowest Time for Style 1 ZoneGroup 0
Lowest Time for Style 2 ZoneGroup 0
...
I could have multiple queries sent through my plugin, but I would like to know if this could firstly be eliminated with a GROUP_CONCAT function, and if so, how.
Here's what I could do, but I'd like to know if this could be converted into a single query.
for (int i = 0; i < MAX_STYLES; i++) {
    for (int x = 0; x < MAX_ZONEGROUPS; x++) {
        // i and x still need to be formatted into the query string here
        Transaction.AddQuery("SELECT * FROM `t_records` WHERE mapname = 'de_dust2' AND style = i AND zonegroup = x ORDER BY time ASC LIMIT 1;");
    }
}
Thanks.
You don't need group_concat(). You want to filter records, so use WHERE . . . in this case with a correlated subquery:
select r.*
from t_records r
where r.mapname = 'de_dust2' and
      r.time = (select min(r2.time)
                from t_records r2
                where r2.mapname = r.mapname and
                      r2.style = r.style and
                      r2.zonegroup = r.zonegroup
               );
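An equivalent formulation joins against a grouped derived table, which can be easier to index and reason about (a sketch over the same columns):
select r.*
from t_records r
join (select style, zonegroup, min(time) as min_time
      from t_records
      where mapname = 'de_dust2'
      group by style, zonegroup
     ) m
     on r.mapname = 'de_dust2' and
        r.style = m.style and
        r.zonegroup = m.zonegroup and
        r.time = m.min_time;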

Mysql updating data using name generator with ruby

We have a table with 13M rows. Its name and surname fields are nil by default. When we try to push some data, the run stops after about 1.2M queries. We loop over 10k rows at a time because of a RAM issue.
The algorithm is:
$i = 0
until $i > 13000 do
  b = Tahsil.where("NO < ?", (10000 * ($i + 1))).offset(10000 * $i)
  b.each do |a|
    a.name = Generator('name')
    a.surname = Generator('surname')
    a.save
  end
  $i += 1
end
Ruby on Rails has some methods built in that you might want to use:
Tahsil.find_each do |tahsil|
  tahsil.update(name: Generator('name'), surname: Generator('surname'))
end
find_each iterates through all records in batches (with a default batch size of 1000). update updates a record.
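If per-row commits turn out to be the bottleneck at this scale, a variation batches the writes into one transaction per chunk (a sketch; in_batches needs Rails 5+, and 10_000 mirrors the question's batch size):
Tahsil.in_batches(of: 10_000) do |batch|
  Tahsil.transaction do
    batch.each do |tahsil|
      tahsil.update(name: Generator('name'), surname: Generator('surname'))
    end
  end
end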