Converting SQL query to ActiveRecord Rails - MySQL

I have a location table in my database which contains location data for all the users of my system.
The table design is something like:
id | user_id | longitude | latitude | created_at
I have an array of users. Now I want to select the latest location (sorted by created_at) of each of these users.
I was able to figure out the SQL query for this:
SELECT * FROM my_table
WHERE (user_id, created_at) IN (
    SELECT user_id, MAX(created_at)
    FROM my_table
    GROUP BY user_id
)
AND user_id IN ('user1', 'user2', ... );
Now, as I am working in Ruby on Rails, I want to translate this SQL query into ActiveRecord. Can anyone please help me with this?

I think this will give the correct result:
MyModel.order(created_at: :desc).group(:user_id).distinct(:user_id)
If you want to generate the exact same query, this will do it:
MyModel.where("(user_id, created_at) IN (SELECT user_id, MAX(created_at) FROM my_table GROUP BY user_id)")
I think the subquery will probably not scale well with a large data set, but I understand if you just want to get it into Rails and optimize later.
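If the IN-tuple subquery does become a bottleneck, a common rewrite (sketched here as a suggestion, not taken from the thread) is to join against the grouped maximums instead, which MySQL can often plan better, especially with an index on (user_id, created_at):

SELECT t.*
FROM my_table t
INNER JOIN (
    -- one row per user with that user's newest timestamp
    SELECT user_id, MAX(created_at) AS max_created_at
    FROM my_table
    GROUP BY user_id
) latest ON latest.user_id = t.user_id
        AND latest.max_created_at = t.created_at
WHERE t.user_id IN ('user1', 'user2');

In ActiveRecord this can be dropped in via find_by_sql or a joins string.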

How about adding a scope, and getting the same result in a slightly different way:
class UserLocation < ActiveRecord::Base
  def self.latest_per_user
    where("user_locations.created_at = (SELECT MAX(ul2.created_at) FROM user_locations ul2 WHERE ul2.user_id = user_locations.user_id)")
  end
end
Then you just use:
UserLocation.latest_per_user.where(:user_id => ['user1', 'user2'])
... to get the required data set.
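For reference, the combined call should produce SQL along these lines (a sketch, assuming the table is named user_locations):

SELECT user_locations.*
FROM user_locations
WHERE user_locations.created_at = (
    -- newest timestamp for this row's user
    SELECT MAX(ul2.created_at)
    FROM user_locations ul2
    WHERE ul2.user_id = user_locations.user_id
)
AND user_locations.user_id IN ('user1', 'user2');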

Related

Tips to optimize query, with many subqueries in MySQL

I have ~6 tables where I have to count or sum fields based on matching site_ids and dates. I have the following query, with many subqueries, which takes an extraordinary amount of time to run. I am certain there is an easier, more efficient way, but I am rather new to these more complex queries. I have read about optimizations, specifically using JOIN ... ON, but I am struggling to understand and implement them.
The goal is to speed this up and not bring my small server to its knees when it runs. Any assistance or direction would be VERY much appreciated!
SELECT date(date_added) as dt_date,
       site_id as dt_site_id,
       (SELECT site_id FROM branch_mappings bm WHERE mark_id_site = dt.site_id) as site_id,
       (SELECT parent_id FROM branch_mappings bm WHERE mark_id_site = dt.site_id) as main_site_id,
       (SELECT corp_owned FROM branch_mappings bm WHERE mark_id_site = dt.site_id) as corp_owned,
       count(id) as dt_calls,
       (SELECT count(date_submitted) FROM mark_unbounce ub WHERE date(date_submitted) = dt_date AND ub.site_id = dt.site_id) as ub,
       (SELECT count(timestamp) FROM mark_wordpress_contact wp WHERE date(timestamp) = dt_date AND wp.site_id = dt.site_id) as wp,
       (SELECT count(added_on) FROM m_shrednations sn WHERE date(added_on) = dt_date AND sn.description = dt.site_id) as sn,
       (SELECT sum(users) FROM mark_ga ga WHERE date(ga.date) = dt_date AND channel LIKE 'Organic%' AND ga.site_id = dt.site_id) as ga_organic
FROM mark_dialogtech dt
WHERE site_id is not null
GROUP BY site_name, dt_date
ORDER BY site_name, dt_date;
What you're doing is the equivalent of asking your server to query 7+ different tables every time you run this query. Personally, I use joins and nested queries because I can whittle down to exactly what I need.
The first 3 subqueries can be replaced with...
SELECT date(date_added) as dt_date,
       dt.site_id as dt_site_id,
       bm.site_id as site_id,
       bm.parent_id as main_site_id,
       bm.corp_owned as corp_owned
FROM mark_dialogtech dt
INNER JOIN branch_mappings bm
    ON bm.mark_id_site = dt.site_id
I'm not sure why you are running the remaining per-day aggregate subqueries. Is there a business requirement? If so, consider how often this needs to run and when.
If absolutely necessary, add those to the joins like...
FROM mark_dialogtech dt
INNER JOIN
    (SELECT site_id, count(date_submitted) as ub_count FROM mark_unbounce GROUP BY site_id) ub
    ON ub.site_id = dt.site_id
This should limit the results to only records where the site_id exists in both mark_dialogtech and mark_unbounce (or whichever table). In my experience, this method speeds things up.
Still, my concern is the number of aggregations you're performing. If they can be cached to a dashboard and pulled during slow times, that would be best.
It's hard to analyze how big your query is (no data examples), but in your case I highly recommend using CTEs (Common Table Expressions). Check this:
https://www.sqlpedia.pl/cte-common-table-expressions/
CTEs do not have a physical representation in tempdb the way temporary tables or table variables do. A CTE can be viewed as a temporary, non-materialized view. When MSSQL executes a query and encounters a CTE, it replaces the reference to that CTE with its definition. Therefore, if the CTE data is used several times in a given query, the same code will be executed several times, and MSSQL does not optimize it. So it will work well only for small data sets, as in your case.
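As a rough sketch of the idea (assuming MySQL 8.0+, which supports CTEs; column names as in the question), one of the per-day counts could be precomputed once and joined in rather than re-run per row:

WITH ub AS (
    -- one row per site and day from mark_unbounce
    SELECT site_id, DATE(date_submitted) AS ub_date, COUNT(*) AS ub_count
    FROM mark_unbounce
    GROUP BY site_id, DATE(date_submitted)
)
SELECT DATE(dt.date_added) AS dt_date,
       dt.site_id,
       COUNT(dt.id) AS dt_calls,
       MAX(ub.ub_count) AS ub
FROM mark_dialogtech dt
LEFT JOIN ub
    ON ub.site_id = dt.site_id AND ub.ub_date = DATE(dt.date_added)
WHERE dt.site_id IS NOT NULL
GROUP BY dt.site_id, DATE(dt.date_added);

The same pattern extends to the wp, sn, and ga subqueries, one CTE each.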
Appreciate all the responses.
I ended up creating a Python script to run the queries separately and insert the results into the table for the respective KPI, so I scrapped the idea of a single query due to performance. I concatenated each date and site_id to create the id, then leveraged ON DUPLICATE KEY UPDATE with each INSERT statement.
The Python dictionaries look like this, and I simply looped. Again, thanks for the help.
SELECT STATEMENTS (Python Dict)
"dt":"SELECT date(date_added) as dt_date, site_id as dt_site, count(site_id) as dt_count FROM mark_dialogtech WHERE site_id is not null GROUP BY dt_date, dt_site ORDER BY dt_date, dt_site;",
"ub":"SELECT date_submitted as ub_date, site_id as ub_site, count(site_id) as ub_count FROM mark_unbounce WHERE site_id is not null GROUP BY ub_date, ub_site;",
"wp":"SELECT date(timestamp) as wp_date, site_id as wp_site, count(site_id) as wp_count FROM mark_wordpress_contact WHERE site_id is not null GROUP BY wp_date, wp_site;",
"sn":"SELECT date(added_on) as sn_date, description as sn_site, count(description) as sn_count FROM m_shrednations WHERE description <> '' GROUP BY sn_date, sn_site;",
"ga":"SELECT date as ga_date, site_id as ga_site, sum(users) as ga_count FROM mark_ga WHERE users is not null GROUP BY ga_date, ga_site;"
INSERT STATEMENTS (Python Dict)
"dt":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, dt_calls, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE dt_Calls={dbdata[3]}, added_on='{dbdata[4]}';",
"ub":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, ub, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE ub={dbdata[3]}, added_on='{dbdata[4]}';",
"wp":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, wp, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE wp={dbdata[3]}, added_on='{dbdata[4]}';",
"sn":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, sn, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE sn={dbdata[3]}, added_on='{dbdata[4]}';",
"ga":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, ga_organic, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE ga_organic={dbdata[3]}, added_on='{dbdata[4]}';",
It would be very difficult to analyze the query without the data. Anyway!
Try joining the tables and grouping; that should improve the performance.
Here is a LEFT JOIN sample:
SELECT column names
FROM table1
LEFT JOIN table2
ON table1.common_column = table2.common_column;
Check this for more detailed information: https://learnsql.com/blog/how-to-left-join-multiple-tables/
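Extending the sample to several tables (a sketch with placeholder names, mirroring the linked article):

SELECT t1.common_column,
       t2.some_column,
       t3.other_column
FROM table1 t1
LEFT JOIN table2 t2
    ON t1.common_column = t2.common_column
LEFT JOIN table3 t3
    ON t1.common_column = t3.common_column;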

SQL: Return immediately if one matched record found

I have one table with users and their posts. It looks like user_id | post_id | post_status.
Now I have a list of user ids (e.g., 100 users) and I want to know how many of them have at least one post that was deleted (e.g., post_status 3).
Here is my sample search:
select count(distinct user_id)
from post_table
where user_id in ( {my set} )
and post_status=3
It runs super slowly since it scans the entire table. Is there a way to speed up the query?
Use something like
SELECT COUNT(*)
FROM
    -- the list of userids as a rowset
    ( SELECT 123 AS user_id UNION ALL
      SELECT 456 UNION ALL
      -- ...
      SELECT 789
    ) user_id_list
WHERE EXISTS ( SELECT NULL
               FROM post_table
               WHERE post_table.user_id = user_id_list.user_id
                 AND post_table.post_status = 3 )
If your MySQL version is 8.0.4 or above, then you may provide the user list as CSV/JSON and parse it using JSON_TABLE (the query text will be more compact).
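A sketch of the JSON_TABLE variant (the JSON array literal here is just an example list):

SELECT COUNT(*)
FROM JSON_TABLE(
         '[123, 456, 789]',                          -- the user id list as JSON
         '$[*]' COLUMNS (user_id BIGINT PATH '$')
     ) user_id_list
WHERE EXISTS ( SELECT NULL
               FROM post_table
               WHERE post_table.user_id = user_id_list.user_id
                 AND post_table.post_status = 3 );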
INDEX(post_status, user_id)
may help speed up your query, especially if very few rows have post_status = 3.
It could also speed up Akina's solution.
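For reference, that index could be created like so (the index name is arbitrary):

ALTER TABLE post_table
    ADD INDEX idx_status_user (post_status, user_id);  -- composite index, status first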

Get Conditionally Latest record from each group - without Aggregate functions or Partition

I have been trying to do this in many of the suggested ways.
Note: we do not want aggregate functions or PARTITION BY, since this is just a small part of a whole stored procedure and it is a client requirement not to use them; they are not an option, so this is not a duplicate of other existing answers/questions.
I have a messages table, which has columns from and to, foreign keys to the user table: basically, which user sends to whom, at its simplest. I also have other columns, isSnoozed and snoozeAt, for when a message is snoozed.
So the ordering is conditional: if a message is snoozed, consider the snoozeAt time for ordering; if not, consider sendAt. (Right now we can ignore this condition while ordering, but I mention it because we cannot simply take MAX(id).)
I need to get the most recent message from messages, grouped by the from user id.
The messages table looks like:
id -- to -- from -- isSnoozed -- snoozedAt -- sendAt ...
What I tried:
select * from ( select * from messages order by sendAt DESC) as TEMP GROUP BY TEMP.from
I tried many similar approaches, but none worked.
I wasted many paid hours and couldn't find an approach that meets my exact requirement.
NOTE: Please ignore any typos in the query, since I can't type in the exact query and table names, so I typed it directly here.
I figured this out by doing something like the following, explained in a simplified way:
select * from message where message.id in (
    select
        ( select id from message
          where message.from = user.id
          order by CASE isSnoozed WHEN 0 THEN sendAt ELSE snoozeAt END DESC
          limit 1
        ) as id
    from user
    where user.id in ( select friends.`whoIsAdded` from friends where friends.`whoAdded` = myId )
) order by CASE isSnoozed WHEN 0 THEN sendAt ELSE snoozeAt END DESC
If I understand correctly, you just want the largest value in one of two columns. Assuming the values are never NULL, you can use greatest():
select m.*
from messages m
where greatest(m.sendAt, m.snoozedAt) =
      (select max(greatest(m2.sendAt, m2.snoozedAt))
       from messages m2
       where m2.from = m.from
      );
If the columns can be NULL, then you can use coalesce() to give them more reasonable values.
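For instance (a sketch; the fallback date is arbitrary and only needs to sort before all real timestamps):

select m.*
from messages m
where greatest(coalesce(m.sendAt, '1970-01-01'), coalesce(m.snoozedAt, '1970-01-01')) =
      (select max(greatest(coalesce(m2.sendAt, '1970-01-01'), coalesce(m2.snoozedAt, '1970-01-01')))
       from messages m2
       where m2.from = m.from
      );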

Refactoring a large union statement to use a SELECT which breaks early

It's a bit difficult to explain the situation, but currently I'm generating massive unions to accomplish this. They look a bit like:
(
SELECT
ipaddress
FROM post
WHERE ipaddress = 'someipaddress'
AND userid NOT IN (1, {$postinfo['userid']}, {$vbulletin->options['sdwikipostuserid']})
LIMIT 1
)
UNION
(
SELECT
ipaddress
FROM post
WHERE ipaddress = 'someotheripaddress'
AND userid NOT IN (1, {$postinfo['userid']}, {$vbulletin->options['sdwikipostuserid']})
LIMIT 1
)
These get huge fast, but seem to be the fastest way for me to accomplish this right now. I've tried refactoring it to something like:
SELECT
ipaddress
FROM post
WHERE ipaddress in ('all ips', .....)
AND userid NOT IN (1, {$postinfo['userid']}, {$vbulletin->options['sdwikipostuserid']})
GROUP BY ipaddress
But this is around 5x slower than the massive union statement. The big issue is that the post table is huge, so the refactored SQL is forced to look through the entire table, whereas each union subquery can stop after finding a single instance. Is there any way to tell SQL to stop on finding the first match per group?
Anyone have tips on how to refactor the huge union statement above into something cleaner?
You can write the query like this:
select i.ipaddress
from (select 'someipaddress' as ipaddress union all
      select 'someotheripaddress'
     ) i
where exists (select 1
              from post p
              where p.ipaddress = i.ipaddress and
                    p.userid NOT IN (1, {$postinfo['userid']}, {$vbulletin->options['sdwikipostuserid']})
             );
This is optimized with an index on post(ipaddress, userid) -- one index, two columns.

Nested SELECT OR INSERT

I am quite new to writing MySQL queries and so far things have been going well, although I have recently become stuck on something. What I'm trying to do is select information from another table for use in the same query. Here's what I have so far, which works fine:
SELECT *
FROM `userskinlist`
WHERE userid IN (
SELECT userid
FROM userlist
WHERE authid = 'STEAM_1:0:2144092'
)
AND active = '1'
AND weaponid >= '1'
AND skinid > '0'
But now, if the nested part does not return anything
(SELECT userid FROM userlist WHERE authid = 'STEAM_1:0:2144092')
I need to run an INSERT statement as follows:
INSERT IGNORE INTO userlist (authid) VALUES ('STEAM_1:0:2144092')
but I can't figure out how to add this to the same query.
Any help would be greatly appreciated :)
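One common pattern here (a sketch, assuming authid has a UNIQUE index so the INSERT IGNORE is a no-op when the user already exists) is to run the insert first and then the select unconditionally, since MySQL cannot perform the INSERT from inside the SELECT:

-- ensures the userlist row exists; ignored if authid is already present
INSERT IGNORE INTO userlist (authid) VALUES ('STEAM_1:0:2144092');

SELECT *
FROM `userskinlist`
WHERE userid IN (SELECT userid FROM userlist WHERE authid = 'STEAM_1:0:2144092')
  AND active = '1'
  AND weaponid >= '1'
  AND skinid > '0';

The two statements can be wrapped in a transaction or a stored procedure if they must travel as one unit.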