SQL script works in MySQL, but not in Google BigQuery

I have SQL scripts that work fine in MySQL but that I cannot get to work in Google BigQuery. After reading through the BigQuery documentation, I made a number of adjustments (e.g. no more than one join per SELECT statement), but the script still fails. Any help is appreciated. If you know of any good resources comparing BigQuery SQL with other SQL dialects, that would also be greatly appreciated. Thanks.
SELECT
T1.action_date AS action_date,
T1.ad_campaign_category AS ad_campaign_category,
T1.campaign_id AS campaign_id,
T2.total_sends AS total_sends,
count(*) AS clicks_per_category
FROM (
SELECT action_date, campaign_id, ad_campaign_category
FROM projectX.email_action
WHERE action_date > '2009-04-01' AND action_date < '2011-05-01') T1,
(
SELECT action_date, campaign_id, ad_campaign_category, count(*) AS total_sends
FROM projectX.email_action
WHERE action_type = 'send' AND action_date > '2009-04-01' AND action_date < '2011-05-01'
GROUP BY action_date, campaign_id) T2
WHERE T1.action_date = T2.action_date
AND T1.campaign_id = T2.campaign_id
GROUP BY action_date, campaign_id, ad_campaign_category

The JOIN must be explicit -- that is, rather than using SELECT ... FROM (...) t1, (...) t2 WHERE t1.x = t2.y you should use the form SELECT ... FROM (...) t1 JOIN (...) t2 ON t1.x = t2.y
For your example, this would look like:
SELECT
T1.action_date AS action_date,
T1.ad_campaign_category AS ad_campaign_category,
T1.campaign_id AS campaign_id,
T2.total_sends AS total_sends,
count(*) AS clicks_per_category
FROM (
SELECT action_date, campaign_id, ad_campaign_category
FROM projectX.email_action
WHERE action_date > '2009-04-01' AND action_date < '2011-05-01') T1
JOIN (
SELECT action_date, campaign_id, count(*) AS total_sends
FROM projectX.email_action
WHERE action_type = 'send' AND action_date > '2009-04-01' AND action_date < '2011-05-01'
GROUP BY action_date, campaign_id) T2
ON T1.action_date = T2.action_date
AND T1.campaign_id = T2.campaign_id
GROUP BY action_date, campaign_id, ad_campaign_category, total_sends
Note: if you get an error saying that one of the tables is too large, try using JOIN EACH instead of JOIN.
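To see that moving the join condition from WHERE into ON does not change the result, here is a minimal sketch that runs both forms against an in-memory SQLite database. This is not BigQuery: the table contents are invented, the date filter is omitted, and it only illustrates that the rewrite itself is semantics-preserving in standard SQL.

```python
import sqlite3

# Hypothetical miniature of projectX.email_action, loaded into in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE email_action (
    action_date TEXT, campaign_id INTEGER,
    ad_campaign_category TEXT, action_type TEXT);
INSERT INTO email_action VALUES
    ('2010-01-01', 1, 'shoes', 'send'),
    ('2010-01-01', 1, 'shoes', 'send'),
    ('2010-01-01', 1, 'shoes', 'click'),
    ('2010-01-02', 2, 'hats',  'send'),
    ('2010-01-02', 2, 'hats',  'click'),
    ('2010-01-02', 2, 'hats',  'click');
""")

# Shared pieces of the two queries.
SENDS = """(SELECT action_date, campaign_id, COUNT(*) AS total_sends
            FROM email_action
            WHERE action_type = 'send'
            GROUP BY action_date, campaign_id)"""
COLUMNS = """T1.action_date, T1.campaign_id, T1.ad_campaign_category,
             T2.total_sends, COUNT(*) AS actions_per_category"""
GROUPING = """GROUP BY T1.action_date, T1.campaign_id,
                       T1.ad_campaign_category, T2.total_sends
              ORDER BY T1.action_date"""

# Implicit (comma) join: the join condition lives in WHERE.
implicit_sql = f"""
SELECT {COLUMNS}
FROM email_action T1, {SENDS} T2
WHERE T1.action_date = T2.action_date AND T1.campaign_id = T2.campaign_id
{GROUPING}"""

# Explicit join: the same condition moves into ON.
explicit_sql = f"""
SELECT {COLUMNS}
FROM email_action T1 JOIN {SENDS} T2
  ON T1.action_date = T2.action_date AND T1.campaign_id = T2.campaign_id
{GROUPING}"""

implicit_rows = conn.execute(implicit_sql).fetchall()
explicit_rows = conn.execute(explicit_sql).fetchall()
print(explicit_rows)
```

Both queries return the same rows here, so only the syntax needs to change for BigQuery.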

Related

Select and join two tables, then order by latest upload time (upload_time column) in a MySQL query

I have two tables: users (table 01) and record_dcm_upload (table 02).
I am trying to query the number of uploads and the latest upload time for every login account (users.username),
like this:
SELECT record_dcm_upload.user_id, users.username, record_dcm_upload.upload_time, COUNT( * )
FROM record_dcm_upload
JOIN users ON ( users.id = record_dcm_upload.user_id )
GROUP BY record_dcm_upload.user_id
But my query has a problem: the upload_time in the result is not the latest one.
How should I adjust my query? (I would like both user_id and upload_time sorted in descending order.)
SELECT t2.id, t2.username, MAX(t1.upload_time), COUNT(*)
FROM record_dcm_upload t1
JOIN users t2 ON ( t2.id = t1.user_id )
GROUP BY t2.id, t2.username
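As a rough check of the query above, the following sketch runs it on an in-memory SQLite database with invented sample rows. The ORDER BY clause and the column aliases latest_upload and uploads are additions for illustration, covering the descending sort the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE record_dcm_upload (user_id INTEGER, upload_time TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');  -- made-up accounts
INSERT INTO record_dcm_upload VALUES
    (1, '2015-01-01 10:00:00'),
    (1, '2015-03-01 09:00:00'),
    (2, '2015-02-01 08:00:00');
""")

# MAX() picks the latest upload per user; COUNT(*) counts that user's uploads.
rows = conn.execute("""
    SELECT t2.id, t2.username,
           MAX(t1.upload_time) AS latest_upload,
           COUNT(*) AS uploads
    FROM record_dcm_upload t1
    JOIN users t2 ON t2.id = t1.user_id
    GROUP BY t2.id, t2.username
    ORDER BY t2.id DESC, latest_upload DESC
""").fetchall()
print(rows)
```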

SQL count number of common matches using several WHERE clauses

I have a table having columns like: membership_id | user_id | group_id
I'm looking for a SQL query to get the number of common groups between 2 different users. I could do that in several queries and using some PHP but I'd like to know if there is a way to use only SQL for that.
Like with the user ids 1 and 3, there are 3 common groups (1, 5 and 6) so the result returned would be 3.
I've run several tests, but so far without result. Thank you.
You don't need "multiple WHERE clauses" or even a self JOIN:
SELECT group_id
FROM theTable AS t
WHERE t.user_id IN (1, 3)
GROUP BY group_id
HAVING COUNT(DISTINCT user_id) = 2;
More generically:
SELECT group_id
FROM theTable AS t
WHERE t.user_id IN ([user id list])
GROUP BY group_id
HAVING COUNT(DISTINCT user_id) = [# of user ids in list];
Edit: Oh, you wanted the number of groups....
SELECT COUNT(1) FROM (
SELECT group_id
FROM theTable AS t
WHERE t.user_id IN (1, 3)
GROUP BY group_id
HAVING COUNT(DISTINCT user_id) = 2
) AS common_groups;
You can achieve this with join.
Try this:
select t1.user_id, t2.user_id, group_concat(distinct t1.group_id)
from your_table t1
join your_table t2
on t1.user_id < t2.user_id
and t1.group_id = t2.group_id
group by t1.user_id, t2.user_id;
If you don't want a concatenated output:
select distinct t1.user_id, t2.user_id, t1.group_id
from your_table t1
join your_table t2
on t1.user_id < t2.user_id
and t1.group_id = t2.group_id;
Try to join two instances of the same table (for each of them you select only the records relative to one of the users) using group_id as join attribute, and count the result:
SELECT COUNT(*)
FROM your_table AS t1
JOIN your_table AS t2 ON t1.group_id=t2.group_id
WHERE t1.user_id=1 AND t2.user_id=3;
SELECT COUNT(*)
FROM TABLE_NAME USER_ONE_INFO,
TABLE_NAME USER_TWO_INFO
WHERE USER_ONE_INFO.USER_ID = USER_ONE_ID
AND USER_TWO_INFO.USER_ID = USER_TWO_ID
AND USER_ONE_INFO.GROUP_ID = USER_TWO_INFO.GROUP_ID;
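The GROUP BY ... HAVING approach and the self-join approach can be compared side by side. The sketch below uses an in-memory SQLite database; the table name memberships and its rows are made up to match the example in the question (users 1 and 3 share groups 1, 5 and 6). It assumes each (user_id, group_id) pair appears at most once, otherwise the self-join would over-count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memberships (
    membership_id INTEGER PRIMARY KEY, user_id INTEGER, group_id INTEGER);
INSERT INTO memberships (user_id, group_id) VALUES
    (1, 1), (1, 2), (1, 5), (1, 6),
    (2, 2), (2, 5),
    (3, 1), (3, 5), (3, 6), (3, 7);
""")

user_a, user_b = 1, 3

# Approach 1: keep groups that contain both users, then count those groups.
having_count = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT group_id FROM memberships
        WHERE user_id IN (?, ?)
        GROUP BY group_id
        HAVING COUNT(DISTINCT user_id) = 2
    ) AS common_groups
""", (user_a, user_b)).fetchone()[0]

# Approach 2: self-join on group_id, one side per user.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM memberships AS t1
    JOIN memberships AS t2 ON t1.group_id = t2.group_id
    WHERE t1.user_id = ? AND t2.user_id = ?
""", (user_a, user_b)).fetchone()[0]

print(having_count, join_count)
```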

Subselect bad performance

This is my query, it takes a long time to execute. Can I use an inner join? I am working on only one table.
SELECT imei,csv_data_table.time,phone_model,test_unique_id
FROM verveba_mos.csv_data_table
WHERE time = (SELECT MAX(time) FROM csv_data_table T1
WHERE csv_data_table.imei=T1.imei)
You can use a JOIN or NOT EXISTS() to do this, although that doesn't necessarily mean it will be faster:
NOT EXISTS():
SELECT t.imei,t.time,t.phone_model,t.test_unique_id
FROM verveba_mos.csv_data_table t
WHERE NOT EXISTS(SELECT 1
FROM csv_data_table s
WHERE t.imei= s.imei
AND s.time > t.time)
JOIN:
SELECT t.imei,t.time,t.phone_model,t.test_unique_id
FROM verveba_mos.csv_data_table t
JOIN(SELECT s.imei,MAX(time) as max_t
FROM csv_data_table s
GROUP BY s.imei) p
ON(t.imei= p.imei and t.time = p.max_t)
SELECT t1.imei, t1.time, t1.phone_model, t1.test_unique_id
FROM csv_data_table t1
JOIN (select imei, max(time) time from csv_data_table group by imei) t2
ON (t1.imei = t2.imei and t1.time = t2.time)
You should also consider putting an index on csv_data_table(imei, time) if you don't already have one.
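Here is a small sketch, on in-memory SQLite with invented rows, showing that the JOIN and NOT EXISTS() variants pick the same latest row per imei, with the suggested composite index in place. The index name idx_imei_time is an assumption, and whether either form is actually faster on MySQL depends on the data and the query plan, which this does not measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE csv_data_table (
    imei TEXT, time INTEGER, phone_model TEXT, test_unique_id TEXT);
-- Composite index as suggested above (index name is hypothetical).
CREATE INDEX idx_imei_time ON csv_data_table (imei, time);
INSERT INTO csv_data_table VALUES
    ('A', 1, 'modelX', 't1'),
    ('A', 5, 'modelX', 't2'),
    ('B', 3, 'modelY', 't3');
""")

# Variant 1: join each row against the per-imei maximum time.
join_rows = conn.execute("""
    SELECT t.imei, t.time, t.phone_model, t.test_unique_id
    FROM csv_data_table t
    JOIN (SELECT imei, MAX(time) AS max_t
          FROM csv_data_table
          GROUP BY imei) p
      ON t.imei = p.imei AND t.time = p.max_t
    ORDER BY t.imei
""").fetchall()

# Variant 2: keep rows for which no later row with the same imei exists.
not_exists_rows = conn.execute("""
    SELECT t.imei, t.time, t.phone_model, t.test_unique_id
    FROM csv_data_table t
    WHERE NOT EXISTS (SELECT 1 FROM csv_data_table s
                      WHERE s.imei = t.imei AND s.time > t.time)
    ORDER BY t.imei
""").fetchall()

print(join_rows)
```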

sql query to select the latest entry of each user

I have a location table in my database which contains location data of all the users of my system.
The table design is something like
id| user_id| longitude| latitude| created_at|
Now I have an array of user ids and I want to write a SQL query to select the latest location of all these users. Can you please help me with this SQL query?
Insert your array of user ids into the user_id IN (...) list at the end of the query:
select * from my_table
where (user_id , created_at) in (select user_id, max(created_at)
from my_table
group by user_id)
and user_id in ('user1','user2',... ) ;
SELECT
t1.ID,
t1.user_id,
t1.longitude,
t1.latitude,
t1.created_at
FROM
YourTable t1
INNER JOIN
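-- keep only each user's row with the highest id (assumes id grows with created_at)
(SELECT MAX(id) AS max_id, user_id FROM YourTable GROUP BY user_id) t2 ON t2.user_id = t1.user_id AND t2.max_id = t1.id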
INNER JOIN
yourArrayTable ON
yourArrayTable.user_id = t1.user_id
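When the array of user ids lives in application code, one option is to generate the IN (...) placeholders from it. The sketch below does this with Python and in-memory SQLite; the table name location and all rows are invented. It joins on MAX(created_at) instead of MAX(id), so it does not rely on ids being assigned in time order (user 20's latest row here is not the one with the highest id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (
    id INTEGER PRIMARY KEY, user_id INTEGER,
    longitude REAL, latitude REAL, created_at TEXT);
INSERT INTO location VALUES
    (1, 10, 1.0, 1.0, '2020-01-01'),
    (2, 10, 2.0, 2.0, '2020-02-01'),
    (3, 20, 3.0, 3.0, '2020-01-15'),
    (4, 30, 4.0, 4.0, '2020-03-01'),
    (5, 20, 5.0, 5.0, '2020-01-10');
""")

user_ids = [10, 20]  # the "array of user ids" from the question
placeholders = ", ".join("?" for _ in user_ids)  # "?, ?"

# Find each requested user's latest created_at, then fetch the matching row.
latest = conn.execute(f"""
    SELECT l.user_id, l.longitude, l.latitude, l.created_at
    FROM location l
    JOIN (SELECT user_id, MAX(created_at) AS latest_at
          FROM location
          WHERE user_id IN ({placeholders})
          GROUP BY user_id) m
      ON m.user_id = l.user_id AND m.latest_at = l.created_at
    ORDER BY l.user_id
""", user_ids).fetchall()
print(latest)
```

Only the placeholder string is interpolated into the SQL; the ids themselves are passed as bound parameters.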

Better solution for finding unique visitors

I am using the following query to find the number of unique visitors per day from one of my tables, but it performs poorly. Can anyone suggest a better solution? My current query is:
SELECT t.date,COUNT(DISTINCT t.uID) as unique_clicks FROM table_name t
WHERE
NOT EXISTS(
SELECT 1
FROM table_name t2
WHERE
t2.uID = t.uID
AND t2.date < (t.date)
)
GROUP BY t.date
You could try this:
SELECT
t.date, COUNT(DISTINCT t.uID) as unique_clicks
FROM
table_name t LEFT JOIN table_name t2
ON t.uID=t2.uID AND t2.date < t.date
WHERE
t2.uID is NULL
GROUP BY t.date
I think that a join should be faster than an EXISTS clause in this particular situation. Or, if I understand the logic correctly, you could also use this:
SELECT min_date, COUNT(*) as unique_clicks
FROM (
SELECT
t.uID, min(t.date) min_date
FROM
table_name t
GROUP BY
t.uID
) s
GROUP BY min_date
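To check that grouping by each visitor's first date gives the same counts as the original NOT EXISTS query, here is a sketch on in-memory SQLite. The table name clicks and its rows are made up, including a duplicate same-day click for visitor 2. It compares results only and says nothing about relative speed on a real MySQL table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clicks (uID INTEGER, date TEXT);
INSERT INTO clicks VALUES
    (1, '2020-01-01'), (1, '2020-01-02'),
    (2, '2020-01-01'), (2, '2020-01-01'),
    (3, '2020-01-02'), (3, '2020-01-03');
""")

# Original approach: count visitors on a date only if they have no earlier row.
not_exists_rows = conn.execute("""
    SELECT t.date, COUNT(DISTINCT t.uID) AS unique_clicks
    FROM clicks t
    WHERE NOT EXISTS (SELECT 1 FROM clicks t2
                      WHERE t2.uID = t.uID AND t2.date < t.date)
    GROUP BY t.date
    ORDER BY t.date
""").fetchall()

# Alternative: find each visitor's first date, then count visitors per first date.
min_date_rows = conn.execute("""
    SELECT min_date, COUNT(*) AS unique_clicks
    FROM (SELECT uID, MIN(date) AS min_date
          FROM clicks
          GROUP BY uID) s
    GROUP BY min_date
    ORDER BY min_date
""").fetchall()

print(min_date_rows)
```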