I have a major problem for my Capstone project. I'm joining 2 tables from my database but the output is
MY Query: select Distinct tbl_attendance_in.time_in as time_in,tbl_attendance_out.time_in as time_out from tbl_attendance_in join tbl_attendance_out on tbl_attendance_in.user_id=tbl_attendance_out.user_id where tbl_attendance_in.user_id=4 AND tbl_attendance_out.user_id=4;
I've already tried all sorts of joining like inner right,outer right,inner join but still no luck.
Maybe you can make a sub-query for both of the tables first and generate a row number based on the time_in column ordering. Then you do a LEFT JOIN between those two. Something like this example query below:
SELECT A.user_id,A.time_in,B.time_in AS 'time_out' FROM
/*the first sub-query is for time in table*/
(SELECT user_id, time_in,ROW_NUMBER() OVER (ORDER BY time_in) AS RowN
FROM tbl_attendance_in WHERE user_id=4) A
LEFT JOIN
/*the second sub-query is for time out table*/
(SELECT user_id, time_in,ROW_NUMBER() OVER (ORDER BY time_in) AS RowN
FROM tbl_attendance_out WHERE user_id=4) B
ON A.User_id=B.User_id AND A.RowN=B.RowN;
I'm using the ROW_NUMBER() OVER () function to generate a running number for each row, then using it in the ON clause of the LEFT JOIN. Although this gives the result for your current example, it might not be the best solution once you consider other factors, such as the dates of time_in and time_out having to match, or duplicate (but valid) timestamps in one of the tables.
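To see why the plain join multiplies rows while the numbered join does not, here is a minimal sketch using Python's sqlite3 module. The sample timestamps are invented for illustration, and it assumes the bundled SQLite is 3.25 or newer (needed for ROW_NUMBER()); MariaDB/MySQL should behave the same way for these queries.

```python
import sqlite3

# Hypothetical sample data: three clock-ins and three clock-outs for user 4.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_attendance_in  (user_id INTEGER, time_in TEXT);
CREATE TABLE tbl_attendance_out (user_id INTEGER, time_in TEXT);
INSERT INTO tbl_attendance_in VALUES
  (4, '2021-01-04 08:00:00'), (4, '2021-01-05 08:05:00'), (4, '2021-01-06 07:58:00');
INSERT INTO tbl_attendance_out VALUES
  (4, '2021-01-04 17:00:00'), (4, '2021-01-05 17:10:00'), (4, '2021-01-06 17:02:00');
""")

# Joining on user_id alone pairs every IN row with every OUT row.
plain_join = conn.execute("""
    SELECT i.time_in, o.time_in
    FROM tbl_attendance_in i
    JOIN tbl_attendance_out o ON i.user_id = o.user_id
    WHERE i.user_id = 4
""").fetchall()

# Numbering both sides and joining on that number pairs them one to one.
paired = conn.execute("""
    SELECT a.time_in, b.time_in AS time_out
    FROM (SELECT user_id, time_in, ROW_NUMBER() OVER (ORDER BY time_in) AS RowN
          FROM tbl_attendance_in WHERE user_id = 4) a
    LEFT JOIN
         (SELECT user_id, time_in, ROW_NUMBER() OVER (ORDER BY time_in) AS RowN
          FROM tbl_attendance_out WHERE user_id = 4) b
      ON a.user_id = b.user_id AND a.RowN = b.RowN
    ORDER BY a.RowN
""").fetchall()

print(len(plain_join))  # 9 rows: every IN combined with every OUT
print(len(paired))      # 3 rows: one per clock-in
```

The pairing only holds while every clock-in has exactly one clock-out; a missed clock-out shifts all later pairs, which is the caveat mentioned above.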
Another that you can consider is using UNION ALL. Example below:
SELECT 'IN' AS Opr, user_id, time_in AS records,ROW_NUMBER() OVER (ORDER BY time_in) AS RowN
FROM tbl_attendance_in WHERE user_id=4 UNION ALL
SELECT 'OUT' AS Opr, user_id, time_in AS records,ROW_NUMBER() OVER (ORDER BY time_in) AS RowN
FROM tbl_attendance_out WHERE user_id=4 order by RowN, records;
This also uses the ROW_NUMBER() function for the ordering, and I added an Opr column to indicate which operation (or table) each row comes from.
Maybe you can try both and see if either works for you. If you have other concerns, just edit your question and add more details.
To fiddle around: https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=087e4a7cabb5e50090639b5f3435aaa7
If it multiplies the data then you can use group by at the end of your query.
select Distinct tbl_attendance_in.time_in as time_in,tbl_attendance_out.time_in as time_out from tbl_attendance_in join tbl_attendance_out on tbl_attendance_in.user_id=tbl_attendance_out.user_id where tbl_attendance_in.user_id=4 AND tbl_attendance_out.user_id=4 group by tbl_attendance_in.time_in,tbl_attendance_out.time_in ;
This will group the duplicate rows.
I am trying to create an SQL Query to select rows from a database, ordered by a numerical field, however there are repeated entries in the table.
The table consists of the following columns.
UID - Numerical Unique ID
ACCOUNT_NAME - Account Name, unchanged
NICK_NAME - Can be changed by the user at any time
POINTS - Records points held by the user's account
The goal of the query is to display the Account_Name ordered by Points. However, Account_Name is not unique and can appear multiple times in the table.
To deal with this I would like to display only the latest row for each Account_Name.
This means that in the results of the select each Account_Name should appear only once. I want the selection to be decided by the UID: for each account_name, only the row with the greatest UID should be displayed.
I have tried the following without desired results. (The name of the table is ACCOUNT)
SELECT DISTINCT A.account_name , A.uid, A.points
FROM account A, account B
where A.account_name = B.account_name
and A.points > 0
and A.uid >= B.uid
order by A.points DESC;
This doesn't give me the desired results, specifically, there is an account in the database where an outdated row exists with a high value in the Points column. This record appears as the first result in the select, even though it is outdated.
How would you recommend adjusting this Query to select the desired information?
I hope this is enough information to work off of (first time posting a question). Thank you for your help :)
EDIT: Adding in examples with data.
Sample Table Data: (image not included)
Current Results: (image not included)
Desired Results: (image not included)
Consider joining on an aggregate query calculating MAX(UID)
SELECT a.account_name, a.uid, a.points
FROM account a
INNER JOIN
(
SELECT account_name, MAX(uid) AS max_uid
FROM account
GROUP BY account_name
) agg
ON a.account_name = agg.account_name
AND a.uid = agg.max_uid
WHERE a.points > 0
ORDER by a.points DESC;
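A quick way to check the aggregate-join approach is to run it against a few rows. The sketch below uses Python's sqlite3 module with made-up account rows (the names and point values are assumptions, not the asker's data), including an outdated row with a high score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (uid INTEGER PRIMARY KEY, account_name TEXT,
                      nick_name TEXT, points INTEGER);
INSERT INTO account VALUES
  (1, 'alice', 'al',    900),  -- outdated row with a high score
  (2, 'bob',   'bobby', 300),
  (3, 'alice', 'ally',  150),  -- latest row for alice
  (4, 'carol', 'caz',   0);    -- latest row, but no points
""")

# Keep only the row with the greatest uid per account_name, then filter and sort.
latest_rows = conn.execute("""
    SELECT a.account_name, a.uid, a.points
    FROM account a
    INNER JOIN (
        SELECT account_name, MAX(uid) AS max_uid
        FROM account
        GROUP BY account_name
    ) agg
      ON a.account_name = agg.account_name
     AND a.uid = agg.max_uid
    WHERE a.points > 0
    ORDER BY a.points DESC
""").fetchall()

print(latest_rows)  # [('bob', 2, 300), ('alice', 3, 150)]
```

The outdated 900-point row for alice no longer appears, because only uid 3 survives the join for that account.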
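Alternatively, with MySQL 8.0, consider a window function. Window functions are not allowed directly in a WHERE clause, so compute the maximum in a derived table and filter on it outside:
SELECT t.account_name, t.uid, t.points
FROM (
SELECT a.account_name, a.uid, a.points,
MAX(a.uid) OVER (PARTITION BY a.account_name) AS max_uid
FROM account a
) t
WHERE t.uid = t.max_uid
AND t.points > 0
ORDER BY t.points DESC;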
I have a table (call_history) with a list of phone call records; caller_id is the caller and start_date (DATETIME) is the call date. I need to make a report that shows how many people called for the first time on each day. For example:
2013-01-01 - 100
2013-01-02 - 80
2013-01-03 - 90
I have this query that does it perfectly, but it is very slow. There are indexes on both start_date and caller_id columns; is there an alternative way to get this information to speed the process up?
Here is the query:
SELECT SUBSTR(c1.start_date,1,10), COUNT(DISTINCT caller_id)
FROM call_history c1
WHERE NOT EXISTS
(SELECT id
FROM call_history c2
WHERE SUBSTR(c2.start_date,1,10) < SUBSTR(c1.start_date,1,10)
AND c2.caller_id=c1.caller_id)
GROUP BY SUBSTR(start_date,1,10)
ORDER BY SUBSTR(start_date,1,10) desc
The condition "WHERE SUBSTR(c2.start_date,1,10) < ..." is breaking your index: wrapping an indexed column in a function inside a WHERE clause generally prevents MySQL from using an ordinary index on that column.
Try the following instead:
SELECT DATE(c1.start_date), COUNT(DISTINCT c1.caller_id)
FROM call_history c1
LEFT OUTER JOIN call_history c2 ON c1.caller_id = c2.caller_id AND c2.start_date < c1.start_date
WHERE c2.id IS NULL
GROUP BY DATE(c1.start_date)
ORDER BY DATE(c1.start_date) DESC
Also re-reading your problem, I think this is another way of writing without using NOT EXISTS
SELECT DATE(c1.start_date), COUNT(DISTINCT c1.caller_id)
FROM call_history c1
where start_date =
(select min(start_date) from call_history c2 where c2.caller_id = c1.caller_id)
GROUP BY DATE(start_date)
ORDER BY DATE(c1.start_date) DESC;
You are doing a weird thing: using functions in the WHERE, GROUP BY and ORDER BY clauses. MySQL generally cannot use an ordinary index on a column when a function is applied to it in a condition. So there is not much you can do with this query as written; to improve your situation, you should alter your table structure and store the date in its own DATE column. Then create an index on this column; after that you'll get much better results.
Try to replace the NOT EXISTS with a left outer join.
OK here is the ideal solution,
speed is now 0.01
SELECT first_call_date, COUNT(caller_id) AS caller_count
FROM (
SELECT caller_id, DATE(MIN(start_date)) AS first_call_date
FROM call_history
GROUP BY caller_id
) AS ch
GROUP BY first_call_date
ORDER BY first_call_date DESC
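For anyone who wants to verify the two-level GROUP BY on a small data set, here is a sketch using Python's sqlite3 module. The five calls are invented sample data; SQLite's DATE() and MIN() behave like MySQL's for these 'YYYY-MM-DD HH:MM:SS' strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE call_history (id INTEGER PRIMARY KEY, caller_id TEXT, start_date TEXT);
INSERT INTO call_history (caller_id, start_date) VALUES
  ('A', '2013-01-01 09:00:00'),
  ('B', '2013-01-01 10:30:00'),
  ('A', '2013-01-02 11:00:00'),
  ('C', '2013-01-02 12:15:00'),
  ('B', '2013-01-03 08:45:00');
""")

# Inner query: first call date per caller. Outer query: callers per first date.
first_time_callers = conn.execute("""
    SELECT first_call_date, COUNT(caller_id) AS caller_count
    FROM (
        SELECT caller_id, DATE(MIN(start_date)) AS first_call_date
        FROM call_history
        GROUP BY caller_id
    ) AS ch
    GROUP BY first_call_date
    ORDER BY first_call_date DESC
""").fetchall()

print(first_time_callers)  # [('2013-01-02', 1), ('2013-01-01', 2)]
```

Note that a day with no first-time callers (2013-01-03 here) produces no row at all rather than a zero.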
I have two tables, one for downloads and one for uploads. They are almost identical, but some columns differ between them. I want to generate a list of stats for each date for each item in the table.
I use these two queries but have to merge the data in php after running them. I would like to instead run them in a single query, where it would return the columns from both queries in each row grouped by the date. Sometimes there isn't any download data, only upload data, and in all my previous tries it skipped the row if it couldn't find log data from both rows.
How do I merge these two queries into one, where it would display data even if it's just available in one of the tables?
SELECT DATE(upload_date_added) as upload_date, SUM(upload_size) as upload_traffic, SUM(upload_files) as upload_files
FROM packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
ORDER BY upload_date DESC
SELECT DATE(download_date_added) as download_date, SUM(download_size) as download_traffic, SUM(download_files) as download_files
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
ORDER BY download_date DESC
I want to get result rows like this:
date, upload_traffic, upload_files, download_traffic, download_files
All help appreciated!
Your two queries can be executed and then combined with the UNION clause, along with an extra field to identify Uploads and Downloads on separate lines. A single ORDER BY goes at the end and applies to the combined result:
SELECT
'Uploads' TransmissionType,
DATE(upload_date_added) as TransmissionDate,
SUM(upload_size) as TransmissionTraffic,
SUM(upload_files) as TransmittedFileCount
FROM
packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY TransmissionDate
UNION
SELECT
'Downloads',
DATE(download_date_added),
SUM(download_size),
SUM(download_files)
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY DATE(download_date_added)
ORDER BY TransmissionDate DESC;
Give it a Try !!!
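If you want one row per date with the upload and download columns side by side (the layout asked for in the question), one possible approach is to pad each side with zeros, stack them with UNION ALL, and group the combined rows by date. The sketch below runs this in Python's sqlite3 module; the sample rows and the stat_date alias are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE packages_uploads
  (upload_date_added TEXT, upload_size INTEGER, upload_files INTEGER);
CREATE TABLE packages_downloads
  (download_date_added TEXT, download_size INTEGER, download_files INTEGER);
INSERT INTO packages_uploads VALUES
  ('2011-11-01 10:00:00', 100, 1),
  ('2011-11-01 15:00:00',  50, 2),
  ('2011-11-02 09:00:00',  70, 1);
INSERT INTO packages_downloads VALUES
  ('2011-11-01 12:00:00', 500, 5),
  ('2011-11-03 18:00:00', 300, 3);
""")

# Each branch fills the other side's columns with 0, so a date that exists
# in only one table still yields a row after the outer GROUP BY.
daily_stats = conn.execute("""
    SELECT stat_date,
           SUM(upload_traffic), SUM(upload_files),
           SUM(download_traffic), SUM(download_files)
    FROM (
        SELECT DATE(upload_date_added) AS stat_date,
               upload_size AS upload_traffic, upload_files AS upload_files,
               0 AS download_traffic, 0 AS download_files
        FROM packages_uploads
        WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
        UNION ALL
        SELECT DATE(download_date_added), 0, 0, download_size, download_files
        FROM packages_downloads
        WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
    ) AS combined
    GROUP BY stat_date
    ORDER BY stat_date DESC
""").fetchall()

for row in daily_stats:
    print(row)
```

UNION ALL is used instead of UNION so that two rows that happen to be identical are both counted in the sums.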
What you're asking can only work for rows that have the same add date for upload and download. In this case I think this SQL should work:
SELECT
DATE(u.upload_date_added) as date,
SUM(u.upload_size) as upload_traffic,
SUM(u.upload_files) as upload_files,
SUM(d.download_size) as download_traffic,
SUM(d.download_files) as download_files
FROM
packages_uploads u, packages_downloads d
WHERE u.upload_date_added = d.download_date_added
AND u.upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY date
ORDER BY date DESC
Without knowing the schema it is hard to give an exact answer, so please treat the following as a concept, not a direct answer.
You could try a left join. I'm not sure if the table package exists, but the following may be food for thought:
SELECT
p.id,
up.upload_date as upload_date,
dwn.download_date as download_date
FROM
package p
LEFT JOIN package_uploads up ON
( up.package_id = p.id AND up.upload_date = 'etc' )
LEFT JOIN package_downloads dwn ON
( dwn.package_id = p.id AND dwn.download_date = 'etc' )
The above will select all the packages and attempt to join and where the value does not join it will return null.
There are a number of ways you can do this. You can join using a primary key and a foreign key. In case you do not have a relationship between the tables,
you can use:
LEFT JOIN / LEFT OUTER JOIN
Returns all records from the left table and the matched
records from the right table. The result is NULL from the
right side when there is no match.
RIGHT JOIN / RIGHT OUTER JOIN
Returns all records from the right table and the matched
records from the left table. The result is NULL from the left
side when there is no match.
FULL OUTER JOIN
Returns all records from both tables, matched where possible,
with NULL on the side that has no match. MySQL does not support it directly.
UNION
Is used to combine the result sets of two or more SELECT statements.
Each SELECT statement within UNION must have the same number of
columns. The columns must also have similar data types. The columns in
each SELECT statement must also be in the same order.
INNER JOIN
Selects records that have matching values in both tables. This fits your situation only if every date has rows in both tables.
INTERSECT
Not supported by MySQL before version 8.0.31.
NATURAL JOIN
Joins on all columns that have the same name in both tables.
Since you don't need to update these, you can create a view from the joined tables so that you need fewer queries in your PHP. Note that a view like this cannot be updated. And you did not mention a relationship between the tables, so I have to go with UNION.
Like this,
CREATE VIEW checkStatus
AS
SELECT
DATE(upload_date_added) as upload_date,
SUM(upload_size) as upload_traffic,
SUM(upload_files) as upload_files
FROM packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
UNION
SELECT
DATE(download_date_added) as download_date,
SUM(download_size) as download_traffic,
SUM(download_files) as download_files
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
ORDER BY upload_date DESC
Then anywhere you want to select you just need one line:
SELECT * FROM checkStatus
I'm a MySQL query noobie so I'm sure this is a question with an obvious answer.
But, I was looking at these two queries. Will they return different result sets? I understand that the sorting process would commence differently, but I believe they will return the same results with the first query being slightly more efficient?
Query 1: HAVING, then AND
SELECT user_id
FROM forum_posts
GROUP BY user_id
HAVING COUNT(id) >= 100
AND user_id NOT IN (SELECT user_id FROM banned_users)
Query 2: WHERE, then HAVING
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN(SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(id) >= 100
Actually the first query will be less efficient (HAVING applied after WHERE).
UPDATE
Some pseudo code to illustrate how your queries are executed ([very] simplified version).
First query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Group, count, etc.
4. Exclude records from the first result set if they are present in the second
Second query
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_users
3. Exclude records from the first result set if they are present in the second
4. Group, count, etc.
The order of steps 1 and 2 is not important; MySQL can choose whatever it thinks is better. The important difference is in steps 3 and 4. HAVING is applied after GROUP BY. Grouping is usually more expensive than joining (excluding records can be considered a join operation in this case), so the fewer records it has to group, the better the performance.
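To confirm that the two orderings only differ in how much work is done and not in what they return, here is a small sketch using Python's sqlite3 module. The data is invented, and the threshold is lowered from 100 to 3 posts so that a handful of rows is enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE banned_users (user_id INTEGER);
""")
# User 1: 3 posts, user 2: 3 posts (banned), user 3: 1 post.
conn.executemany("INSERT INTO forum_posts (user_id) VALUES (?)",
                 [(1,), (1,), (1,), (2,), (2,), (2,), (3,)])
conn.execute("INSERT INTO banned_users VALUES (2)")

# Query 1: group everything, then drop banned users in HAVING.
having_then_and = conn.execute("""
    SELECT user_id FROM forum_posts
    GROUP BY user_id
    HAVING COUNT(id) >= 3
       AND user_id NOT IN (SELECT user_id FROM banned_users)
""").fetchall()

# Query 2: drop banned users in WHERE, then group what is left.
where_then_having = conn.execute("""
    SELECT user_id FROM forum_posts
    WHERE user_id NOT IN (SELECT user_id FROM banned_users)
    GROUP BY user_id
    HAVING COUNT(id) >= 3
""").fetchall()

print(having_then_and, where_then_having)  # [(1,)] [(1,)]
```

Both forms return user 1 only; the second simply has three fewer rows to group.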
You already have answers saying that the two queries will show the same results, and various opinions on which one is more efficient.
My opinion is that there will be a difference in efficiency (speed) only if the optimizer comes up with different plans for the 2 queries. I think that in the latest MySQL versions the optimizer is smart enough to find the same plan for either query, so there will be no difference at all, but of course one can test this, either by inspecting the execution plans with EXPLAIN or by running the 2 queries against some test tables.
I would use the second version in any case, just to play safe.
Let me add that:
COUNT(*) is usually more efficient than COUNT(notNullableField) in MySQL. Until that is fixed in future MySQL versions, use COUNT(*) where applicable.
Therefore, you can also use:
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN
( SELECT user_id FROM banned_users )
GROUP BY user_id
HAVING COUNT(*) >= 100
There are also other ways to get the same result as NOT IN before applying GROUP BY.
Using LEFT JOIN / NULL :
SELECT fp.user_id
FROM forum_posts AS fp
LEFT JOIN banned_users AS bu
ON bu.user_id = fp.user_id
WHERE bu.user_id IS NULL
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Using NOT EXISTS :
SELECT fp.user_id
FROM forum_posts AS fp
WHERE NOT EXISTS
( SELECT *
FROM banned_users AS bu
WHERE bu.user_id = fp.user_id
)
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Which of the 3 methods is faster depends on your table sizes and a lot of other factors, so best is to test with your data.
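As a sanity check that the three forms are interchangeable on ordinary data, the sketch below runs all of them with Python's sqlite3 module. The rows are made up, and the threshold is lowered to 2 posts to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE banned_users (user_id INTEGER);
INSERT INTO forum_posts (user_id) VALUES (1),(1),(2),(2),(3),(3),(4);
INSERT INTO banned_users VALUES (2);
""")

variants = {
    "NOT IN": """
        SELECT user_id FROM forum_posts
        WHERE user_id NOT IN (SELECT user_id FROM banned_users)
        GROUP BY user_id HAVING COUNT(*) >= 2""",
    "LEFT JOIN / NULL": """
        SELECT fp.user_id FROM forum_posts AS fp
        LEFT JOIN banned_users AS bu ON bu.user_id = fp.user_id
        WHERE bu.user_id IS NULL
        GROUP BY fp.user_id HAVING COUNT(*) >= 2""",
    "NOT EXISTS": """
        SELECT fp.user_id FROM forum_posts AS fp
        WHERE NOT EXISTS
              (SELECT * FROM banned_users AS bu WHERE bu.user_id = fp.user_id)
        GROUP BY fp.user_id HAVING COUNT(*) >= 2""",
}

# Run every variant and collect the user ids it returns, sorted for comparison.
results = {name: sorted(row[0] for row in conn.execute(sql))
           for name, sql in variants.items()}
print(results)  # every variant gives [1, 3]
```

One caveat: this assumes banned_users.user_id never contains NULL. If the subquery can return a NULL, NOT IN matches no rows at all, while the other two forms are unaffected.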
HAVING conditions are applied to the grouped by results, and since you group by user_id, all of their possible values will be present in the grouped result, so the placing of the user_id condition is not important.
To me, second query is more efficient because it lowers the number of records for GROUP BY and HAVING.
Alternatively, you may try the following query to avoid using IN:
SELECT `fp`.`user_id`
FROM `forum_posts` `fp`
LEFT JOIN `banned_users` `bu` ON `fp`.`user_id` = `bu`.`user_id`
WHERE `bu`.`user_id` IS NULL
GROUP BY `fp`.`user_id`
HAVING COUNT(`fp`.`id`) >= 100
Hope this helps.
No, they do not give the same results.
The first query groups first and then filters the records in the HAVING clause, together with the COUNT(id) condition.
The other query filters the records first and then applies the HAVING clause.
The second query is the correctly written one.
Here is my data structure
alt text http://luvboy.co.cc/images/db.JPG
when i try this sql
select rec_id, customer_id, dc_number, balance
from payments
where customer_id='IHS050018'
group by dc_number
order by rec_id desc;
something is wrong somewhere, idk
I need
rec_id customer_id dc_number balance
2 IHS050018 DC3 -1
3 IHS050018 52 600
I want the most recent balance of the customer for each dc_number.
Thanx
There are essentially two ways to get this
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id = (
select s.rec_id
from payments s
where s.customer_id='IHS050018' and s.dc_number = p.dc_number
order by s.rec_id desc
limit 1);
Also if you want to get the last balance for each customer you might do
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id = (
select s.rec_id
from payments s
where s.customer_id=p.customer_id and s.dc_number = p.dc_number
order by s.rec_id desc
limit 1);
What I consider essentially another way is utilizing the fact that select rec_id with order by desc and limit 1 is equivalent to select max(rec_id) with appropriate group by, in full:
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id IN (
select max(s.rec_id)
from payments s
group by s.customer_id, s.dc_number
);
This should be faster (if you want the last balance for every customer), since max is normally less expensive than sort (with indexes it might be the same).
Also when written like this the subquery is not correlated (it need not be run for every row of the outer query) which means it will be run only once and the whole query can be rewritten as a join.
Also notice that it might be beneficial to write it as correlated query (by adding where s.customer_id = p.customer_id and s.dc_number = p.dc_number in inner query) depending on the selectivity of the outer query.
This might improve performance, if you look for the last balance of only one or few rows.
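Here is a small sketch of the max(rec_id) version, run through Python's sqlite3 module and restricted to the one customer from the question. The first row's balance (499) is an invented value, since the original table was only posted as an image; the other two rows match the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (rec_id INTEGER PRIMARY KEY, customer_id TEXT,
                       dc_number TEXT, balance INTEGER);
INSERT INTO payments VALUES
  (1, 'IHS050018', 'DC3', 499),  -- older row for DC3 (made-up balance)
  (2, 'IHS050018', 'DC3', -1),
  (3, 'IHS050018', '52',  600);
""")

# The subquery finds the newest rec_id per (customer_id, dc_number);
# the outer query fetches the full row for each of those ids.
latest_balances = conn.execute("""
    SELECT p.rec_id, p.customer_id, p.dc_number, p.balance
    FROM payments p
    WHERE p.customer_id = 'IHS050018'
      AND p.rec_id IN (
        SELECT MAX(s.rec_id)
        FROM payments s
        GROUP BY s.customer_id, s.dc_number
      )
    ORDER BY p.rec_id
""").fetchall()

for row in latest_balances:
    print(row)
```

This prints rec_id 2 (DC3, balance -1) and rec_id 3 (52, balance 600), which is the result asked for in the question.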
I don't think there is a good way to do this in SQL without having window functions (like those in Postgres 8.4). You probably have to iterate over the dataset in your code and get the recent balances that way.
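ORDER BY cannot be written before GROUP BY in the same SELECT, so sorting first is not an option. To get the most recent row per dc_number, pick the highest rec_id in each group instead:
select rec_id, customer_id, dc_number, balance
from payments
where rec_id in (
select max(rec_id)
from payments
where customer_id='IHS050018'
group by dc_number)
order by rec_id desc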