I have a dataset that looks like this:
+----+-------------+
| ID | StoreVisit |
+----+-------------+
| 1 | Home Depot |
| 2 | Lowes |
| 3 | Home Depot |
| 2 | ACE |
| 2 | Lowes |
| 1 | Home Depot |
| 4 | ACE |
| 5 | ACE |
| 4 | Lowes |
+----+-------------+
I'm new(ish) to SQL and I know I can select all and then either use Excel (pivot table / functions / paste special) or R (tidyr) to transpose.. however, if I have a lot of data, this is not efficient. Is the query below correct? If so, how can I define all values of StoreVisit if there are thousands of types of stores without typing each one in the query?
select * from Stores
pivot (COUNT(StoreVisit) for StoreVisit in ([ACE],[Lowes],[Home Depot])) as StoreCounts
+----+-------+-----------+-----+
| ID | Lowes | HomeDepot | ACE |
+----+-------+-----------+-----+
| 1 | 0 | 2 | 0 |
| 2 | 2 | 0 | 0 |
| 3 | 0 | 1 | 0 |
| 4 | 1 | 0 | 1 |
| 5 | 0 | 0 | 1 |
+----+-------+-----------+-----+
Please excuse the formatting of this post! Many apologies.
Use conditional aggregation:
select id,
sum(storevisit = 'Lowes') as lowes,
sum(storevisit = 'HomeDepot') as HomeDepot,
sum(storevisit = 'Ace') as ace
from t
group by id;
I have two tables tbl_user1 and tbl_user2 both are field name are same but there is no relation between that tables now I want to find total referred count from both table for example...
tbl_user1
-----------------------
UID | referenceBy | firstName | lastName | emailAddress
----------------------------------------------------------------------------
1 | NULL | aa1 | ab1 | aa1#email.com
2 | aa1#email.com | aa2 | ab2 | aa2#email.com
3 | NULL | aa3 | ab3 | aa3#email.com
4 | aa2#email.com | aa4 | ab4 | aa4#email.com
5 | aa2#email.com | aa5 | ab5 | aa5#email.com
6 | bb1#email.com | aa6 | ab6 | aa6#email.com
7 | bb2#email.com | aa7 | ab7 | aa7#email.com
8 | bb3#email.com | aa8 | ab8 | aa8#email.com
9 | bb3#email.com | aa9 | ab9 | aa9#email.com
and second one table is somthing like...
tbl_user2
-----------------------
UID | referenceBy | firstName | lastName | emailAddress
----------------------------------------------------------------------------
1 | NULL | bb1 | bc1 | bb1#email.com
2 | bb1#email.com | bb2 | bc2 | bb2#email.com
3 | NULL | bb3 | bc3 | bb3#email.com
4 | bb3#email.com | bb4 | bc4 | bb4#email.com
5 | bb2#email.com | bb5 | bc5 | bb5#email.com
6 | bb1#email.com | bb6 | bc6 | bb6#email.com
7 | aa2#email.com | bb7 | bc7 | bb7#email.com
8 | aa3#email.com | bb8 | bc8 | bb8#email.com
9 | bb5#email.com | bb9 | bc9 | bb9#email.com
now, as you can see there is no relation between these two tables and I want result like following..
MAIN_RESULT_THAT_I_WANT
-----------------------
referenceEmail | referenceEmailCount
----------------------------------------------------------------------------
aa1#email.com | 1
aa2#email.com | 3
aa3#email.com | 1
aa4#email.com | 0
aa5#email.com | 0
aa6#email.com | 0
aa7#email.com | 0
aa8#email.com | 0
aa9#email.com | 0
bb1#email.com | 3
bb2#email.com | 2
bb3#email.com | 3
bb4#email.com | 0
bb5#email.com | 1
bb6#email.com | 0
bb7#email.com | 0
bb8#email.com | 0
bb9#email.com | 0
here in result all emailAddress of all user and total of how many user(s) registered by that particular emailAddress.
I am guessing that the result you want is just copy and pasted since it seems inaccurate. Like HoneyBadger says it is strange that aa6 is missing and still in the result, that indicates you have another list you are not telling us about? Or you just write the result in notepad...
If you just want a list of emails and count this will work:
select referenceBy, count(1) as referenceEmailCount from (
select referenceBy from tbl_user1
union all
select referenceBy from tbl_user2
) as t
group by referenceBy
Otherwise give us more info if this is not what you need.
Since the schema is same for 2 tables so you can perform union to get combined results and can perform an outer query to get the total count.
select referenceEmail, count(*) as referenceEmailCount from (
select * from table1
union all
select * from table2
) as alias
group by alias.referenceEmail
I have 2 tables bellow
0 --> Pending
1 --> Success
2 --> Fail
table : mntnc
+-------+-------+-------+
| id | own | sts |
+-------+-------+-------+
| 1 | BN | 1 |
| 2 | BB | 2 |
| 3 | BN | 1 |
| 4 | BD | 1 |
| 5 | BD | 0 |
table : istlsi
+-------+-------+-------+
| id | own | sts |
+-------+-------+-------+
| 1 | BN | 1 |
| 2 | BB | 1 |
| 3 | BB | 1 |
| 4 | BC | 0 |
| 5 | BD | 2 |
of the two tables above, I want to add both of them to be the table below
+-------+-----------+-----------+-----------+
| own | success | fail | pending |
+-------+-----------+-----------+-----------+
| BN | 3 | 0 | 0 |
| BB | 2 | 1 | 0 |
| BD | 1 | 1 | 1 |
| BC | 0 | 0 | 1 |
The two key points here:
Union tables (I aliased result to B)
Use sum(case...) for each column.
First we union both tables together as an inline view.
We then use a case statement for each desired column and evaluate the status setting the value to 1 or 0 depending on sts value. and then sum those...
SELECT own
, sum(case when sts=1 then 1 else 0 end) as Success
, sum(case when sts=2 then 1 else 0 end) as Fail
, sum(case when sts=0 then 1 else 0 end) as Pending
FROM ( SELECT ID, own, sts
FROM mntnc
UNION ALL
SELECT id, own, sts
FROM istlsi
) B
GROUP BY own
I'm building a website for our ball team for the fun of it and keeping track of stats using PHP and SQL for the database. I've learned both by reading the manuals and through forums. I'm working on building a query that will display the current longest hitting streak. I stumbled across a page about detecting runs and streaks and am trying to work with that. I'm really new to all this stuff, so maybe I've structured my tables incorrectly.
Table "games"
+--------+------------+------+
| GameID | Date | Time |
+--------+------------+------+
| 1 | 2015/08/19 | 6:30 |
| 2 | 2015/08/20 | 6:30 |
| 3 | 2015/08/22 | 6:30 |
| 4 | 2015/08/24 | 8:00 |
| 5 | 2015/08/24 | 6:30 |
| 6 | 2015/07/15 | 8:00 |
+--------+------------+------+
Table "player"
+--------+----+---+
| GameID | AB | H |
+--------+----+---+
| 1 | 3 | 1 |
| 2 | 4 | 2 |
| 3 | 2 | 0 |
| 4 | 3 | 0 |
| 5 | 2 | 1 |
| 6 | 3 | 0 |
+--------+----+---+
Code
SELECT games.GameID, GR.H,
(SELECT COUNT(*)
FROM player G
WHERE (CASE WHEN G.H > 0 THEN 1 ELSE 0 END) <> (CASE WHEN GR.H > 0 THEN 1 ELSE 0 END)
AND G.GameID <= GR.GameID) as RunGroup
FROM player GR
INNER JOIN games
ON GR.gameID = games.GameID
ORDER BY Date ASC, Time ASC
Basically in order to correctly get the hit streak right, I need to reorder the GameIDs on the "player" table based on the Date (ASC) and Time (ASC) on the "games" table before executing the RunGroup part of the code. Obviously by adding the ORDER BY, everything gets sorted only after the RunGroup has finished querying and results in incorrect data. I've been stuck here for a few days and now need some help.
The Result I currently get is:
+--------+---+----------+
| GameID | H | RunGroup |
+--------+---+----------+
| 6 | 0 | 3 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 0 | 2 |
| 5 | 1 | 2 |
| 4 | 0 | 2 |
+--------+---+----------+
This is what I'm trying to achieve:
+--------+---+----------+
| GameID | H | RunGroup |
+--------+---+----------+
| 6 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 0 | 2 |
| 5 | 1 | 2 |
| 4 | 0 | 3 |
+--------+---+----------+
Thanks
Consider the following:
DROP TABLE IF EXISTS games;
CREATE TABLE games
(game_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,date_played DATETIME NOT NULL
);
INSERT INTO games VALUES
(1,'2015/08/19 18:30:00'),
(2,'2015/08/20 18:30:00'),
(3,'2015/08/22 18:30:00'),
(4,'2015/08/24 20:00:00'),
(5,'2015/08/24 18:30:00'),
(6,'2015/07/15 20:00:00');
DROP TABLE IF EXISTS stats;
CREATE TABLE stats
(player_id INT NOT NULL
,game_id INT NOT NULL
,at_bat INT NOT NULL
,hits INT NOT NULL
,PRIMARY KEY(player_id,game_id)
);
INSERT INTO stats VALUES
(1,1,3,1),
(1,2,4,2),
(1,3,2,0),
(1,4,3,0),
(1,5,2,1),
(1,6,3,0),
(2,1,2,1),
(2,2,3,2),
(2,3,3,0),
(2,4,3,1),
(2,5,2,1),
(2,6,3,0);
SELECT x.*
, SUM(y.at_bat) runningAB
, SUM(y.hits) runningH
, SUM(y.hits)/SUM(y.at_bat) BA
FROM
(
SELECT s.*, g.date_played FROM stats s JOIN games g ON g.game_id = s.game_id
) x
JOIN
(
SELECT s.*, g.date_played FROM stats s JOIN games g ON g.game_id = s.game_id
) y
ON y.player_id = x.player_id
AND y.date_played <= x.date_played
GROUP
BY x.player_id
, x.date_played;
+-----------+---------+--------+------+---------------------+-----------+----------+--------+
| player_id | game_id | at_bat | hits | date_played | runningAB | runningH | BA |
+-----------+---------+--------+------+---------------------+-----------+----------+--------+
| 1 | 6 | 3 | 0 | 2015-07-15 20:00:00 | 3 | 0 | 0.0000 |
| 1 | 1 | 3 | 1 | 2015-08-19 18:30:00 | 6 | 1 | 0.1667 |
| 1 | 2 | 4 | 2 | 2015-08-20 18:30:00 | 10 | 3 | 0.3000 |
| 1 | 3 | 2 | 0 | 2015-08-22 18:30:00 | 12 | 3 | 0.2500 |
| 1 | 5 | 2 | 1 | 2015-08-24 18:30:00 | 14 | 4 | 0.2857 |
| 1 | 4 | 3 | 0 | 2015-08-24 20:00:00 | 17 | 4 | 0.2353 |
| 2 | 6 | 3 | 0 | 2015-07-15 20:00:00 | 3 | 0 | 0.0000 |
| 2 | 1 | 2 | 1 | 2015-08-19 18:30:00 | 5 | 1 | 0.2000 |
| 2 | 2 | 3 | 2 | 2015-08-20 18:30:00 | 8 | 3 | 0.3750 |
| 2 | 3 | 3 | 0 | 2015-08-22 18:30:00 | 11 | 3 | 0.2727 |
| 2 | 5 | 2 | 1 | 2015-08-24 18:30:00 | 13 | 4 | 0.3077 |
| 2 | 4 | 3 | 1 | 2015-08-24 20:00:00 | 16 | 5 | 0.3125 |
+-----------+---------+--------+------+---------------------+-----------+----------+--------+
I rebuilt my database to have only one table to contain the stats from all players. From there i was able to use this query to find my longest current hitting streak for a certain player.
SELECT *
FROM (SELECT (CASE WHEN h > 0 THEN 1 ELSE 0 END) As H, MIN(date_played) as StartDate,
MAX(date_played) as EndDate, COUNT(*) as Games
FROM (SELECT date_played, (CASE WHEN h > 0 THEN 1 ELSE 0 END) as H, (SELECT COUNT(*)
FROM stats G WHERE ((CASE WHEN G.h > 0 THEN 1 ELSE 0 END) <> (CASE WHEN GR.h > 0 THEN 1 ELSE 0 END))
AND G.date_played <= GR.date_played AND player_id = 13) as RunGroup
FROM stats GR
WHERE player_id = 13) A
GROUP BY H, RunGroup
ORDER BY Min(date_played)) A
WHERE H = 1
ORDER BY Games DESC
LIMIT 1
I've the following table:
| id | Name | Date of Birth | Date of Death | Result |
| 1 | John | 3546565 | 3548987 | |
| 2 | Mary | 5233654 | 5265458 | |
| 3 | Lewis| 6546876 | 6548752 | |
| 4 | Mark | 6546546 | 6767767 | |
| 5 | Steve| 6546877 | 6548798 | |
And I need to do this for the whole table:
Result = 1, if( current_row(Date of Birth) - row_above_current_row(Date of Death))>X else 0
To make things easier, I guess, I created the same table above but with 2 extra id fields: id_minus_one and id_plus_one
Like this:
| id | id_minus_one | id_plus_one |Name | Date_of_Birth | Date_of_Death | Result |
| 1 | 0 | 2 |John | 3546565 | 3548987 | |
| 2 | 1 | 3 |Mary | 5233654 | 5265458 | |
| 3 | 2 | 4 |Lewis| 6546876 | 6548752 | |
| 4 | 3 | 5 |Mark | 6546546 | 6767767 | |
| 5 | 4 | 6 |Steve| 6546877 | 6548798 | |
So my approach would be something like (in pseudo code):
for id=1, ignore result. (Because there is no row above)
for id=2, Result = 1 if( (Where id=2).Date_of_Birth - (where id_minus_one=id-1).Date_of_Death )>X else 0
for id=3, Result = 1 if( (Where id=3).Date_of_Birth - (where id_minus_one=id-1).Date_of_Death)>X else 0
and so on for the whole table...
Just ignore id_plus_one if there is no need for it, I'll use it later for the same thing. So, if I manage to do this for id_minus_one I'll manage for id_plus_one as they are the same algorithm.
My question is how to pass that pseudo code into SQL code, I can't find a way to relate both ids in just one select.
Thank you!
As you describe this, it is just a self join with some logic on the select:
select t.*,
((t.date_of_birth - tprev.date_of_death) > x) as flag
from t left outer join
t tprev
on t.id_minus_one = tprev.id