I've got a table of crime data. In a simplified version, it would look like this:
Table Headings:
crime_id, neighborhood, offense
Table Data:
- 1, Old Town, robbery
- 2, Bad Town, theft
- 3, Bad Town, theft
- 4, Uptown, stolen auto
If I SELECT * FROM mytable WHERE offense ='theft', then the results for Bad Town are returned. But, I'm making a ranking, so what I'm really interested in is:
Old Town: 0
Bad Town: theft
Bad Town: theft
Uptown: 0
How do I write a SELECT statement that returns cases where there are thefts, but also returns neighborhoods that don't have an entry for the specified offense?
UPDATE: This is my actual SELECT. I'm having problems applying the solution that p.campbell and Gratzy were so kind to post to this SELECT. How do I apply the CASE statement with the COUNT(*)?
SELECT
cbn.neighborhoods AS neighborhoods,
COUNT(*) AS offenses,
TRUNCATE(((na.neighborhood_area_in_sq_meters /1000) * 0.000386102159),2) AS sq_miles,
( COUNT(*) / ((na.neighborhood_area_in_sq_meters /1000) * 0.000386102159) ) AS offenses_per_sq_mile
FROM
wp_crime_by_neighborhood cbn, wp_neighborhood_area na
WHERE
cbn.offense='theft'
AND
cbn.neighborhoods = na.neighborhoods
GROUP BY
cbn.neighborhoods
ORDER BY
offenses_per_sq_mile DESC
If you're looking to make a ranking, wouldn't it be better to get the number of thefts in Bad Town rather than a row for each? Something like this:
select distinct mt.neighborhood, ifnull(total, 0)
from mytable mt
left join (
select neighborhood, count(*) as total
from mytable
where offense = 'theft'
group by neighborhood
) as t on t.neighborhood = mt.neighborhood
Based on the data you gave, this query should return:
Old Town: 0
Bad Town: 2
Uptown: 0
That seems more useful to me for making a ranking. You can easily throw an order by on there.
I would think using a case statement should do it.
http://dev.mysql.com/doc/refman/5.0/en/case-statement.html
something like
Select neighborhood,
case offense when 'theft' then offense else '0' end as offense
from mytable
Try this:
SELECT cbn.neighborhoods AS neighborhoods,
CASE WHEN IFNULL(COUNT(*),0) > 0 THEN CONCAT(COUNT(*), ' ', offense)
ELSE '0'
END AS offenses
-- ... and the rest of your query
FROM wp_crime_by_neighborhood cbn
INNER JOIN wp_neighborhood_area na
ON cbn.neighborhoods = na.neighborhoods
WHERE cbn.offense='theft'
GROUP BY cbn.neighborhoods
-- ORDER BY offenses_per_sq_mile DESC
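Note that any query that keeps cbn.offense='theft' in the WHERE clause drops neighborhoods without a theft before the CASE is evaluated. A minimal sketch of one way around this, assuming wp_neighborhood_area lists every neighborhood: start from that table, LEFT JOIN the crimes with the offense filter inside the ON clause, and count only the matched rows. It runs against SQLite through Python's sqlite3 as a stand-in for MySQL; the sample rows are the simplified data from the question, and the area columns are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_neighborhood_area (neighborhoods TEXT);
    CREATE TABLE wp_crime_by_neighborhood (neighborhoods TEXT, offense TEXT);
    INSERT INTO wp_neighborhood_area VALUES ('Old Town'), ('Bad Town'), ('Uptown');
    INSERT INTO wp_crime_by_neighborhood VALUES
        ('Old Town', 'robbery'), ('Bad Town', 'theft'),
        ('Bad Town', 'theft'), ('Uptown', 'stolen auto');
""")

# The offense filter lives in the ON clause, so unmatched neighborhoods
# survive the join with NULLs, and COUNT(cbn.offense) skips those NULLs.
rows = conn.execute("""
    SELECT na.neighborhoods,
           COUNT(cbn.offense) AS offenses
    FROM wp_neighborhood_area na
    LEFT JOIN wp_crime_by_neighborhood cbn
           ON cbn.neighborhoods = na.neighborhoods
          AND cbn.offense = 'theft'
    GROUP BY na.neighborhoods
    ORDER BY offenses DESC, na.neighborhoods
""").fetchall()

for neighborhood, offenses in rows:
    print(neighborhood, offenses)
```

The same join shape should carry over to MySQL, with COUNT(cbn.offense) taking the place of COUNT(*) in the per-square-mile expressions.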
I have a task, but couldn't solve it:
There are truckers and they have to travel between cities.
We have data of these travels in our database in 2 tables:
trucker_traffic
tt_id (key)
date
starting_point_coordinate
destination_coordinate
traveller_id
event_type ('travel', 'accident')
parent_event_id (For 'accident' event type it's tt_id of the original travel. There might be few accidents within one travel.)
trucker_places
coordinate (key)
country
city
I need SQL query to pull the number of all unique truckers who travelled more than once from or to London city in June 2020.
In the same query pull the number of these travels who got into an accident.
Example of my tries
SELECT
count(distinct(tt.traveller_id))
FROM trucker_traffic tt
JOIN trucker_places tp
ON tt.starting_point_coordinate = tp.coordinate
OR tt.destination_coordinate = tp.coordinate
WHERE
tp.city = 'London'
AND month(tt.date) = 6
AND year(tt.date) = 2020
GROUP BY tt.traveller_id
HAVING count(tt.tt_id) > 1
But this selects a distinct count of truckers with grouping, so it works only if I have one trucker in the db.
For the second part of the task (where I have to select the number of travels with an accident), I think it would be good to use a function like this:
SUM(if(count(tt_id = parent_event_id),1,0))
But I'm not sure.
This is rather complicated, so make sure you do this step by step. WITH clauses help with this.
Steps
Find travels from and to London in June 2020. You can use IN or EXISTS in order to see whether a travel had accidents.
Group the London travels by traveller, count travels and accident travels and only keep those travellers with more than one travel.
Take this result set to count the travellers and sum up their travels.
Query
with london_travels as
(
select
traveller_id,
case when tt_id in
(select parent_event_id from trucker_traffic where event_type = 'accident')
then 1 else 0 end as accident
from trucker_traffic tt
where event_type = 'travel'
and month(tt.date) = 6
and year(tt.date) = 2020
and exists
(
select 1
from trucker_places tp
where tp.coordinate in (tt.starting_point_coordinate, tt.destination_coordinate)
and tp.city = 'London'
)
)
, london_travellers as
(
select
traveller_id,
count(*) as travels,
sum(accident) as accident_travels
from london_travels
group by traveller_id
having count(*) > 1
)
select
count(*) as total_travellers,
sum(travels) as total_travels,
sum(accident_travels) as total_accident_travels
from london_travellers;
If your MySQL version doesn't support WITH clauses, you can of course just nest the queries. I.e.
with a as (...), b as (... from a) select * from b;
becomes
select * from (... from (...) a) b;
You say in the request title that you don't want GROUP BY in the query. This is possible, but makes the query more complicated. If you want to do this I leave this as a task for you. Hint: You can select travellers and count in subqueries per traveller.
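To check the step-by-step approach on a small data set, here is a hedged sketch that runs the same three steps against SQLite through Python's sqlite3. The sample rows, the single-letter coordinates, and the traveller ids are invented for illustration. SQLite has no month() or year(), so the June 2020 filter is written as a date range over ISO-formatted text dates, which is an assumption about how the date column is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trucker_places (coordinate TEXT PRIMARY KEY, country TEXT, city TEXT);
    CREATE TABLE trucker_traffic (
        tt_id INTEGER PRIMARY KEY, date TEXT,
        starting_point_coordinate TEXT, destination_coordinate TEXT,
        traveller_id INTEGER, event_type TEXT, parent_event_id INTEGER);
    INSERT INTO trucker_places VALUES
        ('A', 'UK', 'London'), ('B', 'UK', 'Leeds'), ('C', 'FR', 'Paris');
    INSERT INTO trucker_traffic VALUES
        (1,  '2020-06-01', 'A', 'B', 10, 'travel',   NULL),
        (2,  '2020-06-05', 'B', 'A', 10, 'travel',   NULL),
        (3,  '2020-06-05', 'B', 'A', 10, 'accident', 2),
        (4,  '2020-06-06', 'B', 'A', 10, 'accident', 2),   -- second accident, same travel
        (5,  '2020-06-10', 'A', 'C', 20, 'travel',   NULL),-- only one London travel
        (6,  '2020-06-11', 'B', 'C', 20, 'travel',   NULL),-- not London
        (7,  '2020-05-30', 'A', 'B', 30, 'travel',   NULL),-- May, not June
        (8,  '2020-06-02', 'A', 'B', 30, 'travel',   NULL),
        (9,  '2020-06-03', 'C', 'A', 40, 'travel',   NULL),
        (10, '2020-06-04', 'A', 'C', 40, 'travel',   NULL),
        (11, '2020-06-04', 'A', 'C', 40, 'accident', 10);
""")

total_travellers, total_travels, total_accident_travels = conn.execute("""
    WITH london_travels AS (
        SELECT tt.traveller_id,
               CASE WHEN tt.tt_id IN (SELECT parent_event_id
                                      FROM trucker_traffic
                                      WHERE event_type = 'accident')
                    THEN 1 ELSE 0 END AS accident
        FROM trucker_traffic tt
        WHERE tt.event_type = 'travel'
          AND tt.date >= '2020-06-01' AND tt.date < '2020-07-01'
          AND EXISTS (SELECT 1
                      FROM trucker_places tp
                      WHERE tp.coordinate IN (tt.starting_point_coordinate,
                                              tt.destination_coordinate)
                        AND tp.city = 'London')
    ),
    london_travellers AS (
        SELECT traveller_id,
               COUNT(*) AS travels,
               SUM(accident) AS accident_travels
        FROM london_travels
        GROUP BY traveller_id
        HAVING COUNT(*) > 1
    )
    SELECT COUNT(*), SUM(travels), SUM(accident_travels)
    FROM london_travellers
""").fetchone()

print(total_travellers, total_travels, total_accident_travels)
```

In this data only travellers 10 and 40 have more than one London travel in June. A travel with two accidents still counts once, because the accident flag is computed per travel.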
I need to select all entries that do not start with a number between 1-9.
Example Entries:
6300 Dog Lane
Kitty Drive
500 Bird Chrest
800 Tire Road
Johnson Ave
Park Ave
So if I ran a query on the above, I would expect:
Kitty Drive
Johnson Ave
Park Ave
The table is called objects and the column is called location.
Something I tried:
SELECT DISTINCT name, location FROM object WHERE location NOT LIKE '1%' OR '2%' OR '3%' OR '4%' OR '5%' OR '6%' OR '7%' OR '8%' OR '9%';
Unfortunately, that is unsuccessful. Is this possible? If no, I will resort to modifying the data with Perl.
Try this:
SELECT DISTINCT name, location FROM object
WHERE substring(location, 1, 1)
NOT IN ('1','2','3','4','5','6','7','8','9');
or you have to add NOT LIKE before every number:
SELECT DISTINCT name, location FROM object
WHERE location NOT LIKE '1%'
AND location NOT LIKE '2%'
...
You can use the following syntax:
SELECT column FROM TABLE where column NOT REGEXP '^[1-9]' ;
SELECT DISTINCT name, location FROM object
WHERE location NOT REGEXP '^[1-9]' ;
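Try this. It's simpler (note that bracket character classes in LIKE are SQL Server syntax; MySQL's LIKE only understands % and _):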
SELECT DISTINCT name, location FROM object WHERE location NOT LIKE '[0-9]%';
What you "tried" needed to have AND instead of OR. Also, DISTINCT is unnecessary.
If you have
INDEX(location)
this would probably be faster than any of the other answers:
( SELECT name, location FROM object
WHERE location < '1'
) UNION ALL
( SELECT name, location FROM object
WHERE location >= CHAR(ORD('9') + 1) )
This technique only works for contiguous ranges of initial letters, such as 1..9.
A somewhat related question: Should I perform regex filtering in MySQL or PHP? -- it asks about fetching rows starting with 1..9 instead of the opposite.
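As a quick check that the first-character test and the two-range test select the same rows, here is a small sketch using SQLite through Python's sqlite3 as a stand-in for MySQL. The locations are the example entries from the question; the name values are made up. The literal ':' is the character right after '9' in ASCII, which is what CHAR(ORD('9') + 1) evaluates to. The comparison assumes a binary-style ordering, so results may differ under other collations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE object (name TEXT, location TEXT)")
locations = [
    "6300 Dog Lane", "Kitty Drive", "500 Bird Chrest",
    "800 Tire Road", "Johnson Ave", "Park Ave",
]
conn.executemany(
    "INSERT INTO object VALUES (?, ?)",
    [(f"obj{i}", loc) for i, loc in enumerate(locations)],  # names are hypothetical
)

# Approach 1: inspect the first character explicitly.
by_prefix = [row[0] for row in conn.execute("""
    SELECT location FROM object
    WHERE substr(location, 1, 1) NOT IN ('1','2','3','4','5','6','7','8','9')
    ORDER BY location
""")]

# Approach 2: everything below '1' or at/after ':' (the character after '9').
by_range = [row[0] for row in conn.execute("""
    SELECT location FROM object
    WHERE location < '1' OR location >= ':'
    ORDER BY location
""")]

print(by_prefix)
print(by_range)
```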
Try this for SQL Server:
select column_name
from table_name
where substring(column_name,1,1) not in ('1','2','3','4','5','6','7','8','9')
ISNUMERIC should work (it will exclude 0 as well).
Sample code -
ISNUMERIC(SUBSTRING(location, 1, 1)) = 0
I am trying to figure out website visits. Every visit within 30 minutes should count as one visit for that user.
My table looks like this
TimeUser, Userid, OrderID
10/7/2013 14:37:14 _26Tf-0PjaS0dpiZXB61Rg 151078706
10/7/2013 14:39:59 _26Tf-0PjaS0dpiZXB61Rg 151078706
10/7/2013 14:40:35 _26Tf-0PjaS0dpiZXB61Rg 151078706
10/11/2013 0:09:23 _2MrGz4L_d5AF3UHpP-oJQ 151078706
10/2/2013 20:55:05 _4Pb2wEwiQomUny_XwVuvQ 151078706
10/2/2013 20:55:06 _4Pb2wEwiQomUny_XwVuvQ 151078706
10/2/2013 20:55:06 _4Pb2wEwiQomUny_XwVuvQ 151078706
In this case 151078706 should return 3 visits.
I think my SQL query looks right, but when I check my answer against the visit numbers I created in Excel, some of the orders are off by 5%. I am a hundred percent sure the Excel numbers are correct.
Here is what I have so far. If anyone sees an issue with my query, please correct me. Also, are there any better ways to find visits?
SET @row_num=0,
@temp_row=1;
SELECT orderidtable.orders,
count(orderidtable.users)
FROM
(SELECT temptab.temprow,
temptab.userid users,
temptab.orderid orders,
temptab.TimeUser
FROM
(SELECT @row_num := @row_num + 1 AS rownumber, TimeUser,
userid,
orderid
FROM order.order_dec
ORDER BY orderid,
userid,
timeuser) subtable ,
(SELECT @temp_row:= @temp_row+1 AS temprow, Timeuser,
userid,
orderid
FROM
ORDER.order_dec
ORDER BY orderid,
userid,
timeuser) temptab
WHERE (subtable.rownumber=temptab.temprow
AND abs(Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser))>=1800)
OR (subtable.rownumber=temptab.temprow
AND subtable.userid<>temptab.userid)
OR (subtable.rownumber=temptab.temprow
AND subtable.orderid<>temptab.orderid)) orderidtable
GROUP BY orderidtable.orders
Numbering the rows is the right strategy; your query goes wrong in the WHERE condition.
An algorithm to solve it would be:
Number the rows, ordering by orderid, userid, timeuser. Make two copies (subtable and temptab) of this dataset, as you are already doing.
Join these tables on the following condition:
subtable.rownumber = temptab.temprow + 1
What we are trying to do here is join the tables so that each row of subtable is paired with the row of temptab whose row number is one less than its own. This lets us compare consecutive visit times of a user to an ad. (You have already done this by setting @row_num=0, @temp_row=1.) This is the only condition we should apply to the JOIN.
Now in the SELECT statement use a CASE expression like the one below:
(CASE WHEN subtable.orderid = temptab.orderid AND subtable.userid = temptab.userid AND (Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser)) < 1800 THEN 0
ELSE 1 END) AS IsVisit
Now, in an outer query, GROUP BY orderid and sum up IsVisit in the SELECT.
Let me know if you need more clarity, or if it worked.
Addendum:
From the previous query you can try replacing the WHERE condition with subtable.rownumber = temptab.temprow + 4, and in the SELECT statement replace the CASE expression of the above query with the following:
(CASE WHEN subtable.orderid = temptab.orderid AND subtable.userid = temptab.userid AND (Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser)) < 900 THEN 1
ELSE 0 END) AS IsVisit
Take the UNION of the result set returned by the previous query and this one, and then apply GROUP BY.
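The core idea of comparing each row with the previous one can be sketched outside SQL as well. The following is a minimal Python illustration, not the answerer's query: it sorts by order, user, and time, and starts a new visit whenever the order or user changes or the gap to the previous row is 30 minutes or more. The function and variable names are made up; the sample rows are the ones from the question. It compares full timestamps rather than time of day.

```python
from datetime import datetime, timedelta


def count_visits(rows, gap=timedelta(minutes=30)):
    """Count visits per order from (timestamp, user_id, order_id) tuples."""
    visits = {}
    prev = None
    # Same ordering as the SQL: orderid, userid, timeuser.
    for ts, user, order in sorted(rows, key=lambda r: (r[2], r[1], r[0])):
        same_visit = (
            prev is not None
            and prev[1] == user
            and prev[2] == order
            and ts - prev[0] < gap
        )
        if not same_visit:
            visits[order] = visits.get(order, 0) + 1
        prev = (ts, user, order)
    return visits


def parse(text):
    return datetime.strptime(text, "%m/%d/%Y %H:%M:%S")


sample = [
    (parse("10/7/2013 14:37:14"), "_26Tf-0PjaS0dpiZXB61Rg", "151078706"),
    (parse("10/7/2013 14:39:59"), "_26Tf-0PjaS0dpiZXB61Rg", "151078706"),
    (parse("10/7/2013 14:40:35"), "_26Tf-0PjaS0dpiZXB61Rg", "151078706"),
    (parse("10/11/2013 0:09:23"), "_2MrGz4L_d5AF3UHpP-oJQ", "151078706"),
    (parse("10/2/2013 20:55:05"), "_4Pb2wEwiQomUny_XwVuvQ", "151078706"),
    (parse("10/2/2013 20:55:06"), "_4Pb2wEwiQomUny_XwVuvQ", "151078706"),
    (parse("10/2/2013 20:55:06"), "_4Pb2wEwiQomUny_XwVuvQ", "151078706"),
]

result = count_visits(sample)
print(result)
```

A reference implementation like this can also help to track down where the SQL result and the Excel numbers diverge.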
One issue I see: Your query is overly complex.
What about this?
Note that both your original and this query will err when there's a visit near midnight and another visit shortly after it. In this case, both queries will count them as 2 visits when they really should be counted as one, if I understood your request correctly. From this simplified query, though, it should be easy for you to make the required change.
SELECT orderidtable.OrderID, COUNT(orderidtable.UserID) visits
FROM (
SELECT Timeuser, Userid, OrderID
FROM order.order_dec SubTab1
WHERE NOT EXISTS (
SELECT 1 FROM order.order_dec SubTab2
WHERE SubTab1.OrderID = SubTab2.OrderID
AND SubTab2.TimeUser > SubTab1.TimeUser
AND Time_To_Sec(SubTab2.TimeUser)
BETWEEN Time_To_Sec(SubTab1.TimeUser)
AND Time_To_Sec(SubTab1.TimeUser)+1800
)
) orderidtable
GROUP BY orderidtable.OrderID
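The midnight problem comes from comparing seconds since midnight instead of full timestamps. A small arithmetic sketch, with two made-up timestamps twenty minutes apart on either side of midnight, shows the difference. The helper seconds_of_day is a hypothetical stand-in that mimics what Time_To_Sec returns for the time part of a value.

```python
from datetime import datetime


def seconds_of_day(ts):
    # Seconds elapsed since midnight, ignoring the date part.
    return ts.hour * 3600 + ts.minute * 60 + ts.second


before_midnight = datetime(2013, 10, 7, 23, 50, 0)
after_midnight = datetime(2013, 10, 8, 0, 10, 0)

# Time-of-day comparison: looks like a gap of almost a full day.
time_of_day_gap = abs(seconds_of_day(after_midnight) - seconds_of_day(before_midnight))

# Full timestamp comparison: the real gap is 20 minutes.
real_gap = (after_midnight - before_midnight).total_seconds()

print(time_of_day_gap, real_gap)
```

In MySQL, one likely fix is to compare full timestamps, for example with TIMESTAMPDIFF(SECOND, ...) or UNIX_TIMESTAMP(), instead of Time_To_Sec.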
I think a single full table scan is sufficient for what you want, as follows.
You can test it here: http://www.sqlfiddle.com/#!2/a5dbcd/1.
My query has not been tested on much sample data, but I think only minor changes will be needed if it has bugs.
SELECT MAX(current_uv) AS uv
FROM (
SELECT orderid, userid, timeuser,
IF(orderid != @prev_orderid, @prev_timeuser := 0, @prev_timeuser) AS prev_timeuser,
@prev_orderid := orderid AS prev_orderid,
IF(userid != @prev_userid, @prev_timeuser := 0, @prev_timeuser) AS prev_timeuser2,
@prev_userid := userid AS prev_userid,
IF(TO_SECONDS(timeuser) - @prev_timeuser > 1800, @current_uv := @current_uv + 1, @current_uv) AS current_uv,
@prev_timeuser := TO_SECONDS(timeuser) AS prev_timeuser3
FROM order_dec,
(SELECT @prev_orderid := 0, @prev_userid := '', @prev_timeuser := 0, @current_uv := 0) init
ORDER BY orderid, userid, timeuser
) x;
Let's say I have a hypothetical table like so that records when some player in some game scores a point:
name points
------------
bob 10
mike 03
mike 04
bob 06
How would I get the sum of each player's scores and display them side by side in one query?
Total Points Table
bob mike
16 07
My (pseudo)-query is:
SELECT sum(points) as "Bob" WHERE name="bob",
sum(points) as "Mike" WHERE name="mike"
FROM score_table
You can pivot your data 'manually':
SELECT SUM(CASE WHEN name='bob' THEN points END) as bob,
SUM(CASE WHEN name='mike' THEN points END) as mike
FROM score_table
but this will not work if the list of your players is dynamic.
In pure sql:
SELECT
sum( (name = 'bob') * points) as Bob,
sum( (name = 'mike') * points) as Mike,
-- etc
FROM score_table;
This neat solution works because MySQL's booleans evaluate as 1 for true and 0 for false, allowing you to multiply the truth of a test with a numeric column. I've used it lots of times for "pivots" and I like the brevity.
Are the player names all known up front? If so, you can do:
SELECT SUM(CASE WHEN name = 'bob' THEN points ELSE 0 END) AS bob,
SUM(CASE WHEN name = 'mike' THEN points ELSE 0 END) AS mike,
... so on for each player ...
FROM score_table
If you don't, you still might be able to use the same method, but you'd probably have to build the query dynamically. Basically, you'd SELECT DISTINCT name ..., then use that result set to build each of the CASE statements, then execute the result SQL.
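A minimal sketch of that dynamic approach, assuming the client code may run two statements: first fetch the distinct names, then generate one SUM(CASE ...) column per name and execute the generated SQL. It uses SQLite through Python's sqlite3 as a stand-in for MySQL. The helper quote_ident is a hypothetical name, and the player names are passed as bound parameters rather than spliced into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score_table (name TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO score_table VALUES (?, ?)",
    [("bob", 10), ("mike", 3), ("mike", 4), ("bob", 6)],
)

# Step 1: find the players that exist right now.
names = [row[0] for row in conn.execute(
    "SELECT DISTINCT name FROM score_table ORDER BY name")]


def quote_ident(identifier):
    # Quote a column alias, doubling any embedded double quotes.
    return '"' + identifier.replace('"', '""') + '"'


# Step 2: build one SUM(CASE ...) column per player.
columns = ", ".join(
    f"SUM(CASE WHEN name = ? THEN points ELSE 0 END) AS {quote_ident(n)}"
    for n in names
)
sql = f"SELECT {columns} FROM score_table"

# Step 3: execute the generated statement with the names as parameters.
cursor = conn.execute(sql, names)
header = [col[0] for col in cursor.description]
totals = cursor.fetchone()

print(header)
print(totals)
```

In MySQL, identifiers would be quoted with backticks instead, and the generated statement could be run from application code in the same way.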
This is called pivoting the table:
SELECT SUM(IF(name = "Bob", points, 0)) AS points_bob,
SUM(IF(name = "Mike", points, 0)) AS points_mike
FROM score_table
SELECT sum(points), name
FROM `table`
GROUP BY name
Or for the pivot
SELECT sum(if(name = 'mike',points,0)),
sum(if(name = 'bob',points,0))
FROM `table`
You can also use the PIVOT function for the same thing. Performance-wise it is even the better option for pivoting (I am talking about Oracle Database).
You can use the following query for this as well:
-- (if you have only these two columns in your table, the output will look good; for any additional columns you will get null values)
select * from game_scores
pivot (sum(points) for name in ('bob' BOB, 'mike' MIKE));
With this query you get the data very fast, and you have to add or remove a player name in only one place
:)
If you have more than these two columns in your table, you can use the following query:
WITH pivot_data AS (
SELECT points,name
FROM game_scores
)
SELECT *
FROM pivot_data
pivot (sum(points) for name in ('bob' BOB, 'mike' MIKE));
I have a table with a field called 'user_car'. It holds a concatenated, underscore-separated value (user's id _ car's id).
user_car rating
-----------------------------
11_56748 4
13_23939 2
1_56748 1
2001_56748 5
163_23939 1
I need to get the average rating for any "car". In my example table, there are only 2 cars listed: 56748 and 23939. So say I want to get the average rating for the car: 56748, so far I have this SQL, but I need the correct regex. If I'm totally off-base, let me know. Thanks!
$sql = "
SELECT AVG 'rating' FROM 'car_ratings'
WHERE 'user_car' REGEXP '';
";
You can extract the car id using:
substring(user_car from (locate('_', user_car) + 1))
this will allow you to do:
select substring(user_car from (locate('_', user_car) + 1)) as car_id,
avg(rating)
from car_ratings
group by car_id
But, this is a bad idea. You would be much better off splitting user_car into user_id and car_id.
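A hedged sketch of what that split could look like, using SQLite through Python's sqlite3 as a stand-in for MySQL. The table name car_ratings_split and its column types are assumptions made for illustration; the ratings are the rows from the question. Once car_id is its own column, the average no longer needs any string handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_ratings (user_car TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO car_ratings VALUES (?, ?)",
    [("11_56748", 4), ("13_23939", 2), ("1_56748", 1),
     ("2001_56748", 5), ("163_23939", 1)],
)

# Hypothetical replacement table with the two ids in separate columns.
conn.execute(
    "CREATE TABLE car_ratings_split (user_id INTEGER, car_id INTEGER, rating INTEGER)"
)
for user_car, rating in conn.execute(
        "SELECT user_car, rating FROM car_ratings").fetchall():
    user_id, car_id = user_car.split("_", 1)
    conn.execute(
        "INSERT INTO car_ratings_split VALUES (?, ?, ?)",
        (int(user_id), int(car_id), rating),
    )

# The lookup becomes a plain equality test on car_id.
average = conn.execute(
    "SELECT AVG(rating) FROM car_ratings_split WHERE car_id = ?", (56748,)
).fetchone()[0]

print(average)
```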
I don't see why you need to use REGEXes ...
SELECT AVG(`rating`) FROM `car_ratings` WHERE `user_car` LIKE '%\_56748'
Regexes are slow and can pretty easily shoot you in the foot. I learned to avoid them in MySQL whenever I could.
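One thing to watch with LIKE here: the underscore is itself a wildcard that matches any single character, so a pattern ending in _56748 can also match a longer car id. The sketch below shows the effect in SQLite through Python's sqlite3. The row for car 156748 is invented to trigger the problem. SQLite needs an explicit ESCAPE clause, whereas MySQL treats the backslash as the default escape character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_ratings (user_car TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO car_ratings VALUES (?, ?)",
    [
        ("11_56748", 4),
        ("1_56748", 1),
        ("7_156748", 5),  # hypothetical row for a different car, 156748
    ],
)

# Unescaped: "_" matches any single character, so car 156748 slips in.
loose = conn.execute(
    "SELECT user_car FROM car_ratings WHERE user_car LIKE ? ORDER BY user_car",
    ("%_56748",),
).fetchall()

# Escaped: the underscore must now match a literal "_".
strict = conn.execute(
    "SELECT user_car FROM car_ratings "
    "WHERE user_car LIKE ? ESCAPE '\\' ORDER BY user_car",
    ("%\\_56748",),
).fetchall()

print(loose)
print(strict)
```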