MySQL using GROUP BY to group by multiple columns - mysql

I'd like to GROUP BY multiple columns. I think it's best to start with an example:
SELECT
eventsviews.eventId,
showsActive.showId,
showsActive.venueId,
COUNT(*) AS count
FROM eventsviews
INNER JOIN events ON events.eventId = eventsviews.eventId
INNER JOIN showsActive ON showsActive.eventId = eventsviews.eventId
WHERE events.status = 1
GROUP BY showsActive.venueId, showsActive.showId, showsActive.eventId
ORDER BY count DESC
LIMIT 100;
Output:
| *eventId* | *showId* | *venueId* | *count* |
+-----------+----------+-----------+---------+
[...snip...]
| 95 | 92099 | 9770 | 32 |
| 95 | 105472 | 10702 | 32 |
| 3804 | 41225 | 8165 | 17 |
| 3804 | 41226 | 8165 | 17 |
| 923 | 2866 | 5451 | 14 |
| 923 | 20184 | 5930 | 14 |
[...snip...]
What I would like instead:
| *eventId* | *showId* | *venueId* | *count* |
+-----------+----------+-----------+---------+
| 95 | 92099 | 9770 | 32 |
| 3804 | 41226 | 8165 | 17 |
| 923 | 20184 | 5930 | 14 |
So, I want my data grouped by eventId, but only once for each showId and venueId ...
I actually have a SQL query that does that, but it has 8 subqueries and is as slow as a T-Ford ... And since this is executed on every page load, speeding things up looks like a good idea!
There are a few questions like this, and I've tried many different things, but I've been at this query for an hour and I can't seem to get it to work as I want :-(
Thanks!

You probably want either a MIN or a MAX on showId, and then leave it out of the GROUP BY. I can't tell which, because your "preferred" result set contains both.

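If you want your data grouped by eventId, group by eventId alone and you'll get one row per event.
This relies on a MySQL extension that lets you select non-aggregated columns that are not listed in the GROUP BY. The value returned for such a column comes from an arbitrary row of the group, not necessarily the first one, and since MySQL 5.7.5 the default ONLY_FULL_GROUP_BY mode rejects the query unless you wrap those columns in an aggregate such as MIN(), MAX(), or ANY_VALUE(). PostgreSQL achieves something similar with DISTINCT ON, which is not available in MySQL.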
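One way to get a single row per eventId, following the MIN/MAX suggestion above, is to pick MAX(showId) per event in a derived table and join back to fetch the matching venueId. The following is a minimal sketch that uses Python's sqlite3 module as a stand-in for MySQL. The sample rows are made up, the events.status filter is left out for brevity, and it assumes showId is unique in showsActive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eventsviews (eventId INTEGER);
CREATE TABLE showsActive (showId INTEGER PRIMARY KEY, eventId INTEGER, venueId INTEGER);
-- Hypothetical sample data: 3 views of event 95, 2 of event 3804, 1 of event 923
INSERT INTO eventsviews VALUES (95), (95), (95), (3804), (3804), (923);
INSERT INTO showsActive VALUES
    (92099, 95, 9770), (105472, 95, 10702),
    (41225, 3804, 8165), (41226, 3804, 8165),
    (2866, 923, 5451), (20184, 923, 5930);
""")

rows = conn.execute("""
SELECT v.eventId, s.showId, s.venueId, v.view_count
  FROM (SELECT eventId, COUNT(*) AS view_count
          FROM eventsviews
         GROUP BY eventId) AS v
  JOIN (SELECT eventId, MAX(showId) AS showId   -- one show per event
          FROM showsActive
         GROUP BY eventId) AS pick
    ON pick.eventId = v.eventId
  JOIN showsActive AS s
    ON s.showId = pick.showId                   -- fetch that show's venue
 ORDER BY v.view_count DESC
""").fetchall()

for row in rows:
    print(row)
```

Because every selected column is either grouped or aggregated inside its own derived table, this form should not depend on the non-standard GROUP BY behaviour.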


SQL: How to get last date in groupby for getting unique records?

I am working with data where I have to use multiple joins, and I found that one of the tables produces duplicates: because I also apply GROUP BY to the dates, rows with different dates come through as duplicate values.
I wrote the following query:
SELECT
ll.ID,
ll.EST_DT,
gg.col1 ,
ll.EST_CLAIM_DT,
gg.col2
FROM table gg
inner join
(select substr(ID,1,instr(ID,'-',7)-1) EST_ID,
max(est_dt) as EST_DT,
max(EST_CLAIM_DT) as EST_CLAIM_DT
from table group by substr(gg.ID,1,instr(ID,'-',7)-1)) ll
on substr(ID,1,instr(gg.ID,'-',7)-1)=substr(ll.ID,1,instr(ll.ID,'-',7)-1)
GROUP BY
ll.ID,
ll.EST_DT,
gg.col1 ,
ll.EST_CLAIM_DT,
gg.col2
Table looks like this:
+-----------------+------------+----------------+------+------+
| ID | est_date | est_claimed_dt | col1 | col2 |
+-----------------+------------+----------------+------+------+
| EST-U-1040452-1 | 28/02/2019 | 28/02/2019 | 50 | 50 |
| EST-U-1040452-2 | 5/10/2020 | 5/10/2020 | 50 | 50 |
+-----------------+------------+----------------+------+------+
Desired output
+---------+-----------+----------------+------+------+
| ID | est_date | est_claimed_dt | col1 | col2 |
+---------+-----------+----------------+------+------+
| 1040452 | 5/10/2020 | 5/10/2020 | 50 | 50 |
+---------+-----------+----------------+------+------+
I also get this error:
Negative sub string length not allowed
P.S. I have searched SO for this issue and it helped, but I couldn't get it to work.

MySQL Count within an IF

+-------------+--------------+----------+-------+
| ticketRefNo | nameOnTicket | boughtBy | event |
+-------------+--------------+----------+-------+
| 38 | J XXXXXXXXX | 2 | 13 |
| 39 | C YYYYYYY | 1 | 13 |
| 40 | M ZZZZZZZZZZ | 3 | 14 |
| 41 | C AAAAAAA | 3 | 15 |
| 42 | D BBBBBB | 3 | 16 |
| 43 | A CCCCC | 3 | 17 |
+-------------+--------------+----------+-------+
+-------------+------------------+--------------+---------------------+--------+
| ticketRefNo | cardNo | cardHolder | exp | issuer |
+-------------+------------------+--------------+---------------------+--------+
| 38 | 4444111133332222 | J McKenny | 2016-01-01 00:00:00 | BOS |
| 39 | 4434111133332222 | C Dempsey | 2016-04-01 00:00:00 | BOS |
| 40 | 4244111133332222 | M Gunn-Davis | 2018-02-01 00:00:00 | RBS |
+-------------+------------------+--------------+---------------------+--------+
+-------------+-------------+----------+
| ticketRefNo | boxOfficeID | paidWith |
+-------------+-------------+----------+
| 41 | 1 | card |
| 42 | 2 | cash |
| 43 | 3 | chequ |
+-------------+-------------+----------+
I have a database with the data shown above. It represents a ticket-buying system. I would like to be able to see a list of tickets bought with the name of the event and either the boxOfficeID or the issuer of the debit card.
I have tried running the following code, to no avail.
SELECT t.ticketRefNo AS 'Reference', t.event AS 'Event',
IF(COUNT(SELECT * FROM Online WHERE t.ticketRefNo=o.ticketRefNo;) >= 1,
o.issuer, InPerson.boxOfficeID) AS 'Card Issuer or Box Office'
FROM Ticket AS t, InPerson, Online AS o
WHERE t.ticketRefNo=o.ticketRefNo;
Cheers in advance!
Some notes: the semicolon inside the subquery isn't valid syntax; to delimit a subquery, wrap it in parens. Escape column aliases like you'd escape any other identifier: use backticks, not single quotes. Single quotes are used around string literals.
Assuming that issuer in the Online table is NOT NULL, and assuming that ticketRefNo is unique in both the Online and InPerson tables, you could do something like this:
SELECT t.ticketRefNo AS `Reference`
, t.event AS `Event`
, IF(o.ticketRefNo IS NOT NULL,o.issuer,i.boxOfficeId)
AS `Card Issuer or Box Office`
FROM Ticket t
LEFT
JOIN InPerson i
ON i.ticketRefNo = t.ticketRefNo
LEFT
JOIN Online o
ON o.ticketRefNo = t.ticketRefNo
Use outer join operations to find matching rows in the InPerson and Online tables, and use a conditional test to see if you got a matching row from the Online table. A NULL will be returned if there wasn't a matching row found.
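The outer-join approach can be tried against the sample rows from the question. This is a minimal sketch using Python's sqlite3 module rather than MySQL, so the conditional is written as a CASE expression instead of IF(); the table definitions (types, keys, the NOT NULL on issuer) are assumptions based on the tables shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ticket   (ticketRefNo INTEGER PRIMARY KEY, event INTEGER);
CREATE TABLE Online   (ticketRefNo INTEGER PRIMARY KEY, issuer TEXT NOT NULL);
CREATE TABLE InPerson (ticketRefNo INTEGER PRIMARY KEY, boxOfficeID INTEGER);
INSERT INTO Ticket   VALUES (38, 13), (39, 13), (40, 14), (41, 15), (42, 16), (43, 17);
INSERT INTO Online   VALUES (38, 'BOS'), (39, 'BOS'), (40, 'RBS');
INSERT INTO InPerson VALUES (41, 1), (42, 2), (43, 3);
""")

rows = conn.execute("""
SELECT t.ticketRefNo,
       t.event,
       -- A non-NULL o.ticketRefNo means the ticket was bought online
       CASE WHEN o.ticketRefNo IS NOT NULL THEN o.issuer
            ELSE i.boxOfficeID
       END
  FROM Ticket t
  LEFT JOIN InPerson i ON i.ticketRefNo = t.ticketRefNo
  LEFT JOIN Online   o ON o.ticketRefNo = t.ticketRefNo
 ORDER BY t.ticketRefNo
""").fetchall()

for row in rows:
    print(row)
```

Each ticket appears exactly once because ticketRefNo is assumed to be unique in both Online and InPerson.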
It's not a good idea to have one column JOINing to two different tables with some values in each of the two tables.
But here goes anyway:
( SELECT ... FROM Ticket t JOIN InPerson x USING(ticketRefNo) ... )
UNION ALL
( SELECT ... FROM Ticket t JOIN Online x USING(ticketRefNo) ... )
ORDER BY ...
The ALL assumes that InPerson and Online never have any overlapping ticketRefNos.
The ORDER BY at the end is in case you want to sort things, although I see no need for it in your attempted SELECT.
The two SELECTs must have the same number of columns.
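To make the UNION ALL outline concrete, here is a minimal sketch with the elided column lists filled in as one plausible choice (the aliases Reference, Event and Source are illustrative, not from the original). It runs on Python's sqlite3 module as a stand-in for MySQL; SQLite does not accept parentheses around the individual SELECTs of a compound query, so they are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ticket   (ticketRefNo INTEGER PRIMARY KEY, event INTEGER);
CREATE TABLE Online   (ticketRefNo INTEGER PRIMARY KEY, issuer TEXT);
CREATE TABLE InPerson (ticketRefNo INTEGER PRIMARY KEY, boxOfficeID INTEGER);
INSERT INTO Ticket   VALUES (38, 13), (39, 13), (40, 14), (41, 15), (42, 16), (43, 17);
INSERT INTO Online   VALUES (38, 'BOS'), (39, 'BOS'), (40, 'RBS');
INSERT INTO InPerson VALUES (41, 1), (42, 2), (43, 3);
""")

rows = conn.execute("""
SELECT t.ticketRefNo AS Reference, t.event AS Event, x.boxOfficeID AS Source
  FROM Ticket t JOIN InPerson x USING (ticketRefNo)
UNION ALL
-- Same number of columns as the first SELECT; names come from the first one
SELECT t.ticketRefNo, t.event, x.issuer
  FROM Ticket t JOIN Online x USING (ticketRefNo)
ORDER BY Reference
""").fetchall()

for row in rows:
    print(row)
```

If a ticketRefNo ever appeared in both InPerson and Online, UNION ALL would return it twice, which is the overlap assumption mentioned above.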

SQL COUNT query similar to UNION but with results in multiple columns

I have a single-table SQL database built from DHCPD logs, structured as below:
+------+-------+------+----------+---------+-------------------+-----------------+
| id | Month | Day | Time | Type | MAC | ClientIP |
+------+-------+------+----------+---------+-------------------+-----------------+
| 9305 | Nov | 24 | 03:20:00 | DHCPACK | 00:04:f2:4b:dd:51 | 10.123.246.116 |
| 9307 | Nov | 24 | 03:20:07 | DHCPACK | 00:04:f2:99:4c:ba | 10.123.154.176 |
| 9310 | Nov | 24 | 03:20:08 | DHCPACK | 00:19:bb:cf:cd:28 | 10.99.107.3 |
| 9311 | Nov | 24 | 03:20:08 | DHCPACK | 00:19:bb:cf:cd:28 | 10.99.107.3 |
Every DHCP event from the log will eventually make its way into this database, so events from any point in time will be potentially used in the construction of graphs. To make use of the data for graphing, I need to be able to create an output table with multiple columns, but with values derived from a count of those in a single column matching a specific pattern.
The closest thing I've managed to come up with is this query:
select 'Data' as ClientIP, count(*) from Log where ClientIP like '10.99%' and MAC like '00:04:f2%'
union
select 'Voice' as ClientIP, count(*) from Log where ClientIP like '10.123%' and MAC like '00:04:f2%';
Which yields the following result:
+-----------+-------+
| ClientIP | Count |
+-----------+-------+
| Data | 4618 |
| Voice | 13876 |
+-----------+-------+
Fine for a one-off query, but I want to take those two rows, turn them into two columns, and run the same query with one row per hour (for instance). I want something like this:
+------+-------+------+
| Hour | Voice | Data |
+------+-------+------+
| 03 | 22 | 4 |
| 04 | 123 | 23 |
| 05 | 45 | 5 |
Any advice is greatly welcomed.
Thanks
You can group by hour and use conditional aggregation to count Data and Voice traffic.
For example:
SELECT
HOUR(time) AS `Hour`,
SUM(CASE WHEN ClientIP like '10.99%' and MAC like '00:04:f2%' THEN 1 ELSE 0 END) AS `Data`,
SUM(CASE WHEN ClientIP like '10.123%' and MAC like '00:04:f2%' THEN 1 ELSE 0 END) AS `Voice`
FROM Log
GROUP BY HOUR(time)
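The same conditional aggregation can be exercised on a few rows. This is a minimal sketch on Python's sqlite3 module as a stand-in for MySQL: SQLite has no HOUR() function, so the hour is taken with substr() on the HH:MM:SS string instead. The first three rows come from the question; the 04:01:00 row is made up so that the Data column is not empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Log (id INTEGER PRIMARY KEY, Month TEXT, Day INTEGER,
                  Time TEXT, Type TEXT, MAC TEXT, ClientIP TEXT)
""")
conn.executemany("INSERT INTO Log VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (9305, "Nov", 24, "03:20:00", "DHCPACK", "00:04:f2:4b:dd:51", "10.123.246.116"),
    (9307, "Nov", 24, "03:20:07", "DHCPACK", "00:04:f2:99:4c:ba", "10.123.154.176"),
    (9310, "Nov", 24, "03:20:08", "DHCPACK", "00:19:bb:cf:cd:28", "10.99.107.3"),
    # Hypothetical row, not from the original log
    (9400, "Nov", 24, "04:01:00", "DHCPACK", "00:04:f2:00:00:01", "10.99.1.2"),
])

rows = conn.execute("""
SELECT substr(Time, 1, 2) AS Hour,
       SUM(CASE WHEN ClientIP LIKE '10.123%' AND MAC LIKE '00:04:f2%'
                THEN 1 ELSE 0 END) AS Voice,
       SUM(CASE WHEN ClientIP LIKE '10.99%' AND MAC LIKE '00:04:f2%'
                THEN 1 ELSE 0 END) AS Data
  FROM Log
 GROUP BY substr(Time, 1, 2)
 ORDER BY Hour
""").fetchall()

for row in rows:
    print(row)
```

Rows that match neither pattern (such as id 9310, whose MAC prefix differs) still belong to their hour's group but add 0 to both sums.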
Create a separate summary table with the layout you want:
+------+-------+------+
| Hour | Voice | Data |
+------+-------+------+
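and keep it up to date, either with an INSERT trigger on the log table or with a scheduled event that runs every hour (triggers fire on row changes, not on a timer).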

Convert Mysql Query to Rails ActiveRecord Query Without using find_by_sql

I have a table named questions, as follows:
+----+---------------------------------------------------------+----------+
| id | title | category |
+----+---------------------------------------------------------+----------+
| 89 | Tinker or work with your hands? | 2 |
| 54 | Sketch, draw, paint? | 3 |
| 53 | Express yourself clearly? | 4 |
| 77 | Keep accurate records? | 6 |
| 32 | Efficient? | 6 |
| 52 | Make original crafts, dinners, school or work projects? | 3 |
| 70 | Be elected to office or make your opinions heard? | 5 |
| 78 | Take photographs? | 3 |
| 84 | Start your own political campaign? | 5 |
| 9 | Free spirit or a rebel? | 3 |
| 38 | Lead a group? | 5 |
| 71 | Work in groups? | 4 |
| 2 | Helpful? | 4 |
| 4 | Mechanical? | 6 |
| 14 | Responsible? | 6 |
| 66 | Pitch a tent, an idea? | 1 |
| 62 | Write useful business letters? | 5 |
| 28 | Creative? | 3 |
| 68 | Perform experiments? | 2 |
| 10 | Like to figure things out? | 2 |
+----+---------------------------------------------------------+----------+
I have a SQL query that returns one random record from each category. Can anyone convert this MySQL query to a Rails ActiveRecord query (without using Question.find_by_sql)? The MySQL query works fine, but I need an ActiveRecord query because later steps depend on it.
Here is the MySQL query:
SELECT t.id, title as question, category
FROM
(
SELECT
(
SELECT id
FROM questions
WHERE category = t.category
ORDER BY RAND()
LIMIT 1
) id
FROM questions t
GROUP BY category
) q JOIN questions t
ON q.id = t.id
Thank You for your consideration!
When things get crazy, one has to reach for Arel:
It is intended to be a framework framework; that is, you can build
your own ORM with it, focusing on innovative object and collection
modeling as opposed to database compatibility and query generation.
So what we want to do is let Arel create the query for us. The approach used here is to left join the questions table with a randomized version of itself:
q_normal = Arel::Table.new("questions")
q_random = Arel::Table.new("questions").project(Arel.sql("*")).order("RAND()").as("q2")
Time to left join:
query = q_normal.join(q_random, Arel::Nodes::OuterJoin).on(q_normal[:category].eq(q_random[:category])).group(q_normal[:category]).order(q_random[:category])
Now you can select whichever columns you want using project, e.g.:
query.project(q_normal[:id])
The only way I can think of to do this requires a good bit of application code. I don't think there's a way of accessing the RAND() functionality in MySQL (or equivalent in other DB technologies) using ActiveRecord. Here's what I came up with:
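# Count the questions in each category: { category => count }
counts = Question.group(:category).count(:id)
# Pick a random row offset within each category
offsets = {}
counts.each do |category, count|
  offsets[category] = rand(count)
end
# Fetch the question at that offset, one query per category
random_questions = []
offsets.each do |category, offset|
  random_questions.push(Question.where(:category => category).offset(offset).first)
end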
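The count-then-offset idea does not depend on Rails. As a minimal sketch of the same logic, here it is in Python against an in-memory SQLite table holding a subset of the question's rows: count the rows per category, draw a random offset below each count, and fetch the row at that offset.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT, category INTEGER)")
conn.executemany("INSERT INTO questions VALUES (?, ?, ?)", [
    (89, "Tinker or work with your hands?", 2),
    (54, "Sketch, draw, paint?", 3),
    (53, "Express yourself clearly?", 4),
    (77, "Keep accurate records?", 6),
    (32, "Efficient?", 6),
    (78, "Take photographs?", 3),
    (66, "Pitch a tent, an idea?", 1),
    (62, "Write useful business letters?", 5),
    (68, "Perform experiments?", 2),
])

# {category: number of questions in that category}
counts = dict(conn.execute(
    "SELECT category, COUNT(*) FROM questions GROUP BY category"))

picked = {}
for category, count in counts.items():
    offset = random.randrange(count)  # 0 <= offset < count
    picked[category] = conn.execute(
        "SELECT id, title, category FROM questions"
        " WHERE category = ? ORDER BY id LIMIT 1 OFFSET ?",
        (category, offset),
    ).fetchone()

for category in sorted(picked):
    print(category, picked[category])
```

This costs one query per category plus the initial count, which is the trade-off for keeping the randomness in application code instead of ORDER BY RAND().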

MySQL: optimize query for scoring calculation

I have a data table that I use to do some calculations. The resulting data set after calculations looks like:
+------------+-----------+------+----------+
| id_process | id_region | type | result |
+------------+-----------+------+----------+
| 1 | 4 | 1 | 65.2174 |
| 1 | 5 | 1 | 78.7419 |
| 1 | 6 | 1 | 95.2308 |
| 1 | 4 | 1 | 25.0000 |
| 1 | 7 | 1 | 100.0000 |
+------------+-----------+------+----------+
On the other hand, I have another table that contains a set of ranges used to classify the calculation results. The ranges table looks like:
+----------+-------+-----+--------+
| id_level | start | end | status |
+----------+-------+-----+--------+
| 1 | 0 | 75 | Danger |
| 2 | 76 | 90 | Alert |
| 3 | 91 | 100 | Good |
+----------+-------+-----+--------+
I need a query that adds the corresponding 'status' column to each value as it is calculated. Currently, I do that by adding the following field to the calculation query:
select
...,
...,
[math formula] as result,
(select status
from ranges r
where result between r.start and r.end) status
from ...
where ...
It works OK, but when I have a lot of rows (more than 200K), the calculation query becomes slow.
My question is: is there some way to find that 'status' value without that subquery?
Has anyone worked on something similar before?
Thanks
Yes, you are looking for a subquery and join:
select s.*, r.status
from (
<your query here>
) s left outer join
ranges r
on s.result between r.start and r.end
Explicit joins often optimize better than nested select. In this case, though, the ranges table seems pretty small, so this may not be the performance issue.
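The range join can be checked on the sample values from the question. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL; the results table stands in for the asker's calculation query, and the 75.5 row is a made-up value added to show what the LEFT JOIN returns when a result falls between two ranges. Note that end is a keyword in SQLite, so the column names are quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ranges (id_level INTEGER, "start" REAL, "end" REAL, status TEXT);
INSERT INTO ranges VALUES (1, 0, 75, 'Danger'), (2, 76, 90, 'Alert'), (3, 91, 100, 'Good');

-- Stand-in for the output of the calculation query
CREATE TABLE results (id INTEGER PRIMARY KEY, id_region INTEGER, result REAL);
INSERT INTO results (id_region, result) VALUES
    (4, 65.2174), (5, 78.7419), (6, 95.2308), (4, 25.0), (7, 100.0),
    (8, 75.5);  -- hypothetical value that falls in the gap between 75 and 76
""")

rows = conn.execute("""
SELECT s.result, r.status
  FROM (SELECT id, id_region, result FROM results) AS s
  LEFT OUTER JOIN ranges AS r
    ON s.result BETWEEN r."start" AND r."end"
 ORDER BY s.id
""").fetchall()

statuses = [status for _, status in rows]
print(statuses)
```

The NULL status for 75.5 suggests that, with fractional results, the ranges may need to be contiguous (for example 0 to 75, then above 75 to 90) so that every value is classified.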