I am trying to group rows in MySQL but end up with a wrong result.
My DB looks like this:
I'm using this query:
SELECT
r_id, va_id,va_klasse,va_periode,
1va_mer,1va_hjem,1va_mot,1va_bil,1va_fit,1va_hand,1va_med,1va_fra,
2va_mer,2va_hjem,2va_trae,2va_bil,2va_sty,2va_mus,2va_med,2va_fra,
3va_mer,3va_hjem,3va_mot,3va_bil,3va_pima,3va_nat,3va_med,3va_fra,
va_lock, va_update
FROM o6hxd_valgfag
WHERE va_klasse IN('7A','7B','7C','8A','8B','8C','9A','9B','9C')
GROUP BY va_id
ORDER BY va_klasse,va_name
This produces a wrong result, where one row is returned with only the first three numbers 123 and not the ones from row two and three.
What I would like is a result where the numbers 123, 321 and 132 are gathered in one line.
I can explain more detailed if this isn't sufficient.
If across those fields there should only be ever one value, you should really have them all in the same record and go about fixing it to insert and update the same record.
Ie I am aware that you database isn't designed correctly
However
To dig you out, you could give this a crack, I suppose.
SELECT
r_id, va_id,va_klasse,va_periode,
MAX(1va_mer),MAX(1va_hjem),MAX(1va_mot),MAX(1va_bil),MAX(1va_fit),MAX(1va_hand),MAX(1va_med),MAX(1va_fra),
MAX(2va_mer),MAX(2va_hjem),MAX(2va_trae),MAX(2va_bil),MAX(2va_sty),MAX(2va_mus),MAX(2va_med),MAX(2va_fra),
MAX(3va_mer),MAX(3va_hjem),MAX(3va_mot),MAX(3va_bil),MAX(3va_pima),MAX(3va_nat),MAX(3va_med),MAX(3va_fra),
va_lock, va_update
FROM o6hxd_valgfag
WHERE va_klasse IN('7A','7B','7C','8A','8B','8C','9A','9B','9C')
GROUP BY va_id
ORDER BY va_klasse,va_name
Your query will not work as intended. Think about this use-case:
what if for row1 (r_id =9), the fields 2va_sty, 2va_mus, 2va_med are not empty and has values?
In such case what should your desired output be? It certainly cannot be the numbers 123, 321 and 132 gathered in one line. Group by is usually used if you want to use aggregate functions executed against a certain field value, in your case va_id.
Not a solution to your problem but i think a better query would be like this (because of the not named columns in the group by clause https://dev.mysql.com/doc/refman/5.5/en/group-by-handling.html):
SELECT
aa.r_id, aa.va_id, aa.va_klasse, aa.va_periode,
aa.1va_mer, aa.1va_hjem, aa.1va_mot, aa.1va_bil, aa.1va_fit, aa.1va_hand, aa.1va_med, aa.1va_fra,
aa.2va_mer, aa.2va_hjem, aa.2va_trae, aa.2va_bil, aa.2va_sty,2va_mus, aa.2va_med, aa.2va_fra,
aa.3va_mer, aa.3va_hjem, aa.3va_mot, aa.3va_bil, aa.3va_pima, aa.3va_nat, aa.3va_med, aa.3va_fra,
aa.va_lock, aa.va_update
FROM o6hxd_valgfag AS aa
INNER JOIN (
SELECT va_id
FROM o6hxd_valgfag
GROUP BY va_id
) AS _aa
ON aa.va_id = _aa.va_id
WHERE aa.va_klasse IN ('7A','7B','7C','8A','8B','8C','9A','9B','9C')
ORDER BY aa.va_klasse, aa.va_name;
Related
Using this question's answer. I'm trying to find duplicate records between two tables by the column names matrix_unique_id and Matrix_Unique_ID in each table and then display the full address. The Full address columns are formatted differently from each other in each table so I cannot use that as a comparison. I'm getting an "unknown column fort_property_res.matrix_unique_id" error but everything looks okay?
So two questions:
Will this query find duplicates correctly?
Why the unknown column error?
SQL query:
SELECT matrix_unique_id, full_address
FROM fort_property_res
INNER JOIN (
SELECT Matrix_Unique_ID, FullAddress
FROM sunshinemls_property_res
GROUP BY FullAddress
HAVING count(fort_property_res.matrix_unique_id) > 1
) dup ON fort_property_res.matrix_unique = sunshinemls_property_res.Matrix_Unique_ID
The solution you're trying to copy is a totally different case. You have two tables and (it looks like) a convenient matrix_unique_id to join on, so this is much easier:
SELECT fort.matrix_unique_id, fort.full_address AS fortAddress, sun.FullAddress AS sunAddress
FROM fort_property_res fort, sunshinemls_property_res sun
WHERE fort.matrix_unique_id = sun.Matrix_Unique_ID
I have the following query:
SELECT routes.route_date, time_slots.name, time_slots.openings, time_slots.appointments
FROM routes
INNER JOIN time_slots ON routes.route_id = time_slots.route_id
WHERE route_date
BETWEEN 20140109
AND 20140115
AND time_slots.openings > time_slots.appointments
ORDER BY route_date, name
This works just fine and will produce the following results:
What I want to do is only return one name per date. So the 9th, name = 1, would only have 1 result, rather than 2, as it currently does.
UPDATE: See the SQLFIDDLE for different type of solutions here: http://sqlfiddle.com/#!2/9ac65b/6
Will it solve your request if you use...
SELECT DISTINCT routes.route_date...your query... ?
It depends if you know that your rows always will have the same values, for same date/name.
Otherwise use group by...
(which I think suits your request best)
SELECT routes.route_date, time_slots.name, sum(time_slots.openings), sum(time_slots.appointments)
FROM routes
INNER JOIN time_slots ON routes.route_id = time_slots.route_id
WHERE route_date
BETWEEN 20140109
AND 20140115
AND time_slots.openings > time_slots.appointments
group by routes.route_date, time_slots.name
ORDER BY route_date, name
(i did a sum for the openings and appointments, you could do min, max, count, etc. Pick the one that fits your requirements best!)
You need to figure out which "name" you want when there are several for the same date.
Then you can group by date and select the right "name" by using an aggregate function like COUNT, MAX, etc.
I can't help you more if you don't explain your rule for picking one.
I have a little query, it goes like this:
It's slightly more complex than it looks, the only issue is using the output of one subquery as the parameter for an IN clause to generate another. It works to some degree - but it only provides the results from the first id in the "IN" clause. Oddly, if I manually insert the record ids "00003,00004,00005" it does give the proper results.
What I am seeking to do is get second level many to many relationship - basically tour_stops have items, which in turn have images. I am trying to get all the images from all the items to be in a JSON string as 'item_images'. As stated, it runs quickly, but only returns the images from the first related item.
SELECT DISTINCT
tour_stops.record_id,
(SELECT
GROUP_CONCAT( item.record_id ) AS in_item_ids
FROM tour_stop_item
LEFT OUTER JOIN item
ON item.record_id = tour_stop_item.item_id
WHERE tour_stop_item.tour_stops_id = tour_stops.record_id
GROUP BY tour_stops.record_id
) AS rel_items,
(SELECT
CONCAT('[ ',
GROUP_CONCAT(
CONCAT('{ \"record_id\" : \"',record_id,'\",
\"photo_credit\" : \"',photo_credit,'\" }')
)
,' ]')
FROM images
WHERE
images.attached_to IN(rel_items) AND
images.attached_table = 'item'
ORDER BY img_order ASC) AS item_images
FROM tour_stops
WHERE
tour_stops.attached_to_tour = $record_id
ORDER BY tour_stops.stop_order ASC
Both of these below answers I tried, but it did not help. The second example (placing the entire first subquery inside he "IN" statement) not only produced the same results I am already getting, but also increased query time exponentially.
EDIT: I replaced my IN statement with
IN(SELECT item_id FROM tour_stop_item WHERE tour_stops_id = tour_stops.record_id)
and it works, but it brutally slow now. Assuming I have everything indexed correctly, is this the best way to do it?
using group_concat in PHPMYADMIN will show the result as [BLOB - 3B]
GROUP_CONCAT in IN Subquery
Any insights are appreciated. Thanks
I am surprised that you can use rel_items in the subquery.
You might try:
concat(',', images.attached_to, ',') like concat('%,', rel_items, ',%') and
This may or may not be faster. The original version was fast presumably because there are no matches.
Or, you can try to change your in clause. Sometimes, these are poorly optimized:
exists (select 1
from tour_stop_item
where tour_stops_id = tour_stops.record_id and images.attached_to = item_id
)
And then be sure you have an index on tour_stop_item(tour_stops_id, item_id).
I'm using this kind of queries with different parameters :
EXPLAIN SELECT SQL_NO_CACHE `ilan_genel`.`id` , `ilan_genel`.`durum` , `ilan_genel`.`kategori` , `ilan_genel`.`tip` , `ilan_genel`.`ozellik` , `ilan_genel`.`m2` , `ilan_genel`.`fiyat` , `ilan_genel`.`baslik` , `ilan_genel`.`ilce` , `ilan_genel`.`parabirimi` , `ilan_genel`.`tarih` , `kgsim_mahalleler`.`isim` AS mahalle, `kgsim_ilceler`.`isim` AS ilce, (
SELECT `ilanresimler`.`resimlink`
FROM `ilanresimler`
WHERE `ilanresimler`.`ilanid` = `ilan_genel`.`id`
LIMIT 1
) AS resim
FROM (
`ilan_genel`
)
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `kgsim_mahalleler` ON `kgsim_mahalleler`.`id` = `ilan_genel`.`mahalle`
WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
ORDER BY `ilan_genel`.`id` DESC
LIMIT 225 , 15
and this is what i get in explain section:
these are the indexes that i already tried to use:
any help will be deeply appreciated what kind of index will be the best option or should i use another table structure ?
You should first simplify your query to understand your problem better. As it appears your problem is constrained to the ilan_gen1 table, the following query would also show you the same symptoms.:
SELECT * from ilan_gene1 WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
So the first thing to do is check that this is the case. If so, the simpler question is simply why does this query require a file sort on 3661 rows. Now the 'hepsi' index sort order is:
ilce->mahelle->durum->kategori->tip->ozelik
I've written it that way to emphasise that it is first sorted on 'ilce', then 'mahelle', then 'durum', etc. Note that your query does not specify the 'mahelle' value. So the best the index can do is lookup on 'ilce'. Now I don't know the heuristics of your data, but the next logical step in debugging this would be:
SELECT * from ilan_gene1 WHERE `ilan_genel`.`ilce` = '703'`
Does this return 3661 rows?
If so, you should be able to see what is happening. The database is using the hepsi index, to the best of it's ability, getting 3661 rows back then sorting those rows in order to eliminate values according to the other criteria (i.e. 'durum', 'kategori', 'tip').
The key point here is that if data is sorted by A, B, C in that order and B is not specified, then the best logical thing that can be done is: first a look up on A then a filter on the remaining values against C. In this case, that filter is performed via a file sort.
Possible solutions
Supply 'mahelle' (B) in your query.
Add a new index on 'ilan_gene1' that doesn't require 'mahelle', i.e. A->C->D...
Another tip
In case I have misdiagnosed your problem (easy to do when I don't have your system to test against), the important thing here is the approach to solving the problem. In particular, how to break a complicated query into a simpler query that produces the same behaviour, until you get to a very simple SELECT statement that demonstrates the problem. At this point, the answer is usually much clearer.
I need to combine the results for two select queries from two view tables, from which I am performing calculations. Perhaps there is an easier way to perform a query using if...else - any pointers?
Essentially I need to divide everything by 'ar.time_ratio' under the condition in sql query 1, and ignore that for query 2.
SELECT
gs.traffic_date,
gs.domain_group,
gs.clicks/ar.time_ratio as 'Scaled_clicks',
gs.visitors/ar.time_ratio as 'scaled_visitors',
gs.revenue/ar.time_ratio as 'scaled_revenue',
(gs.revenue/gs.clicks)/ar.time_ratio as 'scaled_average_cpc',
(gs.clicks)/(gs.visitors)/ar.time_ratio as 'scaled_ctr',
gs.average_rpm/ar.time_ratio as 'scaled_rpm',
(((gs.revenue)/(gs.visitors))/ar.time_ratio)*1000 as "Ecpm"
FROM
group_stats gs,
v_active_ratio ar
WHERE ar.group_id=gs.domain_group
and
SELECT
gs.traffic_date,
gs.domain_group,
gs.clicks,
gs.visitors,
gs.revenue,
(gs.revenue/gs.clicks) as 'average_cpc',
(gs.clicks)/(gs.visitors) as 'average_ctr',
gs.average_rpm,
((gs.revenue)/(gs.visitors))*1000 as "Ecpm"
FROM
group_stats gs,
v_active_ratio ar
where not ar.group_id=gs.domain_group
Use the UNION operator. You can add another column to show which query a row came from if that's important to you.
http://dev.mysql.com/doc/refman/5.1/en/union.html
That's usually what UNION [ALL] is for.
The UNION suggested above will give you all scaled results, and all the non-scaled data (the second query) as separate rows.
If you want to group them on their common data, so you have the scaled and normal data side by side on the same row, you can use a query like this:
SELECT * FROM ( [first-query] ) AS scaled
INNER JOIN ( [second-query] ) AS normal
ON scaled.traffic_date=normal.traffic_date
AND scaled.domain_group=normal.domain_group
Instead of using * to select all columns, may want to explicitly declare the columns selected, since the columns traffice_date and domain_group will be output twice.