Using GROUP_CONCAT - mysql

I have three tables. TB_Main is a table of Entities. TB_BoardMembers is a table of People. TB_BoardMembersLINK is a bridging table which references the other two by ids and also has start and end dates for when a Person was on the board of an Entity. These dates are often incomplete.
I have been asked to export as part of a report a CSV with one row per Entity per year in which I have a list of board members for that year with their occupations in a single field delimited by newlines.
I don't need bml.Entity in the result but added it to try to debug. I'm getting one row where I expect 85. Tried with and without GROUP BY and the fact that the result is the same suggests I am misusing GROUP_CONCAT. How should I construct this to get the result they want?
SELECT
GROUP_CONCAT(
DISTINCT CONCAT(bm.First, ' ', bm.Last,
IF (bm.Occupation != '', ' - ', ''),
bm.Occupation) SEPARATOR "\n") as Board,
bml.Entity
FROM
TB_Main arfe,
TB_BoardMembers bm,
TB_BoardMembersLINK bml
WHERE YEAR(bml.start) <= 2011
AND (YEAR(bml.end) >= 2011 OR bml.end IS NULL)
AND bml.start > 0
AND bml.Entity = arfe.ID
GROUP BY bml.Entity
ORDER BY Board

There are a few issues with this query. The main issue appears to be that you are missing a condition to link board members to the link table, so you have a cross join, i.e. you will be returning every broadband member regardless of their start/end dates, and assuming you have 85 rows where the criteria matches, you will actually be returning each board member 85 times. This highlights a very good reason to switch from the ANSI 89 implicit joins you are using, to the ANSI 92 explicit join syntax. This article highlights some very good reasons to make the switch.
So your query would become (I've had to guess at your field names):
SELECT *
FROM TB_Main arfe
INNER JOIN TB_BoardMembersLINK bml
ON bml.Entity = arfe.ID
INNER JOIN TB_BoardMembers bm
ON bm.ID = bml.BoardMemberID
The next thing I noticed about your query is that using functions in the where clause is not very efficient at all, so because of this:
WHERE YEAR(bml.start) <= 2011
AND (YEAR(bml.end) >= 2011 OR bml.end IS NULL)
You are operating the YEAR function twice for every row, and removing any possible chance of using an index on bml.Start or bml.End (if any exist). Yet again Aaron Bertrand has written a nice article highlighting good practises when querying date ranges, it is target at SQL-Server, but the principles are still the same, so your where clause would become:
WHERE bml.Start <= '20110101'
AND (bml.End >= '20110101' OR bml.End IS NULL)
AND bml.start > 0
Your final query should then be:
SELECT bml.Entity,
GROUP_CONCAT(DISTINCT CONCAT(bm.First, ' ', bm.Last,
IF (bm.Occupation != '', ' - ', ''), bm.Occupation)
SEPARATOR "\n") as Board
FROM TB_Main arfe
INNER JOIN TB_BoardMembersLINK bml
ON bml.Entity = arfe.ID
INNER JOIN TB_BoardMembers bm
ON bm.ID = bml.BoardMemberID
WHERE bml.Start <= '20110101'
AND (bml.End >= '20110101' OR bml.End IS NULL)
AND bml.start > 0
GROUP BY bml.Entity
ORDER BY Board;
Example on SQL Fiddle

If you read up on Group_Concat
"This function returns a string result with the concatenated non-NULL values from a group."
Here in this case, the group seems to be just one group, as you say there is only one entity? I am not sure if that is the case from your description. Why dont you also group by firstname, lastname and Occupation, this may give you all the members.
I am also not sure of your joins, without real data its tough to explain that part as every join works for some set of data properly, even though its not the best way to write a query

Related

MySQL Select within another select

I have a query as follows
select
Sum(If(departments.vat, If(weeklytransactions.weekendingdate Between
'2011-01-04' And '2099-12-31', weeklytransactions.takings / 1.2,
If(weeklytransactions.weekendingdate Between '2008-11-30' And '2010-01-01',
weeklytransactions.takings / 1.15, weeklytransactions.takings / 1.175)),
weeklytransactions.takings)) As Total,
weeklytransactions.weekendingdate,......
and another that returns a vat rate as follows
select format(Max(Distinct vat_rates.Vat_Rate),3) From vat_rates Where
vat_rates.Vat_From <= '2011-01-03'
I want to replace the hard coded if statement with the lower query, replacing the date in the lower query with weeklytransactions.weekendingdate.
After Kevin's comments, here is the full query I'm trying to get to work;
Select Max(vat_rates.vat_rate) As r,
If(departments.vat, weeklytransactions.takings / r, weeklytransactions.takings) As Total,
weeklytransactions.weekendingdate,
Week(weeklytransactions.weekendingdate),
round(datediff(weekendingdate, (if(month(weekendingdate)>5,concat(year(weekendingdate),'-06-01'),concat(year(weekendingdate)-1,'-06-01'))))/7,0)+1 as fyweek,
cast((Case When Month(weeklytransactions.weekendingdate) >5 Then Concat(Year(weeklytransactions.weekendingdate), '-',Year(weeklytransactions.weekendingdate) + 1) Else Concat(Year(weeklytransactions.weekendingdate) - 1, '-',Year(weeklytransactions.weekendingdate)) End) as char) As fy,
business_units.business_unit
From departments Inner Join (business_units Inner Join weeklytransactions On business_units.buID = weeklytransactions.businessUnit) On departments.deptid = weeklytransactions.departmentId
Where (vat_rates.vat_from <= weeklytransactions.weekendingdate and business_units.Active = true and business_units.sales=1)
Group By weeklytransactions.weekendingdate, business_units.business_unit Order By fy desc, business_unit, fyweek
Regards
Pete
Assuming I read your question correctly, your problem is about having the result of another SELECT used to be returned by the result of your main query (plus depending on how acquainted you are with SQL, maybe you haven't had the occasion to learn about JOINs?).
You can have subqueries you extract data from within a SELECT, provided you define it within the FROMclause. The following query will work, for example:
SELECT A.a, B.b
FROM A
JOIN (SELECT aggregate(c) FROM C) AS B
Notice that there is no reference to table A within the subquery. Thing is, you cannot just add it like that to the query, as the subquery doesn't know it is a subquery. So the following won't work:
SELECT A.a, B.b
FROM A
JOIN (SELECT aggregate(c) FROM C WHERE C.someValue = A.someValue) AS B
Back to basics. What you want to do here visibly, is to aggregate some data associated to each of the records of another table. For that, you will need merge your SELECT queries and use GROUP BY:
SELECT A.a, aggregate(C.c)
FROM A, C
WHERE C.someValue = A.someValue
GROUP BY A.a
Back to your tables, the following should work:
SELECT w.weekendingdate, FORMAT(MAX(v.Vat_Rate, 3)
FROM weeklytransactions AS w, vat_rates AS v
WHERE v.Vat_From <= w.weekendingdate
GROUP BY w.weekendingdate
Feel free to add and remove fields and conditions as you see fit (I wouldn't be surprised that you'd also want to use a lower bound when filtering the records from vat_rates, since the way I have written it above, for a given weekendingdate, you get records from that week + the weeks before!).
So it looks like my first try did not address the actual problem. With the additional information provided in the comments, as well as the new complete query, let's see how this goes.
We are still missing error messages, but normally the query as written should result in MySQL having the following complaint:
ERROR 1109 (42S02): Unknown table 'vat_rates' in field list
Why? Because the vat_rates table does not appear in the FROM clause, whereas it should. Let's make that more obvious by simplifying the query, removing all references to the business_units table as well as the fields, calculations and order that do not add or remove anything to the problem, leaving us with the following:
SELECT MAX(vat_rates.vat_rate) AS r,
IF(d.vat, w.takings / r, w.takings) AS Total
FROM departments AS d
INNER JOIN weeklytransactions AS w ON w.departmentId = d.deptid
WHERE vat_rates.vat_from <= w.weekendingdate
GROUP BY w.weekendingdate
That cannot work, and will produce the error mentioned above. It looks like there is no FOREIGN ID between the weeklytransactions and vat_rates tables, so we have no choice but to do a CROSS JOIN for the moment, hoping that the condition in the WHERE clause and the aggregate function used to get r are enough to fit the business logic at hand here. The following query should return the expected data instead of an error message (I also remove r since that seems to be an intermediate value judging by the comments that were written):
SELECT IF(d.vat, w.takings / MAX(v.vat_rate), w.takings) AS Total
FROM vat_rates AS v, departments AS d
INNER JOIN weeklytransactions AS w ON w.departmentId = d.deptid
WHERE v.vat_from <= w.weekendingdate
GROUP BY w.weekendingdate
From there, assuming it works, you will only need to put back all the parts I removed to get your final query. I am a tad doubtful about the way the VAT rate is gotten here, but I have no idea what your requirements are in that regard so I leave it up to you to make sure that works as expected.

SQL statement hanging up in MySQL database

I am needing some SQL help. I have a SELECT statement that references several tables and is hanging up in the MySQL database. I would like to know if there is a better way to write this statement so that it runs efficiently and does not hang up the DB? Any help/direction would be appreciated. Thanks.
Here is the code:
Select Max(b.BurID) As BurID
From My.AppTable a,
My.AddressTable c,
My.BurTable b
Where a.AppID = c.AppID
And c.AppID = b.AppID
And (a.Forename = 'Bugs'
And a.Surname = 'Bunny'
And a.DOB = '1936-01-16'
And c.PostcodeAnywhereBuildingNumber = '999'
And c.PostcodeAnywherePostcode = 'SK99 9Q9'
And c.isPrimary = 1
And b.ErrorInd <> 1
And DateDiff(CurDate(), a.ApplicationDate) <= 30)
There is NO mysql error in the log. Sorry.
Pro tip: use explicit JOINs rather than a comma-separated list of tables. It's easier to see the logic you're using to JOIN that way. Rewriting your query to do that gives us this.
select Max(b.BurID) As BurID
From My.AppTable AS a
JOIN My.AddressTable AS c ON a.AppID = c.AppID
JOIN My.BurTable AS b ON c.AppID = b.AppID
WHERE (a.Forename = 'Bugs'
And a.Surname = 'Bunny'
And a.DOB = '1936-01-16'
And c.PostcodeAnywhereBuildingNumber = '999'
And c.PostcodeAnywherePostcode = 'SK99 9Q9'
And c.isPrimary = 1
And b.ErrorInd <> 1
And DateDiff(CurDate(), a.ApplicationDate) <= 30)
Next pro tip: Don't use functions (like DateDiff()) in WHERE clauses, because they defeat using indexes to search. That means you should change the last line of your query to
AND a.ApplicationDate >= CurDate() - INTERVAL 30 DAY
This has the same logic as in your query, but it leaves a naked (and therefore index-searchable) column name in the search expression.
Next, we need to look at your columns to see how you are searching, and cook up appropriate indexes.
Let's start with AppTable. You're screening by specific values of Forename, Surname, and DOB. You're screening by a range of ApplicationDate values. Finally you need AppID to manage your join. So, this compound index should help. Its columns are in the correct order to use a range scan to satisfy your query, and contains the needed results.
CREATE INDEX search1 USING BTREE
ON AppTable
(Forename, Surname, DOB, ApplicationDate, AppID)
Next, we can look at your AddressTable. Similar logic applies. You'll enter this table via the JOINed AppID, and then screen by specific values of three columns. So, try this index
CREATE INDEX search2 USING BTREE
ON AddressTable
(AppID, PostcodeAnywherePostcode, PostcodeAnywhereBuildingNumber, isPrimary)
Finally, we're on to your BurTable. Use similar logic as the other two, and try this index.
CREATE INDEX search3 USING BTREE
ON BurTable
(AppID, ErrorInd, BurID)
This kind of index is called a compound covering index, and can vastly speed up the sort of summary query you have asked about.

MS Access query multiple criteria

I am trying to build an access query with multiple criteria. The table to be queried is "tblVendor" which has information about vendor shipment data as shown below:
The second table is "tblSchedule" which has the schedule for each Vendor cutoff date. This table has cutoff dates for data analysis.
For each vendor, I need to select records which have the ShipDate >= CutoffDate. Although not shown in the data here, it may be possible that multiple vendors have same CutoffDate.
For small number of records in "tblCutoffdate", I can write a query which looks like:
SELECT tblVendors.ShipmentId, tblVendors.VendorNumber, tblVendors.VendorName,
tblVendors.Units, tblVendors.ShipDate
FROM tblVendors INNER JOIN tblCutoffDate ON tblVendors.VendorNumber =
tblCutoffDate.VendorNumber
WHERE (((tblVendors.VendorNumber) In (SELECT VendorNumber FROM [tblCutoffDate] WHERE
[tblCutoffDate].[CutoffDate] = #2/1/2014#)) AND ((tblVendors.ShipDate)>=#2/1/2014#)) OR
(((tblVendors.VendorNumber) In (SELECT VendorNumber FROM [tblCutoffDate] WHERE
[tblCutoffDate].[CutoffDate] = #4/1/2014#)) AND ((tblVendors.ShipDate)>=#4/1/2014#));
As desired, the query gives me a result which looks like:
What concerns me now is that I have a lot of records being added to the "tblCutoffDate" which makes it difficult for me to hardcode the dates in the query. Is there a better way to write the above SQL statement without any hardcoding?
You might try something like -- this should handle vendors having no past cutoff,
or those having no future cutoff
"today" needs a suitable conversion to just date w/o time
comparison "=" may go on both, or one, or none Max/Min
"null" may be replaced by 1/1/1900 and 12/31/3999 in Max/Min
SELECT tblvendors.shipmentid,
tblvendors.vendornumber,
tblvendors.vendorname,
tblvendors.units,
tblvendors.shipdate
FROM tblvendors
LEFT JOIN
( SELECT vendornum,
Max( iif cutoffdate < today, cutoffdate, null) as PriorCutoff,
Min( iif cutoffdate >= today, cutoffdate, null) as NextCutoff
FROM tblcutoffdate
GROUP BY vendornum
) as VDates
ON vendornumber = vendornum
WHERE tblvendors.shipdate BETWEEN PriorCutoff and NextCutoff
ORDER BY vendornumber, shipdate, shipmentid
A simpler WHERE clause should give you what you want.
SELECT
v.ShipmentId,
v.VendorNumber,
v.VendorName,
v.Units,
v.ShipDate
FROM
tblVendors AS v
INNER JOIN tblCutoffDate AS cd
ON v.VendorNumber = cd.VendorNumber
WHERE v.ShipDate >= cd.CutoffDate;

Using the right MYSQL JOIN

I'm trying to get all the data from the match table, along with the currently signed up gamers of each type, experienced or not.
Gamers
(PK)Gamer_Id
Gamer_firstName,
Gamer_lastName,
Gamer experience(Y/N)
Gamer_matches
(PK)FK GamerId,
(PK)FK MatchId,
Gamer_score
Match
(PK)Match_Id,
ExperiencedGamers_needed,
InExperiencedGamers_needed
I've tried this query along with many others but it doesn't work, is it a bad join?
SELECT M.MatchId,M.ExperiencedGamers_needed,M.InExperiencedGamers_needed,
(SELECT COUNT(GM.GamerId)
FROM Gamers G, Gamers_matches GM
WHERE G.GamerId = GM.GamerId
AND G.experience = "Y"
AND GM.MatchId = M.MatchId
GROUP BY GM.MatchId)AS ExpertsSignedUp,
(SELECT COUNT(GM.GamerId)
FROM Gamers G, Gamers_matches GM
WHERE G.GamerId = GM.GamerId
AND G.experience = "N"
AND GM.MatchId = M.MatchId
GROUP BY GM.MatchId) AS NovicesSignedUp
FROM MATCHES M
What you've written is called a correlated subquery which forces SQL to re-execute the subquery for each row fetched from Matches. It can be made to work, but it's pretty inefficient. In some complex queries it may be necessary, but not in this case.
I would solve this query this way:
SELECT M.MatchId, M.ExperiencedGamers_needed,M.InExperiencedGamers_needed,
SUM(G.experience = 'Y') AS ExpertsSignedUp,
SUM(G.experience = 'N') AS NovicesSignedUp
FROM MATCHES M
LEFT OUTER JOIN (Gamer_matches GM
INNER JOIN Gamers G ON G.GamerId = GM.GamerId)
ON M.MatchId = GM.MatchId
GROUP BY M.MatchId;
Here it outputs only one row per Match because of the GROUP BY at the end.
There's no subquery to re-execute many times, it's just joining Matches to the respective rows in the other tables once. But I use an outer join in case a Match has zero players of eithe type signed up.
Then instead of using COUNT() I use a trick of MySQL and use SUM() with a boolean expression inside the SUM() function. Boolean expressions in MySQL always return 0 or 1. The SUM() of these is the same as the COUNT() where the expression returns true. This way I can get the "count" of both experts and novices only scanning the Gamers table once.
P.S. MySQL is working in a non-standard way to return 0 or 1 from a boolean expression. Standard ANSI SQL does not support this, nor do many other brands of RDBMS. Standardly, a boolean expression returns a boolean, not an integer.
But you can use a more verbose expression if you need to write standard SQL for portability:
SUM(CASE G.experience WHEN 'Y' THEN 1 WHEN 'N' THEN 0 END) AS ExpertsSignedUp

SQL Join only returning 1 row

Not quite sure what I'm missing, but my SQL statement is only returning one row.
SELECT
tl.*,
(tl.topic_total_rating/tl.topic_rates) as topic_rating,
COUNT(pl.post_id) - 1 as reply_count,
MIN(pl.post_time) AS topic_time,
MAX(pl.post_time) AS topic_bump
FROM topic_list tl
JOIN post_list pl
ON tl.topic_id=pl.post_parent
WHERE
tl.topic_board_link = %i
AND topic_hidden != 1
ORDER BY %s
I have two tables (post_list and topic_list), and post_list's post_parent links to a topic_list's topic_id.
Instead of returning all the topics (where their board's topic_board_link is n), it only returns one topic.
You would normally need a GROUP BY clause in there. MySQL has different rules from Standard SQL on the subject of when GROUP BY is needed. This is therefore closer to Standard SQL:
SELECT tl.*,
(tl.topic_total_rating/tl.topic_rates) AS topic_rating,
COUNT(pl.post_id) - 1 AS reply_count,
MIN(pl.post_time) AS topic_time,
MAX(pl.post_time) AS topic_bump
FROM topic_list AS tl
JOIN post_list AS pl ON tl.topic_id = pl.post_parent
WHERE tl.topic_board_link = ? -- %i
AND tl.topic_hidden != 1
GROUP BY tl.col1, ..., topic_rating
ORDER BY ? -- %s
In Standard SQL, you would have to list every column in topic_list, plus the non-aggregate value topic_rating (and you might have to list the expression rather than the display label or column alias in the select list).
You also have a restriction condition on 'topic_board_link' which might be limiting your result set to one group. You cannot normally use a placeholder in the ORDER BY clause, either.