SQL to return distinct rows with the max value of a column? Complicated by duplicates of every other column

I have one table with 6 columns (DivisionId, ItemId, BuyerCode, BuyerName, ReceivedPounds, POIssueDate), and none of them is unique. The DISTINCT value I eventually want is a CONCAT of DivisionId_ItemId. I have made a lot of attempts at this already: I want to return every unique combination of DivisionId and ItemId, along with the BuyerCode with the most pounds for that DivisionId/ItemId combo. The complication is that I have multiple BuyerCodes for most ItemIds, and multiple ItemIds across different DivisionIds. I have gotten close, but my final result still gives me every buyer for every item. Here is my current query in SSMS. I'm a self-taught hack, so feel free to pick it apart; I tried to break out each subquery: T1 sums the pounds, FT applies the MAX function, and T2 joins the buyer back to the MAX division and item. The problem is that I still get multiple occurrences of DivisionId and ItemId in my result. I am missing the way to make the result as distinct as the FT subquery.
SELECT FT.DivisionId, FT.ItemId, T2.BuyerCode, T2.BuyerName
FROM (
    SELECT DISTINCT T1.DivisionId, T1.ItemId, MAX(T1.SUMPOUNDS) AS MaxPounds
    FROM (
        SELECT DISTINCT TCO.DivisionId, TCO.ItemId, TCO.BuyerCode, TCO.BuyerName,
               SUM(TCO.ReceivedPounds) AS SUMPOUNDS
        FROM [GTDev].[dbo].[TCO_FinalData_ABC] TCO
        WHERE PO_Issue_Date > (GETDATE() - 90) AND ReceivedPounds > 0
        GROUP BY TCO.DivisionId, TCO.ItemId, TCO.BuyerCode, TCO.BuyerName
    ) AS T1
    GROUP BY T1.DivisionId, T1.ItemId
) AS FT
LEFT JOIN (
    SELECT DISTINCT T2.DivisionId, T2.ItemId, T2.BuyerCode, T2.BuyerName,
           SUM(T2.ReceivedPounds) AS SUMPOUNDS
    FROM [GTDev].[dbo].[TCO_FinalData_ABC] T2
    WHERE PO_Issue_Date > (GETDATE() - 90) AND ReceivedPounds > 0
    GROUP BY T2.DivisionId, T2.ItemId, T2.BuyerCode, T2.BuyerName
) AS T2
    ON FT.DivisionId = T2.DivisionId AND FT.ItemId = T2.ItemId AND FT.MaxPounds = T2.SUMPOUNDS
GROUP BY FT.DivisionId, FT.ItemId, T2.BuyerCode, T2.BuyerName
TL;DR - Need to SUM pounds for every unique DivId, ItemId, Buyer and then select the MAX pounds of every DivId and ItemId and return the Buyer with the most pounds for each DivId ItemId combo.
Thank you in advance! Feel free to tell me what is wrong and not rewrite the code unless you want to. Not looking for someone to do my work, just advice to get over this issue!

I would create an id with the DivId and ItemId, like:
select DivId||'_'||ItemId as indicator, {other columns} from {your tables and conditions}
Then I would run a query on top of it, like:
select sum(pounds), indicator, buyer from ({the previous query})
group by indicator, buyer; then select the max and rank it (this is the trick that you are missing)
Look at this whole sequence, I think this is what you need (I broke it down into several pieces; you can simplify it or leave it like this):
select * from (
select distinct indicator, max(ReceivedPoundsSum), BuyerCode,RANK() OVER ( PARTITION BY indicator ORDER BY max(ReceivedPoundsSum) DESC ) AS "Rank" from (
select sum(ReceivedPounds) ReceivedPoundsSum, indicator, BuyerCode,BuyerName from (
with data as(
select 1 DivisionId, 1 ItemId,1 BuyerCode,'user1' BuyerName,55 ReceivedPounds,sysdate POIssueDate from dual union
select 1 DivisionId, 1 ItemId,3 BuyerCode,'user3' BuyerName,15 ReceivedPounds,sysdate POIssueDate from dual union
select 2 DivisionId, 2 ItemId,1 BuyerCode,'user1' BuyerName,25 ReceivedPounds,sysdate POIssueDate from dual union
select 2 DivisionId, 2 ItemId,2 BuyerCode,'user2' BuyerName,35 ReceivedPounds,sysdate POIssueDate from dual union
select 2 DivisionId, 2 ItemId,3 BuyerCode,'user3' BuyerName,45 ReceivedPounds,sysdate POIssueDate from dual union
select 3 DivisionId, 3 ItemId,5 BuyerCode,'user5' BuyerName,55 ReceivedPounds,sysdate POIssueDate from dual union
select 1 DivisionId, 3 ItemId,3 BuyerCode,'user3' BuyerName,5 ReceivedPounds,sysdate POIssueDate from dual union
select 4 DivisionId, 4 ItemId,4 BuyerCode,'user4' BuyerName,1 ReceivedPounds,sysdate POIssueDate from dual union
select 5 DivisionId, 4 ItemId,4 BuyerCode,'user4' BuyerName,12 ReceivedPounds,sysdate POIssueDate from dual union
select 6 DivisionId, 5 ItemId,4 BuyerCode,'user4' BuyerName,13 ReceivedPounds,sysdate POIssueDate from dual union
select 6 DivisionId, 5 ItemId,1 BuyerCode,'user1' BuyerName,14 ReceivedPounds,sysdate POIssueDate from dual union
select 6 DivisionId, 6 ItemId,2 BuyerCode,'user2' BuyerName,16 ReceivedPounds,sysdate POIssueDate from dual union
select 5 DivisionId, 10 ItemId,1 BuyerCode,'user1' BuyerName,157 ReceivedPounds,sysdate POIssueDate from dual)
select DivisionId||'_'||ItemId as indicator,a.* from data a
) group by indicator, BuyerCode,BuyerName
) group by indicator, BuyerCode
) where "Rank"=1
Hope this helps!
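The same greatest-per-group pattern can be sketched with a window function. Below is a minimal, self-contained illustration in Python with SQLite (3.25+ for window functions); the table name tco and its rows are invented stand-ins for the real [GTDev].[dbo].[TCO_FinalData_ABC] data:

```python
import sqlite3

# Hypothetical table and sample rows; names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tco (DivisionId INT, ItemId INT, BuyerCode INT,
                  BuyerName TEXT, ReceivedPounds INT);
INSERT INTO tco VALUES
 (1, 1, 1, 'user1', 55),
 (1, 1, 3, 'user3', 15),
 (2, 2, 1, 'user1', 25),
 (2, 2, 3, 'user3', 45);
""")

# Sum pounds per (division, item, buyer), then keep the top buyer per
# (division, item) with RANK() -- the step the answer calls "the trick".
rows = conn.execute("""
SELECT indicator, BuyerName, pounds FROM (
  SELECT DivisionId || '_' || ItemId AS indicator, BuyerName, pounds,
         RANK() OVER (PARTITION BY DivisionId, ItemId
                      ORDER BY pounds DESC) AS rnk
  FROM (SELECT DivisionId, ItemId, BuyerCode, BuyerName,
               SUM(ReceivedPounds) AS pounds
        FROM tco
        GROUP BY DivisionId, ItemId, BuyerCode, BuyerName)
) WHERE rnk = 1
""").fetchall()
print(rows)  # one winning buyer per DivisionId_ItemId
```

SQL Server has the same RANK() OVER (PARTITION BY …) construct, so the shape of the query carries over directly.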

Related

Optimizing a Select with Subquery that is loading VERY slow

I have a SELECT that is a little bit tricky, as I try to display data that has to be calculated on the fly.
The data is logged from a SmartHome system and displayed in the visualization solution Grafana.
So I have to handle all of this in MySQL and can't really edit the data or the frontend to do some of this work.
The diagram should show the average temperature per day for a time range that can be selected in the UI.
The data in MySQL is a table like that:
DEVICE | READING | VALUE | TIMESTAMP
-----------------------------------------------------------------------------
Thermometer | temperature | 20.0 | 2017.10.12 00:12:59
Thermometer | temperature | 20.2 | 2017.10.12 00:24:12
...
The request first creates a virtual table (not stored in the database) with timestamps for every full hour for about 10 years.
This runs very quickly and doesn't seem to be the reason for my slow fetches.
After that I strip the virtual table down to values within the visible time range of my diagram.
For each of these full-hour timestamps I have to run a sub-select to get the last temperature value that was logged before the full hour.
These values are then grouped by day and the average is calculated.
That way I get the average over 24 values, one for each full hour from 00:00 to 23:00.
Based on various weather sites, this is how the official average temperature is normally calculated.
Here is the Select Statement:
SELECT
filtered.hour as time,
AVG((SELECT VALUE
FROM history
WHERE READING="temperature" AND DEVICE="Thermometer" AND TIMESTAMP <= filtered.hour
ORDER BY TIMESTAMP DESC
LIMIT 1
)) as value
FROM (
SELECT calculated.hour as hour FROM (
SELECT DATE_ADD(DATE_SUB(DATE($__timeTo()), INTERVAL 10 YEAR), INTERVAL t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i HOUR) as hour
FROM (SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
) calculated
WHERE calculated.hour >= $__timeFrom() AND calculated.hour <= $__timeTo()
) filtered
GROUP BY DATE(filtered.hour)
For a timespan of a week it already takes about 5-10 seconds for the diagram to show up. For a month you're close to half a minute.
All my other (simple fetches without calculations) diagrams are loading in about or less than a second.
As I'm a completely MySQL noob and just started to build some SELECTs for my smart home, I don't really know how this can be improved.
Any ideas from the pros? :)
Unless I'm overlooking something really obvious, and if it doesn't really matter how many readings the average per day is calculated from, you could really simplify your query and get rid of the subqueries. This should also give you a boost in speed.
SELECT DATE(`TIMESTAMP`) AS `date`, AVG(`VALUE`) AS `value`
FROM `history`
WHERE `READING`='temperature' AND `DEVICE`='Thermometer'
AND DATE(`TIMESTAMP`) BETWEEN 'date1' AND 'date2'
GROUP BY DATE(`TIMESTAMP`)
Just replace date1 & date2 with the values you want, for example 2017-10-15.
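A runnable sketch of this simplified approach, using Python's sqlite3 with invented sample rows (SQLite's DATE() plays the role of MySQL's DATE() here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (DEVICE TEXT, READING TEXT, VALUE REAL, TIMESTAMP TEXT)")
# Invented readings: two on the 12th, one on the 13th.
conn.executemany("INSERT INTO history VALUES (?,?,?,?)", [
    ("Thermometer", "temperature", 20.0, "2017-10-12 00:12:59"),
    ("Thermometer", "temperature", 22.0, "2017-10-12 08:24:12"),
    ("Thermometer", "temperature", 18.0, "2017-10-13 09:00:00"),
])

# One pass over the raw rows: average of all logged values per day,
# instead of a correlated sub-select per generated hour.
rows = conn.execute("""
SELECT DATE(TIMESTAMP) AS date, AVG(VALUE) AS value
FROM history
WHERE READING = 'temperature' AND DEVICE = 'Thermometer'
  AND DATE(TIMESTAMP) BETWEEN '2017-10-12' AND '2017-10-13'
GROUP BY DATE(TIMESTAMP)
ORDER BY date
""").fetchall()
print(rows)  # [('2017-10-12', 21.0), ('2017-10-13', 18.0)]
```

Note the trade-off the answer mentions: this averages over however many readings were logged per day, not over 24 evenly spaced hourly samples.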

How to include empty rows in Access query results

I have an Access 2010 database with client information. I need to create a table of the number of clients in each age. The agency I am reporting to wants a report with the number of clients of every age from 0 - 100 years listed. The SQL query below will create the required report, but does not include ages with zero clients.
SELECT AgeNum & " years" AS [Age], Count(*) AS [Count]
FROM (SELECT Int(DateDiff("d", Clients.dob, now())/365.25) AS AgeNum
FROM Clients) AS [%$###_Alias]
GROUP BY [%$###_Alias].AgeNum;
How can I have the query return the empty rows with 0 in the Count column?
I looked around and found this:
How can I create a row for values that don't exist and fill the count with 0 values?
They create a table of values to lookup the empty groups. It is very similar to what I need except it uses a Coalesce function which is not supported in Access 2010.
The system only knows an age exists if a record with that age exists. If you want a list of ages 0..100, you need to provide the system with the full list of ages you are looking for. Given such a list, the system can return 0/NULL whenever a requested age is not found within your searched records.
As others mentioned, you can keep a table with the rows 0..100 and compare against it in your SQL, or you can generate the list of numbers with SQL.
Some DBMSs provide a default table called dual, which has one column and one row; you can use that table for any query that does not have a FROM table.
In your Access application, create a table called "dual" and insert one row.
Now execute this query:
SELECT TMain.counter
FROM (SELECT (T2.mAge*t3.mFactor10)+t1.mAge AS counter
FROM (select 1 as mAge from dual
union all select 2 from dual
union all select 3 from dual
union all select 4 from dual
union all select 5 from dual
union all select 6 from dual
union all select 7 from dual
union all select 8 from dual
union all select 9 from dual
union all select 10 from dual) AS T1,
(select 0 as mAge from dual
union all select 1 from dual
union all select 2 from dual
union all select 3 from dual
union all select 4 from dual
union all select 5 from dual
union all select 6 from dual
union all select 7 from dual
union all select 8 from dual
union all select 9 from dual
union all select 10 from dual) AS T2,
(select 10 as mFactor10 from dual) AS T3 ) AS TMain
WHERE (((TMain.counter) Between 1 And 100));
This will produce 100 rows, 1..100.
You can then use this result as the outer table for your SQL and find/count anyone whose age is on the list.
The logic would be:
select all ages
from the requested age list
find and count/return all matched records, or return 0 if no records are found.
In SQL, it would be something like this:
SELECT TMain.counter as Age,
(SELECT Count(*) AS [Count]
FROM (SELECT Int(DateDiff("d", Clients.dob, now())/365.25) AS AgeNum
FROM Clients) AS [%$###_Alias]
WHERE (TMain.counter = [%$###_Alias].ageNum)
GROUP BY [%$###_Alias].AgeNum) as number_of_clients
FROM (SELECT (T2.mAge*t3.mFactor10)+t1.mAge AS counter
FROM (select 1 as mAge from dual
union all select 2 from dual
union all select 3 from dual
union all select 4 from dual
union all select 5 from dual
union all select 6 from dual
union all select 7 from dual
union all select 8 from dual
union all select 9 from dual
union all select 10 from dual) AS T1,
(select 0 as mAge from dual
union all select 1 from dual
union all select 2 from dual
union all select 3 from dual
union all select 4 from dual
union all select 5 from dual
union all select 6 from dual
union all select 7 from dual
union all select 8 from dual
union all select 9 from dual
union all select 10 from dual) AS T2,
(select 10 as mFactor10 from dual) AS T3 ) AS TMain
WHERE (((TMain.counter) Between 1 And 100));
This will produce ages 1..100 along with the number of clients for each age, and NULL where there are no results.
Of course, you can dynamically extend or shorten the age list.
Instead of COALESCE you can use the Nz function:
Nz([Age],0)
And yes, your link should work for you.
Create a 'sequence' table of all possible integers (FYI, in UK medical data dictionaries we use 220 as the maximum age in years), then 'anti-join' to this table. You could use a view for your original results.
The following SQL DDL requires ANSI-92 Query Mode (probably a better fit for a SQL Server coder than the default Query Mode), but the table can also be created manually using the Access GUI tools:
CREATE TABLE Seq ( seq INT NOT NULL UNIQUE );
INSERT INTO Seq VALUES ( 1 );
INSERT INTO Seq VALUES ( 2 );
INSERT INTO Seq VALUES ( 3 );
...
(you can use Excel to create this script!)
...
INSERT INTO Seq VALUES ( 100 );
CREATE VIEW ClientAgeTallies ( AgeInYears, Tally )
AS
SELECT dt.AgeInYears, COUNT(*) AS Tally
FROM ( SELECT INT(DATEDIFF( 'd', c.dob, NOW() ) / 365.25) AS AgeInYears
FROM Clients AS c ) AS dt
GROUP
BY dt.AgeInYears;
SELECT AgeInYears, Tally
FROM ClientAgeTallies
UNION
SELECT seq AS AgeInYears, 0 AS Tally
FROM Seq
WHERE seq NOT IN ( SELECT AgeInYears FROM ClientAgeTallies );
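A minimal sketch of the sequence-table idea, here as a LEFT JOIN variant, in Python with SQLite; the table client_ages is an invented stand-in for the computed AgeNum values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sequence table of all possible ages (0..100, per the question).
conn.execute("CREATE TABLE seq (seq INT NOT NULL UNIQUE)")
conn.executemany("INSERT INTO seq VALUES (?)", [(i,) for i in range(0, 101)])
# Stand-in for the Int(DateDiff(...)/365.25) age calculation over Clients.
conn.execute("CREATE TABLE client_ages (age INT)")
conn.executemany("INSERT INTO client_ages VALUES (?)", [(25,), (25,), (40,)])

# Driving the query from the sequence table makes ages with no
# clients surface as a count of 0 instead of disappearing.
rows = conn.execute("""
SELECT s.seq AS age, COUNT(c.age) AS tally
FROM seq s LEFT JOIN client_ages c ON c.age = s.seq
GROUP BY s.seq ORDER BY s.seq
""").fetchall()
print(rows[24:26])  # [(24, 0), (25, 2)]
```

The UNION/anti-join form in the answer above produces the same result set; the LEFT JOIN form is just more compact where the engine supports it well.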

MySQL Query Optimization for running total query - How can I reduce query execution time?

Problem
I have a query that I pasted below. The problem I face is how I can trim the latency below its current value of about 10 seconds.
SET @csum = 0;
SELECT Date_format(assigneddate, '%b %d %Y') AS assigneddate, (@csum := @csum + numactionitems) AS totalactionitems
FROM(
SELECT assigneddate,
Sum(numactionitems) AS numactionitems FROM(
SELECT assigneddate,
Count( * ) AS numactionitems FROM(
SELECT *
FROM(
SELECT actionitemtitle,
actionitemstatement,
altownerid,
approvalstatement,
assigneddate,
assignorid,
closeddate,
closurecriteria,
closurestatement,
criticality,
duedate,
ecd,
notes,
ownerid,
Concat(lastname, ', ', firstname) AS owner,
cnames2.categoryvalue AS `team`,
cnames2.categorynameid AS `teamid`,
cnames3.categoryvalue AS `department`,
cnames3.categorynameid AS `departmentid`,
cnames4.categoryvalue AS `source`,
cnames4.categorynameid AS `sourceid`,
cnames5.categoryvalue AS `project_phase`,
cnames5.categorynameid AS `project_phaseid`,
ac1.actionitemid FROM actionitemcategories AS ac1 INNER JOIN actionitems AS a INNER JOIN users AS u INNER JOIN(
SELECT actionitemid AS a2id,
categorynameid AS c2 FROM actionitemcategories WHERE categoryid = 195) AS ac2 INNER JOIN categorynames AS cnames2 ON cnames2.categorynameid = ac2.c2 AND ac1.categoryid = 195 AND a.actionitemid = ac2.a2id AND ac1.actionitemid = a.actionitemid AND a.ownerid = u.userid INNER JOIN(
SELECT actionitemid AS a3id,
categorynameid AS c3 FROM actionitemcategories WHERE categoryid = 200) AS ac3 INNER JOIN categorynames AS cnames3 ON cnames3.categorynameid = ac3.c3 AND ac2.a2id = ac3.a3id INNER JOIN(
SELECT actionitemid AS a4id,
categorynameid AS c4 FROM actionitemcategories WHERE categoryid = 202) AS ac4 INNER JOIN categorynames AS cnames4 ON cnames4.categorynameid = ac4.c4 AND ac3.a3id = ac4.a4id INNER JOIN(
SELECT actionitemid AS a5id,
categorynameid AS c5 FROM actionitemcategories WHERE categoryid = 203) AS ac5 INNER JOIN categorynames AS cnames5 ON cnames5.categorynameid = ac5.c5 AND ac4.a4id = ac5.a5id) s WHERE 1 = 1) f GROUP BY assigneddate UNION ALL(
SELECT a.date AS assigneddate,
0 AS numactionitems FROM(
SELECT '2015-03-05' + INTERVAL(a.a + (10 * b.a) + (100 * c.a)) day AS date FROM(
SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS a CROSS JOIN(
SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS b CROSS JOIN(
SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS c) a) ORDER BY assigneddate ASC) t GROUP BY assigneddate LIMIT 282) t
WHERE assigneddate != '0000-00-00'
Purpose of Query
The purpose of this query is to get all records that contain date values and to collect the running count of all records that fall on each date. The date values are computed within the sqlfiddle below. The final result is displayed as a line graph of the running total; since it only counts upwards, it is a growing graph.
The graph I am displaying it in is called a build-up graph of all my records (action items with date values).
Description of Issue
My problem is that I am getting the results of the query in at least 10 seconds.
Question
How can I accelerate and reduce the latency of the query so that I will not stall the loading of my graph?
Complete Schema and portion of my above query that Runs Successfully
(I am having difficulty getting the main query to run at all on sqlfiddle, though I can run it from my own machine).
http://sqlfiddle.com/#!9/865ee/11
Any help or suggestions would be tremendously appreciated!
EDIT
ADDED Sample Screenshot of my Categories Interface
Category (first table) has a field called categoryname, which takes one of 4 values (the list can be expanded or reduced): Team, Department, Source, Project_Phase.
CategoryName (second table) has a field called categoryvalue, which holds the actual allowed values for each category in the first table.
Example: Team 1, Team 2, and Team 3 are categoryvalues within categoryname, corresponding to the category Team.
Start by making that table of dates a permanent table, not a subquery.
This construct performs very poorly, and can usually be turned into JOINs without subqueries:
JOIN ( SELECT ... )
JOIN ( SELECT ... )
This is because there is no index on the subqueries, so full scans are needed.
Provide EXPLAIN for the entire query.
Addenda
A PRIMARY KEY is a key; don't add another key with the same column(s).
An EAV schema leads to the complexity and sluggishness you are encountering.
Don't use TINYTEXT; it slows down tmp tables in complex queries; use VARCHAR instead. And don't reflexively use VARCHAR(255); pick a realistic limit.
Why do you need both categories and categorynames?
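To illustrate both suggestions at once (a permanent date table instead of a UNION-built one, and a running total without the @csum session variable), here is a sketch in Python with SQLite; the table names and rows are invented, and the window SUM shown is also available in MySQL 8.0+:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical slimmed-down action items table.
conn.execute("CREATE TABLE action_items (assigned_date TEXT)")
conn.executemany("INSERT INTO action_items VALUES (?)",
                 [("2015-03-05",), ("2015-03-05",), ("2015-03-07",)])
# Permanent calendar table, as the answer suggests, instead of a
# cross-joined UNION generator rebuilt on every query.
conn.execute("CREATE TABLE calendar (d TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO calendar VALUES (?)",
                 [("2015-03-05",), ("2015-03-06",), ("2015-03-07",)])

# Daily counts via LEFT JOIN, then a window SUM for the running total.
rows = conn.execute("""
SELECT d, SUM(cnt) OVER (ORDER BY d) AS total
FROM (SELECT c.d, COUNT(a.assigned_date) AS cnt
      FROM calendar c LEFT JOIN action_items a ON a.assigned_date = c.d
      GROUP BY c.d)
""").fetchall()
print(rows)  # [('2015-03-05', 2), ('2015-03-06', 2), ('2015-03-07', 3)]
```

With the calendar indexed by its primary key, the join is a straightforward lookup rather than a full scan of a derived table.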

how to split a string into 4-character substrings, then concat them with a hyphen?

I need to write a mysql query that does the following:
I have a list of serial numbers in a database table. The query should only update serial number rows where the length of the serial number is more than 15; each such serial number is divided into 4-digit substrings separated by a hyphen. Example:
'10028641232912356' should become '1002-8641-2329-1235-6'
I started with this, which only inserts a hyphen after the first 4:
SELECT serial_no, CHAR_LENGTH(TRIM(serial_no)) as 'length',
CONCAT_WS('-',SUBSTRING(TRIM(serial_no),1,4),
SUBSTRING(TRIM(serial_no),5,4)) as result
FROM pos where
serial_no is not null and CHAR_LENGTH(TRIM(serial_no))>=15 ;
This is only the select statement; at first I just want to get (as in select) the new format of the serial number, then I'll update it. But since I don't know the exact length of each serial number, I need to figure out this part:
CONCAT_WS('-',SUBSTRING(TRIM(serial_no),1,4)
This must only be done using MySQL functions.
Any help is appreciated
Introduction
Since we've figured out that your strings can have any length, your issue boils down to how to split strings of any length into chunks of some fixed length in MySQL. Unfortunately there is a serious limitation in MySQL that prevents us from doing this in a "native" way: MySQL does not support sequences. That means you cannot use some internal iterator-like construct to loop over your string.
But there is always a way
Building sequence
We can use a trick to do this. The first part of the trick: use CROSS JOIN to produce the desired row sets. If you are not aware how it works, I'll remind you: it produces the Cartesian product of two row sets. The second part of the trick: use the well-known positional-notation formula
N = d1×10^0 + d2×10^1 + ...
Actually you can do that with any base, not just 10. For this demonstration I will use 10. Usage:
SELECT
n1.i+10*n2.i AS num
FROM
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1
CROSS JOIN
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2
So we're using UNION ALL over the digits 0..9 to produce each multiplication part (our di). With the query above you'll get the consecutive numbers 0..99. This is because we're only using powers of 10 up to 10^1, and 10^2 = 100. You can check the fiddle to make sure that it will work properly.
Iterating through string
Now with this "pseudo-generator" we can emulate iterating through the string. To do that, we'll use MySQL user variables as our iterator. Separating out a piece of the string is, of course, a job for SUBSTR(). So the basic skeleton is:
SELECT
SUBSTR(@str, @i:=@i+@len, @len) as chunk
FROM
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1
CROSS JOIN
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2
CROSS JOIN
(SELECT @str:='ABCD1234EFGH5678IJKL90', @len:=4, @i:=1-@len) as init
LIMIT 6;
(the fiddle for the sample above is here). We are just iterating through the sequence and using the iterator to create the correct offset. All that is left to do is gather the chunks back together and insert the hyphens. Luckily, MySQL has GROUP_CONCAT() for that:
SELECT
GROUP_CONCAT(chunk SEPARATOR '-') AS string
FROM
(SELECT
SUBSTR(@str, @i:=@i+@len, @len) as chunk
FROM
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1
CROSS JOIN
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2
CROSS JOIN
(SELECT @str:='ABCD1234EFGH5678IJKL90', @len:=4, @i:=1-@len) as init
) AS data
WHERE chunk!='';
(again, fiddle is here).
With whole table
That was just a sample; you want to select this from a table, which is more complicated:
SELECT
serial_no,
GROUP_CONCAT(chunk SEPARATOR '-') AS serial
FROM
(SELECT
SUBSTR(
IF(@str=serial_no, @str, serial_no),
@i:=IF(@str=serial_no, @i+@len, 1),
@len
) AS chunk,
@str:=serial_no AS serial_no
FROM
(SELECT
serial_no
FROM
pos
CROSS JOIN
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n1
CROSS JOIN
(SELECT 0 as i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as n2
ORDER BY
serial_no) AS data
CROSS JOIN
(SELECT @len:=4, @i:=1-@len) AS init) AS seq
WHERE
chunk!=''
GROUP BY
serial_no;
The fiddle is available here. There is another trick here to produce the row sets: we now CROSS JOIN against the table instead of against an initialized string, so we have to pass the current pos value of serial_no along. To make sure all the rows are iterated properly we have to order them (that is done with the inner ORDER BY).
Limitations
Well, as you already know, the sequence is limited to 99; thus, with our @len defined as 4 we'll only be able to split strings up to 400 characters long. Also, this will use the whole sequence in every case, even if your string is much shorter (chances are it is), so there may be a performance impact.
Is it worth the effort?
My point is: mostly, no. It may be OK to use it once, maybe. But it won't be re-usable and it won't be readable. There is little sense in doing such string operations with DBMS functions, because we have applications for such things; that is where you can actually write re-usable/scalable/whatever code.
Another way would be to create a stored procedure that does the desired thing (splits the string by a given length and concatenates it with a given delimiter). But honestly, that's just an attempt to hide the problem: even if it were re-usable, it would still have the same weakness, the performance impact, and even more so if we're writing code for the DBMS. Then again, why not place that code in the application? In 99% of cases the DBMS is the place for data storage and the application is the place for code (i.e. logic). Mixing these two things almost always ends in a bad result.
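As a concrete illustration of the "do it in the application" point, a few lines of Python cover the whole requirement; the function name is invented, and the chunk size and >15 threshold follow the question:

```python
def hyphenate(serial: str, size: int = 4, min_len: int = 15) -> str:
    """Split serial into fixed-size chunks joined by '-' when it is long enough."""
    s = serial.strip()
    if len(s) <= min_len:
        return s  # too short: leave unchanged, per the question's rule
    return '-'.join(s[i:i + size] for i in range(0, len(s), size))

print(hyphenate('10028641232912356'))  # 1002-8641-2329-1235-6
```

Unlike the SQL version, this has no length cap from a generated sequence and is trivially unit-testable.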
Making the assumption that the serial number is unique, the following should do it:
SELECT serial_no, GROUP_CONCAT(SUBSTRING(serial_no, anInt, 4) ORDER BY anInt SEPARATOR '-')
FROM
(
SELECT serial_no, anInt
FROM pos
CROSS JOIN
(
SELECT (4 * (units.i + 10 * tens.i + 100 * hundreds.i)) + 1 AS anInt
FROM (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
CROSS JOIN (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) hundreds
) sub1
WHERE LENGTH(serial_no) >= anInt
AND LENGTH(serial_no) > 15
) sub2
GROUP BY serial_no
This will cope with a serial number up to ~4000 characters long.
This works by using a series of unioned fixed queries to get the numbers 0 to 9. This is cross joined against itself 3 times, with the first copy being units, the next tens, and the last hundreds (you can carry on adding more if you want). From these, the numbers between 0 and 999 are generated, then multiplied by 4 with 1 added (giving 1 to 3997 in steps of 4), which is the starting position of each group of 4. The WHERE clause checks that this generated number is not beyond the length of serial_no (otherwise you land up with duplicates), and that serial_no is longer than 15.
This generates a list of all the serial numbers, each one repeated as many times as there are groups of 4 characters (or partial groups), along with the start position of each group.
The outer SELECT then takes this list, uses SUBSTRING to extract each group, and uses GROUP_CONCAT to join the results together again with '-' as the separator between each group. It also specifies the start position of each group as the order in which to join them (it would probably be fine without this, but I wouldn't guarantee it).
SQL fiddle here:-
http://www.sqlfiddle.com/#!2/eb2d0/2

Changing a query with a numbered result set (with gaps) to return a result with no gaps, containing every number

I have a select statement: select a, b, [...]; which returns the results:
a|b
---------
1|8688798
2|355744
4|457437
7|27834
I want it to return:
a|b
---------
1|8688798
2|355744
3|0
4|457437
5|0
6|0
7|27834
An example query that does not do what I would like, since it does not have the gap numbers:
select
sub.num_of_ratings,
count(sub.rater)
from
(
select
r.rater_id as rater,
count(r.id) as num_of_ratings
from ratings r
group by rater
) as sub
group by num_of_ratings;
Explanation of the query:
If a user rates another user, the rating is listed in the table ratings, and the id of the rating user is kept in the field rater_id. Effectively I find all users who are referred to in ratings and count how many ratings records exist for each user (rater / num_of_ratings), and then I use this result to find how many users have rated a given number of times.
At the end I know how many users rated once, how many rated twice, etc. My problem is that the numbers for count(sub.rater) start fine: 1, 2, 3, 4, 5... However, for bigger numbers there are gaps, because there might be one user who rated 1028 times but no user who rated 1027 times.
I don't want to apply stored procedures looping over the result or something like that. Is it possible to fill those gaps in the result without using stored procedures, looping, or creating temporary tables?
If you have a sequence of numbers, then you can do a JOIN with that table and fill in the gaps properly.
You can check out this question on how to generate the sequence:
generate an integer sequence in MySQL
Here is one of the posted answers, which can easily be used, with the limitation that it generates numbers from 1 to 10,000:
SELECT @row := @row + 1 as row FROM
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t2,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t3,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t4,
(SELECT @row:=0) t5
Using a sequence of numbers, you can join your result set. For instance, assuming your number list is in a table called numbersList, with column number:
Select number, Count
from
numbersList left outer join
(select
sub.num_of_ratings,
count(sub.rater) as Count
from
(
select
r.rater_id as rater,
count(r.id) as num_of_ratings
from ratings r
group by rater
) as sub
group by num_of_ratings) as num
on num.num_of_ratings=numbersList.number
where numbersList.number<max(num.num_of_ratings)
Your numbers list must be larger than your largest value, obviously, and the restriction will allow it to not have all numbers up to the maximum. (If MySQL does not allow that type of where clause, you can either leave the where clause out to list all numbers up to the maximum, or modify the query in various ways to achieve the same result.)
@mazzucci: the query is too magical, and you don't actually explain it.
@David: I cannot create a table for that purpose (as stated in the question).
Basically what I need is a select that returns a gap-less list of numbers. Then I can left join on that result set and treat NULL as 0.
What I need is an arbitrary table that has more records than the length of the final list. I use the table users for that in the following example:
select @row := @row + 1 as index
from (select @row := -1) r, users u
limit 101;
This query returns the numbers from 0 to 100. Using it as a subquery in a left join finally fills the gaps.
users is just a dummy table to keep the relational engine going, producing the numbers incrementally.
select t1.index as a, ifnull(t2.b, 0) as b
from (
select @row := @row + 1 as index
from (select @row := 0) r, users u
limit 7
) as t1
left join (
select a, b [...]
) as t2
on t1.index = t2.a;
I didn't try this exact query live, so bear with me if there is a little flaw, but technically it works; you get my point.
EDIT:
I just used this concept to get a gapless list of dates to left join measures onto:
select @date := date_add(@date, interval 1 day) as date
from (select @date := '2010-10-14') d, users u
limit 700
This starts from 2010-10-15 and iterates over 699 more days.
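A runnable sketch of the fill-the-gaps join, in Python with SQLite; a recursive CTE stands in for the @row variable trick as the gap-less number source (MySQL 8.0+ supports the same CTE syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Counts with gaps, as in the question (no rows for a = 3, 5, 6).
conn.execute("CREATE TABLE counts (a INT, b INT)")
conn.executemany("INSERT INTO counts VALUES (?,?)",
                 [(1, 8688798), (2, 355744), (4, 457437), (7, 27834)])

# Generate 1..7 without gaps, left join the gappy counts onto it,
# and turn the NULLs for missing rows into 0.
rows = conn.execute("""
WITH RECURSIVE seq(n) AS (SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 7)
SELECT seq.n, IFNULL(counts.b, 0)
FROM seq LEFT JOIN counts ON counts.a = seq.n
ORDER BY seq.n
""").fetchall()
print(rows)
```

The upper bound 7 is hard-coded here for the sample data; in practice you would derive it from MAX(a) or pick a safely large limit, as the answers above discuss.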