I have a database of phone call data from our phone system that I am trying to create a report on. These phone calls match up to a table of internal and external numbers. The report needs to try to match the phone call to an external number in our database first and if there is no match try to match it to an internal number.
I have created a sample data set and db-fiddle, and removed some data to hopefully explain it better:
CREATE TABLE `cdr` (
`callnumber` int(11) NOT NULL,
`origLegCallIdentifier` int(11) NOT NULL,
`dateTimeOrigination` datetime NOT NULL,
`callType` varchar(50) NOT NULL,
`chargeable` varchar(10) NOT NULL,
`callCharge` decimal(10,2) NOT NULL,
`origNodeId` int(11) NOT NULL,
`destLegIdentifier` int(11) NOT NULL,
`destNodeId` int(11) NOT NULL,
`callingPartyNumber` varchar(50) NOT NULL,
`callingPartyNumberPartition` varchar(50) NOT NULL,
`callingPartyNumberState` varchar(10) NOT NULL,
`callingPartyNumberSite` varchar(30) NOT NULL,
`originalCalledPartyNumber` varchar(50) NOT NULL,
`originalCalledPartyNumberPartition` varchar(50) NOT NULL,
`finalCalledPartyNumber` varchar(50) NOT NULL,
`finalCalledPartyNumberPartition` varchar(50) NOT NULL,
`lastRedirectDn` varchar(50) NOT NULL,
`lastRedirectDnPartition` varchar(50) NOT NULL,
`dateTimeConnect` datetime DEFAULT NULL,
`dateTimeDisconnect` datetime NOT NULL,
`duration` int(11) NOT NULL,
`origDeviceName` varchar(129) NOT NULL,
`destDeviceName` varchar(129) NOT NULL,
`origIpv4v6Addr` varchar(64) NOT NULL,
`destIpv4v6Addr` varchar(64) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `cdr` (`callnumber`, `origLegCallIdentifier`, `dateTimeOrigination`, `callType`, `chargeable`, `callCharge`, `origNodeId`, `destLegIdentifier`, `destNodeId`, `callingPartyNumber`, `callingPartyNumberPartition`, `callingPartyNumberState`, `callingPartyNumberSite`, `originalCalledPartyNumber`, `originalCalledPartyNumberPartition`, `finalCalledPartyNumber`, `finalCalledPartyNumberPartition`, `lastRedirectDn`, `lastRedirectDnPartition`, `dateTimeConnect`, `dateTimeDisconnect`, `duration`, `origDeviceName`, `destDeviceName`, `origIpv4v6Addr`, `destIpv4v6Addr`) VALUES
(52004, 69637277, '2020-08-31 03:05:05', 'outbound-national', 'yes', '0.00', 4, 69637278, 4, '6220', 'PT_INTERNAL', 'NSW', 'Site A', '0412345678', 'PT_NATIONAL_TIME_RESTRICT', '0412345678', 'PT_NATIONAL_TIME_RESTRICT', '0412345678', 'PT_NATIONAL_TIME_RESTRICT', NULL, '2020-08-31 03:05:08', 0, 'SEP00XXXXX', 'XXXXX', '1.1.1.1', '1.1.1.1');
CREATE TABLE `numbers` (
`numberid` int(11) NOT NULL,
`number` varchar(30) NOT NULL,
`memberid` int(11) NOT NULL,
`type` enum('internal','external') NOT NULL,
`description` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `numbers` (`numberid`, `number`, `memberid`, `type`, `description`) VALUES
(1555, '0412345678', 436, 'internal', ''),
(1556, '6220', 437, 'external', '');
https://www.db-fiddle.com/f/ofH6sENoce8tGVsoxMejwZ/1
The above example shows how it ends up with a duplicate for a single record because it matches 6220 as the callingPartyNumber and 0412345678 as the finalCalledPartyNumber in each respective select.
This is an example of what I want to see (union has been removed):
https://www.db-fiddle.com/f/bVSWESvnKJKvuNefLqH4aU/0
I want a single record for when it either matches a finalCalledPartyNumber first or then a callingPartyNumber. Records that don't match anything will not be shown.
Updated select using Caius's example
SELECT
DATE(CONVERT_TZ(cdr.dateTimeOrigination,'+00:00',@@global.time_zone)) as 'Date',
TIME(CONVERT_TZ(cdr.dateTimeOrigination,'+00:00',@@global.time_zone)) as 'Time',
cdr.callType,
cdr.callingPartyNumberState,
cdr.callingPartyNumber,
COALESCE(finalcalledparty.memberid, callingparty.memberid, originalcalledparty.memberid, 'No Match') as MemberID,
cdr.originalCalledPartyNumber,
cdr.finalCalledPartyNumber,
CONCAT(MOD(HOUR(SEC_TO_TIME(cdr.duration)), 24), ':', LPAD(MINUTE(SEC_TO_TIME(cdr.duration)),2,0), ':', LPAD(second(SEC_TO_TIME(cdr.duration)),2,0)) as 'duration',
cdr.callCharge
FROM `cdr`
LEFT JOIN numbers finalcalledparty ON finalcalledparty.number = cdr.finalCalledPartyNumber
LEFT JOIN numbers callingparty ON callingparty.number = cdr.callingPartyNumber
LEFT JOIN numbers originalcalledparty ON originalcalledparty.number = cdr.OriginalCalledPartyNumber
WHERE (cdr.callType LIKE '%outbound%' OR cdr.callType LIKE '%transfer%' OR cdr.callType LIKE '%forward%')
ORDER BY Date DESC, Time DESC
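A minimal sketch of the same priority matching, runnable with Python's built-in sqlite3 module, with SQLite standing in for MySQL. The CONVERT_TZ and duration formatting are left out, only the columns the matching needs are kept, and call 52005 is a hypothetical unmatched call added for contrast. The extra `COALESCE(...) IS NOT NULL` condition is one way to drop calls that match nothing; it is an addition, not part of the query above.

```python
import sqlite3

# SQLite stands in for MySQL; only the columns used for matching are kept.
conn = sqlite3.connect(":memory:")
# Call 52005 is a hypothetical call whose numbers match nothing in `numbers`.
conn.executescript("""
CREATE TABLE cdr (
    callnumber INTEGER,
    callType TEXT,
    callingPartyNumber TEXT,
    originalCalledPartyNumber TEXT,
    finalCalledPartyNumber TEXT
);
CREATE TABLE numbers (
    numberid INTEGER,
    number TEXT,
    memberid INTEGER,
    type TEXT
);
INSERT INTO cdr VALUES
    (52004, 'outbound-national', '6220', '0412345678', '0412345678'),
    (52005, 'outbound-national', '9999', '0400000000', '0400000000');
INSERT INTO numbers VALUES
    (1555, '0412345678', 436, 'internal'),
    (1556, '6220', 437, 'external');
""")

# One LEFT JOIN per candidate column; COALESCE returns the first non-NULL
# memberid in priority order: final called party, calling party, original
# called party. The second COALESCE in WHERE drops calls that matched nothing.
query = """
SELECT cdr.callnumber,
       COALESCE(fc.memberid, cp.memberid, oc.memberid) AS MemberID
FROM cdr
LEFT JOIN numbers fc ON fc.number = cdr.finalCalledPartyNumber
LEFT JOIN numbers cp ON cp.number = cdr.callingPartyNumber
LEFT JOIN numbers oc ON oc.number = cdr.originalCalledPartyNumber
WHERE (cdr.callType LIKE '%outbound%' OR cdr.callType LIKE '%transfer%'
       OR cdr.callType LIKE '%forward%')
  AND COALESCE(fc.memberid, cp.memberid, oc.memberid) IS NOT NULL
ORDER BY cdr.callnumber
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(52004, 436)]: one row, matched on finalCalledPartyNumber
```

This assumes each number appears at most once in `numbers`; if a number can appear more than once, each LEFT JOIN can multiply the rows for a call.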
Select with members table join
SELECT
DATE(CONVERT_TZ(cdr.dateTimeOrigination,'+00:00',@@global.time_zone)) as 'Date',
TIME(CONVERT_TZ(cdr.dateTimeOrigination,'+00:00',@@global.time_zone)) as 'Time',
cdr.callType,
'Calling' as ChargeType,
cdr.callingPartyNumberState,
cdr.callingPartyNumber,
COALESCE(finalcalledmember.name, callingmember.name, 'No Match') as MemberName,
cdr.finalCalledPartyNumber,
CONCAT(MOD(HOUR(SEC_TO_TIME(cdr.duration)), 24), ':', LPAD(MINUTE(SEC_TO_TIME(cdr.duration)),2,0), ':', LPAD(second(SEC_TO_TIME(cdr.duration)),2,0)) as 'duration',
cdr.callCharge
FROM `cdr`
LEFT JOIN numbers callingparty ON callingparty.number = cdr.callingPartyNumber
LEFT JOIN numbers finalcalledparty ON finalcalledparty.number = cdr.finalCalledPartyNumber
LEFT JOIN members callingmember ON callingmember.memberid = callingparty.memberid
LEFT JOIN members finalcalledmember ON finalcalledmember.memberid = finalcalledparty.memberid
WHERE (callType LIKE '%outbound%' OR callType LIKE '%transfer%' OR callType LIKE '%forward%') AND DATE(CONVERT_TZ(cdr.dateTimeOrigination,'+00:00',@@global.time_zone)) = '2020-09-01'
ORDER BY Date DESC, Time DESC
The report needs to try to match the phone call to an external number in our database first and if there is no match try to match it to an internal number.
You can use a pair of left joins for this. Here's a simpler dataset:
Person, Number
John, e1
James, i2
Jenny, x3
ExternalNumber, Message
e1, Hello
InternalNumber, Message
i2, Goodbye
SELECT p.Person, COALESCE(e.Message, i.Message, 'No Match') AS Message
FROM
Person p
LEFT JOIN Externals e ON p.Number = e.ExternalNumber
LEFT JOIN Internals i ON p.Number = i.InternalNumber
Results:
John, Hello
James, Goodbye
Jenny, No Match
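The simplified example can be run as-is with Python's sqlite3 module. The table names Externals and Internals are assumptions, since the dataset above only shows column headers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (Person TEXT, Number TEXT);
CREATE TABLE Externals (ExternalNumber TEXT, Message TEXT);
CREATE TABLE Internals (InternalNumber TEXT, Message TEXT);
INSERT INTO Person VALUES ('John', 'e1'), ('James', 'i2'), ('Jenny', 'x3');
INSERT INTO Externals VALUES ('e1', 'Hello');
INSERT INTO Internals VALUES ('i2', 'Goodbye');
""")

# Both LEFT JOINs run for every person; a failed match leaves NULLs instead of
# dropping the row. COALESCE then prefers external, then internal.
rows = conn.execute("""
SELECT p.Person, COALESCE(e.Message, i.Message, 'No Match') AS Message
FROM Person p
LEFT JOIN Externals e ON p.Number = e.ExternalNumber
LEFT JOIN Internals i ON p.Number = i.InternalNumber
ORDER BY p.rowid
""").fetchall()
print(rows)  # [('John', 'Hello'), ('James', 'Goodbye'), ('Jenny', 'No Match')]
```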
A few things you need to appreciate about SQL in general:
A UNION makes a dataset grow taller (more rows)
A JOIN makes a dataset grow wider (more columns)
It is easy to compare things on the same row, more difficult to compare things on different rows
There isn't exactly a concept of "doing something now" and "doing something later", so "try to match it to external first, and if that doesn't work, try to match it to internal" isn't a good mental model for the problem. The SQL way is to "match it to external and match it to internal, then preferentially pick the external match, then the internal match, then maybe no match".
COALESCE takes a list of arguments and, working left to right, returns the first one that isn't null. Coupled with LEFT JOIN putting nulls when the match fails, it means we can use it to prefer external matches over internal
Because it's easier to compare things on the same row, we simply match the data against the external and internal numbers tables as a direct operation. We use LEFT JOIN so that if a match doesn't work out, at least it doesn't cause the row to disappear.
So you join both numbers tables in, and the matches either work out for external only (you pick external), for internal only (you pick internal), for both (you pick external over internal), or for neither (and you might show a "No Match" message).
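A small sketch of the taller-versus-wider point, using SQLite through Python's sqlite3 module and hypothetical `calls`/`numbers` tables: one call whose caller and callee both match a number. The UNION produces two rows for that call; the two LEFT JOINs produce one row with both matches side by side, and COALESCE picks the preferred one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (id INTEGER, caller TEXT, callee TEXT);
CREATE TABLE numbers (number TEXT, label TEXT);
INSERT INTO calls VALUES (1, '6220', '0412345678');
INSERT INTO numbers VALUES ('6220', 'caller match'), ('0412345678', 'callee match');
""")

# UNION: each SELECT contributes its own row, so the result grows taller.
union_rows = conn.execute("""
SELECT c.id, n.label FROM calls c JOIN numbers n ON n.number = c.callee
UNION
SELECT c.id, n.label FROM calls c JOIN numbers n ON n.number = c.caller
""").fetchall()

# Two LEFT JOINs: both matches sit side by side on one row (the result grows
# wider), and COALESCE prefers the callee match over the caller match.
join_rows = conn.execute("""
SELECT c.id, COALESCE(callee.label, caller.label, 'No Match')
FROM calls c
LEFT JOIN numbers callee ON callee.number = c.callee
LEFT JOIN numbers caller ON caller.number = c.caller
""").fetchall()

print(len(union_rows), sorted(union_rows))  # 2 rows for one call
print(len(join_rows), join_rows)            # 1 row for one call
```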
It should be pointed out that the COALESCE approach only really works well if the data won't naturally contain nulls. If the data looked like this:
Person, Number
John, e1
James, i2
Jenny, x3
ExternalNumber, Message
e1, NULL
InternalNumber, Message
i2, Goodbye
Then this will be the result:
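John, No Match
James, Goodbye
Jenny, No Match
Even though John's number matched the external table, the NULL in that row's Message makes COALESCE fall through to the internal Message (also NULL here, because e1 is not an internal number) and then to 'No Match'; had the number also matched an internal record, the internal message would have been used instead. Either way the join succeeded, yet the external match is ignored, which might not be correct. We can solve this by using CASE WHEN instead, to test for a column that definitely won't be null when a record matches: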
CASE
WHEN e.ExternalNumber IS NOT NULL THEN e.Message
WHEN i.InternalNumber IS NOT NULL THEN i.Message
ELSE 'No Match'
END
Because we test the column that is the key for the join, the only way we can get a NULL there is when the join fails to find a match.
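Both expressions can be compared side by side on the NULL-message data. This is a sketch using SQLite through Python's sqlite3 module; the table names Externals and Internals are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (Person TEXT, Number TEXT);
CREATE TABLE Externals (ExternalNumber TEXT, Message TEXT);
CREATE TABLE Internals (InternalNumber TEXT, Message TEXT);
INSERT INTO Person VALUES ('John', 'e1'), ('James', 'i2'), ('Jenny', 'x3');
INSERT INTO Externals VALUES ('e1', NULL);   -- matched row with a NULL payload
INSERT INTO Internals VALUES ('i2', 'Goodbye');
""")

joins = """
FROM Person p
LEFT JOIN Externals e ON p.Number = e.ExternalNumber
LEFT JOIN Internals i ON p.Number = i.InternalNumber
ORDER BY p.rowid
"""

# COALESCE cannot tell "join failed" from "join matched but the value is NULL".
coalesce_rows = conn.execute(
    "SELECT p.Person, COALESCE(e.Message, i.Message, 'No Match')" + joins
).fetchall()

# CASE tests the join key, which is NULL only when the join found no match.
case_rows = conn.execute("""
SELECT p.Person,
       CASE
           WHEN e.ExternalNumber IS NOT NULL THEN e.Message
           WHEN i.InternalNumber IS NOT NULL THEN i.Message
           ELSE 'No Match'
       END
""" + joins).fetchall()

print(coalesce_rows)  # John falls through to 'No Match' despite matching
print(case_rows)      # John keeps his (NULL) external message
```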
I'm far from a MySQL expert, and I'm struggling with a relatively complicated query.
I have two tables:
A Data table with columns as follows:
`Location` bigint(20) unsigned NOT NULL,
`Source` bigint(20) unsigned NOT NULL,
`Param` bigint(20) unsigned NOT NULL,
`Type` bigint(20) unsigned NOT NULL,
`InitTime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`ValidTime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`Value` double DEFAULT NULL
A Location Group table with columns as follows:
`Group` bigint(20) unsigned NOT NULL,
`Location` bigint(20) unsigned NOT NULL,
The data table stores data of interest, where each 'value' is valid for a particular 'validtime'. However, the data in the table comes from a calculation which is run periodically. The initialisation time at which the calculation is run is stored in the 'inittime' field. A given calculation with particular inittime may result in, say 10 values being output with valid times (A - J). A more recent calculation, with a more recent inittime, may result in another 10 values being output with valid times (B - K). There is therefore an overlap in available values. I always want a result set of Values and ValidTimes for the most recent inittime (i.e. max(inittime)).
I can determine the most recent inittime using the following query:
SELECT MAX(InitTime)
FROM Data
WHERE
Location = 100060 AND
Source = 10 AND
Param = 1 AND
Type = 1;
This takes 0.072 secs to execute.
However, using this as a sub-query to retrieve data from the Data table results in an execution time of 45 seconds (it's a pretty huge table, but not super ridiculous).
Sub-Query:
SELECT Location, ValidTime, Value
FROM Data data
WHERE Source = 10
AND Location IN (SELECT Location FROM LocationGroup WHERE `Group` = 3)
AND InitTime = (SELECT max(data2.InitTime) FROM Data data2 WHERE data.Location = data2.Location AND data.Source = data2.Source AND data.Param = data2.Param AND data.Type = data2.Type)
ORDER BY Location, ValidTime ASC;
(Snipped ValidTime qualifiers for brevity)
I know there's likely some optimisation that would help here, but I'm not sure where to start. Instead, I created a stored procedure to effectively perform the MAX(InitTime) query, but because MAX(InitTime) is determined by the combination of Location, Source, Param and Type, I need to pass in all the locations that make up a particular group. I implemented a cursor-based stored procedure for this before realising there must be an easier way.
Putting aside the question of optimisation via indices, how could I efficiently perform a query on the data table using the most recent InitTime for a given location group, source, param and type?
Thanks in advance!
MySQL can do a poor job optimizing IN with a subquery (sometimes). Also, indexes might be able to help. So, I would write the query as:
SELECT d.Location, d.ValidTime, d.Value
FROM Data d
WHERE d.Source = 10 AND
EXISTS (SELECT 1 FROM LocationGroup lg WHERE d.Location = lg.Location and lg.Group = 3) AND
d.InitTime = (SELECT max(d2.InitTime)
FROM Data d2
WHERE d.Location = d2.Location AND
d.Source = d2.Source AND
d.Param = d2.Param AND
d.Type = d2.Type
)
ORDER BY d.Location, d.ValidTime ASC;
For this query, you want indexes on data(Location, Source, Param, Type, InitTime), LocationGroup(Location, Group) and data(Source, Location, ValidTime).
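A small sketch of the answer's query on made-up data, using SQLite through Python's sqlite3 module in place of MySQL. ValidTime values are simplified to the letters from the question, "Group" is double-quoted because it is a keyword (MySQL would use backticks), and the index names are hypothetical. Location 100060 has two overlapping runs; 100061 has only an older run, which shows that the latest InitTime is worked out per Location/Source/Param/Type combination rather than globally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data (
    Location INTEGER, Source INTEGER, Param INTEGER, Type INTEGER,
    InitTime TEXT, ValidTime TEXT, Value REAL
);
CREATE TABLE LocationGroup ("Group" INTEGER, Location INTEGER);

-- The three indexes suggested in the answer (index names are hypothetical).
CREATE INDEX data_latest ON Data (Location, Source, Param, Type, InitTime);
CREATE INDEX lg_location ON LocationGroup (Location, "Group");
CREATE INDEX data_source ON Data (Source, Location, ValidTime);

INSERT INTO LocationGroup VALUES (3, 100060), (3, 100061), (4, 200000);
INSERT INTO Data VALUES
    -- Location 100060: an older run (A-C) and a newer, overlapping run (B-D).
    (100060, 10, 1, 1, '2020-01-01 00:00', 'A', 1.0),
    (100060, 10, 1, 1, '2020-01-01 00:00', 'B', 2.0),
    (100060, 10, 1, 1, '2020-01-01 00:00', 'C', 3.0),
    (100060, 10, 1, 1, '2020-01-01 06:00', 'B', 20.0),
    (100060, 10, 1, 1, '2020-01-01 06:00', 'C', 30.0),
    (100060, 10, 1, 1, '2020-01-01 06:00', 'D', 40.0),
    -- Location 100061: only the older run exists, so it is still the latest.
    (100061, 10, 1, 1, '2020-01-01 00:00', 'A', 5.0),
    -- Location 200000 is in group 4, so a group 3 query ignores it.
    (200000, 10, 1, 1, '2020-01-01 06:00', 'A', 9.0);
""")

rows = conn.execute("""
SELECT d.Location, d.ValidTime, d.Value
FROM Data d
WHERE d.Source = 10
  AND EXISTS (SELECT 1 FROM LocationGroup lg
              WHERE d.Location = lg.Location AND lg."Group" = 3)
  AND d.InitTime = (SELECT MAX(d2.InitTime)
                    FROM Data d2
                    WHERE d.Location = d2.Location AND d.Source = d2.Source
                      AND d.Param = d2.Param AND d.Type = d2.Type)
ORDER BY d.Location, d.ValidTime
""").fetchall()
print(rows)
```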
I have a mind-bending problem with a MySQL / MariaDB query, with a table structure as follows:
event
id INT(11)
time DATETIME
description VARCHAR(1000)
report
id INT(11)
event_fk INT(11) Refers to event
reporttemplate_fk INT(11) Refers to reporttemplate
reporttemplate Localized report templates. Types: before event / after event, for each language
id INT(11)
type VARCHAR(255)
name VARCHAR(255)
template VARCHAR(10000)
reportvalue
report_fk INT(11) Refers to report
key VARCHAR(255)
value VARCHAR(255)
There are two kinds of reporttemplates, one for before event (all events have this) and one for after event (only some events have this). There are tens of different reportvalues for before report, and a subset of around a dozen reportvalues for after report.
The problem is this: how can I write a query that calculates, for each event, the count of matching key-value pairs between the before- and after-reports in the reportvalue table, when reports of both types exist for the event?
Something like this should do it:
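select
e.id,
count(distinct rb.id) as before_reports,
count(distinct ra.id) as after_reports
from
event e
join report rb on rb.event_fk = e.id
join reporttemplate tb on tb.id = rb.reporttemplate_fk and tb.type = 'BEFORE'
join report ra on ra.event_fk = e.id
join reporttemplate ta on ta.id = ra.reporttemplate_fk and ta.type = 'AFTER'
group by
e.id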
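Since the question asks for the number of matching key-value pairs rather than the number of reports, the same joins can be extended with two joins to reportvalue. A sketch in SQLite via Python's sqlite3 module, assuming template types 'BEFORE'/'AFTER' as in the answer, at most one before and one after report per event, and made-up data; "key" is quoted because it is a keyword in both SQLite and MySQL (MySQL would use backticks).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (id INTEGER PRIMARY KEY);
CREATE TABLE reporttemplate (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE report (id INTEGER PRIMARY KEY, event_fk INTEGER, reporttemplate_fk INTEGER);
CREATE TABLE reportvalue (report_fk INTEGER, "key" TEXT, value TEXT);
INSERT INTO event VALUES (1), (2), (3);
INSERT INTO reporttemplate VALUES (1, 'BEFORE'), (2, 'AFTER');
-- Event 1: before + after; event 2: before only; event 3: before + after.
INSERT INTO report VALUES (10, 1, 1), (11, 1, 2), (20, 2, 1), (30, 3, 1), (31, 3, 2);
INSERT INTO reportvalue VALUES
    (10, 'a', '1'), (10, 'b', '2'), (10, 'c', '3'),
    (11, 'a', '1'), (11, 'b', '9'),
    (20, 'a', '1'),
    (30, 'a', '1'), (31, 'a', '2');
""")

# Inner joins keep only events with both report types; the LEFT JOINs to
# reportvalue pair each before-value with an identical after-value, if any.
rows = conn.execute("""
SELECT e.id, COUNT(va.report_fk) AS matching_pairs
FROM event e
JOIN report rb ON rb.event_fk = e.id
JOIN reporttemplate tb ON tb.id = rb.reporttemplate_fk AND tb.type = 'BEFORE'
JOIN report ra ON ra.event_fk = e.id
JOIN reporttemplate ta ON ta.id = ra.reporttemplate_fk AND ta.type = 'AFTER'
LEFT JOIN reportvalue vb ON vb.report_fk = rb.id
LEFT JOIN reportvalue va ON va.report_fk = ra.id
                        AND va."key" = vb."key" AND va.value = vb.value
GROUP BY e.id
ORDER BY e.id
""").fetchall()
print(rows)  # [(1, 1), (3, 0)]
```

The LEFT JOINs to reportvalue keep events that have both reports but no matching pairs (event 3 here, with a count of 0); the inner joins to report drop events without an after-report (event 2).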
Thank you for the replies. I was going to try the JOIN route, but managed to pull through with this:
select
v1.key,
count(*)
from
reportvalue v1,
reportvalue v2,
report r1,
report r2,
event e
where
e.status = "RESOLVED" -- status column was missing from the original question
and r1.event_fk = e.id
and r2.event_fk = e.id
and r1.reporttemplate_fk = [report_template_before_id] -- parametrized
and r2.reporttemplate_fk = [report_template_after_id] -- parametrized
and v1.report_fk = r1.id
and v2.report_fk = r2.id
and v1.key = v2.key
and v1.value <> v2.value
group by
v1.key;
This seems to give the correct number of non-matching key-value pairs, counted per key, which is the result I really needed here.
I have a table containing user-to-user messages. A conversation consists of all messages between two users. I am trying to get a list of all the different conversations and display only the last message sent in each.
I am able to do this with a SQL sub-query in FROM.
CREATE TABLE `messages` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`from_user_id` bigint(20) DEFAULT NULL,
`to_user_id` bigint(20) DEFAULT NULL,
`type` smallint(6) NOT NULL,
`is_read` tinyint(1) NOT NULL,
`is_deleted` tinyint(1) NOT NULL,
`text` longtext COLLATE utf8_unicode_ci NOT NULL,
`heading` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at_utc` datetime DEFAULT NULL,
`read_at_utc` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
SELECT * FROM
(SELECT * FROM `messages` WHERE TYPE = 1 AND
(from_user_id = 22 OR to_user_id = 22)
ORDER BY created_at_utc DESC
) tb
GROUP BY from_user_id, to_user_id;
SQL Fiddle:
http://www.sqlfiddle.com/#!2/845275/2
Is there a way to do this without a sub-query?
(I am writing DQL, which supports sub-queries only in IN.)
You seem to be trying to get the last contents of messages to or from user 22 with type = 1. Your method is explicitly not guaranteed to work, because the extra columns (not in the GROUP BY) can come from arbitrary rows. As explained in the MySQL documentation:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.
The query that you want is more along the lines of this (assuming that you have an auto-incrementing id column for messages):
select m.*
from (select m1.from_user_id, m1.to_user_id, max(m1.id) as max_id
from messages m1
where m1.type = 1 and (m1.from_user_id = 22 or m1.to_user_id = 22)
group by m1.from_user_id, m1.to_user_id
) lm join
messages m
on lm.max_id = m.id;
Or this:
select m.*
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22) and
not exists (select 1
from messages m2
where m2.type = m.type and m2.from_user_id = m.from_user_id and
m2.to_user_id = m.to_user_id and
m2.created_at_utc > m.created_at_utc
);
For this latter query, an index on messages(type, from_user_id, to_user_id, created_at_utc) would help performance.
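A quick check, using SQLite through Python's sqlite3 module and made-up messages, that the max(id) derived-table query and the NOT EXISTS query return the same rows here. Both group by the ordered pair (from_user_id, to_user_id), so a message from 22 to 5 and a reply from 5 to 22 count as separate groups, and the same conversation shows up twice. The index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_user_id INTEGER, to_user_id INTEGER, type INTEGER,
    text TEXT, created_at_utc TEXT
);
-- Index suggested in the answer for the NOT EXISTS query.
CREATE INDEX messages_latest
    ON messages (type, from_user_id, to_user_id, created_at_utc);
INSERT INTO messages (from_user_id, to_user_id, type, text, created_at_utc) VALUES
    (22, 5, 1, 'hi 5',         '2014-01-01 10:00'),
    (22, 5, 1, 'still there?', '2014-01-01 11:00'),
    (5, 22, 1, 'yes',          '2014-01-01 12:00'),
    (7, 22, 1, 'hello 22',     '2014-01-02 09:00'),
    (22, 9, 2, 'other type',   '2014-01-03 09:00');
""")

# Latest message per (from_user_id, to_user_id), taken as the highest id.
by_max_id = conn.execute("""
SELECT m.id, m.text
FROM (SELECT from_user_id, to_user_id, MAX(id) AS max_id
      FROM messages
      WHERE type = 1 AND (from_user_id = 22 OR to_user_id = 22)
      GROUP BY from_user_id, to_user_id) lm
JOIN messages m ON m.id = lm.max_id
ORDER BY m.id
""").fetchall()

# NOT EXISTS: keep a row only if no later row has the same type and pair.
by_not_exists = conn.execute("""
SELECT m.id, m.text
FROM messages m
WHERE m.type = 1 AND (m.from_user_id = 22 OR m.to_user_id = 22)
  AND NOT EXISTS (SELECT 1 FROM messages m2
                  WHERE m2.type = m.type
                    AND m2.from_user_id = m.from_user_id
                    AND m2.to_user_id = m.to_user_id
                    AND m2.created_at_utc > m.created_at_utc)
ORDER BY m.id
""").fetchall()

print(by_max_id)      # 22->5 and 5->22 appear as separate rows
print(by_not_exists)
```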
Since this is a rather specific type of data query which goes outside common ORM use cases, DQL isn't really fit for this - it's optimized for walking well-defined relationships.
For your case, however, Doctrine fully supports native SQL with result set mapping. Using a NativeQuery with a ResultSetMapping, you can easily use the subquery this problem requires and still map the results onto native Doctrine entities, keeping all the caching, usability and performance advantages.
Samples found here.
If you mean to get all conversations and all their last messages, then a subquery is necessary.
SELECT a.* FROM messages a
INNER JOIN (
SELECT
MAX(created_at_utc) as max_created,
from_user_id,
to_user_id
FROM messages
GROUP BY from_user_id, to_user_id
) b ON a.created_at_utc = b.max_created
AND a.from_user_id = b.from_user_id
AND a.to_user_id = b.to_user_id
You can append a WHERE condition as needed.
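One thing to be aware of with this join: if two messages in the same pair share the maximum created_at_utc, both are returned. A sketch with SQLite through Python's sqlite3 module and made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_user_id INTEGER, to_user_id INTEGER,
    text TEXT, created_at_utc TEXT
);
INSERT INTO messages (from_user_id, to_user_id, text, created_at_utc) VALUES
    (22, 5, 'a', '2014-01-01 10:00'),
    (22, 5, 'b', '2014-01-01 11:00'),
    (22, 5, 'c', '2014-01-01 11:00'),  -- same timestamp as 'b'
    (7, 22, 'd', '2014-01-02 09:00');
""")

# The answer's query: join each pair's maximum timestamp back to messages.
rows = conn.execute("""
SELECT a.id, a.text
FROM messages a
INNER JOIN (SELECT MAX(created_at_utc) AS max_created, from_user_id, to_user_id
            FROM messages
            GROUP BY from_user_id, to_user_id) b
  ON a.created_at_utc = b.max_created
 AND a.from_user_id = b.from_user_id
 AND a.to_user_id = b.to_user_id
ORDER BY a.id
""").fetchall()
print(rows)  # [(2, 'b'), (3, 'c'), (4, 'd')]: the 22->5 pair returns two rows
```

Joining on max(id) instead, where id is auto-incrementing, avoids the tie because ids are unique.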
I don't think your original query was even doing this correctly. I'm not sure what the GROUP BY was being used for, other than perhaps to return only a single (unpredictable) result.
Just add a limit clause:
SELECT * FROM `messages`
WHERE `type` = 1 AND
(`from_user_id` = 22 OR `to_user_id` = 22)
ORDER BY `created_at_utc` DESC
LIMIT 1
For optimum query performance you need indexes on the following fields:
type
from_user_id
to_user_id
created_at_utc
I'm running the following query to get the stats for a user, on which I base their pay.
SELECT hit_paylevel, sum(hit_uniques) as day_unique_hits
, (sum(hit_uniques)/1000)*hit_paylevel as day_earnings
, hit_date
FROM daily_hits
WHERE hit_user = 'xxx' AND hit_date >= '2011-05-01' AND hit_date < '2011-06-01'
GROUP BY hit_user
The table in question looks like this:
CREATE TABLE IF NOT EXISTS `daily_hits` (
`hit_itemid` varchar(255) NOT NULL,
`hit_mainid` int(11) NOT NULL,
`hit_user` int(11) NOT NULL,
`hit_date` date NOT NULL,
`hit_hits` int(11) NOT NULL DEFAULT '0',
`hit_uniques` int(11) NOT NULL,
`hit_embed` int(11) NOT NULL,
`hit_paylevel` int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`hit_itemid`,`hit_date`),
KEY `hit_user` (`hit_user`),
KEY `hit_mainid` (`hit_mainid`,`hit_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The problem in the calculation has to do with hit_paylevel, which acts as a multiplier. The default is 1; the other options are 2 and 3, which double or triple the earnings for that day.
If I loop through the days, each day's day_earnings is correct; it's just that when I group them, everything is calculated as paylevel 1. This happens if the user was on paylevel 1 in the beginning and was later upgraded to a higher level. If the user is on paylevel 2 from the start, everything is also calculated correctly.
Shouldn't this be sum(hit_uniques * hit_paylevel) / 1000?
Like @Denis said:
Change the query to
SELECT hit_paylevel, sum(hit_uniques) as day_unique_hits
, sum(hit_uniques * hit_paylevel) / 1000 as day_earnings
, hit_date
FROM daily_hits
WHERE hit_user = 'xxx' AND hit_date >= '2011-05-01' AND hit_date < '2011-06-01'
GROUP BY hit_user;
Why this fixes the problem
Putting hit_paylevel outside the sum first sums all hit_uniques and then multiplies the total by one arbitrary hit_paylevel from the group.
That is not what you want. If you put both columns inside the sum, MySQL pairs each row's hit_uniques with that same row's hit_paylevel before adding them up.
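A sketch of the corrected calculation with made-up data for a hypothetical user who was upgraded from paylevel 1 to 2 partway through the month, using SQLite through Python's sqlite3 module. The grouped total matches the sum of the per-day earnings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_hits (
    hit_itemid TEXT, hit_user INTEGER, hit_date TEXT,
    hit_uniques INTEGER, hit_paylevel INTEGER
);
-- Hypothetical user 1: paylevel 1 for two days, then upgraded to 2.
INSERT INTO daily_hits VALUES
    ('item', 1, '2011-05-01', 1000, 1),
    ('item', 1, '2011-05-02', 2000, 1),
    ('item', 1, '2011-05-03', 3000, 2);
""")

# Multiply per row, then sum. 1000.0 forces real division: SQLite's '/'
# between two integers truncates, whereas MySQL's '/' returns a decimal.
correct = conn.execute("""
SELECT SUM(hit_uniques * hit_paylevel) / 1000.0
FROM daily_hits
WHERE hit_user = 1 AND hit_date >= '2011-05-01' AND hit_date < '2011-06-01'
GROUP BY hit_user
""").fetchone()[0]

# Per-day earnings for comparison; their sum should equal the grouped total.
per_day = conn.execute("""
SELECT hit_date, hit_uniques / 1000.0 * hit_paylevel
FROM daily_hits WHERE hit_user = 1 ORDER BY hit_date
""").fetchall()

print(correct)  # 9.0 (1.0 + 2.0 + 6.0)
print(per_day)
```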
The dangers of group by
This is an important thing to remember on MySQL.
The GROUP BY clause works differently than in other databases.
On MSSQL (or Oracle or PostgreSQL) you would have gotten an error like
non-aggregate expression must appear in group by clause
Or words to that effect.
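In your original query, hit_paylevel is neither inside an aggregate (sum) nor in the GROUP BY clause, so MySQL picks an arbitrary value from the group. (Since MySQL 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, and with it MySQL also rejects such a query with an error.)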