PostgreSQL UPDATE equivalent for MySQL query - mysql

I have a simple MySQL query that I want to convert to PostgreSQL. After 3 days I finally quit as I don't understand what is wrong here:
UPDATE webUsers u,
(SELECT IFNULL(count(s.id),0) AS id, p.associatedUserId FROM pool_worker p
LEFT JOIN shares s ON p.username=s.username
WHERE s.our_result='Y' GROUP BY p.associatedUserId) a
SET shares_this_round = a.id WHERE u.id = a.associatedUserId
I have tried to convert it but it says error on SET. Here is my query:
UPDATE webusers
SET (shares_this_round) = (a.id)
FROM (SELECT coalesce(count(s.id),0) AS id, p.associatedUserId FROM pool_worker p
LEFT JOIN shares s ON p.username=s.username WHERE s.our_result='Y' GROUP BY p.associatedUserId) a, webusers w WHERE u.id = a.associatedUserId
Can anyone please tell me what's wrong with it? I can't sleep just because of this.
------------------------------EDIT-------------------------------------
shares table
CREATE TABLE shares (
id bigint NOT NULL,
rem_host character varying(255) NOT NULL,
username character varying(120) NOT NULL,
our_result character(255) NOT NULL,
upstream_result character(255),
reason character varying(50),
solution character varying(1000) NOT NULL,
"time" timestamp without time zone DEFAULT now() NOT NULL
);
webusers table
CREATE TABLE webusers (
id integer NOT NULL,
admin integer NOT NULL,
username character varying(40) NOT NULL,
pass character varying(255) NOT NULL,
email character varying(255) NOT NULL,
"emailAuthPin" character varying(10) NOT NULL,
secret character varying(10) NOT NULL,
"loggedIp" character varying(255) NOT NULL,
"sessionTimeoutStamp" integer NOT NULL,
"accountLocked" integer NOT NULL,
"accountFailedAttempts" integer NOT NULL,
pin character varying(255) NOT NULL,
share_count integer DEFAULT 0 NOT NULL,
stale_share_count integer DEFAULT 0 NOT NULL,
shares_this_round integer DEFAULT 0 NOT NULL,
api_key character varying(255),
"activeEmail" integer,
donate_percent character varying(11) DEFAULT '1'::character varying,
btc_lock character(255) DEFAULT '0'::bpchar NOT NULL
);
pool_worker table
CREATE TABLE pool_worker (
id integer NOT NULL,
"associatedUserId" integer NOT NULL,
username character(50),
password character(255),
allowed_hosts text
);

First, I formatted to arrive at this less confusing but still incorrect query:
UPDATE webusers
SET (shares_this_round) = (a.id)
FROM (
SELECT coalesce(count(s.id),0) AS id, p.associatedUserId
FROM pool_worker p
LEFT JOIN shares s ON p.username=s.username
WHERE s.our_result='Y'
GROUP BY p.associatedUserId) a
, webusers w
WHERE u.id = a.associatedUserId
There are multiple distinct errors and more sub-optimal parts in this statement. Errors come first; the last few items are just recommendations.
Missing alias u for webusers. A trivial mistake.
Missing join between w and a. Results in a cross join, which hardly makes any sense and is a very expensive mistake as far as performance is concerned. It is also completely uncalled for: you can drop the redundant second instance of webusers from the query.
SET (shares_this_round) = (a.id) is a syntax error. You cannot wrap a column name in the SET clause in parentheses. It would be pointless anyway, just like the parentheses around a.id. The latter isn't a syntax error, though.
As it turns out after comments and question update, you created the table with double-quoted "CamelCase" identifiers (which I advise not to use, ever, for exactly the kind of problems we just ran into). Read the chapter Identifiers and Key Words in the manual to understand what went wrong. In short: non-standard identifiers (with upper-case letters, reserved words, etc.) have to be double-quoted at all times.
I amended the query below to fit the new information.
The aggregate function count() never returns NULL by definition. COALESCE is pointless in this context. I quote the manual on aggregate functions:
It should be noted that except for count, these functions return a
null value when no rows are selected.
Emphasis mine. The count itself works, because NULL values are not counted, so you actually get 0 where no s.id is found.
I also use a different column alias (id_ct), because id for the count is just misleading.
WHERE s.our_result = 'Y' ... if our_result is of type boolean, like it seems it should be, you can simplify to just WHERE s.our_result. I am guessing here, because you did not provide the necessary table definition.
It is almost always a good idea to avoid UPDATEs that do not actually change anything (rare exceptions apply). I added a second WHERE condition to eliminate those:
AND w.shares_this_round IS DISTINCT FROM a.id_ct
If shares_this_round is defined NOT NULL, you can use <> instead because id_ct cannot be NULL. (Again, missing info in question.)
USING(username) is just a notational shortcut that can be used here.
Put everything together to arrive at this correct form:
UPDATE webusers w
SET shares_this_round = a.id_ct
FROM (
SELECT p."associatedUserId", count(s.id) AS id_ct
FROM pool_worker p
LEFT JOIN shares s USING (username)
WHERE s.our_result = 'Y' -- boolean?
GROUP BY p."associatedUserId"
) a
WHERE w.id = a."associatedUserId"
AND w.shares_this_round IS DISTINCT FROM a.id_ct -- avoid empty updates
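The point about count() never returning NULL can be checked with a small sketch. It uses Python's sqlite3 module only so that it is runnable; SQLite stands in for PostgreSQL, and the sample rows (alice.1, bob.1) are invented for illustration. It also shows a side effect worth knowing: with the filter in WHERE, a worker without any 'Y' share drops out of the derived table entirely (so an UPDATE driven by it would leave that user's old value in place), whereas moving the filter into the ON clause keeps the worker with a count of 0.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table and column names follow the question,
# the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pool_worker (username TEXT, associatedUserId INTEGER);
    CREATE TABLE shares (id INTEGER, username TEXT, our_result TEXT);
    INSERT INTO pool_worker VALUES ('alice.1', 1), ('bob.1', 2);
    -- alice has two accepted shares, bob only a rejected one
    INSERT INTO shares VALUES (1, 'alice.1', 'Y'), (2, 'alice.1', 'Y'), (3, 'bob.1', 'N');
""")

# Filter in WHERE: bob's only joined row fails the test, so bob disappears.
filter_in_where = conn.execute("""
    SELECT p.associatedUserId, count(s.id) AS id_ct
    FROM pool_worker p
    LEFT JOIN shares s ON p.username = s.username
    WHERE s.our_result = 'Y'
    GROUP BY p.associatedUserId
    ORDER BY p.associatedUserId
""").fetchall()

# Filter in ON: bob is kept; count() yields 0 while sum() yields NULL (None).
filter_in_on = conn.execute("""
    SELECT p.associatedUserId, count(s.id) AS id_ct, sum(s.id) AS id_sum
    FROM pool_worker p
    LEFT JOIN shares s ON p.username = s.username AND s.our_result = 'Y'
    GROUP BY p.associatedUserId
    ORDER BY p.associatedUserId
""").fetchall()

print(filter_in_where)  # [(1, 2)]
print(filter_in_on)     # [(1, 2, 3), (2, 0, None)]
```

Which of the two behaviours is wanted depends on whether users without accepted shares should be reset to 0 or left alone.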

Related

MYSQL ERROR CODE: 1288 - can't update with join statement

Thanks for past help.
While doing an update using a join, I am getting 'Error Code: 1288. The target table _____ of the UPDATE is not updatable' and can't figure out why. I can update the table with a simple update statement (UPDATE sales.customerABC Set contractID = 'x';) but can't using a join like this:
UPDATE (
SELECT * #where '*' contains columns a.uniqueID and a.contractID
FROM sales.customerABC
WHERE contractID IS NULL
) as a
LEFT JOIN (
SELECT uniqueID, contractID
FROM sales.tblCustomers
WHERE contractID IS NOT NULL
) as b
ON a.uniqueID = b.uniqueID
SET a.contractID = b.contractID;
If I change that update statement to a SELECT such as:
SELECT * FROM (
SELECT *
FROM opwSales.dealerFilesCTS
WHERE pcrsContractID IS NULL
) as a
LEFT JOIN (
SELECT uniqueID, pcrsContractID
FROM opwSales.dealerFileLoad
WHERE pcrsContractID IS NOT NULL
) as b
ON a."Unique ID" = b.uniqueID;
the result table would contain these columns:
a.uniqueID, a.contractID, b.uniqueID, b.contractID
59682204, NULL, NULL, NULL
a3e8e81d, NULL, NULL, NULL
cfd1dbf9, NULL, NULL, NULL
5ece009c, , 5ece009c, B123
5ece0d04, , 5ece0d04, B456
5ece7ab0, , 5ece7ab0, B789
cfd21d2a, NULL, NULL, NULL
cfd22701, NULL, NULL, NULL
cfd23032, NULL, NULL, NULL
I pretty much have all database privileges and can't find restrictions with the table reference data. Can't find much information online concerning the error code, either.
Thanks in advance guys.
You cannot update a sub-select because it's not a "real" table - MySQL cannot easily determine how the sub-select assignment maps back to the originating table.
Try:
UPDATE customerABC
JOIN tblCustomers USING (uniqueID)
SET customerABC.contractID = tblCustomers.contractID
WHERE customerABC.contractID IS NULL AND tblCustomers.contractID IS NOT NULL
Notes:
you can use a plain (inner) JOIN instead of a LEFT JOIN, since you want uniqueID to exist and not be null in both tables. A LEFT JOIN would generate extra NULL rows from tblCustomers, only to have them shot down by the clause requirement that tblCustomers.contractID be not NULL. Since they allow more stringent restrictions on indexes, inner JOINs tend to be more efficient than LEFT JOINs.
since the field has the same name in both tables you can replace ON (a.field1 = b.field1) with the USING (field1) shortcut.
you obviously strongly want a covering index with (uniqueID, contractID) on both tables to maximize efficiency.
this is not going to work unless the table being updated is a "real" table. The "tblCustomers" may be a view or a subselect, but customerABC may not. You might need a more complicated JOIN to pull out a complex WHERE which might be otherwise hidden inside a subselect, if the original 'SELECT * FROM customerABC' was indeed a more complex query than a straight SELECT. What this boils down to is, MySQL needs a strong unique key to know what it needs to update, and it must be in a single table. To reliably update more than one table I think you need two UPDATEs inside a properly write-locked transaction.
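The "update the real table, look the value up elsewhere" idea can be tried out with a small sketch. It runs on Python's sqlite3 module, which is only a stand-in here: SQLite has no MySQL-style UPDATE ... JOIN, so the lookup is written as a correlated subquery instead, a swapped-in but equivalent formulation for this case. The sample rows are taken loosely from the question's result table; the primary keys are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Primary keys on uniqueID are assumed; the question does not show them.
    CREATE TABLE customerABC (uniqueID TEXT PRIMARY KEY, contractID TEXT);
    CREATE TABLE tblCustomers (uniqueID TEXT PRIMARY KEY, contractID TEXT);
    INSERT INTO customerABC VALUES ('59682204', NULL), ('5ece009c', NULL), ('5ece0d04', NULL);
    INSERT INTO tblCustomers VALUES ('5ece009c', 'B123'), ('5ece0d04', 'B456'), ('cfd1dbf9', NULL);
""")

# The UPDATE names the base table directly; the lookup lives in subqueries.
conn.execute("""
    UPDATE customerABC
    SET contractID = (SELECT t.contractID
                      FROM tblCustomers t
                      WHERE t.uniqueID = customerABC.uniqueID)
    WHERE contractID IS NULL
      AND EXISTS (SELECT 1
                  FROM tblCustomers t
                  WHERE t.uniqueID = customerABC.uniqueID
                    AND t.contractID IS NOT NULL)
""")

rows = conn.execute(
    "SELECT uniqueID, contractID FROM customerABC ORDER BY uniqueID"
).fetchall()
print(rows)
```

Only the two rows with a non-NULL match are filled in; the row without a counterpart in tblCustomers keeps its NULL.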

Can we use FIND_IN_SET() function for multiple column in same table

NOTE: I tried many SF solutions, but none worked for me. This is a bit challenging for me; any help will be appreciated.
Below is my SQL-Fiddle link : http://sqlfiddle.com/#!9/6daa20/9
I have tables below:
CREATE TABLE `tbl_pay_chat` (
nId int(11) NOT NULL AUTO_INCREMENT,
npayid int(11) NOT NULL,
nSender int(11) NOT NULL,
nTos varchar(255) binary DEFAULT NULL,
nCcs varchar(255) binary DEFAULT NULL,
sMailBody varchar(500) binary DEFAULT NULL,
PRIMARY KEY (nId)
)
ENGINE = INNODB,
CHARACTER SET utf8,
COLLATE utf8_bin;
INSERT INTO tbl_pay_chat
(nId,npayid,nSender,nTos,nCcs,sMailBody)
VALUES
(0,1,66,'3,10','98,133,10053','Hi this test maail'),
(0,1,66,'3,10','98,133,10053','test mail received');
_____________________________________________________________
CREATE TABLE `tbl_emp` (
empid int(11) NOT NULL,
fullname varchar(45) NOT NULL,
PRIMARY KEY (empid)
)
ENGINE = INNODB,
CHARACTER SET utf8,
COLLATE utf8_bin;
INSERT INTO `tbl_emp` (empid,fullname)
VALUES
(3, 'Rio'),
(10, 'Christ'),
(66, 'Jack'),
(98, 'Jude'),
(133, 'Mike'),
(10053, 'James');
What I want :
JOIN above two tables to get fullname in (nTos & nCcs) columns.
Also, I want total COUNT() of rows.
What I tried is the query below, but FULLNAME appears multiple times in the nTos and nCcs columns. Please also suggest how to get the proper row count.
SELECT a.nId, a.npayid, e1.fullname AS nSender, sMailBody, GROUP_CONCAT(b.fullname ORDER BY b.empid)
AS nTos, GROUP_CONCAT(e.fullname ORDER BY e.empid) AS nCcs
FROM tbl_pay_chat a
INNER JOIN tbl_emp b
ON FIND_IN_SET(b.empid, a.nTos) > 0
INNER JOIN tbl_emp e
ON FIND_IN_SET(e.empid, a.nCcs) > 0
JOIN tbl_emp e1
ON e1.empid = a.nSender
GROUP BY a.nId ORDER BY a.nId DESC;
I hope I made my point clear. Please help.
You have a horrible data model. You should not be storing lists of ids in strings. Why? Here are some reasons:
Numbers should be stored as numbers not strings.
Relationships between tables should be declared using foreign key relationships.
SQL has pretty poor string manipulation capabilities.
The use of functions and type conversion in ON often prevents the use of indexes.
No doubt there are other good reasons. Your data model should be using properly declared junction tables for the n-m relationships.
That said, sometimes we are stuck with other people's really, really, really, really bad design decisions. There are some ways around this. I think the query that you want can be expressed as:
SELECT pc.nId, pc.npayid, s_e.fullname AS nSender, pc.sMailBody,
GROUP_CONCAT(DISTINCT to_e.fullname ORDER BY to_e.empid)
AS nTos,
GROUP_CONCAT(DISTINCT cc_e.fullname ORDER BY cc_e.empid) AS nCcs
FROM tbl_pay_chat pc INNER JOIN
tbl_emp to_e
ON FIND_IN_SET(to_e.empid, pc.nTos) > 0 INNER JOIN
tbl_emp cc_e
ON FIND_IN_SET(cc_e.empid, pc.nCcs) > 0 JOIN
tbl_emp s_e
ON s_e.empid = pc.nSender
GROUP BY pc.nId
ORDER BY pc.nId DESC;
Here is a db<>fiddle.
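Why the names repeat, and why DISTINCT helps, can be seen in a small sketch. It runs on Python's sqlite3 module purely for convenience; SQLite has no FIND_IN_SET, so instr() over a comma-wrapped list is used as a stand-in (an assumption of this sketch, not part of the MySQL query). Joining on two list columns multiplies each chat row by the number of recipients times the number of cc entries, and every name is then concatenated once per combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_pay_chat (nId INTEGER PRIMARY KEY, nTos TEXT, nCcs TEXT);
    CREATE TABLE tbl_emp (empid INTEGER PRIMARY KEY, fullname TEXT);
    INSERT INTO tbl_pay_chat VALUES (1, '3,10', '98,133,10053');
    INSERT INTO tbl_emp VALUES (3, 'Rio'), (10, 'Christ'), (98, 'Jude'),
                               (133, 'Mike'), (10053, 'James');
""")

# instr() on ',list,' is a stand-in for MySQL's FIND_IN_SET(empid, list) > 0.
JOINS = """
    FROM tbl_pay_chat pc
    JOIN tbl_emp to_e ON instr(',' || pc.nTos || ',', ',' || to_e.empid || ',') > 0
    JOIN tbl_emp cc_e ON instr(',' || pc.nCcs || ',', ',' || cc_e.empid || ',') > 0
"""

# The single chat row is multiplied: 2 recipients x 3 cc entries = 6 joined rows.
joined_rows = conn.execute("SELECT count(*)" + JOINS).fetchone()[0]

plain = conn.execute(
    "SELECT group_concat(to_e.fullname)" + JOINS + "GROUP BY pc.nId"
).fetchone()[0]
deduped = conn.execute(
    "SELECT group_concat(DISTINCT to_e.fullname)" + JOINS + "GROUP BY pc.nId"
).fetchone()[0]

print(joined_rows)                  # 6
print(sorted(plain.split(",")))     # each recipient three times
print(sorted(deduped.split(",")))   # each recipient once
```

One caveat: DISTINCT removes duplicates by name, so two different employees who happen to share a fullname would collapse into one entry.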

How can I make this select lookup another table and find first match?

I have a database of phone call data from our phone system that I am trying to create a report on. These phone calls match up to a table of internal and external numbers. The report needs to try to match the phone call to an external number in our database first and if there is no match try to match it to an internal number.
I have created a sample data set and db-fiddle, and removed some data to hopefully explain it better:
CREATE TABLE `cdr` (
`callnumber` int(11) NOT NULL,
`origLegCallIdentifier` int(11) NOT NULL,
`dateTimeOrigination` datetime NOT NULL,
`callType` varchar(50) NOT NULL,
`chargeable` varchar(10) NOT NULL,
`callCharge` decimal(10,2) NOT NULL,
`origNodeId` int(11) NOT NULL,
`destLegIdentifier` int(11) NOT NULL,
`destNodeId` int(11) NOT NULL,
`callingPartyNumber` varchar(50) NOT NULL,
`callingPartyNumberPartition` varchar(50) NOT NULL,
`callingPartyNumberState` varchar(10) NOT NULL,
`callingPartyNumberSite` varchar(30) NOT NULL,
`originalCalledPartyNumber` varchar(50) NOT NULL,
`originalCalledPartyNumberPartition` varchar(50) NOT NULL,
`finalCalledPartyNumber` varchar(50) NOT NULL,
`finalCalledPartyNumberPartition` varchar(50) NOT NULL,
`lastRedirectDn` varchar(50) NOT NULL,
`lastRedirectDnPartition` varchar(50) NOT NULL,
`dateTimeConnect` datetime DEFAULT NULL,
`dateTimeDisconnect` datetime NOT NULL,
`duration` int(11) NOT NULL,
`origDeviceName` varchar(129) NOT NULL,
`destDeviceName` varchar(129) NOT NULL,
`origIpv4v6Addr` varchar(64) NOT NULL,
`destIpv4v6Addr` varchar(64) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `cdr` (`callnumber`, `origLegCallIdentifier`, `dateTimeOrigination`, `callType`, `chargeable`, `callCharge`, `origNodeId`, `destLegIdentifier`, `destNodeId`, `callingPartyNumber`, `callingPartyNumberPartition`, `callingPartyNumberState`, `callingPartyNumberSite`, `originalCalledPartyNumber`, `originalCalledPartyNumberPartition`, `finalCalledPartyNumber`, `finalCalledPartyNumberPartition`, `lastRedirectDn`, `lastRedirectDnPartition`, `dateTimeConnect`, `dateTimeDisconnect`, `duration`, `origDeviceName`, `destDeviceName`, `origIpv4v6Addr`, `destIpv4v6Addr`) VALUES
(52004, 69637277, '2020-08-31 03:05:05', 'outbound-national', 'yes', '0.00', 4, 69637278, 4, '6220', 'PT_INTERNAL', 'NSW', 'Site A', '0412345678', 'PT_NATIONAL_TIME_RESTRICT', '0412345678', 'PT_NATIONAL_TIME_RESTRICT', '0412345678', 'PT_NATIONAL_TIME_RESTRICT', NULL, '2020-08-31 03:05:08', 0, 'SEP00XXXXX', 'XXXXX', '1.1.1.1', '1.1.1.1');
CREATE TABLE `numbers` (
`numberid` int(11) NOT NULL,
`number` varchar(30) NOT NULL,
`memberid` int(11) NOT NULL,
`type` enum('internal','external') NOT NULL,
`description` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `numbers` (`numberid`, `number`, `memberid`, `type`, `description`) VALUES
(1555, '0412345678', 436, 'internal', ''),
(1556, '6220', 437, 'external', '');
https://www.db-fiddle.com/f/ofH6sENoce8tGVsoxMejwZ/1
The above example shows how it ends up with a duplicate for a single record because it matches 6220 as the callingPartyNumber and 0412345678 as the finalCalledPartyNumber in each respective select.
This is an example of what I want to see (union has been removed):
https://www.db-fiddle.com/f/bVSWESvnKJKvuNefLqH4aU/0
I want a single record for when it either matches a finalCalledPartyNumber first or then a callingPartyNumber. Records that don't match anything will not be shown.
Updated select using Caius's example
SELECT
DATE(CONVERT_TZ(cdr.dateTimeOrigination,'+00:00',@@global.time_zone)) as 'Date',
TIME(CONVERT_TZ(cdr.dateTimeOrigination,'+00:00',@@global.time_zone)) as 'Time',
cdr.callType,
cdr.callingPartyNumberState,
cdr.callingPartyNumber,
COALESCE(finalcalledparty.memberid, callingparty.memberid, originalcalledparty.memberid, 'No Match') as MemberID,
cdr.originalCalledPartyNumber,
cdr.finalCalledPartyNumber,
CONCAT(MOD(HOUR(SEC_TO_TIME(cdr.duration)), 24), ':', LPAD(MINUTE(SEC_TO_TIME(cdr.duration)),2,0), ':', LPAD(second(SEC_TO_TIME(cdr.duration)),2,0)) as 'duration',
cdr.callCharge
FROM `cdr`
LEFT JOIN numbers finalcalledparty ON finalcalledparty.number = cdr.finalCalledPartyNumber
LEFT JOIN numbers callingparty ON callingparty.number = cdr.callingPartyNumber
LEFT JOIN numbers originalcalledparty ON originalcalledparty.number = cdr.OriginalCalledPartyNumber
WHERE (cdr.callType LIKE '%outbound%' OR cdr.callType LIKE '%transfer%' OR cdr.callType LIKE '%forward%')
ORDER BY Date DESC, Time DESC
Select with members table join
SELECT
DATE(CONVERT_TZ(cdr.dateTimeOrigination,'+00:00',@@global.time_zone)) as 'Date',
TIME(CONVERT_TZ(cdr.dateTimeOrigination,'+00:00',@@global.time_zone)) as 'Time',
cdr.callType,
'Calling' as ChargeType,
cdr.callingPartyNumberState,
cdr.callingPartyNumber,
COALESCE(finalcalledmember.name, callingmember.name, 'No Match') as MemberName,
cdr.finalCalledPartyNumber,
CONCAT(MOD(HOUR(SEC_TO_TIME(cdr.duration)), 24), ':', LPAD(MINUTE(SEC_TO_TIME(cdr.duration)),2,0), ':', LPAD(second(SEC_TO_TIME(cdr.duration)),2,0)) as 'duration',
cdr.callCharge
FROM `cdr`
LEFT JOIN numbers callingparty ON callingparty.number = cdr.callingPartyNumber
LEFT JOIN numbers finalcalledparty ON finalcalledparty.number = cdr.finalCalledPartyNumber
LEFT JOIN members callingmember ON callingmember.memberid = callingparty.memberid
LEFT JOIN members finalcalledmember ON finalcalledmember.memberid = finalcalledparty.memberid
WHERE (callType LIKE '%outbound%' OR callType LIKE '%transfer%' OR callType LIKE '%forward%') AND DATE(CONVERT_TZ(cdr.dateTimeOrigination,'+00:00',@@global.time_zone)) = '2020-09-01'
ORDER BY Date DESC, Time DESC
The report needs to try to match the phone call to an external number in our database first and if there is no match try to match it to an internal number.
You can use a pair of left joins for this. Here's a simpler dataset:
Person, Number
John, e1
James, i2
Jenny, x3
ExternalNumber, Message
e1, Hello
InternalNumber, Message
i2, Goodbye
SELECT p.Person, COALESCE(e.Message, i.Message, 'No Match')
FROM
Person p
LEFT JOIN Externals e ON p.Number = e.ExternalNumber
LEFT JOIN Internal i ON p.Number = i.InternalNumber
Results:
John, Hello
James, Goodbye
Jenny, No Match
A few things you need to appreciate about SQL in general:
A UNION makes a dataset grow taller (more rows)
A JOIN makes a dataset grow wider (more columns)
It is easy to compare things on the same row, more difficult to compare things on different rows
There isn't exactly a concept of "doing something now" and "doing something later" - i.e. your "try to match it to external first and if that doesn't work try match it to internal" isn't a good way to think about the problem, mentally. The SQL way would be to "match it to external and match it to internal, then preferentially pick the external match, then the internal match, then maybe no match"
COALESCE takes a list of arguments and, working left to right, returns the first one that isn't null. Coupled with LEFT JOIN putting nulls when the match fails, it means we can use it to prefer external matches over internal
Because it's easier to compare things on the same row, we just try and match the data against the external and internal numbers tables as a direct operation. We use LEFT JOIN so that if the match doesn't work out, at least it doesn't cause the row to disappear.
So you join both numbers tables in and the matches either work out for external (and you will pick external), work out for internal but not external (and you will pick internal), work out for both int and ext (and you will pick ext over int), or don't work out (and you might have a message to say No Match)
It should be pointed out that the COALESCE approach only really works well if the data won't naturally contain nulls. If the data looked like this:
Person, Number
John, e1
James, i2
Jenny, x3
ExternalNumber, Message
e1, NULL
InternalNumber, Message
i2, Goodbye
Then this will be the result:
John, No Match
James, Goodbye
Jenny, No Match
Even though the join succeeded for John, the NULL in the matched external row's Message makes COALESCE skip it as if there were no match. John has no internal number either, so he ends up with 'No Match'; had he also matched an internal number, the internal message would have been used instead. Either way this might not be correct. We can solve this by using CASE WHEN instead, to test for a column that definitely won't be null when a record matches:
CASE
WHEN e.ExternalNumber IS NOT NULL THEN e.Message
WHEN i.InternalNumber IS NOT NULL THEN i.Message
ELSE 'No Match'
END
Because we test the column that is the key for the join the only way we can get a null there is when the join fails to find a match.
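The difference between the two approaches can be reproduced with a small sketch. It runs on Python's sqlite3 module as a convenient stand-in for MySQL; the table names Person, Externals and Internal follow the toy example, and the external row deliberately carries a NULL message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (Person TEXT, Number TEXT);
    CREATE TABLE Externals (ExternalNumber TEXT, Message TEXT);
    CREATE TABLE Internal (InternalNumber TEXT, Message TEXT);
    INSERT INTO Person VALUES ('John', 'e1'), ('James', 'i2'), ('Jenny', 'x3');
    INSERT INTO Externals VALUES ('e1', NULL);   -- a matched row with a NULL message
    INSERT INTO Internal VALUES ('i2', 'Goodbye');
""")

rows = conn.execute("""
    SELECT p.Person,
           COALESCE(e.Message, i.Message, 'No Match') AS via_coalesce,
           CASE
               WHEN e.ExternalNumber IS NOT NULL THEN e.Message
               WHEN i.InternalNumber IS NOT NULL THEN i.Message
               ELSE 'No Match'
           END AS via_case
    FROM Person p
    LEFT JOIN Externals e ON p.Number = e.ExternalNumber
    LEFT JOIN Internal i ON p.Number = i.InternalNumber
    ORDER BY p.rowid
""").fetchall()

for row in rows:
    print(row)
# ('John', 'No Match', None)
# ('James', 'Goodbye', 'Goodbye')
# ('Jenny', 'No Match', 'No Match')
```

For John, COALESCE cannot tell "matched, but the message is NULL" from "no match at all", while the CASE version returns the matched row's (NULL) message because it tests the join key rather than the payload column.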

SQL alternative to sub-query in FROM

I have a table containing user to user messages. A conversation has all messages between two users. I am trying to get a list of all the different conversations and display only the last message sent in the listing.
I am able to do this with a SQL sub-query in FROM.
CREATE TABLE `messages` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`from_user_id` bigint(20) DEFAULT NULL,
`to_user_id` bigint(20) DEFAULT NULL,
`type` smallint(6) NOT NULL,
`is_read` tinyint(1) NOT NULL,
`is_deleted` tinyint(1) NOT NULL,
`text` longtext COLLATE utf8_unicode_ci NOT NULL,
`heading` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at_utc` datetime DEFAULT NULL,
`read_at_utc` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
);
SELECT * FROM
(SELECT * FROM `messages` WHERE TYPE = 1 AND
(from_user_id = 22 OR to_user_id = 22)
ORDER BY created_at_utc DESC
) tb
GROUP BY from_user_id, to_user_id;
SQL Fiddle:
http://www.sqlfiddle.com/#!2/845275/2
Is there a way to do this without a sub-query?
(writing a DQL which supports sub-queries only in 'IN')
You seem to be trying to get the last contents of messages to or from user 22 with type = 1. Your method is explicitly not guaranteed to work, because the extra columns (not in the group by) can come from arbitrary rows. As explained in the documentation:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
The query that you want is more along the lines of this (assuming that you have an auto-incrementing id column for messages):
select m.*
from (select m.from_user_id, m.to_user_id, max(m.id) as max_id
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22)
group by m.from_user_id, m.to_user_id
) lm join
messages m
on lm.max_id = m.id;
Or this:
select m.*
from messages m
where m.type = 1 and (m.from_user_id = 22 or m.to_user_id = 22) and
not exists (select 1
from messages m2
where m2.type = m.type and m2.from_user_id = m.from_user_id and
m2.to_user_id = m.to_user_id and
m2.created_at_utc > m.created_at_utc
);
For this latter query, an index on messages(type, from_user_id, to_user_id, created_at_utc) would help performance.
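Both formulations can be compared on a handful of rows. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with a trimmed-down messages table and invented sample rows; it assumes, as the answer does, that a higher id means a later message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down version of the question's table; rows are hypothetical.
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        from_user_id INTEGER,
        to_user_id INTEGER,
        type INTEGER,
        text TEXT,
        created_at_utc TEXT
    );
    INSERT INTO messages VALUES
        (1, 22, 5, 1, 'hi',           '2014-01-01 10:00:00'),
        (2, 22, 5, 1, 'still there?', '2014-01-01 11:00:00'),
        (3, 7, 22, 1, 'hello',        '2014-01-02 09:00:00'),
        (4, 22, 5, 2, 'other type',   '2014-01-03 09:00:00'),
        (5, 8, 9, 1, 'unrelated',     '2014-01-04 09:00:00');
""")

# Variant 1: find the highest id per (from, to) pair, then join back.
via_max_id = conn.execute("""
    SELECT m.id, m.text
    FROM (SELECT from_user_id, to_user_id, max(id) AS max_id
          FROM messages
          WHERE type = 1 AND (from_user_id = 22 OR to_user_id = 22)
          GROUP BY from_user_id, to_user_id) lm
    JOIN messages m ON lm.max_id = m.id
    ORDER BY m.id
""").fetchall()

# Variant 2: keep a message only if no later one exists for the same pair.
via_not_exists = conn.execute("""
    SELECT m.id, m.text
    FROM messages m
    WHERE m.type = 1 AND (m.from_user_id = 22 OR m.to_user_id = 22)
      AND NOT EXISTS (SELECT 1
                      FROM messages m2
                      WHERE m2.type = m.type
                        AND m2.from_user_id = m.from_user_id
                        AND m2.to_user_id = m.to_user_id
                        AND m2.created_at_utc > m.created_at_utc)
    ORDER BY m.id
""").fetchall()

print(via_max_id)      # [(2, 'still there?'), (3, 'hello')]
print(via_not_exists)  # same rows
```

The second variant needs no subquery in FROM, only one in WHERE. Note that both treat (22, 5) and (5, 22) as separate groups, exactly like the original GROUP BY from_user_id, to_user_id.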
Since this is a rather specific type of data query which goes outside common ORM use cases, DQL isn't really fit for this - it's optimized for walking well-defined relationships.
For your case however Doctrine fully supports native SQL with result set mapping. Using a NativeQuery with ResultSetMapping like this you can easily use the subquery this problem requires, and still map the results on native Doctrine entities, allowing you to still profit from all caching, usability and performance advantages.
Samples found here.
If you mean to get all conversations and all their last messages, then a subquery is necessary.
SELECT a.* FROM messages a
INNER JOIN (
SELECT
MAX(created_at_utc) as max_created,
from_user_id,
to_user_id
FROM messages
GROUP BY from_user_id, to_user_id
) b ON a.created_at_utc = b.max_created
AND a.from_user_id = b.from_user_id
AND a.to_user_id = b.to_user_id
And you could append the where condition as you like.
THE SQL FIDDLE.
I don't think your original query was even doing this correctly. Not sure what the GROUP BY was being used for other than maybe try to only return a single (unpredictable) result.
Just add a limit clause:
SELECT * FROM `messages`
WHERE `type` = 1 AND
(`from_user_id` = 22 OR `to_user_id` = 22)
ORDER BY `created_at_utc` DESC
LIMIT 1
For optimum query performance you need indexes on the following fields:
type
from_user_id
to_user_id
created_at_utc

optimising and scaling mysql structure + queries for large mailing groups

So I have a system that stores contacts and allows them to be put into groups. These groups can be defined by criteria (everyone with surname 'smith'), or by explicitly adding / excluding people.
The problem I am having is that when I list the mailing groups, I need to count how many contacts are in each one. This number can change as contacts are added to / removed from the contacts table. On small groups / amounts of contacts it is fine, but with 50k-ish contacts it runs into problems.
An example query I use for this is as follows:
SELECT COUNT(c_id) FROM contacts, mgroups
LEFT JOIN mgroups_explicit ON mg_id = me_mg_id
WHERE mgroups.site_id = '10'
AND mg_id = '20'
AND me_c_id = c_id
AND contacts.site_id = '10'
OR (contacts.site_id = '10' AND ( c_tags LIKE '%tag1%')) AND c_id NOT IN
( SELECT mex_c_id FROM mgroups_exclude WHERE c_id = mex_c_id ) GROUP BY c_id
The criteria table does not feature in this query, as the problem presents itself when large groups are created explicitly, rather than with a criteria. This is required as criteria based groups grow or shrink on the fly as you modify your contacts, whereas explicit is generally set in stone. So in this case, if you explicitly add 20k contacts to a group, it adds 20k rows to the table marked with that mg_id as a foreign key.
This basically takes ages / times out / gets the wrong number / generally doesn't work very well. I either need to figure out a more efficient query, or figure out a better way to store everything.
Any ideas?
The 5 main tables that make up the database
contacts - where the actual contacts reside
Field Type Null Default Comments
c_id int(8) No
site_id int(6) No
c_email varchar(500) No
c_source varchar(255) No
c_subscribed tinyint(1) No 0
c_special tinyint(1) No 0
c_domain text No
c_title varchar(12) No
c_name varchar(128) No
c_surname varchar(128) No
c_company varchar(128) No
c_jtitle text No
c_ad1 text No
c_ad2 text No
c_ad3 text No
c_county varchar(64) No
c_city varchar(128) No
c_postcode varchar(32) No
c_lat varchar(100) No
c_lng varchar(100) No
c_country varchar(64) No
c_tel varchar(20) No
c_mob varchar(20) No
c_dob date No
c_registered datetime No
c_updated datetime No
c_twitter varchar(255) No
c_facebook varchar(255) No
c_tags text No
c_special_1 text No
c_special_2 text No
c_special_3 text No
c_special_4 text No
c_special_5 text No
c_special_6 text No
c_special_7 text No
c_special_8 text No
mgroups - basic mailing group info
Field Type Null Default Comments
mg_id int(8) No
site_id int(6) No
mg_name varchar(255) No
mg_created datetime No
mgroups_criteria - criteria for said mailing groups
Field Type Null Default Comments
mc_id int(8) No
site_id int(6) No
mc_mg_id int(8) No
mc_criteria text No
mgroups_exclude - anyone to exclude from criteria
Field Type Null Default Comments
mex_id int(8) No
site_id int(6) No
mex_c_id int(8) No
mex_mg_id int(8) No
mgroups_explicit - anyone to explicitly add without the use of criteria
Field Type Null Default Comments
me_id int(8) No
site_id int(6) No
me_c_id int(8) No
me_mg_id int(8) No
And the indexes / explain of query. Must admit, indexes are not my strong point, any improvements?
id  select_type         table             type  possible_keys  key       key_len  ref                     rows   Extra
1   PRIMARY             mgroups           ALL   PRIMARY,mg_id  NULL      NULL     NULL                    9      Using temporary; Using filesort
1   PRIMARY             mgroups_explicit  ref   me_mg_id       me_mg_id  4        engine_4.mgroups.mg_id  8750
1   PRIMARY             contacts          ALL   PRIMARY,c_id   NULL      NULL     NULL                    86012  Using where; Using join buffer
2   DEPENDENT SUBQUERY  NULL              NULL  NULL           NULL      NULL     NULL                    NULL   Impossible WHERE noticed after reading const table...
I don't see any indexes in the schema above, you do have indexes don't you?
run an explain on the query
EXPLAIN
SELECT COUNT(c_id) FROM
contacts, mgroups LEFT JOIN mgroups_explicit ON mg_id = me_mg_id
WHERE
mgroups.site_id = '10'
AND mg_id = '20'
AND me_c_id = c_id
AND contacts.site_id = '10'
OR (contacts.site_id = '10'
AND ( c_tags LIKE '%tag1%'))
AND c_id NOT IN (SELECT mex_c_id FROM mgroups_exclude WHERE c_id = mex_c_id ) GROUP BY c_id
That will tell you which indexes are being used, how many records it has to sort through, etc.
DC
Right, so I got this answered elsewhere (huge thanks to Hambut_Bulge), so for the sake of it being useful to anyone else, here's the solution:
First things off you're mixing old and new (ANSI) style joins in the same query. This is considered a bad idea in SQL circles. By old style I mean we write a query with a join along these lines
SELECT a.column_name, b.column2
FROM table1 a, second_table b
WHERE a.id_key = b.fid_key
AND b.some_other_criteria = 'Y';
In the newer ANSI style we'd rewrite the above to this:
SELECT a.column_name, b.column2
FROM table1 a INNER JOIN second_table b ON a.id_key = b.fid_key
WHERE b.some_other_criteria = 'Y';
It's neater and easier to read which bits are join conditions and which are where clauses. It's also best to get into the habit of using ANSI style as old style support may (at some point) be discontinued.
Also try and be consistent in your use of dot notation and/or aliases. Again it makes big queries easier to read.
Back to your problem query, I began by starting to convert it into ANSI style and straight away noticed that you don't have a join condition between contacts and mgroups. This means that the optimizer will create a cross join (also called a cartesian product), which is probably not something you want. The cross join (in case you didn't know) joins every row in the contacts table with every row in the mgroups table. So if you have 50,000 rows in contacts and 20,000 rows in mgroups you're going to get a joined result set containing 1,000,000,000 rows!
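The row explosion is easy to reproduce at a small scale. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with 50 contacts and 20 groups instead of 50,000 and 20,000; the row counts are invented for illustration. Note also that in the problem query AND binds tighter than OR, so the mg_id = '20' restriction only applies to the first branch and the OR branch pairs each matching contact with every group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (c_id INTEGER PRIMARY KEY, site_id INTEGER);
    CREATE TABLE mgroups (mg_id INTEGER PRIMARY KEY, site_id INTEGER);
""")
# 50 contacts and 20 groups, all on the same (hypothetical) site 10.
conn.executemany("INSERT INTO contacts VALUES (?, 10)", [(i,) for i in range(1, 51)])
conn.executemany("INSERT INTO mgroups VALUES (?, 10)", [(i,) for i in range(1, 21)])

# Comma join with no condition linking the tables: every pairing is produced.
cross_rows = conn.execute("SELECT count(*) FROM contacts, mgroups").fetchone()[0]

# Pinning the group to a single row shrinks the product to one row per contact.
pinned_rows = conn.execute(
    "SELECT count(*) FROM contacts, mgroups WHERE mg_id = 20"
).fetchone()[0]

print(cross_rows)   # 1000
print(pinned_rows)  # 50
```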
The other thing that is going to slow this query drastically is the subquery on mgroups_exclude. A correlated subquery is (conceptually) executed once for each row in the outer query, e.g.:
SELECT a.column1
FROM table1 a
WHERE a.id_key NOT IN ( SELECT b.fid_key FROM table2 b WHERE a.id_key = b.fid_key);
Assume that table1 has 2,000,000 rows and table2 has 500,000. In the worst case, with no usable index on table2, for each and every row in the outer query (table1) the database is going to have to do a full scan on the inner query. So to get a result the database will have read 1,000,000,000,000 rows and we may only be interested in 1,000!
To get around this we can use a left join (also called a left outer join) on the two tables.
SELECT a.column1
FROM table1 a LEFT JOIN table2 b ON a.id_key = b.fid_key
WHERE b.fid_key IS NULL;
An outer join does not require each record in the joined tables to have a matching record. So the example above we'd get all the records from table1 even if there is no match on table2. For non-matched records the database returns a NULL and we can test for that in the where clause. Now the optimizer can scan the indexes on the two tables id_key fields (assuming there are any), resulting in a much faster query.
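That the two forms select the same rows can be checked with a small sketch. It runs on Python's sqlite3 module as a stand-in; table1 and table2 are the placeholder names from the examples above, and the rows and the index name are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id_key INTEGER PRIMARY KEY, column1 TEXT);
    CREATE TABLE table2 (fid_key INTEGER);
    CREATE INDEX table2_fid_key ON table2 (fid_key);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO table2 VALUES (2), (4), (4);   -- 4 appears twice on purpose
""")

# Correlated NOT IN: keep rows of table1 that have no partner in table2.
via_not_in = conn.execute("""
    SELECT a.column1
    FROM table1 a
    WHERE a.id_key NOT IN (SELECT b.fid_key FROM table2 b WHERE a.id_key = b.fid_key)
    ORDER BY a.id_key
""").fetchall()

# Anti-join: LEFT JOIN, then keep only the rows where the join found nothing.
via_left_join = conn.execute("""
    SELECT a.column1
    FROM table1 a
    LEFT JOIN table2 b ON a.id_key = b.fid_key
    WHERE b.fid_key IS NULL
    ORDER BY a.id_key
""").fetchall()

print(via_not_in)     # [('a',), ('c',)]
print(via_left_join)  # [('a',), ('c',)]
```

Duplicate matches in table2 do no harm here, because only the unmatched rows survive the IS NULL test.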
So, to wrap up. I'd rewrite your original query thus:
SELECT COUNT( a.c_id )
FROM contacts a
INNER JOIN mgroups b ON a.c_id = b.mg_id
LEFT JOIN mgroups_explicit c ON b.mg_id = c.me_mg_id
LEFT JOIN mgroups_exclude d ON a.c_id = d.mex_c_id
WHERE b.mg_id = '20'
AND a.site_id = '10'
AND a.c_tags LIKE '%tag1%'
AND d.mex_c_id IS NULL
GROUP BY c_id;
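For comparison, here is a minimal sketch of a member count, under one reading of the original query: a contact belongs to group 20 if it was explicitly added, or if it matches the tag and is not excluded. That reading, the join conditions on me_c_id / mex_c_id, and the sample rows are assumptions, not something the thread confirms. The sketch runs on Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (c_id INTEGER PRIMARY KEY, site_id INTEGER, c_tags TEXT);
    CREATE TABLE mgroups_explicit (me_c_id INTEGER, me_mg_id INTEGER);
    CREATE TABLE mgroups_exclude (mex_c_id INTEGER, mex_mg_id INTEGER);
    INSERT INTO contacts VALUES
        (1, 10, 'tag1,tag2'),   -- matches the tag and is also explicitly added
        (2, 10, 'tag1'),        -- matches the tag but is excluded
        (3, 10, ''),            -- explicitly added only
        (4, 10, 'tag3'),        -- not a member
        (5, 11, 'tag1');        -- belongs to another site
    INSERT INTO mgroups_explicit VALUES (3, 20), (1, 20);
    INSERT INTO mgroups_exclude VALUES (2, 20);
""")

# Assumed membership rule: explicitly added, OR (tag match AND not excluded).
member_count = conn.execute("""
    SELECT count(DISTINCT a.c_id)
    FROM contacts a
    LEFT JOIN mgroups_explicit c ON c.me_c_id = a.c_id AND c.me_mg_id = :mg_id
    LEFT JOIN mgroups_exclude d ON d.mex_c_id = a.c_id AND d.mex_mg_id = :mg_id
    WHERE a.site_id = :site_id
      AND (c.me_c_id IS NOT NULL
           OR (a.c_tags LIKE '%tag1%' AND d.mex_c_id IS NULL))
""", {"mg_id": 20, "site_id": 10}).fetchone()[0]

print(member_count)  # 2 (contacts 1 and 3)
```

Joining the explicit and exclude tables on the contact id, with the group id as part of the ON condition, keeps one row per contact, so no cross join arises and count(DISTINCT ...) returns a single number instead of one row per contact.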