What is the PostgreSQL equivalent of this MySQL query?

What is the equivalent of this query in PostgreSQL?
Build the database and tables:
CREATE DATABASE IF NOT EXISTS lemonade;
use lemonade;
CREATE TABLE users (
id int PRIMARY KEY NOT NULL auto_increment,
name varchar(50) NOT NULL,
email varchar(50) NOT NULL
);
CREATE TABLE memories (
id int PRIMARY KEY NOT NULL auto_increment,
content varchar(50) NOT NULL,
userID int,
FOREIGN KEY (userID) REFERENCES users(id)
);
INSERT INTO users (name, email) VALUES ("Ruan", "ruan@gmail.com");
INSERT INTO users (name, email) VALUES ("Pereira", "pereira@gmail.com");
INSERT INTO memories (content, userID) VALUES ("memoria 1", 1);
INSERT INTO memories (content, userID) VALUES ("memoria 2", 1);
INSERT INTO memories (content, userID) VALUES ("memoria 3", 2);
INSERT INTO memories (content, userID) VALUES ("memoria 4", 2);
Query in MySQL:
select ANY_VALUE(m.id), ANY_VALUE(m.content), m.userID, ANY_VALUE(u.id), ANY_VALUE(u.name), ANY_VALUE(u.email) from memories m inner join users u on m.userID = u.id group by userID;
Result:
[Image: result of the MySQL query]
Query in PostgreSQL:
?
Expected result: the same as in the image above.

The ANY_VALUE function in MySQL is used when a query is doing a GROUP BY aggregation, but columns are being selected which are not mentioned in the GROUP BY clause nor appear inside aggregate functions. In the context of your query, this means that the only columns which can be selected are userID or another column inside an aggregate function like MAX or SUM. Technically, you can also select other columns from the users table, provided they are functionally dependent on userID. As the name implies, ANY_VALUE tells MySQL to return any value for that column from each group of records.
As far as I know, the value you get from ANY_VALUE is not guaranteed to be deterministic, so it is logically equivalent to picking an arbitrary value for that column from each group of records. Assuming you do not care which values you get back, in Postgres you could simply select the earliest memory for each user:
SELECT
memory_id, content, id, name, email
FROM
(
SELECT m.id AS memory_id, m.content, u.id, u.name, u.email,
ROW_NUMBER() OVER (PARTITION BY u.id ORDER BY m.id) rn
FROM memories m
INNER JOIN users u
ON m.userID = u.id
) t
WHERE rn = 1;
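The ROW_NUMBER() approach above can be tried locally with this minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, which assumes SQLite 3.25 or later because that version added window functions. The data is the sample data from the question. The only change to the query is an outer ORDER BY, added so the output order is stable.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL
);
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    userID INTEGER REFERENCES users(id)
);
INSERT INTO users (name, email) VALUES
    ('Ruan', 'ruan@gmail.com'), ('Pereira', 'pereira@gmail.com');
INSERT INTO memories (content, userID) VALUES
    ('memoria 1', 1), ('memoria 2', 1), ('memoria 3', 2), ('memoria 4', 2);
""")

# Number each user's memories by id, then keep only the first one per user.
rows = conn.execute("""
SELECT memory_id, content, id, name, email
FROM (
    SELECT m.id AS memory_id, m.content, u.id, u.name, u.email,
           ROW_NUMBER() OVER (PARTITION BY u.id ORDER BY m.id) AS rn
    FROM memories m
    INNER JOIN users u ON m.userID = u.id
) t
WHERE rn = 1
ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Unlike separate ANY_VALUE columns, every output row here comes from a single source row.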
I think that in general you should avoid using ANY_VALUE in MySQL unless you have no other choice. A better long-term solution would be to clean up the MySQL query and make it ANSI compliant. Then it would be straightforward to port it to another database.

I think you need the DISTINCT ON (expression [, ...]) clause.
The new query is:
select distinct on (u.id)
m.id, m.content, m.userID, u.id, u.name, u.email
from memories m
inner join users u on m.userID = u.id
order by u.id, m.id; -- without ORDER BY, which row is kept per user is unpredictable
Note: DISTINCT ON is a non-standard SQL clause (a PostgreSQL extension).
For more information, see
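SQLite and MySQL have no DISTINCT ON, so here is a plain-Python model of what the clause does. It is not how PostgreSQL implements it: the rows are sorted by the DISTINCT ON key and then by the ORDER BY tiebreaker, and only the first row of each key group is kept. The `distinct_on` helper is hypothetical. The rows are the joined sample data from the question, deliberately listed out of order.

```python
from itertools import groupby
from operator import itemgetter

# Joined rows as (memory_id, content, user_id, name, email), i.e. the
# memories/users join before DISTINCT ON is applied.
joined = [
    (2, "memoria 2", 1, "Ruan", "ruan@gmail.com"),
    (1, "memoria 1", 1, "Ruan", "ruan@gmail.com"),
    (4, "memoria 4", 2, "Pereira", "pereira@gmail.com"),
    (3, "memoria 3", 2, "Pereira", "pereira@gmail.com"),
]


def distinct_on(rows, key, order):
    """Model SELECT DISTINCT ON (key) ... ORDER BY key, order:
    sort, then keep the first row of each run of equal keys."""
    ordered = sorted(rows, key=lambda r: (key(r), order(r)))
    return [next(group) for _, group in groupby(ordered, key=key)]


# DISTINCT ON (u.id) ... ORDER BY u.id, m.id
result = distinct_on(joined, key=itemgetter(2), order=itemgetter(0))
for row in result:
    print(row)
```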

I just had this problem and solved it by using the MIN function, which also works on string columns. In your case it would give:
SELECT
m.userID,
min(m.id) as id,
min(m.content) as content,
min(u.id) as uid,
min(u.name) as name,
min(u.email) as email
FROM memories m
INNER JOIN users u on m.userID = u.id
GROUP BY userID;
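If you were ready to use ANY_VALUE, then you don't care which value is returned, and the minimum is as good as any other. Note that each MIN() is evaluated independently, so the values in one output row may come from different source rows (nothing guarantees otherwise with ANY_VALUE either).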
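The MIN() approach is portable because it only uses standard aggregation. This sketch runs it on the question's sample data, with SQLite (via Python's sqlite3) standing in for PostgreSQL. An ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database with the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL);
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    userID INTEGER REFERENCES users(id)
);
INSERT INTO users (name, email) VALUES
    ('Ruan', 'ruan@gmail.com'), ('Pereira', 'pereira@gmail.com');
INSERT INTO memories (content, userID) VALUES
    ('memoria 1', 1), ('memoria 2', 1), ('memoria 3', 2), ('memoria 4', 2);
""")

# Every non-grouped column is wrapped in MIN(), so the query is valid
# under strict GROUP BY rules.
rows = conn.execute("""
SELECT
    m.userID,
    MIN(m.id) AS id,
    MIN(m.content) AS content,
    MIN(u.id) AS uid,
    MIN(u.name) AS name,
    MIN(u.email) AS email
FROM memories m
INNER JOIN users u ON m.userID = u.id
GROUP BY m.userID
ORDER BY m.userID
""").fetchall()

for row in rows:
    print(row)
```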

Related

Slow SQL query with LEFT JOIN

I have already read similar questions, but they did not help me.
I have this query:
SELECT `login`,
`photo`,
`username`,
`user`.`id`,
`name`,
`msg_info`
FROM `user`
LEFT JOIN `friends`
ON `friends`.`child` = `user`.`fb_id`
WHERE `friends`.`parent` = '1111'
ORDER BY `msg_info` DESC
It takes 0.7411 seconds (and sometimes more).
It returns 158 rows in total (OK, I can limit it, but the query is still slow).
The friends and user tables each have more than 200,000 rows.
What can I do to make the query faster?
Thank you!
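As the comments pointed out, your LEFT JOIN is really no different from the following INNER JOIN query, because the WHERE condition on friends.parent discards every row whose friends columns were NULL-extended by the outer join: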
SELECT
login,
photo,
username,
u.id,
name,
msg_info
FROM user u
INNER JOIN friends f
ON f.child = u.fb_id
WHERE
f.parent = '1111'
ORDER BY
msg_info DESC;
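This sketch checks the LEFT JOIN versus INNER JOIN point, using SQLite (via Python's sqlite3) in place of MySQL. The question does not show the table definitions, so this schema is hypothetical. It assumes login, photo and username live on user, and name and msg_info live on friends. The rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, fb_id TEXT, login TEXT, photo TEXT, username TEXT);
CREATE TABLE friends (parent TEXT, child TEXT, name TEXT, msg_info TEXT);
INSERT INTO user (fb_id, login, photo, username) VALUES
    ('a', 'alice', 'a.png', 'Alice'), ('b', 'bob', 'b.png', 'Bob'),
    ('c', 'carol', 'c.png', 'Carol'), ('d', 'dave', 'd.png', 'Dave');
INSERT INTO friends (parent, child, name, msg_info) VALUES
    ('1111', 'a', 'Alice A', '2'), ('1111', 'b', 'Bob B', '1'),
    ('2222', 'c', 'Carol C', '3');
""")

# Without a WHERE clause, the LEFT JOIN keeps dave with NULL friends columns.
unfiltered = conn.execute("""
SELECT u.login, f.parent FROM user u
LEFT JOIN friends f ON f.child = u.fb_id ORDER BY u.id
""").fetchall()

cols = "u.login, u.photo, u.username, u.id, f.name, f.msg_info"
where = "WHERE f.parent = '1111' ORDER BY f.msg_info DESC"

# f.parent = '1111' is never true for NULL-extended rows, so both queries match.
left = conn.execute(
    f"SELECT {cols} FROM user u LEFT JOIN friends f ON f.child = u.fb_id {where}"
).fetchall()
inner = conn.execute(
    f"SELECT {cols} FROM user u INNER JOIN friends f ON f.child = u.fb_id {where}"
).fetchall()

print(unfiltered)
print(left == inner, left)
```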
We can try adding an index to the friends table on (parent, child, name, msg_info, ...). I am not sure which other columns belong to friends, but the basic idea is to create an index on parent, to speed up the WHERE clause, and hopefully take advantage of only a few rows matching a given parent value. Then, we include the child column to speed up the join. We also include all the other columns in the select clause to let the index cover the other columns we need.
CREATE INDEX idx ON friends (parent, child, name, msg_info, ...);
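You can check whether a composite index like this is picked up by asking the engine for its plan. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 as a stand-in; in MySQL you would use EXPLAIN, and its planner may choose differently. The schema is the same hypothetical one as above, and the "..." in the index is replaced by the assumed friends columns name and msg_info.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, fb_id TEXT, login TEXT, photo TEXT, username TEXT);
CREATE TABLE friends (parent TEXT, child TEXT, name TEXT, msg_info TEXT);
-- parent first (WHERE filter), then child (join column), then the
-- selected friends columns so the index can cover them.
CREATE INDEX idx ON friends (parent, child, name, msg_info);
""")

plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT u.login, u.photo, u.username, u.id, f.name, f.msg_info
FROM user u
INNER JOIN friends f ON f.child = u.fb_id
WHERE f.parent = '1111'
ORDER BY f.msg_info DESC
""").fetchall()

# The last column of each plan row describes one step of the plan.
details = [row[-1] for row in plan]
for d in details:
    print(d)
print("uses idx:", any("idx" in d for d in details))
```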
As @MrVimes suggested, sometimes adding a condition to the JOIN clause can make a big difference:
SELECT login, photo, username, u.id, name, msg_info
FROM user u
INNER JOIN friends f ON f.child = u.fb_id AND f.parent = '1111'
ORDER BY msg_info DESC;
Assuming, of course, all your PK and FKs are properly defined and indexed.

MySQL optimize a union-query by using a join-query instead

I have 3 tables: one for users, one for their incoming payments, and one for their outgoing payments. I want to display all incoming and outgoing payments in a single result set. I can do this with multiple selects and a union, but it seems cumbersome, and I suspect it's slow due to the subqueries, and the tables are extremely large (though I am using indexes). Is there a faster way to achieve this? Maybe using a full outer join?
Here is a simplified version of the schema with some example data:
create table users (
id int auto_increment,
name varchar(20),
primary key (id)
) engine=InnoDB;
insert into users (name) values ('bob'),('fred');
create table user_incoming_payments (
user_id int,
funds_in int
) engine=InnoDB;
insert into user_incoming_payments
values (1,100),(1,101),(1,102),(1,103),
(2,200),(2,201),(2,202),(2,203);
create table user_outgoing_payments (
user_id int,
funds_out int
) engine=InnoDB;
insert into user_outgoing_payments
values (1,100),(1,101),(2,200),(2,201);
And here is the ugly looking query which generates the result I want for user bob:
select * from (
(select u.name, i.funds_in, 0 as 'funds_out' from users u
inner join user_incoming_payments i on u.id = i.user_id)
union
(select u.name, 0 as 'funds_in', o.funds_out from users u
inner join user_outgoing_payments o on u.id = o.user_id)
) a where a.name = 'bob'
order by a.funds_in asc, a.funds_out asc;
And here is as close as I can get to doing the same thing with joins. It's not correct, though, because I want this result set to look the same as the previous one, and I wasn't sure how to use a full outer join:
select *
from users u
right join user_incoming_payments i on u.id = i.user_id
right join user_outgoing_payments o on u.id = o.user_id
where u.name = 'bob';
MySQL doesn't support FULL OUTER JOIN. Even if it did, I don't think you would want it, as it would introduce a partial Cartesian product: each row from incoming_ would match every row from outgoing_ for the same user, creating extra rows.
If there were four rows from incoming_ and six rows from outgoing_ for a user, the set produced by a join operation would contain 24 rows for that user.
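The multiplication is easy to see with the question's own sample data. This sketch uses SQLite via Python's sqlite3 as a stand-in for MySQL. bob has 4 incoming and 2 outgoing payments, so joining both tables to bob yields 4 × 2 = 8 rows, and each incoming amount appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO users (name) VALUES ('bob'), ('fred');
CREATE TABLE user_incoming_payments (user_id INTEGER, funds_in INTEGER);
INSERT INTO user_incoming_payments VALUES
    (1,100),(1,101),(1,102),(1,103),(2,200),(2,201),(2,202),(2,203);
CREATE TABLE user_outgoing_payments (user_id INTEGER, funds_out INTEGER);
INSERT INTO user_outgoing_payments VALUES (1,100),(1,101),(2,200),(2,201);
""")

# Every incoming row of bob is paired with every outgoing row of bob.
joined = conn.execute("""
SELECT u.name, i.funds_in, o.funds_out
FROM users u
JOIN user_incoming_payments i ON i.user_id = u.id
JOIN user_outgoing_payments o ON o.user_id = u.id
WHERE u.name = 'bob'
""").fetchall()

print(len(joined))  # 4 incoming x 2 outgoing
```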
This really looks more like you want a set concatenation operation. That is, you have two separate sets that you want to concatenate together. That's not a JOIN operation. That's a UNION ALL set operation.
SELECT ... FROM ...
UNION ALL
SELECT ... FROM ...
You don't need to remove duplicates here: if there are multiple rows in incoming_ with the same value of funds_in, I don't think you want to remove any of them.
So use the UNION ALL set operator, which does not check for and remove duplicate rows.
The UNION operator removes duplicate rows, which (again) I don't think you want.
The derived table isn't necessary.
And MySQL doesn't "push" the predicate from the outer query into the inline view. That means MySQL is going to materialize a derived table with all incoming and outgoing payments for all users, and the outer query is then going to look through that to find the rows. And until the most recent versions of MySQL, no indexes were created on derived tables.
See the answer from Strawberry for an example of a more efficient query.
With the small example set, indexes aren't going to make any difference. With a large set, however, you are going to want to add appropriate covering indexes.
Also, with queries like this, I tend to include a discriminator column that tells me which query returned a row.
(
SELECT 'i' AS src
, ...
FROM ...
)
UNION ALL
(
SELECT 'o' AS src
, ...
FROM ...
)
ORDER BY ...
With this model, I'd probably write that query as follows, but I doubt it makes much difference...
select u.name
, i.funds_in
, 0 funds_out
from users u
join user_incoming_payments i
on u.id = i.user_id
where u.name = 'bob'
union all
select u.name
, 0 funds_in
, o.funds_out
from users u
join user_outgoing_payments o
on u.id = o.user_id
where u.name = 'bob'
order
by funds_in asc
, funds_out asc;
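Here is a runnable sketch of the UNION ALL query combined with the 'i'/'o' discriminator column suggested earlier, using SQLite via Python's sqlite3 as a stand-in for MySQL. SQLite does not accept parentheses around the individual SELECTs of a compound query, so they are omitted here. The data is the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO users (name) VALUES ('bob'), ('fred');
CREATE TABLE user_incoming_payments (user_id INTEGER, funds_in INTEGER);
INSERT INTO user_incoming_payments VALUES
    (1,100),(1,101),(1,102),(1,103),(2,200),(2,201),(2,202),(2,203);
CREATE TABLE user_outgoing_payments (user_id INTEGER, funds_out INTEGER);
INSERT INTO user_outgoing_payments VALUES (1,100),(1,101),(2,200),(2,201);
""")

# Each branch filters on the user itself; 'src' records which branch a row came from.
rows = conn.execute("""
SELECT 'i' AS src, u.name, i.funds_in, 0 AS funds_out
FROM users u JOIN user_incoming_payments i ON u.id = i.user_id
WHERE u.name = 'bob'
UNION ALL
SELECT 'o' AS src, u.name, 0 AS funds_in, o.funds_out
FROM users u JOIN user_outgoing_payments o ON u.id = o.user_id
WHERE u.name = 'bob'
ORDER BY funds_in ASC, funds_out ASC
""").fetchall()

for row in rows:
    print(row)
```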
However, note that there's no PK here, which may prove problematic.
If it were me, I'd have one table for transactions, which would include a transaction_id PK, a timestamp for each transaction, and a column to record whether a value was a credit or a debit.
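A sketch of that single-table design, run with SQLite via Python's sqlite3. The table name transactions and its columns (created_at, direction, amount) are hypothetical, and so are the timestamps in the sample rows. Once payments live in one table, listing a user's history needs no UNION, and a balance is one conditional SUM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO users (name) VALUES ('bob'), ('fred');
-- Hypothetical table replacing the incoming/outgoing payment tables.
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    created_at TEXT NOT NULL,
    direction TEXT NOT NULL CHECK (direction IN ('credit', 'debit')),
    amount INTEGER NOT NULL
);
INSERT INTO transactions (user_id, created_at, direction, amount) VALUES
    (1, '2015-01-01 10:00', 'credit', 100),
    (1, '2015-01-02 10:00', 'debit', 100),
    (1, '2015-01-03 10:00', 'credit', 101),
    (2, '2015-01-01 11:00', 'credit', 200);
""")

# One plain query lists bob's history in time order.
rows = conn.execute("""
SELECT t.transaction_id, t.created_at, t.direction, t.amount
FROM transactions t
JOIN users u ON u.id = t.user_id
WHERE u.name = 'bob'
ORDER BY t.created_at
""").fetchall()

# Credits add, debits subtract.
balance = conn.execute("""
SELECT SUM(CASE WHEN direction = 'credit' THEN amount ELSE -amount END)
FROM transactions WHERE user_id = 1
""").fetchone()[0]

print(rows)
print(balance)
```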

Using an SQL LEFT JOIN with the MAX() and MIN() functions

Let's assume I have the following two tables:
CREATE TABLE users (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
name CHAR(30) NOT NULL,
PRIMARY KEY (id)
) ENGINE=MyISAM;
CREATE TABLE logins (
user_id MEDIUMINT NOT NULL,
day DATE NOT NULL,
PRIMARY KEY (`user_id`, `day`)
) ENGINE=MyISAM;
What I'm trying to do here is get a query for all users with the first day they logged in and the last day they logged in. The query I was executing to achieve this looks like the following:
SELECT u.id AS id, u.name AS name, MIN(l.day) AS first_login,
MAX(l.day) AS last_login
FROM users u
LEFT JOIN logins l ON u.id = l.user_id
The problem is that because of the use of MIN() and MAX(), I'm only receiving one row back in the entire result. I'm sure it's my use of those functions that's causing this. I should have one row per user, even if they do not have any login entries. This is the reason for me using a LEFT JOIN vs an INNER JOIN.
To use aggregate functions (MIN, MAX, ...) per user, you need grouping. Try something like this:
SELECT u.id AS id, u.name AS name, MIN(l.day) AS first_login, MAX(l.day) AS last_login
FROM users u
LEFT JOIN logins l ON u.id = l.user_id
GROUP BY u.id
Any sensible database except MySQL would have given you an error on mixing row terms and aggregate terms, making the mistake clearer. MySQL unfortunately allows this behavior (unless the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default as of MySQL 5.7.5), making it harder to notice that you forgot the GROUP BY clause needed to generate a row per user:
SELECT u.id AS id,
u.name AS name,
MIN(l.day) AS first_login,
MAX(l.day) AS last_login
FROM users u
LEFT JOIN logins l ON u.id = l.user_id
GROUP BY u.id, u.name -- missing in the original query
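This sketch shows the difference the GROUP BY makes. It uses SQLite via Python's sqlite3 as a stand-in for MySQL; SQLite, like permissive MySQL, accepts the ungrouped query. The users and login days are made-up sample data, and DATE values are stored as ISO-8601 text, which sorts chronologically. Without GROUP BY there is one row in total. With it there is one row per user, and the user who never logged in gets NULLs (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE logins (
    user_id INTEGER NOT NULL,
    day TEXT NOT NULL,
    PRIMARY KEY (user_id, day)
);
INSERT INTO users (name) VALUES ('ann'), ('ben'), ('cid');
INSERT INTO logins VALUES
    (1, '2015-01-03'), (1, '2015-01-10'), (1, '2015-01-07'), (2, '2015-02-01');
""")

# The original query: aggregates with no GROUP BY collapse everything to one row.
ungrouped = conn.execute("""
SELECT u.id, u.name, MIN(l.day), MAX(l.day)
FROM users u LEFT JOIN logins l ON u.id = l.user_id
""").fetchall()

# The fixed query: one row per user, including users without logins.
grouped = conn.execute("""
SELECT u.id, u.name, MIN(l.day) AS first_login, MAX(l.day) AS last_login
FROM users u
LEFT JOIN logins l ON u.id = l.user_id
GROUP BY u.id, u.name
ORDER BY u.id
""").fetchall()

print(len(ungrouped))
for row in grouped:
    print(row)
```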
Grouping can be a waste of resources here.
Use nested SELECT statements (correlated subqueries) instead.
For example:
SELECT
u.id AS id,
u.name AS name,
(
SELECT MIN(logins.day) FROM logins WHERE logins.user_id = u.id
) AS first_login,
(
SELECT MAX(logins.day) FROM logins WHERE logins.user_id = u.id
) AS last_login
FROM users u;
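This sketch checks the correlated-subquery form against the GROUP BY form on the same made-up data. It uses SQLite via Python's sqlite3 as a stand-in for MySQL and computes both first_login and last_login. The two queries return the same rows, including NULLs for a user without logins. Which one is faster depends on the engine and indexes (here, the (user_id, day) primary key); this sketch does not measure that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE logins (user_id INTEGER NOT NULL, day TEXT NOT NULL, PRIMARY KEY (user_id, day));
INSERT INTO users (name) VALUES ('ann'), ('ben'), ('cid');
INSERT INTO logins VALUES
    (1, '2015-01-03'), (1, '2015-01-10'), (1, '2015-01-07'), (2, '2015-02-01');
""")

# One correlated subquery per aggregate; no GROUP BY needed.
rows_sub = conn.execute("""
SELECT u.id, u.name,
       (SELECT MIN(l.day) FROM logins l WHERE l.user_id = u.id) AS first_login,
       (SELECT MAX(l.day) FROM logins l WHERE l.user_id = u.id) AS last_login
FROM users u
ORDER BY u.id
""").fetchall()

rows_grp = conn.execute("""
SELECT u.id, u.name, MIN(l.day), MAX(l.day)
FROM users u LEFT JOIN logins l ON u.id = l.user_id
GROUP BY u.id, u.name
ORDER BY u.id
""").fetchall()

print(rows_sub == rows_grp)
```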
MIN and MAX are aggregate functions.
You should use GROUP BY with some field from u, like id.

Finding newest or flagged row in a one to many relationship in MySQL

Sorry if the title is poor, I'm trying to do a join from a primary table to a related table and only return a single row that matches a flagged or the most recent row in the related table.
Primary table, let's call it group, has columns:
id, name
Related table, let's call it user, has columns:
id, group_id, email, default, updated_datetime
The query should return a single 'group', and either the matching record from 'user' with default=1 (preferred), or, if no rows have default=1, the row with MAX(updated_datetime). 'user' will have 1-n rows with group.id = user.group_id.
Example result:
group.id, group.name, user.email, user.default, user.updated_datetime
1, 'test', 'email', '0', '2015-06-10 12:00'
2, 'other', 'email', '1', '2015-06-08 10:00'
Both tables contain a lot of data, so I'd prefer to do this with joins and not subqueries. I know how to do it with subqueries, but I'm having trouble doing it with joins since the grouping has multiple conditions.
I'm fine with doing aliases, eg:
select a.* from (query) as a
I'm trying to avoid subqueries in the main select, eg:
select group.id, group.name, (query) as user.id
Thanks!
First, you will need to use two joins and an IF() in your SELECT to pick either the default user or the last updated user.
Then, because you want to join using a MAX() condition, you will need subqueries.
Something like that should get you on the right track:
-- GROUP and DEFAULT are reserved words in MySQL, so they are quoted with backticks
select `group`.id, if(default_user.id is not null, default_user.id, last_updated_user.id) as user_id
from `group`
left join user default_user on default_user.group_id = `group`.id and default_user.`default` = 1
join (
select user.* from (
select group_id, max(updated_datetime) as updated_datetime from user group by group_id
) as max_per_group join user using(group_id, updated_datetime)
) as last_updated_user on last_updated_user.group_id = `group`.id;
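This sketch runs the answer's query with SQLite (via Python's sqlite3) standing in for MySQL. The reserved words `group` and `default` are quoted with backticks, which SQLite also accepts. MySQL's IF(x IS NOT NULL, x, y) is swapped for the equivalent standard COALESCE(x, y). The user rows are made-up sample data. Group 1 has no default user, so its most recently updated user (id 1) is chosen. Group 2 has a default user (id 3), which wins even though user 4 was updated later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `group` (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    group_id INTEGER REFERENCES `group`(id),
    email TEXT,
    `default` INTEGER NOT NULL DEFAULT 0,
    updated_datetime TEXT
);
INSERT INTO `group` VALUES (1, 'test'), (2, 'other');
INSERT INTO user VALUES
    (1, 1, 'a@example.com', 0, '2015-06-10 12:00'),
    (2, 1, 'b@example.com', 0, '2015-06-01 09:00'),
    (3, 2, 'c@example.com', 1, '2015-06-08 10:00'),
    (4, 2, 'd@example.com', 0, '2015-06-09 10:00');
""")

# Prefer the default user; otherwise fall back to the latest-updated user.
rows = conn.execute("""
SELECT `group`.id,
       COALESCE(default_user.id, last_updated_user.id) AS user_id
FROM `group`
LEFT JOIN user default_user
       ON default_user.group_id = `group`.id AND default_user.`default` = 1
JOIN (
    SELECT user.* FROM (
        SELECT group_id, MAX(updated_datetime) AS updated_datetime
        FROM user GROUP BY group_id
    ) AS max_per_group
    JOIN user USING (group_id, updated_datetime)
) AS last_updated_user ON last_updated_user.group_id = `group`.id
ORDER BY `group`.id
""").fetchall()

print(rows)
```

If two users in a group share the same maximum updated_datetime, this query returns more than one row for that group.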
But note that this isn't a fast query, and if you need to do it frequently, you might want to refactor your tables. For example, you could add foreign keys default_user_id and last_updated_user_id in the group table.

MySQL - return particular row if another joined row has a certain characteristic

Ok, so I've been working on this query and I'm pretty sure there has to be a better way of doing it rather than my current method which is nesting statement after statement within each other.
Here are my 2 main tables (rewritten quickly so they might not be perfect 'create table' code)
CREATE TABLE `person` (
`personid` INT PRIMARY KEY AUTO_INCREMENT,
`personuuid` VARCHAR(64),
`flags` INT,
`last_updated` DATETIME,
`first_name` VARCHAR(64),
`last_name` VARCHAR(64)
);
CREATE TABLE `person_status` (
`person_statusid` INT PRIMARY KEY AUTO_INCREMENT,
`person_statusesid` INT,
`personuuid` VARCHAR(64),
`groupsuuid` VARCHAR(64),
`start_date` DATE
);
Quick description: As you can see, an individual person can (potentially) have multiple statuses. The most current status is determined by the start date (whichever has the greatest start date). Each person can also have multiple entries in the person table (this helps keep track of updates to user information). The most up-to-date information is the entry with the most recent last_updated value.
What I'm trying to do: I'm trying to get a list of all persons (aka personuuid) such that:
The entry that comes back to us has the most recent person status
The statuses only show up for a certain groupsuuid (aka filtered by group)
We also need additional filtering such that the name is filtered
We also need to make sure that flags is equal to 0
Here is the query I am running:
SELECT personuuid FROM (
SELECT personuuid, flags FROM (
SELECT DISTINCT personuuid, flags
FROM person
JOIN (
SELECT personuuid, t2.person_statusesid FROM (
SELECT personuuid, groupsuuid, t1.person_statusesid FROM (
SELECT personuuid, groupsuuid, p1.person_statusesid
FROM person_status as p1
ORDER BY start_date DESC, person_statusid DESC
) as t1
GROUP BY personuuid, groupsuuid
) as t2
WHERE groupsuuid='xxxxxxxxxx' AND person_statusesid = X
) AS t3 USING (personuuid)
WHERE (first_name LIKE '%TEST%' OR last_name LIKE '%TEST%')
ORDER BY person_statusesid, last_name, first_name, last_updated DESC
) as t4
GROUP BY personuuid
) as t5
WHERE flags <> 2;
As you can see what I'm doing is joining the person table with a table that orders the person status and then filters by group and statusesid. Then I am taking the two joined tables, filtering by names, and then I'm attempting to grab the most recent person row and making sure that it has the proper flag.
I figured since this seems like a fun puzzle, I'd ask you guys to see if someone can come up with a decent solution. I'm not an expert at these things, and I only know the more basic SQL commands, so any advice would be helpful. Thanks =)
I would approach this by bringing together the data that I need in one place. That is, the most recent status record and the most recent person record. This requires finding the most recent updated date for the two tables and then joining various tables and subqueries together:
select ps.*, p.*
from person_status ps join
-- most recent status per person (person_status is dated by start_date)
(select personuuid, max(start_date) as maxsd
from person_status
group by personuuid
) psm
on psm.personuuid = ps.personuuid and psm.maxsd = ps.start_date join
-- most recent person row per person
(select personuuid, max(last_updated) as maxlu
from person
group by personuuid
) pm
on pm.personuuid = ps.personuuid join
person p
on pm.personuuid = p.personuuid and pm.maxlu = p.last_updated
where p.flags = 0 and
ps.groupsuuid in (<list of groups>) and
additional filtering;
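This sketch exercises the group-wise-max join above using SQLite via Python's sqlite3 as a stand-in for MySQL. "<list of groups>" is filled with a hypothetical group 'g1', and "additional filtering" with the question's name filter on '%TEST%'. The person and status rows are made-up sample data. Person p1 has an older row and a newer one, plus two statuses; only the newest of each survives. p2 is removed by the name filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (personid INTEGER PRIMARY KEY, personuuid TEXT, flags INTEGER,
                     last_updated TEXT, first_name TEXT, last_name TEXT);
CREATE TABLE person_status (person_statusid INTEGER PRIMARY KEY, person_statusesid INTEGER,
                            personuuid TEXT, groupsuuid TEXT, start_date TEXT);
INSERT INTO person (personuuid, flags, last_updated, first_name, last_name) VALUES
    ('p1', 2, '2015-01-01 00:00', 'Test', 'Old'),
    ('p1', 0, '2015-02-01 00:00', 'Test', 'New'),
    ('p2', 0, '2015-01-15 00:00', 'Jane', 'Doe');
INSERT INTO person_status (person_statusesid, personuuid, groupsuuid, start_date) VALUES
    (5, 'p1', 'g1', '2015-01-01'), (7, 'p1', 'g1', '2015-03-01'), (5, 'p2', 'g1', '2015-01-01');
""")

rows = conn.execute("""
SELECT ps.personuuid, ps.person_statusesid, p.first_name, p.last_name, p.flags
FROM person_status ps
JOIN (SELECT personuuid, MAX(start_date) AS maxsd
      FROM person_status GROUP BY personuuid) psm
  ON psm.personuuid = ps.personuuid AND psm.maxsd = ps.start_date
JOIN (SELECT personuuid, MAX(last_updated) AS maxlu
      FROM person GROUP BY personuuid) pm
  ON pm.personuuid = ps.personuuid
JOIN person p
  ON pm.personuuid = p.personuuid AND pm.maxlu = p.last_updated
WHERE p.flags = 0
  AND ps.groupsuuid IN ('g1')
  AND (p.first_name LIKE '%TEST%' OR p.last_name LIKE '%TEST%')
""").fetchall()

print(rows)
```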
If you are going to have a data structure with an "effective date" for records (last_updated in person, start_date in person_status), you should consider having an "end date" as well. This is a bit more work on update (it typically requires a trigger), but it makes queries such as this much, much easier.
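One way to maintain an end date is with a trigger, sketched here in SQLite via Python's sqlite3. The end_date column and the close_previous_status trigger are hypothetical, and trigger syntax differs in MySQL. The sketch assumes statuses are inserted in start_date order. Each insert closes the person's previously open status, so "current status" becomes a simple end_date IS NULL filter with no MAX() subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_status (
    person_statusid INTEGER PRIMARY KEY,
    person_statusesid INTEGER,
    personuuid TEXT,
    groupsuuid TEXT,
    start_date TEXT NOT NULL,
    end_date TEXT  -- hypothetical column: NULL means "current"
);
-- Hypothetical trigger: closing the previous open status on each insert.
-- Assumes rows arrive in start_date order.
CREATE TRIGGER close_previous_status
AFTER INSERT ON person_status
FOR EACH ROW
BEGIN
    UPDATE person_status
    SET end_date = NEW.start_date
    WHERE personuuid = NEW.personuuid
      AND person_statusid <> NEW.person_statusid
      AND end_date IS NULL;
END;
INSERT INTO person_status (person_statusesid, personuuid, groupsuuid, start_date)
    VALUES (5, 'p1', 'g1', '2015-01-01');
INSERT INTO person_status (person_statusesid, personuuid, groupsuuid, start_date)
    VALUES (7, 'p1', 'g1', '2015-03-01');
INSERT INTO person_status (person_statusesid, personuuid, groupsuuid, start_date)
    VALUES (5, 'p2', 'g2', '2015-01-01');
""")

current = conn.execute("""
SELECT personuuid, person_statusesid
FROM person_status
WHERE end_date IS NULL
ORDER BY personuuid
""").fetchall()

closed = conn.execute(
    "SELECT end_date FROM person_status WHERE person_statusid = 1"
).fetchone()[0]

print(current, closed)
```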
EDIT:
Your original query has a fair number of subqueries in it. In any case, you can rewrite this without the derived-table subqueries to simplify matters. For the following, indexes on (personuuid, last_updated) in person and (personuuid, start_date) in person_status would be needed for performance.
select ps.*, p.*
from person_status ps join
person p
on ps.personuuid = p.personuuid and
p.last_updated = (select p2.last_updated
from person p2
where p2.personuuid = p.personuuid
order by p2.last_updated desc
limit 1
) and
ps.start_date = (select ps2.start_date
from person_status ps2
where ps2.personuuid = ps.personuuid
order by ps2.start_date desc
limit 1
)
where p.flags = 0 and
ps.groupsuuid in (<list of groups>) and
additional filtering;
In this version, the subqueries are correlated and will turn into index lookups.
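This sketch runs the correlated-subquery version with SQLite via Python's sqlite3 as a stand-in for MySQL, on the same kind of made-up sample data. It creates the two (personuuid, date) indexes mentioned above. The group 'g1' and the '%TEST%' name filter are hypothetical fill-ins for the placeholders in the query. Each correlated subquery picks the latest date for one person, so only p1's newest person row and newest status remain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (personid INTEGER PRIMARY KEY, personuuid TEXT, flags INTEGER,
                     last_updated TEXT, first_name TEXT, last_name TEXT);
CREATE TABLE person_status (person_statusid INTEGER PRIMARY KEY, person_statusesid INTEGER,
                            personuuid TEXT, groupsuuid TEXT, start_date TEXT);
-- Indexes that let the correlated subqueries become index lookups.
CREATE INDEX person_uuid_updated ON person (personuuid, last_updated);
CREATE INDEX status_uuid_start ON person_status (personuuid, start_date);
INSERT INTO person (personuuid, flags, last_updated, first_name, last_name) VALUES
    ('p1', 2, '2015-01-01 00:00', 'Test', 'Old'),
    ('p1', 0, '2015-02-01 00:00', 'Test', 'New'),
    ('p2', 0, '2015-01-15 00:00', 'Jane', 'Doe');
INSERT INTO person_status (person_statusesid, personuuid, groupsuuid, start_date) VALUES
    (5, 'p1', 'g1', '2015-01-01'), (7, 'p1', 'g1', '2015-03-01'), (5, 'p2', 'g1', '2015-01-01');
""")

rows = conn.execute("""
SELECT ps.personuuid, ps.person_statusesid, p.first_name, p.last_name
FROM person_status ps
JOIN person p
  ON ps.personuuid = p.personuuid
 AND p.last_updated = (SELECT p2.last_updated FROM person p2
                       WHERE p2.personuuid = p.personuuid
                       ORDER BY p2.last_updated DESC LIMIT 1)
 AND ps.start_date = (SELECT ps2.start_date FROM person_status ps2
                      WHERE ps2.personuuid = ps.personuuid
                      ORDER BY ps2.start_date DESC LIMIT 1)
WHERE p.flags = 0
  AND ps.groupsuuid IN ('g1')
  AND (p.first_name LIKE '%TEST%' OR p.last_name LIKE '%TEST%')
""").fetchall()

print(rows)
```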