How to grab rows which contain a dual id reference - mysql

I have a messages table:
messages:
id(int)
send_id(int)
receive_id(int)
And I want to be able to select rows from this only when a->b and b->a exist, e.g.:
id send_id receive_id
0, 15, 16
1, 16, 15
So that basically one message has been passed to each person. How would I be able to go about selecting just one of those two rows (either send or receive), and all of those for a specific id.
I want to only return results that have this duality.
My code currently uses a nested SELECT and doesn't work at all as needed.

You can achieve the result by taking advantage of MySQL's LEAST and GREATEST built-in functions.
SELECT *
FROM messages
WHERE (LEAST(send_id, receive_id), GREATEST(send_id, receive_id), id)
IN
(
SELECT LEAST(send_id, receive_id) as x,
GREATEST(send_id, receive_id) as y,
MAX(id) msg_ID
FROM messages
GROUP BY x, y
);
SQLFiddle Demo
MySQL Comparison Operator (LEAST/GREATEST)
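If the requirement that both directions exist has to be enforced explicitly, one possible approach is a self-join that looks for the reverse message. The sketch below is only an illustration: it runs the SQL through Python's sqlite3 module as a stand-in for MySQL (the SELECT itself uses only portable SQL), the sample rows are made up, and reciprocal_pairs and demo_connection are hypothetical helper names.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL here; the SELECT is plain portable SQL.
RECIPROCAL_PAIRS = """
SELECT DISTINCT m.send_id, m.receive_id
FROM messages AS m
JOIN messages AS r
  ON r.send_id = m.receive_id      -- a message exists in the
 AND r.receive_id = m.send_id      -- opposite direction
WHERE m.send_id < m.receive_id     -- report each pair once
  AND (m.send_id = :uid OR m.receive_id = :uid)
ORDER BY m.send_id, m.receive_id
"""


def reciprocal_pairs(conn, uid):
    """Return (low_id, high_id) pairs involving uid with messages both ways."""
    return conn.execute(RECIPROCAL_PAIRS, {"uid": uid}).fetchall()


def demo_connection():
    """Build an in-memory table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, send_id INTEGER, receive_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [
            (0, 15, 16),
            (1, 16, 15),  # 15 and 16 have written to each other
            (2, 15, 17),  # 17 never answered
            (3, 18, 15),
            (4, 15, 18),  # 15 and 18 have written to each other
            (5, 16, 15),  # repeated message, must not duplicate the pair
        ],
    )
    return conn
```

Calling reciprocal_pairs(demo_connection(), 15) on this sample data yields the pairs (15, 16) and (15, 18); the one-way conversation with 17 is left out.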

You have to define an additional synthesized column for this. There are different alternatives: permanent and indexed (fast), temporary if you only need it for a selection once a month, or computed on the fly inside the actual query.
Whichever alternative you choose, that column should contain both ids, sorted numerically and concatenated, perhaps with a separator character such as -. When you then apply a uniqueness restriction to that column, only one of the two candidates can enter the result; the second is rejected because it would violate the uniqueness rule.
The trick is the ordered concatenation, as opposed to a normal combined index, which would allow both variants because the ids appear in a different order.
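To make the ordered concatenation concrete, here is a minimal sketch in Python rather than SQL. The names pair_key and one_row_per_pair are hypothetical, and a dictionary plays the role of the uniqueness restriction. Note that this only collapses a->b and b->a into one key; by itself it does not check that both directions are present.

```python
def pair_key(send_id, receive_id):
    """Build the synthesized value: both ids, numerically ordered, joined by '-'."""
    lo, hi = sorted((send_id, receive_id))
    return f"{lo}-{hi}"


def one_row_per_pair(rows):
    """Keep the first (id, send_id, receive_id) row seen for each pair key."""
    seen = {}
    for row in rows:
        _msg_id, send_id, receive_id = row
        # setdefault ignores later rows whose key is already taken,
        # mimicking a rejected insert under a uniqueness rule.
        seen.setdefault(pair_key(send_id, receive_id), row)
    return list(seen.values())
```

Sorting the ids as numbers before joining them matters: 9 and 10 become "9-10" whichever of the two was the sender.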

Related

Not selecting duplicates in join / where query

I've been trying to learn MySQL, and I'm having some trouble creating a join query to not select duplicates.
Basically, here's where I'm at :
SELECT atable.phonenumber, btable.date
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
However, in my database, there is the possibility of having duplicate rows in column atable.phonenumber.
For example (added asterisks for clarity)
phonenumber | date
-------------|-----------
*555-681-2105 | 2015-08-12
555-425-5161 | 2015-08-15
331-484-7784 | 2015-08-17
*555-681-2105 | 2015-08-25
.. and so on.
I tried using SELECT DISTINCT but that doesn't work. I also was looking through other solutions which recommended GROUP BY, but that threw an error, most likely because of my WHERE clause and condition. Not really sure how I can easily accomplish this.
DISTINCT applies to the whole row being returned, essentially saying "I want only unique rows". Any column value may participate in making the row unique.
You are getting phone numbers duplicated because you're only looking at that column in isolation. The database is looking at the phone number and also the date. The rows you posted have different dates, and these cause the rows to be different.
I suggest you do as the commenter recommended and decide what you want to do with the dates. If you want the latest date for a phone number, do this:
SELECT atable.phonenumber, max(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
When you write a query that uses grouping, you will get a set of rows where there is only one row per combination of values for the columns in the group by list. In this case, only unique phone numbers. But, because you want other values as well (i.e. the date) you MUST use what's called an aggregate function, to specify what you want to do with all the various values that aren't part of the unique set. Sometimes it will be MAX or MIN, sometimes it will be SUM, COUNT, AVG and so on.
If you're familiar with hash tables or dictionaries from elsewhere in programming, this is what a group by is: it maps a set of values (a key) to a list of rows that have those key values, and then the aggregate function is applied to the values in the list associated with the key.
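The dictionary analogy can be sketched in a few lines of Python. This is only a mental model, not how MySQL is implemented; latest_date_per_phone is a hypothetical name, and the dates are assumed to be ISO strings so that max() picks the latest one.

```python
from collections import defaultdict


def latest_date_per_phone(rows):
    """Group (phonenumber, date) rows by phone number, then aggregate with max."""
    groups = defaultdict(list)
    for phone, date in rows:
        # GROUP BY phonenumber: the phone number is the dictionary key.
        groups[phone].append(date)
    # MAX(date): applied to the list of values collected for each key.
    return {phone: max(dates) for phone, dates in groups.items()}
```

Fed the four rows from the question, this returns one entry per phone number, with 555-681-2105 mapped to 2015-08-25.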
The simple rule when using group by is to write queries thus:
SELECT
List,
of,
columns,
you,
want,
in,
unique,
combination,
FN(List),
FN(of),
FN(columns),
FN(you),
FN(want),
FN(aggregating)
FROM table
GROUP BY
List,
of,
columns,
you,
want,
in,
unique,
combination
i.e. you can copy and paste the non-aggregated columns from your select list to your group by list. Note that MySQL does not do this for you: if you use one or more aggregate functions like max in your select list but forget or omit the group by clause, all matching rows are treated as a single group and one row comes back. With the ONLY_FULL_GROUP_BY SQL mode, which is the default in recent MySQL versions, mixing aggregated and non-aggregated columns without a group by is rejected with an error. There are also other things you can do with a group by, such as rollup, cube and grouping sets. Also, you can group on a column and then use that column in a deterministic function, without having to group on the result of the deterministic function. Whether there is any point to doing so is a debate for another time :)
You should add GROUP BY, and an aggregate to the date field, something like this:
SELECT atable.phonenumber, MAX(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
This will return the maximum date, that is, the latest date.

Limit the result starting from a specific row with a given Id?

I want to write a query to select a subset of a table, only starting from a given id.
I know about limit x, y, but x here is the number of the row to start from. But in my case I want to start from a specific Id, no matter what its location inside the table.
What I mean is that the query below selects from row number 5, but I want it to select 10 records from row with id, say 213odin2d211d21:
SELECT * FROM my_table Limit 5, 10
I can't find a way to do this. Any help will be appreciated.
Note that, the Id here is a mix of strings and integers. So I can't do
SELECT * FROM <table> WHERE id > (id)
What you want to do is not possible as stated. By default, records in the database are not ordered. Without ORDER BY you can't expect the server to return your rows in any particular order. Since you say that you store some kind of digit/char identifier as your id, for which less than and greater than are not defined, it is not clear which records "follow" your specific record.
You will either have to:
Define another column to sort your records on, or
Define a behaviour for comparing your ids (What is "less than"? What is "greater than"?)
That being said, you can of course define that you want to sort your id just like sorting strings! In this case, you can use STRCMP() to compare two strings. Your query would look like this:
SELECT * FROM <table> WHERE STRCMP(id,?) = 1 ORDER BY id LIMIT 10
This will select the first 10 records, with id "greater than" ?.
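As a rough illustration of "start after a given id, in string order", the sketch below runs an equivalent query through Python's sqlite3 module. It is a stand-in, not MySQL: it uses a plain id > ? comparison instead of STRCMP, SQLite compares the strings byte by byte, and in MySQL the resulting order depends on the column's collation. The sample ids other than 213odin2d211d21 are made up, and rows_after and demo_connection are hypothetical names.

```python
import sqlite3

# Assumption: ids are ordered as plain strings; SQLite stands in for MySQL.
NEXT_PAGE = "SELECT id FROM my_table WHERE id > ? ORDER BY id LIMIT ?"


def rows_after(conn, last_id, page_size=10):
    """Return up to page_size ids that sort after last_id."""
    return [row[0] for row in conn.execute(NEXT_PAGE, (last_id, page_size))]


def demo_connection():
    """Build an in-memory table with made-up string ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?)",
        [("a1",), ("213odin2d211d21",), ("zz9",), ("b7",), ("0abc",)],
    )
    return conn
```

Changing > to >= would include the row with the given id itself, which may be closer to what the question asks for.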

Which row's fields are returned when Grouping with MySQL?

I have a MySQL table with the fields id and string. ids are unique. strings are varchars and are non-unique.
I perform the following query:
SELECT id, string, COUNT( * ) AS frequency
FROM table
GROUP BY string
ORDER BY frequency DESC, id ASC
Questions
Assume the table contains three rows with identical string values, and ids 1, 2, and 3.
Which id is going to be returned ( 1, 2, or 3 )?
Which id is this query going to ORDER BY ( Same as is returned? ... see question 1 )?
Can you control which id is returned / used for ordering? eg. Return the largest id, or the first id from a GROUP.
What I'm ultimately trying to do is get a frequency occurrence for identical strings, order by that frequency, highest to lowest, and on a frequency tie, order by id with the smallest id from the group returned / ordered by. I made the situation more generic to figure out how MySQL handles this situation.
Which id is going to be returned ( 1, 2, or 3 )?
A: Among the records that share the same string, the server will choose whichever id it wants (most likely the fastest to fetch, which is unpredictable). To cite the official documentation:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Much more information in this link.
Which id is this query going to ORDER BY ( Same as is returned? ... see question 1 )?
It makes no sense to find out in what order the data retrieved will be returned as you can't predict the result you are going to get. However, it is very likely that you get the result sorted by the unpredictable ID column.
Can you control which id is returned / used for ordering? eg. Return the largest id, or the first id from a GROUP.
You should be assuming at this point that you can't. Read the documentation again.
Making things even more clear: You can't predict the result of an improperly used GROUP BY clause. The main issue with MySQL is that it allows you to use it in a non-standard way but you need to know how to make use of that feature. The main point behind it is to group by fields that you know will always be the same. E.g.:
SELECT id, name, COUNT( * ) AS frequency
FROM table
GROUP BY id
Here, you know name will be unique as id functionally determines name. So the result you know is valid. If you grouped also by name this query would be more standard but will perform slightly worse in MySQL.
As a final note, take into account that, in my experience the results in those non-standard queries for the selected and non-grouped fields are usually the ones that you would get applying a GROUP BY and then an ORDER BY on that field. That is why so many times it seems to work. However, if you keep testing you will eventually find out that this happens 95% of the time. And you can not rely on that number.
The documentation says that when not grouping by all non-aggregate columns, one row for each unique combination of the grouped-by columns is returned. The row selected is up to the server, i.e. "random".
However, in practice it is the first row encountered during processing. You can control which is encountered first by selecting from an inner query that is ordered in the order of preference of return.
For example to get the lowest id for each name (yes, undocumented, blah blah, but it works!):
SELECT id, name, COUNT( * ) AS frequency
FROM (select * from table order by id) x
GROUP BY name
ORDER BY frequency DESC, id ASC
I personally am comfortable relying on this behaviour and have never seen or heard of it behaving differently in real life. Many shun this as undocumented and "risky", but if it works, it works.
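For anyone who would rather not depend on which row the server happens to pick, a documented alternative is to aggregate the id explicitly with MIN (or MAX). The sketch below runs such a query through Python's sqlite3 module as a stand-in for MySQL; the table name strings, the sample rows, and the helper names are assumptions made for the example.

```python
import sqlite3

# Assumption: the table is called "strings" here; SQLite stands in for MySQL.
FREQUENCY_QUERY = """
SELECT MIN(id) AS first_id, string, COUNT(*) AS frequency
FROM strings
GROUP BY string
ORDER BY frequency DESC, first_id ASC
"""


def string_frequencies(conn):
    """Return (smallest id, string, count) per string, most frequent first."""
    return conn.execute(FREQUENCY_QUERY).fetchall()


def demo_connection():
    """Build an in-memory table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE strings (id INTEGER PRIMARY KEY, string TEXT)")
    conn.executemany(
        "INSERT INTO strings VALUES (?, ?)",
        [(1, "a"), (2, "a"), (3, "a"), (4, "c"), (5, "b"), (6, "b"), (7, "c")],
    )
    return conn
```

Because every selected column is either grouped or aggregated, the result does not depend on row order: "a" comes first with three rows, and the tie between "c" and "b" is broken by their smallest ids, 4 and 5.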

Query for multiple conditions in MySQL

I want to be able to query for multiple statements when I have a table that connects the id's from two other tables.
My three tables
destination:
id_destination, name_destination
keyword:
id_keyword, name_keyword
destination_keyword:
id_keyword, id_destination
Where the last one connects ids from the destination- and the keyword table, in order to associate destination with keywords.
A query to get the destination based on keyword would then look like
SELECT destination.name_destination FROM destination
NATURAL JOIN destination_keyword
NATURAL JOIN keyword
WHERE keyword.name_keyword like _keyword_
Is it possible to query for multiple keywords, let's say I wanted to get the destinations that matches all or some of the keywords in the list sunny, ocean, fishing and order by number of matches. How would I move forward? Should I restructure my tables? I am sort of new to SQL and would very much like some input.
Order your table joins starting with keyword and use a count on the number of times the destination is joined:
select
d.id_destination,
d.name_destination,
count(d.id_destination) as matches
from keyword k
join destination_keyword dk on dk.id_keyword = k.id_keyword
join destination d on d.id_destination = dk.id_destination
where name_keyword in ('sunny', 'ocean', 'fishing')
group by 1, 2
order by 3 desc
This query assumes that name_keyword values are single words like "sunny".
Using natural joins is not a good idea, because if the table structures change such that two naturally joined tables gain columns with the same name, suddenly your query will stop working. Also, by explicitly declaring the join condition, readers of your code will immediately understand how the tables are joined, and can modify it to add non-key conditions as required.
Requiring that only key columns share the same name is also restrictive, because it requires unnatural column names like "name_keyword" instead of simply "name". The suffix "_keyword" is redundant, adds no value, and exists only because you have to have it when you are using natural joins.
Natural joins save hardly any typing (and often cause more typing over all) and impose limitations on join types and names and are brittle.
They are to be avoided.
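To see the explicit-join version at work, here is a small sketch that runs it through Python's sqlite3 module as a stand-in for MySQL. The destinations and keyword links are made-up sample data, destinations_by_matches and demo_connection are hypothetical names, and the optional HAVING filter for "must match all keywords" assumes each (id_keyword, id_destination) pair is stored only once.

```python
import sqlite3


def destinations_by_matches(conn, keywords, require_all=False):
    """Rank destinations by how many of the given keywords they match.

    Assumes keywords is a non-empty list of distinct names.
    """
    placeholders = ", ".join("?" for _ in keywords)
    having = "HAVING COUNT(*) = ?" if require_all else ""
    sql = f"""
        SELECT d.name_destination, COUNT(*) AS matches
        FROM keyword AS k
        JOIN destination_keyword AS dk ON dk.id_keyword = k.id_keyword
        JOIN destination AS d ON d.id_destination = dk.id_destination
        WHERE k.name_keyword IN ({placeholders})
        GROUP BY d.id_destination, d.name_destination
        {having}
        ORDER BY matches DESC, d.name_destination
    """
    params = list(keywords) + ([len(keywords)] if require_all else [])
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """Build the three tables from the question with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE destination (id_destination INTEGER PRIMARY KEY, name_destination TEXT);
        CREATE TABLE keyword (id_keyword INTEGER PRIMARY KEY, name_keyword TEXT);
        CREATE TABLE destination_keyword (id_keyword INTEGER, id_destination INTEGER);
        INSERT INTO destination VALUES (1, 'Bali'), (2, 'Oslo'), (3, 'Maui');
        INSERT INTO keyword VALUES (1, 'sunny'), (2, 'ocean'), (3, 'fishing'), (4, 'snow');
        INSERT INTO destination_keyword VALUES
            (1, 1), (2, 1), (3, 2), (4, 2), (1, 3), (2, 3), (3, 3);
    """)
    return conn
```

With the keywords sunny, ocean and fishing, the sample data ranks Maui (3 matches) ahead of Bali (2) and Oslo (1); passing require_all=True keeps only Maui.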
You can try something like the following:
SELECT dest.name_destination, count(*) FROM destination dest, destination_keyword dest_key, keyword kw
WHERE kw.id_keyword = dest_key.id_keyword
AND dest_key.id_destination = dest.id_destination
AND kw.name_keyword IN ('sunny', 'ocean', 'fishing')
GROUP BY dest.name_destination
ORDER BY count(*) DESC, dest.name_destination
Haven't tested it, but if it is not correct it should show you the way to accomplish it.
You can do multiple LIKE statements:
Column LIKE 'value1' OR Column LIKE 'value2' OR ...
Or you could do a regular expression match:
Column REGEXP 'value1|value2|value3'
The trick to ordering by number of matches has to do with understanding the GROUP BY clause and the ORDER BY clause. You either want one count for everything, or you want one count per something. So for the first case you just use the COUNT function by itself. In the second case you use the GROUP BY clause to "group" the somethings/categories that you want counted. ORDER BY should be pretty straightforward.
I think based on the information you have provided your table structure is fine.
Hope this helps.
DISCLAIMER: My syntax isn't accurate.

MySQL: SELECT(x) WHERE vs COUNT WHERE?

This is going to be one of those questions but I need to ask it.
I have a large table which may or may not have one unique row. I therefore need a MySQL query that will just tell me TRUE or FALSE.
With my current knowledge, I see two options (pseudo code):
[id = primary key]
OPTION 1:
SELECT id FROM table WHERE x=1 LIMIT 1
... and then determine in PHP whether a result was returned.
OPTION 2:
SELECT COUNT(id) FROM table WHERE x=1
... and then just use the count.
Is either of these preferable for any reason, or is there perhaps an even better solution?
Thanks.
If the selection criterion is truly unique (i.e. yields at most one result), you are going to see massive performance improvement by having an index on the column (or columns) involved in that criterion.
create index my_unique_index on table(x)
If you want to enforce the uniqueness, a unique index is not optional; you must have
create unique index my_unique_index on table(x)
Having this index, querying on the unique criterion will perform very well, regardless of minor SQL tweaks like count(*), count(id), count(x), limit 1 and so on.
For clarity, I would write
select count(*) from table where x = ?
I would avoid LIMIT 1 for two other reasons:
It is non-standard SQL. I am not religious about that, use the MySQL-specific stuff where necessary (e.g. for paging data), but it is not necessary here.
If for some reason, you have more than one row of data, that is probably a serious bug in your application. With LIMIT 1, you are never going to see the problem. This is like counting dinosaurs in Jurassic Park with the assumption that the number can only possibly go down.
AFAIK, if you have an index on your ID column both queries will be more or less equal performance. The second query will need 1 less line of code in your program but that's not going to make any performance impact either.
Personally I typically do the first one of selecting the id from the row and limiting to 1 row. I like this better from a coding perspective. Instead of having to actually retrieve the data, I just check the number of rows returned.
If I were to compare speeds, I would say not doing a count in MySQL would be faster. I don't have any proof, but my guess would be that MySQL has to get all of the rows and then count how many there are. Although... on second thought, it would have to do that in the first option as well so the code will know how many rows there are. But since you have COUNT(id) vs COUNT(*), I would say it might be slightly slower.
Intuitively, the first one could be faster since it can abort the table (or index) scan when it finds the first value. But you should retrieve x, not id, since if the engine is using an index on x, it doesn't need to go to the block where the row actually is.
Another option could be:
select exists(select 1 from mytable where x = ?) from dual
Which already returns a boolean.
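A minimal sketch of the EXISTS approach, run through Python's sqlite3 module as a stand-in for MySQL and PHP. SQLite has no dual table, so the from dual part is dropped; SQLite returns 0 or 1, which the helper converts to a boolean. The table contents are made up, and row_exists and demo_connection are hypothetical names.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; "from dual" is omitted.
EXISTS_QUERY = "SELECT EXISTS(SELECT 1 FROM mytable WHERE x = ?)"


def row_exists(conn, x):
    """Return True if at least one row has the given x value."""
    (flag,) = conn.execute(EXISTS_QUERY, (x,)).fetchone()
    return bool(flag)


def demo_connection():
    """Build an in-memory table with made-up rows and an index on x."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, x INTEGER)")
    conn.execute("CREATE INDEX my_index ON mytable (x)")
    conn.executemany("INSERT INTO mytable (x) VALUES (?)", [(1,), (2,), (2,)])
    return conn
```

The database can stop at the first matching row, and the caller gets a single true/false value instead of a row count to interpret.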
Typically, you use a group by with a having clause to determine if there are duplicate rows in a table. Suppose you have a table with an id and a name (assuming id is the primary key, and you want to know if name is unique or repeated). You would use
select name, count(*) as total from mytable group by name having total > 1;
The above will return the names which are repeated and the number of times each occurs.
If you just want one query to get your answer as true or false, you can use a nested query, e.g.
select if(count(*) >= 1, True, False) from (select name, count(*) as total from mytable group by name having total > 1) a;
The above should return true, if your table has duplicate rows, otherwise false.