MySQL: Grouping results by perceptual hash similarity


Let's say we have a MySQL table Image with the following columns:
id
user_id
p_hash
I know how to calculate the Hamming distance (to reveal similar images) between a newly inserted row's perceptual hash and all existing data in the table. The SQL query looks like this:
SELECT `Image`.*, BIT_COUNT(`p_hash` ^ :hash) as `hamming_distance`
FROM `Image`
HAVING `hamming_distance` < 5
I want to do the same for every existing image (to check whether there are similar images in the database).
So I have to go through every row of the Image table, run the same process as above, and find similar images in the table.
Now the question is: after the whole procedure, how do I get only those groups of similar images whose elements have at least one different user_id?
So if a group of similar images belongs to a single user, skip it. But if it belongs to multiple different users, return it as one of the results.
Please help me figure this out.

Sounds like you want a self-join.
SELECT i1.id, GROUP_CONCAT(i2.id) AS similar_images
FROM Image AS i1
JOIN Image AS i2 ON i1.user_id != i2.user_id AND BIT_COUNT(i1.`p_hash` ^ i2.p_hash) < 5
GROUP BY i1.id
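The self-join above can be tried out on a few rows. The following is a minimal sketch, not the answerer's demo: it uses Python's built-in sqlite3 module instead of MySQL, and because SQLite has neither BIT_COUNT nor an XOR operator, a user-defined SQL function named hamming (an assumption made for this example) stands in for BIT_COUNT(a ^ b). The sample rows are invented.

```python
import sqlite3


def hamming(a: int, b: int) -> int:
    """Count differing bits; same value as MySQL's BIT_COUNT(a ^ b)."""
    return bin(a ^ b).count("1")


conn = sqlite3.connect(":memory:")
# Register the helper so it can be called from SQL (stand-in for BIT_COUNT).
conn.create_function("hamming", 2, hamming)
conn.execute(
    "CREATE TABLE Image (id INTEGER PRIMARY KEY, user_id INTEGER, p_hash INTEGER)"
)
conn.executemany(
    "INSERT INTO Image VALUES (?, ?, ?)",
    [
        (1, 10, 0b11110000),
        (2, 20, 0b11110001),  # 1 bit away from image 1, different user
        (3, 10, 0b11110011),  # same user as image 1, 1 bit away from image 2
        (4, 30, 0b00001111),  # far away from every other image
    ],
)

rows = conn.execute(
    """
    SELECT i1.id, GROUP_CONCAT(i2.id) AS similar_images
    FROM Image AS i1
    JOIN Image AS i2
      ON i1.user_id != i2.user_id
     AND hamming(i1.p_hash, i2.p_hash) < 5
    GROUP BY i1.id
    ORDER BY i1.id
    """
).fetchall()

# Turn the comma-separated id lists into sorted Python lists.
similar = {image_id: sorted(int(x) for x in ids.split(",")) for image_id, ids in rows}
print(similar)
```

Because the join condition requires different user_id values, an image only appears in the result when at least one similar image belongs to another user; image 4 has no such match and is absent.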

Related

How to select comma-separated values from a field in one table joined to another table with a specific where condition?

I'm working on a MySQL SELECT query and cannot find a solution for this tricky problem.
There's one table "words" with id and names of objects (in this case possible objects in a picture).
words
ID object
house
tree
car
…
In the other table "pictures" all the information about a picture is saved. Besides the resolution, etc., it holds information on the objects in the picture. They are saved in the column objectsinpicture as ids from the table words, like 1,5,122,345, etc.
The table pictures also has a column "location", which holds the id of the place where I took the picture.
pictures
location objectsinpicture ...
1 - 1,2,3,4
2 - 1,5,122,34
1 - 50,122,345
1 - 91,35,122,345
2 - 1,14,32
1 - 1,5,122,345
To tag new pictures of a particular place I want to get suggestions based on the information already saved. That way I can create buttons in PHP to update the database instead of using a dropdown with multiple select.
What I have tried so far is the following:
SELECT words.id, words.object
FROM words, pictures
WHERE location = 2 AND FIND_IN_SET(words.id, pictures.objectsinpicture)
GROUP BY words.id
ORDER BY words.id
This nearly shows the expected values, but some information is missing. It doesn't show all the possible objects and I cannot find any reason for this.
What I want is, for example, all ids for location 2 joined to the table words, with duplicate entries of objectsinpicture grouped:
1,5,122,34
1,14,32
1,5,14,32,34,122
house
...
...
...
...
...
Maybe I need to use group_concat with a comma separator, but this doesn't work either. The problem seems to be the WHERE condition on the location.
I hope that someone has an idea for solving this.
Thanks in advance for any support!!!

This is a classic problem of denormalization causing problems.
What you need to do is store each object/picture association separately, in another table:
create table objectsinpicture (
picture_id int,
object_id int,
primary key (picture_id, object_id)
);
Instead of storing a comma-separated list, you would store one association per row in this table. It will grow to a large number of rows of course, but each row is just a pair of ids so the total size won't be too great.
Then you can query:
SELECT DISTINCT w.id, w.object
FROM pictures AS p
JOIN objectsinpicture AS o ON o.picture_id = p.id
JOIN words AS w ON o.object_id = w.id
WHERE p.location = 2
ORDER BY w.id;
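To see the normalized layout working end to end, here is a small sketch using Python's built-in sqlite3 module rather than MySQL. The sample rows, the word ids, and the id column on pictures are assumptions made for illustration. DISTINCT is used so that an object tagged in several pictures of the same location is suggested only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE words (id INTEGER PRIMARY KEY, object TEXT);
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, location INTEGER);
    CREATE TABLE objectsinpicture (
        picture_id INTEGER,
        object_id INTEGER,
        PRIMARY KEY (picture_id, object_id)
    );
    -- Invented sample data: pictures 1 and 2 were taken at location 2.
    INSERT INTO words VALUES (1, 'house'), (5, 'tree'), (14, 'car');
    INSERT INTO pictures VALUES (1, 2), (2, 2), (3, 1);
    INSERT INTO objectsinpicture VALUES (1, 1), (1, 5), (2, 1), (2, 14), (3, 5);
    """
)

# One row per distinct object seen at the given location.
suggestions = conn.execute(
    """
    SELECT DISTINCT w.id, w.object
    FROM pictures AS p
    JOIN objectsinpicture AS o ON o.picture_id = p.id
    JOIN words AS w ON o.object_id = w.id
    WHERE p.location = ?
    ORDER BY w.id
    """,
    (2,),
).fetchall()
print(suggestions)
```

Object 1 is tagged in both pictures of location 2 but appears once in the result, which is the "group double entries" behaviour the question asks for.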

How can I get the most recent result from a table with a column that has multiple of the same values

I'm trying to make my table show only one of the results that I have in the database table. The table has a posted_by column, and a user can post more than once, so the posted_by column contains the username Chowderrunnah several times. I only want to display one of the results so each user can be clicked on in the table instead of displaying the same user many times. (I'm trying to create a support ticket system, if you're wondering why.)
At the moment, my table is getting all the results from the database table, even duplicates with the same posted_by (username). I only want it to display one result if there is more than one row with the same posted_by (username).
At the moment, my code is only the minimum:
SELECT * from support WHERE last_post_by != 'Admin'
Help is much appreciated.
Edit
Here is the table I'm using
Here is my support ticket system

The following Query fixed my problem :)
SELECT * from support WHERE last_post_by != 'Admin' GROUP BY posted_by
Using GROUP BY allowed me to only display one of the duplicate posted_by results.
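Note that SELECT * combined with GROUP BY posted_by is rejected when MySQL's ONLY_FULL_GROUP_BY mode is on (the default since MySQL 5.7.5), and where it is accepted, which row is returned for each user is not defined. If the most recent row per user is wanted, as the title suggests, one common pattern is to join against the newest timestamp per user. The sketch below runs on Python's built-in sqlite3 module; the posted_at column and the sample rows are hypothetical, since the original table layout is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE support (
        id INTEGER PRIMARY KEY,
        posted_by TEXT,
        last_post_by TEXT,
        posted_at TEXT          -- hypothetical timestamp column
    );
    INSERT INTO support VALUES
        (1, 'Chowderrunnah', 'Chowderrunnah', '2024-01-01 10:00:00'),
        (2, 'Chowderrunnah', 'Chowderrunnah', '2024-01-03 09:30:00'),
        (3, 'OtherUser',     'OtherUser',     '2024-01-02 12:00:00'),
        (4, 'OtherUser',     'Admin',         '2024-01-04 08:00:00');
    """
)

# The subquery finds the newest non-admin timestamp per user;
# the outer query fetches the full row that carries that timestamp.
latest = conn.execute(
    """
    SELECT s.id, s.posted_by, s.posted_at
    FROM support AS s
    JOIN (
        SELECT posted_by, MAX(posted_at) AS newest
        FROM support
        WHERE last_post_by != 'Admin'
        GROUP BY posted_by
    ) AS m ON m.posted_by = s.posted_by AND m.newest = s.posted_at
    WHERE s.last_post_by != 'Admin'
    ORDER BY s.posted_by
    """
).fetchall()
print(latest)
```

Each user appears exactly once, and the row shown is the newest one whose last post was not made by Admin.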

SQL Filter Results by User

I have a simple MySQL database with one table, call it X. Users interact with X via PHP. Simple stuff. Now I would like to allow each user to flag specific "rows" in X so that they don't appear when they search X in the future.
Ex. Each user is shown, say rows 1 to 10. User A doesn't want to see rows 4, 8, 9. Ever. But User B, Q and Z love those rows and just can't live without them. Oh, and we can't forget User D who hates every row but 2. And so on...
How should I go about doing this?
Update:
I should have noted: I realize I can create another table with all the rows that people don't want, but what's the best way to design the table(s) in order to support an increasing number of users and data rows?

You need a table called X_blacklist.
It will have two columns: userid and postid. Those two columns together are the composite primary key.
For a user identified by userid to hide a post identified by postid she inserts that row into X_blacklist. Then when you display items from X to your users, you do this:
SELECT X.postid, X.postcontent
FROM X
LEFT JOIN X_blacklist ON X.postid = X_blacklist.postid AND X_blacklist.userid = ?
WHERE X_blacklist.postid IS NULL
This eliminates the items from your X table that are mentioned in the other table for the particular user.
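The anti-join above can be checked with a handful of rows. This is a minimal sketch on Python's built-in sqlite3 module instead of MySQL; the helper name visible_posts and the sample data are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE X (postid INTEGER PRIMARY KEY, postcontent TEXT);
    CREATE TABLE X_blacklist (
        userid INTEGER,
        postid INTEGER,
        PRIMARY KEY (userid, postid)
    );
    INSERT INTO X VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four');
    -- User 100 hides posts 2 and 4; user 200 hides post 1.
    INSERT INTO X_blacklist VALUES (100, 2), (100, 4), (200, 1);
    """
)


def visible_posts(userid):
    """Return the ids of the posts that the given user has not hidden."""
    rows = conn.execute(
        """
        SELECT X.postid, X.postcontent
        FROM X
        LEFT JOIN X_blacklist
               ON X.postid = X_blacklist.postid AND X_blacklist.userid = ?
        WHERE X_blacklist.postid IS NULL
        ORDER BY X.postid
        """,
        (userid,),
    )
    return [postid for postid, _content in rows]


print(visible_posts(100))
print(visible_posts(200))
print(visible_posts(300))
```

The user filter sits in the ON clause rather than in WHERE, so rows of X without a matching blacklist entry survive the join with NULLs and are then kept by the IS NULL test. A user with no blacklist entries sees every post.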

SQL Joins - Why does this simple join work, despite the syntax making no sense?

I've read numerous tutorials and graphical representations of MySQL joins, and they still don't make sense to me.
I'm trying to type my own now, and they are working, but I just don't see how they're working.
Take this set of tables
images                             squares
------------------------------     ----------------------------------
image_id | name     | square_id    square_id | latitude | longitude
------------------------------     ----------------------------------
1          someImg    14           1           42.333     24.232
2          newImg     3            2           38.322     49.2320
3          blandImg   76           3           11.2345    99.4323
...                                ...
n                                  n
This is a one to many relationship - one square can have many images, but an image can only have one square.
Now I run this simple join, but I'm not understanding the syntax of it at all...
SELECT images.image_id
FROM squares
LEFT JOIN images ON images.square_id=squares.square_id
WHERE images.square_id=711464;
Now, this actually works, which amazes me. It brings up a list of images within the square range.
But I'm having a hard time understanding the ON syntax.
What does ON do exactly?
Does it show how the two tables are related?
Mainly however, SELECT images.image_id FROM squares, makes the least sense.
How can I select a field in one table but FROM another?

Let's start with the FROM clause, which in its entirety is:
FROM squares LEFT JOIN images ON images.square_id=squares.square_id
(it's not just FROM squares).
This defines the source of your data. You specify both tables, squares and images so they are both sources for the data that the query will work on.
Next, you use the ON syntax to explain how these tables are related to one another. images.square_id=squares.square_id means: consider a row in the images table related to a row in the squares table if and only if the value of the field square_id of the images row is equal to the value of the field square_id of the squares row. At this moment, each row of the result is a combination of a row from the images table and a row from the squares table (I'll ignore the LEFT JOIN for the moment).
Next, you have the WHERE clause
WHERE images.square_id=711464
This means, from the rows that are in result set, just get those where the value of the square_id field, in that part of the result row that came from the images table, is exactly 711464.
And last comes the SELECT part.
SELECT images.image_id
This means, from the rows that are in the result set (a combination of a squares row and an images row), take just the field image_id that is from the images table.
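The steps described in this answer can be traced on a small data set. The sketch below uses Python's built-in sqlite3 module instead of MySQL, and the rows are invented sample data (the square ids differ from the ones in the question). It first shows what the LEFT JOIN alone produces, then what remains after the WHERE filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE squares (square_id INTEGER PRIMARY KEY, latitude REAL, longitude REAL);
    CREATE TABLE images (image_id INTEGER PRIMARY KEY, name TEXT, square_id INTEGER);
    INSERT INTO squares VALUES (1, 42.333, 24.232), (2, 38.322, 49.232), (3, 11.2345, 99.4323);
    -- Square 2 has no images; square 3 has two.
    INSERT INTO images VALUES (1, 'someImg', 1), (2, 'newImg', 3), (3, 'blandImg', 3);
    """
)

# Step 1: FROM ... LEFT JOIN ... ON builds the combined rows.
# A square without images still appears, with NULL in the images columns.
joined = conn.execute(
    """
    SELECT squares.square_id, images.image_id
    FROM squares
    LEFT JOIN images ON images.square_id = squares.square_id
    ORDER BY squares.square_id, images.image_id
    """
).fetchall()
print(joined)

# Steps 2 and 3: WHERE filters the combined rows, then SELECT picks a column.
filtered = conn.execute(
    """
    SELECT images.image_id
    FROM squares
    LEFT JOIN images ON images.square_id = squares.square_id
    WHERE images.square_id = 3
    ORDER BY images.image_id
    """
).fetchall()
print(filtered)
```

Because the WHERE condition tests a column from images, the NULL-padded row for square 2 can never pass it, so for this particular query the LEFT JOIN returns the same rows an INNER JOIN would.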

You should read the query as such:
SELECT images.image_id FROM
squares LEFT JOIN images
ON images.square_id=squares.square_id
WHERE
images.square_id=711464
So you first join the squares table with the images table, combining entries in images which have the same square_id as in the squares table. So, the ON syntax is actually specifying the condition on which the two tables should be joined.
Then, you do a filter using WHERE, which will take the entries with square_id=711464
Note that by the time you do the SELECT, you already joined the two tables, which will have the combined fields:
images
--------------------------------------------------
square_id | latitude | longitude | image_id | name
--------------------------------------------------
So, you can select the square_id from the resulting table.

It is more like:
SELECT images.image_id FROM (squares LEFT JOIN images ON images.square_id=squares.square_id WHERE images.square_id=711464)
So you don't select a field from another table - it is more like you create a new temporary table from the statement in brackets (actually having columns from multiple tables) and then perform SELECT on this table.
And yes, ON defines how the tables are related (for instance with foreign key)

SELECT [COLUMNS] --First Line Of Code - say it as 1
FROM --Second Line Of Code -- say it as 2
[Table1] Join [table2] On [Criteria] --say it as 3
Where [Some More Criteria] --Say it as 4
Whenever a
Select Column From
is done, it gets its data from 3, which is a collection of one or more tables.
Once the data is loaded, the Where criteria are executed to filter it.
After the data has been filtered, the Select statement is executed.
In your case:
The Left Join executes and generates a table with 6 columns, based on the join criteria. Then the Where criteria are executed to filter the data.
The Select statement is executed only after that.
Table names are used as prefixes to avoid conflicts between column names.
P.S.: This is for your understanding; the data isn't actually loaded this way. To understand exactly how queries are executed, you need to understand the DB engine.
For now, write the query and leave the planning to the DB engine.
With Experience Comes The Knowledge.
Happy Coding

Selecting multiple rows based on specific categories (mysql)

I don't think this is a duplicate posting because I've looked around and this seems a bit more specific than what's already been asked (but I could be wrong).
I have 4 tables and one of them is just a lookup table
SELECT exercises.id as exid, name, sets, reps, type, movement, categories.id
FROM exercises
INNER JOIN exercisecategory ON exercises.id = exerciseid
INNER JOIN categories ON categoryid = categories.id
INNER JOIN workoutcategory ON workoutid = workoutcategory.id
WHERE (workoutcategory.id = '$workouttypeid')
AND rand_id > UNIX_TIMESTAMP()
ORDER BY rand_id ASC LIMIT 6;
exercises table contains a list of exercise names, sets, reps, and an id
categories table contains an id, musclegroup, and type of movement
workoutcategory table contains an id, and a more specific motion (ie: upper body push, or upper body pull)
exercisecategory table is the lookup table that contains (and matches the id's) for exerciseid, categoryid, and workoutid
I've also added a column to the exercises table that generates a random number upon entering the row in the database. This number is then updated only for the specified category when it is called, and then sorted and displays the ascending order of the top 6 listings. This generates a nice random entry for me. (Found that solution elsewhere here on SO).
This works fine for generating 6 random exercises from a specific top level category. But I'd like to drill down further. Here's an example...
select all rows inside categoryid 4
then still within the category 4 results, find all that have movementid 2, and then find one entry with a typeid 1, then another for typeid 2, etc
TL;DR: Basically there are a few levels of categories and I'm looking to select a few from here and a few from there, all within this top level. I'm thinking this could be done with more than one query, but I'm not sure how. In the end I'm looking to end up with one array of the randomized entries.
Sorry for the long read, it's the best explanation I've got.

Just realized I never came back to this posting...
I ended up using several mysql queries within a switch based on what is needed during the request. Worked out perfectly.
