Set sequential number in mysql table only where rows have same value - mysql

I have a table in which a new entry gets a number 0 and a status of unpublished. Users can publish or unpublish rows. When they do, the published rows should get consecutive numbers, but the unpublished rows should be skipped. Like this:
status | number
=============================
unpublished | 0
published | 1
unpublished | 0
unpublished | 0
published | 2
published | 3
unpublished | 0
published | 4
Right now I use:
mysql_query("UPDATE albums
JOIN (SELECT @i:=0) t
SET id = (@i:=@i+1)");
when a user publishes something, but that assigns consecutive numbers to all rows.
I need something like the above, but with some sort of WHERE status = 'published' condition in it, and I don't know how.
What solution should I look into?
Many thanks,
Sam

Try an IF in the UPDATE statement:-
UPDATE albums
JOIN (SELECT @i:=0) t
SET id = IF(status='published', @i:=@i+1, 0)
However, I suspect this will not consistently work the way you want without an ORDER BY clause (UPDATE does support an ORDER BY clause).
EDIT - further info as requested:-
albums is a MySQL table. The UPDATE statement in MySQL does support an ORDER BY clause (to update records in a particular order), but only for single-table updates. In this query a subquery is joined to the albums table (i.e. JOIN (SELECT @i:=0) t); even though this is not actually a table, MySQL treats it as one and so will not allow an ORDER BY clause in this update.
However, @i is a user-defined variable and can be initialised by a separate SQL statement. If your query was 2 statements:-
SET @i:=0;
UPDATE albums
SET id = IF(status='published', @i:=@i+1, 0)
ORDER BY albums.insert_date;
then that should do it (note that I have just assumed a column named insert_date to order the records by).
However, many MySQL APIs do not support multiple statements in a single query. As @i is a user variable, it is tied to the connection to the database. So if you issue one query to initialise it and then a second query (using the same connection) to run the UPDATE, it should work.
If you are using PHP with MySQLi, you can use mysqli_multi_query to perform both in one go.
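The same "number only the published rows, in a defined order" idea can also be done with a counter in application code instead of a MySQL user variable. The following is only a rough sketch of that swapped-in approach: it uses Python's sqlite3 with an in-memory database rather than MySQL, and the pk, insert_date and number columns are assumed names, not taken from the real albums table.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; pk, insert_date and number are assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE albums ("
    "pk INTEGER PRIMARY KEY, status TEXT, insert_date TEXT, number INTEGER DEFAULT 0)"
)
statuses = ["unpublished", "published", "unpublished", "unpublished",
            "published", "published", "unpublished", "published"]
conn.executemany(
    "INSERT INTO albums (status, insert_date) VALUES (?, ?)",
    [(s, f"2014-01-{day:02d}") for day, s in enumerate(statuses, start=1)],
)

def renumber(conn):
    """Give published rows 1, 2, 3, ... in insert_date order; all other rows get 0."""
    published = conn.execute(
        "SELECT pk FROM albums WHERE status = 'published' ORDER BY insert_date, pk"
    ).fetchall()
    conn.execute("UPDATE albums SET number = 0")
    conn.executemany(
        "UPDATE albums SET number = ? WHERE pk = ?",
        [(n, pk) for n, (pk,) in enumerate(published, start=1)],
    )
    conn.commit()

renumber(conn)
numbers = [row[0] for row in conn.execute("SELECT number FROM albums ORDER BY pk")]
print(numbers)  # [0, 1, 0, 0, 2, 3, 0, 4]
```

Calling renumber again after any publish or unpublish keeps the published rows consecutive.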

Kickstart's answer is almost what I need, except that when an entry gets published it should always get the highest number. Right now it follows the database order, which is by date. Should I integrate an ORDER BY, or is there a different solution?
Thanks

Related

Getting random data from a MySQL database but not repeating data

I have a list of random products(1000's) each with an ID and I am bringing them up randomly. I would like that the items are not repeated. Currently what I am doing is the following:
select * from products where product_id <> previous_product_id order by rand() limit 1;
I am ensuring that the same product will not appear directly after. A repeat product usually appears a lot sooner than I would like (I believe I am right in saying this is the birthday problem). I have no idea what is the most effective way to get random data in a non-repeating fashion. I have thought of a way that I assume is highly inefficient:
I would assign the user an id (e.g. foo) and then when they have seen an item add it to a string that would be product_id_1 AND product_id_2 AND product_id_3 AND product_id_n. I would store this data with timestamp(explained further on).
+--------------------------------------------------------------------------------------------+
|user_id |timestamp | product_seen_string |
|--------------------------------------------------------------------------------------------|
|foo |01-01-14 12:00:00 |product_id_1 AND product_id_2 AND product_id_3 AND product_id_n |
+--------------------------------------------------------------------------------------------+
With this product_seen_string I would keep adding to the seen products (I would also update the timestamp) and then in the query I would do a first query based on the user_id to obtain this string and then add that returned string to the main query that obtains the random products like so:
select * from products where product_id <> product_id_1 AND product_id_2 AND product_id_3 AND product_id_n order by rand() limit 1;
I would also add logic so that if no products were returned, the stored data would be deleted and the cycle could start again. In addition, a cron job would run every ten minutes and delete any row whose timestamp is older than an hour.
The scripting language I am using is PHP
Selecting random rows is always tricky, and there are no perfect solutions that don't involve some compromise. Either compromise performance, or compromise even random distribution, or compromise on the chance of selecting duplicates, etc.
As @Giacomo1968 mentions in their answer, any solution with ORDER BY RAND() does not scale well. As the number of rows in your table gets larger, the cost of sorting the whole table in a filesort gets worse and worse. Giacomo1968 is correct that the query cannot be cached when the sort order is random. But I don't care about that so much because I usually disable the query cache anyway (it has its own scalability problems).
Here's a solution to pre-randomize the rows in the table, by creating a rownum column and assigning unique consecutive values:
ALTER TABLE products ADD COLUMN rownum INT UNSIGNED, ADD KEY (rownum);
SET @rownum := 0;
UPDATE products SET rownum = (@rownum:=@rownum+1) ORDER BY RAND();
Now you can get a random row by an index lookup, without sorting:
SELECT * FROM products WHERE rownum = 1;
Or you can get the next random row:
SELECT * FROM products WHERE rownum = 2;
Or you can get 10 random rows at a time, or any other number you want, with no duplicates:
SELECT * FROM products WHERE rownum BETWEEN 11 and 20;
You can re-randomize anytime you want:
SET @rownum := 0;
UPDATE products SET rownum = (@rownum:=@rownum+1) ORDER BY RAND();
It's still costly to do the random sorting, but now you don't have to do it on every SELECT query. You can do it on a schedule, hopefully at off-peak times.
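To see the pre-randomized rownum idea end to end, here is a rough sketch using Python's sqlite3 with an in-memory database in place of MySQL. The shuffle is done in Python rather than with ORDER BY RAND() and a user variable, and the table contents, the index name and the rerandomize helper are made up for illustration.

```python
import random
import sqlite3

# In-memory SQLite stands in for MySQL; the 20 sample products are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, rownum INTEGER)")
conn.execute("CREATE INDEX idx_products_rownum ON products (rownum)")
conn.executemany(
    "INSERT INTO products (product_id, name) VALUES (?, ?)",
    [(i, f"product {i}") for i in range(1, 21)],
)

def rerandomize(conn):
    """Assign each product a unique rownum 1..N in a freshly shuffled order."""
    ids = [row[0] for row in conn.execute("SELECT product_id FROM products")]
    random.shuffle(ids)
    conn.executemany(
        "UPDATE products SET rownum = ? WHERE product_id = ?",
        [(n, pid) for n, pid in enumerate(ids, start=1)],
    )
    conn.commit()

def page(conn, first, last):
    """Fetch one slice of the shuffled order with an index range lookup, no sorting by random."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT product_id FROM products WHERE rownum BETWEEN ? AND ? ORDER BY rownum",
            (first, last),
        )
    ]

rerandomize(conn)
first_page = page(conn, 1, 10)
second_page = page(conn, 11, 20)
```

Because every rownum is unique, consecutive slices never share a product until the table is re-randomized.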
You might consider a pagination-like solution. Rather than ordering by RAND() (not good from a performance standpoint anyway), why not simply use a LIMIT clause and randomize the offset?
For example:
SELECT product_id FROM products ORDER BY product_id LIMIT X, 1
Where X is the offset you want to use. You could easily keep track of the offsets used in the application and randomize amongst the available remaining values.
PHP code to call this might look like this:
if (!isset($_SESSION['available_offsets']) || count($_SESSION['available_offsets']) === 0) {
$record_count = ...; // total number of records likely derived from query against table in question
// this creates array with numerical keys matching available offsets
// we don't care about the values
$available_offsets = array_fill(0, $record_count, '');
} else {
$available_offsets = $_SESSION['available_offsets'];
}
// pick offset from available offsets
$offset = array_rand($available_offsets);
// unset this as available offset
unset($available_offsets[$offset]);
// set remaining offsets to session for next page load
$_SESSION['available_offsets'] = $available_offsets;
$query = 'SELECT product_id FROM products ORDER BY product_id LIMIT ' . $offset . ', 1';
// make your query now
You could try adding a seed to the RAND to avoid repeats
select * from products where product_id <> previous_product_id order by rand(7) limit 1;
From http://www.techonthenet.com/mysql/functions/rand.php :
The syntax for the RAND function in MySQL is:
RAND( [seed] )
Parameters or Arguments
seed
Optional. If specified, it will produce a repeatable sequence of random numbers each time that seed value is provided.
First, unclear what scripting language you will use to piece this together, so will answer conceptually. I also added uppercase to your MySQL for readability. So first let’s look at this query:
SELECT * FROM products WHERE product_id <> previous_product_id ORDER BY RAND() LIMIT 1;
Superficially this does what you want it to do, but if your database has thousands of items, RAND() is not recommended. It defeats all MySQL caching and is a resource hog. More details here, especially the area that reads:
A query cannot be cached if it contains any of the functions shown in
the following table.
That’s just not good.
But that said you might be able to improve it by just returning the product_id:
SELECT product_id FROM products WHERE product_id <> previous_product_id ORDER BY RAND() LIMIT 1;
And then you would have to do another query for the actual product data based on the product_id, but that would be much less taxing on the server than grabbing a whole set of randomized data.
But still, RAND() inherently will bog down your system due to lack of caching. And the problem will only get worse as your database grows.
Instead I would recommend having some kind of file-based solution. That grabs a random list of items like this:
SELECT product_id FROM products WHERE product_id <> previous_product_id ORDER BY RAND() LIMIT 0,100;
You will strictly be grabbing product IDs and then saving them to, let’s say, a JSON file. The logic is that however random you want this to be, you have to be realistic. Grabbing a slice of 100 items at a time will give a user a nice selection of items.
Then you would load that file in as an array and perhaps even randomize the array and grab one item off the top; a quick way to ensure that the item won’t be chosen again. If you wish, you could shorten the array with each access by re-saving the JSON file. And then when it gets down to, let’s say, fewer than 10 items, fetch a new batch and start again.
But in general RAND() is a neat trick that is useful for small datasets. Anything past that should be refactored into code where you balance what you want users to see against what can realistically be done in a scalable manner.
EDIT: Just another thing I noticed in your code on keeping track of product IDs. If you want to go down that route—instead of saving items to a JSON file—that would be fine as well. But what you are describing is not efficient:
+--------------------------------------------------------------------------------------------+
|user_id |timestamp | product_seen_string |
|--------------------------------------------------------------------------------------------|
|foo |01-01-14 12:00:00 |product_id_1 AND product_id_2 AND product_id_3 AND product_id_n |
+--------------------------------------------------------------------------------------------+
A better structure would be to simply store the IDs in a MySQL table like this; let’s call it products_seen:
+----------------------------------------------------+
|user_id |timestamp | product_seen_id |
|----------------------------------------------------|
|foo |01-01-14 12:00:00 |product_id_1 |
|foo |01-01-14 12:00:00 |product_id_2 |
|foo |01-01-14 12:00:00 |product_id_3 |
+----------------------------------------------------+
And then have a query like this
SELECT * FROM products WHERE NOT EXISTS (SELECT 1 FROM products_seen WHERE products_seen.user_id = 'foo' AND products_seen.product_seen_id = products.product_id) ORDER BY RAND() LIMIT 1;
Basically, you would be cross referencing products_seen with products and making sure the products already seen by user_id are not shown again.
This is all pseudo-code, so please adjust to fit the concept.
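As a rough illustration of the products_seen idea, the sketch below uses Python's sqlite3 with an in-memory database instead of MySQL (so RANDOM() takes the place of RAND()). It excludes already-seen products with NOT EXISTS, records each product as it is shown, and clears the user's history once nothing is left, as the question describes. The seen_at column, the sample rows and the next_unseen helper are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the five sample products are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products_seen (
    user_id TEXT NOT NULL,
    seen_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    product_seen_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, product_seen_id)
);
""")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(i, f"product {i}") for i in range(1, 6)],
)

UNSEEN_QUERY = """
SELECT p.product_id
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM products_seen s
    WHERE s.user_id = ? AND s.product_seen_id = p.product_id
)
ORDER BY RANDOM()
LIMIT 1
"""

def next_unseen(conn, user_id):
    """Return a random product this user has not seen yet and record it as seen."""
    for _ in range(2):  # the second pass only runs after the history was reset
        row = conn.execute(UNSEEN_QUERY, (user_id,)).fetchone()
        if row is not None:
            conn.execute(
                "INSERT INTO products_seen (user_id, product_seen_id) VALUES (?, ?)",
                (user_id, row[0]),
            )
            return row[0]
        # Everything has been seen: clear the history so the cycle starts again.
        conn.execute("DELETE FROM products_seen WHERE user_id = ?", (user_id,))
    return None  # only reached when the products table is empty

shown = [next_unseen(conn, "foo") for _ in range(5)]
```

The first five calls for a user return each product exactly once; the sixth starts a new cycle.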

SQL Query to return most recent row per user

Here is my table structure
So I have these columns in my table:
UserId Location Lastactivity
Let's say that there are 4 results with a UserId of 1 and each location is different. Let's say index.php, chat.php, test.php, test1.php. Then there are also timestamps.
Let's also add one more with a UserId of 4 location of chat.php and time of whatever.
Time is in the timestamp format.
I want to get it so that my sql query shows one result from each userid but only the latest one. So in 2 it would show the row which was added to the table most recently. Also I don't want it to show any results that have a lastactivity that was 15 or more minutes ago.
For the example I would just be displaying two rows returned.
Does anyone know what I should do?
I have tried:
SELECT * FROM session WHERE location='chat.php' GROUP BY userid
That returns two results but I believe if there are multiple results for the userid it returns a random one, it also returns results that have a lastactivity of more than 15 minutes.
I am using mysql
------MORE INFO-------
I want to query the database for all the rows where location='chat.php'. I only want one row per userid which is determined by whichever is the most recent submission. Then I also don't want any that are older than 15 minutes. Finally I want to count the number of rows returned and put them into a variable called testVar
Help would be appreciated.
Essentially what you are looking for boils down to you wanting the username and location with the most recent time stamp. So, you want to ignore all the records whose last activity is not the greatest.
Select count(*) as testVar
From session s
Where location='chat.php'
and timestampdiff(minute, s.lastactivity, now()) < 15
and not exists (Select *
From session s2
Where s.userid = s2.userid
and s2.lastactivity > s.lastactivity);
For each record, this query checks to see if there is another record for the same user where the time stamp is more recent. If there is, we want to ignore that record. We only want the rows where a record with more recent activity doesn't exist. It is a little strange to think about it this way, but the logic is equivalent.
By default this query will only grab one row per user, so a group by is not necessary. (This does get a little hairy if the time stamps are exactly the same for two records. In that case neither record is strictly more recent than the other, so both are returned and that user is counted twice.)
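To check the NOT EXISTS logic on a small data set, here is a rough sketch using Python's sqlite3 with an in-memory database rather than MySQL. The 15-minute cutoff is computed in Python and passed in as a parameter, the "current time" is fixed so the result is reproducible, and all of the sample rows are made up.

```python
import sqlite3
from datetime import datetime, timedelta

# Fixed "current time" so the example is reproducible; a real query would use NOW().
now = datetime(2014, 1, 1, 12, 0, 0)

def ts(minutes_ago):
    """Timestamp string for a moment some minutes before the fixed current time."""
    return (now - timedelta(minutes=minutes_ago)).strftime("%Y-%m-%d %H:%M:%S")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (userid INTEGER, location TEXT, lastactivity TEXT)")
conn.executemany(
    "INSERT INTO session VALUES (?, ?, ?)",
    [
        (1, "index.php", ts(40)),
        (1, "test.php", ts(30)),
        (1, "test1.php", ts(20)),
        (1, "chat.php", ts(5)),    # latest row for user 1, recent enough: counted
        (4, "chat.php", ts(2)),    # only row for user 4, recent enough: counted
        (7, "chat.php", ts(60)),   # older than 15 minutes: skipped
        (9, "chat.php", ts(10)),   # not the latest row for user 9: skipped
        (9, "index.php", ts(1)),
    ],
)

# Keep a chat.php row only if it is recent and no newer row exists for the same user.
test_var = conn.execute(
    """
    SELECT COUNT(*)
    FROM session s
    WHERE s.location = 'chat.php'
      AND s.lastactivity > ?
      AND NOT EXISTS (
          SELECT 1 FROM session s2
          WHERE s2.userid = s.userid
            AND s2.lastactivity > s.lastactivity
      )
    """,
    (ts(15),),
).fetchone()[0]
print(test_var)  # 2
```

Note that user 9 is excluded because their most recent row is not chat.php, which follows from the subquery comparing against all of that user's rows.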

how does an update work? is it a transaction?

I have this table:
Mytable(ID, IDGroupReference, Model ...)
I have many records in MyTable. They belong to a group, so all the records that belong to the same group have the same IDGroupReference. IDGroupReference is the ID of one of the records that belong to the group. Since all the records of a group have the same IDGroupReference, I can get all the records of the group with a single query:
select * from MyTable where IDGroupReference = 12345;
I can move one record from one group to another, and in that case I also want to move all the other records of its group. I mean, I want to merge two groups into one.
In this case I can use this query:
Update Mytable set IDGroupReference = myIDReferenceGroup1 where IDGroupReference = IDGroupReferenceGroup2
I set the IDGroupReference of group 2 to the IDGroupReference of group 1.
My doubt is about concurrency, when two users try to change the group of two different records. Imagine that I have group 1 with 10,000 records and two users. User 1 tries to move record A from group 1 to group 2, and user 2 tries to move record B from group 1 to group 3.
Since the group has many records (10,000), I think that when I run the update described above, SQL Server updates them one by one, and because there are so many it is possible that some records end up in group 2 and others in group 3, when all of them must be in the same group, either 2 or 3, depending on which user updates last. All of the records must stay in the same group, not split.
So, when I use the update, how does it work? Is it a transaction, so that nobody can update any of the records that will be affected, or can a second user update records in the middle of the first user's update?
I mean:
Group 1 has 10 records. User one executes the update. So the steps are:
SQL Server updates record 1.
SQL Server updates record 2.
Meanwhile, a second user executes the query.
Is it possible that the second user updates record 3 before it is updated by the first user's query? Because if this happens, group 1 is split into two groups: some records go to group 2 and some of them go to group 3.
How can I ensure that all the records of group 1 go to group 2 or group 3?
Thanks.
The solution is to use SQL Server table hints. There is more information in this link.
The initial update:
Update Mytable set IDGroupReference = myIDReferenceGroup1 where IDGroupReference = IDGroupReferenceGroup2
It is modified to:
Update Mytable with(tablock) set IDGroupReference = myIDReferenceGroup1 where IDGroupReference = IDGroupReferenceGroup2
By default, an update in SQL Server locks only the records being updated, and the rest can still be modified. So I need to lock the whole table, so that another update cannot modify records in the middle of this one.
Using "with(tablock)" does that: it locks the table when the update begins, then finds all the records that match the where clause and updates them. While the table is locked, no other user can select or update records in it, which is what I need in my particular case.

How to grab most popular rows in table?

I have a table with comments almost 2 million rows. We receive roughly 500 new comments per day. Each comment is assigned to a specific ID. I want to grab the most popular "discussions" based on the specific ID.
I have an index on the ID column.
What is best practice? Do I just group by this ID and then sort by the ID who has the most comments? Is this most efficient for a table this size?
Do I just group by this ID and then sort by the ID who has the most comments?
That's pretty much simply how I would do it. Let's just assume you want to retrieve the top 50:
SELECT id
FROM comments
GROUP BY id
ORDER BY COUNT(1) DESC
LIMIT 50
If your users are executing this query quite frequently in your application and you're finding that it's not running quite as fast as you'd like, one way you could optimize it is to store the result of the above query in a separate table (topdiscussions), and perhaps have a script or cron that runs intermittently every five minutes or so which would update that table.
Then in your application, just have your users select from the topdiscussions table so that they only need to select from 50 rows rather than 2 million.
The downside of this of course being that the selection will no longer be in real-time, but rather out of sync by up to five minutes or however often you want to update the table. How real-time you actually need it to be depends on the requirements of your system.
Edit: As per your comments to this answer, I know a little more about your schema and requirements. The following query retrieves the discussions that are the most active within the past day:
SELECT a.id, etc...
FROM discussions a
INNER JOIN comments b ON
a.id = b.discussion_id AND
b.date_posted > NOW() - INTERVAL 1 DAY
GROUP BY a.id
ORDER BY COUNT(1) DESC
LIMIT 50
I don't know your field names, but that's the general idea.
If I understand your question, the ID indicates the discussion to which a comment is attached. So, first you would need some notion of most popular.
1) Initialize a "Comment total" table by counting up comments by ID and setting a column called 'delta' to 0.
2) Periodically
2.1) Count the comments by ID
2.2) Subtract the old count from the new count and store the value into the delta column.
2.3) Replace the count of comments with the new count.
3) Select the 10 'hottest' discussions by selecting 10 rows from the comment total table in order of descending delta.
Now the rest is trivial. That's just the comments whose discussion ID matches the ones you found in step 3.
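The steps above can be sketched in a few lines. This is only an illustration in plain Python, with an in-memory dict standing in for the "comment total" table; the function names and the sample discussion IDs are made up.

```python
from collections import Counter

def initialize(comment_discussion_ids):
    """Step 1: count comments per discussion ID and start every delta at 0."""
    return {
        disc_id: {"count": count, "delta": 0}
        for disc_id, count in Counter(comment_discussion_ids).items()
    }

def refresh(totals, comment_discussion_ids):
    """Step 2: recount, store new count minus old count as delta, then keep the new count."""
    for disc_id, new_count in Counter(comment_discussion_ids).items():
        old_count = totals.get(disc_id, {"count": 0})["count"]
        totals[disc_id] = {"count": new_count, "delta": new_count - old_count}
    return totals

def hottest(totals, n=10):
    """Step 3: discussion IDs ordered by descending delta, limited to n."""
    return sorted(totals, key=lambda disc_id: totals[disc_id]["delta"], reverse=True)[:n]

# One entry per comment, holding the ID of the discussion it belongs to.
comments = [1, 1, 2, 3, 3, 3]
totals = initialize(comments)

# Later, four more comments have arrived: three on discussion 2, one on discussion 1.
comments += [2, 2, 2, 1]
refresh(totals, comments)
print(hottest(totals, n=2))  # [2, 1]
```

Discussion 3 still has the most comments overall, but it drops out because nothing new was posted there since the last refresh.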

mysql recount column in mysql based on rows in secondary table

I took over a database with two tables, lets name them entries and comments.
The entries table contains a column named comment_count which holds the amount of rows with entry_id in comments corresponding to that row in entries.
Lately this connection has become terribly out of sync due to version switching of the codebase. I need help to build a query to run in phpmyadmin to sync these numbers again. The amount of rows in entries is around 8000 and the rows in comments is around 80000 so it shouldn't be any problems to run the sync-query.
Structure:
entries contains:
id | comment_count | etc
comments contains
id | blogentry_id | etc
The only way I can think of is to loop over each entry in the entries table with PHP and update them individually, but that seems extremely fragile compared to a pure SQL solution.
I'd appreciate any help!
INSERT
INTO entries (id, comment_count)
SELECT blogentry_id, COUNT(*) AS cnt
FROM comments
GROUP BY
blogentry_id
ON DUPLICATE KEY
UPDATE comment_count = cnt
I think a pure SQL solution would involve using a subquery to gather the counts from the comments table, with the entries table as the driver. Something like the following should "loop" over the entries table, perform the subquery for each row (that may be the incorrect terminology), and update the comment count to the corresponding count from the auxiliary table. Hope that helps!
UPDATE entries ent
SET comment_count =
(SELECT COUNT(*)
FROM comments cmt
WHERE cmt.blogentry_id = ent.id)
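To try the correlated-subquery update on a throwaway data set first, here is a rough sketch using Python's sqlite3 with an in-memory database instead of MySQL. The sample rows are made up, with comment_count deliberately out of sync before the update.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; comment_count starts out wrong on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (id INTEGER PRIMARY KEY, comment_count INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, blogentry_id INTEGER);
INSERT INTO entries VALUES (1, 99), (2, 0), (3, 7);
INSERT INTO comments (blogentry_id) VALUES (1), (1), (2);
""")

# For every entry, recount its comments; entries without comments end up at 0.
conn.execute("""
UPDATE entries
SET comment_count = (
    SELECT COUNT(*)
    FROM comments
    WHERE comments.blogentry_id = entries.id
)
""")
conn.commit()

counts = dict(conn.execute("SELECT id, comment_count FROM entries"))
print(counts)  # {1: 2, 2: 1, 3: 0}
```

Entry 3 shows why the subquery form is convenient: an entry with no comments at all is reset to 0 rather than left at its stale value.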