Using SQL to keep track of words and their counts - MySQL

I have a situation where a user enters several words at a time, say {bat, ball, tennis, car, actor, ping}. I have a database table with the following structure:
------------------------------
word (PK) | count
------------------------------
ball      | 4
cat       | 2
gear      | 1
I want to insert each word into the table. If the word is already present, increment its counter by 1; otherwise insert the word (as it is new) and set its count to 1.
Is it possible using a single query? If yes, how can I do it?

If your word column is truly the primary key, you should be able to do something like this:
INSERT INTO table_name (`word`, `count`) VALUES ('ball', 1)
ON DUPLICATE KEY UPDATE `count` = `count` + 1;
Pretty straightforward, and it takes advantage of the database to perform the update in the database layer.
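Since the user submits several words at once, the same statement can carry all of them as one multi-row INSERT. This is a minimal sketch, assuming a hypothetical table named word_counts; the table name and column types are illustrative and not taken from the question.

```sql
-- Hypothetical table; `word` must be a PRIMARY KEY or UNIQUE key
-- for ON DUPLICATE KEY UPDATE to take effect.
CREATE TABLE IF NOT EXISTS word_counts (
  `word`  VARCHAR(64)  NOT NULL,
  `count` INT UNSIGNED NOT NULL DEFAULT 1,
  PRIMARY KEY (`word`)
);

-- One statement for the whole set: new words are inserted with count 1,
-- words that already exist have their count incremented by 1.
INSERT INTO word_counts (`word`, `count`)
VALUES ('bat', 1), ('ball', 1), ('tennis', 1), ('car', 1), ('actor', 1), ('ping', 1)
ON DUPLICATE KEY UPDATE `count` = `count` + 1;
```

With the sample data from the question, ball would go from 4 to 5, and the other five words would be inserted with a count of 1.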

Normally I avoid answers that amount to links, but in this case I think the question is really two questions, each of which has been asked here before.
There are two steps.
First, you have to split your keyword set into a table of some flavor. Last I knew, MySQL did not have a split-string function, but how to do it has been asked several times on SO. See "Can Mysql Split a column?"
Then you can use INSERT ... ON DUPLICATE KEY UPDATE, as discussed in "How do I update if exists, insert if not (AKA “upsert” or “merge” in MySQL?"

Related

MySQL: update one column of multiple rows in one query

I've looked over all of the related questions I could find, but couldn't find one that answers mine.
I have a table like this:
id | name | age | active | ...... | ... |
where "id" is the primary key, and the "..." means there are about 30 columns.
The "active" column is of type TINYINT.
My task:
Update ids 1, 4, 12, 55, 111 (these are just an example; there can be 1000 different ids in total) with active = 1 in a single query.
I did:
UPDATE table SET active = 1 WHERE id IN (1,4,12,55,111)
It's inside a transaction, because I'm updating something else in this process.
The engine is InnoDB.
My problem:
Someone told me that such a query is equivalent to 5 queries at execution, because the IN is translated into the given number of ORs, which are run one after another.
So instead of 1 query I get N, where N is the number of values in the IN.
He suggests creating a temp table, inserting all the new values into it, and then updating by join.
Is he right, both about the equivalence and about the performance?
What do you suggest? I thought INSERT INTO ... ON DUPLICATE KEY UPDATE would help, but I don't have all the data for the row, only its id, and all I want is to set active = 1 on it.
Maybe this query is better?
UPDATE table SET
  active = CASE
    WHEN id='1' THEN '1'
    WHEN id='4' THEN '1'
    WHEN id='12' THEN '1'
    WHEN id='55' THEN '1'
    WHEN id='111' THEN '1'
    ELSE active END
WHERE campaign_id > 0; -- otherwise it throws an error about updating without a WHERE clause in safe mode, and I don't know if I can toggle safe mode off.
Thanks.
It's the other way around. OR can sometimes be turned into IN. IN is then efficiently executed, especially if there is an index on the column. If you have 1000 entries in the IN, it will do 1000 probes into the table based on id.
If you are running a new enough version of MySQL, I think you can do EXPLAIN EXTENDED UPDATE ...OR...; SHOW WARNINGS; to see this conversion.
The UPDATE ... CASE will probably tediously check each and every row.
It would probably be kinder to other users of the system if you broke the UPDATE up into multiple UPDATEs, each covering 100-1000 rows. More on chunking.
Where did you get the ids in the first place? If it was via a SELECT, then perhaps it would be practical to combine it with the UPDATE to make it one step instead of two.
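For comparison, the temp-table-and-join approach mentioned in the question can be sketched as follows. The names profiles and pending_ids are hypothetical stand-ins for the real table and the staging table, and nothing here is a measurement of which approach is faster.

```sql
-- Hypothetical staging table holding the ids to activate.
CREATE TEMPORARY TABLE pending_ids (
  id INT NOT NULL,
  PRIMARY KEY (id)
);

INSERT INTO pending_ids (id) VALUES (1), (4), (12), (55), (111);

-- Multi-table UPDATE: only rows whose id appears in pending_ids are touched,
-- and each match is a primary-key lookup on the target table.
UPDATE profiles AS p
JOIN pending_ids AS x ON x.id = p.id
SET p.active = 1;

DROP TEMPORARY TABLE pending_ids;
```

If the ids originally come from a SELECT on some other table, that table can be joined directly in the UPDATE instead of being copied into a staging table first.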
I think the query below is better because it uses the primary key, though it only applies when the ids to update form a contiguous range.
UPDATE table SET active = 1 WHERE id<=5

MySQL Auto Increment For Group Entries

I need to set up a table that will have two auto-increment fields. One field will be a standard primary key for each record added. The other field will be used to link multiple records together.
Here is an example.
field 1 | field 2
1       | 1
2       | 1
3       | 1
4       | 2
5       | 2
6       | 3
Notice that each value in field 1 is auto-incremented. Field 2 also increments, but differently: records 1, 2 and 3 were made at the same time, records 4 and 5 were made at the same time, and record 6 was made individually.
Would it be best to read the last entry for field 2 and then increment it by one in my PHP program? I'm just looking for the best solution.
You should have two separate tables.
ItemsToBeInserted
  id, batch_id, field, field, field
BatchesOfInserts
  id, created_time, field, field, field
You would then create a batch record, and add the insert id for that batch to all of the items that are going to be part of the batch.
You get bonus points if you add a batch_hash field to the batches table and then check that each batch is unique so that you don't accidentally submit the same batch twice.
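The two-table layout above can be sketched like this. The table and column names (batches, items, payload) are hypothetical, and the batch_hash idea is left out to keep the example short.

```sql
-- One row per batch of inserts.
CREATE TABLE batches (
  id           INT UNSIGNED NOT NULL AUTO_INCREMENT,
  created_time TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- One row per item; batch_id links items that were created together.
CREATE TABLE items (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
  batch_id INT UNSIGNED NOT NULL,
  payload  VARCHAR(255) NULL,
  PRIMARY KEY (id),
  FOREIGN KEY (batch_id) REFERENCES batches (id)
) ENGINE=InnoDB;

-- Create the batch first and remember its id in a session variable,
-- because the next INSERT changes what LAST_INSERT_ID() returns.
INSERT INTO batches () VALUES ();
SET @batch_id = LAST_INSERT_ID();

-- All three items share the same batch id.
INSERT INTO items (batch_id, payload)
VALUES (@batch_id, 'first'), (@batch_id, 'second'), (@batch_id, 'third');
```

Because LAST_INSERT_ID() is tracked per connection, two clients creating batches at the same time each see their own batch id.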
If you are looking for a more awful way to do it that only uses one table, you could do something like:
$batch = //Code to run and get 'SELECT MAX(BATCH_ID) + 1 AS NEW_BATCH_ID FROM myTable'
and add that id to all of the inserted records. I wouldn't recommend that though. You will run into trouble down the line.
MySQL only offers one auto-increment column per table. You can't define two, nor does it make sense to do that.
Your question doesn't say what logic you want to use to control the incrementing of the second field you've called auto-increment. Presumably your PHP program will drive that logic.
Don't use PHP to query the largest ID number, then increment it and use it. If you do your system is vulnerable to race conditions. That is, if more than one instance of your PHP program tries that simultaneously, they will occasionally get the same number by mistake.
The Oracle DBMS has an object called a sequence which gives back guaranteed-unique numbers. But you're using MySQL. You can obtain unique numbers with a programming pattern like the following.
First create a table for the sequence. It has an auto-increment field and nothing else.
CREATE TABLE sequence (
  sequence_id INT NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`sequence_id`)
);
Then when you need a unique number in your program, issue these three queries one after the other:
INSERT INTO sequence () VALUES ();
DELETE FROM sequence WHERE sequence_id < LAST_INSERT_ID();
SELECT LAST_INSERT_ID() AS sequence;
The third query is guaranteed to return a unique sequence number. This guarantee holds even if you have dozens of different client programs connected to your database. That's the beauty of AUTO_INCREMENT.
The second query (DELETE) keeps the table from getting big and wasting space. We don't care about any rows in the table except for the most recent one.

Avoid duplicate rows, without reference to keys or indexes?

I have a MySQL table in which each row is a TV episode. It looks like this:
showTitle   | season | episode | episodeTitle | airdate    | absoluteEpisode
----------------------------------------------------------------------------
The X-Files | 5      | 12      | Bad Blood    | 1998-02-22 | 109
The X-Files | 5      | 13      | Patient X    | 1998-03-01 | 110
(Where absoluteEpisode is the episode's overall number counting from episode 1.)
It is populated using a Ruby program I wrote which fetches the data from a web service. Periodically, I'd like to run the program again to fetch new episodes. The question then becomes, how do I avoid adding duplicates of the already-existing rows? None of the columns in this table are suitable for use as a primary key or unique field.
I had two ideas. The first was to create a new column, md5, with an MD5 hash of all of those values, and make that a unique column, to prevent two rows with identical data from being added. That seems like it would work, but be messy.
My second was to use this solution from StackOverflow. But I can't quite get that to work. My SQL query is
INSERT INTO `tv`.`episodes` (showTitle,episodeTitle,season,episode,date,absoluteEpisode)
SELECT '#{show}','#{title}','#{y['airdate']}' FROM `tv`.`episodes`
WHERE NOT EXISTS (SELECT * from `tv`.`episodes`
WHERE showTitle='#{show}' AND episodeTitle='#{title}' AND season='#{season_string}' AND episode='#{y['seasonnum']}' AND date='#{y['airdate']}' AND absoluteEpisode='#{y['epnum']}'")
The #{...} bits are Ruby variables. This gets me the obvious error You have an error in your SQL syntax.
Flipping through the books and documentation I can find on the subject, I'm still not sure how to properly execute this query, or if it's not a smart way of solving my problem. I'd appreciate any advice!
Why not create a primary key from showTitle, season, and episode? That would solve the problem, because an episode number cannot be duplicated within the same season of the same TV show.
Example:
x-files ==> season 1 ==> episode 1: together these form the primary key, as one unit.
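A sketch of that suggestion, assuming the column names shown in the question's table listing (the question's INSERT uses date where the listing says airdate, so airdate is an assumption here) and assuming showTitle is a VARCHAR short enough to be indexed:

```sql
-- Composite primary key: one row per (show, season, episode).
-- This fails if the table already contains duplicates for that triple.
ALTER TABLE `tv`.`episodes`
  ADD PRIMARY KEY (showTitle, season, episode);

-- INSERT IGNORE skips a row whose key already exists instead of raising an error,
-- so the fetch program can be re-run without adding duplicates.
INSERT IGNORE INTO `tv`.`episodes`
  (showTitle, season, episode, episodeTitle, airdate, absoluteEpisode)
VALUES
  ('The X-Files', 5, 12, 'Bad Blood', '1998-02-22', 109),
  ('The X-Files', 5, 13, 'Patient X', '1998-03-01', 110);
```

Note that INSERT IGNORE also downgrades some other errors to warnings, so INSERT ... ON DUPLICATE KEY UPDATE may be preferable if existing rows should be refreshed rather than skipped.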

How to let data "disappear" from database? MySQL

I've got a bit of a stupid question. The thing is my program has to have the function to delete data from my database. Yay, not really the problem. But how can I delete data without the danger that others can see, that there has been something deleted.
User table:
U_ID | U_NAME
1    | Chris
2    | Peter
Other table:
ID | TIMESTAMP  | FK_U_ID
1  | 2012-12-01 | 1
2  | 2012-12-02 | 1
So the IDs are AUTO_INCREMENT, which means that if I delete one of them there's a gap. Furthermore, each row's timestamp is later than that of the row before it, so they are ascending.
I want to let the data with ID 1 disappear from the user's profile (U_ID 1).
If I delete it, there is a gap. If I just change the FK_U_ID to 2 (Peter) it's obvious, because when I insert data, there are 20 or 30 data rows with the same U_ID...so it's obvious that there has been a modification.
If I set the FK_U_ID NULL --> same sh** like when I change it to another U_ID.
Is there any solution to get this to work? I know that if nobody but me has access to the database, it's no problem. But just in case somebody inspects my program, it should not be obvious that there have been modifications.
So here we go.
For the ID gaps issue you can use GUIDs as @SLaks suggests, but then you can't use the native RDBMS auto_increment, which means you have to create the GUID and insert it along with the rest of the record data upon creation. Of course, you don't really need the ID to be globally unique; you could just store a random string of 20 characters or something, but then you have to do a DB read to see if that ID is taken and repeat that process until you find an unused ID... which could be quite taxing.
It's not at all clear why you would want to "hide" evidence that a delete was performed. That sounds like a really bad idea. I'm not a fan of promulgating misinformation.
Two of the characteristics of an ideal primary key are:
- anonymous (be void of any useful information, doesn't matter what it's set to)
- immutable (once assigned, it will never be changed.)
But, if we set that whole discussion aside...
I can answer a slightly different question (an answer you might find helpful to your particular situation)
The only way to eliminate a "gap" in the values in a column with an AUTO_INCREMENT would be to change the column values from their current values to a contiguous sequence of new values. If there are any foreign keys that reference that column, the values in those columns would need to be updated as well, to preserve the relationship. That will likely leave the current auto_increment value of the table higher than the largest value of the id column, so I'd want to reset that as well, to avoid a "gap" on the next insert.
(I have done re-sequencing of auto_increment values in development and test environments, to "clean up" lookup tables, and to move the id values of some tables to ranges that are distinct from ranges in other tables... that lets me test SQL to make sure the SQL join predicates aren't inadvertently referencing the wrong table, and returning rows that look correct by accident... those are some reasons I've done reassignment of auto_increment values.)
Note that the database can "automagically" update foreign key values (for InnoDB tables) when you change the primary key value, as long as the foreign key constraint is defined with ON UPDATE CASCADE, and FOREIGN_KEY_CHECKS is not disabled.
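As an illustration of the ON UPDATE CASCADE behaviour just described, here is a minimal sketch. The table and column names are hypothetical, loosely modelled on the two tables in the question.

```sql
CREATE TABLE users (
  u_id   INT         NOT NULL AUTO_INCREMENT,
  u_name VARCHAR(50) NOT NULL,
  PRIMARY KEY (u_id)
) ENGINE=InnoDB;

CREATE TABLE other_table (
  id         INT       NOT NULL AUTO_INCREMENT,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  fk_u_id    INT       NULL,
  PRIMARY KEY (id),
  CONSTRAINT fk_other_user FOREIGN KEY (fk_u_id)
    REFERENCES users (u_id) ON UPDATE CASCADE
) ENGINE=InnoDB;

INSERT INTO users (u_id, u_name) VALUES (1, 'Chris'), (2, 'Peter');
INSERT INTO other_table (fk_u_id) VALUES (1), (1);

-- Changing the parent key also rewrites the referencing rows:
-- both other_table rows now have fk_u_id = 10.
UPDATE users SET u_id = 10 WHERE u_id = 1;
```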
If there are no foreign keys to deal with, and assuming that all of the current values of id are positive integers, then I've been able to do something like this: (with appropriate backups in place, so I can recover if things don't work right)
UPDATE mytable t
JOIN (
    SELECT s.id AS old_id
         , @i := @i + 1 AS new_id
      FROM mytable s
     CROSS
      JOIN (SELECT @i := 0) i
     ORDER BY s.id
) c
ON t.id = c.old_id
SET t.id = c.new_id
WHERE t.id <> c.new_id
To reset the table AUTO_INCREMENT back down to the largest id value in the table:
ALTER TABLE mytable AUTO_INCREMENT = 1;
Typically, I will create a table and populate it from that query in the inline view (aliased as c) above. I can then use that table to update both foreign key columns and the primary key column, first disabling the FOREIGN_KEY_CHECKS and then re-enabling it. (In a concurrent environment, where other processes might be inserting/updating/deleting rows from one of the tables, I would of course first obtain an exclusive lock on all of the tables to be updated.)
Taking up again, the discussion I set aside earlier... this type of "administrative" function can be useful in a test environment, when setting up test cases. But it is NOT a function that is ever performed in a production environment, with live data.

Is it possible to insert a new row at top of MySQL table?

All rows in MySQL tables are being inserted like this:
1
2
3
Is there any way to insert a new row at the top of the table, so that the table looks like this?
3
2
1
Yes, yes, I know "order by" but let me explain the problem. I have a dating website and users can search profiles by sex, age, city, etc. There are more than 20 search criteria and it's not possible to create indexes for each possible combination. So, if I use "order by", the search usually ends with "using temporary, using filesort" and this causes a very high server load. If I remove "order by" oldest profiles are shown as first and users have to go to the last page to see the new profiles. That's very bad because first pages of search results always look the same and users have a feeling that there are no new profiles. That's why I asked this question. If it's not possible to insert last row at top of table, can you suggest anything else?
The order in which results are returned when there's no ORDER BY clause depends on the RDBMS. In the case of MySQL, or at least most of its engines, if you don't explicitly specify the order, the rows will usually come back in ascending order, from oldest to newest entries. Where the row is located "physically" doesn't matter. I'm not sure whether all MySQL engines work that way, though. For example, in PostgreSQL the "default" order shows the most recently updated rows first. This might be the way some of the MySQL engines work too.
Anyway, the point is: if you want the results ordered, always specify the sort order; don't depend on some default that seems to work. In your case you want something trivial, the users in descending order, so just use:
SELECT * FROM users ORDER BY id DESC
I think you just need to make sure that if you always need to show the latest data first, all of your indexes need to specify the date/time field first, and all of your queries order by that field first.
If ORDER BY is slowing everything down, then I would say you need to optimise your queries or your database structure.
Maybe you could add the id 'by hand' and give it a negative value, but I would not recommend that (and probably nobody would):
Regular insert, e.g.
insert into t values (...);
Update with set, e.g.
update t set id = -id where id = last_insert_id();
Normally you specify an auto-incrementing primary key.
However, you can just specify the primary key like so:
CREATE TABLE table1 (
  id INTEGER SIGNED PRIMARY KEY DEFAULT 1 -- no auto_increment, but has a default value
  -- , other fields ...
);
Now add a BEFORE INSERT trigger that changes the primary key.
DELIMITER $$
CREATE TRIGGER ai_table1_each BEFORE INSERT ON table1 FOR EACH ROW
BEGIN
DECLARE new_id INTEGER;
SELECT COALESCE(MIN(id), 0) -1 INTO new_id FROM table1;
SET NEW.id = new_id;
END $$
DELIMITER ;
Now your id will start at -1 and run down from there.
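Note that the trigger by itself does not rule out concurrency problems: two simultaneous inserts can compute the same id, in which case the primary key makes one of them fail with a duplicate-key error rather than storing a duplicate.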
I know that a lot of time has passed since the above question was asked. But I have something to add to the comments:
I'm using MySQL version: 5.7.18-0ubuntu0.16.04.1
When no ORDER BY clause is used with SELECT, the records appear to be returned in the table's primary key order, regardless of the order in which they were added.