insert ignore or replace ignore not working - mysql

I'm moving data from one SQL table to a second table using INSERT IGNORE or REPLACE INTO.
It isn't working, I believe because I don't have a unique key. I also don't know where I would put the key.
I need the second table to display the last zone the number was seen on that date. I can add a time column if needed.
Example Data:
number  zone   date
1       zone3  01-02-03
1       zone1  01-02-03
1       zone3  01-02-03
2       zone1  01-02-03
3       zone2  01-02-03
If I put number as a unique key it doesn't get added when the date changes.
If I add date as a unique key only one row gets added on that date.
The query:
REPLACE INTO database.table2 (number,zone,date)
SELECT number,zone,date
FROM database.table1
GROUP BY number,date;
I hoped that grouping by number and date would prevent duplicate records, but it is still adding multiples.
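One possible direction, sketched here with Python's built-in sqlite3 module standing in for MySQL (so `INSERT OR REPLACE` plays the role of MySQL's `REPLACE INTO`): put a single composite unique key on (number, date) instead of a key on either column alone. The `seen_at` time column, its values, and the extra row for a second date are assumptions made for illustration; the question only says a time column could be added.

```python
import sqlite3

def build_last_zone_table(rows):
    """rows: (number, zone, date, seen_at) tuples; seen_at is a hypothetical time column."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (number INTEGER, zone TEXT, date TEXT, seen_at TEXT);
        CREATE TABLE table2 (
            number INTEGER,
            zone   TEXT,
            date   TEXT,
            UNIQUE (number, date)   -- one row per number per date
        );
    """)
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    copy_sql = """
        INSERT OR REPLACE INTO table2 (number, zone, date)
        SELECT t.number, t.zone, t.date
        FROM table1 AS t
        JOIN (SELECT number, date, MAX(seen_at) AS last_seen
              FROM table1
              GROUP BY number, date) AS m
          ON m.number = t.number
         AND m.date = t.date
         AND m.last_seen = t.seen_at
    """
    con.execute(copy_sql)
    con.execute(copy_sql)  # running the copy again replaces rows instead of duplicating them
    return con.execute(
        "SELECT number, zone, date FROM table2 ORDER BY number, date"
    ).fetchall()

# Rows from the question, with made-up times and one made-up row for a later date.
sample_rows = [
    (1, "zone3", "01-02-03", "08:00"),
    (1, "zone1", "01-02-03", "09:00"),
    (1, "zone3", "01-02-03", "10:00"),
    (2, "zone1", "01-02-03", "08:30"),
    (3, "zone2", "01-02-03", "08:45"),
    (1, "zone2", "01-02-04", "07:00"),  # hypothetical: same number, next date
]
last_zones = build_last_zone_table(sample_rows)
```

With the key on both columns together, number 1 keeps one row per date, and the join on the latest `seen_at` decides which zone survives rather than leaving it to GROUP BY.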

Related

SPSS: How do I generate ID numbers from client ID variable that contains duplicate IDs in the order of the first date of each ID

Previously, I asked how to generate ID numbers from a client ID variable that contains duplicate IDs. I will use the same example data in this question but I would like to know how to generate ID numbers in the order of the first date of each ID. My client ID variable is string and has to remain as string.
My Data looks like:
ClientID TimeStamp
15137.45692 15/03/2021
10489.15789 03/02/2021
14143.96745 01/01/2021
15137.45692 15/01/2021
15137.45692 27/02/2021
14143.96745 08/03/2021
I would like it to look like:
ID ClientID TimeStamp
1 14143.96745 01/01/2021
1 14143.96745 08/03/2021
2 15137.45692 15/01/2021
2 15137.45692 27/02/2021
2 15137.45692 15/03/2021
3 10489.15789 03/02/2021
The previous code I tried was this:
sort cases by ClientID.
compute ID=1.
if $casenum>1 ID=lag(ID)+(ClientID<>lag(ClientID)).
exe.
However, whilst it gave me ID numbers for each ID, those ID numbers weren't ordered by TimeStamp.
To create the new ID, the data has to be sorted by ClientID. But then the new IDs follow the order of ClientID, whereas the order you want is by each client's first date of appearance. So first we need to calculate the first date for every ClientID, then we can sort by that before creating the new ID.
Note: you need to make sure TimeStamp is defined as a date variable.
aggregate outfile=* mode=addvariables /break=ClientID /firstDate=min(TimeStamp).
sort cases by firstDate ClientID.
compute ID=1.
if $casenum>1 ID=lag(ID)+(ClientID<>lag(ClientID)).
exe.
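For readers who don't use SPSS, the same idea (first date per client, sort by it, then number the groups) can be sketched in Python. This is only an illustration of the logic, not SPSS code; the function name is made up, and sorting by TimeStamp within each client is an extra assumption taken from the desired output.

```python
from datetime import datetime

def assign_ids(records):
    """records: list of (client_id, 'dd/mm/yyyy') string pairs."""
    parsed = [(cid, datetime.strptime(ts, "%d/%m/%Y")) for cid, ts in records]

    # Equivalent of AGGREGATE ... /firstDate=min(TimeStamp): earliest date per client.
    first_date = {}
    for cid, ts in parsed:
        if cid not in first_date or ts < first_date[cid]:
            first_date[cid] = ts

    # Equivalent of SORT CASES BY firstDate ClientID (plus TimeStamp within a client).
    ordered = sorted(parsed, key=lambda r: (first_date[r[0]], r[0], r[1]))

    # Equivalent of the lag() step: bump the ID whenever the client changes.
    result, current_id, previous = [], 0, None
    for cid, ts in ordered:
        if cid != previous:
            current_id += 1
            previous = cid
        result.append((current_id, cid, ts.strftime("%d/%m/%Y")))
    return result

data = [
    ("15137.45692", "15/03/2021"),
    ("10489.15789", "03/02/2021"),
    ("14143.96745", "01/01/2021"),
    ("15137.45692", "15/01/2021"),
    ("15137.45692", "27/02/2021"),
    ("14143.96745", "08/03/2021"),
]
numbered = assign_ids(data)
```

ClientID stays a string throughout, as the question requires.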

getting data from two non related tables where the date of visit should be followed up after a certain period of time

I have a table “InitialVisit” which records the date a user visited for a particular purpose. These visits can be duplicated, but they can be differentiated by the visit date and purpose. I have another table, “SubsequentVisit”, which holds the subsequent visits after the initial visit and has only one year of data available. The “InitialVisit” table is historic and has five years' worth of data, but is not too large; “SubsequentVisit” is very large, with more than 50M records.
I want to find, from the second table, the subsequent visit by the user within one month after the date left. The data is collected raw, so there are no primary or foreign keys involved.
The data snippet is
“InitialVisit”
UserID  DateVisit   DateLeft    Purpose
1       01-01-2016  02-01-2017  F20
2       23-11-2016  12-12-2016  R43
1       03-03-2016  04-04-2016  F27
3       06-07-2014  09-07-2014  K22
4       09-09-2016  10-09-2016  Y77
5       04-07-2016  02-08-2016  F22
“SubsequentVisit”
UserID  SubsequentVisit
1       03-01-2017
1       20-04-2016
2       27-12-2016
I would really appreciate a simple and fast query where I can get the result
UserID
3
4
5
Is there a quicker way to achieve this?
As the tables do not have any keys/indices, you can try adding an index on UserID column, e.g.:
ALTER TABLE InitialVisit ADD INDEX (UserID);
ALTER TABLE SubsequentVisit ADD INDEX (UserID);
To get all the users who do not have any subsequent visits, you can use NOT EXISTS, e.g.:
SELECT UserID
FROM InitialVisit iv
WHERE iv.DateLeft BETWEEN '2017-01-01' AND DATE_ADD('2017-01-01', INTERVAL 6 MONTH)
  AND NOT EXISTS (
    SELECT 1 FROM SubsequentVisit sv WHERE sv.UserID = iv.UserID
  );
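The query above does not look at the one-month window mentioned in the question. A minimal sketch of a variant that does, run against the sample rows with Python's built-in sqlite3 (standing in for MySQL), is shown below. Assumptions: the dates are dd-mm-yyyy and are stored here as ISO strings, the index name is made up, and SQLite's `date(x, '+1 month')` is used where MySQL would use `DATE_ADD(x, INTERVAL 1 MONTH)`.

```python
import sqlite3

def users_without_followup(initial_visits, subsequent_visits):
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE InitialVisit (UserID INTEGER, DateVisit TEXT, DateLeft TEXT, Purpose TEXT);
        CREATE TABLE SubsequentVisit (UserID INTEGER, SubsequentVisit TEXT);
        -- composite index so the correlated lookup can seek on user and date
        CREATE INDEX idx_sv_user_date ON SubsequentVisit (UserID, SubsequentVisit);
    """)
    con.executemany("INSERT INTO InitialVisit VALUES (?, ?, ?, ?)", initial_visits)
    con.executemany("INSERT INTO SubsequentVisit VALUES (?, ?)", subsequent_visits)
    rows = con.execute("""
        SELECT DISTINCT iv.UserID
        FROM InitialVisit AS iv
        WHERE NOT EXISTS (
            SELECT 1
            FROM SubsequentVisit AS sv
            WHERE sv.UserID = iv.UserID
              AND sv.SubsequentVisit > iv.DateLeft
              AND sv.SubsequentVisit <= date(iv.DateLeft, '+1 month')
        )
        ORDER BY iv.UserID
    """).fetchall()
    return [r[0] for r in rows]

initial = [
    (1, "2016-01-01", "2017-01-02", "F20"),
    (2, "2016-11-23", "2016-12-12", "R43"),
    (1, "2016-03-03", "2016-04-04", "F27"),
    (3, "2014-07-06", "2014-07-09", "K22"),
    (4, "2016-09-09", "2016-09-10", "Y77"),
    (5, "2016-07-04", "2016-08-02", "F22"),
]
subsequent = [(1, "2017-01-03"), (1, "2016-04-20"), (2, "2016-12-27")]
no_followup = users_without_followup(initial, subsequent)
```

On the sample data this yields users 3, 4 and 5, the result asked for in the question.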

SELECT unique values and the associated timestamp without having the timestamp making things unique

I apologize for the poorly worded question. It's best illustrated through an example and what I've come up with so far:
Table "myInfo" has columns:
1. id (PK)
2. key
3. value
4. metaId
Table "meta" has columns:
1. id (PK)
2. metaName
3. metaValue
4. instanceNum
metaId in the "myInfo" table correlates to an instanceNum in the "meta" table. The value of the "value" column sometimes changes across different rows with the metaId. Think of the metaId as a link to a timestamp value in the "meta" table ("timestamp" and its value would go into the metaName and metaValue columns respectively).
I want to select the distinct values of the 'value' column in "myInfo". So far I have:
SELECT DISTINCT mi.key, mi.value
FROM myInfo as mi JOIN metadata as meta
WHERE mi.metaId=meta.instanceNum
AND meta.key = 'timestamp'
AND mi.key='maxWeight';
But I ALSO want the timestamps associated with those values. So I want the output to look something like:
key value timestamp
maxWeight 10 tons 15:00:05 2011-01-01
maxWeight 5 tons 08:00:07 2011-10-12
maxWeight 25 tons 13:05:09 2013-08-01
I can't place timestamp as one of the columns in my SELECT because then it will return duplicate mi.attrValue values too since the timestamp makes every row unique. I tried putting the DISTINCT keyword behind only mi.attrValue but I got a MySQL error.
This is completely untested but grouping by a concat might work.
SELECT mi.key, mi.value, meta.value
FROM myInfo as mi JOIN metadata as meta ON mi.metaId = meta.instanceNum
WHERE mi.key = 'maxWeight'
AND meta.key = 'timestamp'
GROUP BY CONCAT(mi.key, mi.value)
Although you'll still have the problem of which timestamp is shown. For example, if you have 3 records for a given key/value pair, which record will be shown in your result set?
key value timestamp
maxWeight 10 tons 15:00:05 2011-01-01
maxWeight 10 tons 08:00:07 2011-10-12
maxWeight 10 tons 13:05:09 2013-08-01
The group by query will show just one of these results - but which? You'll need to think about ordering a group (which is a whole new ballgame).
SELECT mi.key, mi.value, MIN(meta.value)
FROM myInfo AS mi
JOIN metadata AS meta ON mi.metaId = meta.instanceNum
WHERE meta.key = 'timestamp'
  AND mi.key = 'maxWeight'
GROUP BY mi.key, mi.value
Would get you what you want with the earliest timestamp value.
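A small runnable sketch of the MIN() approach, using Python's built-in sqlite3 in place of MySQL. It assumes the table and column names listed in the question's schema (meta, metaName, metaValue, instanceNum), made-up sample rows, and timestamps stored as 'YYYY-MM-DD HH:MM:SS' text so that MIN() picks the earliest one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE myInfo (id INTEGER PRIMARY KEY, "key" TEXT, value TEXT, metaId INTEGER);
    CREATE TABLE meta (id INTEGER PRIMARY KEY, metaName TEXT, metaValue TEXT, instanceNum INTEGER);
""")
# Hypothetical rows: '10 tons' is recorded three times, at three different timestamps.
con.executemany(
    "INSERT INTO meta (metaName, metaValue, instanceNum) VALUES (?, ?, ?)",
    [
        ("timestamp", "2011-01-01 15:00:05", 1),
        ("timestamp", "2011-10-12 08:00:07", 2),
        ("timestamp", "2013-08-01 13:05:09", 3),
    ],
)
con.executemany(
    'INSERT INTO myInfo ("key", value, metaId) VALUES (?, ?, ?)',
    [
        ("maxWeight", "10 tons", 1),
        ("maxWeight", "10 tons", 2),
        ("maxWeight", "5 tons", 2),
        ("maxWeight", "25 tons", 3),
        ("maxWeight", "10 tons", 3),
    ],
)

# One row per distinct (key, value), paired with its earliest timestamp.
first_seen = con.execute("""
    SELECT mi."key", mi.value, MIN(meta.metaValue) AS first_seen
    FROM myInfo AS mi
    JOIN meta ON mi.metaId = meta.instanceNum
    WHERE meta.metaName = 'timestamp'
      AND mi."key" = 'maxWeight'
    GROUP BY mi."key", mi.value
    ORDER BY first_seen
""").fetchall()
```

Swapping MIN for MAX would give the most recent timestamp for each value instead.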

Please help using MySQL to select into where the data to write requires one select statement and the records to be created require another

I have a scenario where I want to select a set of records in a table, then based on a second select write new records into the table from the first select.
The table I wish to select from then write to is:
bullets
-------
id
product_code
catalogue_category_id
bullet_text
sort_sequence
Note id is the key and is an auto incrementing integer
I select the records as follows:
SELECT product_code, catalogue_category_id, bullet_text, sort_sequence
FROM bullets
WHERE product_code = '10001'
For this product this gives 4 rows, the number of bullets may vary from one product to another. The result is:
10001, , Bullet point for testing - 10001,
10001, , 2nd bullet point,
10001, , 3rd bullet point,
10001, , 4th bullet,
In this case the catalogue_category_id and sort_sequence are empty, this will not always be the case.
I then want to select a number of product codes and write, in this case, 4 records one for each bullet point.
The second select statement to get the list of product codes is
SELECT product_code
FROM master
WHERE product_group = '1'
AND product_code != '10001'
This gives 17 product codes back but it could be less or more depending on the product_group being selected.
The new records will comprise:
id - this will be auto incremented
product_code - this will be the new product code from the second select statement
catalogue_category_id - this will be the selected data from the first select statement
bullet_text - this will be the selected data from the first select statement
sort_sequence - this will be the selected data from the first select statement
So in this example I would write a total of 68 new records into bullets, 4 each for each of the 17 product codes.
I think I need a stored procedure to do this but have searched and can't wrap my head around the results I have looked at. Any help much appreciated.
The 68 records will be written to the bullets table as new records. The selected records may have all or any combination of the four fields populated:
product_code, catalogue_category_id, bullet_text, sort_sequence
Essentially I am looking to duplicate the selected records with the exception of the id and the product code. For example say I have 3 product codes,
10002, 10003 & 10004
returned using my second select statement then I would get 12 new records, 3 sets of 4 that are almost identical to the initial 4 from my first SELECT statement. The ID would auto-increment and the product code would be:
10002 for the first 4 new records
10003 for the next 4 new records
10004 for the last 4
I will write whatever is selected for each field. Using your example where 2 of the 4 records selected have data in the catalogue_category_id field then 34 of the 68 new records would have data in the catalogue_category_id field.
As I answered here, all you need is a CROSS JOIN:
INSERT INTO bullets (product_code, catalogue_category_id, bullet_text, sort_sequence)
SELECT m.product_code, b.catalogue_category_id, b.bullet_text, b.sort_sequence
FROM bullets b CROSS JOIN master m
WHERE b.product_code = '10001'
  AND m.product_group = '1'
  AND m.product_code <> '10001';
Here is SQLFiddle demo
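To see the row arithmetic without a MySQL server, here is a minimal sketch with Python's built-in sqlite3. The four bullets for 10001 and the product codes 10002 to 10004 come from the question; the column types, the function name and the extra product 20001 in another group are assumptions for illustration.

```python
import sqlite3
from collections import Counter

def duplicate_bullets(con, source_code, product_group):
    """Copy every bullet of source_code to each other product in product_group."""
    con.execute("""
        INSERT INTO bullets (product_code, catalogue_category_id, bullet_text, sort_sequence)
        SELECT m.product_code, b.catalogue_category_id, b.bullet_text, b.sort_sequence
        FROM bullets AS b CROSS JOIN master AS m
        WHERE b.product_code = ?
          AND m.product_group = ?
          AND m.product_code <> ?
    """, (source_code, product_group, source_code))
    return con.execute(
        "SELECT product_code, bullet_text FROM bullets "
        "WHERE product_code <> ? ORDER BY product_code, id",
        (source_code,),
    ).fetchall()

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE bullets (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product_code TEXT,
        catalogue_category_id TEXT,
        bullet_text TEXT,
        sort_sequence INTEGER
    );
    CREATE TABLE master (product_code TEXT, product_group TEXT);
""")
con.executemany(
    "INSERT INTO bullets (product_code, catalogue_category_id, bullet_text, sort_sequence) "
    "VALUES (?, ?, ?, ?)",
    [
        ("10001", None, "Bullet point for testing - 10001", None),
        ("10001", None, "2nd bullet point", None),
        ("10001", None, "3rd bullet point", None),
        ("10001", None, "4th bullet", None),
    ],
)
con.executemany(
    "INSERT INTO master VALUES (?, ?)",
    [("10001", "1"), ("10002", "1"), ("10003", "1"), ("10004", "1"),
     ("20001", "2")],  # 20001 is hypothetical and must not receive bullets
)

new_rows = duplicate_bullets(con, "10001", "1")
per_product = Counter(code for code, _ in new_rows)
total_rows = con.execute("SELECT COUNT(*) FROM bullets").fetchone()[0]
```

Each of the 4 source bullets is paired with each of the 3 matching product codes, giving 12 new rows, so no stored procedure is needed.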

MySQL - The most occurring for the specific day?

I'm stuck on this problem.
Basically, for each department I need to find out which days had the most sales made in them. The results should display the department number and the date of the day, and a department number can appear several times in the results if several days are tied for the most sales.
This is what I have so far:
SELECT departmentNo, sDate FROM Department
HAVING MAX(sDate)
ORDER BY departmentNo, sDate;
I tried using the max function to find which dates occurred most, but it only returns one row of values. To clarify, the dates that have the most sales should appear with the corresponding column called departmentNo. Also, if two dates for department A have an equal number of most sales, then department A should appear twice, with both dates showing.
NOTE: only the dates with the most sales and the departmentNo should appear.
I started with MySQL a few weeks ago and am still struggling to grasp the likes of subqueries and stored functions. But I'll learn from experience. Thank you in advance.
UPDATED:
Results I should get:
DepartmentNo | Date
1            | 15/08/2000
2            | 01/10/2012
3            | 01/06/1999
4            | 08/03/2002
nth          | nth date
These are the data:
INSERT INTO Department VALUES ('1','tv','2012-05-20','13:20:01','19:40:23','2');
INSERT INTO Department VALUES ('2','radio','2012-07-22','09:32:23','14:18:51','4');
INSERT INTO Department VALUES ('3','tv','2012-09-14','15:15:43','23:45:38','3');
INSERT INTO Department VALUES ('2','tv','2012-06-18','06:20:29','09:57:37','1');
INSERT INTO Department VALUES ('1','radio','2012-06-18','11:34:07','15:41:09','2');
INSERT INTO Department VALUES ('2','batteries','2012-06-18','16:20:01','23:40:23','3');
INSERT INTO Department VALUES ('2','remote','2012-06-18','13:20:41','19:40:23','4');
INSERT INTO Department VALUES ('1','computer','2012-06-18','13:20:54','19:40:23','4');
INSERT INTO Department VALUES ('2','dishwasher','2011-06-18','13:20:23','19:40:23','4');
INSERT INTO Department VALUES ('3','lawnmower','2011-06-18','13:20:57','20:40:23','4');
INSERT INTO Department VALUES ('3','lawnmower','2011-06-18','11:20:57','20:40:23','4');
INSERT INTO Department VALUES ('1','mobile','2012-05-18','13:20:31','19:40:23','4');
INSERT INTO Department VALUES ('1','mouse','2012-05-18','13:20:34','19:40:23','4');
INSERT INTO Department VALUES ('1','radio','2012-05-18','13:20:12','19:40:23','4');
INSERT INTO Department VALUES ('2','lawnmowerphones','2012-05-18','13:20:54','19:40:23','4');
INSERT INTO Department VALUES ('2','tv','2012-05-12','06:20:29','09:57:37','1');
INSERT INTO Department VALUES ('2','radio','2011-05-23','11:34:07','15:41:09','2');
INSERT INTO Department VALUES ('1','batteries','2011-05-21','16:20:01','23:40:23','3');
INSERT INTO Department VALUES ('2','remote','2011-05-01','13:20:41','19:40:23','4');
INSERT INTO Department VALUES ('3','mobile','2011-05-09','13:20:31','19:40:23','4');
For department1 the date 2012-05-18 would appear because that date occurred the most. And for every department, it should only show the one with the most sales, and if same amount of sales appears on the same date then both will appear, e.g. Department 1 will appear twice with both the dates of max sales.
I've tested the following query based on the table and two columns you've provided, along with the sample data. So, let me describe it for you. The inner-most "PreQuery" does a count by department and date. Its results are pre-ordered by department first, THEN by the highest count in DESCENDING ORDER (so the highest sales count is listed FIRST); it doesn't matter on what date the count happened.
Next, by utilizing MySQL @variables, I'm pre-declaring two to be used in the query. @variables are like inline programming with MySQL. They can be declared once and then changed as applied to each record being processed. So, I'm defaulting to a bogus department value and a zero sales count.
Now, I'm grabbing the results of the PreQuery (department, number of sales and date), but adding a test. If it is the FIRST ENTRY for a given department, use that record's "NumberOfSales", put it into the @maxSales variable and store it under the final column name "MaxSaleCnt". The next column uses @lastDept and is set to whatever the current record's department number is, so it can be compared to the next record.
If the next record is for the same department, it just keeps whatever the @maxSales value was from the previous record, thus keeping the same first count(*) result for ALL entries of each respective department.
Now, the closure. I've added a HAVING clause (not a WHERE, as that restricts which records get tested, whereas HAVING is processed AFTER the records are part of the PROCESSED set). So by then, each record has all 5 columns. I am saying ONLY KEEP those records where the NumberOfSales for the record MATCHES the MaxSaleCnt for the department. If that is one, two or more dates, no problem: it returns them all per respective department.
So, one department could have 5 dates with 10 sales each, another department 2 dates with only 3 sales each, and another only 1 date with 6 sales.
select
      Final.DepartmentNo,
      Final.NumberOfSales,
      Final.sDate
   from
      ( select
              PreQuery.DepartmentNo,
              PreQuery.NumberOfSales,
              PreQuery.sDate,
              -- keep the department's first (highest) count for every row of that department
              @maxSales := if( PreQuery.DepartmentNo = @lastDept, @maxSales, PreQuery.NumberOfSales ) MaxSaleCnt,
              -- remember this department so the next row can be compared to it
              @lastDept := PreQuery.DepartmentNo
           from
              ( select
                      D.DepartmentNo,
                      D.sDate,
                      count(*) as NumberOfSales
                   from
                      Department D
                   group by
                      D.DepartmentNo,
                      D.sDate
                   order by
                      D.DepartmentNo,
                      NumberOfSales DESC ) PreQuery,
              -- start with a bogus department and a zero count
              ( select @lastDept := '~',
                       @maxSales := 0 ) sqlvars
           having
              NumberOfSales = MaxSaleCnt ) Final
To clarify the "@" and "~" per your final comment: the "@" indicates a variable local to the program (or, in this case, an in-line SQL variable) that can be used in the query. The '~' is nothing more than a simple string that in all probability would never match any of your departments, so when it is compared to the first qualified record, it does an IF( '~' = YourFirstDepartmentNumber, then use this answer, otherwise use this answer ).
Now, how does the above work? Let's say the following are the results returned by the inner-most query, grouped and ordered with the most sales at the top going down. They are SLIGHTLY altered from your data, to simulate multiple dates on Dept 2 that have the same sales quantity:
Row#  DeptNo  Sales Date  # Sales
1     1       2012-05-18  3
2     1       2012-06-18  2
3     1       2012-05-20  1
4     2       2012-06-18  4
5     2       2011-05-23  4
6     2       2012-05-18  2
7     2       2012-05-12  1
8     3       2011-06-18  2
9     3       2012-09-14  1
Keep track of the actual rows. The innermost query, which finishes as alias "PreQuery", returns all the rows in the order you see here. That is then joined (implicitly) with the declarations of the @ SQL variables (special to MySQL; other SQL engines don't do this), which start with @lastDept = '~' and @maxSales = 0 (via assignment with @someVariable := result of this side).
Now, think of the above being handled as a
DO WHILE WE HAVE RECORDS LEFT
   Get the department number, number of sales and sDate from the record.
   IF the PreQuery record's department number = whatever is in @lastDept
      set @maxSales = whatever is ALREADY established as max sales for this dept
      (this basically keeps @maxSales the same value for ALL rows in the same department)
   ELSE
      set @maxSales = the number of sales, since this is a new department number and is the highest count
   END IF
   NOW, set @lastDept = the department you just processed so it
      can be compared when you get to the next record.
   Skip to the next record to be processed and go back to the start of this loop
END DO WHILE LOOP
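The loop described above can be sketched in plain Python. This only mirrors the logic of the two variables and the HAVING test; it is not how MySQL executes the query, and the function name is made up. The input rows are the nine pre-ordered rows from the table above.

```python
def top_sales_dates(prequery_rows):
    """prequery_rows: (dept, date, n_sales) tuples, ordered by dept, then n_sales descending."""
    last_dept = "~"   # bogus department that never matches a real one
    max_sales = 0
    kept = []
    for dept, s_date, n_sales in prequery_rows:
        if dept != last_dept:
            # First row of a new department carries that department's highest count.
            max_sales = n_sales
        last_dept = dept  # remembered for the comparison on the next row
        if n_sales == max_sales:
            # Same test as HAVING NumberOfSales = MaxSaleCnt.
            kept.append((dept, s_date, n_sales))
    return kept

prequery = [
    (1, "2012-05-18", 3),
    (1, "2012-06-18", 2),
    (1, "2012-05-20", 1),
    (2, "2012-06-18", 4),
    (2, "2011-05-23", 4),
    (2, "2012-05-18", 2),
    (2, "2012-05-12", 1),
    (3, "2011-06-18", 2),
    (3, "2012-09-14", 1),
]
best_days = top_sales_dates(prequery)
```

Department 2 is returned twice because two of its dates tie at 4 sales, which is the behaviour the question asks for.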
Now, the reason you need to have @maxSales and THEN @lastDept as returned columns is that they must be computed for each record in order to be compared against the NEXT record. This technique can be used for MANY application purposes. If you click on my name, look at my tags and click on the MySQL tag, it will show you the many MySQL answers I've responded to. Many of them utilize @ SQL variables. In addition, there are many other people who are very good at working queries, so don't just look in one place. As for any question, if you find an answer helpful, even if you didn't post the question, clicking the up-arrow next to the answer shows others what really helped to understand and resolve it. Good luck on your MySQL growth.
I think this can be achieved with a single query, but my experiences for similar functionality have involved either WITH (as defined in SQL'99) using either Oracle or MSSQL.
The best (only?) way to approach a problem like this is to break it into smaller components. (I don't think your provided statement shows all columns, so I'm going to have to make a few assumptions.)
First, how many sales were made for each day for each group:
SELECT department, COUNT(1) AS dept_count, sale_date
FROM orders
GROUP BY department, sale_date
Next, what's the most sales in a single day for each department:
SELECT tmp.department, MAX(tmp.dept_count)
FROM (
    SELECT department, sale_date, COUNT(1) AS dept_count
    FROM orders
    GROUP BY department, sale_date
) AS tmp
GROUP BY tmp.department
Finally, putting the two together:
SELECT a.department, b.dept_count, b.sale_date
FROM (
    SELECT tmp.department, MAX(tmp.dept_count) AS max_dept_count
    FROM (
        SELECT department, sale_date, COUNT(1) AS dept_count
        FROM orders
        GROUP BY department, sale_date
    ) AS tmp
    GROUP BY tmp.department
) AS a
JOIN (
    SELECT department, COUNT(1) AS dept_count, sale_date
    FROM orders
    GROUP BY department, sale_date
) AS b
ON a.department = b.department
AND a.max_dept_count = b.dept_count
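As a quick check of this join-based approach, here is a sketch that runs it on the department and date values from the question's INSERT statements, using Python's built-in sqlite3 in place of MySQL. The table name `orders` and its two columns follow this answer's assumed naming; the inner counts are grouped by both department and sale_date so that the maximum is a per-day maximum.

```python
import sqlite3

# (department, sale_date) pairs taken from the question's sample INSERT statements.
sales = [
    (1, "2012-05-20"), (2, "2012-07-22"), (3, "2012-09-14"), (2, "2012-06-18"),
    (1, "2012-06-18"), (2, "2012-06-18"), (2, "2012-06-18"), (1, "2012-06-18"),
    (2, "2011-06-18"), (3, "2011-06-18"), (3, "2011-06-18"), (1, "2012-05-18"),
    (1, "2012-05-18"), (1, "2012-05-18"), (2, "2012-05-18"), (2, "2012-05-12"),
    (2, "2011-05-23"), (1, "2011-05-21"), (2, "2011-05-01"), (3, "2011-05-09"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (department INTEGER, sale_date TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)", sales)

busiest_days = con.execute("""
    SELECT a.department, b.dept_count, b.sale_date
    FROM (
        SELECT tmp.department, MAX(tmp.dept_count) AS max_dept_count
        FROM (
            SELECT department, sale_date, COUNT(*) AS dept_count
            FROM orders
            GROUP BY department, sale_date
        ) AS tmp
        GROUP BY tmp.department
    ) AS a
    JOIN (
        SELECT department, sale_date, COUNT(*) AS dept_count
        FROM orders
        GROUP BY department, sale_date
    ) AS b
      ON a.department = b.department
     AND a.max_dept_count = b.dept_count
    ORDER BY a.department, b.sale_date
""").fetchall()
```

For department 1 this returns 2012-05-18 with 3 sales, matching the expectation stated in the question; tied dates would each produce their own row.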