I am moving data from spreadsheets to MySQL.
As we know, spreadsheets usually have no IDs, just text:
City;Country;...
New York;USA;...
Berlin;Germany;...
Munich;Germany;...
With that in mind, let's consider two tables:
Country : [ID, name]
City : [ID , country (FK) , name]
I don't want to create several countries with the same name; I want to reuse the existing one. So let's call a FUNCTION inside the INSERT statement that looks the country up, inserts it if needed, and returns the country ID.
So I created a function that FIRST checks whether the country exists and, if not, creates it:
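DELIMITER $$
CREATE FUNCTION getCountry(strCountry VARCHAR(100))
RETURNS INT
NOT DETERMINISTIC
MODIFIES SQL DATA
BEGIN
-- Look up an existing country by name
SELECT ID INTO @id FROM `country` WHERE country.country = strCountry;
IF (@id IS NULL OR @id = 0) THEN
-- Not found: insert it and capture the new auto-increment ID
INSERT INTO `country` (country) VALUES (strCountry);
IF (ROW_COUNT() > 0) THEN
SET @id = LAST_INSERT_ID();
ELSE
SET @id = NULL;
END IF;
END IF;
RETURN @id;
END$$
DELIMITER ;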
Then I have TENS OF THOUSANDS of INSERTs such as:
INSERT INTO city (name, country) VALUES ('name of the city', getCountry('new or existing one'));
The function works well when executed on its own, e.g.:
SELECT getCountry('Aruba');
However, when I execute it inside that VERY LONG SQL script (22K+ rows), it does not work: it basically returns the latest ID that was created BEFORE the script started running. Should I somehow "wait" for the function to finish and return a proper result? But how?
What am I doing wrong?
Instead of a function, why not use a stored procedure? The procedure can then handle both the existence check and the insertion.
https://www.mysqltutorial.org/getting-started-with-mysql-stored-procedures.aspx
DELIMITER $$
CREATE PROCEDURE `sp_city_add`(in p_city varchar(100), in p_country varchar(100))
BEGIN
DECLARE country_id INT;
IF (SELECT COUNT(1) FROM country WHERE country.country = p_country) = 0 THEN
INSERT INTO country (country) VALUE (p_country);
SET country_id = LAST_INSERT_ID();
ELSE
SELECT ID INTO country_id FROM country WHERE country.country = p_country;
END IF;
INSERT INTO city (name, country) VALUES (p_city, country_id);
END$$
DELIMITER ;
To execute the procedure:
CALL sp_city_add('Bogota', 'Colombia');
CALL sp_city_add('Phnom Penh', 'Cambodia');
CALL sp_city_add('Yaounde', 'Cameroon');
CALL sp_city_add('Ottawa', 'Canada');
CALL sp_city_add('Santiago', 'Chile');
CALL sp_city_add('Beijing', 'China');
CALL sp_city_add('Bogotá', 'Colombia');
CALL sp_city_add('Moroni', 'Comoros');
You can also add a condition that checks whether the city/country pair already exists, to prevent duplicate entries.
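The procedure's steps plus that extra duplicate check can be sketched as follows. The sketch uses Python's built-in `sqlite3` module so it runs without a MySQL server. The schema, the `city_add` helper, and the sample rows are illustrative assumptions, and `cursor.lastrowid` plays the role of `LAST_INSERT_ID()`.

```python
import sqlite3

def create_schema(conn):
    # Hypothetical schema mirroring the question: country(ID, country), city(ID, name, country)
    conn.executescript("""
        CREATE TABLE country (ID INTEGER PRIMARY KEY AUTOINCREMENT, country TEXT NOT NULL);
        CREATE TABLE city (
            ID INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            country INTEGER NOT NULL REFERENCES country(ID)
        );
    """)

def city_add(conn, p_city, p_country):
    """Same steps as sp_city_add, plus a duplicate check for the city."""
    row = conn.execute("SELECT ID FROM country WHERE country = ?", (p_country,)).fetchone()
    if row is None:
        cur = conn.execute("INSERT INTO country (country) VALUES (?)", (p_country,))
        country_id = cur.lastrowid  # analogous to LAST_INSERT_ID()
    else:
        country_id = row[0]
    # Only insert the city if this (name, country) pair is not present yet
    exists = conn.execute(
        "SELECT 1 FROM city WHERE name = ? AND country = ?", (p_city, country_id)
    ).fetchone()
    if exists is None:
        conn.execute("INSERT INTO city (name, country) VALUES (?, ?)", (p_city, country_id))
    conn.commit()
    return country_id

conn = sqlite3.connect(":memory:")
create_schema(conn)
for city, country in [("Bogota", "Colombia"), ("Santiago", "Chile"),
                      ("Medellin", "Colombia"), ("Bogota", "Colombia")]:
    city_add(conn, city, country)
```

The second "Bogota" call finds the existing pair and inserts nothing, so Colombia ends up with two cities.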
I can't find any documentation on this, but maybe there's a conflict when you do an INSERT inside a function that is called during another INSERT. So try splitting them up using a variable:
SELECT @country := getCountry('new or existing one');
INSERT INTO city (name, country) VALUES ('name of the city', @country);
Using @Barman's idea, PLUS adding a COMMIT after each row, I was able to solve it:
SELECT @id := getCountry("Colombia");INSERT into city ( city, country) VALUES ('Bogota',@id);COMMIT;
SELECT @id := getCountry("Colombia");INSERT into city ( city, country) VALUES ('Medelin',@id);COMMIT;
SELECT @id := getCountry("Brazil");INSERT into city ( city, country) VALUES ('Medelin',@id);COMMIT;
SELECT @id := getCountry("Brazil");INSERT into city ( city, country) VALUES ('Sao Paulo',@id);COMMIT;
SELECT @id := getCountry("Brazil");INSERT into city ( city, country) VALUES ('Curitiba',@id);COMMIT;
SELECT @id := getCountry("USA");INSERT into city ( city, country) VALUES ('Boston',@id);COMMIT;
SELECT @id := getCountry("USA");INSERT into city ( city, country) VALUES ('Dallas',@id);COMMIT;
Without the COMMIT at the end of each row, MySQL stopped recalculating the variable and instead just returned the last result it had collected.
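For a one-off migration of 22K+ rows, a different technique also avoids per-row function calls entirely. Instead of the thread's SQL-only approach, this sketch resolves countries client-side: it loads the existing `country` names into a dictionary once, inserts only missing countries, and bulk-inserts all cities in a single transaction. It is a minimal sketch, assuming the spreadsheet has already been parsed into `(city, country)` pairs. Python's `sqlite3` stands in for MySQL, and `load_cities` is a hypothetical helper.

```python
import sqlite3

def load_cities(conn, rows):
    """rows: iterable of (city_name, country_name) pairs, e.g. parsed from the spreadsheet."""
    # Name -> ID map of the countries that already exist, read once up front
    country_ids = dict(conn.execute("SELECT country, ID FROM country"))
    city_rows = []
    with conn:  # one transaction for the whole batch; committed on success
        for city, country in rows:
            if country not in country_ids:
                cur = conn.execute("INSERT INTO country (country) VALUES (?)", (country,))
                country_ids[country] = cur.lastrowid
            city_rows.append((city, country_ids[country]))
        conn.executemany("INSERT INTO city (name, country) VALUES (?, ?)", city_rows)
    return len(city_rows)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (ID INTEGER PRIMARY KEY, country TEXT NOT NULL UNIQUE);
    CREATE TABLE city (ID INTEGER PRIMARY KEY, name TEXT NOT NULL, country INTEGER NOT NULL);
""")
conn.execute("INSERT INTO country (country) VALUES ('USA')")
conn.commit()
n = load_cities(conn, [("Bogota", "Colombia"), ("Sao Paulo", "Brazil"),
                       ("Curitiba", "Brazil"), ("Boston", "USA")])
```

Because the dictionary is updated as soon as a country is inserted, each country name is written at most once, however many cities reference it.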
I have quite a few many-to-many relationships. To simplify inserting data into the three tables involved, I use the function below, which I adapt for the various M:M relationships, and it works like a charm. However, when dealing with many new records, I would like to simplify the insert process even further.
At the moment I am using an .xls sheet whose columns (and their order) correspond to the function's parameters (e.g. surname, fname, email, phone, docutype, year, title, citation, digital, url, call_number, report_no, docu_description).
I then import that .xls, including its data, into a new table in the database. Using Navicat's 'Copy as insert statement' and then replacing the INSERT part with the function call, I end up with one function call per record in the table, looking similar to this:
SELECT junction_insert_into_author_reportav('Smith', 'Victoria',
'some@email.com', NULL, 'Report', '2010', 'Geographical Place Names',
'Some citation', 'f', NULL, 'REP/63', NULL, NULL);
This works okay, but I would like to reduce the steps involved even further if possible. For example, I would like to pass the newly created table (the one I imported the .xls sheet into) as a parameter to the function, and then drop that table again after the function's insert statements have run. I am just unsure how to do this, or whether it is possible at all.
Here is an example of the function as it looks and works at the moment:
CREATE OR REPLACE FUNCTION junction_insert_into_author_reportav (
p_surname VARCHAR,
p_fname VARCHAR,
p_email VARCHAR,
p_phone TEXT,
p_docutype VARCHAR,
p_year int4,
p_title VARCHAR,
p_citation VARCHAR,
p_digital bool,
p_url TEXT,
p_call_no VARCHAR,
p_report_no VARCHAR,
p_docu_description VARCHAR
) RETURNS void AS $BODY$
DECLARE
v_authorId INT;
v_reportavId INT;
BEGIN
SELECT
author_id INTO v_authorId
FROM
author
WHERE
surname = p_surname
AND fname = p_fname;
SELECT
reportav_id INTO v_reportavId
FROM
report_av
WHERE
title = p_title;
IF
( v_authorId IS NULL ) THEN
INSERT INTO author ( surname, fname, email, phone )
VALUES
( p_surname, p_fname, p_email, p_phone ) RETURNING author_id INTO v_authorId;
END IF;
IF
( v_reportavId IS NULL ) THEN
INSERT INTO report_av ( docu_type, YEAR, title, citation, digital, url, call_number, report_no, docu_description )
VALUES
( p_docutype, p_year, p_title, p_citation, p_digital, p_url, p_call_no, p_report_no, p_docu_description ) RETURNING reportav_id INTO v_reportavId;
END IF;
INSERT INTO jnc_author_reportav
VALUES ( v_authorId, v_reportavId );
END;
$BODY$ LANGUAGE plpgsql VOLATILE COST 100;
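One way to drop the per-row calls is a set-based load straight from the imported table: three `INSERT ... SELECT` statements (new authors, new reports, junction rows), followed by dropping the staging table. The sketch below runs them through Python's `sqlite3`, assuming a simplified staging table named `staging` with only a few of the real columns. The table and column subset are hypothetical. In PL/pgSQL the same statements could sit in a function body, although a table *name* cannot be passed as an ordinary parameter without dynamic SQL (`EXECUTE`).

```python
import sqlite3

def load_from_staging(conn):
    """Set-based load: three INSERT ... SELECT statements instead of one call per row."""
    with conn:  # single transaction; rolled back if any statement fails
        # 1. Authors that do not exist yet (matched on surname + fname, as in the function)
        conn.execute("""
            INSERT INTO author (surname, fname, email)
            SELECT s.surname, s.fname, MIN(s.email) FROM staging s
            WHERE NOT EXISTS (SELECT 1 FROM author a
                              WHERE a.surname = s.surname AND a.fname = s.fname)
            GROUP BY s.surname, s.fname""")
        # 2. Reports that do not exist yet (matched on title, as in the function)
        conn.execute("""
            INSERT INTO report_av (title, year)
            SELECT s.title, MIN(s.year) FROM staging s
            WHERE NOT EXISTS (SELECT 1 FROM report_av r WHERE r.title = s.title)
            GROUP BY s.title""")
        # 3. Link every staged (author, report) pair through the junction table
        conn.execute("""
            INSERT INTO jnc_author_reportav (author_id, reportav_id)
            SELECT DISTINCT a.author_id, r.reportav_id FROM staging s
            JOIN author a ON a.surname = s.surname AND a.fname = s.fname
            JOIN report_av r ON r.title = s.title
            WHERE NOT EXISTS (SELECT 1 FROM jnc_author_reportav j
                              WHERE j.author_id = a.author_id
                                AND j.reportav_id = r.reportav_id)""")
        # 4. The staging table is no longer needed
        conn.execute("DROP TABLE staging")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, surname TEXT, fname TEXT, email TEXT);
    CREATE TABLE report_av (reportav_id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE jnc_author_reportav (author_id INTEGER, reportav_id INTEGER);
    CREATE TABLE staging (surname TEXT, fname TEXT, email TEXT, title TEXT, year INTEGER);
""")
conn.executemany("INSERT INTO staging VALUES (?, ?, ?, ?, ?)", [
    ("Smith", "Victoria", "some@email.com", "Geographical Place Names", 2010),
    ("Smith", "Victoria", "some@email.com", "Another Report", 2011),
    ("Jones", "Ann", None, "Geographical Place Names", 2010),
])
conn.commit()
load_from_staging(conn)
```

Each statement handles the whole imported sheet at once, so the number of statements stays at four regardless of how many rows were imported.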
I need to implement this without using a cursor. The script below uses a cursor, and it takes 5 hours for 140k records.
How can I improve the performance in SQL Server?
The original table has over 100k records.
SET NOCOUNT ON
CREATE TABLE #temp (
RecordID int identity,
Address varchar(50),
City varchar(30),
State varchar(5),
GPSLat numeric(9,6),
GPSLong numeric(9,6),
MapURL varchar(255))
INSERT INTO #temp (Address, City, State)
VALUES ('1033 Southwest 152nd Street', 'Burien', 'WA')
INSERT INTO #temp (Address, City, State)
VALUES ('11910 Northeast 154th Street', 'Brush Prairie', 'WA')
INSERT INTO #temp (Address, City, State)
VALUES ('500 SeaWorld Drive', 'San Diego', 'CA')
INSERT INTO #temp (Address, City, State)
VALUES ('1 Legoland Drive', 'Carlsbad', 'CA')
DECLARE curGeo CURSOR LOCAL STATIC FOR
SELECT RecordID, Address, City, State
FROM #temp
DECLARE @RecordID int
DECLARE @Address varchar(50)
DECLARE @City varchar(30)
DECLARE @State varchar(5)
DECLARE @GPSLatitude numeric(9, 6)
DECLARE @GPSLongitude numeric(9, 6)
DECLARE @MapURL varchar(255)
OPEN curGeo
FETCH curGeo INTO
@RecordID,
@Address,
@City,
@State
WHILE @@FETCH_STATUS = 0 BEGIN
BEGIN TRY
EXEC opsstream.sputilGeocode
@Address = @Address OUTPUT,
@City = @City OUTPUT,
@State = @State OUTPUT,
@GPSLatitude = @GPSLatitude OUTPUT,
@GPSLongitude = @GPSLongitude OUTPUT,
@MapURL = @MapURL OUTPUT
UPDATE #temp
SET
GPSLat = @GPSLatitude,
GPSLong = @GPSLongitude,
MapURL = @MapURL
WHERE
RecordID = @RecordID
END TRY
BEGIN CATCH
PRINT 'Warning: RecordID ' + CAST(@RecordID AS varchar(100)) + ' could not be geocoded.'
END CATCH
FETCH curGeo INTO
@RecordID,
@Address,
@City,
@State
END
SELECT * FROM #temp
You have a procedure call inside your loop, so I'm fairly sure the problem is not the cursor itself but the row-by-row logic performed through that procedure. You might improve the cursor's performance by declaring it FAST_FORWARD, but the difference will probably not be noticeable.
Because the work happens in the procedure you call, you need to either change that procedure to accept a table-valued parameter (and, of course, rewrite it so that it is no longer a row-by-row operation), or turn it into a table-valued function. If you use a multi-statement table-valued function, though, it is probably not going to improve your performance.
I have the below Stored Procedure:
DELIMITER $$
DROP PROCEDURE IF EXISTS spCashDonation$$
CREATE PROCEDURE spCashDonation(IN fname varchar(50),IN lname varchar(50),IN telNo bigint, IN pmode tinyint,IN amt decimal(8,2), OUT rno varchar(20))
BEGIN
Set @rmain := (select trim(concat('DNB', DATE_FORMAT(CURRENT_DATE(), '%y'), DATE_FORMAT(CURRENT_DATE(), '%m'))));
IF ((trim(DATE_FORMAT(CURRENT_DATE(),'%m')) = 01) OR (trim(DATE_FORMAT(CURRENT_DATE(),'%m')) = 1)) THEN
Set @rpart = 1;
END IF;
IF ((trim(DATE_FORMAT(CURRENT_DATE(),'%m')) != 01) OR (trim(DATE_FORMAT(CURRENT_DATE(),'%m')) != 1)) THEN
Set @rpart := (select coalesce(max(ReceiptPart),0) from Donation) + 1;
END IF;
INSERT INTO Donation (ReceiptMain, ReceiptPart, firstName, lastName, telNo, payMode, Amount) VALUES (@rmain, @rpart, fname, lname, telNo, pmode, amt);
Set @lid := (select LAST_INSERT_ID() from donation);
select concat(ReceiptMain,ReceiptPart) into rno from donation where id = @lid;
END$$
DELIMITER ;
Call spCashDonation ('RAJIV','IYER',7506033048,0,1000,@rno);
select @rno;
When the table has no records, the first insert goes through fine. But on the second insert it throws this error:
Error Code: 1242. Subquery returns more than 1 row
When I query for the last insert ID, I get more than one value. So I modified the last part of the procedure to:
Set @lid := (select max(LAST_INSERT_ID()) from donation);
Please advise whether this is fine, as it must not hinder any concurrent inserts or future CRUD operations. Thanks in advance.
Set @lid := (select LAST_INSERT_ID() from donation);
In the line above, remove the FROM clause. If the Donation table has more than one record, the subquery returns the LAST_INSERT_ID() value once per row, i.e. as many rows as the table contains.
So simply use Set @lid := (SELECT LAST_INSERT_ID()); and it will work in your case.
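You can reproduce the row multiplication with Python's built-in `sqlite3`, where `last_insert_rowid()` plays the role of MySQL's `LAST_INSERT_ID()`. The `donation` table here is a cut-down assumption. One dialect difference to note: SQLite does not raise error 1242 for a multi-row scalar subquery and silently uses the first row instead, so the sketch only shows how many rows each form returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donation (id INTEGER PRIMARY KEY, Amount REAL)")
for amt in (1000, 500, 250):
    conn.execute("INSERT INTO donation (Amount) VALUES (?)", (amt,))

# With FROM: the function is evaluated once per row of donation, giving 3 identical rows
with_from = conn.execute("SELECT last_insert_rowid() FROM donation").fetchall()
# Without FROM: exactly one row, which is what a scalar subquery needs
without_from = conn.execute("SELECT last_insert_rowid()").fetchall()
print(with_from, without_from)
```

Wrapping the FROM version in `max()` happens to return the right value, but it still scans the table for no reason. Dropping the FROM clause is the simpler fix.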
I created the following stored procedure:
CREATE DEFINER=`root`@`localhost` PROCEDURE `add_summit`(IN `assoc_code` CHAR(5), IN `assoc_name` CHAR(50), IN `reg_code` CHAR(2), IN `reg_name` CHAR(100), IN `code` CHAR(20), IN `name` CHAR(100), IN `sota_id` CHAR(5), IN `altitude_m` SMALLINT(5), IN `altitude_ft` SMALLINT(5), IN `longitude` DECIMAL(10,4), IN `latitude` DECIMAL(10,4), IN `points` TINYINT(3), IN `bonus_points` TINYINT(3), IN `valid_from` DATE, IN `valid_to` DATE)
BEGIN
declare assoc_id SMALLINT(5);
declare region_id SMALLINT(5);
declare summit_id MEDIUMINT(8);
-- ASSOCIATION check if an association with the given code and name already exists
SELECT id INTO assoc_id FROM association WHERE code = assoc_code LIMIT 1;
IF (assoc_id IS NULL) THEN
INSERT INTO association(code, name) VALUES (assoc_code, assoc_name);
set assoc_id = (select last_insert_id());
END IF;
-- REGION check if a region with the given code and name already exists
SET region_id = (SELECT id FROM region WHERE code = reg_code AND name = reg_name AND association_id = assoc_id);
IF (region_id IS NULL) THEN
INSERT INTO region(association_id, code, name) VALUES (assoc_id, reg_code, reg_name);
set region_id = (select last_insert_id());
END IF;
-- SUMMIT check if a summit with given parameters already exists
SET summit_id = (SELECT id FROM summit WHERE association_id = assoc_id AND region_id = region_id);
IF (summit_id IS NULL) THEN
INSERT INTO summit(code, name, sota_id, association_id, region_id, altitude_m, altitude_ft, longitude,
latitude, points, bonus_points, valid_from, valid_to)
VALUES (code, name, sota_id, assoc_id, region_id, altitude_m, altitude_ft, longitude, latitude,
points, bonus_points, valid_from, valid_to);
END IF;
END$$
Basically, it should check whether a record exists in some tables and, if it doesn't, insert it and use the inserted (auto-increment) ID.
The problem is that even if the record exists (for instance in the association table), assoc_id keeps returning null and that leads to record duplication.
I'm new to stored procedures, so I may be making some stupid errors. I've been trying to debug this SP for hours but I cannot find the problem.
A newbie mistake.
I forgot to qualify the column names with the table name in the comparisons, which leads to conflicts with parameter names (for example, the `code` and `name` parameters). When a parameter or local variable has the same name as a column, MySQL interprets the reference as the variable.
A good idea is to give parameters a prefix (such as p_) and always qualify column names with the table name in the SP.
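SQLite has no procedure parameters, so the sketch below, using Python's `sqlite3`, cannot reproduce the name collision itself. Instead it reproduces its effect. Once `region_id = region_id` resolves to the same name on both sides, it becomes a comparison of a value with itself and matches every row. A table-qualified column compared against a distinctly named (`p_`-prefixed) parameter matches only the intended row. The `region` table and its data are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE region (id INTEGER PRIMARY KEY, code TEXT, association_id INTEGER)")
conn.executemany("INSERT INTO region (code, association_id) VALUES (?, ?)",
                 [("NO", 1), ("SO", 1), ("EA", 2)])

# Effect of the collision: the same name on both sides is a tautology for non-NULL values
shadowed = conn.execute(
    "SELECT COUNT(*) FROM region WHERE association_id = association_id").fetchone()[0]

# Table-qualified columns vs. p_-prefixed named parameters: no ambiguity possible
qualified = conn.execute(
    "SELECT COUNT(*) FROM region "
    "WHERE region.code = :p_reg_code AND region.association_id = :p_assoc_id",
    {"p_reg_code": "SO", "p_assoc_id": 1},
).fetchone()[0]
print(shadowed, qualified)
```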
I have 3 tables:
1. Country (CountryName, CID (PK, AutoIncrement))
2. State (SID (PK, AutoIncrement), StateName, CID (FK to Country))
3. City (CityName, CID, SID (FK to State))
Now I need to insert only the names (CountryName, StateName and CityName) into the three tables; the IDs should be filled in automatically.
Create PROCEDURE sp_place(
@CountryName char(50),
@StateName varchar(50),
@CityName nchar(20)
)
AS
DECLARE @CountryID int, @StateID int, @CityID int;
Set NOCOUNT OFF
BEGIN TRANSACTION
INSERT INTO dbo.Country VALUES (@CountryName);
SET @CountryID = SCOPE_IDENTITY();
IF @@ERROR <> 0
BEGIN
ROLLBACK
RETURN
END
Insert into dbo.State VALUES (@StateName, @CountryID);
SET @StateID = SCOPE_IDENTITY();
IF @@ERROR <> 0
BEGIN
ROLLBACK
RETURN
END
Insert into dbo.City VALUES (@CityName, @StateID);
SET @CityID= SCOPE_IDENTITY();
Commit
When I enter the same country twice, its ID shouldn't change.
E.g. if I enter India and get CountryID = 1, then when I enter India again, CountryID shouldn't be incremented.
How do I do that? With my SP, the ID changes on every insertion.
You can check whether the country already exists and, if so, retrieve its ID:
IF NOT EXISTS(Select 1 FROM dbo.Country Where CountryName = @CountryName)
BEGIN
INSERT INTO dbo.Country VALUES (@CountryName);
SET @CountryID = SCOPE_IDENTITY();
END
ELSE
Select @CountryID = CID From dbo.Country Where CountryName = @CountryName
You can do the same for State and City if required.
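Applied to all three levels, the same check-then-insert pattern looks like this. The sketch uses Python's `sqlite3` instead of SQL Server, with `cursor.lastrowid` in place of `SCOPE_IDENTITY()`. The `get_or_create` and `sp_place` helpers are hypothetical, the schema follows the question loosely (City gets its own `CityID`), and table and column names are interpolated, so they must only come from trusted code.

```python
import sqlite3

def get_or_create(conn, table, id_col, match):
    """Return the ID of the row in `table` matching every column in `match`,
    inserting that row first if it does not exist yet."""
    cols = list(match)
    where = " AND ".join(f"{c} = ?" for c in cols)
    row = conn.execute(f"SELECT {id_col} FROM {table} WHERE {where}",
                       tuple(match.values())).fetchone()
    if row is not None:
        return row[0]  # existing row: reuse its ID, nothing is incremented
    placeholders = ", ".join("?" for _ in cols)
    cur = conn.execute(f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
                       tuple(match.values()))
    return cur.lastrowid  # plays the role of SCOPE_IDENTITY()

def sp_place(conn, country_name, state_name, city_name):
    with conn:  # one transaction, rolled back automatically on an exception
        cid = get_or_create(conn, "Country", "CID", {"CountryName": country_name})
        sid = get_or_create(conn, "State", "SID", {"StateName": state_name, "CID": cid})
        get_or_create(conn, "City", "CityID", {"CityName": city_name, "SID": sid})
    return cid, sid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (CID INTEGER PRIMARY KEY, CountryName TEXT NOT NULL);
    CREATE TABLE State (SID INTEGER PRIMARY KEY, StateName TEXT NOT NULL,
                        CID INTEGER REFERENCES Country(CID));
    CREATE TABLE City (CityID INTEGER PRIMARY KEY, CityName TEXT NOT NULL,
                       SID INTEGER REFERENCES State(SID));
""")
first = sp_place(conn, "India", "Karnataka", "Bengaluru")
second = sp_place(conn, "India", "Maharashtra", "Mumbai")
```

The second call finds India already present and reuses CID 1, while the new state and city still get fresh IDs.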
Hello, try this syntax:
IF EXISTS (SELECT * FROM dbo.Country WHERE CountryName = @CountryName)
BEGIN
UPDATE dbo.Country
SET CountryName = @CountryName
WHERE CID = (SELECT CID FROM dbo.Country WHERE CountryName = @CountryName);
END
ELSE
BEGIN
INSERT INTO dbo.Country(CountryName) VALUES (@CountryName);
END
-- For the ID, just declare the column as IDENTITY in your CREATE TABLE script
Why don't you put a UNIQUE constraint on the CountryName column? Then the database won't allow you to insert duplicate countries at all.
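A minimal sketch of the constraint-based approach, using Python's `sqlite3`: the UNIQUE constraint makes the database reject duplicates, and SQLite's `INSERT OR IGNORE` (SQLite-specific syntax, not T-SQL) skips the insert when the name already exists before the ID is read back. In SQL Server you would pair the constraint with `IF NOT EXISTS` or `MERGE` instead. The `country_id` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Country (
    CID INTEGER PRIMARY KEY,
    CountryName TEXT NOT NULL UNIQUE)""")

def country_id(conn, name):
    # SQLite-specific: silently skip the insert if it would violate the UNIQUE constraint
    conn.execute("INSERT OR IGNORE INTO Country (CountryName) VALUES (?)", (name,))
    return conn.execute("SELECT CID FROM Country WHERE CountryName = ?", (name,)).fetchone()[0]

ids = [country_id(conn, n) for n in ("India", "Nepal", "India")]
conn.commit()

# A plain INSERT of a duplicate is now rejected by the database itself
try:
    conn.execute("INSERT INTO Country (CountryName) VALUES ('India')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The constraint also protects against concurrent sessions that pass the existence check at the same moment, which a check-then-insert alone does not.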
You need the MERGE statement:
http://technet.microsoft.com/en-us/library/bb510625.aspx
Alternatively, check manually for the existence of the country before inserting (i.e. with IF EXISTS (...)).