I have a large amount of data stored in PDF files that I would like to convert into a SQL database. I can extract the tables from the PDF files with some online tools, and I also know how to import the result into MySQL. However:
The list contains users with names, birth dates and some other properties. A user may also appear in other PDF files. So when I convert the next file to Excel and import it into MySQL, I want to check whether that user already exists in my table. The check should be based on several properties: two users may share a name but have different dates of birth, in which case the second one is a new record. But if all the selected properties match, that user is a duplicate and shouldn't be imported.
I guess I can do this by copying from a temporary table, but I'm not sure what the selection should look like. Let's say the user name is stored in column A, date of birth in column B and city in column C. What would be the right script to check these against the existing table and skip the copy if all three match an existing record?
Thanks!
1- Create a permanent table
CREATE TABLE UploadData
(
id int NOT NULL AUTO_INCREMENT PRIMARY KEY,
name varchar(50),
dob datetime,
city varchar(30)
);
2- Import your Excel data into UploadData. The steps below are for SQL Server; MySQL's import tooling differs, but you said in your question that you already know how to do the import there, so I am not spelling out each step for MySQL.
In SQL Server: right-click your DB, go to Tasks -> Import Data, From: Microsoft Excel, To: your DB name, select the UploadData table (check Edit Columns to make sure the columns match), and finish the upload.
3- Check if data exists in your main table, if not, add.
CREATE TEMPORARY TABLE matchingData (id int, name varchar(50), dob datetime, city varchar(30));
-- rows in UploadData that already exist in main_table
INSERT INTO matchingData
SELECT u.id, u.name, u.dob, u.city
FROM main_table m
INNER JOIN UploadData u ON u.name = m.name
AND u.dob = m.dob
AND u.city = m.city;
-- copy only the rows that did not match
INSERT INTO main_table (name, dob, city)
SELECT name, dob, city
FROM UploadData
WHERE id NOT IN (SELECT id FROM matchingData);
4- The UploadData table is no longer needed, so drop it: DROP TABLE UploadData;
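Steps 3 and 4 can also be collapsed into a single INSERT ... SELECT with NOT EXISTS, which avoids the temporary table. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, the sample rows are made up, and the helper name import_new_users is hypothetical. The SQL inside the helper is plain enough that it should carry over to MySQL largely unchanged.

```python
import sqlite3


def import_new_users(conn):
    """Copy rows from UploadData into main_table, skipping full matches."""
    conn.execute(
        """
        INSERT INTO main_table (name, dob, city)
        SELECT u.name, u.dob, u.city
        FROM UploadData AS u
        WHERE NOT EXISTS (
            SELECT 1
            FROM main_table AS m
            WHERE m.name = u.name AND m.dob = u.dob AND m.city = u.city
        )
        """
    )
    conn.commit()


# In-memory SQLite database (assumption: stands in for the MySQL schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE main_table (id INTEGER PRIMARY KEY, name TEXT, dob TEXT, city TEXT);
    CREATE TABLE UploadData (id INTEGER PRIMARY KEY, name TEXT, dob TEXT, city TEXT);
    INSERT INTO main_table (name, dob, city) VALUES ('Ann Lee', '1980-01-01', 'Oslo');
    INSERT INTO UploadData (name, dob, city) VALUES
        ('Ann Lee', '1980-01-01', 'Oslo'),  -- all three columns match: skipped
        ('Ann Lee', '1975-05-05', 'Oslo');  -- same name, different dob: imported
    """
)
import_new_users(conn)
rows = conn.execute("SELECT name, dob, city FROM main_table ORDER BY id").fetchall()
print(rows)
```

Note that this only compares the upload against main_table. If the same person appears twice within one upload, both copies are inserted unless you also add DISTINCT to the SELECT.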
Add a composite primary key (or a UNIQUE constraint) that covers Column A, Column B and Column C together.
It prevents rows in which all three values repeat, while still allowing duplicate values within any single column.
Note: a table can have only one primary key, but that key may span several columns.
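As a rough illustration of the constraint-based approach, the sketch below declares a UNIQUE constraint over the three columns and lets the database reject duplicates during the import. It is written against SQLite through Python's sqlite3 module so that it is self-contained; the users table and the sample rows are assumptions. SQLite spells the statement INSERT OR IGNORE, whereas MySQL spells it INSERT IGNORE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dob  TEXT NOT NULL,
        city TEXT NOT NULL,
        UNIQUE (name, dob, city)  -- the combination must be unique, not each column
    )
    """
)

# Made-up rows extracted from two PDF files; the third repeats the first.
incoming = [
    ("Ann Lee", "1980-01-01", "Oslo"),
    ("Ann Lee", "1975-05-05", "Oslo"),
    ("Ann Lee", "1980-01-01", "Oslo"),
]

# Rows that would violate the UNIQUE constraint are silently skipped.
conn.executemany(
    "INSERT OR IGNORE INTO users (name, dob, city) VALUES (?, ?, ?)", incoming
)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)
```

The columns are declared NOT NULL on purpose: in a unique index NULL values are treated as distinct from each other, so rows with a missing city would otherwise slip through as "different".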
For example, I have one table named Suppliers, which contains:
1. SupplierName
2. SupplierID
I want to create a new table named Contracts
that contains these columns:
1. ContractID (new column)
2. SupplierID (from the "Suppliers" table)
3. ContractValue (new column)
How do I do it?
I have researched this, and most answers say to use CREATE TABLE with a SELECT, but that doesn't work. I've also tried ALTER TABLE, but that doesn't work either.
CREATE TABLE Contracts (
ContractID INT NOT NULL,
SELECT SupplierID
FROM Suppliers,
ContractValue INT NOT NULL,
ContractStart DATE NOT NULL)
This code does not work, so I'm not sure what the solution is. I also tried:
CREATE TABLE Contracts (
ContractID INT NOT NULL,
(SELECT SupplierID
FROM Suppliers),
ContractValue INT NOT NULL,
ContractStart DATE NOT NULL)
I expect the result to be new table with ContractID (new column), SupplierID (from table Suppliers) and another new column named ContractValue
Think of the result set of a SELECT query as a table or data grid.
So "SELECT [some fields] FROM [some table]" returns a data grid where each row contains the selected fields from the table.
Therefore you can either define a table from a SELECT query, which also copies the data, or specify the structure yourself and create an empty table. Most likely you don't want to mix the two approaches.
In your case, the SupplierID field of the Contracts table is a reference to SupplierID in the Suppliers table. In SQL this is called a "foreign key". In principle you can use a SELECT statement to create the new table, and once you have more experience with database queries you'll pick whichever way is most convenient for your needs.
But while you are still learning, it's better to create an empty table with an explicit structure and then insert data, supplying values for the new fields and reusing existing data for the foreign key.
Therefore, the query will be something like:
CREATE TABLE Contracts (
ContractID INT NOT NULL,
SupplierID INT NOT NULL,
ContractValue INT,
ContractStart DATE
);
And then you can insert data using existing values from supplier table:
INSERT INTO Contracts (SupplierID)
SELECT SupplierID FROM Suppliers;
Of course, this is a very simplified description.
First, you have to declare ContractID as the primary key. The query above will then work only if that primary key is an auto-increment column; otherwise you need some logic to supply its value explicitly.
In addition, you have to specify default values for any NOT NULL fields that the INSERT does not fill.
You can also declare SupplierID as a foreign key, so that only existing supplier IDs can be added and referential integrity is enforced.
See any MySQL or SQL documentation for details.
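To make those three caveats concrete, here is a minimal sketch that puts them together: an auto-increment primary key, a default for the NOT NULL value column, and a foreign key on SupplierID. It runs against SQLite via Python's sqlite3 module purely so that it is self-contained; the sample suppliers are made up, and the MySQL spelling differs slightly (AUTO_INCREMENT instead of AUTOINCREMENT, and no PRAGMA is needed with InnoDB).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript(
    """
    CREATE TABLE Suppliers (
        SupplierID   INTEGER PRIMARY KEY,
        SupplierName TEXT NOT NULL
    );
    CREATE TABLE Contracts (
        ContractID    INTEGER PRIMARY KEY AUTOINCREMENT,
        SupplierID    INTEGER NOT NULL REFERENCES Suppliers (SupplierID),
        ContractValue INTEGER NOT NULL DEFAULT 0,
        ContractStart TEXT
    );
    INSERT INTO Suppliers (SupplierID, SupplierName) VALUES (10, 'Acme'), (20, 'Globex');

    -- One contract row per existing supplier; ContractID is generated.
    INSERT INTO Contracts (SupplierID)
    SELECT SupplierID FROM Suppliers ORDER BY SupplierID;
    """
)
contracts = conn.execute(
    "SELECT ContractID, SupplierID, ContractValue FROM Contracts ORDER BY ContractID"
).fetchall()

# A supplier ID that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Contracts (SupplierID) VALUES (999)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(contracts, fk_rejected)
```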
I'm not sure whether this would solve your problem, but you could:
Make a copy of the Suppliers table.
Delete the unnecessary columns from the copy.
Add the new columns you want.
You can use a CTAS (CREATE TABLE ... AS SELECT) command.
CREATE TABLE Contracts as
SELECT
0 as ContractID,
SupplierID,
0 as ContractValue,
now() as ContractStart
FROM Suppliers;
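This creates the table with all four fields and one row per supplier. The constant values (0, now()) are placeholders whose only job is to fix each column's data type. You can update the rows with the real values afterwards, or compute them with a join in the SELECT clause itself.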
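One thing worth checking before relying on CTAS is that the new table carries no primary key or other constraints; it only gets the columns and the rows. The sketch below shows this using SQLite through Python's sqlite3 module as a stand-in (an assumption made so the example is runnable; CURRENT_DATE replaces now(), and the sample suppliers are invented). MySQL likewise does not create indexes for you in CREATE TABLE ... SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, SupplierName TEXT NOT NULL);
    INSERT INTO Suppliers VALUES (10, 'Acme'), (20, 'Globex');

    CREATE TABLE Contracts AS
    SELECT 0 AS ContractID,
           SupplierID,
           0 AS ContractValue,
           CURRENT_DATE AS ContractStart
    FROM Suppliers;
    """
)

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = conn.execute("PRAGMA table_info(Contracts)").fetchall()
column_names = [col[1] for col in columns]
has_primary_key = any(col[5] for col in columns)

# Every row got the same placeholder ContractID, so it cannot serve as a key yet.
contract_ids = [row[0] for row in conn.execute("SELECT ContractID FROM Contracts")]
print(column_names, has_primary_key, contract_ids)
```

So after a CTAS you would still need to assign real ContractID values and add the primary key and foreign key yourself.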
The basic syntax for creating a table from another table is as follows
CREATE TABLE NEW_TABLE_NAME AS
SELECT [ column1, column2...columnN ]
FROM EXISTING_TABLE_NAME
[ WHERE ]
Here, column1, column2... are the fields of the existing table and the same would be used to create fields of the new table.
Example
Following is an example that creates a table SALARY from the CUSTOMERS table, with the fields customer ID and customer SALARY:
CREATE TABLE SALARY AS
SELECT ID, SALARY
FROM CUSTOMERS;
I did the same thing last week.
I followed only two steps:
Export the existing table.
Open the export in Notepad++, change the table name, add the new columns, and import it.
Thanks
I have a suppliers table with the following structure:
Pay attention to the provider field. It's a VARCHAR now. I took over this system from another developer, and we now need to keep a list of providers and store additional info about them, so I created another table to store the providers.
It has the following structure: id, name, margin, outer_name, etc.
I plan to change the type of provider to INT(32) so that it points to the provider table.
The problem is that MySQL doesn't support transactions for changes to the database structure.
If I change the type of the field from string to integer, I lose all the previous data. If something goes wrong in the middle, I'm lost.
Would it be OK to dump the data to a file using serialisation and read it back from there?
Are there any better ways to do it?
Please follow the steps below to migrate the data to the new table and alter the column.
1. Insert all distinct providers into the new table (providerTable)
INSERT INTO providerTable (NAME)
SELECT DISTINCT provider
FROM suppliers;
2. Write the provider ID into the main (suppliers) table
UPDATE suppliers s
INNER JOIN providerTable p ON s.provider = p.name
SET s.provider = p.id;
Before altering the table, please verify the data in the suppliers table.
3. Then alter the column datatype in the main (suppliers) table
ALTER TABLE suppliers CHANGE provider provider INT(4) NOT NULL;
With this approach you don't need to take a backup of the table, and you won't lose any data.
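If overwriting the provider column in place feels too risky, a slightly different variant (not the exact steps above) is to write the IDs into a new column and leave the original VARCHAR untouched until everything has been verified. The sketch below illustrates that variant with SQLite through Python's sqlite3 module; the provider_id column name, the title column and the sample rows are all assumptions, since the real structure of suppliers is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical minimal versions of the two tables.
    CREATE TABLE suppliers (id INTEGER PRIMARY KEY, title TEXT, provider TEXT);
    CREATE TABLE providerTable (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    INSERT INTO suppliers (title, provider) VALUES
        ('s1', 'alpha'), ('s2', 'beta'), ('s3', 'alpha');

    -- Step 1: one provider row per distinct name.
    INSERT INTO providerTable (name)
    SELECT DISTINCT provider FROM suppliers;

    -- Step 2: store the ID in a NEW column; the old text column stays intact.
    ALTER TABLE suppliers ADD COLUMN provider_id INTEGER;
    UPDATE suppliers
    SET provider_id = (
        SELECT p.id FROM providerTable AS p WHERE p.name = suppliers.provider
    );
    """
)

# Verification before any destructive change: every row must have an ID,
# and each ID must point back to the provider name it replaced.
unmatched = conn.execute(
    "SELECT COUNT(*) FROM suppliers WHERE provider_id IS NULL"
).fetchone()[0]
pairs = conn.execute(
    """
    SELECT s.provider, p.name
    FROM suppliers AS s
    JOIN providerTable AS p ON p.id = s.provider_id
    """
).fetchall()
print(unmatched, pairs)
```

Only once both checks pass would you drop the old provider column (or rename the new one), so a failure halfway through leaves the original data readable.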
I'm using MySQL with Sequel Pro on Mac OS X. I would like to copy one field (a column called "GAME_DY") from one table into an empty field called "DAY_ID" in another table. Both tables are part of the same database. I've searched the answers to similar questions, but cannot find one with code that I can adapt and that works.
Database: retrosheet
Table 1 (Existing table called "game") from which I would like to copy field "GAME_DY"
Fields:
GAME_ID,
YEAR_ID,
GAME_DT,
GAME_CT,
GAME_DY,
START_GAME_TM,
....
Table 2 (existing Table called "starting_pitcher_game_log") to which I would like to copy to field "DAY_ID":
Fields:
PIT_ID,
GAME_ID,
W,
L,
IP,
BFP,
H,
R,
DAY_ID,
DATE
....
I want to copy "GAME_DY" from TABLE 1 into "DAY_ID" in TABLE 2.
Can this be done using MySQL queries?
If you want to update existing records in the starting_pitcher_game_log table, you could use SQL like this:
UPDATE starting_pitcher_game_log SET DAY_ID = (
SELECT GAME_DY FROM game
WHERE game.GAME_ID = starting_pitcher_game_log.GAME_ID
);
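For the subquery to return a single value per row, it has to be correlated: it reads only from game and refers to the GAME_ID of the row being updated. The sketch below tries that form on a tiny made-up data set, using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption for runnability; only the columns relevant to the copy are created).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE game (GAME_ID TEXT PRIMARY KEY, GAME_DY TEXT);
    CREATE TABLE starting_pitcher_game_log (PIT_ID TEXT, GAME_ID TEXT, DAY_ID TEXT);

    -- Invented sample rows.
    INSERT INTO game (GAME_ID, GAME_DY) VALUES ('G1', 'Monday'), ('G2', 'Friday');
    INSERT INTO starting_pitcher_game_log (PIT_ID, GAME_ID) VALUES
        ('p1', 'G1'), ('p2', 'G2'), ('p3', 'G1');

    -- Correlated subquery: looks up the one game row matching each log row.
    UPDATE starting_pitcher_game_log
    SET DAY_ID = (
        SELECT g.GAME_DY
        FROM game AS g
        WHERE g.GAME_ID = starting_pitcher_game_log.GAME_ID
    );
    """
)
rows = conn.execute(
    "SELECT PIT_ID, GAME_ID, DAY_ID FROM starting_pitcher_game_log ORDER BY PIT_ID"
).fetchall()
print(rows)
```

Log rows whose GAME_ID has no match in game end up with DAY_ID set to NULL. In MySQL you could alternatively write this as a multi-table UPDATE with an INNER JOIN, which leaves unmatched rows untouched.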
I'm copying data from one database table to another database table. This is essentially a copy from our old format to our new format, so in addition to copying columns value-for-value, I also need to do some conversions in the copy statements.
For example, here is what I have for the copying so far:
INSERT INTO new_database.table1 (id, product, is_default)
SELECT id, product, is_default FROM old_database.table1
The id and the product are working fine. But, in this example, the old_database stored "is_default" as a VARCHAR(1), either 'Y' or 'N'. The new_database stores "is_default" as a BOOLEAN.
How can I do the conversion between formats within the INSERT SELECT statement I'm already using?
Try something like this:
INSERT INTO new_database.table1 (id, product, is_default)
SELECT id, product, IF(is_default='Y',1,0) as isdefault FROM old_database.table1
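The same conversion can be written with a standard CASE expression, which is portable across databases, whereas IF() is MySQL-specific. Here is a small runnable sketch of that variant; it uses SQLite through Python's sqlite3 module, attaches two in-memory databases under the names old_database and new_database to mimic the setup in the question, and the product rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two separate in-memory databases standing in for the old and new schemas.
conn.execute("ATTACH DATABASE ':memory:' AS old_database")
conn.execute("ATTACH DATABASE ':memory:' AS new_database")
conn.executescript(
    """
    CREATE TABLE old_database.table1 (
        id INTEGER PRIMARY KEY, product TEXT, is_default VARCHAR(1)
    );
    CREATE TABLE new_database.table1 (
        id INTEGER PRIMARY KEY, product TEXT, is_default BOOLEAN
    );
    INSERT INTO old_database.table1 VALUES (1, 'widget', 'Y'), (2, 'gadget', 'N');

    -- 'Y' becomes 1; anything else (including 'N') becomes 0.
    INSERT INTO new_database.table1 (id, product, is_default)
    SELECT id,
           product,
           CASE WHEN is_default = 'Y' THEN 1 ELSE 0 END
    FROM old_database.table1;
    """
)
rows = conn.execute(
    "SELECT id, product, is_default FROM new_database.table1 ORDER BY id"
).fetchall()
print(rows)
```

The column alias in the SELECT is optional in either form, because INSERT ... SELECT matches columns by position, not by name.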
I have a contact management system and an SQL dump of my contacts, with five or six columns of data, that I want to import into three specific tables. I'm wondering what the best way to go about this is. I have already uploaded the SQL dump; it's now a single table in my contact management database.
The tables in the CRM require only the contactID in the companies table, and in the songs table:
companyID,
contactID,
date added (not required) and
notes (not required)
Then there is the third table, the contacts table, which only requires contactname.
I have already uploaded data to each of the three tables (I'm not sure my order is correct), but now I need to match the data in the fourth table (originally the SQL dump) with the other three and update everything with its unique identifier.
Table Structures:
+DUMP_CONTACTS
id <<< I dont need this ID, the IDs given to each row in the CRM are the important ones.
contact_name
company
year
event_name
event_description
====Destination Tables====
+++CONTACTS TABLE++
*contactID < primary key
*contact_name
+++COMPANIES TABLE+++
*companyID < primary key
*name
*contact_ID
*year
++++Events++++
*EventID < primary key
*companyID
*contactID
*eventname
*description
There are parts of your post that I still don't understand, so I'm going to give you some SQL that you can run in a test environment; we can then take it from there or go back and start again:
-- Populate CONTACTS_TABLE with contact_name from uploaded dump
INSERT INTO CONTACTS_TABLE (contact_name)
SELECT contact_name FROM DUMP_CONTACTS;
-- Populate COMPANIES with information from both CONTACTS_TABLE + dump
INSERT INTO COMPANIES (name, contact_ID, year)
SELECT d.company, c.contactID, d.year
FROM DUMP_CONTACTS AS d
INNER JOIN CONTACTS_TABLE AS c
ON d.contact_name = c.contact_name;
-- Populate SONGS_TABLE with info from COMPANIES
INSERT INTO SONGS_TABLE (companyID, contactID)
SELECT cm.companyID, cm.contact_ID
FROM COMPANIES AS cm;
-- Populate Events with info from COMPANIES + dump
INSERT INTO Events (companyID, contactID, eventname, description)
SELECT cm.companyID, cm.contact_ID, d.event_name, d.event_description
FROM DUMP_CONTACTS AS d
INNER JOIN COMPANIES AS cm
ON d.company = cm.name;
I first populate CONTACTS_TABLE and then, since the contactID is required for records in COMPANIES, insert records into COMPANIES by joining CONTACTS_TABLE with the dump. SONGS_TABLE takes its data directly from COMPANIES, and lastly Events gets its data by joining COMPANIES and the dump.
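Since you asked to try this in a test environment first, here is one way to rehearse the sequence on throwaway data. It is a sketch under several assumptions: it uses SQLite through Python's sqlite3 module, not your real database, the dump rows are invented, the songs table is left out, and it adds DISTINCT to the first two inserts (a change from the statements above) so that a contact who appears in several dump rows is created only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DUMP_CONTACTS (
        id INTEGER PRIMARY KEY, contact_name TEXT, company TEXT,
        year INTEGER, event_name TEXT, event_description TEXT
    );
    CREATE TABLE CONTACTS_TABLE (contactID INTEGER PRIMARY KEY, contact_name TEXT);
    CREATE TABLE COMPANIES (
        companyID INTEGER PRIMARY KEY, name TEXT, contact_ID INTEGER, year INTEGER
    );
    CREATE TABLE Events (
        EventID INTEGER PRIMARY KEY, companyID INTEGER, contactID INTEGER,
        eventname TEXT, description TEXT
    );

    -- Invented dump rows: Ann appears twice, once per event.
    INSERT INTO DUMP_CONTACTS
        (contact_name, company, year, event_name, event_description)
    VALUES
        ('Ann', 'Acme',   2019, 'Expo', 'Trade show'),
        ('Ann', 'Acme',   2019, 'Gala', 'Dinner'),
        ('Bob', 'Globex', 2020, 'Expo', 'Trade show');

    -- 1. One contact per distinct name.
    INSERT INTO CONTACTS_TABLE (contact_name)
    SELECT DISTINCT contact_name FROM DUMP_CONTACTS;

    -- 2. One company row per distinct (company, contact, year).
    INSERT INTO COMPANIES (name, contact_ID, year)
    SELECT DISTINCT d.company, c.contactID, d.year
    FROM DUMP_CONTACTS AS d
    INNER JOIN CONTACTS_TABLE AS c ON d.contact_name = c.contact_name;

    -- 3. One event per dump row, carrying the generated IDs.
    INSERT INTO Events (companyID, contactID, eventname, description)
    SELECT cm.companyID, cm.contact_ID, d.event_name, d.event_description
    FROM DUMP_CONTACTS AS d
    INNER JOIN COMPANIES AS cm ON d.company = cm.name;
    """
)
counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("CONTACTS_TABLE", "COMPANIES", "Events")
}
print(counts)
```

Comparing these row counts with what you expect from the dump is a quick sanity check before running the real statements.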