I initially had a .csv file, and I imported the data into SQL. It contains footballers' data, so each footballer has a football club. When I create a reference table for the clubs, it ends up like this, because it reads the football club for every player entry.
id club big_club
1 Arsenal 1
2 Arsenal 1
3 Arsenal 1
......
15 Brighton 0
16 Brighton 0
17 Brighton 0
However, I want
id club big_club
1 Arsenal 1
2 Brighton 0
3 Chelsea 1
4 Everton 0
......
and so on. Currently, I'm thinking of 2 options.
1) Load data and filter directly (most preferred)
2) Load data in first, then update table to find the distinct values
I would like assistance with both. Option 2 sounds rather simple, but unfortunately I only know how to do it from a SELECT DISTINCT standpoint and not an UPDATE standpoint.
For loading data into the table, this is what I have.
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/epldata_final.csv'
INTO TABLE big_clubs
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@name, @club, @age, @position, @position_cat, @market_value, @page_views, @fpl_value,
@fpl_sel, @fpl_points, @region,
@nationality, @new_foreign, @age_cat, @club_id, @big_club, @new_signing)
SET id=@id, club_id = @club_id, club = @club;
I tried SET club_id = DISTINCT @club_id but that doesn't work.
Would appreciate help / guidance for both methods.
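A minimal sketch of option 2, assuming the CSV is first loaded unchanged into a staging table and the reference table is then filled from it. It uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere; the table name players_staging and the sample player names are hypothetical. In MySQL the id column would be declared AUTO_INCREMENT instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical staging table holding the raw CSV rows (one row per player).
cur.execute("CREATE TABLE players_staging (name TEXT, club TEXT, big_club INTEGER)")
cur.executemany(
    "INSERT INTO players_staging VALUES (?, ?, ?)",
    [
        ("Player A", "Arsenal", 1),
        ("Player B", "Arsenal", 1),
        ("Player C", "Brighton", 0),
        ("Player D", "Chelsea", 1),
        ("Player E", "Brighton", 0),
    ],
)

# Reference table: one row per club, id assigned automatically.
cur.execute(
    "CREATE TABLE big_clubs ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, club TEXT UNIQUE, big_club INTEGER)"
)

# Copy each distinct (club, big_club) pair exactly once, in alphabetical order.
cur.execute(
    "INSERT INTO big_clubs (club, big_club) "
    "SELECT DISTINCT club, big_club FROM players_staging ORDER BY club"
)
conn.commit()

clubs = cur.execute("SELECT id, club, big_club FROM big_clubs ORDER BY id").fetchall()
print(clubs)
```

The point of the sketch is that the de-duplication happens in an INSERT ... SELECT DISTINCT rather than in an UPDATE, so only the SELECT DISTINCT knowledge is needed.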
I am currently working on a .NET web form solution which generates a brief service report for admins to monitor the services done by technicians. At the moment I am having trouble coming up with an efficient SQL query (for MySQL) which returns the data rows along with the missing rows, based on the ServiceRptNum, in order.
For example:
This is my raw data in the table:
Id ServiceRptNum Customer_ID Date of Service
---- ------------- ----------- ---------------
1 1001 3 09/10/1997
2 1003 8 10/06/2005
3 1005 1 21/02/2003
4 1007 7 1/06/2011
5 1010 4 4/11/2012
6 1002 2 16/01/2003
Here the ServiceRptNum 1004 is missing from the table. So I want the db to return the result as:
Id ServiceRptNum Customer_ID Date of Service
---- ------------- ----------- ---------------
1 1001 3 09/10/1997
2 1002 2 16/01/2003
3 1003 8 10/06/2005
- 1004 - -
4 1005 1 21/02/2003
- 1006 - -
5 1007 7 1/06/2011
- 1008 - -
- 1009 - -
6 1010 4 4/11/2012
Here, the SQL additionally generated 1004, 1006, 1008 and 1009, since it cannot find those records.
Please note that the Id is generated automatically (auto_increment) when the data is inserted, but the ServiceRptNum is not. This lets the admin add a service report later with a manually assigned report number (the report number in the hard copy of the company service book).
You basically need to invent a constant, sequential stream of numbers and then left join your real data to them. For this method to work, you need a table with enough rows in it to generate a counter big enough:
select ID, 1000+n as servicerptnum, customer_id, `Date of Service` from
(
SELECT @curRow := @curRow + 1 AS n
FROM somebigtable
JOIN (SELECT @curRow := 0) r
WHERE @curRow<100
) numbergen
LEFT JOIN
tablewithmissingservicerptnum
ON
servicerptnum = 1000+n
You need to alter some things in the code above because you never told us the name of your table with the missing report numbers. You also need another table in your database with more rows than this one, because this method works by counting the rows in the bigger table and giving each a number. If you don't have any table bigger than this one, we can probably get enough rows by cross joining a smaller table to itself, or by using this table: replace somebigtable with thistable CROSS JOIN thistable, where thistable is the name of the table with the missing servicerptnums.
If you want just the rows that are missing, add a WHERE servicerptnum is null to the end of the sql
Edit, I see you've changed your numbering from:
1001
1002
...
1009
10010
To:
1009
1010
The join condition used to be servicerptnum = concat('100', cast(n as varchar)); it is now servicerptnum = 1000+n.
Look here for ideas on how to generate a group of continuous integers, then select from that left outer join your table. You should get a row for every number but all the values will be null for the missing numbers.
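One way to generate the continuous integers without a helper table is a recursive common table expression (supported by MySQL 8.0 and later). This is a different generator from the row-counting variable trick above, but the left outer join is the same idea. A minimal sketch, run through Python's built-in sqlite3 as a stand-in for MySQL; the table name service_reports and the column name DateOfService are assumptions, since the real names were never given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table name; the rows are the sample data from the question.
cur.execute(
    "CREATE TABLE service_reports ("
    "Id INTEGER PRIMARY KEY, ServiceRptNum INTEGER, "
    "Customer_ID INTEGER, DateOfService TEXT)"
)
cur.executemany(
    "INSERT INTO service_reports VALUES (?, ?, ?, ?)",
    [
        (1, 1001, 3, "09/10/1997"),
        (2, 1003, 8, "10/06/2005"),
        (3, 1005, 1, "21/02/2003"),
        (4, 1007, 7, "1/06/2011"),
        (5, 1010, 4, "4/11/2012"),
        (6, 1002, 2, "16/01/2003"),
    ],
)

# numbergen yields every integer from the lowest to the highest report number;
# the LEFT JOIN keeps a row for each of them, with NULLs where no report exists.
query = """
WITH RECURSIVE numbergen(n) AS (
    SELECT MIN(ServiceRptNum) FROM service_reports
    UNION ALL
    SELECT n + 1 FROM numbergen
    WHERE n < (SELECT MAX(ServiceRptNum) FROM service_reports)
)
SELECT s.Id, g.n AS ServiceRptNum, s.Customer_ID, s.DateOfService
FROM numbergen AS g
LEFT JOIN service_reports AS s ON s.ServiceRptNum = g.n
ORDER BY g.n
"""
rows = cur.execute(query).fetchall()

# Report numbers that have no matching row in the table.
missing = [r[1] for r in rows if r[0] is None]
print(missing)
```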
I am wondering if any of you would be able to help me. I am trying to loop through table 1 (which has duplicate values of the plant codes) and, based on the unique plant codes, create a new record in two other tables. For each unique PlantCode I want to create a new row in the other two tables. As for the non-unique PTypeID, I can link any one of the PTypeIDs for all inserts; it doesn't matter which one I choose. The rest of the fields, like name etc., I would like to set myself. I am just stuck on the logic of how to insert into one table based on looping through another. So here is the data:
Table 1
PlantCode PlantID PTypeID
MEX 1 10
USA 2 11
USA 2 12
AUS 3 13
CHL 4 14
Table 2
PTypeID PtypeName PRID
123 Supplier 1
23 General 2
45 Customer 3
90 Broker 4
90 Broker 5
Table 3
PCreatedDate PRID PRName
2005-03-21 14:44:27.157 1 Classification
2005-03-29 00:00:00.000 2 Follow Up
2005-04-13 09:27:17.720 3 Step 1
2005-04-13 10:31:37.680 4 Step 2
2005-04-13 10:32:17.663 5 General Process
Any help at all would be greatly appreciated
I'm unclear on what relationship there is between Table 1 and either of the other two, so this is going to be a bit general.
First, there are two options and both require a select statement to get the unique values of PlantCode out of table1, along with one of the PTypeId's associated with it, so let's do that:
select PlantCode, min(PTypeId)
from table1
group by PlantCode;
This gets the lowest valued PTypeId associated with the PlantCode. You could use max(PTypeId) instead which gets the highest value if you wanted: for 'USA' min will give you 11 and max will give you 12.
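To see the difference between min and max on the sample data, here is a small sketch that runs the grouping query through Python's built-in sqlite3 (used only as a convenient stand-in; the SQL itself is unchanged apart from selecting both aggregates at once).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table 1 from the question, including the duplicated USA plant code.
cur.execute("CREATE TABLE table1 (PlantCode TEXT, PlantID INTEGER, PTypeID INTEGER)")
cur.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("MEX", 1, 10), ("USA", 2, 11), ("USA", 2, 12), ("AUS", 3, 13), ("CHL", 4, 14)],
)

# One row per distinct PlantCode, with the lowest and highest PTypeID for it.
rows = cur.execute(
    "SELECT PlantCode, MIN(PTypeID), MAX(PTypeID) "
    "FROM table1 GROUP BY PlantCode ORDER BY PlantCode"
).fetchall()

by_code = {code: (lowest, highest) for code, lowest, highest in rows}
print(by_code["USA"])
```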
Having selected that data, you can either write some code (C#, C++, Java, whatever) to read through the results row by row and insert new data into table2 and table3, or do it in pure SQL. I'm not going to show the first option, but here is how to do it using pure SQL.
insert into table2 (PTypeId, PTypeName, PRID)
select PTypeId, 'YourChoiceOfName', 24 -- set PRID to 24 for all
from
(
select PlantCode, min(PTypeId) as PTypeId
from table1
group by PlantCode
) x;
and follow that with a similar insert.... select... for table3.
Hope that helps.
I'm building an e-commerce platform (PHP + MySQL) and I want to add an attribute (feature) to products: the ability to specify (enable/disable) the selling status for a specific city.
Here are simplified tables:
cities
id name
==========
1 Roma
2 Berlin
3 Paris
4 London
products
id name cities
==================
1 TV 1,2,4
2 Phone 1,3,4
3 Book 1,2,3,4
4 Guitar 3
In this simple example it is easy to query (using FIND_IN_SET or LIKE) to check the availability of a product for a specific city.
This is OK for the 4 cities in this example, or even 100 cities, but will it be practical for a large number of cities and a very large number of products?
For better "performance" or better database design, should I add another table to JOIN in the query (productid, cityid, status)?
availability
id productid cityid status
=============================
1 1 1 1
2 1 2 1
3 1 4 1
4 2 1 1
5 2 3 1
6 2 4 1
7 3 1 1
8 3 2 1
9 3 3 1
10 3 4 1
11 4 3 1
For better "performance" or better database design should I add
another table
Yes, you should definitely create another table to hold that information, like the one you posted, rather than storing it in a comma-separated list, which goes against normalization. Also, with the comma-separated list there is no way to get good performance when you try to JOIN and find out which products are available in which cities.
At any point in time if you want to get back a comma separated list like 1,2,4 of values then you can do a GROUP BY productid and use GROUP_CONCAT(cityid) to get the same.
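A minimal sketch of both points, using the availability table from the question: a plain indexed lookup by city, and GROUP_CONCAT to rebuild the comma-separated list. It runs on Python's built-in sqlite3 as a stand-in for MySQL; SQLite also provides GROUP_CONCAT, but neither engine guarantees the order of the concatenated values unless you ask for it, so the sketch sorts them afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Junction table as posted in the question: one row per (product, city) pair.
cur.execute(
    "CREATE TABLE availability ("
    "id INTEGER PRIMARY KEY, productid INTEGER, cityid INTEGER, status INTEGER)"
)
pairs = [(1, 1), (1, 2), (1, 4), (2, 1), (2, 3), (2, 4),
         (3, 1), (3, 2), (3, 3), (3, 4), (4, 3)]
cur.executemany(
    "INSERT INTO availability (productid, cityid, status) VALUES (?, ?, 1)", pairs
)

# Which products are sold in city 4 (London)? A simple equality filter,
# with no string matching on a list column.
in_london = [
    row[0]
    for row in cur.execute(
        "SELECT productid FROM availability "
        "WHERE cityid = ? AND status = 1 ORDER BY productid",
        (4,),
    )
]

# Rebuild the comma-separated list per product when it is needed for display.
csv_lists = {
    productid: sorted(int(c) for c in cities.split(","))
    for productid, cities in cur.execute(
        "SELECT productid, GROUP_CONCAT(cityid) FROM availability GROUP BY productid"
    )
}
print(in_london, csv_lists)
```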
I have 2 tables with different number of columns, and I need to export the data using SSIS to a text file. For example, I have customer table, tblCustomers; order table, tblOrders
tblCustomers (id, name, address, state, zip)
id name address state zip
100 custA address1 NY 12345
99 custB address2 FL 54321
and
tblOrders(id, cust_id, name, quantity, total, date)
id cust_id name quantity total date
1 100 candy 10 100.00 04/01/2014
2 99 veg 1 2.00 04/01/2014
3 99 fruit 2 0.99 04/01/2014
4 100 veg 1 3.99 04/05/2014
The result file would be as following
“custA”, “100”, “recordtypeA”, “address1”, “NY”, “12345”
“custA”, “100”, “recordtypeB”, “candy”, “10”, “100.00”, “04/01/2014”
“custA”, “100”, “recordtypeB”, “veg”, “1”, “3.99”, “04/05/2014”
“custB”, “99”, “recordtypeA”, “address2”, “FL”, “54321”
“custB”, “99”, “recordtypeB”, “veg”, “1”, “2.00”, “04/01/2014”
“custB”, “99”, “recordtypeB”, “fruit”, “2”, “0.99”, “04/01/2014”
Can anyone please guild me as how to do this?
I presume you meant "guide", not "guild" - I hope your typing is more careful when you code?
I would create a Data Flow Task in an SSIS package. In that I would first add an OLE DB Source and point it at tblOrders. Then I would add a Lookup to add the data from tblCustomers, by matching tblOrders.Cust_id to tblCustomers.id.
I would use a SQL Query that joins the tables, and sets up the data, use that as a source and export that.
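As a rough idea of what such a query could look like, here is a sketch that builds the header and detail rows with a UNION ALL and orders them so that each customer row is followed by its orders. It is only one possible shape, run through Python's built-in sqlite3 rather than SQL Server; the aliases (cust, rectype, order_id, f1 to f4) are made up for the illustration, and in SQL Server the CAST target would be a VARCHAR type rather than TEXT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE tblCustomers (id INTEGER, name TEXT, address TEXT, state TEXT, zip TEXT)")
cur.execute("CREATE TABLE tblOrders (id INTEGER, cust_id INTEGER, name TEXT, quantity INTEGER, total TEXT, date TEXT)")
cur.executemany(
    "INSERT INTO tblCustomers VALUES (?, ?, ?, ?, ?)",
    [(100, "custA", "address1", "NY", "12345"), (99, "custB", "address2", "FL", "54321")],
)
cur.executemany(
    "INSERT INTO tblOrders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, "candy", 10, "100.00", "04/01/2014"),
        (2, 99, "veg", 1, "2.00", "04/01/2014"),
        (3, 99, "fruit", 2, "0.99", "04/01/2014"),
        (4, 100, "veg", 1, "3.99", "04/05/2014"),
    ],
)

# Header rows (recordtypeA) and detail rows (recordtypeB) share one column layout;
# the header leaves the last field NULL. order_id is only there for sorting.
query = """
SELECT c.name AS cust, c.id AS cust_id, 'recordtypeA' AS rectype, 0 AS order_id,
       c.address AS f1, c.state AS f2, c.zip AS f3, NULL AS f4
FROM tblCustomers c
UNION ALL
SELECT c.name, c.id, 'recordtypeB', o.id,
       o.name, CAST(o.quantity AS TEXT), o.total, o.date
FROM tblOrders o
JOIN tblCustomers c ON c.id = o.cust_id
ORDER BY cust, rectype, order_id
"""
rows = cur.execute(query).fetchall()

# Format each row as a quoted, comma-separated line, dropping the sort key
# and the unused trailing field of header rows.
lines = []
for cust, cust_id, rectype, _order_id, *fields in rows:
    values = [cust, str(cust_id), rectype] + [f for f in fields if f is not None]
    lines.append(", ".join('"%s"' % v for v in values))

print("\n".join(lines))
```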
Note that the first row has 6 columns and the second one has 7. It's generally difficult (well, not as easy as with a standard file) to import these types of header/detail files. How is this file being used once created? If it needs to be imported somewhere, you'd be better off just joining the data up and having 10 columns, or exporting them separately.
Suppose I have the following:
tbl_options
===========
id name
1 experience
2 languages
3 hourly_rate
tbl_option_attributes
=====================
id option_id name value
1 1 beginner 1
2 1 advanced 2
3 2 english 1
4 2 french 2
5 2 spanish 3
6 3 £10 p/h 10
7 3 £20 p/h 20
tbl_user_options
================
user_id option_id value
1 1 2
1 2 1
1 2 2
1 2 3
1 3 20
In the above example tbl_user_options stores option data for the user. We can store multiple entries for some options.
Now I wish to extend this, i.e. for "languages" I want the user to be able to specify their proficiency in a language (basic/intermediate/advanced). There will also be other fields that will have extended attributes.
So my question is, can these extended attributes be stored in the same table (tbl_user_options) or do I need to create more tables? Obviously if I put in a field "language_proficiency" it won't apply to the other fields. But this way I only have one user options table to manage. What do you think?
EDIT: This is what I propose
tbl_user_options
================
user_id option_id value lang_prof
1 1 2 null
1 2 1 2
1 2 2 3
1 2 3 3
1 3 20 null
My gut instinct would be to split the User/Language/Proficiency relationship out into its own tables. Even if you kept it in the same table with your other options, you'd need to write special code to handle the language case, so you might as well use a new table structure.
Unless your data model is in constant flux, I would rather have tbl_languages and tbl_user_languages tables to store those types of data:
tbl_languages
================
lang_id name
1 English
2 French
3 Spanish
tbl_user_languages
================
user_id lang_id proficiency hourly_rate
1 1 1 20
1 2 2 10
2 2 1 15
2 2 3 20
3 3 2 10
Designing a system that is "too generic" is a Turing tarpit trap for a relational SQL database. A document-based database is better suited to arbitrary key-value stores.
Excepting certain optimisations, your database model should match your domain model as closely as possible to minimise the object-relational impedance mismatch.
This design lets you display a sensible table of user language proficiencies and hourly rates with only two inner joins:
SELECT
ul.user_id,
u.name,
l.name,
ul.proficiency,
ul.hourly_rate
FROM tbl_user_languages ul
INNER JOIN tbl_languages l
ON l.lang_id = ul.lang_id
INNER JOIN tbl_users u
ON u.user_id = ul.user_id
ORDER BY
l.name, ul.hourly_rate
Optionally you can split out a list of language proficiencies into a tbl_proficiencies table, where 1 == Beginner, 2 == Advanced, 3 == Expert, and join it onto tbl_user_languages.
I'm thinking it's a mistake to put "languages" as an option. Reading your text, it seems to me that English is an option, and it might have an attribute from option_attributes.
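A minimal sketch of that optional lookup table, run through Python's built-in sqlite3 as a stand-in for the real database. The table is spelled tbl_proficiencies here and its label column is a hypothetical name; only a few of the sample rows are loaded to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript(
    """
    CREATE TABLE tbl_languages (lang_id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical lookup table mapping the numeric level to a readable label.
    CREATE TABLE tbl_proficiencies (proficiency INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE tbl_user_languages (
        user_id INTEGER, lang_id INTEGER, proficiency INTEGER, hourly_rate INTEGER
    );
    INSERT INTO tbl_languages VALUES (1, 'English'), (2, 'French'), (3, 'Spanish');
    INSERT INTO tbl_proficiencies VALUES (1, 'Beginner'), (2, 'Advanced'), (3, 'Expert');
    INSERT INTO tbl_user_languages VALUES (1, 1, 1, 20), (1, 2, 2, 10), (3, 3, 2, 10);
    """
)

# Join the lookup table so the report shows labels instead of numeric codes.
rows = cur.execute(
    """
    SELECT ul.user_id, l.name, p.label, ul.hourly_rate
    FROM tbl_user_languages ul
    INNER JOIN tbl_languages l ON l.lang_id = ul.lang_id
    INNER JOIN tbl_proficiencies p ON p.proficiency = ul.proficiency
    ORDER BY ul.user_id, l.name
    """
).fetchall()
print(rows)
```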