Create a Pandas table via string matching - mysql

I have a three-column table where two columns are URLs and one column is a string that might be contained in the URLs. The first 100,000 rows can be found at this link:
https://raw.githubusercontent.com/Slusks/LeagueDataAnalysis/main/short_urls.csv
In theory, values in eurl and surl should be the same, and for every value of each there should be a gameid that matches both, i.e.:
https://datapoint/Identity1/foobar.com | Identity1 | https://datapoint/Identity1/foobar.com
I've tried some SQL queries on the data and can't get them to line up:
SELECT *
FROM `table`
WHERE eurl = surl;
Since the values started out in different tables, I also tried joining on table1.url = table2.url, and that hasn't worked either. It just comes up blank:
SELECT s.url, e.gameid
FROM elixerdata e
JOIN scrapeddata s ON e.url = s.url;
I'm trying to get the gameids to match up to the surl column, using the eurl column as validation to confirm that it worked correctly. I'm probably not providing enough code or steps to get good feedback, but I figure I might as well ask, since I am low on ideas myself.
EDIT1:
I cleaned the quotes off by loading the table into Python and then re-writing it to a CSV with pandas. The data in the CSV appears to not have any quotes; I then load it into MySQL with the following:
drop table if exists urltable;
create table urltable(
eurl varchar(255),
gameid varchar(20),
surl varchar(255));
LOAD DATA LOCAL INFILE 'csvfile.csv' into table urltable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
When I read the table in MySQL Workbench there are no quotes, but if I export that table back to a CSV, all the quotes are back, though only in the surl column.
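In case the blank join above comes from leftover quotes or Windows line endings (a common reason equality checks on URL strings silently fail), here is a sketch that normalizes both columns before comparing. It assumes the urltable layout from this edit and is untested against the actual data:

SELECT gameid, eurl, surl
FROM urltable
WHERE TRIM(BOTH '"' FROM REPLACE(TRIM(eurl), '\r', '')) =
      TRIM(BOTH '"' FROM REPLACE(TRIM(surl), '\r', ''));

Alternatively, the quotes can be stripped at import time with OPTIONALLY ENCLOSED BY '"', plus LINES TERMINATED BY '\r\n' if the file was produced on Windows:

LOAD DATA LOCAL INFILE 'csvfile.csv' INTO TABLE urltable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;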

Related

Hive imports data from csv into incorrect columns in table

Below is my table creation and a sample from my CSV:
DROP TABLE IF EXISTS xxx.fbp;
CREATE TABLE IF NOT EXISTS xxx.fbp (id bigint, p_name string, h_name string, ufi int, city string, country string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
74905,xxx,xyz,-5420642,City One,France
74993,xxx,zyx,-874432,City,Germany
75729,xxx,yzx,-1284248,City Two Long Name,France
I then load the data into a hive table with the following query:
LOAD DATA
INPATH '/user/xxx/hdfs_import/fbp.csv'
INTO TABLE xxx.fbp;
It seems that there is data leaking from the 5th csv "column" into the 6th column of the table. So, I'm seeing city data in my country column.
SELECT country, count(country) from xxx.fbp group by country
+---------+------+
| country | _c1  |
+---------+------+
| Germany | 1143 |
| City    | 1    |
+---------+------+
I'm not sure why city data is occasionally being imported to the country column. The csv is downloaded from Google Sheets and I've removed the header.
The reason could be that your line termination is not '\n'; Windows-based tools add extra characters, which creates issues. It may also be that some fields contain the column separator, causing this.
Solution:
1. Try printing the rows that have the issue with a WHERE country = 'City' clause; this will give you some idea of how Hive created the record.
2. Try a binary storage format to be 100% sure about the data processed by Hive.
Hope it helps.
The issue was within the CSV itself. Some columns, such as p_name, contained , in several fields. This would cause a row to end sooner than expected. I had to clean the data and remove all ,. After that, it imported correctly. Quickly done with Python:
with open("fbp.csv") as infile, open("outfile.csv", "w") as outfile:
for line in infile:
outfile.write(line.replace(",", ""))
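For reference, an alternative that keeps the commas: if your Hive distribution ships the OpenCSV SerDe, it can parse quoted fields so embedded commas survive the import. A sketch, assuming the same columns as the fbp table above (the fbp_csv name is made up, and note that this SerDe reads every column as a string):

CREATE TABLE IF NOT EXISTS xxx.fbp_csv (id string, p_name string, h_name string, ufi string, city string, country string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE;

This only helps if the affected fields are actually quoted in the CSV; bare commas inside unquoted fields still need cleaning as above.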

mysql update table if record not in temp table

Alright, I have multiple MySQL statements that lead into an issue I'm having updating a particular table. First let me show you my code, then I'll explain what I'm trying to do:
/*STEP 1 - create a temporary table to temporarily store the loaded csv*/
CREATE TEMPORARY TABLE IF NOT EXISTS `temptable1` LIKE `first60dayactivity`;
/*STEP 2. load the csv into the previously created temporary table*/
LOAD DATA LOCAL INFILE '/Users/me/Downloads/some.csv'
IGNORE INTO TABLE `temptable1`
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
SET CUSTID = 1030,
CREATED = NOW(),
isactive = 1;
/*STEP 3. update first60dayactivity table changing isactive for records that are not in the temptable*/
UPDATE `first60dayactivity` fa
INNER JOIN `temptable1` temp
ON temp.`mid` = fa.`mid`
AND temp.`primarypartnername` = fa.`primarypartnername`
AND temp.`market` = fa.`market`
AND temp.`agedays` = fa.`agedays`
AND temp.`opendate` = fa.`opendate`
AND temp.`CUSTID` = fa.`CUSTID`
SET fa.isactive = IF( temp.`mid` IS NULL, 0, 1 );
/*STEP 4. insert the temp table records into the real table*/
.....blah blah blah.....
Ok, first create a temporary table so that we have a table to hold the imported .csv data. Next, import the .csv data into the temporary table (all this works perfectly so far).
Here is where I run into an issue. I want to update the isactive column of each record of the first60dayactivity table to 0 if the record is NOT found in temptable1 (after my import). Ultimately, I'm gathering a .csv, the .csv has the new live data that should be considered "active", and I need to set the old data to inactive. So the update does an INNER JOIN to match on several columns to see if the record is found in temptable1; if it isn't, set the activity to 0, and if it is found in temptable1, ensure the activity status is 1.
The problem here is that all records in first60dayactivity are retaining the 1 property to indicate it is active. Nothing is getting updated to 0 even though I have proof new records exist within temptable1... Can someone tell me what I'm doing wrong in my query?
Thanks in advance!
temp.mid can never be NULL because you use this column in your join condition and you use an INNER JOIN.
Your join (without the insert) should return the matching rows. Using a LEFT JOIN for the update should do what I suppose you want to do.
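For illustration, a sketch of that LEFT JOIN variant, reusing the join columns from the query above (untested against the real schema):

UPDATE `first60dayactivity` fa
LEFT JOIN `temptable1` temp
ON temp.`mid` = fa.`mid`
AND temp.`primarypartnername` = fa.`primarypartnername`
AND temp.`market` = fa.`market`
AND temp.`agedays` = fa.`agedays`
AND temp.`opendate` = fa.`opendate`
AND temp.`CUSTID` = fa.`CUSTID`
SET fa.isactive = IF( temp.`mid` IS NULL, 0, 1 );

With the LEFT JOIN, rows of first60dayactivity that have no match in temptable1 yield NULL for temp.mid, so the IF() can actually flag them as inactive.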

mysql insert update LOAD DATA LOCAL INFILE

I am using LOAD DATA LOCAL INFILE to load data into the temp table mid. Then I use an UPDATE query to update matching rows in the products table. The only matching field in both is the model.
$q = "LOAD DATA LOCAL INFILE 'Mid.csv' INTO TABLE mid
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES
(#col1,#col2,#col3,#col4,#col5,#col6) set model=#col1,price=#col3,stock=#col6 ";
mysql_query($q, $db);
mysql_query('UPDATE mid m, products p set p.products_price= m.price,p.products_quantity= m.stock where p.products_model= m.model');
It works and updates the products table. The issue I am having is that there are new records in the mid table which don't get inserted, since I am using an UPDATE statement.
I have looked at INSERT ... ON DUPLICATE KEY UPDATE. I have seen loads of examples of it working on one table, but none where I have to match against another table.
Either I am searching for the wrong thing or there is another way to do this.
I would appreciate any help.
Regards,
naf
I'm not sure what the other columns in the product table are, but here's a basic approach that should work for you based on the 3 columns in your example, assuming the products_model column is unique in the products table:
insert into products (products_price,products_quantity,products_model)
select price, stock, model
from mid
on duplicate key update
products_price = values(products_price),
products_quantity = values(products_quantity)
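One caveat worth spelling out: ON DUPLICATE KEY UPDATE only takes the update path when the insert would violate a unique index, so products_model needs one. If it does not exist yet, something along these lines would add it (the index name here is made up):

ALTER TABLE products ADD UNIQUE KEY uq_products_model (products_model);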

Import CSV Pulling One Column Field from Existing Table

I'm learning MySQL and PHP (running XAMPP and also using HeidiSQL) but have a live project for work that I'm trying to use it instead of the gazillion spreadsheets in which the information is currently located.
I want to import 1,000+ rows into a table (tbl_searches) where one of the columns is a string (contract_no). Information not in the spreadsheet but required by tbl_searches includes search_id (the PK, AUTO_INCREMENT) and contract_id. So the only field I am really missing is contract_id. I have a table (tbl_contracts) that contains contract_id and contract_no, so I think the import can use the string contract_no to look up the contract_id in that table, but I don't know how.
[EDIT] I forgot to mention that I had successfully imported the info using HeidiSQL after exporting tbl_contracts to Excel and using the Excel VLOOKUP function, but that ended up yielding incorrect data somehow.
You can do it like this:
LOAD DATA LOCAL INFILE '/path/to/your/file.csv'
INTO TABLE table1
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n' -- or '\r\n' if the file has been prepared on Windows
(@field1, @contract_no, @field2, @field3, ...)
SET column1 = @field1,
contract_id = (SELECT contract_id
FROM tbl_contracts
WHERE contract_no = @contract_no
LIMIT 1),
column2 = @field2,
column3 = @field3
...
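One performance note on this approach: the subquery runs once per CSV row, so if tbl_contracts.contract_no is not already indexed, an index along these lines (hypothetical name) keeps a 1,000+ row import fast:

ALTER TABLE tbl_contracts ADD INDEX idx_contract_no (contract_no);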
Try something like this (I am assuming that you have data in tbl_contracts):
<?php
$handle = fopen("data_for_table_searches.csv", "r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) { // get CSV data from your file
    // whatever is the equivalent in HeidiSQL, to get the contract id
    $contract_id = query("SELECT contract_id FROM tbl_contracts WHERE contract_number = " . $data[<row for contract number>]);
    // whatever is the equivalent in HeidiSQL: insert the data, including the contract id, into tbl_searches
    query("INSERT INTO tbl_searches VALUES ($contract_id, $data[0], $data[1], $data[2], ...)");
}
fclose($handle);
?>
Thanks for everyone's input. peterm's guidance helped me get the data imported. Rahul, I should have mentioned that I was not using PHP for this task, but rather just trying to get the data into the tables using HeidiSQL. user4035 asked for more detail, so that's here too.
I have three tables in the database.
tbl_status has two fields, status_ID (AUTO_INCREMENT) and status_name.
tbl_contracts has two columns, contract_ID (AUTO_INCREMENT) and contract_no (a string).
The last table (tbl_searches) will be the active(?) table in that this is where the users' actions will be recorded.
The first two of these tables were easily populated. tbl_status has 11 rows that will describe the status of the contract and these were just typed into an Excel spreadsheet and imported via CSV through HeidiSQL.
For the second table I had 1,000+ "contracts" to import, so I left the first column in Excel blank, put the contract string in the second column, and imported them the same way.
The third table has seven fields: search_id (AUTO_INCREMENT), contract_id, contract_no, status_id, notes, initials and search_date (I forgot about that one until just now).
I wanted to insert the spreadsheet that had the search information on it into tbl_searches. It has the contract_no but not the contract_id, so I needed the insert to grab the contract_id from tbl_contracts as each row went in. It took me a bit to get it right without errors and unexpected results. (The following query leaves search_date out.)
LOAD DATA LOCAL INFILE '\\\\PATH\\PATH\\PATH\\PATH\\FILENAME.csv'
INTO TABLE `hoa_work`.`tbl_searches`
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\r\n'
IGNORE 1 LINES -- because the first row of the CSV has column headers
(@search_id, @contract_id, @contract_no, @status_id, @notes, @initials)
SET
search_id = NULL, -- is an AUTO_INCREMENT field
contract_id = (SELECT contract_id
FROM tbl_contracts
WHERE contract_no = @contract_no
LIMIT 1),
contract_no = @contract_no,
status_id = @status_id,
notes = @notes,
initials = @initials;
/* Affected rows: 1,011 Found rows: 0 Warnings: 0 Duration for 1 query: 0.406 sec. */
I learned here that the @blah tokens are user variables. If I run the following query, it will show me what the variable holds. Since I was inserting 1,000+ rows from the CSV file, it gave me the value from the last row that was inserted:
SELECT @contract_no;
If you have any suggested improvements on the way I ultimately wrote the query please do tell me.
-Matt

Mysql Load Data for existing column of a table

Initially I uploaded about 100,000 rows using LOAD DATA INFILE. I'm using Ubuntu.
Example data:
ToneCode | Artist | MovieName | Language
1        | Mj     | Null      | English
3        | AB     | Null      | English
4        | CD     | Null      | English
5        | EF     | Null      | English
But now I have to update the MovieName column, starting from ToneCode 1 through row 100,000; the data to update is in a .csv file.
Please suggest how to load the .csv file into the existing table that already contains data.
I think the fastest way to do this, using purely MySQL and no extra scripting, would be as follows:
CREATE a temporary table with two columns, ToneCode and MovieName, the same as in your target table
load the data from your new CSV file into that table using LOAD DATA INFILE
UPDATE your target table using the INNER JOIN-like syntax that http://dev.mysql.com/doc/refman/5.1/en/update.html describes:
UPDATE items,month SET items.price=month.price WHERE items.id=month.id;
this would “join” the two tables items and month (by using just the “comma-syntax” for an INNER JOIN) using the id column as the join criterion, and update the items.price column with the value of the month.price column.
I have found a solution, as you guys mentioned above.
Solution, by example:
create table A(Id int primary key, Name varchar(20), Artist varchar(20), MovieName varchar(20));
Load all 100,000 rows (MovieName stays NULL here):
LOAD DATA INFILE '/Path/file.csv' INTO TABLE A
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(Id, Name, Artist);
Create a temporary table:
create temporary table TA(Id int primary key, MovieName varchar(20));
Upload the data to the temporary table TA:
LOAD DATA INFILE '/Path/file.csv' INTO TABLE TA
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(Id, MovieName);
Now, using the join as you said:
UPDATE TA, A SET A.MovieName = TA.MovieName WHERE A.Id = TA.Id;
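Equivalently, with explicit JOIN syntax (same behavior as the comma join, just easier to read):

UPDATE A
JOIN TA ON A.Id = TA.Id
SET A.MovieName = TA.MovieName;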