Load file into MySQL

I need to load a tab-delimited file into a MySQL database. My table is set up with columns ID;A;B;C;D;E, and my file is a dump of columns ID and D. How can I load this file into my db and replace just columns ID and D, without changing the values of columns C and E? When I load it in now, columns C and E are changed to DEFAULT:NULL.

I already answered a similar question like this here, but in your case you'd want to load the file into a temporary table, then use a simple UPDATE statement to copy the specific columns from your temporary table to your production table.
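A minimal sketch of that approach; the production table name (the_table), the column types, and the file path are all illustrative, and LOAD DATA LOCAL INFILE must be enabled on your server and client:

CREATE TEMPORARY TABLE tmp_load (ID INT PRIMARY KEY, D VARCHAR(255));

LOAD DATA LOCAL INFILE '/tmp/dump.tsv'
INTO TABLE tmp_load
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(ID, D);

-- copy only column D across; C and E are never touched
UPDATE the_table AS t
JOIN tmp_load AS s ON s.ID = t.ID
SET t.D = s.D;

DROP TEMPORARY TABLE tmp_load;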

You can update specific columns using this command:
UPDATE the_table
SET D = <value-of-D>
WHERE ID = <value-of-ID>;
Then run this command for each row in the tab-delimited file, substituting that row's D and ID values.

You can use a stored procedure or a PHP program to do this.
In a MySQL stored procedure, you would open the file with LOAD_FILE() and store its contents in a variable. The program then loops through the contents, finding each "\n" (newline) to extract one whole line into a string.
For each line, find the first tab position using LOCATE() and extract the ID column with SUBSTRING(). Then find the fourth tab (i.e. three more tabs) by using LOCATE() with its third (start-position) parameter; this will be the starting position of your D column. Find the next tab character the same way, which gives you the end position of the D column, and get its contents with SUBSTRING(). Finally, run an UPDATE that sets the row's D column, using ID as the search key in the WHERE clause.
Since this loops through all lines, it will update all rows of data.
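A minimal sketch of that procedure, following this answer's assumption that each line holds all six tab-separated columns (so D starts after the fourth tab). The table name, file path, and column types are illustrative, and LOAD_FILE() requires the FILE privilege plus a file the server itself can read:

DELIMITER //
CREATE PROCEDURE load_id_d()
BEGIN
  DECLARE contents LONGTEXT;
  DECLARE line TEXT;
  DECLARE nl, t1, t4, t5 INT;
  DECLARE v_id INT;
  DECLARE v_d TEXT;

  SET contents = LOAD_FILE('/tmp/dump.tsv');

  WHILE contents IS NOT NULL AND LENGTH(contents) > 0 DO
    -- peel one line off the front of the file contents
    SET nl = LOCATE('\n', contents);
    IF nl = 0 THEN
      SET line = contents, contents = '';
    ELSE
      SET line = SUBSTRING(contents, 1, nl - 1),
          contents = SUBSTRING(contents, nl + 1);
    END IF;

    IF LENGTH(line) > 0 THEN
      -- ID is everything before the first tab
      SET t1 = LOCATE('\t', line);
      SET v_id = CAST(SUBSTRING(line, 1, t1 - 1) AS UNSIGNED);
      -- the fourth tab marks the start of D; the fifth (if any) its end
      SET t4 = LOCATE('\t', line, LOCATE('\t', line, LOCATE('\t', line, t1 + 1) + 1) + 1);
      SET t5 = LOCATE('\t', line, t4 + 1);
      IF t5 = 0 THEN
        SET v_d = SUBSTRING(line, t4 + 1);
      ELSE
        SET v_d = SUBSTRING(line, t4 + 1, t5 - t4 - 1);
      END IF;
      UPDATE the_table SET D = v_d WHERE ID = v_id;
    END IF;
  END WHILE;
END //
DELIMITER ;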

Related

How to make MySQL MATCH...AGAINST use various word separators?

I have a table with 300K string values. These values contain all types of word separators so it looks like this:
id value
1 A B C
2 A B_C
3 A_B-C
4 A-B-C
Let's say I want to find all four rows containing A and B. This query
SELECT * FROM table WHERE MATCH(value) AGAINST('+A +B' IN BOOLEAN MODE);
will return only one row with space separated values:
1 A B C
Is there a way to make MATCH...AGAINST use other word separators? I tried to use LIKE and it was too slow.
You will probably want to alter your app and schema just a little bit to solve this problem. You have two tasks:
Task 1: Transform your existing data
Assuming you need to keep the source data unchanged:
Step 1: Add a field to your schema, "searchFriendly", with the same datatype as the source data.
Step 2: Write a script to transform the data you already have: take the whole data set and do string replaces to turn each separator into a space.
Step 3: Save the transformed data to the new searchFriendly field (a SQL sketch of this task follows the list below).
Task 2: Modify the app so that all future saves/updates of this data also perform the transformation and store the result.
Step 1: Find the part of the app that saves these records.
Step 2: Before actually writing the data to the database, perform the transformation.
Step 3: Add the transformed data to your API call to save/update the record, under the searchFriendly field.
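A hedged sketch of Task 1, assuming the table is named strings, the source column is value (both names are illustrative), and _ and - are the only separators to replace:

ALTER TABLE strings ADD COLUMN searchFriendly VARCHAR(255);

UPDATE strings
SET searchFriendly = REPLACE(REPLACE(value, '_', ' '), '-', ' ');

-- MATCH...AGAINST needs a FULLTEXT index on the new field; very short
-- tokens like 'A' may also require lowering ft_min_word_len or
-- innodb_ft_min_token_size in the server config
ALTER TABLE strings ADD FULLTEXT INDEX ft_searchFriendly (searchFriendly);

SELECT * FROM strings
WHERE MATCH(searchFriendly) AGAINST('+A +B' IN BOOLEAN MODE);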

MySQL to CSV - separating multiple values

I have downloaded a MySQL table as CSV, which has over a thousand entries of the following type:
id,gender,garment-color
1,male,white
2,"male,female",black
3,female,"red,pink"
Now, when I am trying to create a chart out of this data, it is taking "male" as one value, and "male,female" as a separate value.
So, for the above example, rather than counting 2 "male", and 3 "female", the chart is showing 3 separate categories ("male", "female", "male,female"), with one count each.
I want the output as follows, for chart to have the correct count:
id,gender,garment-color
1,male,white
2,male,black
2,female,black
3,female,red
3,female,pink
The only way I know is to copy the row in MS Excel and adjust the values manually, which is too tedious for 1000+ entries. Is there a better way?
From the MySQL command line, or whatever tool you are using to send queries to MySQL:
select * from the_table
into outfile '/tmp/out.txt' fields terminated by ',' enclosed by '"'
Then download /tmp/out.txt from the server and it should be good to go, assuming your data is good. If it is not, you might need to massage it with some SQL functions in the SELECT.
The CSV likely came from a poorly designed/normalized database that had both those values in the same row. You could try using SELECTs and UPDATEs, along with some built-in string functions, to spawn additional rows containing the extra values and update the original rows to remove them; but you will have to repeat this until all commas are gone (if some fields contain more than one), and you will have to decide whether a row with comma-separated lists in multiple fields needs to be multiplied out (i.e. should 2 genders and 4 colors mean 8 rows total?).
More likely, you'll want to create additional tables for X_garmentcolors and X_genders, where X is whatever the original table is supposed to be describing. These tables would have an X_id field referencing the original row and a [garmentcolor|gender] value field holding one of the values from the original row's list. Ideally, they should reference [gender|garmentcolor] lookup tables instead of holding actual values, but you'd have to do the grunt work of picking out all the unique colors and genders from your data first. Once that is done, you can do something like:
INSERT INTO X_[garmentcolor|gender] (X_id, Y_id)
SELECT X.X_id, Y.Y_id
FROM originalTable AS X
INNER JOIN valueTable AS Y
ON X.Y_valuelist LIKE CONCAT('%,', Y.value) -- Value at end of list
OR X.Y_valuelist LIKE CONCAT('%,', Y.value, ',%') -- Value in middle of list
OR X.Y_valuelist LIKE CONCAT(Y.value, ',%') -- Value at start of list
OR X.Y_valuelist = Y.value -- Value is entire list
;
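For concreteness, a sketch of the supporting tables described above, using gender as the example. All names are illustrative, and the lookup seeding assumes every distinct value also appears alone in at least one row:

CREATE TABLE genders (
  gender_id INT AUTO_INCREMENT PRIMARY KEY,
  value VARCHAR(32) NOT NULL UNIQUE
);

-- seed the lookup from rows whose gender field holds a single value
INSERT INTO genders (value)
SELECT DISTINCT gender FROM garments WHERE gender NOT LIKE '%,%';

-- junction table: one row per (original row, value) pair
CREATE TABLE garments_genders (
  garment_id INT NOT NULL, -- references garments.id
  gender_id INT NOT NULL,  -- references genders.gender_id
  PRIMARY KEY (garment_id, gender_id)
);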

How to prevent use of the first row Pandas DataFrame as column names when using to_sql

I have a dataframe loaded from a CSV file, which includes a header row. After assigning the returned dataframe from read_csv, I'm trying to add the rows to a MySQL database table using a SQLAlchemy engine. My method call looks like this:
my_dataframe.to_sql(name="my_table",
con=alch_engine,
if_exists="append",
chunksize=50,
index=False,
index_label=None)
However, the table already exists, and the values of the dataframe header don't match the column names, so I get a MySQL error: (1054, "Unknown column 'Col1' in 'field list'")
I would like not to use the first row at all and run the insert query without specifying the column names. I have not found a solution for this from the Pandas manual.
Thank you for your help,
AFAIK you cannot do that with .to_sql(). But you can modify the dataframe to match the column names in the table. Provided db_cols is a list/array/series/iterable containing the names, this should do:
(my_dataframe
.rename(columns=dict(zip(my_dataframe.columns, db_cols)))
.to_sql(name="my_table",
con=alch_engine,
if_exists="append",
chunksize=50,
index=False,
index_label=None))
Old question, but I came across this. As far as I know, when you create the dataframe in the first place you can specify header=None; then the dataframe has no column names and the first row is treated as data.
I've only used it for Excel, but I assume CSV is the same:
my_dataframe = pd.read_csv(full_path, header=None)
Then when you use to_sql, it won't have the column names. It seems pandas then attempts to use numbers as the column names for its INSERT statement. I suppose it depends on the db engine whether it accepts that as valid.
i.e. it generates something like:
INSERT INTO `table` (`0`, `1`) VALUES (%(0)s, %(1)s)
I found a simple way around this problem.
First, read the very first line, i.e. the header, and save it as a list (header_list).
Second, create a DataFrame without skipping any rows, and do not use the names argument:
df = pandas.read_csv(input_file, quotechar='"', skiprows=skip_row_count, nrows=num_of_lines_per_iter)
This will create the table with the first row as the table header and insert the rest of the rows as data.
Third, if the table already exists, create the DataFrame using the names argument:
df = pandas.read_csv(input_file, quotechar='"', skiprows=skip_row_count, nrows=num_of_lines_per_iter, names=header_list)
This ensures the data in the DataFrame is inserted into the corresponding columns, by matching the column names in the DataFrame against the column names in the table.
Finally, you can use the skiprows argument to skip the header line; a consolidated sketch of this flow follows.
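A minimal sketch of that flow, assuming the SQLAlchemy engine alch_engine and the variables input_file and num_of_lines_per_iter from above are already defined:

import pandas as pd

# read only the header row and keep its names as a list
header_list = pd.read_csv(input_file, nrows=0).columns.tolist()

# load the data in chunks, skipping the header line and reusing its names,
# so to_sql matches the existing table's columns on every chunk
for chunk in pd.read_csv(input_file, quotechar='"', skiprows=1,
                         names=header_list, chunksize=num_of_lines_per_iter):
    chunk.to_sql(name="my_table", con=alch_engine,
                 if_exists="append", index=False)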

How can a CSV file with counter columns be loaded to a Cassandra CQL3 table

I have a CSV file in the following format
key1,key2,key3,counter1,counter2,counter3,counter4
1,2,1,0,0,0,1
1,2,2,0,1,0,4
The CQL3 table has all value columns of type 'counter'. When I try to use the COPY command to load the CSV I get the usual error which asks for an UPDATE instead of an INSERT.
The question is : how can I tell CQL to use an UPDATE ?
Is there any other way to do this ?
Using SSTables solved this issue. Although a little slower than I expected, it does the job.
To update a counter column you have to delete it (with consistency set to ALL) and then insert a new value (same consistency).
So my advice is to use a HashMap in your program and determine which value you want to write to the counter column (oldest, highest, lowest, ...).
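For reference, a counter column in CQL3 can only be changed through an UPDATE that increments it, so the first CSV row above would translate to something like this (the table name is an assumption; key1-key3 must form the primary key):

-- counters cannot be INSERTed; each CSV row becomes an increment
UPDATE counter_table
SET counter1 = counter1 + 0,
    counter2 = counter2 + 0,
    counter3 = counter3 + 0,
    counter4 = counter4 + 1
WHERE key1 = 1 AND key2 = 2 AND key3 = 1;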

Creating variables and reusing within a mysql update query? possible?

I am struggling with this query and want to know if I am wasting my time and need to write a PHP script, or if something like the following is actually possible:
UPDATE my_table
SET @userid = user_id
AND SET filename('http://pathto/newfilename_'@userid'.jpg')
FROM my_table
WHERE filename
LIKE '%_%' AND filename
LIKE '%jpg'AND filename
NOT LIKE 'http%';
Basically I have 700-odd files that need renaming in the database: because I am changing systems, they no longer match the filenames they are called by in the database.
The format is 2_gfhgfhf.jpg, which translates to userid_randomjumble.jpg.
But not all files in the database are in this format, only about 700 out of thousands. So I want to identify names that contain _ but don't contain http (that's the correct format that I don't want to touch).
I can do that fine, but now comes the tricky bit!!
I want to replace that file name userid_randomjumble.jpg with http://pathto/filename_userid.jpg. So I want to set the column user_id in that row to a variable and insert it into my new filename.
The above doesn't work for obvious reasons, but I am not sure if there is a way around what I'm trying to do. I have no idea if it's possible. Am I wasting my time with this, and should I turn to PHP with MySQL and stop being lazy? Or is there a way to get this to work?
Yes, it is possible without the PHP. Here is a simple example of setting and reusing a variable:
SET @a := 0;
SELECT * FROM table WHERE field_name = @a;
Yes you can do it using straightforward SQL:
UPDATE my_table
SET filename = CONCAT('http://pathto/newfilename_', user_id, '.jpg')
WHERE filename LIKE '%\_%jpg'
AND filename NOT LIKE 'http%';
Notes:
No need for variables. Any column of the rows being updated may be referenced directly
In MySQL, use CONCAT() to join text values together
With LIKE, an underscore (_) has a special meaning - it means "any single character". If you want to match a literal underscore, you must escape it with a backslash (\)
Your two LIKE predicates may be safely merged into one for a simpler query
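If you want to sanity-check the rename before running it, a preview SELECT with the same predicates is a harmless dry run (a sketch, assuming the table and columns above):

SELECT filename,
       CONCAT('http://pathto/newfilename_', user_id, '.jpg') AS new_filename
FROM my_table
WHERE filename LIKE '%\_%jpg'
  AND filename NOT LIKE 'http%';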