How to remove unwanted columns and fields in Notepad++ - csv

I have a feed with the following columns:
product_name,description,aw_product_id,store_price,merchant_image_url,merchant_deep_link,merchant_category,merchant_product_id
Each line afterwards has all the information in this order. I only require the product_name for each line, not everything that comes afterwards.
So my question is, how do I remove everything and only keep the product_name?

You could use a regex to replace the comma and everything after it with nothing:
Search: ,.*
Replace: (nothing)
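For example, on a hypothetical data line this turns
Silver Widget,A shiny widget,12345,9.99,http://img.example.com/1.jpg,http://example.com/1,Widgets,SKU-1
into
Silver Widget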

As you want the first column, you can just use a regex to extract the data; things would be a lot trickier if you wanted a column from the middle.
If that's the case, importing the CSV file into a spreadsheet program such as Excel will split the data into columns, which then allows you to highlight the column (or columns) you need and extract the data as necessary.
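For the middle-column case, one possible Notepad++ regex (assuming the default ". matches newline" is off and no quoted fields contain commas) keeps only the third column, aw_product_id:
Find: ^(?:[^,]*,){2}([^,]*).*$
Replace: $1
Adjust the {2} to the number of columns before the one you want.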

You could use column mode (Alt + mouse select) to select only the part (column) you want.
This could be tricky if the product name lengths are very unequal.
Another way would be Find & Replace with a clever regex. That's what I would do in your case.
As the product name is the first column, deleting everything after the first comma should do the trick. So use this regex and replace with an empty string:
Find: ,.*
Replace:

To remove the 6th column from a CSV file:
Find: (.*?)(,.*?)(,.*?)(,.*?)(,.*?)(?:,.*?)(,.*)
Replace:${1}${2}${3}${4}${5}${6}
Search Mode: Regular Expression
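For example (the sample line is made up), running this replacement on
a1,a2,a3,a4,a5,a6,a7
leaves
a1,a2,a3,a4,a5,a7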

Related

transpose a table using talend

I would like to transpose a table like the one below:
into this:
I wanted to mention that the files are CSV files.
Thanks,
There is a solution to this, but it's inelegant and inefficient and may not work in case of a huge dataset (it may run out of memory).
You can denormalise the whole input by defining all the schema columns in the tDenormalize component, then pass it to a tMap to concatenate all the columns with a special character in between. The special character is just an identifier for the next component we are going to use.
Connect the tMap's output to a tNormalize and use the special character as the Item Separator; the column to normalise should be the only column available (the one you concatenated into in the previous tMap).
This should do what you're looking for. If you wish to process the data after this instead of just transposing it, you can use the tExtractDelimitedFields component with "," as your field separator, since it's a CSV.
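As a rough illustration of what each stage produces (the sample values and the # separator are made up), starting from the rows (a,1), (b,2), (c,3):
after tDenormalize (all schema columns defined): a single row holding "a,b,c" and "1,2,3"
after the tMap concatenation: "a,b,c#1,2,3"
after tNormalize on "#": one row per original column, "a,b,c" and "1,2,3", i.e. the transposed table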

MySQLAdmin replace text in a field with percent in text

Using MySQLAdmin. I moved data from a Windows server and I'm trying to fix the case in URLs, but I'm not finding the matches. I need the slashes because I don't want to replace text in anything but the URLs (in the post table). I think the %20 are the problem somehow?
UPDATE table_name SET field = replace(field, '/user%20name/', '/User%20Name/')
The actual string is more like:
https://www.example.com/forum/uploads/user%20name/GFCI%20Stds%20Rev%202006%20.pdf
In case you are using MariaDB, you have the REGEXP_REPLACE() function.
But the best approach is to dump the table to a file, open it in Notepad++,
and run a regex replace like this:
Pattern is: (https:[\/\w\s\.]+uploads/)(\w+)\%20(\w+)((\/.*)+)
Replace with: $1\u$2\%20\u$3$4
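On the sample URL above, this turns
https://www.example.com/forum/uploads/user%20name/GFCI%20Stds%20Rev%202006%20.pdf
into
https://www.example.com/forum/uploads/User%20Name/GFCI%20Stds%20Rev%202006%20.pdf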
Then import the table again.
Hope this helps.
If it's MariaDB, you can do the following:
UPDATE table_name SET field = REGEXP_REPLACE(field, '\/user%20name\/', '\/User%20Name\/');
First, please check what is actually stored in the database: %20 is the URL encoding of a space. Usually, when you store such a value in the database, it is stored as an actual space (decoded before storing), hence your replace doesn't match the actual data.
The second option that might be possible, depending on what you want to do: you are seeing the URL containing %20, so you created your database records (which you would like to fetch) with that additional %20. When you now try to query your results based on the actual URL, the %20 is replaced with an "actual" whitespace (before your query) and hence it doesn't match your stored data.
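A quick way to check which form is actually stored (table and column names as in the UPDATE above; the LIKE pattern is only an example, and its % wildcards match either a literal %20 or a real space):
SELECT field FROM table_name WHERE field LIKE '%/user%name/%' LIMIT 10;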

How to replace delimiters from a string in SQL Server

I have the following data
abc
pqr
xyz,
jkl mno
This is one string separated by delimiters like space, new line, comma, tab.
There could be two or more consecutive spaces or tabs or any delimiter after or before a word.
I would like to be able to do the following
Get the individual words removing all leading and trailing delimiters off it
Append the individual words with "OR"
I am trying to achieve this to build a T-SQL query separated by OR clause.
Thanks
I think you can achieve what you need using just SQL (although I think using a programming language is way better); here is my approach.
Kindly note that I will just handle commas, newlines and multiple spaces, but you can simply apply the same technique to remove the rest of your undesired characters.
So let's assume that we have a table named ExampleData with a column named DataBefore and another called DataAfter.
DataBefore: has the line value that you want to clean
DataAfter: will host the cleaned text
First we need to trim the leading & trailing space(s) from the text:
Update ExampleData
set DataAfter = LTRIM(RTRIM(DataBefore))
Second, we replace all the commas and carriage returns (char(13)) with spaces (it doesn't matter if we end up with several consecutive spaces):
Update ExampleData
set DataAfter = replace(replace(DataAfter,',',' '),char(13),' ')
This is the part where you may continue and remove any other characters using the same technique, replacing each with a space (for example char(10) for line feeds and char(9) for tabs).
So far we have a text that has no spaces before or after, and every comma, newline, tab, dash, etc. replaced by a space. Let's continue our cleaning procedure.
We can now safely move on to collapsing the runs of spaces between words into a single space. The trick below turns every space into '<>', deletes the '><' pairs that only occur inside a run of spaces, and turns each surviving '<>' back into one space:
Update ExampleData
set DataAfter = replace(replace(replace(DataAfter,' ','<>'),'><',''),'<>',' ')
As per your needs, we need to place an OR between each pair of words; this is achievable with this SQL statement:
Update ExampleData
set DataAfter = replace(replace(replace(DataAfter,' ','<>'),'><',''),'<>',' OR ')
We are done now. As a final step that may or may not make a difference, we need to remove any space at the end of the whole text, just in case an unwanted character at the end got replaced by a space. This can be achieved by the following statement:
Update ExampleData
set DataAfter = RTRIM(DataAfter)
we are now done. :)
as a test, I've generated the following text inside the DataBefore column:
this is just a, test, to be sure, that everything is, working, great .
and after running the previous commands, I ended up with this value inside the DataAfter column:
this OR is OR just OR a OR test OR to OR be OR sure OR that OR everything OR is OR working OR great OR .
Hope that this is what you want, let me know if you need any extra help :)
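For reference, here is the whole sequence as one self-contained sketch (SQL Server syntax; the sample row and the char(10)/char(9) replacements are additions beyond the steps above):
CREATE TABLE ExampleData (DataBefore varchar(max), DataAfter varchar(max));
INSERT INTO ExampleData (DataBefore) VALUES ('abc ' + char(13) + char(10) + 'pqr  xyz,' + char(9) + 'jkl mno');
UPDATE ExampleData SET DataAfter = LTRIM(RTRIM(DataBefore));
UPDATE ExampleData SET DataAfter = replace(replace(replace(replace(DataAfter, ',', ' '), char(13), ' '), char(10), ' '), char(9), ' ');
UPDATE ExampleData SET DataAfter = replace(replace(replace(DataAfter, ' ', '<>'), '><', ''), '<>', ' ');
UPDATE ExampleData SET DataAfter = replace(replace(replace(DataAfter, ' ', '<>'), '><', ''), '<>', ' OR ');
UPDATE ExampleData SET DataAfter = RTRIM(DataAfter);
SELECT DataAfter FROM ExampleData; -- returns: abc OR pqr OR xyz OR jkl OR mno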

Delete all characters before and after quotation marks

I have a CSV file, which has two columns and 4500 rows. In one column, I have several phrases that are surrounded in quotation marks. I need to delete all the text that comes before and after the quotations marks.
For example:
How would you say "Hello, my Friend" when speaking outside?
should become "Hello, my Friend"
I also have several rows that have the word NULL in the second column. I need these rows deleted in full.
What's the best way of doing something like this? I have been looking at regular expressions, but I'm not sure if they are flexible enough to do what I want to do, or how you would use them on a CSV file (I need the table structure to remain).
EDIT:
1) At the moment I am just using Apple Numbers, but I know that won't do it, so I am happy to hear any suggestions. It must support Kanji characters.
2) I have removed all the NULL rows, so that is no longer needed (I simply added a column of numbers, sorted the table so all the NULLs were together, deleted them and then sorted back by the column of numbers).
Find a text editor that supports regular expression search and replace.
Something like this would match ,NULL in the second column: ^.*,NULL.*$. Replace it with "DELETEMEDELETEME" to mark the line, or with an empty string, or find a way to have the pattern also match '\n' or '\r' to catch the line break and remove the entire line completely.
Stripping out the parts around the quoted string might work like this:
^(.*,){n}(.*)(\".*?\")(.*)(,.*)$ replaced with \1\3\5, where n is the number of columns preceding the one you want to edit. Note that a repeated group such as (.*,){n} usually keeps only its last repetition, so with more than one preceding column it is safer to write (.*,) out n times and adjust the backreferences. It will depend on the regex flavor of your tool.
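For the simple two-column case described in the question, assuming the quoted phrase sits in the first column and the second column contains no quotes or commas (the sample second-column value here is made up), something along these lines should work:
Find: ^[^"]*("[^"]*")[^"]*(,.*)$
Replace: $1$2
so that
How would you say "Hello, my Friend" when speaking outside?,12
becomes
"Hello, my Friend",12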

Is there a way to run multiple deletes with a csv list?

I have a list of about 2,300K rows of bad data that I'd like to delete from my database. Is there a way that I can delete all of these rows using a single SQL statement? I can use 'WHERE IN', but the issue is that these values are not quoted and the statement fails. Thanks
Use SQL and a little editor regex-fu:
Use Excel or whatever to get the list of keys you want to delete.
Copy that list into your favorite text editor. (UltraEdit, EditPlus, Notepad++, heck even PFE)
Search replace string: \n => ',' (newline becomes quote comma quote)
Add a quote to the beginning and end of the list, surround it with parentheses, and stick it in your WHERE IN clause (see the example below).
Good to go.
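For example, if the editor pass produces '1001','1002','1003' (hypothetical keys; the table and column names are made up too), the final statement is just:
DELETE FROM my_table
WHERE id IN ('1001','1002','1003');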
CSV? Could you instead make a list of the good rows, drop the table, and insert the good rows?