Delete all characters before and after quotation marks - csv

I have a CSV file, which has two columns and 4500 rows. In one column, I have several phrases that are surrounded by quotation marks. I need to delete all the text that comes before and after the quotation marks.
For example:
How would you say "Hello, my Friend" when speaking outside?
should become "Hello, my Friend"
I also have several rows that have the word NULL in the second column. I need these rows deleted in full.
What's the best way of doing something like this? I have been looking at regular expressions, but I'm not sure if they are flexible enough to do what I want to do, or how you would use them on a CSV file (I need the table structure to remain).
EDIT:
1) At the moment I am just using Apple Numbers, but I know that won't do it, so I am open to any suggestions. It must support Kanji characters.
2) I have removed all the NULL rows, so that is no longer needed (I simply added a column of numbers, sorted the table so all the NULLs were together, deleted them and then sorted back by the column of numbers).

Find a text editor that supports regular expression search and replace.
Something like this would match NULL in the second column: ^[^,\r\n]*,NULL(,.*)?$. Replace it with "DELETEMEDELETEME" to mark the line, or with an empty string, or find a way to have it match on '\n' or '\r' to catch the line break and remove the entire line completely.
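As a minimal sketch of removing whole lines including their line break, here is the same kind of regex run with Python's re module instead of a text editor. It assumes the first column never contains a comma (no quoted commas), and drop_null_rows is a hypothetical helper name:

```python
import re

# Matches an entire line whose second column is exactly NULL, together with
# the line break (\n or \r\n) that ends it, so the whole line disappears.
# Assumes the first column contains no commas.
NULL_ROW = re.compile(r"^[^,\n]*,NULL(?:,[^\n]*)?\r?(?:\n|\Z)", re.MULTILINE)


def drop_null_rows(text: str) -> str:
    """Remove every line whose second column is NULL."""
    return NULL_ROW.sub("", text)
```

For example, drop_null_rows("a,1\nb,NULL\nc,NULLX\nd,NULL") returns "a,1\nc,NULLX\n": the NULLX line survives because the pattern requires NULL to be the whole column.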
Stripping out parts of the quoted string might work like this:
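^((?:[^,\r\n]*,){n})[^"\r\n]*("[^"\r\n]*")[^,\r\n]*(,.*)?$ replaced with \1\2\3, where n is the number of columns preceding the one you want to edit. If your tool does not support {n} or non-capturing (?:...) groups, write [^,\r\n]*, out n times inside the first group instead. The exact syntax will depend on the regex flavor of your tool.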
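If a scripting language is an option, Python's standard csv module parses quoted fields (including commas and doubled quotes inside them), so the table structure survives. This is a substitute technique rather than the editor regex above, sketched under assumptions: the phrases are in the first column, the NULL marker is in the second, and clean_rows and clean_csv are hypothetical names.

```python
import csv
import re

# First double-quoted phrase inside a field, quotes included.
QUOTED = re.compile(r'"[^"]*"')


def clean_rows(rows, phrase_col=0, null_col=1):
    """Yield rows with only the quoted phrase kept in `phrase_col`,
    skipping rows whose `null_col` is exactly NULL."""
    for row in rows:
        if len(row) > null_col and row[null_col] == "NULL":
            continue  # drop the whole row
        if len(row) > phrase_col:
            match = QUOTED.search(row[phrase_col])
            if match:
                row = row[:phrase_col] + [match.group(0)] + row[phrase_col + 1:]
        yield row


def clean_csv(src, dst, phrase_col=0, null_col=1):
    """Read CSV from file object `src` and write the cleaned table to `dst`."""
    csv.writer(dst).writerows(clean_rows(csv.reader(src), phrase_col, null_col))
```

To use it on real files, open both with newline="" (as the csv module expects) and encoding="utf-8" so Kanji survive, then call clean_csv(src, dst). On output, csv.writer re-quotes the kept phrase, so "Hello, my Friend" is written as """Hello, my Friend""", which is how CSV represents a field containing quotes and a comma.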

Related

Access - Field Validation Rule - Limit to 1 language

I'm currently trying to place a validation rule on a text field which is supposed to contain several English words as well as numbers and no other languages or characters. I've tried setting the validation rule as:
Is Null or Not Like "*[!a-z]*"
Is Null or Not Like "*[!a-z0-9]*"
Is Null or Not Like "*[!a-z]*" Or Not Like "*[!0-9]*"
Which results in limiting the field to either a null or a single word. As the field requires several words and numbers, none of those solutions were appropriate. I've also tried simply removing the asterisk at the beginning of the pattern:
Is Null or Not Like "[!a-z]*"
This produces a result that is very close to what I need. However, some foreign (primarily Chinese) characters are showing up in the fields when data is imported.
Is there a reliable way to limit a field to only English words with numbers?
Your approach using [!a-z0-9] is closest. Just add a space to your list of allowed characters:
Is Null or Not Like "*[!a-z0-9 ]*"
Note that carriage return and line feed characters are disallowed, so importing content that contains line breaks will fail.
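To check data before importing it, the rule can be approximated in Python: a value passes when it is null or contains no character outside letters, digits and space. This is a sketch, not Access itself; it assumes the Access comparison is case-insensitive (so A-Z is listed explicitly here), and passes_rule is a hypothetical name.

```python
import re

# Any character that is NOT an ASCII letter, digit or space.
# A-Z is listed explicitly, assuming Access compares case-insensitively.
FORBIDDEN = re.compile(r"[^A-Za-z0-9 ]")


def passes_rule(value):
    """Mimic: Is Null Or Not Like "*[!a-z0-9 ]*"."""
    if value is None:  # the "Is Null" branch
        return True
    return FORBIDDEN.search(value) is None
```

Running this over the rows to be imported flags Chinese characters as well as embedded carriage returns and line feeds.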

MySql Specific Search - Replace String

I need to search for words that start with one of several number prefixes.
Example:
0119
0129
0139
0149
But there are other prefixes, such as 0155859 and 0128889,
etc.
If I search for 0%9, it returns results I don't want: it includes the 0155859 and 0128889 ones.
I need to search for and list ONLY the ones that start with 0119, etc.
How do I do it?
0XX9 (where XX is any two characters, so 0119, 0129, etc. % matches all characters until a 9 appears, and I don't want that.)
I'm still working on my English, so correct me if I didn't express myself clearly!
In a LIKE pattern, the _ character matches any single character. So you can do:
WHERE word LIKE '0__9%'
This matches a word that begins with 0, then any two characters, then 9, then anything after that.
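The wildcard behavior can be tried without a MySQL server: SQLite's LIKE also treats _ as exactly one character and % as any run of characters, so this Python sketch runs the same WHERE clause against the sample values from the question in an in-memory SQLite database. SQLite is a stand-in for MySQL here, and yourTable, word and like_matches are placeholder names.

```python
import sqlite3

SAMPLE = ["0119", "0129", "0139", "0149", "0155859", "0128889"]


def like_matches(words, pattern="0__9%"):
    """Return the words matching `word LIKE pattern`, in insertion order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE yourTable (word TEXT)")
        conn.executemany("INSERT INTO yourTable (word) VALUES (?)",
                         [(w,) for w in words])
        cur = conn.execute(
            "SELECT word FROM yourTable WHERE word LIKE ? ORDER BY rowid",
            (pattern,))
        return [row[0] for row in cur]
    finally:
        conn.close()
```

like_matches(SAMPLE) returns ['0119', '0129', '0139', '0149'], while like_matches(SAMPLE, "0%9") returns all six values, which is the problem described in the question.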
My gut feeling at seeing your question was to consider using REGEXP, which is MySQL's regex matching operator. Try the following query:
SELECT *
FROM yourTable
WHERE word REGEXP '0[0-9][0-9]9'
The pattern used would match any word containing a zero, followed by any two digits, followed by a 9.
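SQLite has no built-in regex engine, but it rewrites X REGEXP Y as a call to a user function regexp(Y, X), so registering one backed by Python's re module lets the same query run in a quick sketch. For a simple pattern like this one, Python's re and MySQL's regex engine agree, and like REGEXP, re.search matches anywhere inside the value. The names yourTable, regexp and regexp_matches are placeholders.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Called by SQLite for `value REGEXP pattern`."""
    return value is not None and re.search(pattern, value) is not None


def regexp_matches(words, pattern="0[0-9][0-9]9"):
    """Return the words matching `word REGEXP pattern`, in insertion order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.create_function("REGEXP", 2, regexp)
        conn.execute("CREATE TABLE yourTable (word TEXT)")
        conn.executemany("INSERT INTO yourTable (word) VALUES (?)",
                         [(w,) for w in words])
        cur = conn.execute(
            "SELECT word FROM yourTable WHERE word REGEXP ? ORDER BY rowid",
            (pattern,))
        return [row[0] for row in cur]
    finally:
        conn.close()
```

Because the match is unanchored, a value such as x01195 also matches; prefixing the pattern with ^ restricts it to values that start with the prefix.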

How to replace delimiters from a string in SQL Server

I have the following data
abc
pqr
xyz,
jkl mno
This is one string separated by delimiters like space, new line, comma, tab.
There could be two or more consecutive spaces or tabs or any delimiter after or before a word.
I would like to be able to do the following
Get the individual words removing all leading and trailing delimiters off it
Append the individual words with "OR"
I am trying to achieve this to build a T-SQL query separated by OR clause.
Thanks
I think you can achieve what you need using just SQL (although I think a programming language is much better suited to this); here is my approach.
Note that I will only handle commas, newlines and multiple spaces, but you can simply use the same technique to remove the rest of your undesired characters.
So let's assume that we have a table named ExampleData with a column named DataBefore and another called DataAfter.
DataBefore: has the line value that you want to clean
DataAfter: will host the cleaned text
First, we need to trim the leading and trailing spaces from the text:
Update ExampleData
set DataAfter = LTRIM(RTRIM(DataBefore))
Second, we replace all commas and newline characters (carriage return, char(13), and line feed, char(10)) with spaces (it doesn't matter if we end up with many spaces together):
Update ExampleData
set DataAfter = replace(replace(replace(DataAfter,',',' '),char(13),' '),char(10),' ')
At this point you can continue, using the same technique to replace any other unwanted characters with a space.
So far we have text with no spaces at either end, and with every comma and newline (plus any other character you chose to handle, such as tabs or dashes) replaced by a space. Let's continue our cleaning procedure.
We can now safely collapse each run of spaces between words into a single space, using the following SQL statement:
Update ExampleData
set DataAfter = replace(replace(replace(DataAfter,' ','<>'),'><',''),'<>',' ')
As per your needs, we now need to place an OR between each word. Any delimiter that was at the very start or end of the text has by now become a space, and such a space would turn into a leading or trailing OR, so this statement trims the text once more before collapsing the spaces and inserting the ORs:
Update ExampleData
set DataAfter = replace(replace(replace(LTRIM(RTRIM(DataAfter)),' ','<>'),'><',''),'<>',' OR ')
We are now done. :)
As a test, I put the following text in the DataBefore column:
this is just a, test, to be sure, that everything is, working, great .
After running the previous commands, I ended up with this value in the DataAfter column:
this OR is OR just OR a OR test OR to OR be OR sure OR that OR everything OR is OR working OR great OR .
I hope this is what you want; let me know if you need any extra help :)
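For comparison with doing this in a programming language, here is a minimal Python sketch of the same cleanup. It treats any whitespace (space, tab, carriage return, line feed) or comma as a delimiter and folds the trimming, delimiter replacement and OR-joining into one split and join; or_join is a hypothetical name.

```python
import re

# One or more delimiters: any whitespace character or a comma.
DELIMITERS = re.compile(r"[\s,]+")


def or_join(text: str) -> str:
    """Split `text` on runs of delimiters and join the words with ' OR '."""
    words = [w for w in DELIMITERS.split(text) if w]  # drop empty edge pieces
    return " OR ".join(words)
```

On the test sentence above it produces the same DataAfter value, and on the question's data (abc, pqr, xyz, and jkl mno on separate lines) it gives "abc OR pqr OR xyz OR jkl OR mno".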

SQL RegEx to handle comma separated IDs

I have a string that denotes which users are allowed to access something. For instance, if user 1, user 2, and user 3 could access it, the accessibility column would contain 1,2,3. If only user 1 could access it, it would only be 1 and so forth.
I know I can't do a simple CONTAINS clause because searching for 1 could return true for 14,2,3. How would I get a regex to accommodate when there is a comma on both sides, on one side, or neither of the ID number?
Here is a sample of what I'm trying to do
DataID: 1
Accessibility: "1,2,3,4,5"
Data: "secret stuff"
DataID: 2
Accessibility: "5,6,7,8,9"
Data: "more secret stuff"
I need to tell the regex to search for a number and make sure that it is at the beginning and the end of the string if it has no commas around it, at the beginning of the string if it only has a comma after it, or at the end of the string if it only has a comma before it; if it has commas on both sides, that's fine because it's in the middle of the string.
I know what I need to do, but don't know how to achieve it. Thanks.
First, you have a really bad data structure for several reasons:
The proper way to store lists in SQL is using tables, not strings.
The proper way to store integers in SQL is as integers, not strings.
Ids should be defined with a proper foreign key relationship, which you cannot do when the id is stored in a string.
Sometimes we are stuck with other people's bad design decisions, that is, we are unable to create a proper junction table with one row for each combination of a DataId and a user who has access to it.
In that situation, you can use the find_in_set() functionality in MySQL. This does not require a regular expression. You can just write:
where find_in_set($user, accessibility) > 0
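FIND_IN_SET avoids the 14,2,3 problem because it compares whole comma-separated items rather than substrings, returning the 1-based position of the match or 0 if there is none. A rough Python model of that behavior, for illustration only (it ignores NULL handling, and this find_in_set is a hypothetical helper, not the MySQL function itself):

```python
def find_in_set(needle: str, str_list: str) -> int:
    """Rough model of MySQL FIND_IN_SET: 1-based position of `needle`
    among the comma-separated items of `str_list`, or 0 if absent."""
    items = str_list.split(",") if str_list else []
    for position, item in enumerate(items, start=1):
        if item == needle:  # whole-item comparison, never a substring
            return position
    return 0
```

Note that items are compared exactly, so a list stored with spaces such as "1, 2" would not match "2".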
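Since A-Z, a-z, 0-9 and underscore are word characters, \b matches wherever one of them meets a comma or the start or end of the string, so you could generalize like this (yourTable and :userId are placeholders; the syntax varies by database: the regex function may be called REGEXP_LIKE or REGEXP, strings may be concatenated with || or CONCAT(), and some flavors spell a word boundary differently or require the backslash to be escaped):
-- word-bound the user ID you are searching for, e.g. 1 becomes \b1\b
SELECT DataID, Data
FROM yourTable
WHERE REGEXP_LIKE(Accessibility, '\b' || :userId || '\b')
That way it doesn't matter whether there is a comma leading, trailing, or whether the ID is the only value in the string. But it definitely cannot match 14 or 21, etc.: \b1\b will only match a standalone 1, \b14\b will only match the whole number 14, and so on.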
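The word-boundary idea can be checked with Python's re module, whose \b follows the same definition. This is an illustration only, since database regex flavors differ, and has_id is a hypothetical name:

```python
import re


def has_id(accessibility: str, user_id: int) -> bool:
    """True if `user_id` appears as a whole number in the list."""
    # \b sits between a word character (A-Z, a-z, 0-9, _) and a non-word
    # character or the start/end of the string, so 1 cannot match inside 14.
    pattern = r"\b" + re.escape(str(user_id)) + r"\b"
    return re.search(pattern, accessibility) is not None
```

With the sample rows, has_id("1,2,3,4,5", 1) is True, while has_id("14,2,3", 1) is False.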

How to remove unwanted columns and fields in Notepad++

I have a feed with the following columns:
product_name,description,aw_product_id,store_price,merchant_image_url,merchant_deep_link,merchant_category,merchant_product_id
Each line afterwards has all the information in this order. I only require the product_name for each line, not everything that comes afterwards.
So my question is, how do I remove everything and only keep the product_name?
You could use a regex to replace the comma and everything after it with nothing:
Search: ,.*
Replace: (nothing)
As you want the first column, you can just use a regex to extract the data; however, things would be a lot trickier if you wanted a column from the middle.
If that's the case, importing into a spreadsheet program such as Excel as a CSV file will extract all the data into columns which then allows you to highlight that column (or columns) and extract the data as necessary.
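A script can do the same column extraction without a spreadsheet. Here is a minimal Python sketch with the standard csv module, assuming the file's first line is the header shown in the question. csv.DictReader also copes with commas inside quoted fields, which the simple regex does not. extract_column is a hypothetical name.

```python
import csv


def extract_column(lines, column="product_name"):
    """Yield the value of `column` from each data row of a CSV whose
    first line is a header."""
    for row in csv.DictReader(lines):
        yield row[column]
```

Pass an open file (opened with newline="") as lines, and any header name, for example "store_price", to pull a column from the middle.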
You could use Column mode (Alt + mouse selection) to select only the part (column) you want.
This could be tricky if the product names vary a lot in length.
Another way would be Find and Replace with a clever regex. That's what I would do in your case.
As the product name is the first column, deleting everything from the first comma onward should do the trick. So use this regex (with ". matches newline" unchecked) and replace with an empty string:
Find: ,.*
Replace:
To remove the 6th column from a CSV file:
Find:(.*?)(,.*?)(,.*?)(,.*?)(,.*?)(?:,.*?)(,.*)
Replace:${1}${2}${3}${4}${5}${6}
Search Mode: Regular Expression
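Before running Replace All on a large feed, the pattern can be tried in Python, whose re module treats these lazy and non-capturing groups the same way for this pattern. In this sketch, re.sub with \1-style references stands in for Notepad++'s ${1}, and the sample lines are made up. For the feed in the question, the 6th column is merchant_deep_link.

```python
import re

# Groups 1-5 keep columns 1-5, (?:,.*?) swallows column 6,
# and group 6 keeps the rest of the line.
SIXTH_COLUMN = re.compile(r"(.*?)(,.*?)(,.*?)(,.*?)(,.*?)(?:,.*?)(,.*)")


def drop_sixth_column(text: str) -> str:
    """Remove the 6th column from every line that has at least 7 columns."""
    return SIXTH_COLUMN.sub(r"\1\2\3\4\5\6", text)
```

Lines with fewer than 7 columns are left unchanged, because the pattern needs six commas on the same line.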