I have the following data in TableA...
ID | Text
---------------------------------------------
1 | let's find this document
2 | docments are closed
...and if I do the following select...
select Text from TableA where Text like '%doc%';
...I seem to get a strange result. Both rows are returned. With this select, shouldn't it return only row 1? I would have thought that...
select Text from TableA where Text like 'doc%';
...would have returned just row 2. Am I missing something?
What I'm trying to do is run three separate searches across this data as part of my search tool. The first looks for the specified pattern "doc" at the beginning of a string, the second looks for the same pattern at the end of a string, and the third identifies whether the pattern appears anywhere within the text, i.e. with text on both sides of it. Ideally, the first search would match only row 2, the second would return no results, and the third would return only row 1.
The reason for doing it this way is that I want to get a feel for how the pattern matched the string. The results would be easier to read if I knew that the pattern for a given row matched (a) at the beginning, (b) at the end, or (c) somewhere in the middle. I had thought about using regexp, but my data is Unicode.
No, the first query returns both rows, because % means 0 or more characters. So if doc is the first thing appearing in the field, it matches the %doc% pattern as well.
But you're right on the second query, it will only return row 2.
doc_% should match it at the beginning, having at least one character after it.
%_doc should match it at the end, having at least one character before it.
%_doc_% should match it anywhere, having at least one character before and after it.
Note that these strict criteria fail to find the exact string "doc", i.e. with nothing before or after it. You may want to include this case in, say, query #1, by loosening it:
doc% should match it at the beginning, having any number of characters after it.
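To check the patterns above against the question's two rows, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in database (an assumption; the question doesn't say which DBMS is in use). In SQLite's LIKE, % also means zero or more characters and _ exactly one character, but note that SQLite's LIKE is case-insensitive for ASCII letters by default, which your database may not be. The helper name ids_matching is made up for the demo.

```python
import sqlite3

# In-memory stand-in for the question's TableA (SQLite assumed for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (ID INTEGER, Text TEXT)")
conn.executemany(
    "INSERT INTO TableA (ID, Text) VALUES (?, ?)",
    [(1, "let's find this document"), (2, "docments are closed")],
)


def ids_matching(pattern):
    """Return the IDs of rows whose Text matches the LIKE pattern."""
    rows = conn.execute(
        "SELECT ID FROM TableA WHERE Text LIKE ? ORDER BY ID", (pattern,)
    )
    return [row[0] for row in rows]


# % = zero or more characters, _ = exactly one character.
for pattern in ["%doc%", "doc%", "doc_%", "%_doc", "%_doc_%"]:
    print(f"{pattern:8} -> {ids_matching(pattern)}")
```

Running it prints [1, 2], [2], [2], [] and [1] for the five patterns, which lines up with the three searches the question is after: doc_% finds only row 2, %_doc finds nothing, and %_doc_% finds only row 1.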
I have a table with spellings of words. The words are marked up like this:
h[e,a]llo
Here the first letter in the brackets is the correct spelling and the second is the wrong one. I need to make a selection such that, for example, when "hallo" or "hello" is given, the desired word is found and returned in its original markup. Any ideas?
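One way to think about it is that each marked-up word already describes a pattern: h[e,a]llo means "h, then e or a, then llo". A minimal sketch in Python, assuming every bracket holds exactly two single characters separated by a comma (correct first, misspelling second) and that the matching can happen outside the database. The names MARKUP, markup_to_regex, correct_spelling and find_marked are made up for illustration.

```python
import re

# Hypothetical markup: "[x,y]" where x is the correct letter and y the wrong one.
# Assumes single characters on both sides of the comma.
MARKUP = re.compile(r"\[(.),(.)\]")


def markup_to_regex(marked):
    """Turn e.g. 'h[e,a]llo' into a regex that matches 'hello' or 'hallo'."""
    parts, pos = [], 0
    for m in MARKUP.finditer(marked):
        parts.append(re.escape(marked[pos:m.start()]))  # literal text before the bracket
        parts.append("[" + re.escape(m.group(1)) + re.escape(m.group(2)) + "]")
        pos = m.end()
    parts.append(re.escape(marked[pos:]))  # literal text after the last bracket
    return re.compile("".join(parts))


def correct_spelling(marked):
    """Keep only the first (correct) letter from each bracket."""
    return MARKUP.sub(r"\1", marked)


def find_marked(words, query):
    """Return the stored entries, in their original markup, that match query."""
    return [w for w in words if markup_to_regex(w).fullmatch(query)]


words = ["h[e,a]llo", "w[o,a]rld"]
print(find_marked(words, "hallo"))
print(correct_spelling("h[e,a]llo"))
```

Both "hallo" and "hello" find h[e,a]llo, and the entry comes back in its original markup. Doing the same inside the database would mean storing or generating a pattern per word; whether that is practical depends on the DBMS.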
I am using the code below in a Derived Column in SSIS to remove titles such as Mr, Mrs, Ms and Dr from the Name column.
For example:
Mr ABC
MS XYZX
Mrs qwrer
DR ADCS
So I want to remove the title from the name.
SUBSTRING( [Name] , 1, 3)=="Mr" && LEN( [Name] ) >2 ? RIGHT([Name],LEN([Name])-2)
But I get the error "incomplete token or invalid statement".
Please help.
Any other suggestions for removing the prefixes are also welcome, but I need to use a transformation.
A different way to think about the problem is that you want to look at the first "word" in the column Name where "word" is the collection of characters from the start of the string to the first space.
String matching in SSIS is case-sensitive, so you'll want to force the first word to the same case as your master list of titles (and make sure that list is consistently all upper or all lower case).
I'm an advocate of making life easier on yourself, so I'll add a Derived Column (actually lots of derived columns). The first one identifies the position of the first space in Name, and I'll call it FirstSpace.
DER GetFirstSpace
Add a new column called FirstSpace. The expression we want to use is FINDSTRING:
FINDSTRING([Name], " ", 1)
That will return the position of the first instance of a space (or zero if no space was found).
DER GetFirstWord
Add another derived column after DER GetFirstSpace. Keeping it separate lets us inspect the values we're passing in to get the first word. If you do it all in a single Derived Column and something goes wrong, you won't be able to debug it, and the real cost of development is maintenance. The new column, FirstWord, will be of type DT_WSTR 4000 because that's what happens when you use the string manipulation expressions. I'm forcing it to upper case as I'll need that later on.
[FirstSpace] > 0 ? UPPER(SUBSTRING([Name], 1, [FirstSpace] - 1)) : UPPER([Name])
SUBSTRING takes a 1-based start position and a length, so the length is [FirstSpace] - 1 to leave out the trailing space ("DR" rather than "DR "). The conditional operator ?: handles a name with no space at all (FirstSpace is 0), in which case the whole name is the first word.
At this point in the data flow, we have a single word in a column named FirstWord. What we need to do now is compare it to our list of known titles and, if it's in the list, strip it from the original. And that's an interesting problem.
DER GetIsTitleMatched
Add yet another Derived Column, this time to determine whether we've matched our list of titles. This will be a Boolean (DT_BOOL) column named IsTitleMatched.
[FirstWord] == "DR" || [FirstWord] == "MRS" || [FirstWord] == "MR" || [FirstWord] == "MS"
Following that pattern ("FirstWord is exactly equal to literal text OR ..."), when this derived column evaluates, we'll (finally) know whether the first word is something to be removed.
DER SetFinalName
Here we add yet another column, NameFinal. The trick to stripping out the title is that RIGHT returns the last N characters of a string, so taking the last LEN([Name]) - [FirstSpace] characters gives everything after that first space. Wrapping it in LTRIM guards against extra spaces between the title and the name. Again, I'm writing this freehand, so no guarantee of perfection.
[IsTitleMatched] ? LTRIM(RIGHT([Name], LEN([Name]) - [FirstSpace])) : [Name]
I do violate my own rule here, as the positive case is a quasi-complex expression. Feel free to insert another Derived Column that computes the stripped version of the name first.
At this point you've got four or five derived columns in your data flow, but you can add a data viewer between each of them to make sure you're getting the expected result. You're welcome to solve this in your own way, but that's the easiest approach I can think of.
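To sanity-check the logic before building the data flow, the chain of derived columns can be modelled in a few lines of Python. This is only a model, not SSIS: FINDSTRING's 1-based result (0 when not found) is emulated with str.find() + 1, the helper right() is a hypothetical stand-in for SSIS RIGHT(), and the title list is the one from DER GetIsTitleMatched.

```python
# A Python model of the derived-column chain above; it is not SSIS itself.
TITLES = {"DR", "MRS", "MR", "MS"}  # upper case, matching the UPPER() in GetFirstWord


def right(s, n):
    """Mimic SSIS RIGHT(): the last n characters of s."""
    return s[len(s) - n:] if n > 0 else ""


def strip_title(name):
    # DER GetFirstSpace: FINDSTRING([Name], " ", 1) is 1-based, 0 when not found.
    first_space = name.find(" ") + 1
    # DER GetFirstWord: text before the first space (whole name if none), upper-cased.
    first_word = (name[: first_space - 1] if first_space > 0 else name).upper()
    # DER GetIsTitleMatched: exact comparison against the known titles.
    is_title_matched = first_word in TITLES
    # DER SetFinalName: LTRIM(RIGHT([Name], LEN([Name]) - [FirstSpace])) or [Name].
    return right(name, len(name) - first_space).lstrip() if is_title_matched else name


for name in ["Mr ABC", "MS XYZX", "Mrs qwrer", "DR ADCS", "Mark Twain"]:
    print(f"{name!r} -> {strip_title(name)!r}")
```

Because each derived column is its own variable here, a wrong intermediate value is as easy to spot as it is with a data viewer between the real derived columns.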
Let's say I have a search form on my site that generates a query string to filter results, e.g. mysite/search?field1=value1&field2=value2&field3=value3
The user enters the following into the search fields:
Field 1 = Cat
Field 2 = Black
Field 3 = Stray
Given the nature of a query string, all three field values would have to be present in an item for it to be recognised as a match, right?
Is there a way to make the '&' act as an 'OR', so that any matching field returns a result?
Or
Is there a way to match the nearest result? I.e. remove field/value pairs until a match is found, or in some other way find the next closest result.
For example, if the user enters 'Cat', 'Black' and 'Stray' and there is an item that includes all three values, it returns that result (the standard response). If there isn't, say there's only an item that has 'Cat' and 'Stray', it recognises that no items contain all three values and looks for two field-value matches instead.
Happy to consider any ideas to prevent "no items found" and at least render something rather than nothing.
It depends on what you're using for filtering, but the general idea is that after you filter the data, you check whether the result is empty; if it is, you remove one filter and do the filtering again. Something like this, for example:
result = filter(data, filters);
// Drop the last filter and retry until something matches or no filters remain.
while (result.length === 0 && filters.length > 0) {
  filters.pop();
  result = filter(data, filters);
}
This isn't working code, just the general idea; filter stands for whatever filtering function you're using.
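Here is the same idea as a runnable Python sketch. The functions filter_items and search_with_fallback are hypothetical, and filter_items assumes "AND" semantics: an item matches only if every (field, value) pair matches.

```python
def filter_items(items, filters):
    """Hypothetical filter: keep items that match every (field, value) pair."""
    return [item for item in items if all(item.get(f) == v for f, v in filters)]


def search_with_fallback(items, filters):
    """Drop the last filter and retry until something matches or no filters remain."""
    filters = list(filters)  # work on a copy so the caller's list is untouched
    result = filter_items(items, filters)
    while not result and filters:
        filters.pop()
        result = filter_items(items, filters)
    return result


items = [
    {"field1": "Cat", "field2": "White", "field3": "Stray"},
    {"field1": "Dog", "field2": "Black", "field3": "Pet"},
]
query = [("field1", "Cat"), ("field2", "Black"), ("field3", "Stray")]
print(search_with_fallback(items, query))
```

With this query, 'Stray' is dropped first and then 'Black', after which the Cat item is returned, so the order in which filters are dropped decides what counts as "closest". If every filter ends up dropped, all() over an empty list is True and every item comes back, which means the page always shows something.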
I'm trying to merge multiple JSON data sets into one large data set, because the server I'm pulling them from returns at most 100 entries per request.
The easiest way to do this would be to eliminate the end of one set and the beginning of the next and replace it with "," so that there would be only one open and close to the entire large set. This is what appears between the last entry of one set and the first entry of the next currently:
],"version":"1.0"}{"error":"OK","limit":100,"offset":100,"number_of_page_results":100,"number_of_total_results":20235,"status_code":1,"results":[
Again, I need that entire string replaced with just a comma. The problem I'm encountering is that I had to change the offset for each data set to grab the next 100 entries, so the "offset":100, part is different in each string ("offset":200, "offset":300, etc.). I can't seem to get wildcards to cooperate; I suspect it has something to do with all the brackets that are already in the string.
Any help would be appreciated. Thank you.
A regular expression that matches the whole string you provided (assuming it contains no newline characters) is:
\],"version":"1\.0"\}\{"error":"OK","limit":[0-9]+,"offset":[0-9]+,"number_of_page_results":[0-9]+,"number_of_total_results":[0-9]+,"status_code":[0-9]+,"results":\[
It matches any digits in place of each of the numbers in your sample (except the version number).
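To show the regex doing the replacement, here is a sketch using Python's re.sub on two concatenated pages. The make_page helper and its sample data are made up for the demo; only the shape of the JSON follows the question.

```python
import json
import re

# The regex from the answer; only the digits are allowed to vary.
SEAM = re.compile(
    r'\],"version":"1\.0"\}\{"error":"OK","limit":[0-9]+,"offset":[0-9]+,'
    r'"number_of_page_results":[0-9]+,"number_of_total_results":[0-9]+,'
    r'"status_code":[0-9]+,"results":\['
)


def merge_pages(raw):
    """Replace every seam between concatenated pages with a single comma."""
    return SEAM.sub(",", raw)


def make_page(offset, ids):
    """Hypothetical sample page in the same shape as the question's data."""
    results = ",".join(f'{{"id":{i}}}' for i in ids)
    return (
        '{"error":"OK","limit":100,"offset":%d,"number_of_page_results":%d,'
        '"number_of_total_results":4,"status_code":1,"results":[%s],"version":"1.0"}'
        % (offset, len(ids), results)
    )


raw = make_page(0, [1, 2]) + make_page(100, [3, 4])
merged = json.loads(merge_pages(raw))  # parses, so the result is valid JSON
print(merged["results"])
```

The merged document keeps the first page's header fields (so "offset" and "number_of_page_results" describe only the first page) and the last page's closing "version". If you can touch the data before it is concatenated, parsing each page with a JSON library and appending its "results" list to one combined list avoids string surgery altogether.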
OK, I'm stuck and I don't know where else to look. I have a database with a table called "tasks". It has a column called "description" containing strings.
These descriptions are pretty long, and each one has this format:
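Unique description "But I'd like to remind you..." Unique text
(part 1 is the unique description, part 2 starts with "But I'd like to remind you...", and part 3 is more unique text)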
In every description, I need to delete everything from the second part to the end of the string.
So I need an SQL query that deletes only the part of each field that starts with "But I'd like to remind you..." and runs to the end of the string, leaving ONLY part 1, the unique description.
MySQL's SUBSTRING_INDEX function is good for this sort of thing:
SELECT SUBSTRING_INDEX(description, ' "But I''d', 1)
FROM tasks
The second argument is the string you're looking for. The third argument of 1 means return everything before the first occurrence of the string you're looking for.
I've included just the first part of the But I'd like to remind you... text. You can include as much of it as needed to make sure you get a proper hit.
The query above will return the first part of the description. To update the table and remove everything from But I'd like to remind you... to the end, just put the expression into an UPDATE:
UPDATE tasks
SET description = SUBSTRING_INDEX(description, ' "But I''d', 1)
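To try the UPDATE without a MySQL server, here is a sketch using Python's sqlite3. SQLite has no SUBSTRING_INDEX, so the sketch registers a Python stand-in under that name with sqlite3's create_function; the stand-in (substring_index, a hypothetical helper) is written to mimic MySQL's documented behaviour for positive and negative counts, and the sample rows are made up.

```python
import sqlite3


def substring_index(s, delim, count):
    """Stand-in for MySQL's SUBSTRING_INDEX(str, delim, count)."""
    if s is None or delim is None or count is None:
        return None  # NULL in, NULL out
    if not delim or count == 0:
        return ""  # also avoids str.split("") raising ValueError
    parts = s.split(delim)
    # Positive count: text before the count-th delimiter from the left;
    # negative count: text after the count-th delimiter from the right.
    return delim.join(parts[:count] if count > 0 else parts[count:])


conn = sqlite3.connect(":memory:")
conn.create_function("SUBSTRING_INDEX", 3, substring_index)
conn.execute("CREATE TABLE tasks (description TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?)",
    [
        ('Fix the login page "But I\'d like to remind you..." Thanks',),
        ("No reminder in this one",),
    ],
)
# Same UPDATE as above; '' inside the SQL literal is an escaped single quote.
conn.execute(
    """UPDATE tasks SET description = SUBSTRING_INDEX(description, ' "But I''d', 1)"""
)
print([row[0] for row in conn.execute("SELECT description FROM tasks ORDER BY rowid")])
```

The UPDATE has no WHERE clause, but rows that don't contain the phrase are left as they were, because SUBSTRING_INDEX returns the whole string when the delimiter isn't found. It's still worth running the SELECT version first to check the output before changing the table.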