Find formula produces negative number - function

Background:
I am using census data from 1999 that is stored in a text file. The file does not use a distinct delimiter, so Excel does not import the data correctly. Excel separates the rows correctly, but all of the column data for each row ends up in a single cell.
A sample cell of data is:
01 07000 Birmingham city, AL 249,459 265,940 -16,481 -6.2 1 51 67
I attempted to use the FIND formula to find the word "city", to test the formula, and the result is a negative number. Specifically, the number is "-21".
I have looked in a few Excel references and found nothing to explain this. Any insight is greatly appreciated.

Related

Replace entire row based on duplicates columns in csv file

I have two csv files, each of which has two columns. File A is the master file which contains the order of the items, which is important. File B has some (but not all) updated information that needs to replace the old information in file A.
How do I replace the old values in column 2 of file A with the new values from column 2 of file B, but only where the values in column 1 are duplicates?
For example:
File A
Name          Number
Bob Smith     12
Mary West     67
Joe Soap      77
Edith Little  41
File B
Name          Number
Mary West     83
Edith Little  16
Desired result
Name          Number
Bob Smith     12
Mary West     83
Joe Soap      77
Edith Little  16
I feel like there should be a simple solution to this that I'm just missing, but I haven't had any luck with searching for a method.
Edit:
I attempted to solve the problem by replacing duplicates in Google Sheets, which produced the correct values, but the order was lost. I ran into the same problem in Sublime Text: I can keep the new values quite easily, but I can't find a way to keep them in the position of the old values.
Try the following
=INDEX(IFNA({Q2:Q7,IFERROR(VLOOKUP(Q2:Q7,T2:U5,2,0),R2:R9)}))
(Do adjust the formula according to your ranges and locale)
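If a script is an option, the same lookup can be sketched outside the spreadsheet. This is only a minimal sketch in Python, assuming both files have a header row and exactly two columns; the function name `apply_updates` and the inline sample data are made up for illustration and stand in for the real files.

```python
import csv
import io


def apply_updates(master_rows, update_rows):
    """Return master_rows in their original order, with column 2 replaced
    wherever the column 1 value also appears in update_rows."""
    updates = {name: number for name, number in update_rows}
    return [[name, updates.get(name, number)] for name, number in master_rows]


# Inline stand-ins for file A and file B (assumption: header row, two columns).
file_a = "Name,Number\nBob Smith,12\nMary West,67\nJoe Soap,77\nEdith Little,41\n"
file_b = "Name,Number\nMary West,83\nEdith Little,16\n"

rows_a = list(csv.reader(io.StringIO(file_a)))
rows_b = list(csv.reader(io.StringIO(file_b)))

header, master = rows_a[0], rows_a[1:]
merged = [header] + apply_updates(master, rows_b[1:])

# Write the merged rows back out as CSV text.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged)
print(out.getvalue())
```

Because the output is built by walking file A row by row, the order of file A is kept, and names that appear only in file B are ignored.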

How to Split the value and find the longest text string in google sheet

I have a column in Google Sheets in which each cell contains text like this:
manager, finance manager
accountant
accountant, chief accountant
manager, auditor, other, finance manager
accountant
I want to find the longest item in each cell and show it in a new cell, like this:
finance manager
accountant
chief accountant
finance manager
accountant
I used the SPLIT function to split the text, together with the FIND function. Finding a character or the number of characters works, but I don't understand how to get the whole longest item.
Kindly help me.
An alternative would be to use
=FILTER(TRIM(SPLIT(A1, ",")), LEN(TRIM(SPLIT(A1, ","))) = MAX(LEN(TRIM(SPLIT(A1, ",")))))
and fill down as far as needed.
Suppose your comma-separated lists reside in A2:A. Place the following in, say, B2 of an otherwise empty range B2:B ...
=ArrayFormula(IF(A2:A="",,TRIM(REGEXEXTRACT(A2:A,"[^,]{"&REGEXEXTRACT(TRIM(TRANSPOSE(QUERY(TRANSPOSE(IF(REGEXMATCH(A2:A,"[^,]{"&SEQUENCE(1,30,30,-1)&"}")=FALSE,,SEQUENCE(1,30,30,-1))),,30))),"\S+")*1&"}"))))
This is a complex formula, one that would be difficult to explain. So I will leave it to you (and others who may be interested) to dissect, analyze and understand the inner workings. However, if there is a specific question I can answer should you (or others) get stuck, feel free to ask.
In short, the formula checks to see if there are any REGEX matches for non-comma groupings of 30 characters in length, then 29, 28 and so on to 1. If so, that number is returned in a space-separated list. The first number (which will be the highest) is returned and used to extract a non-comma REGEX expression of that exact length (which then has any leading or trailing spaces removed from it).
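For anyone who finds the idea easier to follow outside a spreadsheet, here is a rough Python sketch of the same descending-length walk. It is not a translation of the formula itself, just the logic described above; the function name `longest_by_regex` and the `max_len` parameter are made up for illustration.

```python
import re


def longest_by_regex(text, max_len=30):
    """Try run lengths max_len, max_len - 1, ..., 1 and return the first
    run of non-comma characters of that exact length, trimmed."""
    for n in range(max_len, 0, -1):
        match = re.search(r"[^,]{%d}" % n, text)
        if match:
            return match.group().strip()
    return ""


for cell in ["manager, finance manager", "accountant",
             "manager, auditor, other, finance manager"]:
    print(longest_by_regex(cell))
```

As with the formula, lengths are measured before trimming, so the space after a comma counts toward the run, and any item longer than the cap (30 here) would be cut off at the cap.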
If your text is stored in column A, you can use this formula:
=hlookup(max(arrayformula(len(trim(split(A1,","))))), {arrayformula(len(trim(split(A1,","))));arrayformula(trim(split(A1,",")))},2,False)
A second option is to write a custom function.
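The logic such a custom function would need is short. In Google Sheets it would be written in Apps Script; the sketch below shows the same logic in Python only to illustrate the steps, and the name `longest_item` is hypothetical.

```python
def longest_item(cell):
    """Split a comma-separated cell, trim each item, and return the longest one.
    On a tie, the item that appears first is returned."""
    items = [part.strip() for part in cell.split(",")]
    return max(items, key=len)


column = [
    "manager, finance manager",
    "accountant",
    "accountant, chief accountant",
    "manager, auditor, other, finance manager",
    "accountant",
]
for cell in column:
    print(longest_item(cell))
```

Trimming before measuring means the space after each comma does not count toward an item's length.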

Is there a way to use one worksheet as database to create a code based on the date?

I am currently working on a spreadsheet formula that should generate 2 different codes. Here is the algorithm for the first "code"; I don't know how to construct a proper Excel function for it.
The code has 10 digits, where the first 8 digits are just the date, i.e. 20210328_ _
The final 2 digits depend on whether earlier records have the same date. If so, a two-digit number starting from 1 is assigned to differentiate the records.
I have tried the formula below, but the part that references the other sheet is bothering me: I need it to be flexible, so that it always refers to the last row of that sheet. Is there a way to work around this without scripts? I am planning to deploy it on Google Sheets, so Apps Script solutions would also be workable, but are not preferable.
=IF(DAY(B2)=RIGHT(Data!A114,2),Data!A114+1,CONCATENATE(YEAR(TODAY()),TEXT(B2,"MM"),DAY(TODAY()),"01"))
FYI, B2 is the input date and Data!A114 is the part I am concerned about.
Here's what I came up with.
Formula(D3)=IF((TO_PURE_NUMBER(Concatenate(YEAR(A3), TEXT(A3,"MM"),DAY(A3))) - TO_PURE_NUMBER(Concatenate(YEAR(A2),TEXT(A2,"MM"),DAY(A2)))), (TO_PURE_NUMBER(CONCATENATE(TO_PURE_NUMBER(Concatenate(YEAR(A3), TEXT(A3,"MM"),DAY(A3))), "00"))) ,(D2+1))
The data for the dates starts in A3, and continues down.
Link to the Google Sheet I tried it on.
https://docs.google.com/spreadsheets/d/1bwukKFaEow4PysqcJLA9jqjKBLZcY8T1vTN5VpZo8F8/edit?usp=sharing
Let me know if this worked.
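To make the numbering rule from the question concrete, here is a small Python sketch of it: 8 date digits followed by a two-digit counter that starts at 01 for each date. It only illustrates the rule as described in the question and is not a drop-in replacement for the sheet formula; the function name `assign_codes` is made up.

```python
from collections import Counter
from datetime import date


def assign_codes(dates):
    """Build a 10-digit code per record: YYYYMMDD plus a two-digit counter
    that restarts at 01 for every distinct date."""
    seen = Counter()
    codes = []
    for d in dates:
        seen[d] += 1
        if seen[d] > 99:
            raise ValueError(f"more than 99 records for {d}")
        codes.append(f"{d:%Y%m%d}{seen[d]:02d}")
    return codes


records = [date(2021, 3, 28), date(2021, 3, 28), date(2021, 3, 29)]
print(assign_codes(records))
```

The counter is kept per date rather than read from the last row, so the records do not have to be sorted by date for the numbering to stay consistent.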

How to clean/sort data in R with multiple entries corresponding to few of the intermittent row variables under several columns?

How do I clean or reorganize data in R/RStudio when a row variable has more than one entry under the column variables? For example, I have a data set with 13 columns and 14 rows in each month tab of an Excel workbook that covers one year, and there are 5 workbooks like this, so in total there are 5 * 12 = 60 tabs. In each month tab, before the second, third, etc. row starts, the previous row already has multiple entries under a column heading, as shown in the attached sample image.
How do I format and clean this whole data set, including all the months in a year and all 5 consecutive years, and make it suitable for analysis? Thanks in advance.
Are you looking to import the Excel data into RStudio?
Look at the library "xlsx" to read Excel sheets in.
The data will be loaded into your environment as a data frame, available for you to analyse. If you want RStudio to recognise dates, look at the library "lubridate".
You aren't particularly clear about what you want here. After you've done some R coding, post it and we can help further.

Adding Missing text on bulk CSV file

I have a large data set of roughly 7000 lines. It has been generated with a particular piece missing. Is there a way I can add the missing information en masse? Below is an example line from my data set:
PRIPOS;20150527;EUR;AAAAA;Maxi Dresses;5050300000000;22200000;Thyme;Thyme;6;32;AAAAAA MAXI DRESS;AAAAAA MAXI DRESS;2;All AAAAA Products;000;Dresses;100;Maxi Dresses;10000;Soft Maxi Dress;000.00;00.00;;;;;SS15;;;Insert;;
The field containing 32 is the one that needs to be considered; the field containing Insert is where data needs to be added. The 32 represents a size, and Insert should be replaced with a different size. The file contains around 7k lines, all with different information.
Is there a particular text editor that will allow me to use a wildcard in a replace function, or any ideas on a script? Failing this, I would assume dumping the file into a SQL table and updating via a query would be the quickest method?
Thanks a lot.
You could load the file into Excel and put a formula in the Insert column that looks at the 11th column and sets its value based on that. Set your list separator character to a semicolon in the regional settings first.
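If you would rather script it, the same idea can be sketched in a few lines of Python. This is only a sketch under assumptions: the size is taken from the 11th field and the placeholder from the 31st field, as in the example line, and the mapping in `SIZE_MAP` is entirely made up, since the question does not say which size belongs to which value. The names `SIZE_COL`, `TARGET_COL`, `SIZE_MAP` and `fill_missing` are hypothetical.

```python
SIZE_COL = 10      # 11th field (0-based index), holds the known size, e.g. "32"
TARGET_COL = 30    # 31st field, holds the "Insert" placeholder in the example
SIZE_MAP = {"32": "XS"}  # hypothetical mapping, replace with the real one


def fill_missing(line, size_map=SIZE_MAP):
    """Replace the "Insert" placeholder with the value mapped from the size field.
    Lines with an unknown size or without the placeholder are returned unchanged."""
    fields = line.rstrip("\n").split(";")
    if len(fields) > TARGET_COL and fields[TARGET_COL] == "Insert":
        new_value = size_map.get(fields[SIZE_COL])
        if new_value is not None:
            fields[TARGET_COL] = new_value
    return ";".join(fields)


sample = ("PRIPOS;20150527;EUR;AAAAA;Maxi Dresses;5050300000000;22200000;"
          "Thyme;Thyme;6;32;AAAAAA MAXI DRESS;AAAAAA MAXI DRESS;2;"
          "All AAAAA Products;000;Dresses;100;Maxi Dresses;10000;"
          "Soft Maxi Dress;000.00;00.00;;;;;SS15;;;Insert;;")
print(fill_missing(sample))
```

To process the whole file, you would apply `fill_missing` to each line and write the results to a new file, keeping the original as a backup.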