Lookup, Match and Concatenate - google-apps-script

I need a formula/function to concatenate cell values from one column and multiple rows. The matching criteria is applied to a different column. Here is my example of what I have to do:
Islington | "Bunhill" | EC2M
Islington | "Bunhill" | EC2Y
Islington | "Bunhill" | N1
Barnet | "Burnt Oak" | HA8
Barnet | "Burnt Oak" | NW7
Barnet | "Burnt Oak" | NW9
The end result needs to look like this:
Islington | "Bunhill" | EC2M, EC2Y, N1
Barnet | "Burnt Oak" | HA8, NW7, NW9
Basically, I need to remove all duplicates from the second column, but save the data from the third column that is paired with each of the duplicates, and concatenate it in one cell.

You can go through a process of steps using functions. Start with the UNIQUE function. Put this in a cell where it is convenient to list all the unique values of column B:
=UNIQUE(B:B)
Gets all the unique values in column B.
Google Support - Unique Function
Now that you have all the unique values from column B, you can use the FILTER function to retrieve the column C value of every row that matches one of those unique values (here the unique value is in A8):
=FILTER(C1:C6, B1:B6=A8)
The FILTER function lists all the results down the column, but you can wrap it in the CONCATENATE function to collect them in a single cell.
This solves the problem of getting data in multiple rows, but now there is no separator between the values.
To get around that problem, you can create a fourth column, D, with a function that appends a comma to the end of each value in column C.
You will need to adjust the FILTER function to now use column D, rather than column C:
=CONCATENATE(FILTER(D1:D6, B1:B6=A8))
This leaves an extra comma on the end, which you can get rid of with the LEFT function.
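To make the intended result concrete, here is a minimal Python sketch of the same idea: find the unique key, collect the matching values, and join them with a separator. It is only an illustration of the logic, not a spreadsheet formula; the function name `group_and_join` and the row layout are assumptions made for this example.

```python
def group_and_join(rows, separator=", "):
    """Group rows by (borough, ward) and join their postcodes into one string.

    `rows` is a list of (borough, ward, postcode) tuples (hypothetical layout).
    """
    grouped = {}  # plain dicts keep insertion order, so groups stay in first-seen order
    for borough, ward, postcode in rows:
        grouped.setdefault((borough, ward), []).append(postcode)
    return [
        (borough, ward, separator.join(postcodes))
        for (borough, ward), postcodes in grouped.items()
    ]


rows = [
    ("Islington", "Bunhill", "EC2M"),
    ("Islington", "Bunhill", "EC2Y"),
    ("Islington", "Bunhill", "N1"),
    ("Barnet", "Burnt Oak", "HA8"),
    ("Barnet", "Burnt Oak", "NW7"),
    ("Barnet", "Burnt Oak", "NW9"),
]
result = group_and_join(rows)
```

Joining with a separator avoids the trailing comma that the "append a comma, then trim with LEFT" workaround has to remove.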

If not required too often it is quite practical without a script. Assuming EC2M is in C2, D1 is blank, and your data is sorted, in D2:
=if(B1=B2,D1&", "&C2,C2)
and in E2, both formulae copied down to suit:
=B2=B3
Select all, copy (Ctrl+C), then use Edit > Paste special > Paste values only over the top, and filter to select and delete the rows with TRUE in column E.

TEXTJOIN has 2 advantages over CONCATENATE: (1) customizable delimiter, and (2) can skip blanks.
Example:
AA | BB | CC | __ | EE
=TEXTJOIN(",",TRUE,A1:E1)
Will produce: AA,BB,CC,EE
(skipping the blank DD and putting a comma in between every term except last)
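For readers who want to check the behaviour outside a spreadsheet, this is a rough Python sketch of what TEXTJOIN does with its three arguments. The function `textjoin` is a hypothetical stand-in written for this example, and it only handles a flat list of strings.

```python
def textjoin(delimiter, ignore_empty, cells):
    """Join cell values with a delimiter, optionally skipping empty cells."""
    if ignore_empty:
        cells = [cell for cell in cells if cell != ""]
    return delimiter.join(cells)


row = ["AA", "BB", "CC", "", "EE"]  # the fourth cell is blank
joined = textjoin(",", True, row)
joined_with_blanks = textjoin(",", False, row)
```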


mySQL - Reiteratively Count rows that have particular CSV string

2-column MySQL Table:
| id| class |
|---|---------|
| 1 | A,B |
| 2 | B,C,D |
| 3 | C,D,A,G |
| 4 | E,F,G |
| 5 | A,F,G |
| 6 | E,F,G,B |
Requirement is to generate a report/output which tells which individual CSV value of class column is in how many rows.
For example, A is present in 3 rows (with id 1,3,5), and C is present in 2 rows (with id 2,3), and G is in 4 rows (3,4,5,6) so the output report should be
A - 3
B - 3
C - 2
...
...
G - 4
Essentially, column id can be ignored.
The approach I can think of: first pick all the values of the class column, split them on commas, build a distinct list of the unique values (A, B, C, ...), and then count how many rows contain each value from that distinct list.
While I know basic SQL queries, this is way too complex for me. I am unable to match it with a CSV split function in MySQL. (I am new to SQL, so I don't know much.)
An alternative approach that I got working: download the class column values to a file and feed it to a Perl script, which creates a distinct array of A, B, C, then reads the downloaded CSV file again for each element of the distinct array, increments the count, and finally publishes the report. But this is in Perl, which would be a separate execution, while the client needs it as an SQL report.
Help will be appreciated.
Thanks
You can use a split-string-into-rows function to get the distinct values, then use the COUNT function to find the number of occurrences of each.
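The split-then-count logic can be sketched in Python to show what the report should contain. This is not MySQL code; the function name `count_rows_per_value` is made up for the illustration, and the sample data is the table from the question.

```python
from collections import Counter


def count_rows_per_value(class_column):
    """Count, for each CSV value, how many rows contain it."""
    counts = Counter()
    for cell in class_column:
        # A set ensures a value repeated inside one row is counted once for that row.
        values = {value.strip() for value in cell.split(",") if value.strip()}
        counts.update(values)
    return dict(sorted(counts.items()))


class_column = ["A,B", "B,C,D", "C,D,A,G", "E,F,G", "A,F,G", "E,F,G,B"]
report = count_rows_per_value(class_column)
for value, row_count in report.items():
    print(f"{value} - {row_count}")
```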

Is there a way in MySQL to use aggregate functions in a sub section of binary column?

Suppose we have 2 numbers of 3 bits each attached together like '101100', which basically represents 5 and 4 combined. I want to be able to perform aggregation functions like SUM() or AVG() on this column separately for each individual 3-bit column.
For instance:
'101100'
'001001'
sum(first three column) = 6
sum(last three column) = 5
I have already tried the SUBSTRING() function, however, speed is the issue in that case as this query will run on millions of rows regularly. And string matching will slow the query.
I am also open to any other databases or technologies that may support this functionality.
You can use the function conv() to convert any part of the string to a decimal number:
select
sum(conv(left(number, 3), 2, 10)) firstpart,
sum(conv(right(number, 3), 2, 10)) secondpart
from tablename
See the demo.
Results:
| firstpart | secondpart |
| --------- | ---------- |
| 6 | 5 |
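As a cross-check of those numbers, here is a small Python sketch that does the same base-2 conversion on the left and right three characters of each string. The function name `sum_bit_groups` and the `width` parameter are assumptions for this example.

```python
def sum_bit_groups(values, width=3):
    """Sum the leading and trailing `width`-bit groups of each binary string."""
    first = sum(int(value[:width], 2) for value in values)    # like conv(left(number, 3), 2, 10)
    second = sum(int(value[-width:], 2) for value in values)  # like conv(right(number, 3), 2, 10)
    return first, second


firstpart, secondpart = sum_bit_groups(["101100", "001001"])
```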
With the current understanding I have of your schema (which is next to none), the best solution would be to restructure your schema so that each data point is its own record instead of all the data points being in the same record. Doing this allows you to have a dynamic number of data points per entry. Your resulting table would look something like this:
id | data_type | value
ID is used to tie all of your data points together. If you look at your current table, this would be whatever you are using for the primary key. For this answer, I am assuming id INT NOT NULL but yours may have additional columns.
Data Type indicates what type of data is stored in that record. This would be the current table's column name. I will be using data_type_N as my values, but yours should be a more easily understood value (e.g. sensor_5).
Value is exactly what it says it is, the value of the data type for the given id. Your values appear to be all numbers under 8, so you could use a TINYINT type. If you have different storage types (VARCHAR, INT, FLOAT), I would create a separate column per type (val_varchar, val_int, val_float).
The primary key for this table now becomes a composite: PRIMARY KEY (id, data_type). Since your previously single record will become N records, the primary key will need to adjust to accommodate that.
You will also want to ensure that you have indexes that are usable by your queries.
Some sample values (using what you placed in your question) would look like:
1 | data_type_1 | 5
1 | data_type_2 | 4
2 | data_type_1 | 1
2 | data_type_2 | 1
Doing this, summing the values now becomes trivial. You would only need to ensure that data_type_N is summed with data_type_N. As an example, this would be used to sum your example values:
SELECT data_type,
SUM(value)
FROM my_table
WHERE id IN (1,2)
GROUP BY data_type
Here is an SQL Fiddle showing how it can be used.
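To try the restructured table without a MySQL server, the sketch below runs the same GROUP BY query against an in-memory SQLite database from Python. SQLite is swapped in here purely for convenience, so the column types are SQLite's rather than TINYINT; the table and column names follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE my_table (
        id        INTEGER NOT NULL,
        data_type TEXT    NOT NULL,
        value     INTEGER NOT NULL,
        PRIMARY KEY (id, data_type)  -- composite key: one record per data point
    )
    """
)
conn.executemany(
    "INSERT INTO my_table (id, data_type, value) VALUES (?, ?, ?)",
    [
        (1, "data_type_1", 5),
        (1, "data_type_2", 4),
        (2, "data_type_1", 1),
        (2, "data_type_2", 1),
    ],
)
totals = dict(
    conn.execute(
        """
        SELECT data_type, SUM(value)
        FROM my_table
        WHERE id IN (1, 2)
        GROUP BY data_type
        """
    ).fetchall()
)
conn.close()
```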

MySQL delete lines that contains specific word

I'm trying to delete lines within a specific column, across all rows, when the line contains a specific word.
For example:
Remove lines that contain the word apple, which is always at the beginning of the line.
+--+------------------+
|ID|data |
+--+------------------+
|1 |sometext1 |
| |sometext2 |
| |apple sometext3 |
| |sometext4 |
+--+------------------+
|2 |apple sometext5 |
| |sometext6 |
+--+------------------+
so the result would be:
+--+------------------+
|ID|data |
+--+------------------+
|1 |sometext1 |
| |sometext2 |
| |sometext4 |
+--+------------------+
|2 |sometext6 |
+--+------------------+
'SometextX' is different in every line, number of lines is different in every row and it has different number of characters in every line.
I really need this in MySQL any help would be appreciated.
You would be better off using REGEXP here to match patterns in each line:
DELETE FROM yourTable WHERE text REGEXP '^apple';
REGEXP allows for fairly complex regex matching, and would be useful if your requirement changes or gets more complex later on.
Edit: MySQL versions before 8.0 have no built-in support for regex replacement (REGEXP_REPLACE was added in MySQL 8.0), so on older versions there is no easy way to accomplish what you want.
A general regex pattern to remove the word apple would be \bapple\b. You may search on this pattern and replace with empty string.
You would use where:
where textcol not like 'apple%' or textcol is null
This can be part of a select or a delete (the question mentions "result" which suggests the former and "delete" which suggests the latter). It is not clear whether you actually want to change the data or whether you just want the result set without these words.
Note: you can do this without or and still handle NULL values, because MySQL has a NULL-safe equality operator:
where not left(textcol, 5) <=> 'apple'
You can use MySQL functions to select the right rows and to update with new data as follows:
UPDATE `yourTable` SET `yourField` = REPLACE(yourField, 'apple', '') WHERE yourField LIKE '%apple%'
If you don't want to delete the whole row, you can run these 3 queries in this order
update your_table set text=replace(text,substring(text,@start:=locate('\napple',text),locate('\n',text,@start+1)-@start+1),'');
update your_table set text=if((@start:=locate('apple',text))=1,replace(text,substring(text,@start,locate('\n',text,@start+1)-@start+1),''),text);
update your_table set text=if((@start:=locate('apple',text))=1,replace(text,substring(text,locate('apple',text)),''),text);
update #1 will remove apple in the middle of the text (prefixed by \n)
update #2 will remove apple at the beginning of its row (nothing before) and having following rows
update #3 will remove remaining cases
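The intended effect on a single cell, as shown in the question's before and after tables, can be sketched in Python. This only illustrates the target result and is not a replacement for the SQL; the function name `remove_lines_starting_with` is hypothetical, and lines are assumed to be separated by '\n'.

```python
def remove_lines_starting_with(text, word="apple"):
    """Drop every line of a multi-line value that starts with `word`."""
    if text is None:  # keep NULL-like values untouched
        return None
    kept = [line for line in text.split("\n") if not line.startswith(word)]
    return "\n".join(kept)


row1 = "sometext1\nsometext2\napple sometext3\nsometext4"
row2 = "apple sometext5\nsometext6"
cleaned1 = remove_lines_starting_with(row1)
cleaned2 = remove_lines_starting_with(row2)
```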

How to store and evaluate dynamic expressions in MySQL(or any other SQL)

Best way to store a dynamic expression in a table for each row for a searching module.
The expression is dynamic and can have multiple fields which are being compared.
I considered creating a separate column for each type of field and flattening out complex nested logic by generating all possible combinations using DNF (disjunctive normal form) and storing them in my table. The disadvantage is that every new piece of logic or expression needs a new column, which would lead to a large table with too many NULLs in it, and adding a new column would take time and refactoring (we are talking about more than 800 columns here).
The alternate approach, which I think would work better, is shown below.
I want to discuss whether there is a better way to do this and, if not, how we can improve and implement the approach suggested below.
| id | expression | diagnosis |
|------|------------------------------------------------|-------------|
| 1 |`p.age>12 and p.gender==Male` | diseaseA |
| 2 |`p.age>50 and p.bp>20` | diseaseB |
| 3 |`p.age<20 and p.bp<20` | diseaseC |
| 4 |`p.age<30 and p.age>20 and (p.bp<30 or p.bp>50)`| diseaseD |
I want to search in this table, for a patient p with certain properties (age=*something*,bp=*something*,etc).
The resulting rows should return all rows which satisfy the expression and also rows which partially match the expression(i.e the rows which are using properties not supplied in the search criteria).
For example for a search for patient p(age=22,bp=15), the search result should be
| id | disease |
|------|-------------|
| 1 | diseaseA |
| 3 | diseaseC |
| 4 | diseaseD |
Since I am new to SQL, the (newbie) way I think I can do this is
1. First get all the rows (in-memory would be costly, let's discuss what is the best possible way to execute the functionality described in point 2 row-by-row).
2. Then row-by-row transform the expression to a logical executable expression (which is later executed using eval) using regex matching & replacement (I hope there is a better way than this) for the search criteria (i.e. substituting the patient details) [in my example for the 2nd row, the expression p.age>50 and p.bp>20 gets converted to "22>50 && 15>20"].
3. All the rows for which the result of transforming & executing the expression was true (or partially matched) should be returned.
The language is not an issue as I would be starting this project from scratch and can use any language
I can answer for MySQL.
First of all, you'll have to write all of your SQL code inside a stored procedure.
Generally, you are interested in dynamic SQL:
https://dev.mysql.com/doc/refman/5.7/en/sql-syntax-prepared-statements.html
So a straightforward approach is to open a cursor for your table of expressions and, for each expression, replace p.age with its actual value and then execute dynamic SQL (select 22 > 50 and 15 > 20).
Another approach is to loop through the expression table (open a cursor for it) and, as you probably have the patient id (not only its field values), just generate normal SQL that selects from the patient table (select patient_id from patients where [expression_from_expression_table] and patient_id = [your_known_patient_id]).
The third one that I can imagine is generating a single big query from the whole expression table:
select group_concat(concat('if(', expression, ',"', diagnosis, '", "") as ', diagnosis) separator ',') from expressions into somevar;
and then doing replace of p.* with actual values and executing second query:
set somevar = replace(somevar, 'p.age', '15');
...
set @qry = concat('select ', somevar);
PREPARE qry FROM @qry;
EXECUTE qry;
The third approach is the fastest to my mind, but it will require additional work on the client, as you will receive the diagnoses as columns, not as rows.
But hope you get the general idea.
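The first approach (substitute the patient's values into each stored expression, then evaluate it) can also be sketched outside the database. The Python below is a rough illustration under several assumptions: every field used by an expression is supplied, all values are numeric, and expressions only use `and`, `or`, parentheses and comparisons. The names `substitute`, `evaluate` and `matches` are invented for this example, and partial matching is not handled.

```python
import ast
import operator
import re

_COMPARATORS = {
    ast.Gt: operator.gt, ast.Lt: operator.lt,
    ast.GtE: operator.ge, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def substitute(expression, patient):
    """Replace each p.<field> with the patient's value, e.g. p.age -> 22."""
    return re.sub(r"\bp\.(\w+)", lambda m: repr(patient[m.group(1)]), expression)


def evaluate(node):
    """Evaluate a tiny subset of Python syntax instead of calling eval()."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.BoolOp):
        results = [evaluate(value) for value in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Compare):
        left = evaluate(node.left)
        for op, right_node in zip(node.ops, node.comparators):
            right = evaluate(right_node)
            if not _COMPARATORS[type(op)](left, right):
                return False
            left = right
        return True
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def matches(expression, patient):
    """True if the stored expression holds for the given patient values."""
    return evaluate(ast.parse(substitute(expression, patient), mode="eval"))


patient = {"age": 22, "bp": 15}
substituted = substitute("p.age>50 and p.bp>20", patient)
```

Walking the parsed syntax tree rather than calling `eval` keeps stored expressions from running arbitrary code, at the cost of supporting only the operators listed above.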

Store multiple values in a single cell instead of in different rows

Is there a way I can store multiple values in a single cell instead of different rows, and search for them?
Can I do:
pId | available
1 | US,UK,CA,SE
2 | US,SE
Instead of:
pId | available
1 | US
1 | UK
1 | CA
1 | SE
Then do:
select pId from table where available = 'US'
You can do that, but it makes the query inefficient. You can look for a substring in the field, but that means that the query can't make use of any index, which is a big performance issue when you have many rows in your table.
This is how you would use it in your special case with two character codes:
select pId from table where find_in_set('US', available)
Keeping the values in separate records makes every operation where you use the values, like filtering and joining, more efficient.
You can use the LIKE operator to get the result:
Select pid from table where available like '%US%'
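The difference between the two lookups can be illustrated in Python. The function `find_in_set` below is a hand-written approximation of MySQL's FIND_IN_SET (1-based position, 0 when absent), and the value 'RUS' is a made-up code used only to show how a plain substring match, like LIKE '%US%', can match more than intended once codes are not all two characters long.

```python
def find_in_set(needle, csv_value):
    """Return the 1-based position of `needle` in a comma-separated list, or 0."""
    parts = csv_value.split(",")
    return parts.index(needle) + 1 if needle in parts else 0


exact_first = find_in_set("US", "US,UK,CA,SE")
exact_second = find_in_set("SE", "US,SE")
exact_miss = find_in_set("US", "RUS,UK")   # hypothetical longer code: no exact match
substring_hit = "US" in "RUS,UK"            # substring search still matches
```

Either way the whole column has to be scanned, which is why keeping one value per row remains the more efficient design.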