Divide SELECT value by first entry found - mysql

I am fairly inexperienced with SQL, and I have a table that looks like this (simplified version):
ID | Dataset | date | value
I'm trying to divide each value by a baseline, which would be the first entry in the database for that particular dataset.
For example, for dataset1, if the value at 05/05/2018 is 28, and the first value in the database is 4 at 01/01/2018, then I want the result to be 28/4.
The thing is that not every dataset was added to the database at the same time, so they have different dates for their baseline. So if dataset1 has its first entry at 01/01/2018, dataset2 might have its first entry at 02/02/2018.
How would I go about this query? I tried a simple div, but it seems like I can only divide by a single number, and not a value-by-value table.
I tried something like this:
SELECT DATE(date) as time, value, dataset / (
SELECT min(DATE(date)) as time, value, dataset FROM table GROUP BY dataset)
FROM table GROUP BY time, dataset
but I think SQL expects the denominator to be a single value in this case, not a table.

Check this out:
SELECT T1.Date
,T1.Value/T2.Value Value
FROM TempTable T1
INNER JOIN TempTable T2
ON T2.Id =
(SELECT T3.Id
FROM TempTable T3
WHERE T3.Dataset = T1.Dataset
ORDER BY T3.Date
LIMIT 1
)

"I think SQL expects the denominator to be a single value in this case, not a table." You are on the right track: the subselect needs to return a single value. Below is an example:
SELECT
value, (select value from table b where a.dataset = b.dataset and b.dt = (select min(dt) from table c where b.dataset = c.dataset))
FROM
table a
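To make the scalar-subquery idea concrete, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name readings, the column name dt, and the sample rows are all made up for illustration. Unlike the query above, it also performs the division rather than just selecting the baseline next to the value.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (id INTEGER PRIMARY KEY, dataset TEXT, dt TEXT, value REAL);
INSERT INTO readings (dataset, dt, value) VALUES
  ('dataset1', '2018-01-01', 4),
  ('dataset1', '2018-05-05', 28),
  ('dataset2', '2018-02-02', 10),
  ('dataset2', '2018-03-01', 25);
""")

# The inner subquery finds the earliest date per dataset; the middle one
# fetches the value stored at that date, which is a single number per row.
query = """
SELECT a.dataset, a.dt,
       a.value / (SELECT b.value
                  FROM readings b
                  WHERE b.dataset = a.dataset
                    AND b.dt = (SELECT MIN(c.dt)
                                FROM readings c
                                WHERE c.dataset = b.dataset)) AS ratio
FROM readings a
ORDER BY a.dataset, a.dt
"""
ratios = conn.execute(query).fetchall()
for row in ratios:
    print(row)
```

This assumes each dataset has exactly one row at its earliest date; with ties the subquery would return more than one row, and MySQL would reject it.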

In MySQL 8+, you would simply do:
select t.*,
t.value / first_value(t.value) over (partition by t.dataset order by t.date) as ratio
from t;
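For anyone who wants to try the window-function version without a MySQL 8 server, here is a small sketch using Python's sqlite3 module, which also supports FIRST_VALUE (SQLite 3.25 or newer is assumed). The column name dt and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite (3.25+) stands in for MySQL 8 here; both support FIRST_VALUE ... OVER.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, dataset TEXT, dt TEXT, value REAL);
INSERT INTO t (dataset, dt, value) VALUES
  ('dataset1', '2018-01-01', 4),
  ('dataset1', '2018-05-05', 28),
  ('dataset2', '2018-02-02', 10),
  ('dataset2', '2018-03-01', 25);
""")

# Each partition is one dataset, ordered by date, so FIRST_VALUE is the baseline.
window_query = """
SELECT dataset, dt,
       value / FIRST_VALUE(value) OVER (PARTITION BY dataset ORDER BY dt) AS ratio
FROM t
ORDER BY dataset, dt
"""
window_ratios = conn.execute(window_query).fetchall()
for row in window_ratios:
    print(row)
```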

Related

Is it possible to query MySQL to get only fields that contain duplicate/repeating strings?

What I mean is, I have table with a "list" column. The data that goes into the "list" is related to addresses, so I sometimes get repeated zip codes for one record in that field.
For example, "12345,12345,12345,12456".
I want to know if it's possible to construct a query that would find the records that have an unknown string that duplicates within the field, such that I would get the records like "12345,12345,12345,12456", but not ones like "12345,45678,09876".
I hope that makes sense.
Yes, it is possible. You need to use a numbers table to convert your delimited string into rows, then use group by to find duplicates, e.g.
CREATE TABLE T (ID INT, List VARCHAR(100));
INSERT INTO T (ID, List)
VALUES (1, '12345,12345,12345,12456'), (2, '12345,45678,09876');
SELECT
T.ID,
SUBSTRING_INDEX(SUBSTRING_INDEX(T.list, ',', n.Number), ',', -1) AS ListItem
FROM T
INNER JOIN
( SELECT 1 AS Number UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) AS n
ON CHAR_LENGTH(T.list)-CHAR_LENGTH(REPLACE(T.list, ',', ''))>=n.Number-1
GROUP BY T.ID, ListItem
HAVING COUNT(*) > 1;
If you don't have a numbers table, you can create one in a derived query, as I have above with UNION ALL.
With that being said, this is almost certainly not the right way to store your data; you should instead use a child table, e.g.
CREATE TABLE ListItems
(
MainTableId INT NOT NULL, -- Foreign key to your current table
ItemName VARCHAR(10) NOT NULL -- Or whatever data type you need
);
Then your query is much simpler:
SELECT T.ID, li.ItemName
FROM T
INNER JOIN ListItems AS li
ON li.MainTableId = T.ID
GROUP BY T.ID, li.ItemName
HAVING COUNT(*) > 1;
If you need to recreate your original format, this is easily done with GROUP_CONCAT():
SELECT T.ID,
GROUP_CONCAT(li.ItemName) AS List
FROM T
INNER JOIN ListItems AS li
ON li.MainTableId = T.ID
GROUP BY T.ID;
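As a quick way to try the child-table design, here is a sketch that runs the two queries above against an in-memory SQLite database via Python's sqlite3 module (a stand-in for MySQL; SQLite happens to provide GROUP_CONCAT as well). The sample rows mirror the two lists from the question; the main table is reduced to just its ID column for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (ID INTEGER PRIMARY KEY);
CREATE TABLE ListItems (MainTableId INTEGER NOT NULL, ItemName TEXT NOT NULL);
INSERT INTO T (ID) VALUES (1), (2);
INSERT INTO ListItems (MainTableId, ItemName) VALUES
  (1, '12345'), (1, '12345'), (1, '12345'), (1, '12456'),
  (2, '12345'), (2, '45678'), (2, '09876');
""")

# Items that occur more than once within the same parent record.
duplicates = conn.execute("""
SELECT T.ID, li.ItemName
FROM T
INNER JOIN ListItems AS li ON li.MainTableId = T.ID
GROUP BY T.ID, li.ItemName
HAVING COUNT(*) > 1
""").fetchall()

# Rebuild the comma-separated list per record (item order is not guaranteed).
lists = dict(conn.execute("""
SELECT T.ID, GROUP_CONCAT(li.ItemName) AS List
FROM T
INNER JOIN ListItems AS li ON li.MainTableId = T.ID
GROUP BY T.ID
""").fetchall())

print(duplicates)
print(lists)
```

Only record 1 is reported, since record 2 has no repeated item.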
I am still unclear what your desired result is based on your question; however, if it is simply to get all rows where there is a duplicate entry in the list column, you could do the following:
SELECT * FROM TABLE
WHERE COLUMN IN
(SELECT COLUMN FROM TABLE
GROUP BY COLUMN
HAVING COUNT(*) > 1)

With GROUP BY, I need to select a column that is not in the GROUP BY, and I don't want to apply an aggregate function to it (I want it as it is)

I have a DataFrame that has millions of records and 8 columns.
I want to group it by col1 and col2, and in the select I want name_id, max(SUM), col1, col2.
Now the problem is that name_id is neither used in the GROUP BY condition nor wrapped in an aggregate function.
Can you please suggest any method that solves my problem in SQL or PySpark?
Input DataFrame (here SUM is the number of columns that have data, and name_id is unique):
Required Output : name_id (as it is), max(SUM),Col1,Col2
I tried something like this but it's not working:
Any suggestion is welcome!
I tried the code below, which works fine in one scenario but not in others.
Working scenario: when the maximum value is duplicated in the SUM column, it works fine and returns the max name_id, which is my requirement.
When the maximum value in the SUM column is not duplicated, it returns null. In the table below, according to the logic, my output should contain name_id = 48981 and name_id = 52214, but I am only getting name_id = 52214.
It is a classic greatest-per-group problem. I would suggest using the following solution to this problem:
select d.*
from data_frame d
join (
select col_1, col_2,
max(sum) max_sum,
max(name_id) max_name_id
from data_frame
group by col_1, col_2
) t on d.col_1 = t.col_1 and
d.col_2 = t.col_2 and
d.name_id = t.max_name_id and
d.sum = t.max_sum
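One thing to watch with the query above: it keeps a row only if that row holds both the group's largest sum and the group's largest name_id, which looks like the failure described in the question. A possible variant, sketched below under assumptions, first filters to the rows with the largest sum and only then takes the largest name_id among ties. It runs on SQLite through Python's sqlite3 module as a stand-in; the column name sum_val (used instead of sum) and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_frame (name_id INTEGER, col_1 TEXT, col_2 TEXT, sum_val INTEGER);
INSERT INTO data_frame VALUES
  (99,  'a', 'x', 3),
  (100, 'a', 'x', 5),
  (101, 'a', 'x', 5),   -- tie on the max sum; largest name_id wins
  (200, 'a', 'y', 7),   -- unique max sum, but not the largest name_id
  (201, 'a', 'y', 4);
""")

# Shape of the query above: both maxima are computed independently.
original_ids = [r[0] for r in conn.execute("""
SELECT d.name_id
FROM data_frame d
JOIN (SELECT col_1, col_2, MAX(sum_val) AS max_sum, MAX(name_id) AS max_name_id
      FROM data_frame GROUP BY col_1, col_2) t
  ON d.col_1 = t.col_1 AND d.col_2 = t.col_2
 AND d.name_id = t.max_name_id AND d.sum_val = t.max_sum
ORDER BY d.name_id
""")]

# Variant: restrict to rows with the max sum first, then break ties by name_id.
fixed = conn.execute("""
SELECT d.col_1, d.col_2, d.sum_val, MAX(d.name_id) AS name_id
FROM data_frame d
JOIN (SELECT col_1, col_2, MAX(sum_val) AS max_sum
      FROM data_frame GROUP BY col_1, col_2) t
  ON d.col_1 = t.col_1 AND d.col_2 = t.col_2 AND d.sum_val = t.max_sum
GROUP BY d.col_1, d.col_2, d.sum_val
ORDER BY d.col_1, d.col_2
""").fetchall()

print(original_ids)
print(fixed)
```

With this sample data the first query drops the ('a', 'y') group entirely, while the variant returns one row per group.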
You seem to want:
select max(name_id), max(sum), col1, col2, max(col3), . . .
from t
group by col1, col2;
Your last column doesn't seem to be using max(), but you have not explained that logic.

MySQL: Is it possible to use a subquery inside a FROM clause in order to pick the table name from another table?

I was wondering if there's any way to add a subquery with a switch case to the FROM clause of my select query in order to select a table based on a condition.
For example:
select a.*
from (select (case when (table2.column = 'something')
then (table2.tablename1)
else (table2.tablename2) end) as tablename
from table2
where table2.column2 = 'blabla'
limit 1
) a
I tried to write that in many variations, and so far none of them worked.
On the most successful attempts (when I got no MySQL errors) it returned the name of the table as the result itself (for example: the value that's in table2.tablename2). I understand why it did that (because I selected everything from a select's results...), but how can I use the table name from the results in order to set the table in the main query?
Hope that makes sense...
Any idea?
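For what it's worth, a table name in a FROM clause has to be known when the statement is parsed, so the result of a subquery cannot be used as one. The usual workaround is two steps: look up the name first, then build a second statement from it (in MySQL itself this is typically done with PREPARE and EXECUTE on a string built with CONCAT). Here is a minimal sketch of the two-step idea using Python's sqlite3 module as a stand-in; the tables orders_a and orders_b, the column names col and col2, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table2 (col TEXT, col2 TEXT, tablename1 TEXT, tablename2 TEXT);
CREATE TABLE orders_a (id INTEGER, note TEXT);
CREATE TABLE orders_b (id INTEGER, note TEXT);
INSERT INTO table2 VALUES ('something', 'blabla', 'orders_a', 'orders_b');
INSERT INTO orders_a VALUES (1, 'from a');
INSERT INTO orders_b VALUES (2, 'from b');
""")

# Step 1: pick the table name with the CASE expression from the question.
table_name = conn.execute("""
SELECT CASE WHEN col = 'something' THEN tablename1 ELSE tablename2 END
FROM table2
WHERE col2 = 'blabla'
LIMIT 1
""").fetchone()[0]

# Table names cannot be bound as parameters, so check the name against the
# tables that actually exist before splicing it into the SQL text.
known_tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
if table_name not in known_tables:
    raise ValueError(f"unexpected table name: {table_name!r}")

# Step 2: run the real query against the chosen table.
rows = conn.execute(f'SELECT * FROM "{table_name}"').fetchall()
print(table_name, rows)
```

The whitelist check matters because the name is concatenated into the statement; without it, a bad value in table2 could inject arbitrary SQL.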

Concat 2 columns in a string, then get a count for each concatenation

I am trying to concatenate 2 columns, then count the number of rows, i.e. the total number of times the merged column string exists, but I don't know if it is possible. E.g.:
SELECT
CONCAT(column_1,':',column_2 ) as merged_columns,
COUNT(merged_columns)
FROM
table
GROUP BY 1
ORDER BY merged_columns DESC
Note: the colon I've inserted as a part of the string, so my result is something like 12:3. The 'count' then should tell me the number of rows that exist where column_1 =12 and column_2 = 3.
Obviously, it tells me 'merged_columns' isn't a column as it's just an alias for my CONCAT. But is this possible and if so, how?
Old question I know, but the following should work without a temp table (unless I am missing something):
SELECT
CONCAT(column_1,':',column_2 ) as merged_columns,
COUNT(CONCAT(column_1,':',column_2 ))
FROM
table
GROUP BY 1
ORDER BY merged_columns DESC
You can try creating a temp table from your concatenation select and then querying that (the #temp and SELECT ... INTO syntax below is SQL Server's):
SELECT CONCAT(column_1,':',column_2 ) AS mergedColumns
INTO #temp
FROM table;
SELECT COUNT(1) AS NumberOfRows,
mergedColumns
FROM #temp
GROUP BY mergedColumns
Hope this answer is what you are looking for.
Try this:
SELECT
CONCAT(column_1,':',column_2 ) as merged_columns,
COUNT(*)
FROM
table
GROUP BY merged_columns
ORDER BY merged_columns DESC
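To see the grouped count in action, and why the ':' separator from the question is worth keeping, here is a small sketch on SQLite via Python's sqlite3 module. SQLite's || operator is used as a stand-in for CONCAT, and the table name pairs and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pairs (column_1 INTEGER, column_2 INTEGER);
INSERT INTO pairs VALUES (12, 3), (12, 3), (1, 23), (4, 5);
""")

# Group by the merged string, with the ':' separator between the two parts.
with_sep = dict(conn.execute("""
SELECT column_1 || ':' || column_2 AS merged_columns, COUNT(*)
FROM pairs
GROUP BY merged_columns
""").fetchall())

# Same grouping without a separator: (12, 3) and (1, 23) both become '123'.
without_sep = dict(conn.execute("""
SELECT column_1 || column_2 AS merged_columns, COUNT(*)
FROM pairs
GROUP BY merged_columns
""").fetchall())

print(with_sep)
print(without_sep)
```

Without the separator, distinct pairs can collapse into the same string and the counts get merged.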

Select records using values from a previous COUNT() in MySQL

I obtain a series of values that appear only one time in my database using COUNT in MySQL, listed below:
valueName
---------
value1
value2
value3
value4
I need a script that retrieves all records in a table where valueName is not one of the values listed in the initial count, and I need these two steps to run in a single script (it doesn't matter how many parts it has).
I've got the script to obtain the list above like this:
SELECT field AS new_name FROM table GROUP BY field HAVING COUNT(field) = 1;
And it works.
The problem is that I don't know how to work with the aggregated result of the first step. Maybe using some kind of function, or a loop (though I don't think that exists in SQL).
I've tried different things, like putting a COUNT inside a WHERE clause, but none of them worked.
Please help!
Use a join:
select t.*
from table t join
(SELECT field
FROM table
GROUP BY field
HAVING COUNT(field) > 1
) filter
on t.field = filter.field;
If you have a primary key in your table and an index on table(field, pk), the following is probably faster:
select t.*
from table t
where exists (select 1
from table t2
where t2.field = t.field and t2.pk <> t.pk
);
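Here is a small sketch that runs both variants side by side to check that they return the same rows. It uses Python's sqlite3 module as a stand-in for MySQL; the table name records, the alias dup, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (pk INTEGER PRIMARY KEY, field TEXT);
INSERT INTO records (pk, field) VALUES
  (1, 'a'), (2, 'b'), (3, 'a'), (4, 'c'), (5, 'c');
""")

# Variant 1: join against the values that occur more than once.
join_rows = conn.execute("""
SELECT t.pk, t.field
FROM records t
JOIN (SELECT field
      FROM records
      GROUP BY field
      HAVING COUNT(field) > 1) dup
  ON t.field = dup.field
ORDER BY t.pk
""").fetchall()

# Variant 2: keep a row if another row with the same value exists.
exists_rows = conn.execute("""
SELECT t.pk, t.field
FROM records t
WHERE EXISTS (SELECT 1
              FROM records t2
              WHERE t2.field = t.field AND t2.pk <> t.pk)
ORDER BY t.pk
""").fetchall()

print(join_rows)
print(exists_rows)
```

The row with field 'b' appears only once, so both variants leave it out.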
Try this:
SELECT table.* FROM table
JOIN
(SELECT field FROM table GROUP BY field HAVING COUNT(field) > 1) newtable
ON
table.field = newtable.field;
This should work.