combining dataframes, and adding values of common elements

I have multiple data sets like this
data set 1
index| name | val|
1 | a | 1 |
2 | b | 0 |
3 | c | 3 |
data set 2
index| name | val|
1 | g | 4 |
2 | a | 2 |
3 | k | 3 |
4 | l | 2 |
I want to combine these data sets so that when both contain a row with the same name ("a" in this example), the combined data set has a single row for that name whose val is the sum of the two values. Here the combined row for "a" would have a val of 3 (2 + 1). The index numbers do not matter. Is there an effective way to do this in Excel itself? I'm new to querying data, but I'm trying to learn. If I can do this in pandas (which I'm trying to get familiar with) or SQL, I will do so. My data sets are of different sizes.

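Use:
# Sum per name in each frame, then add the results; fill_value=0 keeps names found in only one frame.
df3 = df1.groupby('name').sum().add(df2.groupby('name').sum(), fill_value=0).reset_index()
# Only needed when one frame's column was read in as ' val' (leading space): fold it into 'val', then drop it.
df3['val'] = df3.fillna(0)[' val']+df3.fillna(0)['val']
df3 = df3.drop([' val'], axis=1)
print(df3)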
Output:
  name  index  val
0    a    3.0  3.0
1    b    2.0  0.0
2    c    3.0  3.0
3    g    1.0  4.0
4    k    3.0  3.0
5    l    4.0  2.0

In SQL, you can try the query below:
select name, sum(val) as val
from
(select name, val from dataset1
union all
select name, val from dataset2) tmp
group by name
In pandas:
import pandas as pd
df3 = pd.concat([df1, df2], ignore_index=True)
df3 = df3.groupby('name', as_index=False)['val'].sum()
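Since the question mentions multiple data sets of different sizes, here is a minimal sketch of the concat-then-group idea applied to a list of frames. The sample values are copied from the question; the helper name combine is hypothetical and only there for illustration.

```python
import pandas as pd

# Sample frames copied from the question; more frames can be appended to the list.
df1 = pd.DataFrame({"name": ["a", "b", "c"], "val": [1, 0, 3]})
df2 = pd.DataFrame({"name": ["g", "a", "k", "l"], "val": [4, 2, 3, 2]})


def combine(frames):
    """Stack the frames and sum val per name (hypothetical helper)."""
    stacked = pd.concat(frames, ignore_index=True)
    # Selecting only 'val' avoids summing unrelated columns such as an index column.
    return stacked.groupby("name", as_index=False)["val"].sum()


combined = combine([df1, df2])
print(combined)
```

Names that appear in only one frame keep their original value, and names that appear in several frames are summed, so no fill value is needed with this approach.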

Related

JDBC, spring: How to bulk update using JdbcTemplate

I'm trying to update one column of a table with new values. An example CARD table looks like this:
| id | column_for_update |
| 1 | 10 |
| 2 | 11 |
| 3 | 12 |
| 4 | 13 |
| 5 | 14 |
...
Using JdbcTemplate (or any other suitable template), I want to update column_for_update to 1000 * n (n = 1, 2, 3, ...), so the values change from [10, 11, 12, ...] to [1000, 2000, 3000, ...].
I could load all the objects into the application and update them there, but loading that many objects causes an OutOfMemoryError. How can I update this column efficiently?

You can try changing your query to an UPDATE with a join, so the database multiplies the id column by 1000 and no rows are loaded into the application:
UPDATE card a
INNER JOIN card b ON a.id = b.id
SET a.column_for_update = (b.id * 1000)
| id | column_for_update |
| 1 | 1000 |
| 2 | 2000 |
| 3 | 3000 |
| 4 | 4000 |
| 5 | 5000 |
See Fiddle.
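The set-based idea can be tried out locally with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL and uses a plain single-table UPDATE instead of the self-join; the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card (id INTEGER PRIMARY KEY, column_for_update INTEGER)")
conn.executemany(
    "INSERT INTO card VALUES (?, ?)",
    [(1, 10), (2, 11), (3, 12), (4, 13), (5, 14)],
)

# One statement updates every row inside the database; nothing is fetched first.
cur = conn.execute("UPDATE card SET column_for_update = id * ?", (1000,))
updated = cur.rowcount
conn.commit()

rows = conn.execute("SELECT id, column_for_update FROM card ORDER BY id").fetchall()
print(updated, rows)
```

Because the multiplication happens in the database, memory use in the application does not grow with the number of rows.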

How to pull parents in a hierarchical table design

I'm working on a database with a hierarchical category tree. I'd like to write a query that returns all of the parents of a given category. For example, assume this structure and content. A parent of 0 means that the row is a root element with no parent.
ID | Name | Parent
1 | Tools | 0
2 | Drill | 1
3 | Impact | 2
4 | Cordless | 2
5 | Series X | 4
How could I write a query that would get all of the parents of Series x (ID 5)? I don't care if it's inclusive of ID 5, since I would already have that one. I'd like to see it return the below results.
ID | Name | Parent
1 | Tools | 0
2 | Drill | 1
4 | Cordless | 2
5 | Series X | 4
Bonus if there's a way to find how many generations they are at the same time. Something like:
ID | Name | Parent | Generation
1 | Tools | 0 | 0
2 | Drill | 1 | 1
4 | Cordless | 2 | 2
5 | Series X | 4 | 3
I'm really stuck on this right now. I am thinking it might need a custom SQL function?

MySQL 8.0 and later support recursive CTE queries:
WITH RECURSIVE cte AS (
SELECT * FROM MyTable WHERE id = 5
UNION ALL
SELECT MyTable.* FROM cte JOIN MyTable ON MyTable.id = cte.parent
)
SELECT * FROM cte ORDER BY id;
Getting the "Generation" when your CTE starts at the leaf of the hierarchy is tricky.
If you are using a version of MySQL older than 8.0, you may like my answer to What is the most efficient/elegant way to parse a flat table into a tree? or my presentation Recursive Query Throwdown.
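One possible way to get the generation is to count the depth while walking up from the leaf and then invert it against the maximum depth. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (both accept WITH RECURSIVE); the depth and generation column names are illustrative and not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, "Tools", 0), (2, "Drill", 1), (3, "Impact", 2),
     (4, "Cordless", 2), (5, "Series X", 4)],
)

query = """
WITH RECURSIVE cte AS (
    -- Anchor: the leaf we start from, at depth 0.
    SELECT id, name, parent, 0 AS depth FROM MyTable WHERE id = ?
    UNION ALL
    -- Step: move one level up to the parent of the previous row.
    SELECT t.id, t.name, t.parent, cte.depth + 1
    FROM cte JOIN MyTable AS t ON t.id = cte.parent
)
-- The root has the largest depth, so subtracting from the maximum gives generation 0 at the root.
SELECT id, name, parent, (SELECT MAX(depth) FROM cte) - depth AS generation
FROM cte
ORDER BY generation
"""
rows = conn.execute(query, (5,)).fetchall()
for row in rows:
    print(row)
```

The recursion stops on its own at the root because no row has id 0, so the join finds nothing for a parent of 0.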

MySQL Get grouped join from multiple ID's

I'm creating a database for compressed files that contain media files. To link the files within every .zip, I'm using a junction table, compressed_has_medium.
The Database
medium compressed_has_medium compressed
id | filename id | medium_id | compressed_id id | filename
-------------- ------------------------------ -------------
1 | file1.mp3 1 | 1 | 1 1 | compressed1.zip
2 | file2.mp3 1 | 2 | 1 2 | compressed2.zip
3 | file3.mp3 1 | 3 | 1 3 | compressed3.zip
4 | file4.mp3 1 | 4 | 1
5 | file5.mp3 1 | 5 | 1
6 | file6.mp3 1 | 6 | 1
7 | file7.mp3 1 | 1 | 2
8 | file8.mp3 1 | 2 | 2
9 | file9.mp3 1 | 3 | 2
The Problem
I need to return every .zip that contains all of the IDs that I send to MySQL.
If zip1 has files 1,2,3,4,5,6 and I ask for the zip containing files 3,4, it should return zip1.
But if zip2 has files 1,2,3 and I request files 3,4, it shouldn't be returned.
The partial solution
I've tried a lot and arrived at this solution: http://sqlfiddle.com/#!9/cb0864/37 but it's a really ugly query and probably not efficient.
SELECT c.*, GROUP_CONCAT(cm.medium_id) as media
FROM compressed as c
INNER JOIN compressed_has_medium as cm
ON cm.compressed_id = c.id
GROUP BY cm.compressed_id
HAVING (media LIKE '%,3,%' OR media LIKE '3,%' OR media LIKE '%3')
AND (media LIKE '%,4,%' OR media LIKE '4,%' OR media LIKE '%,4')
Do you know a better way to do it?
Many thanks!

Maybe this is what you are looking for.
I have changed the query in your fiddle: http://sqlfiddle.com/#!9/cb0864/62
SELECT c.*,GROUP_CONCAT(cm.medium_id) as media
FROM compressed as c
LEFT JOIN compressed_has_medium as cm
ON cm.compressed_id = c.id
WHERE cm.medium_id in (3,4)
GROUP BY cm.compressed_id
having sum(cm.medium_id in (3,4)) = 2 and -- 2 is the number of medium ids you are entering
sum(cm.medium_id not in (3,4)) = 0
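The same "contains all requested ids" check can be parameterized so the count always matches the number of ids sent. This is a minimal sketch that assumes Python's built-in sqlite3 module in place of MySQL and compares COUNT(DISTINCT ...) with the number of requested ids; the function name zips_containing is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compressed (id INTEGER PRIMARY KEY, filename TEXT);
CREATE TABLE compressed_has_medium (medium_id INTEGER, compressed_id INTEGER);
""")
conn.executemany(
    "INSERT INTO compressed VALUES (?, ?)",
    [(1, "compressed1.zip"), (2, "compressed2.zip"), (3, "compressed3.zip")],
)
# zip 1 holds media 1..6 and zip 2 holds media 1..3, as in the question.
links = [(m, 1) for m in range(1, 7)] + [(m, 2) for m in range(1, 4)]
conn.executemany("INSERT INTO compressed_has_medium VALUES (?, ?)", links)


def zips_containing(conn, medium_ids):
    """Return filenames of zips that contain every requested medium id."""
    wanted = sorted(set(medium_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT c.filename
        FROM compressed AS c
        JOIN compressed_has_medium AS cm ON cm.compressed_id = c.id
        WHERE cm.medium_id IN ({placeholders})
        GROUP BY c.id
        HAVING COUNT(DISTINCT cm.medium_id) = ?
        ORDER BY c.id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


print(zips_containing(conn, [3, 4]))
print(zips_containing(conn, [1, 2]))
```

The WHERE clause keeps only rows for the requested ids, so a zip qualifies exactly when the number of distinct matches equals the number of ids requested.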

Merge two tables together and overwrite exisiting values with SQL

I'm trying to merge two tables into a single table with SQL. My main problem is overwriting existing values, because the second table (deltaTable) contains new revision rows that have the same ID as rows in the first table (rootTable).
Example:
1) rootTable
ID | REV | NAME
1 | 0 | Part 1
2 | 0 | Part 2
3 | 0 | Part 3
4 | 0 | Part 4
5 | 0 | Part 5
2) deltaTable
ID | REV | NAME
2 | 1 | Part 2
4 | 2 | New Part 4
I want to have the following result:
ID | REV | NAME
1 | 0 | Part 1
2 | 1 | Part 2
3 | 0 | Part 3
4 | 2 | New Part 4
5 | 0 | Part 5
Can anyone help me or give me a hint on how to write the SQL code?

If I understand your question correctly, you could use an UPDATE query:
update
rootTable r inner join deltaTable d
on r.id = d.id
set
r.REV = d.REV,
r.NAME = d.NAME
Please see it working here.
As Hogan suggested, we could add a condition such as where d.rev > r.rev, which should improve performance because only rows with a newer revision are updated.
An alternative query, if you defined ID as a primary key, is:
insert into rootTable (ID, REV, NAME)
select * from deltaTable
on duplicate key update
REV=values(REV), NAME=values(NAME);
(This will update existing records and add new ones.)
Please see it here.
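For a quick local check of the insert-or-update approach, here is a sketch that assumes Python's built-in sqlite3 module rather than MySQL. SQLite spells the same idea as ON CONFLICT ... DO UPDATE (available from SQLite 3.24) instead of ON DUPLICATE KEY UPDATE, so the statement below is the SQLite dialect and not the MySQL query from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rootTable (ID INTEGER PRIMARY KEY, REV INTEGER, NAME TEXT);
CREATE TABLE deltaTable (ID INTEGER PRIMARY KEY, REV INTEGER, NAME TEXT);
""")
conn.executemany(
    "INSERT INTO rootTable VALUES (?, ?, ?)",
    [(i, 0, f"Part {i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO deltaTable VALUES (?, ?, ?)",
    [(2, 1, "Part 2"), (4, 2, "New Part 4")],
)

# Insert every delta row; when the ID already exists, overwrite REV and NAME.
# "WHERE true" is needed so SQLite can parse ON CONFLICT after a SELECT.
conn.execute("""
INSERT INTO rootTable (ID, REV, NAME)
SELECT ID, REV, NAME FROM deltaTable WHERE true
ON CONFLICT(ID) DO UPDATE SET REV = excluded.REV, NAME = excluded.NAME
""")
conn.commit()

merged = conn.execute("SELECT ID, REV, NAME FROM rootTable ORDER BY ID").fetchall()
for row in merged:
    print(row)
```

As in the MySQL version, this relies on ID being the primary key (or having a unique index), since that is what triggers the conflict.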

MySQL Table Structure - Storing a limited set of numbers

I need to store a set of numbers in a MySQL database. I need some help to determine the best table structure to use.
There are 20 numbers that will be stored in each row, along with an ID. The numbers can range from 1 - 80 and there are no repeats in this series of numbers.
Initially I created a table structure with 21 columns, an ID and 20 columns that store each individual number.
Id | Num1 | Num2 | Num3 | Num4 | Num5 | etc.. |
----------------------------------------------------------
0001 | 1 | 4 | 15 | 22 | 39 | 43 |
0002 | 3 | 5 | 22 | 43 | 55 | 58 |
0003 | 1 | 3 | 5 | 6 | 15 | 26 |
I've also thought of a table with 81 columns, an ID and 80 boolean columns that would represent each individual number.
Id | 1 | 2 | 3 | 4 | 5 | etc.. |
----------------------------------------------------------
0001 | True | False | False | True | True | False |
0002 | False | False | True | False | True | False |
0003 | True | False | True | False | True | True |
Can anyone give some advice on the pros and cons of each table structure, and say which would be easier to use when searching this table?
For example, we would need to search for every row that contains 1, 2, 5, 66, and 79.
Or every row that contains 16, 33, and 4.
Any guidance would be appreciated.

What you're looking for is called database normalization: a way to organize data that prevents duplication and anomalies (like changing one record inadvertently changing another record).
Higher-normal forms depend on the meaning of your data, which you have not told us, but to start you should avoid ordered or indeterminate columns (like Num1, Num2, ...) and split your columns into rows:
ID Num
0001 1
0001 4
0001 15
...
0002 3
0002 5
...
In general, any time you find yourself adding a bunch of columns that depend on their position you are making a mistake. SQL has many functions for aggregating, combining, sorting, and reporting on rows. Use the features of SQL to produce the results you want; don't try to make your database schema look like the final printed report.
In answer to your comment, a query that returns only IDs that have Nums 1, 4, and 15, and no other ID:
select ID from YourTable
where Num in (1, 4, 15)
group by ID
having Count(ID) = 3
If Nums can be duplicated within an ID, you will want something like having count(distinct Num) = 3. If the number of Nums to match can vary, you will have to create a temporary table of Nums to match and use having count(ID) = (select Count(Num) from TemporaryTable).
Note that SQL Server already has a master..spt_values table of integers to use in such situations; I do not know if MySQL has such a thing, but they are easy to generate if you need one.
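To make the row-per-number layout concrete, here is a minimal sketch that assumes Python's built-in sqlite3 module in place of MySQL. The sample rows are the first six numbers per ID from the question's table; the composite primary key, the CHECK constraint and the helper name ids_containing are illustrative additions.

```python
import sqlite3

# Wide rows from the question, shortened to six numbers per ID for illustration.
wide_rows = {
    "0001": [1, 4, 15, 22, 39, 43],
    "0002": [3, 5, 22, 43, 55, 58],
    "0003": [1, 3, 5, 6, 15, 26],
}

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE YourTable (
    ID  TEXT    NOT NULL,
    Num INTEGER NOT NULL CHECK (Num BETWEEN 1 AND 80),
    PRIMARY KEY (ID, Num)  -- one row per (ID, Num), so repeats are rejected
)
""")
# Turn each wide row into one (ID, Num) row per number.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(row_id, n) for row_id, nums in wide_rows.items() for n in nums],
)


def ids_containing(conn, nums):
    """Return the IDs whose set of numbers includes every value in nums."""
    wanted = sorted(set(nums))
    marks = ", ".join("?" for _ in wanted)
    sql = (
        f"SELECT ID FROM YourTable WHERE Num IN ({marks}) "
        "GROUP BY ID HAVING COUNT(*) = ? ORDER BY ID"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


print(ids_containing(conn, [1, 4, 15]))
print(ids_containing(conn, [1, 15]))
```

With the primary key on (ID, Num) a number cannot repeat within an ID, so a plain COUNT(*) is enough and the count passed to HAVING always equals the number of values searched for.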

We Keep Coding
