How to group different rows in one column? [closed] - mysql

I have a website for a project that needs to summarize all of the budget categories in one column.
For example I have a column which contains:
Categories:
Water, Electricity, Gas, Rentals, Hospital Fees, Medicine, Personal Care, Fitness
I want to select the sum of
water, electricity, gas, rentals
and name it utility bills,
and likewise the sum of
hospital fees, medicine, personal care, fitness
as healthcare.
What SQL statement should I use?
Any help will be appreciated.

You'd have some other table perhaps, or another column on this table, that maps the specific bills to a general group or category.
You would then run a query like this (if you put the category group in the main table):
SELECT categorygroup, sum(amount)
FROM bills
GROUP BY categorygroup
Or (if you have a separate table you join in)
SELECT bcg.categorygroup, sum(amount)
FROM bills b INNER JOIN billcategorygroups bcg ON b.category=bcg.category
GROUP BY bcg.categorygroup
You would then maintain the tables, either like (category in main table style):
Bills
Category, CategoryGroup, Amount
---
Electricity, Utility, 123
Water, Utility, 456
Or (separate table to map categories with groups style)
BillCategoryGroups
Category, CategoryGroup
---
Water, Utility
Electricity, Utility
Etc
Something has to map electricity -> utility, water -> utility, etc. I'd probably use a separate table because it is easy to reorganize: if you decide that Cellular is no longer Utility but Personal, changing it in the mapping table changes all the reporting. It also helps prevent typos and data-entry errors from affecting reports. If you use the single-table route and put even one Electricity bill down as Utitily, it gets its own line on the report. Adding new categories is easy with a separate table too. All of this can be done with a single table and big UPDATE statements, but we have "normalization" of data for good reasons.
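To make the mapping-table route concrete, here is a minimal sketch. It runs the join query from above against an in-memory SQLite database (standing in for MySQL) driven from Python. The amounts for Medicine and Fitness are made up for illustration, and moving Fitness to a 'Personal' group is just a stand-in for the Cellular example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE billcategorygroups (
        category      TEXT PRIMARY KEY,
        categorygroup TEXT NOT NULL
    );
    CREATE TABLE bills (
        category TEXT NOT NULL REFERENCES billcategorygroups(category),
        amount   REAL NOT NULL
    );
    INSERT INTO billcategorygroups VALUES
        ('Water', 'Utility'), ('Electricity', 'Utility'),
        ('Medicine', 'Healthcare'), ('Fitness', 'Healthcare');
    -- Illustrative amounts only.
    INSERT INTO bills VALUES
        ('Electricity', 123), ('Water', 456), ('Medicine', 40), ('Fitness', 60);
""")

REPORT_SQL = """
    SELECT bcg.categorygroup, SUM(b.amount)
    FROM bills b
    INNER JOIN billcategorygroups bcg ON b.category = bcg.category
    GROUP BY bcg.categorygroup
"""

# One total per category group.
totals = dict(conn.execute(REPORT_SQL))
print(totals)

# Regrouping a category is a one-row change in the mapping table;
# the bills themselves are untouched and the report follows automatically.
conn.execute(
    "UPDATE billcategorygroups SET categorygroup = 'Personal' WHERE category = 'Fitness'"
)
regrouped = dict(conn.execute(REPORT_SQL))
print(regrouped)
```

The second report picks up the new group without any change to the bills table, which is the reorganization benefit described above.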

You can use conditional aggregation, like this:
SELECT project,
       SUM(CASE WHEN category IN ('water','electricity','gas','rentals')
                THEN spent
                ELSE 0
           END) AS bills,
       SUM(CASE WHEN category IN ('hospital fees','medicine','personal care','fitness')
                THEN spent
                ELSE 0
           END) AS healthcare
FROM datatable
GROUP BY project;
But normalizing the data is the best option: move the categories to a separate table. See Caius Jard's answer.
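If you want to try the conditional aggregation before touching your schema, here is a small sketch that runs it in an in-memory SQLite database from Python. The datatable rows and project names are invented for the example; only the query shape comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datatable (project TEXT, category TEXT, spent REAL);
    -- Hypothetical sample rows.
    INSERT INTO datatable VALUES
        ('home',   'water',    30),
        ('home',   'gas',      20),
        ('home',   'medicine', 15),
        ('office', 'rentals',  500),
        ('office', 'fitness',  45);
""")

# Each SUM only adds rows whose category is in its list; other rows add 0.
rows = conn.execute("""
    SELECT project,
           SUM(CASE WHEN category IN ('water','electricity','gas','rentals')
                    THEN spent ELSE 0 END) AS bills,
           SUM(CASE WHEN category IN ('hospital fees','medicine','personal care','fitness')
                    THEN spent ELSE 0 END) AS healthcare
    FROM datatable
    GROUP BY project
    ORDER BY project
""").fetchall()

for project, bills, healthcare in rows:
    print(project, bills, healthcare)
```

Note that the category lists live inside the query here, so every new category means editing the SQL, which is the maintenance cost the normalized design avoids.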

Related

Access: Counting Number of Occurrences in 2 Columns [closed]

I'm working on a database for work, and I need to figure out a way for Access to count the number of projects that each employee is assigned. Projects have 1 or 2 employees assigned, and my boss needs to be able to quickly figure out how many projects each person is working on. Below is an example table:
Project    Employee 1  Employee 2
Project A  John Doe    Jane Doe
Project B  Jane Doe    Sam Smith
Project C  Jane Doe    John Doe
Project D  Sam Smith   Anna Smith
Project E  Anna Smith  John Doe
And here is the result I'm looking for:
Employee    # of Projects
John Doe    3
Jane Doe    3
Sam Smith   2
Anna Smith  2

The table you described is probably not the best way to store the data, and I think it's only making your job more difficult. The value of a relational database is that you can have data living in different tables but related through primary/foreign keys, which makes it significantly easier to pull reports like the one you described. It seems to me like this table might have previously lived in Excel, and I would spend some time now establishing relationships in Access, which will save you time and headaches later. I would suggest creating 3 separate tables: employees, projects, and project employee assignments.
The employee table should have 3 fields: EmployeeID, which should be set to AutoNumber in Design view and then selected as the primary key, and FirstName and LastName, both short text fields. This EmployeeID field will be referenced in the project employee assignments table.
The projects table should have 2 fields: ProjectID, also set to AutoNumber in Design view and selected as the primary key, and ProjectName which will also be a short text field. You can also add other fields, perhaps a text field for ProjectDescription would be helpful later on.
The Project-Employee Assignments table should have 2 fields: EmployeeID and ProjectID. If you aren't familiar with one-to-one, one-to-many, and many-to-many relationships, I would suggest looking them up. You are describing a many-to-many relationship between projects and employees: one project can have many employees, and one employee can be involved in many projects. This table exists to establish those relationships between employees and projects.
From here, go to the database tools tab and select Relationships. You'll need to establish a one-to-many relationship between the Employees table and the Assignments table on the EmployeeID field. You'll also need to establish a one-to-many relationship between the Projects table and the Project-Employee Assignments table on the ProjectID field.
Enter each relationship between projects and employees in the Assignments table. If you have a short list of projects and employees, you can do this directly in the table, but I'd suggest creating a form to do this with 2 combo boxes that each select from the lists of existing projects and employees, respectively. There are many tutorials about creating combo boxes that show informative columns, like employee name, but save the ID numbers to the table. Search "Bind Combo Box to Primary Key but display a Description field" for one example.
Finally, create a query to count projects per employee. You should include your Employees table, as well as your Project-Employee Assignments table. Select FirstName and LastName from the Employees table. Select both columns (EmployeeID and ProjectID) from the Project-Employee Assignments table. Unclick "show" for EmployeeID. Right-click anywhere in the query to get a menu of more options and click the sigma for totals. Set the total for EmployeeID, FirstName, and LastName to "Group By" and for ProjectID to "Count" then save the query. Run the query and enjoy having your totals!
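For anyone who prefers to see the three-table design as SQL rather than Design view steps, here is a rough sketch. It uses an in-memory SQLite database from Python as a stand-in for Access, so INTEGER PRIMARY KEY plays the role of AutoNumber, and the table name Assignments is my shorthand for the Project-Employee Assignments table. The rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,  -- stands in for an Access AutoNumber
        FirstName  TEXT NOT NULL,
        LastName   TEXT NOT NULL
    );
    CREATE TABLE Projects (
        ProjectID   INTEGER PRIMARY KEY,
        ProjectName TEXT NOT NULL
    );
    CREATE TABLE Assignments (
        EmployeeID INTEGER NOT NULL REFERENCES Employees(EmployeeID),
        ProjectID  INTEGER NOT NULL REFERENCES Projects(ProjectID),
        PRIMARY KEY (EmployeeID, ProjectID)
    );
""")

# IDs are assigned 1..4 and 1..5 in insertion order.
conn.executemany(
    "INSERT INTO Employees (FirstName, LastName) VALUES (?, ?)",
    [("John", "Doe"), ("Jane", "Doe"), ("Sam", "Smith"), ("Anna", "Smith")],
)
conn.executemany(
    "INSERT INTO Projects (ProjectName) VALUES (?)",
    [(f"Project {letter}",) for letter in "ABCDE"],
)
# (EmployeeID, ProjectID) pairs taken from the question's table.
conn.executemany(
    "INSERT INTO Assignments VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2), (3, 2), (2, 3), (1, 3), (3, 4), (4, 4), (4, 5), (1, 5)],
)

# Equivalent of the totals query: Group By the employee, Count the ProjectID.
counts = conn.execute("""
    SELECT e.FirstName, e.LastName, COUNT(a.ProjectID) AS ProjectCount
    FROM Employees e
    INNER JOIN Assignments a ON a.EmployeeID = e.EmployeeID
    GROUP BY e.EmployeeID, e.FirstName, e.LastName
    ORDER BY e.EmployeeID
""").fetchall()

for first, last, n in counts:
    print(first, last, n)
```

The composite primary key on Assignments is an extra I added; it keeps the same employee from being assigned to the same project twice.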

Elizabeth Ham's answer is very thorough and I recommend following her advice, but knowing that sometimes we don't have time to do a complete overhaul, here are some instructions on how to get results from the given table structure. As Elizabeth and I pointed out (in my comment), a single query could have gotten the requested data if the tables were complete and properly normalized.
Because there are multiple employee columns for which you want statistics, you need to join the given table at least twice, each time grouping on a different column and using a different alias. It is possible to do this using the visual Design View, however it is usually easier to post questions and answers on StackOverflow using SQL text, so that's what follows. Just paste the following code into the SQL view of a query, then you should be able to switch between SQL view and Design View.
Save the following SQL statements as two separate, named queries: [ProjectCount1] and [ProjectCount2]. Saving them allows you to refer to these queries multiple times in other queries (without embedding redundant subqueries):
SELECT P.[Employee 1] As Employee, Count(P.[Project]) As ProjectCount
FROM Project As P
GROUP BY P.[Employee 1];
SELECT P.[Employee 2] As Employee, Count(P.[Project]) As ProjectCount
FROM Project As P
GROUP BY P.[Employee 2];
Now create a UNION query to build a unique list of employees from the two source columns. UNION automatically keeps only distinct values (i.e. removes duplicates). (By the way, UNION ALL would return all rows from both queries, including duplicates.) Save this query as [Employees]:
SELECT Employee FROM [ProjectCount1]
UNION
SELECT Employee FROM [ProjectCount2]
Finally, combine them all into a list of unique employees with a total sum of projects for each:
SELECT
E.Employee As Employee, nz(PC1.ProjectCount, 0) + nz(PC2.ProjectCount, 0) As ProjectCount
FROM
([Employees] AS E LEFT JOIN [ProjectCount1] As PC1
ON E.[Employee] = PC1.[Employee])
LEFT JOIN [ProjectCount2] As PC2
ON E.[Employee] = PC2.Employee
ORDER BY E.[Employee]
Note 1: The function nz() converts null values to the given non-null value, in this case 0 (zero). This ensures that you'll get a valid sum even when an employee appears in only one column (and as such has a null value in the other column).
Note 2: This solution will double count an employee who is listed as both [Employee 1] and [Employee 2] for the same project in the original table. I assume that there are proper constraints to exclude that case, but if needed, one could do a self join on the second query [ProjectCount2] to exclude such double entries.
Note 3: If you do decide to follow Elizabeth's advice and you already have a lot of data in the existing structure, the above queries can also be useful in generating data for the new, normalized table structure. For instance, you could insert the unique list of employees from the above UNION query directly into a newly normalized [Employee] table.
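As a cross-check on the expected numbers, here is a more compact variant, not the saved-query approach above. It stacks the two employee columns with UNION ALL and groups once. The sketch runs in an in-memory SQLite database from Python rather than Access, so the bracketed names become double-quoted identifiers and no nz() is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Project (Project TEXT, "Employee 1" TEXT, "Employee 2" TEXT)'
)
# Sample rows from the question.
conn.executemany("INSERT INTO Project VALUES (?, ?, ?)", [
    ("Project A", "John Doe", "Jane Doe"),
    ("Project B", "Jane Doe", "Sam Smith"),
    ("Project C", "Jane Doe", "John Doe"),
    ("Project D", "Sam Smith", "Anna Smith"),
    ("Project E", "Anna Smith", "John Doe"),
])

# UNION ALL keeps every row from both columns, so COUNT(*) per name
# is the number of project slots that employee fills.
project_counts = conn.execute("""
    SELECT Employee, COUNT(*) AS ProjectCount
    FROM (
        SELECT "Employee 1" AS Employee FROM Project
        UNION ALL
        SELECT "Employee 2" AS Employee FROM Project
    ) AS stacked
    WHERE Employee IS NOT NULL   -- projects with only one employee
    GROUP BY Employee
    ORDER BY Employee
""").fetchall()

for employee, n in project_counts:
    print(employee, n)
```

Like the queries above, this counts an employee twice if the same name sits in both columns of one row, so the caveat in Note 2 still applies.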

what are some disadvantages of having too many null values in one table? [closed]

I am designing a database to store historical data for my company. Simply speaking, we want to store charges (fee) for products over the past ten years. The charges usually change once or twice a year and we want to store all the changes.
I have generated an idea based on the article from the following link:
https://jiripik.com/2017/02/04/optimal-database-architecture-super-fast-access-historical-currency-market-data-mysql/
I am thinking of creating a table for every single product, with the date as primary key. Then I can prepopulate the table with null values from January 1, 2008 to December 31, 2018, so I can simply update the charges instead of inserting new records.
This method is based purely on the article. However, in the article it is used for historical currency rates, which change far more often than my data. So if I follow this method, most of the values in my tables will be null.
Can anyone tell me the disadvantages of applying this method to my data? And is there a better way to design the database?
Note: There is no rule for how many times a product changes its fees in a year, and the changes don't fall on a fixed date every year.

Many Null values aren't a problem per se, but that design is terrible.
You just need two tables,
Products (ID, Name) and
History (ID, ProductID, StartDate, Charge)
and if you want to query "what was the charge for product X on date Y?", you simply do
SELECT TOP 1 Charge
FROM History
WHERE ProductID = X
AND StartDate <= Y
ORDER BY StartDate DESC
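which returns the most recent entry on or before date Y. (TOP 1 is SQL Server/Access syntax; in MySQL, drop it and append LIMIT 1 to the query instead.)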
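Here is a minimal sketch of that lookup, assuming an in-memory SQLite database driven from Python (so it uses LIMIT 1 rather than TOP 1). The product, dates and charges are invented, and the helper name charge_on is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE History (
        ID        INTEGER PRIMARY KEY,
        ProductID INTEGER NOT NULL REFERENCES Products(ID),
        StartDate TEXT NOT NULL,   -- ISO yyyy-mm-dd text compares in date order
        Charge    REAL NOT NULL
    );
    -- Hypothetical product with three fee changes over ten years.
    INSERT INTO Products VALUES (1, 'Widget');
    INSERT INTO History (ProductID, StartDate, Charge) VALUES
        (1, '2008-01-01', 10.0),
        (1, '2012-07-15', 12.5),
        (1, '2017-03-01', 15.0);
""")


def charge_on(product_id, date):
    """Return the charge in effect on `date`, or None if none had started yet."""
    row = conn.execute(
        """
        SELECT Charge
        FROM History
        WHERE ProductID = ? AND StartDate <= ?
        ORDER BY StartDate DESC
        LIMIT 1
        """,
        (product_id, date),
    ).fetchone()
    return row[0] if row else None


print(charge_on(1, "2015-06-30"))
print(charge_on(1, "2007-12-31"))
```

Only the dates on which a fee actually changed are stored, so ten years of history for this product is three rows and no NULLs.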

Storing duplicate fields: good or bad [closed]

Let's say a user has posts table like this:
Post with id=1 is the first post that a user has posted. Post with an id=2 – is the edit that was made to the post, with id=3 – latest current version of the post.
post_param_a cannot be changed throughout versions, as well as user_id – they always stay the same since the first version. So we could store it like this:
So the question is: would it be better to store it the second way, with no duplication? This way, to get a current version of user's post we'd have to join the first version and check its user_id all the time. Or is it okay to store duplicate fields in this case?
P.S. We are asking because we want to avoid duplication and accidental changes to values that must not change between versions, so we want to store them all in one place.

Take the entity Post and look at the simple tuple:
ID  User_ID  Post_Param_A  Comment
1   69       foo           This is a post
This is perfectly normalized. However, the post may undergo editing and you want to track the changes made. So you add another field to track the changes. Instead of an incremental value, however, it would make more sense to add a datetime field.
ID  EffDate       User_ID  Post_Param_A  Comment
1   1/1/16 12:00  69       foo           This is a post
This has two advantages: 1) if you track the changes, you will want to know anyway when this version was saved and 2) you don't have to find the largest incremental value for the post to find out what value to save with each new version. Just save the current date and time.
However, with either an incremental value or a date, there is a problem. In the simple row, each field has a functional dependency on the PK. In the versioned row, User_ID and Post_Param_A keep their dependency on the PK, but Comment now depends on the PK and EffDate.
The tuple is no longer in 2NF.
So the solution is a simple matter of normalizing it:
ID  User_ID  Post_Param_A
1   69       foo
ID  EffDate       Comment
1   1/1/16 12:00  This is a post
1   1/1/17 12:00  An edit was made
1   1/1/17 15:00  The last and current version (so far)
with (ID, EffDate) the composite PK in the new table.
The query to read the latest post is a bit complicated:
select p.ID, v.EffDate, p.User_ID, p.Post_Param_A, v.Comment
from Posts p
join PostVersions v
    on v.ID = p.ID
    and v.EffDate = (
        select Max( v1.EffDate )
        from PostVersions v1
        where v1.ID = p.ID
            and v1.EffDate <= today )
    and p.ID = 1;
This is not really as complicated as it looks and it is impressively fast. The really neat feature is -- if you replace "today" with, say, 1/1/17 13:00, the result will be the second version. So you can query the present or the past using the same query.
Another neat feature is achieved by creating a view from the "today" query with the last line ("and p.ID = 1") removed. This view will expose the latest version of all posts. Create triggers on the view and this allows the apps that are only interested in the current version to do their work without consideration of the underlying structure.
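To see the "query the present or the past" behaviour in action, here is a small sketch of the same two tables in an in-memory SQLite database, driven from Python. The dates are the ones from the example rows, rewritten as ISO text so they compare correctly as strings. The :as_of parameter takes the place of "today", and the helper name post_as_of is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (
        ID           INTEGER PRIMARY KEY,
        User_ID      INTEGER NOT NULL,
        Post_Param_A TEXT NOT NULL
    );
    CREATE TABLE PostVersions (
        ID      INTEGER NOT NULL REFERENCES Posts(ID),
        EffDate TEXT NOT NULL,
        Comment TEXT NOT NULL,
        PRIMARY KEY (ID, EffDate)      -- composite PK, as described above
    );
    INSERT INTO Posts VALUES (1, 69, 'foo');
    INSERT INTO PostVersions VALUES
        (1, '2016-01-01 12:00', 'This is a post'),
        (1, '2017-01-01 12:00', 'An edit was made'),
        (1, '2017-01-01 15:00', 'The last and current version (so far)');
""")

AS_OF_SQL = """
    SELECT p.ID, v.EffDate, p.User_ID, p.Post_Param_A, v.Comment
    FROM Posts p
    JOIN PostVersions v
      ON v.ID = p.ID
     AND v.EffDate = (SELECT MAX(v1.EffDate)
                      FROM PostVersions v1
                      WHERE v1.ID = p.ID
                        AND v1.EffDate <= :as_of)
    WHERE p.ID = :post_id
"""


def post_as_of(post_id, as_of):
    """Return the version of the post that was current at `as_of`, or None."""
    return conn.execute(AS_OF_SQL, {"post_id": post_id, "as_of": as_of}).fetchone()


print(post_as_of(1, "2017-01-01 13:00"))  # a moment between the two edits
print(post_as_of(1, "2017-01-01 15:00"))  # the moment of the last edit
```

Passing an earlier timestamp returns the version that was current at that moment, with the unchanging User_ID and Post_Param_A always coming from the single Posts row.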

You could have a separate table where you store the post_param_a for each post_id, then you wouldn't need to have NULL values or duplicate values.

The 1st solution is better because user_id is stored alongside the post_id, which avoids ambiguous interpretations.
This way, to get a current version of user's post we'd have to join the first version and check its user_id all the time.
Have you thought about adding a timestamp field, so that you can always get the latest version of a post?
In the 2nd solution, NULL could become ambiguous as the data grows. Querying also gets harder: every SQL statement has to account for the NULL cases and their specific meanings.
A 3rd solution would be to normalize your table into two separate ones, e.g. post and post_history. As you mentioned in the question, post_param_a and user_id cannot change between versions; they always stay the same as in the first version. In this case:
In table post, store the information about the post that is permanent (won't change): id, param_a, user_id, created_at ...
In table post_history, store the information that belongs to each version / modification: version_id, comment, modified_at ... You can then add a FK constraint on the second table so that post_history.post_id references post.id.
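A rough sketch of the post / post_history split with the FK constraint, using an in-memory SQLite database from Python. The column types and sample values are assumptions. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas MySQL's InnoDB enforces them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript("""
    CREATE TABLE post (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        param_a    TEXT NOT NULL,
        created_at TEXT NOT NULL
    );
    CREATE TABLE post_history (
        version_id  INTEGER PRIMARY KEY,
        post_id     INTEGER NOT NULL REFERENCES post(id),
        comment     TEXT NOT NULL,
        modified_at TEXT NOT NULL
    );
""")

# Permanent fields are stored once; each edit adds one post_history row.
conn.execute("INSERT INTO post VALUES (1, 69, 'foo', '2016-01-01 12:00')")
conn.executemany(
    "INSERT INTO post_history (post_id, comment, modified_at) VALUES (?, ?, ?)",
    [
        (1, "This is a post", "2016-01-01 12:00"),
        (1, "An edit was made", "2017-01-01 12:00"),
    ],
)

# The FK constraint rejects a version that points at a post that does not exist.
try:
    conn.execute(
        "INSERT INTO post_history (post_id, comment, modified_at) "
        "VALUES (999, 'orphan', '2017-02-01 09:00')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Latest version of post 1: newest modified_at wins.
latest = conn.execute("""
    SELECT p.id, p.user_id, p.param_a, h.comment
    FROM post p
    JOIN post_history h ON h.post_id = p.id
    WHERE p.id = 1
    ORDER BY h.modified_at DESC
    LIMIT 1
""").fetchone()

print(orphan_rejected, latest)
```

Because user_id and param_a exist in exactly one row, an edit cannot accidentally change them, which is the concern raised in the question.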

Database schema for event ticketing system [closed]

I'm trying to figure out a good way to build a database for events. I have a client that has a list of customer names and promo codes. A customer on the list can go to a landing page, fill out the promo code and choose an event from a drop down field they would like to attend. He currently has 4 events ready to go.
In the database, should I create 4 tables, one for each event with its customers, or separate the customers from the event tables (i.e., a customer table and 4 event tables)? There might be more events in the future, so scalable options would be preferred.
Also, each customer is only allowed a maximum of 4 tickets, and they can only use the promo code once.
Thanks!

Jay is correct that a complete answer would be quite long, but I'll offer a few starting pointers nonetheless as it sounds like you're quite new to database architecture.
As a general principle, you should never build a schema that involves adding/removing tables at run time. The relationship you're looking for between customers and events is many-to-many, which in MySQL would use a junction table. An example schema would look like this:
customer
    customer_id (primary key)
    email, name, etc.
event
    event_id (primary key)
    name, time, etc.
ticket
    ticket_id (primary key)
    customer_id (index)
    event_id (index)
    date_purchased, etc.
Rules like "each customer is only allowed 4 tickets" should be implemented at a code level rather than a schema level since that is subject to change and your schema should be flexible enough to accommodate that change, tempting as it may be to have four columns in the customers table for the four tickets.
To get the events that customer ID 1 is attending:
SELECT DISTINCT event.*
FROM ticket
LEFT JOIN event ON ticket.event_id = event.event_id
WHERE ticket.customer_id = 1
To get the customers attending event ID 1:
SELECT DISTINCT customer.*
FROM ticket
LEFT JOIN customer ON ticket.customer_id = customer.customer_id
WHERE ticket.event_id = 1
A common format for junction tables is to combine the two table names, as in event_customer, but in this case calling it ticket makes more sense, since you might be including additional information about the ticket purchase in that table.
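Here is a minimal sketch of the schema above together with a code-level ticket limit, using an in-memory SQLite database from Python in place of MySQL. The customer and event rows, the buy_ticket helper and the MAX_TICKETS_PER_CUSTOMER constant are all hypothetical names for illustration.

```python
import sqlite3

MAX_TICKETS_PER_CUSTOMER = 4  # business rule kept in code, not in the schema

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE event    (event_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE ticket (
        ticket_id   INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        event_id    INTEGER NOT NULL REFERENCES event(event_id)
    );
    CREATE INDEX ticket_customer_idx ON ticket(customer_id);
    CREATE INDEX ticket_event_idx    ON ticket(event_id);
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO event VALUES (1, 'Launch party'), (2, 'Workshop');
""")


def buy_ticket(customer_id, event_id):
    """Add a ticket unless the customer already holds the maximum; True on success."""
    (held,) = conn.execute(
        "SELECT COUNT(*) FROM ticket WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if held >= MAX_TICKETS_PER_CUSTOMER:
        return False
    conn.execute(
        "INSERT INTO ticket (customer_id, event_id) VALUES (?, ?)",
        (customer_id, event_id),
    )
    return True


# Five purchase attempts alternating between the two events.
results = [buy_ticket(1, 1 + i % 2) for i in range(5)]

# Events customer 1 is attending (one row per event, however many tickets).
events = [name for (name,) in conn.execute("""
    SELECT DISTINCT event.name
    FROM ticket
    JOIN event ON ticket.event_id = event.event_id
    WHERE ticket.customer_id = 1
    ORDER BY event.name
""")]

print(results, events)
```

Adding a fifth event later is just another row in event. One caveat: the count-then-insert check is not safe against two simultaneous purchases by the same customer, so a real application would wrap it in a transaction or lock.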

What is a good database design (schema) for a attendance database? [closed]

I'm trying to make an application for keeping attendance for a relative's martial arts studio. I've tried looking around for similar examples, but I couldn't find any that were specific or clear enough for this kind of application.
At the moment, I am using two tables, one for keeping student information, students(id, first_name, last_name, email, ...), and another table for attendance by the weeks in a year, attendance(id, week_1, week_2, week_3, ...). I am trying to change it to keep attendance by days instead, but can't seem to think of a good approach since I'm still kind of new to MySQL.
I am trying to make it so it is possible to see the attendance in a calendar-like format. It probably would be bad to just make columns for 365 days... and same with having a table for each month. I've noticed some similar applications just keep track of the dates, and store that in the database. Would this approach be better? Or, is there some other better approach to designing this kind of database? Thanks in advance.

In martial arts, instructors are students too -- so the Instructor table is sub-typed to the Student table. All common fields are in the Student table and only columns specific to instructors are in the Instructor table.
The Art table has list of arts that the school offers (judo, karate ...).
The school may have several rooms, these are listed in the Room table.
ClassSchedule describes the published schedule of classes that the school offers.
Attendance is captured in the Attendance table.
One row in the Calendar table is one calendar day (date). The table has date-properties like DayOfWeek, MonthName, MonthNumberInYear etc.
One row in the TimeTable is one minute of a day, like 7:05.
Calendar and TimeTable allow for easy attendance reporting by date/time, for example
-- Attendance of judo morning classes
-- for the first three months of the year 2010
-- by day of a week (Sun, Mon, Tue, ..)
select
DayOfWeek
, count(1) as Students
from ClassSchedule as a
join Calendar as b on b.CalendarId = a.CalendarId
join TimeTable as c on c.TimeID = a.StartTimeId
join Attendance as d on d.ClassId = a.ClassID
join Art as e on e.ArtId = a.ArtID
where ArtName = 'judo'
and Year = 2010
and MonthNumberInYear between 1 and 3
and PartOfDay = 'morning'
group by DayOfWeek ;
Hope this gets you started.

Attendance should have id, student_id and date. This is all you need to record when students attended. If you want to know how many students attended on a specific date (and who), you run a query for that specific date or date range.
You could also create a lesson table, in which case the attendance table would be
id, student_id and lesson_id
and the lesson table could be
id, held_on_date
Unless you need to add more columns to the lesson table, though, I think that is overkill.
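A minimal sketch of the simple version (one attendance row per student per day), using an in-memory SQLite database from Python in place of MySQL. The student rows are invented. I named the date column attended_on to keep it clear of the DATE keyword, and the UNIQUE constraint is my own addition to stop a student being marked twice for the same day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        id         INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE attendance (
        id          INTEGER PRIMARY KEY,
        student_id  INTEGER NOT NULL REFERENCES students(id),
        attended_on TEXT NOT NULL,          -- ISO yyyy-mm-dd
        UNIQUE (student_id, attended_on)    -- assumption: one mark per student per day
    );
    -- Hypothetical students and attendance marks.
    INSERT INTO students VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim');
    INSERT INTO attendance (student_id, attended_on) VALUES
        (1, '2010-01-04'), (2, '2010-01-04'), (1, '2010-01-06');
""")

# How many students attended on each day of a date range (calendar-style view).
per_day = conn.execute("""
    SELECT attended_on, COUNT(*) AS students
    FROM attendance
    WHERE attended_on BETWEEN '2010-01-01' AND '2010-01-31'
    GROUP BY attended_on
    ORDER BY attended_on
""").fetchall()

# Who attended on one specific date.
who = [first for (first,) in conn.execute("""
    SELECT s.first_name
    FROM attendance a
    JOIN students s ON s.id = a.student_id
    WHERE a.attended_on = ?
    ORDER BY s.first_name
""", ("2010-01-04",))]

print(per_day, who)
```

Days with no attendance simply have no rows, so there is no need for 365 columns or a table per month. A calendar view only has to fill in the missing dates on the application side.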

Step back a little, you have two types of entities:
a person [like a student]
events [like a class]
Think of any entity as something that exists in the real world.
And one relationship
attendance
A relationship is just that, an association between entities, and often has time data associated with it or other types of measures.
So without thinking too hard, you should have 3 database tables:
attendee [E]
class [E]
attendance [R]
E = entity, R = relationship
If you find yourself duplicating data in one of the entity tables, this is a good sign that this entity requires a "sub-model". In some places this is called "don't repeat yourself" or DRY and for entity relational modeling, this is called "data normalization".
Remember, there's overhead in both time and code to build a more elaborate schema. So consider starting simple [3 tables] and refactoring away redundancy.
