How to represent a large dataset in simplified form - MySQL

We have a table, attendance_details, in MySQL that holds attendance details for five courses. The table has millions of records.
The table structure is:
training_date - date on which the training happened
student_id - ID of the student
course1 - number of hours attended
course2 - number of hours attended
course3 - number of hours attended
course4 - number of hours attended
course5 - number of hours attended
I need to expose this information to another app, which will query the attendance details.
The query pattern is always: "Did a given student attend course4 on every day between date1 and date2?"
If the student attended at least one hour on a day, that day is considered attended.
The result can be one of:
attended every day between date1 and date2
attended between date1 and date2, but was absent on some days
did not attend at all between date1 and date2
I need to provide the data in a simplified form in a new table, so that the other app can get the details by querying it.
My objectives are:
reduce the number of records in the new table substantially, so that queries run fast; fast querying is the main objective
the data model should be easy to query
Constraints:
I do not want to expose attendance_details, because it is huge and highly transactional.
It is not possible to change the structure of attendance_details.
Below is what I have tried.
A table representing the first and last attendance dates and the first and last absence dates:
+------------+------------------+-----------------+--------------+-------------+
| student_id | first_attendance | last_attendance | first_absent | last_absent |
+------------+------------------+-----------------+--------------+-------------+
| 123        | 2015-01-01       | 2015-01-30      | 2015-01-15   | 2015-01-21  |
+------------+------------------+-----------------+--------------+-------------+
In the above design the dates are specific to a course, so I need 4 courses x 4 columns, 16 columns in total, and this will grow if I add more courses.
I also tried representing each month's records as a bitmap, but that makes the programming logic complex.

I'd say you're close.
Let's go over the relationships.
A student takes 1 or more courses.
A student attends a course on every day between 2 dates.
A student attends a course on some days between 2 dates.
A student does not attend a course at all between 2 dates.
So let's look at the object tables first. I'm assuming there's a Student table and a Course table already in the database.
The first table is a junction table of Student and Course.
StudentCourse
-------------
Student ID
Course ID
Course Started Date
Course Ended Date
The primary key is (Student ID, Course ID). This allows us to query on the courses that a student is taking. We also have a unique index on (Course ID, Student ID). This allows us to query on the students attending a course.
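A minimal sketch of the StudentCourse junction table and its two access paths, using Python's built-in sqlite3 module as a stand-in for MySQL. The snake_case column names and the sample rows (including course 457 and student 124) are hypothetical.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; table/column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_course (
    student_id   INTEGER NOT NULL,
    course_id    INTEGER NOT NULL,
    course_start TEXT,                    -- ISO date, e.g. '2015-01-01'
    course_end   TEXT,
    PRIMARY KEY (student_id, course_id)   -- access path: courses of a student
);
-- Second access path: students attending a course.
CREATE UNIQUE INDEX ux_course_student ON student_course (course_id, student_id);
""")

# Hypothetical sample enrolments.
conn.executemany(
    "INSERT INTO student_course VALUES (?, ?, ?, ?)",
    [(123, 456, "2015-01-01", "2015-01-31"),
     (123, 457, "2015-02-01", "2015-02-28"),
     (124, 456, "2015-01-01", "2015-01-31")],
)

def courses_of(student_id):
    """Courses a student takes (served by the primary key)."""
    rows = conn.execute(
        "SELECT course_id FROM student_course WHERE student_id = ? ORDER BY course_id",
        (student_id,))
    return [course_id for (course_id,) in rows]

def students_in(course_id):
    """Students attending a course (served by the unique index)."""
    rows = conn.execute(
        "SELECT student_id FROM student_course WHERE course_id = ? ORDER BY student_id",
        (course_id,))
    return [student_id for (student_id,) in rows]

print(courses_of(123))   # [456, 457]
print(students_in(456))  # [123, 124]

# The primary key rejects enrolling the same student in the same course twice.
try:
    conn.execute("INSERT INTO student_course VALUES (123, 456, NULL, NULL)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

In MySQL/InnoDB the primary key is the clustered index, so the (Course ID, Student ID) index is what lets the "students in a course" lookup avoid scanning the table.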
Now that we've established the start and end dates of the course, we can keep a record of each student's attendance.
We need one more table to complete the relationships: Attendance. Here's what it looks like.
Attendance
----------
Student ID
Course ID
Start Date
End Date
Is Present
This table has a primary key of (Student ID, Course ID, Start Date). There's also a unique index on (Course ID, Student ID, Start Date).
The idea here is that for each student, you create enough rows to describe a student's presence or absence on a particular range of dates. If you want to make this easier, remove the End Date from the table, and you'll have a row for each date of the class.
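One way to derive these range rows from the daily data is a run-length pass over the calendar. The sketch below assumes the per-day hours for one student and one course (e.g. the course4 column) have already been read from attendance_details into a dict, treats a day with no entry as absent, and treats every calendar day in the span as a class day. `to_ranges` is a hypothetical helper, not part of any library.

```python
from datetime import date, timedelta

def to_ranges(hours_by_day, first, last):
    """Collapse per-day hours into (start, end, is_present) runs.

    A day counts as present when at least one hour was attended (the
    question's rule); a day missing from hours_by_day counts as absent.
    """
    ranges = []
    day = first
    while day <= last:
        present = hours_by_day.get(day, 0) >= 1
        if ranges and ranges[-1][2] == present:
            # Same status as the previous day: extend the current run.
            ranges[-1] = (ranges[-1][0], day, present)
        else:
            # Status changed (or this is the first day): start a new run.
            ranges.append((day, day, present))
        day += timedelta(days=1)
    return ranges

# Hypothetical January 2015 data: 2 hours a day, absent on the 15th to 21st.
hours = {date(2015, 1, d): (0 if 15 <= d <= 21 else 2) for d in range(1, 32)}
for row in to_ranges(hours, date(2015, 1, 1), date(2015, 1, 31)):
    print(row)
```

Each run becomes one Attendance row, so 31 daily values for this student and course shrink to three rows. Weekends or other non-class days would need to be skipped or handled separately in a real load.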
I'm not sure what your example row is telling me, but here's what I mean.
Student ID  Course ID  Start Date  End Date    Is Present
123         456        2015-01-01  2015-01-14  true
123         456        2015-01-15  2015-01-21  false
123         456        2015-01-22  2015-01-31  true
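Since all of the dates are covered, you can get the rows you want with a range-overlap condition such as WHERE `Start Date` <= date2 AND `End Date` >= date1. (A plain BETWEEN on Start Date alone would miss a range that began before date1 and is still running.)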
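A minimal sketch of the three-way answer the question asks for, with sqlite3 standing in for MySQL and hypothetical snake_case names. It assumes, as the answer does, that the ranges cover every day of the requested window, so the window can be classified just by whether any overlapping range is marked present and whether any is marked absent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance (
    student_id INTEGER NOT NULL,
    course_id  INTEGER NOT NULL,
    start_date TEXT    NOT NULL,   -- ISO 'YYYY-MM-DD', so text order = date order
    end_date   TEXT    NOT NULL,
    is_present INTEGER NOT NULL,   -- 1 = present, 0 = absent
    PRIMARY KEY (student_id, course_id, start_date)
);
CREATE UNIQUE INDEX ux_course_student_start
    ON attendance (course_id, student_id, start_date);
""")
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?, ?, ?)",
    [(123, 456, "2015-01-01", "2015-01-14", 1),
     (123, 456, "2015-01-15", "2015-01-21", 0),
     (123, 456, "2015-01-22", "2015-01-31", 1)],
)

# A range overlaps [date1, date2] when it starts on/before date2 and ends on/after date1.
# MAX over no rows yields NULL, hence COALESCE.
OVERLAP_SQL = """
SELECT COALESCE(MAX(is_present = 1), 0) AS any_present,
       COALESCE(MAX(is_present = 0), 0) AS any_absent
FROM attendance
WHERE student_id = ? AND course_id = ?
  AND start_date <= ? AND end_date >= ?
"""

def attendance_status(student_id, course_id, date1, date2):
    any_present, any_absent = conn.execute(
        OVERLAP_SQL, (student_id, course_id, date2, date1)).fetchone()
    if any_present and not any_absent:
        return "attended every day"
    if any_present:
        return "attended some days"
    return "did not attend"

print(attendance_status(123, 456, "2015-01-02", "2015-01-10"))  # attended every day
print(attendance_status(123, 456, "2015-01-10", "2015-01-25"))  # attended some days
print(attendance_status(123, 456, "2015-01-16", "2015-01-20"))  # did not attend
```

The primary key (student_id, course_id, start_date) should let the database seek straight to one student's rows for one course, which is the only access path this query pattern needs.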

Is this MySQL database logic suitable for querying?

I am designing a database, and I would like to know:
Can I answer these questions with queries, e.g. how much skill employees earned from these trainings?
Is this a good structure for it?
how much money is spent per department
how much skill is earned per employee
how much skill is earned per department
id  session_name  Skill          impact  sugg dept  function  training_value  training no
1   PHP           Software       3       Sales      2         100usd          1
2   PHP           Software       3       Finance    2         100usd          1
3   PHP           communication  2       Sales      2         100usd          1
4   PHP           communication  2       Finance    2         100usd          1
5   ASP           Software       4       Sales      2         200usd          2
6   ASP           Software       4       Finance    2         200usd          2
7   ASP           database       1       Sales      2         200usd          2
8   ASP           database       1       Finance    2         200usd          2
attended training table
id student_id training_no
1 1 1
1 1 2
student table
id name department
1 John 1
2 Mary 2
department table
id name
1 sales
2 finance
In the end I need to find the skills for each student:
john
software 7
communication 2
database 1
total spent
john 300 usd
total spent by department
sales 300 usd

Your schema looks OK to me.
You should, however, think about entities and relationships.
Your entities seem to be trainings, people, and departments.
You have a many:many relationship for people:trainings. That's good.
You have a one:many relationship for departments:people. That's also good.
It looks like you want some kind of relationship for trainings:departments. I'm guessing here, but you have a sugg dept column in your trainings table. Is that supposed to have a direct relationship to your departments table?
Do you actually need an extra entity called "attendance", rather than just a many-to-many people:trainings relationship? Do you want to record when a person took a training? How much that particular attendance cost? What marks they received if there was a quiz?
In that case, you'll want relationships where each person has zero or more attendances, each attendance has exactly one training, and each training has zero or more attendances.
My point: do the hard work of thinking through your entities and relationships, and the result will be a good design for your tables.
If I may put it another way: What part of the real world are you trying to capture in your database? What's valuable in the real world that you want your database to hold? In your application ...
Students are people. They are, umm, inherently valuable and persistent entities.
Trainings represent the labor and cost of creating them and presenting them.
Attendances represent the effort of students.
Departments probably pay the bill for attendances. They certainly represent power centers in your application.
What other items of value exist in this corner of the real world? Teachers? Managers? Venues (classrooms)? Equipment? Customers?
My point is, figure out your entities -- the items of value -- and the relationships between them. Then write your table definitions.
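For the three reports in the question, here is a sketch against the question's own tables, with sqlite3 standing in for MySQL. Assumptions: training_value is stored as a plain number of USD rather than the string "100usd", the function column is omitted, and the table names are snake_case versions of the question's. Because each training row is repeated once per suggested department, the queries first collapse the training table with SELECT DISTINCT; a plain SUM over the raw rows would double-count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT,
                      department INTEGER REFERENCES department (id));
CREATE TABLE training (id INTEGER PRIMARY KEY, session_name TEXT, skill TEXT,
                       impact INTEGER, sugg_dept TEXT,
                       training_value INTEGER,  -- USD, assumed numeric
                       training_no INTEGER);
CREATE TABLE attended_training (student_id INTEGER, training_no INTEGER);
INSERT INTO department VALUES (1, 'sales'), (2, 'finance');
INSERT INTO student VALUES (1, 'John', 1), (2, 'Mary', 2);
INSERT INTO attended_training VALUES (1, 1), (1, 2);
""")
conn.executemany("INSERT INTO training VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "PHP", "Software", 3, "Sales", 100, 1),
    (2, "PHP", "Software", 3, "Finance", 100, 1),
    (3, "PHP", "communication", 2, "Sales", 100, 1),
    (4, "PHP", "communication", 2, "Finance", 100, 1),
    (5, "ASP", "Software", 4, "Sales", 200, 2),
    (6, "ASP", "Software", 4, "Finance", 200, 2),
    (7, "ASP", "database", 1, "Sales", 200, 2),
    (8, "ASP", "database", 1, "Finance", 200, 2),
])

# One row per (training, skill) removes the per-department duplicates.
SKILLS_PER_STUDENT = """
SELECT s.name, t.skill, SUM(t.impact)
FROM student s
JOIN attended_training a ON a.student_id = s.id
JOIN (SELECT DISTINCT training_no, skill, impact FROM training) t
  ON t.training_no = a.training_no
GROUP BY s.name, t.skill
ORDER BY s.name, t.skill
"""

# One row per training, so its cost is counted once per attendance.
SPENT_PER_STUDENT = """
SELECT s.name, SUM(t.training_value)
FROM student s
JOIN attended_training a ON a.student_id = s.id
JOIN (SELECT DISTINCT training_no, training_value FROM training) t
  ON t.training_no = a.training_no
GROUP BY s.name
"""

SPENT_PER_DEPARTMENT = """
SELECT d.name, SUM(t.training_value)
FROM department d
JOIN student s ON s.department = d.id
JOIN attended_training a ON a.student_id = s.id
JOIN (SELECT DISTINCT training_no, training_value FROM training) t
  ON t.training_no = a.training_no
GROUP BY d.name
"""

for query in (SKILLS_PER_STUDENT, SPENT_PER_STUDENT, SPENT_PER_DEPARTMENT):
    print(conn.execute(query).fetchall())
```

Skill per department follows the same pattern as SKILLS_PER_STUDENT, grouped by d.name instead of s.name. The DISTINCT step relies on each training having a single value and a single impact per skill, as in the sample data.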

Optimizing query response time

I have a live, filterable report in my web app which is querying a list of loans and loan payments in MySQL. The goal is to display each loan in a table row and then a list of its loan payments in table columns that each represent a sum of loan payments for that day. We also allow the user to select a date range and aggregation level (daily / weekly / monthly). If the user chose Sept 1-3 with daily aggregation, the results would look like this:
Loan ID | Sept 1 | Sept 2  | Sept 3
-----------------------------------
0001    | $350   | $239.45 | $112
0002    | $100   | $0      | $75
The 2 database tables are Loan and Payment where Payment stores the Loan ID, date, and amount of each payment.
When we run this query on a 60-day range, the response time is ~45 sec. We then tried creating our own pre-aggregated table with 366 columns per year (Loan ID plus one column per day holding the sum of that day's payments). This increased the response time to over 60 sec, and that does not even include weekly or monthly aggregation, which is slower still.
How can we speed this up? We're ideally looking for 10-15 sec response time, and I have tried every caching / indexing technique I can find without success.

You should discuss with the business what the business requirements are, or what the practical use of a table with 60 columns would be.
The result table looks fine for the Sept 1-3 example, but for a 60-day range? Who would look at this table? Would it be better to group by weeks or months?
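If the business accepts aggregated buckets, one common alternative to a 366-column table is to have the database return one row per (loan, bucket) and pivot in the application. A sketch with sqlite3 standing in for MySQL: strftime() plays the role DATE_FORMAT() would play in MySQL, the payment column names and the index are hypothetical, amounts are stored in cents, and the year 2015 is assumed for the question's Sept 1-3 sample. It shows the shape of the query, not a measured speed-up.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (loan_id TEXT, paid_on TEXT, amount_cents INTEGER)")
# Index leading with the date so the range filter can use it (a judgment call).
conn.execute("CREATE INDEX ix_payment_paid_on ON payment (paid_on, loan_id, amount_cents)")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)", [
    ("0001", "2015-09-01", 35000),
    ("0001", "2015-09-02", 23945),
    ("0001", "2015-09-03", 11200),
    ("0002", "2015-09-01", 10000),
    ("0002", "2015-09-03", 7500),
])

# Bucket label per aggregation level; %W is the Monday-based week of the year.
BUCKET_FORMATS = {"daily": "%Y-%m-%d", "weekly": "%Y-W%W", "monthly": "%Y-%m"}

GRID_SQL = """
SELECT loan_id, strftime(?, paid_on) AS bucket, SUM(amount_cents)
FROM payment
WHERE paid_on BETWEEN ? AND ?
GROUP BY loan_id, bucket
ORDER BY loan_id, bucket
"""

def payment_grid(start, end, level):
    """Return {loan_id: {bucket: total_cents}}; buckets with no payments are absent."""
    grid = defaultdict(dict)
    rows = conn.execute(GRID_SQL, (BUCKET_FORMATS[level], start, end))
    for loan_id, bucket, cents in rows:
        grid[loan_id][bucket] = cents
    return dict(grid)

print(payment_grid("2015-09-01", "2015-09-03", "daily"))
print(payment_grid("2015-09-01", "2015-09-03", "monthly"))
```

The UI then renders a missing bucket as $0. The number of rows returned grows with loans x buckets rather than with the number of payments, and switching between daily, weekly and monthly only changes the format string.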
If the number of loans is limited

MS Access 2007: How to summarize two rows by date and customer

Data:
Customer | Ship_Date | Ship_Weight
Peter    | 08/01/14  | 120
Peter    | 08/01/14  | 285
How do I summarize these two rows to get a total by date:
Customer | Ship_Date | Ship_Weight
Peter    | 08/01/14  | 405
As you can see, there are multiple shipments on a single day. I want to summarize them to show unique ship dates with the total ship weight.
I am using MS Access 2007.

SELECT Customer, Ship_Date, Sum(Ship_Weight) as Sum_Weight
From tblMyTable
Group By Customer, Ship_Date
You need to make sure Ship_Date holds a date only, not a date and time; otherwise the query groups by both date and time. If it does include a time part, group on DateValue(Ship_Date) instead (and select that same expression).
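A quick way to see this pitfall, using Python's sqlite3 rather than Access, so SQLite's date() stands in for Access's DateValue(). The shipment times are made up, and 08/01/14 is read as 1 August 2014.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblMyTable (Customer TEXT, Ship_Date TEXT, Ship_Weight INTEGER)")
# Hypothetical times, added to show what happens when Ship_Date carries a time part.
conn.executemany("INSERT INTO tblMyTable VALUES (?, ?, ?)", [
    ("Peter", "2014-08-01 09:15:00", 120),
    ("Peter", "2014-08-01 16:40:00", 285),
])

def totals(day_expr):
    # day_expr is one of the fixed expressions below, never user input.
    sql = (f"SELECT Customer, {day_expr} AS Ship_Day, SUM(Ship_Weight) AS Sum_Weight "
           f"FROM tblMyTable GROUP BY Customer, {day_expr} ORDER BY Customer, Ship_Day")
    return conn.execute(sql).fetchall()

print(totals("Ship_Date"))        # two rows, because the times differ
print(totals("date(Ship_Date)"))  # [('Peter', '2014-08-01', 405)]
```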

1-M Relationship database design

I'm trying to come up with a database design for the following scenario.
A student can register for a programme, and at any given time a student can have only one registered programme. However, he/she must be able to change the registered programme at any time (including registering for a new programme). Ultimately, a student can be registered to multiple programmes, but must have only 1 active programme.
I think it should be a 1-M relationship but how to handle this "1 active programme at a given time" situation?

Your Student table would hold a ProgramID, a foreign key to the Program table, for the program he/she has chosen; that is the current program. Every time he/she changes program, that ProgramID changes, and a ProgramHistory table records the change.
So the tables would be Student, Program and ProgramHistory.
Example:
Student
StudentID  Lastname  Firstname  Gender  ProgramID
-------------------------------------------------
101        Smith     Jason      M       1
102        Jones     Kate       F       2
Program
ProgramID  ProgramName
---------------------------------
1          Computer Science
2          Nursing
3          Electrical Engineering
ProgramHistory
ID  ProgramID  StudentID  Semester  Year
----------------------------------------
1   3          101        Spring    2014
2   2          102        Fall      2014
3   1          101        Fall      2014

To allow for tracking of the history of program enrollment, you need a ProgramHistory table that is the intersection of a many-to-many relationship between Student and Program.
There are a couple of ways to ensure that there is only one active program at a time for a given student.
One way would be to put an active_program_key column in your student table and make it a foreign key to the Program table. This is probably not the best alternative, since it denormalizes data, and the resulting duplication can lead to inconsistencies unless you take significant steps to avoid them.
Another option, using declarative constraints, is to create a unique index on the ProgramHistory table over student_key and enrollment_date. This ensures that a student can enroll only once on any given date. The active program is then the record with the latest date for a given student.
This second option is simple and avoids duplicating any data. In fairness, the query to retrieve current student enrollments will be slightly more complicated. As always, design is about trade-offs.
Assuming that students can change programs at just about any time (i.e. not just between semesters) then you want to have a program_start_date in your ProgramHistory table.
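A sketch of the second answer's approach, with sqlite3 standing in for the real database. The snake_case names and the exact start dates used for "Spring 2014" and "Fall 2014" are assumptions. The UNIQUE constraint allows at most one enrolment per student per date, and the current programme is the row with the latest program_start_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE program (program_id INTEGER PRIMARY KEY, program_name TEXT NOT NULL);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
CREATE TABLE program_history (
    student_id         INTEGER NOT NULL REFERENCES student (student_id),
    program_id         INTEGER NOT NULL REFERENCES program (program_id),
    program_start_date TEXT    NOT NULL,      -- ISO date
    UNIQUE (student_id, program_start_date)   -- one enrolment per student per date
);
INSERT INTO program VALUES (1, 'Computer Science'), (2, 'Nursing'),
                           (3, 'Electrical Engineering');
INSERT INTO student VALUES (101, 'Smith', 'Jason'), (102, 'Jones', 'Kate');
-- Spring 2014 / Fall 2014 mapped to assumed start dates.
INSERT INTO program_history VALUES (101, 3, '2014-01-15'),
                                   (102, 2, '2014-09-01'),
                                   (101, 1, '2014-09-01');
""")

# The active programme is the one with the latest start date for each student.
CURRENT_SQL = """
SELECT h.student_id, p.program_name
FROM program_history h
JOIN program p ON p.program_id = h.program_id
WHERE h.program_start_date = (SELECT MAX(h2.program_start_date)
                              FROM program_history h2
                              WHERE h2.student_id = h.student_id)
ORDER BY h.student_id
"""

def current_programs():
    return conn.execute(CURRENT_SQL).fetchall()

print(current_programs())  # [(101, 'Computer Science'), (102, 'Nursing')]

# A second enrolment for the same student on the same date is rejected.
try:
    conn.execute("INSERT INTO program_history VALUES (101, 2, '2014-09-01')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

This is the "slightly more complicated" query the answer mentions: a correlated MAX subquery in place of a direct column lookup, in exchange for storing each enrolment exactly once.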

Mapping booleans to date ranges in a relational database

I'm not quite sure whether my title is worded right.
Say you have a database for a car rental service that contains information on which car will be rented out at what time(s). The cars can be rented out for multiple days at a time, but never for less than one day (so the time is atomic).
How would you fit that into a relational database? Do you have a row for each date with a boolean representing whether the car will be in use that day? Or do you work it in some other way?
Extra question: Which solution would make checking how many cars are rented out at a specific time the easiest/fastest?
thanks,
robin.

In the car rental table, store a checkout date and a check-in date; you can then tell whether a car is rented out on a given day with
WHERE $date_to_check BETWEEN checkout_date AND checkin_date

Why not make two tables: CARS and RENTALS?
In CARS you keep all the information about the physical car (model, date made, etc.).
In RENTALS you keep all the information about the rental itself (rental start time, rental end time, etc.).
You relate them with a foreign key in RENTALS that points to the car ID.
CARS
idCar | model | ...
1     | Honda | ...

RENTALS
idRental | xIdCar | startDate      | endDate
1        | 1      | 1/1/2010 10:30 | 1/1/2010 18:30
2        | 1      | 1/1/2010 19:00 | 2/1/2010 10:30
That should solve both your questions, since you only need to query the rentals for the dates you need and join with the cars
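For the extra question (how many cars are out at a given time), a sketch of the two-table design with sqlite3. The answer's dates are read as day/month/year, car 2 and rental 3 are hypothetical rows added so the count is more interesting, and timestamps are stored as ISO strings so that string comparison matches time order. The BETWEEN test is the same one the first answer uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cars (id_car INTEGER PRIMARY KEY, model TEXT);
CREATE TABLE rentals (
    id_rental  INTEGER PRIMARY KEY,
    x_id_car   INTEGER NOT NULL REFERENCES cars (id_car),
    start_date TEXT NOT NULL,   -- 'YYYY-MM-DD HH:MM', so text order = time order
    end_date   TEXT NOT NULL
);
CREATE INDEX ix_rentals_period ON rentals (start_date, end_date);
-- Car 1 and rentals 1-2 follow the answer; car 2 and rental 3 are hypothetical.
INSERT INTO cars VALUES (1, 'Honda'), (2, 'Toyota');
INSERT INTO rentals VALUES
    (1, 1, '2010-01-01 10:30', '2010-01-01 18:30'),
    (2, 1, '2010-01-01 19:00', '2010-01-02 10:30'),
    (3, 2, '2010-01-01 12:00', '2010-01-03 12:00');
""")

def cars_out_at(moment):
    """Number of distinct cars with a rental covering `moment` (inclusive)."""
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT x_id_car) FROM rentals "
        "WHERE ? BETWEEN start_date AND end_date",
        (moment,),
    ).fetchone()
    return count

def models_out_at(moment):
    """Models of the cars that are out at `moment`, joined from CARS."""
    rows = conn.execute(
        "SELECT c.model FROM cars c JOIN rentals r ON r.x_id_car = c.id_car "
        "WHERE ? BETWEEN r.start_date AND r.end_date ORDER BY c.model",
        (moment,),
    )
    return [model for (model,) in rows]

print(cars_out_at("2010-01-01 15:00"), models_out_at("2010-01-01 15:00"))  # 2 ['Honda', 'Toyota']
print(cars_out_at("2010-01-01 18:45"), models_out_at("2010-01-01 18:45"))  # 1 ['Toyota']
```

For the whole-day rentals in the question, storing plain dates instead of timestamps works the same way; there is no need for one boolean row per car per day.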
