MySQL architecture: null columns vs. joins

I have an application where I'll have repeating events. So an event can repeat by day, "every n days", by week, "every n weeks on Mon/Tue/Wed/etc", and by month, "every n months on the 1st,2nd,3rd,etc".
What is the best way to handle this from a table design perspective? I can think of two ways but I'm not sure which one is better.
1) 5 columns for the above, 1 for the day case and 2 each for week and month. Whichever ones are not being used would be null. In my application I could see the nulls and choose to ignore them.
2) Have a second table, say events_dateinfo or something, against which I'd JOIN for the query.
Seems like option 2 is probably more 'normalized' and whatnot, but does it strike you as overkill for such a simple thing? Also, if I were to go with option 2, is there a way to translate rows into columns - that is, select the 2 week attributes for a specific event and have them treated as columns?

If I understood right, an event can have more than one schedule (this is why you want to "translate rows into columns").
You will need not 2 but 3 tables in this case; the third one must be a junction table. With this scheme you can easily add new schedules in the future.
So, something like this:
table events (event_id, event_name, description)
table schedules (sch_id, schedule)
table event_schedule (event_id, sch_id)
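A minimal sketch of those three tables (the column types and constraints are illustrative assumptions, not from the original post):
CREATE TABLE events (
    event_id    INT AUTO_INCREMENT PRIMARY KEY,
    event_name  VARCHAR(100) NOT NULL,
    description TEXT
);
CREATE TABLE schedules (
    sch_id   INT AUTO_INCREMENT PRIMARY KEY,
    schedule VARCHAR(100) NOT NULL            -- e.g. 'every 2 weeks on Mon/Wed'
);
CREATE TABLE event_schedule (                 -- junction table
    event_id INT NOT NULL,
    sch_id   INT NOT NULL,
    PRIMARY KEY (event_id, sch_id),
    FOREIGN KEY (event_id) REFERENCES events(event_id),
    FOREIGN KEY (sch_id)   REFERENCES schedules(sch_id)
);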
MySQL has no PIVOT as far as I know, but you can use the GROUP_CONCAT() function in the SELECT; you'll get one row per event, with all of that event's schedules in one column.
SELECT e.event_name AS Event,
       GROUP_CONCAT(s.schedule SEPARATOR ', ') AS Schedule
FROM events e
JOIN event_schedule es ON e.event_id = es.event_id   -- make both joins LEFT JOINs if you also want events with no schedule
JOIN schedules s ON s.sch_id = es.sch_id
GROUP BY e.event_name;

I would prefer to handle this normalized: the events in one table, and the event recurrence in another.
With the indexes handled appropriately, you can serve the requests for data through views or, if the data gets larger, through an audit table maintained with triggers.

Related

Data design best practices for customer data

I am trying to store customer attributes in a MySQL database although it could be any type of database. I have a customer table and then I have a number of attribute tables (status, product, address, etc.)
The business requirements are to be able to A) look back at a point in time to see whether a customer was active or what address they had on any given date, and B) let a customer service rep enter things like future vacation holds. A customer might call today and tell the rep they will be on vacation next week.
I currently have different tables for each customer attribute. For instance, the customer status table has records like this:
CustomerID | Status   | dEffectiveStart | dEffectiveEnd
1          | Active   | 2022-01-01      | 2022-05-01
1          | Vacation | 2022-05-02      | 2022-05-04
1          | Active   | 2022-05-05      | 2099-01-01
When I join these tables the sql typically looks like this:
SELECT *
FROM customers c
JOIN customerStatus cs
on cs.CustomerID = c.CustomerID
and curdate() between cs.dEffectiveStart and cs.dEffectiveEnd
While this setup does work as designed, it is slow. The query joins themselves aren't too bad, but as soon as I throw an ORDER BY on, it's done. The typical client query would pull 5-20k records. There are 5-6 other tables, similar to the one above, that I join to a customer.
Do you have any suggestions for a better approach?
That ON clause is very hard to optimize. So, let me try to 'avoid' it.
If you are always (or usually) testing CURDATE(), then I recommend this schema design pattern. I call it History + Current.
The History table contains many rows per customer.
The Current table contains only "current" info about each customer -- one row per customer. Your SELECT would need only this table.
Your design is "proper" because the current status is not redundantly stored in two places. My design requires changing the status in both tables when it changes. This is a small extra cost when changing the "status", for a big gain in SELECT.
More
The Optimizer will probably transform that query into
SELECT *
FROM customerStatus cs
JOIN customers c
ON cs.CustomerID = c.CustomerID
WHERE curdate() >= cs.dEffectiveStart
AND curdate() <= cs.dEffectiveEnd
(Use EXPLAIN SELECT ...; SHOW WARNINGS; to find out exactly.)
In a plain JOIN, the Optimizer likes to start with the table that is most filtered. I moved the "filtering" to the WHERE clause so we could see it; I left the "relation" in the ON.
curdate() >= cs.dEffectiveStart might use an index on dEffectiveStart. Or it might use an index to help the other part.
The Optimizer would probably notice that "too much" of the table would need to be scanned with either index, and eschew both indexes and simply do a table scan.
Then it will quickly and efficiently JOIN to the other table.
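For example, to see the rewritten form the optimizer actually runs (a sketch; on older MySQL versions you may need EXPLAIN EXTENDED instead of plain EXPLAIN):
EXPLAIN
SELECT *
FROM customers c
JOIN customerStatus cs
  ON cs.CustomerID = c.CustomerID
 AND CURDATE() BETWEEN cs.dEffectiveStart AND cs.dEffectiveEnd;
SHOW WARNINGS;   -- the Note row contains the statement as the optimizer rewrote it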

Mysql DELETE where ID isn't present in multiple tables - best practice?

I want to delete people that aren't present in events or photos or email subscribers. Maybe they were, but the only photo they're tagged in gets deleted, or the event they were at gets purged from the database.
Two obvious options:
1)
DELETE FROM people
WHERE personPK NOT IN (
SELECT personFK FROM attendees
UNION
SELECT personFK FROM photo_tags
UNION
SELECT personFK FROM email_subscriptions
)
2)
DELETE people FROM people
LEFT JOIN attendees A on A.personFK = personPK
LEFT JOIN photo_tags P on P.personFK = personPK
LEFT JOIN email_subscriptions E on E.personFK = personPK
WHERE attendeePK IS NULL
AND photoTagPK IS NULL
AND emailSubPK IS NULL
Both A & P are about a million rows apiece, and E a few thousand.
The first option works fine, taking 10 seconds or so.
The second option times out.
Is there a cleverer, better, faster third option?
This is what I would do with, say, a multi-million row half-fictitious schema like above.
For the people table, I would add count columns, one per child table, and a datetime, such as:
photoCount INT NOT NULL,
...
lastUpdt DATETIME NOT NULL,
When it comes time for an INSERT/UPDATE on the child tables (the main focus naturally being inserts), I would:
- begin a transaction
- perform a "select for update", which renders an Intention Lock on the parent (people) row
- perform the child insert, such as a new picture or email
- increment the relevant parent count column and set lastUpdt = NOW()
- commit the transaction (which releases the intention lock)
A delete against a child row is like above but with a decrement.
Whether these are done client-side / Stored Procs/ Trigger is your choice.
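A minimal sketch of that insert path, using the hypothetical photoCount and lastUpdt columns (the photoFK column and the literal ids are made up for illustration):
START TRANSACTION;
-- lock the parent row so concurrent writers queue up behind us
SELECT photoCount FROM people WHERE personPK = 42 FOR UPDATE;
-- insert the child row (a new photo tag in this example)
INSERT INTO photo_tags (personFK, photoFK) VALUES (42, 1001);
-- keep the parent's summary columns in step
UPDATE people
SET photoCount = photoCount + 1,
    lastUpdt   = NOW()
WHERE personPK = 42;
COMMIT;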
Have a scheduled Event that fires off once a week (you choose how often) and deletes people rows whose lastUpdt is more than a week old and whose count columns are all zero.
I realize the Intention Lock is not an exact analogy but the point about timeouts and row-level locking and the need for speed are relevant.
As always carefully craft your indexes considering frequency of use, real benefit, and potential drags on the system.
As for any periodic cleanup Events, schedule them to run in low peak hours with the scheduler.
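A sketch of such a weekly cleanup Event, assuming the summary columns are named photoCount, attendeeCount, and emailSubCount (those names are illustrative); note that the event scheduler must be enabled (event_scheduler = ON):
CREATE EVENT purge_orphan_people
ON SCHEDULE EVERY 1 WEEK          -- add a STARTS clause to pin it to a low-traffic hour
DO
  DELETE FROM people
  WHERE lastUpdt < NOW() - INTERVAL 1 WEEK
    AND photoCount    = 0
    AND attendeeCount = 0
    AND emailSubCount = 0;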
There are some natural downsides to all of this. But if those summary numbers are useful for other profile pages, and fetching them on the fly is too costly, you benefit from them. You also avoid what I see as the expensive calls in your two proposed solutions.
I tried to duplicate your scenario here using PostgreSQL, but I think there is something else you didn't tell us.
Both A & P are about a million rows apiece, and E a few thousand.
table people = 10k records
I selected 9,500 records at random and inserted them into email_subscriptions.
Then I duplicated those 9,500 records 100 times for attendees and photo_tags, for a total of 950k rows in each table.
SQL FIDDLE DEMO
The first query needs 5 seconds.
The second one needs 11 milliseconds.

MySQL query that joins a table to 2 others

What I have is 3 tables (some ids may be null):
ITEMS
id|name|cost
EVENTS
id|name|date|assignment|items
ASSIGNMENTS
id|name|start|items
where ITEMS contains lines of cost - an event may cost X, and the assignment/project owning that event may have its own lines of cost. All items in the EVENTS and ASSIGNMENTS tables are referenced as comma-separated lists of ids.
Given an Assignment, I'd like to get
ASSIGNMENTS.NAME | EVENTS.NAME | ITEMS.NAME | ITEMS.COST
Assignment A     |             | management | 10.00
Assignment A     | event A     | travel exp | 60.00
Assignment A     | event A     | day cost   | 100.00
Assignment A     | event B     | day cost   | 90.00
I tried the subquery way, building a subquery that returns a list of Items with
SELECT assignments.name, events.name,
concat(events.items, ",", assignments.items)
FROM assignments left join events
ON find_in_set(events.assignment, assignments.id) where assignments.id=2
but that way I would get the assignment item listed twice and, what's worse, I would get a line (the assignment without an event) with a field starting with a comma.
I also tried joining the same table twice, but then MySQL reminded me that I cannot do that.
Any idea on how to solve this?
Try this way:
SELECT assignments.name, events.name,
       CONCAT(events.items, ',', assignments.items)
FROM assignments
LEFT JOIN events ON events.assignment = assignments.id
WHERE assignments.id = 2;
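If you also need the item names and costs out of those comma-separated lists, one possible sketch (not part of the original answer) is to take the assignment's own items and the events' items as two halves of a UNION, joining ITEMS with FIND_IN_SET:
SELECT a.name AS assignment, NULL AS event, i.name AS item, i.cost
FROM assignments a
JOIN items i ON FIND_IN_SET(i.id, a.items)
WHERE a.id = 2
UNION ALL
SELECT a.name, e.name, i.name, i.cost
FROM assignments a
JOIN events e ON e.assignment = a.id
JOIN items i ON FIND_IN_SET(i.id, e.items)
WHERE a.id = 2;
This avoids listing the assignment's own items once per event; storing the ids in a junction table instead of comma-separated lists would make such queries simpler still.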

MySQL Union (or similar) query

I have some booking data from a pair of views in MySQL. They match columns perfectly, and the main difference is a booking code that is placed in one of these rows.
The context is as follows: this is for calculating numbers for a sports camp. People are booked in, but can do extra activities.
View 1: All specialist bookings (say: a football class).
View 2: A general group.
Due to the old software, the booking process results in many people booking into the general group and then being upgraded to the old class. This is further complicated by some things elsewhere in the business.
To be clear - View 1 actually contains some (but not all) of the people from View 2. There's an intersection of the two groups. Obviously people can't be in two groups at once (there's only one of them!).
Finding all people who are in View 2 is of course easy... as is View 1. BUT, I need to produce a report which is basically:
"View 1" overwriting "View 2"... or put another way:
"View 1" [sort of] UNION "View 2"
However: I'm not sure the best way of doing this as there are added complications:
Each row is as approximately (with other stuff omitted) as follows:
User ID | Timeslot | Activity
1       | A        | Football
1       | A        | General
2       | A        | General
3       | A        | Football
As you can see, these rows all concern timeslot A:
- User 2 does general activities.
- User 3 does football.
- User 1 does football AND general.
As these rows differ in at least one column, the above is already what a UNION (DISTINCT) produces; there are no truly duplicate rows for it to remove.
The output I need is as follows:
User ID | Timeslot | Activity
1       | A        | Football
2       | A        | General
3       | A        | Football
Here, Football has taken "precedence" over "general", and thus I get the picture of where people are at any time.
This UNION would need to be distinct on a number of fields while ignoring others.
So: does anyone know how to do what amounts to:
"add two tables together and overwrite one of them if it's the same timeslot"
Or something like a:
"selective distinct on UNION DISTINCT".
Cheers
Rick
Try this:
SELECT *
FROM (
    SELECT *,
           IF(Activity = 'General', 1, 0) AS order_column
    FROM `Table1`
    ORDER BY order_column
) AS tmp
GROUP BY UserId
This adds an order_column to your original table that has value 1 if the Activity value is General. We can then select from this derived table ordered by that column (ascending), so all records with the General activity come after all the others. After that we simply select from the derived table grouping by user id; a GROUP BY without any aggregate function takes the first record that matches.
EDIT:
If you don't want to use GROUP BY without an aggregate function, this is an 'ugly' alternative:
SELECT UserId,
       Timeslot,
       SUBSTRING(MAX(CASE Activity
                       WHEN 'General'  THEN '00General'
                       WHEN 'Football' THEN '01Football'
                       ELSE Activity
                     END), 3) AS Activity
FROM `Table1`
GROUP BY UserId, Timeslot
LIMIT 0, 30
Here we need to define each possible value for Activity.

Find first, second, third, and so forth record per person

I have a 1 to many relationship between people and notes about them. There can be 0 or more notes per person.
I need to bring all the notes together into a single field. Since there are not going to be many people with notes, and I plan to bring in only the first 3 notes per person, I thought I could gather all my information using at most 3 queries.
My problem is in getting the MySQL query together to get the first, second, etc. note per person.
I have a query that lets me know how many notes each person has and I have that in my table. I tried something like
SELECT
f_note, f_person_id
FROM
t_person_table,
t_note_table
WHERE
t_person_table.f_number_of_notes > 0
AND t_person_table.f_person_id = t_note_table.f_person_id
GROUP BY
t_person_table.f_person_id
LIMIT 1 OFFSET 0
I had hoped to run this up to 3 times changing the OFFSET to 1 and then 2 but all I get is just one note coming back, not one note per person.
I hope this is clear, if not read on for an example:
I have 3 people in the table. One person (A) has 0 notes, one (B) with 1 and one (C) with 2.
First I would get the first note for person B and C and insert those into my person table note field.
Then I would get the second note for person C and add that to the note field in the person table.
In the end I would have notes for persons B and C, where the note field for person C would be a concatenation of their 2 notes.
Welcome to SO. The thing you're trying to do, selecting the three most recent items from a table for each person mentioned, is not easy in MySQL, but it is possible. See this question, "Select number of rows for each group where two column values makes one group", and my answer to it.
Once you have a query giving you the three rows, you can use GROUP_CONCAT() ... GROUP BY to aggregate the note fields.
You can get one note per person using a nested query like this:
SELECT
    f_person_id,
    (SELECT f_note
     FROM t_note_table
     WHERE t_person_table.f_person_id = t_note_table.f_person_id
     LIMIT 1) AS note
FROM
    t_person_table
WHERE
    t_person_table.f_number_of_notes > 0
Note that tables in SQL have no defined inherent order, so you should use some form of ORDER BY in the subquery. Otherwise your results might be arbitrary, and repeated runs asking for different notes might unexpectedly return the same data.
If you only aim for a concatenation of notes in any case, then you can use the GROUP_CONCAT function to combine all notes into a single column.
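A sketch of that, using the question's table names (the ORDER BY column f_note_id is an assumption; substitute whatever defines the note order):
SELECT p.f_person_id,
       GROUP_CONCAT(n.f_note ORDER BY n.f_note_id SEPARATOR ' | ') AS notes
FROM t_person_table p
JOIN t_note_table n ON n.f_person_id = p.f_person_id
WHERE p.f_number_of_notes > 0
GROUP BY p.f_person_id;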