MySQL: improving efficiency of sub-select query


I have a set of (MySQL) tables as shown below. I need to retrieve a list of Trouble Tickets, as well as the id of matching equipment in either of the equipment tables, based on matching model/serial numbers.
Note that model/serial number combinations should be unique across both equipment tables, but the model/serial number entry boxes are free-form, so there's the possibility the user could enter the same model/serial twice. In such a case, it doesn't matter which equipment unit is retrieved, but only one result should be returned per ticket, since we're displaying a list of tickets, not equipment.
'tickets' [Trouble Tickets table]
id [index]
model [Equipment Model Number]
serial [Equipment Serial Number]
... etc
user_equipment [User Equipment table]
id [index]
model [Equipment Model Number]
serial [Equipment Serial Number]
... etc
site_equipment [Onsite Equipment]
id [index]
model [Equipment Model Number]
serial [Equipment Serial Number]
... etc
Currently I'm using sub-queries to return the user and site equipment IDs, but performance is very poor:
SELECT tickets.*, ... other tables/columns here,
(SELECT id FROM user_equipment WHERE model = tickets.model AND serial = tickets.serial LIMIT 1) as user_equipment_id,
(SELECT id FROM site_equipment WHERE model = tickets.model AND serial = tickets.serial LIMIT 1) as site_equipment_id
FROM
tickets
... other joins here
WHERE ...
HAVING ...
ORDER BY ...
LIMIT ...
I would greatly appreciate any suggestions for improving the performance of this query. Changing the table structure is, unfortunately, not an option at this point, due to many other dependencies.
Thanks!

You want indexes on:
user_equipment(model, serial, id)
site_equipment(model, serial, id)
The first two columns can be in either order.
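
A minimal sketch of those indexes in use, assuming Python's built-in sqlite3 module as a stand-in for MySQL (the CREATE INDEX syntax is the same in both). The index names and the sample rows are hypothetical; the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tickets        (id INTEGER PRIMARY KEY, model TEXT, serial TEXT);
CREATE TABLE user_equipment (id INTEGER PRIMARY KEY, model TEXT, serial TEXT);
CREATE TABLE site_equipment (id INTEGER PRIMARY KEY, model TEXT, serial TEXT);

-- Composite indexes: the lookup columns first, then id, so each
-- sub-select can be answered from the index alone.
CREATE INDEX idx_user_equipment_lookup ON user_equipment (model, serial, id);
CREATE INDEX idx_site_equipment_lookup ON site_equipment (model, serial, id);

INSERT INTO tickets VALUES (1, 'M100', 'S1'), (2, 'M200', 'S2'), (3, 'M300', 'S3');
-- The same model/serial entered twice: either id is acceptable.
INSERT INTO user_equipment VALUES (10, 'M100', 'S1'), (11, 'M100', 'S1');
INSERT INTO site_equipment VALUES (20, 'M200', 'S2');
""")

# One row per ticket; each sub-select returns at most one matching id.
rows = conn.execute("""
SELECT t.id,
       (SELECT ue.id FROM user_equipment ue
         WHERE ue.model = t.model AND ue.serial = t.serial LIMIT 1) AS user_equipment_id,
       (SELECT se.id FROM site_equipment se
         WHERE se.model = t.model AND se.serial = t.serial LIMIT 1) AS site_equipment_id
FROM tickets t
ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
```

On MySQL, running EXPLAIN on the real query is the way to confirm that the sub-selects actually use the new indexes.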

Related

Non-unique many-to-many table design

I'm implementing a voting system for a php project which uses mysql. The important part is that I have to store every voting action separately for statistic reasons. The users can vote for many items multiple times, and every vote has a value (think of it like a donation kinda stuff).
So far I have a table votes in which I'm planning to store the votes with the following columns:
user_id - ID of the voting user, foreign key from users table
item_id - ID of the item which the user voted for, foreign key from items table
count - # of votes spent
created - date and time of voting
I'll need to get things out of the table like: top X voters for an item, and all the items that a user has voted for.
My questions are:
Is this table design suitable for the task? If it is, how should I index it? If not, where did I go wrong?
Would it be more rewarding to create another table beside this one, which has unique rows for the user-item relationship (not storing every vote separately, but update the count row)?

Each base table holds the rows that make a true statement from some fill-in-the-(named-)blanks statement aka predicate.
-- user [user_id] has name ...
-- User(user_id, ...)
SELECT * FROM User
-- user [user_id] voted for item [item_id] spending [count] votes on [created]
-- Votes(user_id, item_id, count, created)
SELECT * FROM Votes
(Notice how the shorthand for the predicate is like an SQL declaration for its table. Notice how in the SQL query a base table predicate becomes the table's name.)
Top X voters for an item, all the items that a user has voted for.
Is this table design suitable for the task?
That query can be asked using that design. But only you can know what queries "like" that one are. You have to define sufficient tables/predicates to describe everything you care about in every situation. If Votes records the history of all relevant info about all events then it must be suitable. The query "all the items that user User has voted for" returns rows satisfying predicate
-- user User voted for item [item_id] spending some count on some date
-- for some count & created,
--   user User voted for item [item_id] spending [count] votes on [created]
-- for some count & created, Votes(User, item_id, count, created)
-- for some user_id, count & created,
--   Votes(user_id, item_id, count, created) AND user_id = User
SELECT item_id FROM Votes WHERE user_id = User
(Notice how in the SQL the condition turns up in the WHERE and the columns you keep are the ones that you care about.)
If it is, how should I index it?
MySQL automatically indexes primary keys. Generally, index the column sets that you JOIN ON, otherwise test (e.g. in WHERE), GROUP BY or ORDER BY. See the MySQL 5.7 Reference Manual, section 8.3, Optimization and Indexes.
Would it be more rewarding to create another table beside this one, which has unique rows for the user-item relationship
If you mean a user-item table holding rows where "for some count & created, [user_id] voted for [item_id] spending [count] votes on [created]", and you still want all the individual votings, then you still need Votes, and that user-item table is just SELECT DISTINCT user_id, item_id FROM Votes. But if you want to ask about people who haven't voted, you need more.
(not storing every vote separately, but update the count row)
If you don't care about individual votings then you can have a table with user, item and the sum of count for user-item groups. But if you want Votes then that user-item-sum table is expressible in terms of Votes using GROUP BY user_id, item_id & SUM(count).
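
As a rough illustration of deriving the per-user, per-item totals and the two queries from the question out of Votes alone, here is a sketch using Python's sqlite3 module in place of MySQL. The two indexes and the sample rows are assumptions for the example, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE votes (
    user_id INTEGER NOT NULL,
    item_id INTEGER NOT NULL,
    count   INTEGER NOT NULL,
    created TEXT    NOT NULL
);
-- Hypothetical indexes: one per access path asked about in the question.
CREATE INDEX idx_votes_item_user ON votes (item_id, user_id);
CREATE INDEX idx_votes_user_item ON votes (user_id, item_id);

INSERT INTO votes VALUES
    (1, 100, 5, '2024-01-01 10:00:00'),
    (1, 100, 3, '2024-01-02 11:00:00'),
    (2, 100, 4, '2024-01-02 12:00:00'),
    (2, 200, 7, '2024-01-03 09:00:00');
""")

# The "user-item-sum" table expressed as a query over votes.
totals = conn.execute("""
    SELECT user_id, item_id, SUM(count) AS total
    FROM votes
    GROUP BY user_id, item_id
    ORDER BY user_id, item_id
""").fetchall()

# Top X voters for an item (here item 100, X = 1).
top_voters = conn.execute("""
    SELECT user_id, SUM(count) AS total
    FROM votes
    WHERE item_id = ?
    GROUP BY user_id
    ORDER BY total DESC
    LIMIT ?
""", (100, 1)).fetchall()

# All the items that a user has voted for (here user 2).
items_for_user = conn.execute("""
    SELECT DISTINCT item_id FROM votes WHERE user_id = ? ORDER BY item_id
""", (2,)).fetchall()

print(totals, top_voters, items_for_user)
```

If these aggregates turn out to be too slow on the real data, a separate totals table becomes a cache of this GROUP BY rather than a replacement for Votes.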

Alternatives to junction table?

I'm designing a relational database tables for storing data about eCommerce scenario where I need to store
List of Products purchased by a user
List of users who purchased a particular product.
It's a many-to-many relationship.
So far I could only think of doing this.
create a table for storing orders
table recordorders(
userID, // foreign key from users table
productID, // foreign key from products table
dateofpurchase,
quantity,
price_per_unit,
total_amount
)
It will act like a junction table.
Is this a good approach, and are there any methods other than a junction table that are more effective and efficient for querying?

Your bullets describe two tables, not one. Your junction table is not properly described as two lists. It is a set of order info rows. The junction table you gave holds rows where "user [userID] purchased product [productID] on ...". Ie it records order info. (Combinations of user, product, date, etc of orders.) Given a user or product, you can get the corresponding bullet table by querying the order info table.
However your table probably needs another column that is a unique order id. Otherwise it cannot record that there are two orders that are alike in all those columns. (Eg if the same person buys the same product on the same date in the same quantity, price and total.) Ie its rows probably aren't 1:1 with orders. That's why above I called it an order info table rather than an order table. It records that some order had those properties; but it doesn't record distinct orders if there can be orders with the same info. It's really a many-to-many-to-etc (for every column) association. That is why an order id gets picked as a unique name for an order as further info. This new table would be called an entity table, not a junction or association table. It holds rows where "in order [id] user [user] purchased ...".
PS An order is usually something that can be characterized as an association on/among/between an order id, user, set of order-lines (product, quantity, price & total), and other stuff (date, grand total, etc). The orders are usually relationally characterized by an order entity table on order id with its user, date etc plus an order-line association table on order ids and their order-line info.
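
A sketch of that split, assuming Python's sqlite3 module as a stand-in for the real database. The table names orders and order_lines, the column names, and the sample rows are hypothetical; the point is only that an order id lets two otherwise identical orders coexist, while both lists from the question remain plain queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entity table: one row per order, identified by order_id.
CREATE TABLE orders (
    order_id       INTEGER PRIMARY KEY,
    user_id        INTEGER NOT NULL,
    dateofpurchase TEXT    NOT NULL
);
-- Association table: the order-lines of each order.
CREATE TABLE order_lines (
    order_id       INTEGER NOT NULL REFERENCES orders (order_id),
    product_id     INTEGER NOT NULL,
    quantity       INTEGER NOT NULL,
    price_per_unit REAL    NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
-- Orders 1 and 2 are alike in every column except the id.
INSERT INTO orders VALUES (1, 123, '2024-05-01'), (2, 123, '2024-05-01'), (3, 456, '2024-05-02');
INSERT INTO order_lines VALUES (1, 987, 2, 9.5), (2, 987, 2, 9.5), (3, 987, 1, 9.5), (3, 654, 1, 20.0);
""")

# List of products purchased by a user.
products_for_user = conn.execute("""
    SELECT DISTINCT ol.product_id
    FROM orders o JOIN order_lines ol ON ol.order_id = o.order_id
    WHERE o.user_id = ?
    ORDER BY ol.product_id
""", (123,)).fetchall()

# List of users who purchased a particular product.
users_for_product = conn.execute("""
    SELECT DISTINCT o.user_id
    FROM orders o JOIN order_lines ol ON ol.order_id = o.order_id
    WHERE ol.product_id = ?
    ORDER BY o.user_id
""", (987,)).fetchall()

# Both identical-looking orders of user 123 are still recorded separately.
order_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE user_id = ?", (123,)
).fetchone()[0]

print(products_for_user, users_for_product, order_count)
```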
PPS It's time for you to read a book about information modeling and database design.

You don't "store" those two things in a table (Junction, or otherwise), you discover them from the raw ("Fact") data:
Using your proposed table:
List of Products purchased by a user:
SELECT productID
FROM recordorders
WHERE userID = 123;
List of users who purchased a particular product:
SELECT userID
FROM recordorders
WHERE productID = 987;

Pulling different records from multiple tables as one transaction history list

I am working on an employee management/reward system and need to be able to show a single "transaction history" page that shows, in chronological order, the different events that the employee has experienced in one list. (Sort of like how in Facebook you can go to your history/action section and see a chronological list of all the stuff that you have done and that affects you, even though the items are unrelated to each other and just have you as a common user.)
I have different tables for the different events. Each table has an employee_id key and an "occured" timestamp. Some table examples:
bonuses
customers
raise
complaints
feedback
So whenever an event occurs (i.e. a new customer is assigned to the employee, or the employee gets a complaint or raise), a new row is added to the appropriate table with the employee ID it affects and a timestamp of when it occurred.
I need a single query to pull all records (up to 50, for example) that include the employee and return a history view of that employee. The field names are different in each table (i.e. the bonus includes an amount with a note, the customer includes customer info, etc.).
I need the output to be a summary view using column names such as:
event_type = (new customer, bonus, feedback etc)
date
title (a brief worded title of the type of event, specified in sql based on the table its referencing)
description (verbiage about the action: for example, if the event_type is bonus, display the bonus amount here; if it's a complaint, show the first 50 characters of the complaint message or the ID of the user that filed the complaint from the complaints table. All done in SQL using if statements, building the value of this field based on which table it comes from. For example, for the customers table: IF current_table=customers description='A customer was assigned to you by'.customers.assigner_id).
Ideally,
Is there any way to do this?
Another option I have considered, is I could do 5-6 different queries pulling the records each from their own table, then use a mysql command to "mesh/interleave" the results from all the queries into one list by chronological order. That would be acceptable too

You could use a UNION query to merge all the information together and use the ORDER BY clause to order the actions chronologically. Each query must have the same number of fields. Your ORDER BY clause should be last.
The examples below assume you have a field called customer_name in the customers table and bonus_amount in the bonuses table.
It would look something like this:
SELECT 'New Customer' as event_type, date,
'New customer was assigned' as title,
CONCAT('New Customer: ', customer_name, ' was assigned') as description
FROM customers
WHERE employee_id = 1
UNION
SELECT 'Bonus' as event_type, date,
'Received a bonus' as title,
CONCAT('Received a bonus of $', FORMAT(bonus_amount, 2), '.') as description
FROM bonuses
WHERE employee_id = 1
UNION
...
ORDER BY date DESC;
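
A runnable sketch of the same idea, assuming Python's sqlite3 module instead of MySQL. Here SQLite's || operator and printf stand in for MySQL's CONCAT and FORMAT, the timestamp column is taken to be the "occured" column described in the question, and the sample rows are made up. UNION ALL is used rather than UNION so that two genuinely identical events are not collapsed into one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (employee_id INTEGER, customer_name TEXT, occured TEXT);
CREATE TABLE bonuses   (employee_id INTEGER, bonus_amount REAL, occured TEXT);
INSERT INTO customers VALUES (1, 'Acme', '2024-03-01 09:00:00'),
                             (2, 'Initech', '2024-03-02 09:00:00');
INSERT INTO bonuses   VALUES (1, 250, '2024-03-05 17:00:00'),
                             (1, 100, '2024-02-01 17:00:00');
""")

# Each branch yields the same four columns; ORDER BY and LIMIT come last
# and apply to the combined result.
history = conn.execute("""
SELECT 'New Customer' AS event_type, occured AS event_date,
       'New customer was assigned' AS title,
       'New Customer: ' || customer_name || ' was assigned' AS description
FROM customers
WHERE employee_id = :emp
UNION ALL
SELECT 'Bonus', occured, 'Received a bonus',
       'Received a bonus of $' || printf('%.2f', bonus_amount) || '.'
FROM bonuses
WHERE employee_id = :emp
ORDER BY event_date DESC
LIMIT 50
""", {"emp": 1}).fetchall()

for event in history:
    print(event)
```

The remaining event tables would be added as further UNION ALL branches with the same four output columns.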

Database Normalization design Issue: 2 tables sharing almost the same information

Example Scenario: Let's say I only have 2 types of buyers for a product house.
Buyer 1: Individual Buyer
Buyer 2: Company Buyer
Distinct for Buyer 1 attributes: FName, LName, Bdate and Age
Distinct for Buyer 2 attributes: Company_Name, Nature_Of_Business and Type_Of_Business
Common for the 2 Buyers are: Address, Email, Tel_No, Country
Db Table Name: Buyer
Attributes: BuyerID, BuyerType, FName, LName, Bdate, Age, Company_Name, Nature_Of_Business and Type_Of_Business
My Explanation: In the Buyer table, the attributes of Individual and Company buyers are merged, because they are all buyers; they are just categorized based on BuyerType (Individual or Company).
Issue: If the buyer type is Company, then my Individual attributes (i.e. FName, LName etc.) will be recorded empty, and vice versa. I am reluctant to separate them because I DON'T want to create a Buyer ID for each table. They should have only 1 BuyerID, whether they are an Individual or a Company.
Problem: How to construct DB Table(s) to solve this query:
I want report that shows all the buyers information with no empty records.
Sounds crazy but when the possible report will be generated, the details might give empty records of the Individual buyer fields if the buyer is type Company
Note: This can be done easily if I will filter specific Buyer type but that is not the case. I want all.

You can calculate age from birth date, so there's no need to store the age.
You have a buyer table and separate individual buyer and company buyer tables.
Buyer
------
Buyer ID
Buyer Type
Buyer Type ID
Address
Country
Email
Telephone Number
Individual Buyer
----------------
Individual Buyer ID
Last Name
First Name
Birth Date
Company Buyer
-------------
Company Buyer ID
Company Name
Type of Business
Nature of Business
Buyer Type is an indicator that points to the particular sub table for this buyer. 'I' for individual and 'C' for company would be one way to define the indicator.
Buyer Type ID is the foreign key to either Individual Buyer ID or Company Buyer ID.
The SQL to get all the buyer information would be
SELECT *
FROM Buyer
-- each buyer matches a row in exactly one of the two sub tables
LEFT JOIN "Individual Buyer" ON "Buyer Type" = 'I' AND "Buyer Type ID" = "Individual Buyer ID"
LEFT JOIN "Company Buyer" ON "Buyer Type" = 'C' AND "Buyer Type ID" = "Company Buyer ID"
WHERE "Buyer ID" = 12345
If you want more than one Buyer row, adjust the WHERE clause.
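
For the "I want all" report, one possible approach is to LEFT JOIN both sub tables and fold the type-specific columns into shared output columns, so no report column is left empty. This is only a sketch using Python's sqlite3 module; the snake_case names, the display_name column and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE individual_buyer (individual_buyer_id INTEGER PRIMARY KEY,
                               last_name TEXT, first_name TEXT, birth_date TEXT);
CREATE TABLE company_buyer (company_buyer_id INTEGER PRIMARY KEY,
                            company_name TEXT, type_of_business TEXT,
                            nature_of_business TEXT);
CREATE TABLE buyer (buyer_id INTEGER PRIMARY KEY,
                    buyer_type TEXT CHECK (buyer_type IN ('I', 'C')),
                    buyer_type_id INTEGER, email TEXT);
INSERT INTO individual_buyer VALUES (1, 'Doe', 'Jane', '1990-04-12');
INSERT INTO company_buyer VALUES (1, 'Acme Ltd', 'Retail', 'Hardware');
INSERT INTO buyer VALUES (12345, 'I', 1, 'jane@example.com'),
                         (12346, 'C', 1, 'sales@example.com');
""")

# Every buyer appears once; COALESCE picks whichever subtype supplied a name.
rows = conn.execute("""
SELECT b.buyer_id, b.buyer_type,
       COALESCE(c.company_name, i.first_name || ' ' || i.last_name) AS display_name
FROM buyer b
LEFT JOIN individual_buyer i
       ON b.buyer_type = 'I' AND b.buyer_type_id = i.individual_buyer_id
LEFT JOIN company_buyer c
       ON b.buyer_type = 'C' AND b.buyer_type_id = c.company_buyer_id
ORDER BY b.buyer_id
""").fetchall()

for row in rows:
    print(row)
```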

Your logical schema will probably have three distinct entities: an abstract Buyer that contains all common fields, and two subentities that inherit from it: Individual Buyer and Company Buyer.
How you implement that schema physically is up to you. Usually, all logical entities that share the same primary key (here buyerID) will be merged in the same physical table.
Having a single table makes sense:
from a performance point of view: filtering takes less resources than joining generally. DML will also be a lot faster with a single table.
from an integrity point of view: it is very easy to insert invalid data when you have multiple tables. For instance, it is hard to guarantee that a buyerID has at least one and at most one row in the subentities if you have three tables.
I would go for a single physical table with constraints:
CREATE TABLE buyer (BuyerID primary key, BuyerType,
FName,LName,Bdate,
Company_Name, Nature_Of_Business, Type_Of_Business,
CONSTRAINT individual_chk
CHECK (BuyerType = 2 OR (Company_name IS NULL AND
Nature_Of_Business IS NULL AND
Type_Of_Business IS NULL)
),
CONSTRAINT company_chk
CHECK (BuyerType = 1 OR (...))
)
The check constraints would also validate that the required fields are not null for each type.
You can then create views if you need access to individual and business separately:
CREATE VIEW individual_buyer AS
SELECT BuyerID,
FName,LName,Bdate
FROM buyer
WHERE buyerType = 1;
CREATE VIEW company_buyer AS
SELECT BuyerID,
Company_Name, Nature_Of_Business, Type_Of_Business
FROM buyer
WHERE buyerType = 2;
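
To see the constraints and views working together, here is a sketch run through Python's sqlite3 module rather than MySQL. The column types, the extra NOT NULL conditions, and the body of company_chk (elided in the answer above) are assumptions chosen for the example. Note that MySQL only enforces CHECK constraints from version 8.0.16 onward; older versions parse and ignore them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buyer (
    BuyerID INTEGER PRIMARY KEY,
    BuyerType INTEGER NOT NULL CHECK (BuyerType IN (1, 2)),  -- 1 = individual, 2 = company
    FName TEXT, LName TEXT, Bdate TEXT,
    Company_Name TEXT, Nature_Of_Business TEXT, Type_Of_Business TEXT,
    CONSTRAINT individual_chk CHECK (BuyerType = 2 OR (
        Company_Name IS NULL AND Nature_Of_Business IS NULL
        AND Type_Of_Business IS NULL
        AND FName IS NOT NULL AND LName IS NOT NULL)),
    -- Assumed body: mirrors individual_chk for company rows.
    CONSTRAINT company_chk CHECK (BuyerType = 1 OR (
        FName IS NULL AND LName IS NULL AND Bdate IS NULL
        AND Company_Name IS NOT NULL))
);
CREATE VIEW individual_buyer AS
    SELECT BuyerID, FName, LName, Bdate FROM buyer WHERE BuyerType = 1;
CREATE VIEW company_buyer AS
    SELECT BuyerID, Company_Name, Nature_Of_Business, Type_Of_Business
    FROM buyer WHERE BuyerType = 2;
""")

conn.execute("INSERT INTO buyer (BuyerID, BuyerType, FName, LName, Bdate) "
             "VALUES (1, 1, 'Jane', 'Doe', '1990-04-12')")
conn.execute("INSERT INTO buyer (BuyerID, BuyerType, Company_Name) "
             "VALUES (2, 2, 'Acme Ltd')")

# An individual carrying company data violates individual_chk.
try:
    conn.execute("INSERT INTO buyer (BuyerID, BuyerType, FName, LName, Company_Name) "
                 "VALUES (3, 1, 'John', 'Roe', 'Oops Inc')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

individuals = conn.execute("SELECT * FROM individual_buyer").fetchall()
companies = conn.execute("SELECT * FROM company_buyer").fetchall()
print(rejected, individuals, companies)
```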

In MYSQL, how to summarise query results based on the parameters not specified in the query?

I have a MySQL table with around 4 million+ rows. Let us say the table is as follows:
Columns in table Person:
Id
Name
Age
Marital Status
Education Level
'Location Country'
'Description'
When I run a query based on Age, I also want to have a summary count of people with the same age in different marital status and also with different 'Education Level' and 'Location Country'.
When I run a query based on Age and Education Level, I also want to have a summary count of people with the same age and Education Level in different marital status and also with different 'Location Country'.
For example, the query issued would be SELECT * FROM Person WHERE Age = 27;. I also want results that would be produced by SELECT Education Level, COUNT(*) FROM Person WHERE Age = 27 GROUP BY Education Level; and SELECT Location Country, COUNT(*) FROM Person WHERE Age = 27 GROUP BY Location Country;
Also, this becomes more challenging for me when I have to do a search based on keywords in the description and want a summary count on each of the other columns. The application I am developing is a sort of search engine. This can be seen on sites like eBay.
I can possibly run these queries separately. But, with 4 million rows, the GROUP BY query will take substantial amount of time. This is an internet application and the query should complete within few seconds.
Any help would be much appreciated.

You can do both in one query
SELECT p.*, count(p2.id)
FROM Person p, Person p2
WHERE p2.Age = p.age and p2.marital != p.marital and p.education != p2.education
GROUP BY p.id
In this situation, I would suggest caching the results in memcached. You can expire the cache when new data is inserted into the table, or after some expiration time, to avoid repeating the long query. Another improvement would be to use LIMIT to reduce the number of rows returned by the DB, like this:
SELECT p.*, count(p2.id)
FROM Person p, Person p2
WHERE p2.Age = p.age and p2.marital != p.marital and p.education != p2.education
GROUP BY p.id
LIMIT 10

From what you are describing, I would keep a separate aggregate table holding the "roll-up" stats you want, and query it directly. How frequently is the "Person" table added to or changed? Also, if you only store a person's "Age" with no date, what is the age based on? If you add the same person again in the future, they would have multiple records, such that
At age X, so many people were married (or not) and had this level of education.
At age Y, so many people... etc..
I would create a summary table, something like
create table AgeStat (
age int,
married int,
single int,
divorced int,
HighSchool int,
Associates int,
Bachelors int,
Masters int,
Doctorate int )
Then, add a trigger to the person table such that during insert (or inclusive of update/delete as needed), the new record just adds 1 to each respective count applicable.
Then, for your web app, it would be instantaneous to grab one record from this summary table where age = 27 and you have ALL your classification stats.
However, if you distinctly wanted to know how many Married with Masters degree, you would have to roll back to master person list.
Alternatively, you could do a similar pre-aggregation but down a level of granularity something like
create table AgeStat (
age int,
maritalstat int, -- but I would actually use an enumerated value for marital status
educationlevel int, -- and education level vs a hard description of each.
peoplecount int )
and likewise have a trigger that updates the count based on the two combination elements per age. Then, if you wanted the total "Married", you can sum(peoplecount) for age = 27 and maritalstat=(enumerator for "married" value)
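
A sketch of the second, finer-grained variant with an insert trigger, using Python's sqlite3 module as a stand-in (MySQL's trigger syntax differs, e.g. it uses FOR EACH ROW and usually needs a DELIMITER change in the client). The names person, age_stat and person_after_insert, the text-valued status columns and the sample rows are assumptions; only inserts are handled, and the grouping columns are assumed to be non-NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
    marital_status TEXT, education_level TEXT
);
CREATE TABLE age_stat (
    age INTEGER, marital_status TEXT, education_level TEXT,
    peoplecount INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (age, marital_status, education_level)
);
-- Keep the summary in step with every new person row.
CREATE TRIGGER person_after_insert AFTER INSERT ON person
BEGIN
    -- Create the bucket if it does not exist yet, then bump its count.
    INSERT OR IGNORE INTO age_stat (age, marital_status, education_level)
    VALUES (NEW.age, NEW.marital_status, NEW.education_level);
    UPDATE age_stat SET peoplecount = peoplecount + 1
     WHERE age = NEW.age
       AND marital_status = NEW.marital_status
       AND education_level = NEW.education_level;
END;
""")

conn.executemany(
    "INSERT INTO person (name, age, marital_status, education_level) VALUES (?, ?, ?, ?)",
    [("Ann", 27, "married", "Masters"),
     ("Bob", 27, "married", "Bachelors"),
     ("Cy", 27, "single", "Masters"),
     ("Di", 30, "single", "Masters")],
)

# Summary counts for age 27, read from the small table only.
by_marital = conn.execute("""
    SELECT marital_status, SUM(peoplecount) FROM age_stat
    WHERE age = ? GROUP BY marital_status ORDER BY marital_status
""", (27,)).fetchall()
by_education = conn.execute("""
    SELECT education_level, SUM(peoplecount) FROM age_stat
    WHERE age = ? GROUP BY education_level ORDER BY education_level
""", (27,)).fetchall()

print(by_marital, by_education)
```

Matching UPDATE and DELETE triggers would be needed if person rows can change or be removed.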
Good luck, and I hope this gives you an alternative solution.
