MySQL Views - Use or Not Use [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I wanted to ask if views are really worth using.
From my understanding, a view is really just a stored query, and each time you select from the view it runs its underlying query again to get fresh, up-to-date data.
This sounds to me like two queries are run.
Wouldn't it be faster to just run the required query directly and skip the view?
Please note: I would be using simple views, but even if they were quite complex I assume the same principle applies.
My kind of view: say three tables with six columns each, where two columns from each table are pulled into the view, with a couple of math expressions to refine the data a touch.
What do others do? Skip or use them?

Typically, views are set up to make selects easier to understand and, in some database engines, to give the engine guidance on how to optimize the query: by creating a view you tell the engine that you're going to be selecting from it frequently and that it should spend more time optimizing the query plan, so selects from the view will be faster. The upside is that when it comes time to parse and plan the query, you save some execution time because the optimization has already been performed. It could be as little as a few milliseconds you save, or potentially very large (for very large result sets). Note that MySQL itself does not pre-compute or cache a plan for a view this way.

You're correct that views are not designed to be a performance benefit in MySQL.
What they are designed to do is make other queries built on them simpler to read, and to give other users and programmers a better chance of using the data correctly. Think of them as a way to virtually de-normalize the data without taking the size/performance hit of actually de-normalizing it.
As the simplest case, let's take orders and line items. Each order has one or more line items.
The orders table might have the following columns:
ID
Status
Created_at
Paid_on
And the line_items table might have the following columns:
LI_ID
order_id
sku_id
quantity
price
What you'll find, when writing code and queries is that you are going to be doing the following join all the time -
orders
join line_items on line_items.order_id = orders.id
This could be simplified by creating a view:
create view order_lines as
select * from orders
join line_items on line_items.order_id = orders.id
So your query would go from:
select orders.id, sum(price) from orders
join line_items on line_items.order_id = orders.id
where created_at >= '2011-12-01' and created_at < '2012-01-01'
group by orders.id;
to:
select id, sum(price) from order_lines
where created_at >= '2011-12-01' and created_at < '2012-01-01'
group by id;
The DB will execute both of these in exactly the same way, but one is easier to read. Admittedly, in this case it's not MUCH easier to read, but it is easier to read and to code against.

The query optimizer is usually able to combine the view query with the query that uses the view in such a way that only a single query is run, so the objection you have to views doesn't really apply.
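If you want to see this happen in MySQL, you can ask for the merge behavior explicitly and inspect the plan. A minimal sketch, reusing the order_lines view from the answer above (the explicit column list is an assumption, to keep select * ambiguity out of the example):
create or replace algorithm = merge view order_lines as
select orders.id, orders.created_at, line_items.price
from orders
join line_items on line_items.order_id = orders.id;
explain
select id, sum(price) from order_lines
where created_at >= '2011-12-01' and created_at < '2012-01-01'
group by id;
-- The EXPLAIN output should match the hand-written join, confirming the
-- view was merged into the outer query rather than materialized.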

See also:
MySQL VIEW as performance troublemaker
View vs. Table Valued Function vs. Multi-Statement Table Valued Function
Should I use a view, a stored procedure, or a user-defined function?
Regards

Views can be provided to applications or users that need a straightforward view of data that isn't necessarily in one table (or that is limited to certain fields from one table). That way they don't have to understand the data and how it relates -- they just get the data they need. You create the complex query, optimize it, and they just use the resulting view.
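One common way to wire that up is to grant access to the view but not to the base tables, so consumers can only see the shape you designed. A sketch (the database, view, and user names here are made up):
GRANT SELECT ON shop.order_lines TO 'report_user'@'localhost';
-- report_user can select from the view, but has no privileges on the
-- underlying orders or line_items tables.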

Related

Optimize query from view with UNION ALL

So I'm facing a difficult scenario. I have a legacy app, badly written and designed, with a table, t_booking. This app has a calendar view where, for every hall and every day in the month, it shows the reservation status, with this query:
SELECT mr1b.id, mr1b.idreserva, mr1b.idhotel, mr1b.idhall, mr1b.idtiporeserva, mr1b.date, mr1b.ampm, mr1b.observaciones, mr1b.observaciones_bookingarea, mr1b.tipo_de_navegacion, mr1b.portal, r.estado
FROM t_booking mr1b
LEFT JOIN a_reservations r ON mr1b.idreserva = r.id
WHERE mr1b.idhotel = '$sIdHotel' AND mr1b.idhall = '$hall' AND mr1b.date = '$iAnyo-$iMes-$iDia'
AND IF (r.comidacena IS NULL OR r.comidacena = '', mr1b.ampm = 'AM', r.comidacena = 'AM' AND mr1b.ampm = 'AM')
AND (r.estado <> 'Cancelled' OR r.estado IS NULL OR r.estado = '')
LIMIT 1;
(at first there was also an ORDER BY r.estado DESC, which I took out)
This query, after proper (I think) indexing, takes 0.004 seconds each, and the overall calendar view is presented in a reasonable time. There are indexes on idhotel, idhall, and date.
Now I have a new module, well written ;-), which stores reservations in another table, but I must present both types of reservations in the same calendar view. My first approach was to create a view joining the content of both tables and selecting the data for the calendar view from this view instead of from t_booking.
The view is defined like this:
CREATE OR REPLACE VIEW
t_booking_hall_reservation
AS
SELECT id,
idreserva,
idhotel,
idhall,
idtiporeserva,
date,
ampm,
observaciones,
observaciones_bookingarea,
tipo_de_navegacion, portal
FROM t_booking
UNION ALL
SELECT HR.id,
HR.convention_id as idreserva,
H.id_hotel as idhotel,
HR.hall_id as idhall,
99 as idtiporeserva,
date,
session as ampm,
observations as observaciones,
'new module' as observaciones_bookingarea,
'events' as tipo_de_navegacion,
'new module' as portal
FROM new_hall_reservation HR
JOIN a_halls H on H.id = HR.hall_id
;
(the table new_hall_reservation has the same indexes)
I tried UNION ALL instead of UNION, as I read it is much more efficient.
Well, the former query, with t_booking changed to t_booking_hall_reservation, takes 1.5 seconds, multiplied for each hall and each day, which makes the calendar view impossible to finish.
The app is spaghetti code, so looping twice, once over t_booking and then over new_hall_reservation, and combining the results is somewhat difficult.
Is it possible to tune the view to make this query fast enough? Another approach?
Thanks
PS: the less I modify the original query, the less I'll need to modify the legacy app, which is, at the least, risky to modify.
This is too long for a comment.
A view is (almost) never going to help performance. Yes, they make queries simpler. Yes, they incorporate important logic. But no, they don't help performance.
One key problem is the execution of the view -- it generally doesn't take the filters of the outer query into account (although the most recent versions of MySQL are better at this).
One suggestion -- which might be a fair bit of work -- is to materialize the view as a table. When the underlying tables change, you would update t_booking_hall_reservation using triggers. Then you can create indexes on the table to achieve your performance goals.
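A minimal sketch of that materialization, reusing the column list from the view above (only the INSERT trigger on t_booking is shown; UPDATE/DELETE triggers and the triggers on new_hall_reservation follow the same pattern):
CREATE TABLE t_booking_hall_reservation_mat AS
SELECT * FROM t_booking_hall_reservation;
ALTER TABLE t_booking_hall_reservation_mat
ADD INDEX idx_hotel_hall_date (idhotel, idhall, date);
DELIMITER //
CREATE TRIGGER t_booking_ai AFTER INSERT ON t_booking
FOR EACH ROW
BEGIN
INSERT INTO t_booking_hall_reservation_mat
(id, idreserva, idhotel, idhall, idtiporeserva, date, ampm,
observaciones, observaciones_bookingarea, tipo_de_navegacion, portal)
VALUES
(NEW.id, NEW.idreserva, NEW.idhotel, NEW.idhall, NEW.idtiporeserva,
NEW.date, NEW.ampm, NEW.observaciones, NEW.observaciones_bookingarea,
NEW.tipo_de_navegacion, NEW.portal);
END//
DELIMITER ;
With the indexed table in place, the original 0.004-second query pattern should apply again.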
t_booking, unless it is a VIEW, needs
INDEX(idhotel, idhall, date)
VIEWs are syntactic sugar; they do not enhance performance; sometimes they are slower than the equivalent SELECT.

Database relationships why should I use them? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I am just setting up a website which has an order and a chat correspondence for each order. I have an orders table, with each order specifying a single user.
If I add a relationship to the table, how does that affect it? Would I still query the database using the LEFT JOIN method?
Why should I use it?
I would like to confirm that my orders table has a user_id field and my users table does NOT have an order_id field; is that correct thinking?
I have yet to do the same for the chat feature on each order, since I am still trying to learn what fields I need in that table to correctly query it with PHP.
You are talking about two different concepts here:
Data input: relationships (foreign key constraints) are one of the methods used to ensure data integrity. They restrict what data can be inserted into the database.
Data retrieval: the LEFT JOIN that you mentioned is one of the methods used to retrieve data from the database. It, kind of, filters the data so you get what you want instead of the entire table(s).
You don't have to use either of these. They are there to help you, but they are not required, and you can achieve similar results by other means. More importantly, they are completely unrelated: if you use relationships, you are not required to use LEFT JOIN or any other join, and if you don't use relationships, you can still use joins and get the same results.
If you don't use relationships, your app has to dictate how data is entered into the database. If you don't use joins, you can use sub-queries, for example. Which way to go really depends greatly on your requirements, but for most scenarios, using relationships and joins is the way to go.
For example, the two queries below are equivalent. The first one uses a join, and the second one does not:
SELECT users.name, orders.id
FROM users
INNER JOIN orders ON users.id = orders.user_id
SELECT users.name, orders.id
FROM users, orders
WHERE users.id = orders.user_id
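For completeness, the "relationship" itself is declared as a foreign key constraint. A minimal sketch, assuming InnoDB tables named users and orders as in the question:
ALTER TABLE orders
ADD CONSTRAINT fk_orders_user
FOREIGN KEY (user_id) REFERENCES users (id);
Once this is in place, inserting an order whose user_id does not exist in users fails, which is the data-integrity half described above; the joins themselves are unchanged.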

Performance issues of mysql query as compared to oracle query

I have created a complex view which gives me output within a second on Oracle 10g, but the same view takes 2-3 minutes on MySQL. I have created indexes on all the fields included in the view definition and also increased query_cache_size, but I still can't get an answer in less time. My query is given below:
select * from results where belt_no < 1000;
And my view is:
create view results as select person_biodata.*,current_rank.*,current_posting.* from person_biodata,current_rank,current_posting where person_biodata.belt_no=current_rank.belt_no and person_biodata.belt_no=current_posting.belt_no ;
The current_posting view is defined as follows:
select p.belt_no belt_no,ps.ps_name police_station,pl.pl_name posting_as,p.st_date from p_posting p,post_list pl,police_station ps where p.ps_id=ps.ps_id and p.pl_id=pl.pl_id and (p.belt_no,p.st_date) IN(select belt_no,max(st_date) from p_posting group by belt_no);
The current_rank view is defined as follows:
select p.belt_no belt_no,r.r_name from p_rank p,rank r where p.r_id=r.r_id and (p.belt_no,p.st_date) IN (select belt_no,max(st_date) from p_rank group by belt_no)
Some versions of MySQL have a particular problem with IN and subqueries, which is what you have in this view:
select p.belt_no belt_no,ps.ps_name police_station,pl.pl_name posting_as,p.st_date
from p_posting p,post_list pl,police_station ps
where p.ps_id=ps.ps_id and p.pl_id=pl.pl_id and
(p.belt_no,p.st_date) IN(select belt_no,max(st_date) from p_posting group by belt_no)
Try changing that to:
where exists (select 1
from p_posting
group by belt_no
having belt_no = p.belt_no and p.st_date = max(st_date)
)
There may be other issues, of course. At the very least, you could format your queries so they are readable and use ANSI-standard join syntax. Being able to read the queries is the first step to improving their performance. Then you should use EXPLAIN in MySQL to see what the query plans look like.
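To illustrate both suggestions at once, here is the current_posting view reformatted with ANSI join syntax and the IN subquery replaced by a join against a derived table (a sketch only, untested against the real schema):
CREATE OR REPLACE VIEW current_posting AS
SELECT p.belt_no, ps.ps_name AS police_station, pl.pl_name AS posting_as, p.st_date
FROM p_posting p
JOIN post_list pl ON p.pl_id = pl.pl_id
JOIN police_station ps ON p.ps_id = ps.ps_id
JOIN (SELECT belt_no, MAX(st_date) AS st_date
FROM p_posting
GROUP BY belt_no) latest
ON latest.belt_no = p.belt_no AND latest.st_date = p.st_date;
-- Older MySQL versions materialize the derived table once, which tends to
-- be far cheaper than re-evaluating an IN subquery row by row.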
Muhammad Jawad, it's simple: you have already created indexes on the table, which lets the database find data fast. But if you then change the indexed table (insert, update, delete), it takes more time than it would on a table with no indexes, because every index has to be updated as well. So you should apply indexes only to columns that you use for search purposes. Hope this helps.

MySQL How to efficiently compare multiple fields between tables?

So my expertise is not in MySQL, and I wrote this query which is starting to run increasingly slowly, as in 5 minutes or so, with 100k rows in EquipmentData and 30k or so in EquipmentDataStaging (which to me is very little data):
CREATE TEMPORARY TABLE dataCompareTemp
SELECT eds.eds_id FROM equipmentdatastaging eds
INNER JOIN equipment e ON e.e_id_string = eds.eds_e_id_string
INNER JOIN equipmentdata ed ON e.e_id = ed.ed_e_id
AND eds.eds_ed_log_time=ed.ed_log_time
AND eds.eds_ed_unit_type=ed.ed_unit_type
AND eds.eds_ed_value = ed.ed_value
I am using this query to compare data rows pulled from a client's device against the current data sitting in their database. From there I take the temp table and use the IDs off it to make conditional decisions. I have e_id_string indexed and e_id indexed, and everything else is not. I know it looks stupid that I have to compare all this information, but the client's system is spitting out redundant data and I am using this query to find it. Any help with this would be greatly appreciated, whether it be a different approach in SQL or in MySQL management. I feel like when I do stuff like this in MSSQL it handles the requests much better, but that is probably because I have something set up incorrectly.
TIPS
Index all the columns used in the ON or WHERE conditions; here that means eds_e_id_string, eds_ed_log_time, eds_ed_unit_type, eds_ed_value, ed_e_id, ed_log_time, ed_unit_type, and ed_value.
Consider changing the syntax to SELECT STRAIGHT_JOIN ... to force the join order; see the MySQL reference manual for details.
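A sketch of those indexes as composite indexes matching the join columns (the column order is an assumption; put the most selective columns first for your data):
ALTER TABLE equipmentdatastaging
ADD INDEX idx_eds_compare (eds_e_id_string, eds_ed_log_time, eds_ed_unit_type, eds_ed_value);
ALTER TABLE equipmentdata
ADD INDEX idx_ed_compare (ed_e_id, ed_log_time, ed_unit_type, ed_value);
-- If the optimizer still picks a bad join order after this,
-- SELECT STRAIGHT_JOIN ... pins the join order to the order the tables
-- appear in the FROM clause.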

Is it a bad idea to keep a subtotal field in database

I have a MySQL table that represents a list of orders and a related child table that represents the shipments associated with each order (some orders have more than one shipment, but most have just one).
Each shipment has a number of costs, for example:
ItemCost
ShippingCost
HandlingCost
TaxCost
There are many places in the application where I need to get consolidated information for the order such as:
TotalItemCost
TotalShippingCost
TotalHandlingCost
TotalTaxCost
TotalCost
TotalPaid
TotalProfit
All those fields are dependent on the aggregated values in the related shipment table. This information is used in other queries, reports, screens, etc., some of which have to return a result on tens of thousands of records quickly for a user.
As I see it, there are a few basic ways to go with this:
Use a subquery to calculate these items from the shipment table whenever they are needed. This complicates things quite a bit for all the queries that need all or part of this information. It is also slow.
Create a view that exposes the subqueries as simple fields. This keeps the reports that need them simple.
Add these fields to the order table. This would give me the performance I am looking for, at the expense of having to duplicate data and recalculate it whenever I change the shipment records.
One other thing: I am using a business layer that exposes functions to get this data (for example GetOrders(filter)), and I don't need the subtotals every time (or only some of them, some of the time), so generating a subquery each time (even from a view) is probably a bad idea.
Are there any best practices that anybody can point me to help me decide what the best design for this is?
Incidentally, I ended up doing #3 primarily for performance and query simplicity reasons.
Update:
Got lots of great feedback pretty quickly, thank you all. To give a bit more background, one of the places the information is shown is on the admin console where I have a potentially very long list of orders and needs to show TotalCost, TotalPaid, and TotalProfit for each.
There's absolutely nothing wrong with doing rollups of your statistical data and storing them to enhance application performance. Just keep in mind that you will probably need to create a set of triggers or jobs to keep the rollups in sync with your source data.
I would probably go about this by caching the subtotals in the database for fastest query performance if most of the time you're doing reads instead of writes. Create an update trigger to recalculate the subtotal when the row changes.
I would only use a view to calculate them on SELECT if the number of rows was typically pretty small and access somewhat infrequent. Performance will be much better if you cache them.
Option 3 is the fastest
If and when you are running into performance issues and if you cannot solve these any other way, option #3 is the way to go.
Use triggers to do the updating
You should use triggers after insert, update and delete to keep the subtotals in your order table in sync with the underlying data.
Take special care when retrospectively changing prices and the like, as that will require a full recalculation of all subtotals.
So you will need a lot of triggers that usually don't do much most of the time. (If a tax rate changes, it changes only for future orders, i.e. for orders that you don't have yet.)
If the triggers take a lot of time, make sure you do these updates in off-peak hours.
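A minimal sketch of one such trigger, assuming a shipments child table and a cached total_item_cost column on orders (all of these names are made up to match the question's description):
DELIMITER //
CREATE TRIGGER shipments_ai AFTER INSERT ON shipments
FOR EACH ROW
BEGIN
UPDATE orders
SET total_item_cost = (SELECT IFNULL(SUM(item_cost), 0)
FROM shipments
WHERE order_id = NEW.order_id)
WHERE id = NEW.order_id;
END//
DELIMITER ;
-- Matching AFTER UPDATE and AFTER DELETE triggers keep the cached value
-- correct when shipments change or are removed.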
Run an automatic check periodically to make sure the cached values are correct
You may also want to keep a golden query in place that recalculates all the values and checks them against the stored values in the order table.
Run this query every night and have it report any abnormalities, so that you can see when the denormalized values are out of sync.
Do not do any invoicing on orders that have not been processed by the validation query.
Add an extra date field to the order table called timeoflastsuccesfullvalidation and have it set to null if the validation was unsuccessful.
Only invoice items whose timeoflastsuccesfullvalidation is less than 24 hours old.
Of course you don't need to check orders that are fully processed, only orders that are pending.
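A sketch of that nightly check, reusing the made-up orders/shipments names from the trigger sketch above:
SELECT o.id, o.total_item_cost AS cached, s.calc_item_cost AS calculated
FROM orders o
JOIN (SELECT order_id, SUM(item_cost) AS calc_item_cost
FROM shipments
GROUP BY order_id) s ON s.order_id = o.id
WHERE o.total_item_cost <> s.calc_item_cost;
-- Any rows returned are out of sync: clear their
-- timeoflastsuccesfullvalidation and alert someone.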
Option 1 may be fast enough
With regard to #1:
It is also slow.
That depends a lot on how you query the DB.
You mention subselects; in the mostly complete skeleton query below I don't see the need for many subselects, so you have me puzzled there a bit.
SELECT field1,field2,field3
, oifield1,oifield2,oifield3
, NettItemCost * (1+taxrate) as TotalItemCost
, TotalShippingCost
, TotalHandlingCost
, NettItemCost * taxRate as TotalTaxCost
, (NettItemCost * (1+taxrate)) + TotalShippingCost + TotalHandlingCost as TotalCost
, TotalPaid
, somethingorother as TotalProfit
FROM (
SELECT o.field1,o.field2, o.field3
, oi.field1 as oifield1, oi.field2 as oifield2, oi.field3 as oifield3
, SUM(c.productprice * oi.qty) as NettItemCost
, SUM(IFNULL(sc.shippingperkg,0) * oi.qty * p.WeightInKg) as TotalShippingCost
, SUM(IFNULL(hc.handlingperwhatever,0) * oi.qty) as TotalHandlingCost
, t.taxrate as TaxRate
, IFNULL(paid.amountpaid,0) as TotalPaid
FROM orders o
INNER JOIN orderitem oi ON (oi.order_id = o.id)
INNER JOIN products p ON (p.id = oi.product_id)
INNER JOIN prices c ON (c.product_id = p.id
AND o.orderdate BETWEEN c.validfrom AND c.validuntil)
INNER JOIN taxes t ON (p.tax_id = t.tax_id
AND o.orderdate BETWEEN t.validfrom AND t.validuntil)
LEFT JOIN shippingcosts sc ON (o.country = sc.country
AND o.orderdate BETWEEN sc.validfrom AND sc.validuntil)
LEFT JOIN handlingcost hc ON (hc.id = oi.handlingcost_id
AND o.orderdate BETWEEN hc.validfrom AND hc.validuntil)
LEFT JOIN (SELECT pay.order_id, SUM(pay.payment) as amountpaid FROM payment pay
GROUP BY pay.order_id) paid ON (paid.order_id = o.id)
WHERE o.id BETWEEN '1245' AND '1299'
GROUP BY o.id DESC, oi.id DESC ) AS sub
Thinking about it, you would need to split this query up for stuff that's relevant per order and per order_item, but I'm too lazy to do that now.
Speed tips
Make sure you have indexes on all fields involved in the join criteria.
Use a MEMORY table for the smaller tables, like taxes and shippingcosts, and use a hash index for the ids in the memory tables.
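A sketch of the MEMORY-table tip, using the taxes table from the skeleton query (names are assumptions):
CREATE TABLE taxes_mem ENGINE=MEMORY AS SELECT * FROM taxes;
ALTER TABLE taxes_mem ADD INDEX idx_tax_id USING HASH (tax_id);
-- Repeat for shippingcosts, then point the query at the _mem copies.
-- HASH is the default index type for the MEMORY engine and suits equality
-- lookups such as p.tax_id = t.tax_id.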
I would avoid #3 as much as I can. I prefer that for several reasons:
It's too hard to discuss performance without measurement. Imagine the user is shopping around, adding order items to an order; every time an item is added, you need to update the order record, which may not be necessary (some sites only show the order total when you click the shopping cart, ready to check out).
Having a duplicated column is asking for bugs - you cannot expect every future developer/maintainer to be aware of this extra column. Triggers can help, but I think triggers should only be used as a last resort, to patch over a bad database design.
A different database schema can be used for reporting purposes. The reporting database can be highly de-normalized for performance without complicating the main application.
I tend to put the actual logic for computing subtotals at the application layer, because "subtotal" is an overloaded thing that depends on context - sometimes you want the raw subtotal, sometimes the subtotal after applying a discount. You cannot keep adding columns to the order table for every scenario.
It's not a bad idea; unfortunately MySQL doesn't have some features that would make this really easy - computed columns and indexed (materialized) views. You can probably simulate them with a trigger.