Optimise Linq-to-Sql mapping with one to many lookup - linq-to-sql

I'm having problems optimising data lookup with the following data structure:
Order
-----
Id
Customer
Date
... etc
OrderStatus
------
Id
OrderId
Date
UpdatedBy
StatusTypeId
...etc
This is causing me a headache on the Order List page, which basically shows a list of Orders. Each Order Summary in the list shows a bunch of fields from Order and the current OrderStatus, i.e. the OrderStatus with the latest Date which is linked to the Order.
Order List
-------------------------------------------------------
Order Id | Customer | Order Date | CurrentStatus |
-------------------------------------------------------
1 | Someone | 1.10.2010 | Completed |
-------------------------------------------------------
2 | Someone else | 12.10.2010 | In Progress |
-------------------------------------------------------
3 | Whoever | 17.10.2010 | On Hold |
-------------------------------------------------------
Now, say I want to list all orders from this year. My Repository fetches the Order objects
var orders = _repository.GetAllOrdersSinceDate(dt);
and now I end up with something like
foreach (Order order in orders)
{
OrderSummary summary = new OrderSummary();
summary.Customer = order.Customer;
summary.Date = order.Date;
// ...etc
// problem here!!
summary.OrderStatus = order.OrderStatus
.OrderByDescending(s => s.Date).First();
}
So what I end up with is a SELECT statement on Order and then a further SELECT statement on OrderStatus for each Order returned.
Showing the summary of all records for this year therefore requires around 20,000 individual SQL queries and takes many minutes to load.
Is there any neat way to fix this problem?
I'm considering re-writing the database to hold the current OrderStatus in the Order table, so I end up with something like
Order
-----
Id
Customer
Date
CurrentStatusTypeId
CurrentStatusDate
CurrentStatusUpdatedBy
...etc
OrderStatusHistory
------
Id
OrderId
Date
UpdatedBy
StatusTypeId
...etc
which is the only way I can see to solve the problem but seems a pretty nasty solution.
What's the best way forward here?

Please don't denormalize your database model to solve your problem. This will only make things worse. You can fix this by writing a service method that returns a list of data transfer objects (DTO) instead of the LINQ to SQL entities. For instance, the service method might look like this:
public OrderSummary[] GetOrderSummariesSinceDate(DateTime d)
{
return (
from order in this.context.Orders
where order.Date >= d
let lastStatus = (
from status in order.OrderStatusses
orderby status.Date descending
select status).First()
select new OrderSummary
{
OrderId = order.Id,
CustomerName = order.Customer.Name,
Date = order.Date,
OrderStatus = lastStatus.StatusType.Name
}).ToArray();
}
Note the following:
- This code will execute as a single SQL query in the database.
- The method returns objects that contain just the data the client needs and nothing more: no Customer object, no OrderStatus object.
- Calling ToArray ensures that the database is queried at this point rather than deferred.
These three points maximize performance and allow the service layer to stay in control of what is executed against the database.
I hope this helps.
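To see what "a single SQL query" can look like for this list, here is a minimal sketch in Python with SQLite rather than LINQ to SQL. The StatusType table follows the StatusType.Name reference in the answer; the sample rows and the order_summaries helper are assumptions made for illustration, and the SQL that LINQ to SQL actually generates will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Order" (Id INTEGER PRIMARY KEY, Customer TEXT, Date TEXT);
CREATE TABLE StatusType (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE OrderStatus (Id INTEGER PRIMARY KEY, OrderId INTEGER,
                          Date TEXT, StatusTypeId INTEGER);
-- Hypothetical sample rows modelled on the Order List in the question.
INSERT INTO "Order" VALUES (1, 'Someone', '2010-10-01'),
                           (2, 'Someone else', '2010-10-12'),
                           (3, 'Whoever', '2010-10-17'),
                           (4, 'Old customer', '2009-05-01');
INSERT INTO StatusType VALUES (1, 'In Progress'), (2, 'Completed'), (3, 'On Hold');
INSERT INTO OrderStatus VALUES (1, 1, '2010-10-01', 1), (2, 1, '2010-10-05', 2),
                               (3, 2, '2010-10-12', 1),
                               (4, 3, '2010-10-17', 1), (5, 3, '2010-10-20', 3),
                               (6, 4, '2009-05-02', 2);
""")

# One statement: each order is joined to its most recent OrderStatus row only.
SUMMARY_SQL = """
SELECT o.Id, o.Customer, o.Date, st.Name
FROM "Order" AS o
JOIN OrderStatus AS s ON s.Id = (
    SELECT s2.Id FROM OrderStatus AS s2
    WHERE s2.OrderId = o.Id
    ORDER BY s2.Date DESC
    LIMIT 1)
JOIN StatusType AS st ON st.Id = s.StatusTypeId
WHERE o.Date >= ?
ORDER BY o.Id
"""

def order_summaries(connection, since):
    """Return (Id, Customer, Date, current status name) for orders on or after `since`."""
    return connection.execute(SUMMARY_SQL, (since,)).fetchall()

for row in order_summaries(conn, "2010-01-01"):
    print(row)
```

The number of round trips stays at one no matter how many orders match, which is the property the DTO query above relies on.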

You can create a DataLoadOptions object as follows:
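// connectionString is a placeholder; a generated (typed) DataContext works the same way.
DataContext db = new DataContext(connectionString);
DataLoadOptions ds = new DataLoadOptions();
// Load each Order's OrderStatus rows together with the Order itself.
ds.LoadWith<Order>(o => o.OrderStatus);
// LoadOptions must be assigned before the first query runs on this context.
db.LoadOptions = ds;
Then when you run your query, it should prefetch the OrderStatus rows along with the orders.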

using aggregate functions before joining the tables

I have two tables that I join on customer_id.
The first table is deal, where I store the data of a deal. Every deal has volume, rest, pay, etc.
The second table is handle. Its purpose is hard for me to explain, but it mirrors the deal table and has volume_handle, rest_handle, pay_handle, etc.
I have to use a left join because I want all records from the deal table and only the matching records from the handle table.
I want to sum volume and rest from deal and sum volume_handle from handle. The relationship between these tables is customer_id and buy_id.
for example the deal table:
id = 1
volume = 1000
rest = 1000
customer_id = 1
---------------
id = 2
volume = 500
rest = 0
customer_id = 1
---------------
id = 3
volume = 2000
rest = 0
customer_id = 2
and handle table is :
id = 1
volume_handle = 3000
buy_id = 1
the query i write is :
select sum(deal.rest) as rest , sum(deal.volume) as volume , sum(handle.volume_handle) as handle
from deal
left join handle on deal.customer_id = handle.buy_id
group by deal.customer_id;
and the result of this query is :
//when customer_id is 1
volume = 1500
rest = 1000
handle = 6000
//when customer_id is 2
volume = 2000
rest = 0
handle = null
The volume and rest are right, but the handle from the second table is wrong: sum(handle.volume_handle) should be 3000, not 6000, when customer_id is 1.
I don't know how to use aggregate functions before joining the tables.
Can anyone here write the query for this problem?
Because the join pairs every deal row of a customer with every matching handle row, volume_handle is counted once per deal row (twice for customer 1 in your data). To avoid this, aggregate handle in a derived table before you JOIN it to deal. Something like this:
SELECT d.customer_id,
SUM(d.rest) AS rest,
SUM(d.volume) AS volume,
MAX(h.volume_handle) AS handle
FROM deal d
LEFT JOIN (SELECT buy_id, SUM(volume_handle) AS volume_handle
FROM handle
GROUP BY buy_id) h ON h.buy_id = d.customer_id
GROUP BY d.customer_id
Output:
customer_id rest volume handle
1 1000 1500 3000
2 0 2000 null
Note that I have used MAX around h.volume_handle. This won't change the result (all the values it compares within a group are the same), but it is required to avoid any only_full_group_by errors.
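The difference between the two queries can be reproduced with the sample rows from the question. This is a minimal sketch using Python's sqlite3 module instead of MySQL; the column types are assumptions, and the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deal (id INTEGER PRIMARY KEY, volume INTEGER, rest INTEGER,
                   customer_id INTEGER);
CREATE TABLE handle (id INTEGER PRIMARY KEY, volume_handle INTEGER, buy_id INTEGER);
INSERT INTO deal VALUES (1, 1000, 1000, 1), (2, 500, 0, 1), (3, 2000, 0, 2);
INSERT INTO handle VALUES (1, 3000, 1);
""")

# Joining first repeats the single handle row for both deals of customer 1.
NAIVE_SQL = """
SELECT d.customer_id, SUM(d.rest), SUM(d.volume), SUM(h.volume_handle)
FROM deal d
LEFT JOIN handle h ON h.buy_id = d.customer_id
GROUP BY d.customer_id
ORDER BY d.customer_id
"""

# Aggregating handle first yields one row per buy_id, so nothing is repeated.
PRE_AGGREGATED_SQL = """
SELECT d.customer_id, SUM(d.rest), SUM(d.volume), MAX(h.volume_handle)
FROM deal d
LEFT JOIN (SELECT buy_id, SUM(volume_handle) AS volume_handle
           FROM handle
           GROUP BY buy_id) h ON h.buy_id = d.customer_id
GROUP BY d.customer_id
ORDER BY d.customer_id
"""

naive = conn.execute(NAIVE_SQL).fetchall()
pre_aggregated = conn.execute(PRE_AGGREGATED_SQL).fetchall()
print(naive)           # handle for customer 1 is doubled
print(pre_aggregated)  # handle for customer 1 is counted once
```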

Check if relation exists in database via pivot table

situation
User can belong to multiple organizations, linked via a pivot table called employees
Models at play: User, Employee & Organizations
Relevant database columns:
users
- id
employees
- user_id
- organization_id
organizations
- id
goal
An efficient way to check if user 1 and user 2 share at least one organization_id in the employees table
usecase
Api endpoint /api/v1/user/# returns additional metadata regarding the user.
A policy checks whether the current user and the user id from the URL are the same, or whether both are employees of at least one common organization. The organization_id is not known at this stage; all that matters is that it matches.
example A
user A (1) is employee of organization foo (1)
user B (2) is employee of organization bar (2)
employee table thus has the following records:
+-----------------+---------+
| organization_id | user_id |
+-----------------+---------+
| 1 | 1 |
| 2 | 2 |
+-----------------+---------+
in this example the query should return a false result, since there is no shared organization_id between user A and B
example B
user A (1) is employee of organization foo (1)
user A (1) is employee of organization foobar (3)
user B (2) is employee of organization bar (2)
user B (2) is employee of organization foobar (3)
employee table thus has the following records:
+-----------------+---------+
| organization_id | user_id |
+-----------------+---------+
| 1 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
+-----------------+---------+
in this example the query should return a true result, since there is a shared organization_id between user A and B
policy code
/**
* Determine whether the user can view the model.
*
* @param \App\User $user
* @param \App\User $model
* @return mixed
*/
public function view(User $user, User $model)
{
if ($user->is($model)) {
return true;
} else {
// check if users share at least one organization
}
}
code that works but does not look efficient
foreach ($user->organizations()->with('users')->get() as $organization) {
if ($organization->users->where('id', $model->id)->first()) {
return true;
}
}
return false;
experimental code with joins instead of something done with laravel models
\Illuminate\Support\Facades\DB::table('employees as auth_employee')
->join('employees as other_employee', 'other_employee.organization_id', '=', 'auth_employee.organization_id')
// ->join('organizations', 'organizations.id', '=', 'organizations.id')
->where('auth_employee.user_id', 1)
->where('other_employee.user_id', 2)
->get();
requested solution
An efficient query to get a (castable to) boolean result indicating whether or not 2 users share at least one organization_id in the employees table, with 'bonus points' for using the Laravel models / query builder.
footer
Thanks for reading, here is a potato: 🥔
Assuming you have a users relationship set up in your Organization model, you could use the whereHas method:
$user->organizations()->whereHas('users', function ($query) use($model) {
$query->where('users.id', $model->id);
})->exists();
As a raw query, I would probably use EXISTS here, but since you would need to port any query to Laravel/PHP code, I might suggest using a self-join:
SELECT DISTINCT
e1.user_id, e2.user_id
FROM employees e1
INNER JOIN employees e2
ON e1.organization_id = e2.organization_id AND e2.user_id = 2
WHERE
e1.user_id = 1;
This would just return the user_id pair of values (1, 2). If you wanted a query to return all pairs of distinct users sharing at least one organization, you could rewrite this query to this:
SELECT DISTINCT
e1.user_id, e2.user_id
FROM employees e1
INNER JOIN employees e2
ON e1.organization_id = e2.organization_id AND e1.user_id <> e2.user_id;
The most readable example I can think of would be to put something like this in the user model:
public function isColleagueWith(User $user): bool
{
return $this->organizations->intersect($user->organizations)->isNotEmpty();
}
Usage is easy to read and understand:
$userA->isColleagueWith($userB);
If you wanted to use fewer DB queries, you could query the pivot table directly instead. Here you can get all employment rows of the two users and check whether the list contains any duplicate organization ids.
use Illuminate\Database\Eloquent\Relations\Pivot;
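class Employees extends Pivot
{
protected $table = 'employees';
// Static, so it can be called as Employees::areColleagues(...).
public static function areColleagues(int $userIdA, int $userIdB): bool
{
// Organization ids of every employment row that belongs to either user.
$organizationIds = static::query()
->whereIn('user_id', [$userIdA, $userIdB])
->pluck('organization_id');
// A repeated id means both users are employed by that organization.
return $organizationIds->count() > $organizationIds->unique()->count();
}
}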
Usage:
Employees::areColleagues($userIdA, $userIdB);
To check if users 1 and 2 share one or more organizations:
SELECT EXISTS (
SELECT 1 FROM employees AS a
JOIN employees AS b USING(organization_id)
WHERE a.user_id = 1
AND b.user_id = 2 );
Have both of these indexes:
(user_id, organization_id)
(organization_id, user_id)
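The EXISTS query can be checked against example A and example B from the question. This is a small sketch in Python with SQLite; the share_organization helper and the index name are made up for illustration, and in Laravel the same SQL would go through the query builder instead.

```python
import sqlite3

SHARE_SQL = """
SELECT EXISTS (
    SELECT 1 FROM employees AS a
    JOIN employees AS b USING (organization_id)
    WHERE a.user_id = ?
      AND b.user_id = ?
)
"""

def share_organization(rows, user_a, user_b):
    """Return True if the two users share at least one organization_id.

    `rows` is a list of (organization_id, user_id) pairs, as in the question.
    """
    conn = sqlite3.connect(":memory:")
    # The primary key and the extra index mirror the two suggested indexes.
    conn.execute(
        "CREATE TABLE employees (organization_id INTEGER, user_id INTEGER, "
        "PRIMARY KEY (user_id, organization_id))"
    )
    conn.execute(
        "CREATE INDEX employees_org_user ON employees (organization_id, user_id)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
    (found,) = conn.execute(SHARE_SQL, (user_a, user_b)).fetchone()
    conn.close()
    return bool(found)

example_a = [(1, 1), (2, 2)]
example_b = [(1, 1), (2, 2), (3, 1), (3, 2)]
print(share_organization(example_a, 1, 2))
print(share_organization(example_b, 1, 2))
```

EXISTS can stop at the first matching pair, so the database does not have to materialize every shared organization.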

How to get history of mapping for following table data

I have a history mapping table for UserId changes: every time a UserId changes, a row with the new UserId and the old UserId is inserted into the history table.
Below is the sample table and data:
UserIdNew | UserIdOld
---------------------
5 | 1
10 | 5
15 | 10
The above data shows that UserId 1 has gone through the following transitions: 1 -> 5 -> 10 -> 15.
I want to query all the old ids for a given UserIdNew. How can I do it in a single query?
In this case, if UserIdNew = 15, then it should return 1, 5, 10.
If UserIdNew is always greater than the previous (older) id in a chain of UserIds, i.e. if cases like 10->20->5->1 never happen, this query can do the job (not fully tested; new and old are used instead of your field names, and 15 is the UserIdNew being looked up):
SELECT
CASE
WHEN new=15 THEN @seq:=concat(new,',',old)
WHEN substring_index(@seq,',',-1)=new THEN @seq:=concat(@seq,',',old)
ELSE @seq
END AS SEQUENCE
FROM (SELECT * FROM UserIdsTable ORDER BY new DESC) AS SortedIds
ORDER BY SEQUENCE DESC
LIMIT 1
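A different technique from the user-variable query above is a recursive common table expression, which walks the chain one step at a time and does not depend on the ids being ordered. The sketch below runs it in SQLite through Python; MySQL 8.0 and later also accept WITH RECURSIVE, but this has not been checked against the asker's server. The old_ids helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserIdsTable (UserIdNew INTEGER, UserIdOld INTEGER);
INSERT INTO UserIdsTable VALUES (5, 1), (10, 5), (15, 10);
""")

# Start from the row whose UserIdNew is the given id, then keep following
# UserIdOld -> UserIdNew links until no older id is found.
CHAIN_SQL = """
WITH RECURSIVE chain(old_id, depth) AS (
    SELECT UserIdOld, 1
    FROM UserIdsTable
    WHERE UserIdNew = ?
    UNION ALL
    SELECT t.UserIdOld, c.depth + 1
    FROM UserIdsTable AS t
    JOIN chain AS c ON t.UserIdNew = c.old_id
)
SELECT old_id FROM chain ORDER BY depth DESC
"""

def old_ids(connection, user_id_new):
    """Return all earlier ids of `user_id_new`, oldest first."""
    return [row[0] for row in connection.execute(CHAIN_SQL, (user_id_new,))]

print(old_ids(conn, 15))
```

Note that a cycle in the mapping (for example 5 -> 1 and 1 -> 5) would make this recursion run without end, so the sketch assumes the history never loops.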

MySQL Query Search using Multiple Rows

Firstly, I'd like to apologize for the potentially misleading title... I am finding it difficult to describe what I am trying to do here.
With the current project I'm working on, we have setup a 'dynamic' database structure with MySQL that looks something like this.
item_details ( Describes the item_data )
fieldID | fieldValue | fieldCaption
1 | addr1 | Address Line 1
2 | country | Country
item_data
itemID | fieldID | fieldValue
12345 | 1 | Some Random Address
12345 | 2 | United Kingdom
So, as you can see, if I wanted to look up the address for item 12345, I would simply run this statement:
SELECT fieldValue FROM item_data WHERE fieldID=1 and itemID=12345;
But here is where I am stuck... the database is relatively large, with around 80k rows, and I am trying to create a set of search functions in PHP.
I would like to be able to perform a query on the result set of another query as quickly as possible...
For example, search for an address within a certain country, i.e. search the fieldValue of the rows that have the same itemIDs as the results of this query:
SELECT itemID FROM item_data WHERE fieldID=2 AND fieldValue='United Kingdom';
Sorry If I am unclear, I have been struggling with this for the past couple of days...
Cheers
You can do this in a couple of ways. One is to use multiple joins to the item_data table with the fieldID limited to whatever it is you want to get.
SELECT *
FROM
Item i
INNER JOIN item_data country
ON i.itemID = country.itemID
AND country.fieldID = 2
INNER JOIN item_data address
ON i.itemID = address.itemID
AND address.fieldID = 1
WHERE
country.fieldValue= 'United Kingdom'
and address.fieldValue= 'Whatever'
As an aside, this structure is often referred to as an Entity-Attribute-Value (EAV) database.
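The multiple-join idea also works without a separate item table, by joining item_data to itself once per field. This is a minimal sketch in Python with SQLite; the second item (67890) and the search helper are made up for illustration, while field ids 1 (address) and 2 (country) come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_data (itemID INTEGER, fieldID INTEGER, fieldValue TEXT);
INSERT INTO item_data VALUES
    (12345, 1, 'Some Random Address'),
    (12345, 2, 'United Kingdom'),
    (67890, 1, 'Some Random Street'),   -- hypothetical second item
    (67890, 2, 'France');
""")

# One alias per field: `country` filters the items, `address` is searched.
SEARCH_SQL = """
SELECT address.itemID, address.fieldValue
FROM item_data AS country
JOIN item_data AS address
  ON address.itemID = country.itemID
 AND address.fieldID = 1
WHERE country.fieldID = 2
  AND country.fieldValue = ?
  AND address.fieldValue LIKE ?
ORDER BY address.itemID
"""

def search(connection, country, address_part):
    """Return (itemID, address) rows in `country` whose address contains `address_part`."""
    pattern = f"%{address_part}%"
    return connection.execute(SEARCH_SQL, (country, pattern)).fetchall()

print(search(conn, "United Kingdom", "Random"))
```

With around 80k rows, an index on (fieldID, fieldValue) plus one on (itemID, fieldID) would likely matter more than the exact shape of the query, though that is a guess without seeing the real data.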
Sorry in advance if this sounds patronizing, but (as you suggested) I'm not quite clear what you are asking for.
If you are looking for one query to do the whole thing, you could simply nest them. For your example, pretend there is a table named CACHED with the results of your UK query, and write the query you want against that, but replace CACHED with your UK query.
If the idea is that you have ALREADY done this UK query and want to (re-)use its results, you could save the results to a table in the DB (which may not be practical if there are a large number of queries executed), or save the list of IDs as text and paste that into the subsequent query (...WHERE ID in (...) ... ), which might be OK if your 'cached' query gives you a manageable fraction of the original table.

How to get the sum of a column from combined tables in mySQL?

I've been trying to write a mySQL-statement for the scenario below, but I just can't get it to work as intended. I would be very grateful if you guys could help me get it right!
I have two tables in a mySQL-database, event and route:
event:
id | date | destination | drivers |
passengers | description | executed
route:
name | distance
drivers contains a string with the usernames of the registered drivers in an event on the form "jack:jill:john".
destination contains the event destination (oh, really?) and its value is always the same as one of the values in the field name in the table route (i.e. the destination must already exist in route).
executed tells if the event is upcoming (0) or already executed (1).
distance is the distance to the destination in km from the home location.
What I want is to get the total distance covered for one specific user, only counting already executed events.
E.g., if Jill has been registered as a driver in two executed events where the distances to the destinations are 50km and 100km respectively, I would like the query to return the value 150.
I know I can use something like ...WHERE drivers LIKE '%jill%' AND executed = 1 to get the executed events where Jill was driving, and SUM() to get the total distance, but how do I combine the two tables and get it all to work?
Your help is very much appreciated!
/Linus
I haven't used MySQL for years, so sorry if I've got the syntax wrong, but something like this should do it:
In generic SQL:
select sum(distance) from route
join event on route.name = event.destination
where drivers like '%jill%' AND executed = 1
Or not using JOIN:
select sum(distance) from route, event
where drivers like '%jill%' AND executed = 1
and route.name = event.destination
Stuart's answer shows you how to get the sum of the column, but I just want to note that:
...WHERE drivers LIKE '%jill%'...
will match any event with a driver whose name contains the letters 'jill' (for example 'jillian'), not just Jill's events.
Secondly, this database design doesn't seem to be normalized. You have driver names and route names repeated. If you normalize the database and have something like:
participant
id | name | role
event
id | date | route_id | description | executed
route
id | name | distance
participant_event
id | participant_id | event_id
then it would be a lot easier to work with the data.
Then if you wanted to implement a user search, you could make the query:
SELECT id FROM participant WHERE
name LIKE '%jill%' AND
role='driver';
Then if the query returns more than one result, let the user/application choose the correct driver and then run a SELECT SUM like Stuart's query:
SELECT SUM(r.distance) FROM route r
JOIN event e ON e.route_id=r.id
JOIN participant_event pe ON e.id=pe.event_id
JOIN participant p ON pe.participant_id=p.id
WHERE p.id=?;
Otherwise, the only way to ensure that you're only getting the total distance driven by one driver is to do something like this (drivers is colon-delimited, as in "jack:jill:john"):
...WHERE LCASE(drivers)='jill' OR
LCASE(drivers) LIKE 'jill:%' OR
LCASE(drivers) LIKE '%:jill' OR
LCASE(drivers) LIKE '%:jill:%';
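Since the question stores drivers as a colon-delimited string such as "jack:jill:john", the separate patterns can be collapsed into one by wrapping both the list and the name in delimiters. The sketch below tries this in SQLite through Python; the routes, events and the total_distance helper are invented sample data, and in MySQL the || concatenation would be written with CONCAT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE route (name TEXT PRIMARY KEY, distance INTEGER);
CREATE TABLE event (id INTEGER PRIMARY KEY, destination TEXT,
                    drivers TEXT, executed INTEGER);
-- Hypothetical sample data; only the columns needed here are included.
INSERT INTO route VALUES ('lake', 50), ('city', 100), ('coast', 70);
INSERT INTO event VALUES
    (1, 'lake',  'jack:jill:john', 1),
    (2, 'city',  'jill',           1),
    (3, 'coast', 'jillian:jack',   1),   -- must not count for jill
    (4, 'coast', 'jill:jack',      0);   -- not executed yet
""")

# ':jack:jill:john:' contains ':jill:' but ':jillian:jack:' does not.
TOTAL_SQL = """
SELECT COALESCE(SUM(r.distance), 0)
FROM event AS e
JOIN route AS r ON r.name = e.destination
WHERE e.executed = 1
  AND (':' || LOWER(e.drivers) || ':') LIKE ('%:' || LOWER(?) || ':%')
"""

def total_distance(connection, username):
    """Total distance of executed events where `username` is a listed driver."""
    return connection.execute(TOTAL_SQL, (username,)).fetchone()[0]

print(total_distance(conn, "jill"))
```

A username that itself contains % or _ would still act as a wildcard here, which is one more reason the normalized schema above is the more robust fix.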