How to group in MySQL a column with multiple choices under only one of them? - mysql

I have a table in which a certain column's data is sometimes missing, in which cases has just a dash.
But since there's another column which is unique, the missing data can be deduced from other rows.
In other words, I have something like this:
Unique model
Maker
Profit
abcd1234
-
56
zgh675
Company Y
40
abcd1234
Company X
3
zgh675
-
10
abcd1234
Company X
1
Which query can I use to automatically reach the following (the list of Makers is dynamic but every model can only go to one of them):
Unique model
Maker
Profit
abcd1234
Company X
60
zgh675
Company Y
50
?

You may aggregate by unique model and then take the MAX value of the maker, along with the sum of the profit:
SELECT model, MAX(maker) AS maker, SUM(profit) AS profit
FROM yourTable
GROUP BY model;
This approach should work assuming that each model only has one unique non NULL value.

Related

Storing csv in MySQL field – bad idea?

I have two tables, one user table and an items table. In the user table, there is the field "items". The "items" table only consists of a unique id and an item_name.
Now each user can have multiple items. I wanted to avoid creating a third table that would connect the items with the user but rather have a field in the user_table that stores the item ids connected to the user in a "csv" field.
So any given user would have a field "items" that could have a value like "32,3,98,56".
It maybe is worth mentioning that the maximum number of items per user is rather limited (<5).
The question: Is this approach generally a bad idea compared to having a third table that contains user->item pairs?
Wouldn't a third table create quite an overhead when you want to find all items of a user (I would have to iterate through all elements returned by MySQL individually).
You don't want to store the value in the comma separated form.
Consider the case when you decide to join this column with some other table.
Consider you have,
x items
1 1, 2, 3
1 1, 4
2 1
and you want to find distinct values for each x i.e.:
x items
1 1, 2, 3, 4
2 1
or may be want to check if it has 3 in it
or may be want to convert them into separate rows:
x items
1 1
1 2
1 3
1 1
1 4
2 1
It will be a HUGE PAIN.
Use atleast normalization 1st principle - have separate row for each value.
Now, say originally you had this as you table:
x item
1 1
1 2
1 3
1 1
1 4
2 1
You can easily convert it into csv values:
select x, group_concat(item order by item) items
from t
group by x
If you want to search if x = 1 has item 3. Easy.
select * from t where x = 1 and item = 3
which in earlier case would use horrible find_in_set:
select * from t where x = 1 and find_in_set(3, items);
If you think you can use like with CSV values to search, then first like %x% can't use indexes. Second, it will produce wrong results.
Say you want check if item ab is present and you do %ab% it will return rows with abc abcd abcde .... .
If you have many users and items, then I'd suggest create separate table users with an PK userid, another items with PK itemid and lastly a mapping table user_item having userid, itemid columns.
If you know you'll just need to store and retrieve these values and not do any operation on it such as join, search, distinct, conversion to separate rows etc. etc. - may be just may be, you can (I still wouldn't).
Storing complex data directly in a relational database is a nonstandard use of a relational database. Normally they are designed for normalized data.
There are extensions which vary according to the brand of software which may help. Or you can normalize your CSV file into properly designed table(s). It depends on lots of things. Talk to your enterprise data architect in this case.
Whether it's a bad idea depends on your business needs. I can't assess your business needs from way out here on the internet. Talk to your product manager in this case.

Substitute fields in a record in a group

In a matrix table, i have Part as Row and Vendor as Column and price as data, if for a part say 'bearing' there are two vendors say A and B selling for $ 10 and $ 8 respectively, i want the group aggregate column to display the name of the vendor who is the cheapest rather than the min price value across every row, how do i substitute the min price value with the name of the vendor.
Vendor
Part-------------- A ------------- B ------------------- Cheapest
Bearing --------10---------------- 8 ---------------------- B
One solution is to use a Lookup expression, with the minimum price as the value to match. This expression should give each vendor if there is a tie for the cheapest:
=Join(LookupSet(Min(Fields!PRICE.Value), Fields!PRICE.Value, Fields!VENDOR.Value, "pv"), " / ")
There may be a more elegant solution out there, but is the first one I found.

SSRS Show a percentage of a a total value

I have a regular Table in SSRS. With 3 Groups...
(Parent) STORE - CLERK - PRODUCT (Child)
I have some regular aggregations. How many PRODUCTS Sold by a CLERK , How Many CLERKS Per STORE and Eventually How many PRODUCTS Per STORE
On top of the Regular Sums And Avgs, I need To Find Out The Percentage of PRODUCT (Type) Meaning a Particular value of that Group.
Example STORE 001 Has Sold 10 RADIOS (a PRODUCT) and There has Been 100 RADIOS sold by all Stores
So Basically What I Need is to show STORE 001 is Responsible for 10% of all RADIO Sales.
(A note: Ideally , I would Like to show this To adjust to the Data - So if I add new products It will group those as products (Naturally) but still give me those percentages)
= fields!product.value / sum(fields!product.value)
in its most basic form you would want to use something like this.
The first will give you the total of the current row of data and the second will give you the total of all rows of that product.
Thus you would have 10 / 100 (per your example).
This is assuming that you have your data structured correctly. Depending on the structure of you report you may need to add a scope to your total summation to make sure that you are not totaling any other datasets that may reference the same product or field.
sum(fields!product.value, "--your dataset here--")

Populating with '0' when Data in SSRS Does not exist

I'm trying to create a report in SSRS where I have a matrix, which has gender as the column headings and specifically defined agegroups as the rows. The report is sorted by date (ie, the records being displayed are filtered by the modifedAt value). My problem is that i wish for all of the age group categories to be displayed, even if the dataset does not return any data for that row.
So, for example, if i set the date to be a date where there are no db rows where there are Age5-16 children in - I still want to display the category name, but just have the cells related to that row to display '0'. Instead, the report just drops the whole row because, obviously the query returns no data.
Is the solution to have a separate dataset that brings back the entire list of categories and then somehow fit them together? I'm stuck here so any help is appreciated!
I can think of a few ways to do this:
DataSet level
Instead of just returning the relevant data in the underlying data in the DataSet, include all the categories you want to display in all cases.
e.g. For a database query it might be the difference between an inner and left join, i.e. going from something like:
select *
from AgeGroup
inner join MyData on ...
to:
select *
from AgeGroup
left join MyData on ...
So the report always has all the age groups to display. Where there are NULL values, just display 0.
I think this is the best option if you have control over the DataSet - you won't have to update your report at all, with luck the actual DataSet changes should be minimal, there is still only one DataSet call, and it's by far the simplest to maintain.
Hard code groups into the report
Here you include a table header row for each group you want to display, so these are always displayed in all cases.
Here you have some sort of conditional expression to display the values, e.g. For each group row it will be tailored to that group:
=Sum(IIf(Fields!AgeGroup.Value = "5-16", Fields!Amount.Value, Nothing)
This is not too flexible and will need updates as you change groups, and doesn't have as many options for layout. There is still only one DataSet call, so that is a plus.
Subreports
You can have a parent DataSet that displays one row for each age group, then embed a subreport in each row that displays the data you want for that row.
This allows you flexibility in layout but it will add complexity to the report(s) and will mean that you make a lot of DataSet calls that could be avoided with other options.
I know this is old, but I wanted to elaborate on Ian's section 1 above using joins at the dataset level. (His answer was super helpful to me for a report I'm working on.)
per op:
Is the solution to have a separate dataset that brings back the entire list of categories and then somehow fit them together?
That is how I've handled it successfully, but you can do so without actually creating a separate dataset by using common table expressions (or temp tables, of course).
For these example tables:
AGE_Table
ID Group Group_Desc Toys
1 A 00-10 Teddy Bear
2 B 11-20 Video Game
3 C 21-30 Sports Car
4 D 31-40 Mansion
5 E 41-50 Jewelry
People_Table (filtered for whatever date)
ID Name Age Gender Age_Group
1 Ariel 07 F A
2 Brandon 23 M C
3 Chelsea 27 F C
4 Derek 06 M A
You want to see 2 results for the 00-10 row, 2 for the 21-30 row, and then still see rows for the other age groups even if there aren't any results.
We want to create a dataset with all the different age groupings and then join on it. Behold a solution using common table expressions:
with CTE_Age AS
(SELECT Distinct Age_Group from AGE_Table)
SELECT ID, Name, Age, Gender, CTE_Age.Age_Group FROM People_Table
RIGHT JOIN CTE_Age ON
People_Table.Age_Group = CTE_Age.Age_Group
This will return:
ID Name Age Gender Age_Group
1 Ariel 7 F A
4 Derek 6 M A
NULL NULL NULL NULL B
2 Brandon 23 M C
3 Chelsea 27 F C
NULL NULL NULL NULL D
NULL NULL NULL NULL E
Once you have that in your dataset, you can change NULL values to 0 on the report builder side -- I think in 2008R2 the default is just blank.

Database Design: how to model generic price factors of a product/service?

I'm trying to create a generic data model that will allow for a particular product (indicated by the FK product_id in the sample table below) to specify 0 or more price "factors" (I define "factor" as a unit of price added or subtracted in order to get the total).
So say there is this table:
===============================
price
===============================
price_id (PK)
product_id (FK)
label
operation (ENUM: add, subtract)
type (ENUM: amount, percentage)
value
A book's price might be represented this way:
====================================================================
price_id | product_id | label | operation | type | value
====================================================================
1 | 10 | Price | add | amount | 20
2 | 10 | Discount | subtract | percentage | .25
3 | 10 | Sales Tax | add | percentage | .1
This basically means:
Price: $20.00
Discount: - $5.00 (25%)
--------------------
Sub Total: $15.00
Sales Tax: $1.50 (10%)
------------------------
Total: $16.50
A few questions:
Is there anything obviously wrong with the initial design?
What if I wanted to create "templates" (e.g. "general merchandise" template that has "price", "discount" and "sales tax" fields; a "luxury merchandise" that has "price", "discount", "luxury tax" fields) - how would I model that?
The above model works if each record applies to the total of the preceeding record. So, in the example, "sales tax" applies to the difference of "price" and "discount". What if total was not computed that simply? For example: A + B + (A + 10%) - (B - 5%). How would I model that?
Also, what if the "percentage" type doesn't apply to the immediately preceeding row (as implied by question #3) and applied to more than 1 row? Do I need another table to itemize which price->price_id the percentage applies to?
First of all you need a model of price labels, which is simple:
price_labels
id | label
1 | Price
2 | Discount
3 | Tax
Then a slightly modified version of the sample table that you've given:
products_prices
price_id|product_id|label_id|divider|value
1 10 1 1 20
2 10 2 100 -25
3 10 3 100 10
Here I just substituted the label with the corresponding id from the price_labels table as a foreign key. Additionally, I omitted the type field which is trivial since value can be positive or negative float number. I added the divider column to enable the percentage parameter. I think it is more easily read this way as well, since you say (and think) "minus twenty-five percent" not 0.25 .
Now the expression "abstraction" part is a bit more complicated and there could be a lot of solutions.
price_expressions
product_id | date_from | date_until | expression
10 |2011-11-02 04:00:00 |2011-11-12 04:00:00 | (SELECT divider*value from
products_prices
WHERE product_id=%PRODUCT_ID%
AND label_id=1)*
(SELECT 1+value/divider from products_prices
where product_id=%PRODUCT_ID% AND
label_id=2)*
(SELECT 1+value/divider from products_prices
where product_id=%PRODUCT_ID% AND
label_id=3)
In the expression field you can store a complex SQL statement in which you can just replace the %PRODUCT_ID% placeholder with the product_id value from the same row:
SELECT REPLACE(expression,'%PRODUCT_ID%',CAST(product_id AS char))
AS price_expression FROM price_expressions
WHERE product_id = 10 AND date_from>=DATE_OF_PURCHASE
AND date_until<=DATE_OF_PURCHASE
There are two possible variations of this the way I see it:
You can change the product_id=%PRODUCT_ID% and label_id=N condition with just a price_id=N since you already have it stored in the products_prices table
You can use another expression format e.g. %PRICE_ID_1%*%PRICE_ID_2 and perform substitutions and calculations on the application level not directly in SQL
Hope this helps.
This seems a little over-engineered.
1) Wouldn't the sales tax percentage be a factor of where the item was purchased and not which item was purchased? I could see a field for "IsTaxable", but specifying the rate for each items seems incorrect.
2) Are you sure you need to incur the cost of making this generic? Are you already fairly certain there will be more factors in the future? If not, don't overcomplicate it.
Suggested Design:
- Add columns to the products table for IsTaxable, DiscountPct, and Unit Price.
- Store the Sales tax percentage in another table. Probably the invoice table.
Regarding your question 1:
There is a potential functional dependency between label, operation and type. For example, a discount might always imply subtraction and percentage. If so, the data model can be normalized by moving these fields to a separate table with label as a PK.
BTW, a de-normalized data model may be a legitimate tool for improving performance and/or simplicity.
Regarding your question 2:
Here is a model that allows easy "templating":
The final price of a product is calculated by applying the series of steps on PRICE, in order defined by STEP_NO. Multiple products can easily share the same "template" (i.e. the same PRICE_ADJUSTMENT_ID).
Regarding your questions 3 and 4:
You'd need to model a full expression tree, not just a "linear" series of steps. There are several ways to do that, most of them fairly complicated in relational paradigm. Perhaps the simplest one is to keep the data model similar to above, but treat it as Reverse Polish Notation.
For example...
A + B + (A + 10%) - (B - 5%)
...could be represented as:
OPERATION TYPE VALUE
---- ---- -----
value A
value B
add
value A
percentage 10
add
add
value B
percentage 5
subtract
subtract
Are you sure you actually need this kind of functionality?
If some price factors are dependent on the type of the item, then you'd have a set of price factors linked to entities in an ItemType table, and ItemType would be a property of the item entity (foreign key referencing ItemType). If other price factors are linked to the locale in which the item is being sold or shipped (e.g. sales tax), then those factors would be linked to Locale and would be invoked based on the customer's address. You would typically apply item-type factors at the line-item level, and locale-driven factors to the invoice total. Sin-tax would be linked to an ItemTypeLocale dyad, and applied at the line-item level.
1/ I think you also need to consider sequence
e.g. Price - discount + sales tax is obviously acceptable but Price +sales tax - discount is not nor is Price - (discount + sales tax)
2/ I would consider having price in another table. Is this not a detail of the item being sold? E.g. Widget, blue, $20.00. Whereas your factors are a detail of sales type. Presumably you could have one set of factors for a walk-in retail sale, another for a on-line sale and a third for a wholesale sale. You could calculate the actual price for these three sale types from the base price * factors.
3/ I think you need more tables; e.g. maybe Item, Sale type, and factor_details and factor_rules. It may be that your sale type is covered by your example of Luxury item in which case (if an item is only ever one sale type) this could be in the item table. Factor_rules would detail the calculation formula and factor_details the values.
I find this quite interesting. I would appreciate you updating this question with your experiences once you have worked this through.