Storing csv in MySQL field – bad idea? - mysql

I have two tables, one user table and an items table. In the user table, there is the field "items". The "items" table only consists of a unique id and an item_name.
Now each user can have multiple items. I wanted to avoid creating a third table that would connect the items with the user but rather have a field in the user_table that stores the item ids connected to the user in a "csv" field.
So any given user would have a field "items" that could have a value like "32,3,98,56".
It may be worth mentioning that the maximum number of items per user is rather limited (<5).
The question: Is this approach generally a bad idea compared to having a third table that contains user->item pairs?
Wouldn't a third table create quite an overhead when you want to find all items of a user? (I would have to iterate through all elements returned by MySQL individually.)

You don't want to store the values in comma-separated form.
Consider the case where you decide to join this column with some other table.
Suppose you have:
x items
1 1, 2, 3
1 1, 4
2 1
and you want to find distinct values for each x i.e.:
x items
1 1, 2, 3, 4
2 1
or maybe you want to check whether it has 3 in it,
or maybe you want to convert them into separate rows:
x items
1 1
1 2
1 3
1 1
1 4
2 1
It will be a HUGE PAIN.
Use at least first normal form: have a separate row for each value.
Now, say originally you had this as your table:
x item
1 1
1 2
1 3
1 1
1 4
2 1
You can easily convert it into csv values:
select x, group_concat(item order by item) items
from t
group by x
If you want to check whether x = 1 has item 3, it's easy:
select * from t where x = 1 and item = 3
which in the earlier case would need the horrible find_in_set:
select * from t where x = 1 and find_in_set(3, items);
If you think you can use like with CSV values to search, then first, like '%x%' can't use indexes. Second, it will produce wrong results.
Say you want to check whether item ab is present and you search with %ab%: it will also return rows containing abc, abcd, abcde, and so on.
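A quick way to see the wrong results is to run both kinds of match against a few rows. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name users_csv and its rows are made up. SQLite concatenates with ||, where MySQL would use CONCAT().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_csv (x INTEGER, items TEXT)")
conn.executemany(
    "INSERT INTO users_csv VALUES (?, ?)",
    [(1, "32,3,98"), (2, "13,56"), (3, "7,30")],  # made-up sample rows
)

# Naive substring search for item 3: it also matches 13 and 30.
naive = [row[0] for row in conn.execute(
    "SELECT x FROM users_csv WHERE items LIKE '%3%' ORDER BY x")]

# Delimiter-aware search: wrap the list in commas and look for ',3,'.
# It returns the right rows but still cannot use an index on items.
exact = [row[0] for row in conn.execute(
    "SELECT x FROM users_csv "
    "WHERE ',' || items || ',' LIKE '%,3,%' ORDER BY x")]

print(naive)
print(exact)
```

Only the first row really contains item 3, yet the naive pattern reports all three.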
If you have many users and items, then I'd suggest creating a separate table users with a PK userid, another table items with PK itemid, and lastly a mapping table user_item with userid and itemid columns.
If you know you'll only need to store and retrieve these values and never do any operation on them such as join, search, distinct, or conversion to separate rows, then maybe, just maybe, you can (I still wouldn't).
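To make the three-table layout concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names follow the suggestion above; the user names, item names, and the helper functions items_of and users_with are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items (itemid INTEGER PRIMARY KEY, item_name TEXT);
-- Mapping table: one row per (user, item) pair.
CREATE TABLE user_item (
    userid INTEGER NOT NULL REFERENCES users(userid),
    itemid INTEGER NOT NULL REFERENCES items(itemid),
    PRIMARY KEY (userid, itemid)
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO items VALUES (3, 'lamp'), (32, 'chair'), (56, 'desk'), (98, 'rug');
INSERT INTO user_item VALUES (1, 32), (1, 3), (1, 98), (1, 56), (2, 3);
""")

def items_of(userid):
    """All item names of one user, fetched with a single join."""
    rows = conn.execute(
        "SELECT i.item_name FROM user_item ui "
        "JOIN items i ON i.itemid = ui.itemid "
        "WHERE ui.userid = ? ORDER BY i.itemid", (userid,))
    return [row[0] for row in rows]

def users_with(itemid):
    """All users that own a given item: a plain equality lookup."""
    rows = conn.execute(
        "SELECT userid FROM user_item WHERE itemid = ? ORDER BY userid",
        (itemid,))
    return [row[0] for row in rows]

print(items_of(1))
print(users_with(3))
```

One query returns all items of a user, so no per-element iteration over a parsed string is needed on the application side.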

Storing complex data directly in a relational database is a nonstandard use of a relational database. Normally they are designed for normalized data.
There are extensions which vary according to the brand of software which may help. Or you can normalize your CSV file into properly designed table(s). It depends on lots of things. Talk to your enterprise data architect in this case.
Whether it's a bad idea depends on your business needs. I can't assess your business needs from way out here on the internet. Talk to your product manager in this case.

Related

Multi Ordering 4 SQL columns with a single query

Environment: MySQL 5.6
SqlTable name = CategoryTable
Sql Columns
CATEGORY_ID (INT)
CATEGORY_NAME (VARCHAR)
LEVEL (INT)
MOTHER_CATEGORY (INT)
I've tried with
SELECT
CATEGORY_ID, CATEGORY_NAME , LEVEL , MOTHER_CATEGORY
FROM
CategoryTable
But I don't know how to use the ORDER BY in order to get that result.
The first line below lists the columns, and the table content starts from the second line:
CATEGORY_ID CATEGORY_NAME LEVEL MOTHER_CATEGORY
1 MainCategory 0 0
2 -SubCategory1 1 1
3 --SubCategory2 2 2
4 ---SubCategory3 3 3
5 2Nd_Main_Category 0 0
6 -SubCategory1 1 5
7 --SubCategory2 2 6
8 ---SubCategory3 3 7
Is there a way to achieve something like this with a MySQL query?
You aren't very clear about what you are trying to achieve. I'll guess that you want to order rows by a multi-level parent-child structure. There are some very complicated ways of handling this in MySQL 5.6, a database that isn't really ideal for such a structure, but I have come up with something simple that I use in my own apps: create a special ordering field that holds a path of zero-filled ids for each record.
ordering_path_field
/
/0000000001/
/
/0000000001/0000000002
/0000000003
/0000000003/0000000005
/0000000003/0000000005/0000000006
etc
So each record contains the path of every parent up to the root, using zero-filled ids. Then you can just sort by this field to get the rows in the proper order. The drawbacks are that you have to set a maximum number of levels so that the ordering field doesn't overflow, and that moving a record to a new parent, if ever needed, would be a big pain, because the paths of all its descendants have to be rewritten too.
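As a rough sketch of how such a path could be built and used for sorting, here is a small Python example. It uses the rows from the question plus one extra row (id 9, a second child of category 1) that I made up so the sort order differs from plain id order. The function name ordering_path and the width of 10 digits are assumptions, and it expects root rows to have MOTHER_CATEGORY = 0 as in the question.

```python
# (category_id, category_name, level, mother_category)
rows = [
    (1, "MainCategory", 0, 0),
    (2, "-SubCategory1", 1, 1),
    (3, "--SubCategory2", 2, 2),
    (4, "---SubCategory3", 3, 3),
    (5, "2Nd_Main_Category", 0, 0),
    (6, "-SubCategory1", 1, 5),
    (7, "--SubCategory2", 2, 6),
    (8, "---SubCategory3", 3, 7),
    (9, "-LateSubCategory", 1, 1),  # hypothetical row, not in the question
]

parents = {row[0]: row[3] for row in rows}

def ordering_path(category_id, parents, width=10):
    """Build '/<root>/.../<self>' from zero-filled ids, walking up to the root."""
    parts = []
    current = category_id
    while current != 0:  # 0 marks "no parent" in the question's data
        parts.append(str(current).zfill(width))
        current = parents[current]
    return "/" + "/".join(reversed(parts))

# Sorting on the path string keeps every subtree together under its parent.
ordered = sorted(rows, key=lambda row: ordering_path(row[0], parents))
ordered_ids = [row[0] for row in ordered]
print(ordered_ids)
```

In the database, the same string would be stored in the ordering field when a row is inserted, so the query itself only needs ORDER BY on that column.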

How to design a MySQL database for storing sell items

I have been trying to design a MySQL table to store the items purchased by the store's customers. I am stuck on which approach I should take to design a good table.
My first option is:
id
bill_id_fk
item1_id
item2_id
item3_id
item4_id
In this approach, I'll create maybe 20 columns for items (assuming that a customer may buy a maximum of 20 items at a time). The IDs of the items will be stored in the item(n)_id columns for that specific bill_id_fk.
My concern with this approach is that it would be difficult to query later for a specific item, like how many times a specific item has been sold.
My second option is:
id bill_id_fk item_id
1 1 23
2 1 29
3 2 23
In this approach, I'll create just 3 columns, and for each item I'll create a row with the bill_id_fk of its bill.
In this approach, it is easier to query for the number of sales of a specific item. But my concern is that it creates thousands and thousands of rows as the app is used; how will that affect the performance of the app over time?
I'd like your opinion on the best practice for designing such a database. Or is there another approach I should take?
There's no reason to go with the first option; the second is the best approach for your case.
It will not hurt performance as long as you index the right columns.
You can also add a column to your bills table that holds the number of items, for example:
bills (id - total_price - user_id - item_counts)
bill_items (id - bill_id - item_id - item_price)
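To illustrate the one-row-per-item design with indexes, here is a small sketch using Python's built-in sqlite3 module in place of MySQL. The rows are the three from the question; the index names are made up, and item_price is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bill_items (
    id      INTEGER PRIMARY KEY,
    bill_id INTEGER NOT NULL,
    item_id INTEGER NOT NULL
);
-- Index the columns you filter and group on.
CREATE INDEX idx_bill_items_item ON bill_items (item_id);
CREATE INDEX idx_bill_items_bill ON bill_items (bill_id);
INSERT INTO bill_items VALUES (1, 1, 23), (2, 1, 29), (3, 2, 23);
""")

# How many times has each item been sold?
times_sold = dict(conn.execute(
    "SELECT item_id, COUNT(*) FROM bill_items GROUP BY item_id"))

# Which items are on bill 1?
items_on_bill_1 = [row[0] for row in conn.execute(
    "SELECT item_id FROM bill_items WHERE bill_id = ? ORDER BY item_id", (1,))]

print(times_sold)
print(items_on_bill_1)
```

With the item1_id ... item20_id layout, the first query would have to inspect every one of the 20 columns.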

MySQL - counter query

I have two tables, users and user_demographics.
users has the basic structure of (does have more fields but not needed here):
id name email gender age ethnicity
1 test1 test1#test.com 1 1 1
2 test2 test2#test.com 1 2 1
3 test3 test3#test.com 2 3 2
4 test4 test4#test.com 3 1 1
5 test5 test5#test.com 2 4 5
Gender:
1 - Male,
2 - Female,
3 - Prefer not to say
Age:
1 - 16-20,
2 - 21-24,
3 - 25-30,
4 - 31-24
Ethnicity:
1 - White,
2 - Black,
5 - Prefer not to say
and so on and currently have around 1000 users.
user_demographics structure is:
coreid, type, option (for the sake of this question 'type' will be text, just to make it clearer)
coreid type option
1 gender 1
1 gender 2
1 age 1
1 age 3
1 ethnicity 1
2 gender 2
2 gender 3
2 age 3
3 gender 1
On a web-based form I have 3 sets of checkbox lists, one for each option (gender, age, ethnicity), and a user can select multiple from each. They click update and these details are stored in the MySQL database as above. coreid is related to another table, but that is not relevant here.
What I'm trying to do is get a total count of users for each coreid regardless of what type it is. The count should get smaller the more options you select. So coreid 3 should have the biggest count because I've only selected one option.
Example: coreid 3 is selecting all males
Example: coreid 2 is selecting all (females AND 'prefer not to say') AND age range 25-30
I'm struggling with how to create a single query that will give me the results I need; I hope this makes sense.
The idea behind the overall system is that we have a large form that a user fills out, and we store the information in the users table. Then a member of the admin team can go in and select these users by selecting options from the various demographics information we have collected. So they might just want to see everyone that has ticked the gender options of 'male' and 'prefer not to say', for example. Another admin member may go in and say they want all males between the ages of 25 and 30. Or they could just tick all options under gender. The idea is that they can select any combination and get a list of results. At the minute I just need to get a count back for the combination selected.
By the sounds of it, you are probably going to need dynamic SQL, where you build the query on the fly and then execute it. To clarify, what I think you are asking is as follows. A CoreID is like a set of filters that some manager uses to get a count and/or details of specific users. For CoreID = 1 they are interested in
EITHER gender condition (1 or 2)
AND EITHER age condition (1 or 3)
AND just the one ethnicity
to possibly target products that might hit those demographics. So you would pre-query every record for CoreID = 1 and then start building your query. You would want to order that pre-query by the TYPE so that common items such as the gender, age, and ethnicity categories are grouped together.
Then, within your application code (language not indicated: C#, VB, Java, whatever), you would build the query so that conditions within the same category are ORed inside parentheses and different categories are joined with a logical AND, such as
where
( Gender = 1
OR Gender = 2 )
AND ( Age = 1
OR Age = 3 )
AND ( Ethnicity = 1 )
If you are trying to write this as a MySQL stored procedure, it would be a type of dynamic SQL query. Either way, the WHERE clause needs to be constructed from the Core criteria someone is looking for.
You are correct that the last one would be the easiest: CoreID = 3 would be a simple
WHERE ( Gender = 1 )
Clarify which language you are using and I or others might be able to offer additional direction. You should try to write your own first pass of the code, but here is some pseudocode to start from:
Get records for one CoreID, ordered by the type of criteria.
prep variable identifying if there is a pending open parenthesis
prep variable identifying the last "type" being built for
for each record
    if new type
        if has open parenthesis
            add closing parenthesis
            add logical AND before the next entry we are getting
        add open parenthesis
        set flag that we have an open parenthesis
        remember this type as the last type
    else
        since same type as last type, add logical OR
    add the condition for this record
    go to next record, repeat
after the last record, always close the parenthesis, even for a single criterion
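Since the language was never stated, here is one possible Python rendering of that pseudocode, offered only as a sketch. It groups the criteria by type instead of tracking an open-parenthesis flag, binds the option values as parameters, and checks the column names against a whitelist because column names cannot be bound. The names build_where, count_users, and ALLOWED_COLUMNS are made up for the example, and sqlite3 stands in for MySQL; the sample rows are the ones from the question.

```python
import sqlite3
from itertools import groupby

ALLOWED_COLUMNS = {"gender", "age", "ethnicity"}  # columns that may be filtered

def build_where(criteria):
    """Turn (type, option) pairs into '(a = ? OR a = ?) AND (b = ?)' plus params."""
    groups, params = [], []
    # Sorting puts equal types next to each other, as the pseudocode requires.
    for column, pairs in groupby(sorted(criteria), key=lambda pair: pair[0]):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected criteria type: {column}")
        options = [option for _, option in pairs]
        groups.append("(" + " OR ".join(f"{column} = ?" for _ in options) + ")")
        params.extend(options)
    return " AND ".join(groups), params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, gender INTEGER, "
             "age INTEGER, ethnicity INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 1), (2, 1, 2, 1), (3, 2, 3, 2), (4, 3, 1, 1), (5, 2, 4, 5)])

# user_demographics rows from the question, keyed by coreid.
demographics = {
    1: [("gender", 1), ("gender", 2), ("age", 1), ("age", 3), ("ethnicity", 1)],
    2: [("gender", 2), ("gender", 3), ("age", 3)],
    3: [("gender", 1)],
}

def count_users(coreid):
    where, params = build_where(demographics[coreid])
    sql = "SELECT COUNT(*) FROM users WHERE " + (where or "1 = 1")
    return conn.execute(sql, params).fetchone()[0]

print(build_where(demographics[1]))
print({coreid: count_users(coreid) for coreid in demographics})
```

For the five sample users, coreid 3 (all males) matches two users, while the narrower coreid 1 and coreid 2 match one each.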

database schema one column entry references many rows from another table

Let's say we have a table called Workorders and another table called Parts. I would like to have a column in Workorders called parts_required. This column would contain a single item that tells me what parts were required for that workorder. Ideally, this would contain the quantities as well, but a second column could contain the quantity information if needed.
Workorders looks like
WorkorderID date parts_required
1 2/24 ?
2 2/25 ?
3 3/16 ?
4 4/20 ?
5 5/13 ?
6 5/14 ?
7 7/8 ?
Parts looks like
PartID name cost
1 engine 100
2 belt 5
3 big bolt 1
4 little bolt 0.5
5 quart oil 8
6 Band-aid 0.1
Idea 1: create a string like '1-1:2-3:4-5:5-4'. My application would parse this string and show that I need --> 1 engine, 3 belts, 5 little bolts, and 4 quarts of oil.
Pros - simple enough to create and understand.
Cons - will make deep introspection into our data much more difficult. (costs over time, etc)
Idea 2: use a binary number. For example, to reference the above list (engine, belt, little bolts, oil) using an 8-bit integer would be 54, because 54 in binary representation is 110110.
Pros - datatype is optimal concerning size. Also, I am guessing there are tricky math tricks I could use in my queries to search for parts used (don't know what those are, correct me if I'm in the clouds here).
Cons - I do not know how to handle quantity using this method. Also, even a 64-bit BIGINT gives me only 64 parts that can be in my table, and I expect many hundreds.
Any ideas? I am using MySQL. I may be able to use PostgreSQL, and I understand that they have more flexible datatypes like JSON and arrays, but I am not familiar with how querying those would perform. Also it would be much easier to stay with MySQL
Why not create a Relationship table?
You can create a table named Workorders_Parts with the following columns:
workorderId | partId
So when you want to get all parts from a specific workorder you just type:
select p.name
from parts p inner join workorders_parts wp on wp.partId = p.partId
where wp.workorderId = x;
What the query says is:
Give me the names of the parts that belong to workorderId = x and are listed in the table workorders_parts.
Remember that INNER JOIN works like an intersection; in other words, the data I'm looking for (generally the id) must exist in both tables.
It will give you all the part names that are used in workorder x.
Let's say we have workorderId = 1 with partId = 1, 2, 3; it will be represented in our relationship table as:
workorderId | partId
1 | 1
1 | 2
1 | 3
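The question also asked about quantities. One way to handle them, which is my assumption and not part of the table shown above, is an extra quantity column on the relationship table. The sketch below tries this with Python's built-in sqlite3 module standing in for MySQL, using the parts list from the question and the quantities from its '1-1:2-3:4-5:5-4' example; the helpers parts_for and total_cost are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (partId INTEGER PRIMARY KEY, name TEXT, cost REAL);
CREATE TABLE workorders_parts (
    workorderId INTEGER NOT NULL,
    partId      INTEGER NOT NULL REFERENCES parts(partId),
    quantity    INTEGER NOT NULL DEFAULT 1,  -- assumed extra column
    PRIMARY KEY (workorderId, partId)
);
INSERT INTO parts VALUES
    (1, 'engine', 100), (2, 'belt', 5), (3, 'big bolt', 1),
    (4, 'little bolt', 0.5), (5, 'quart oil', 8), (6, 'Band-aid', 0.1);
-- Workorder 1 needs 1 engine, 3 belts, 5 little bolts and 4 quarts of oil.
INSERT INTO workorders_parts VALUES (1, 1, 1), (1, 2, 3), (1, 4, 5), (1, 5, 4);
""")

def parts_for(workorder_id):
    """Part names and quantities required by one workorder."""
    return conn.execute(
        "SELECT p.name, wp.quantity FROM parts p "
        "INNER JOIN workorders_parts wp ON wp.partId = p.partId "
        "WHERE wp.workorderId = ? ORDER BY p.partId", (workorder_id,)).fetchall()

def total_cost(workorder_id):
    """Sum of cost * quantity over all parts of one workorder."""
    return conn.execute(
        "SELECT SUM(p.cost * wp.quantity) FROM parts p "
        "INNER JOIN workorders_parts wp ON wp.partId = p.partId "
        "WHERE wp.workorderId = ?", (workorder_id,)).fetchone()[0]

print(parts_for(1))
print(total_cost(1))
```

Questions such as cost over time then become ordinary joins and aggregates instead of string parsing in the application.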

Storing Hierarchical Data (MySQL) for Referral Marketing

I need a 5-level hierarchy for the users registered on a website. Every user is invited by another, and I need to know all descendants of a user, and also all ancestors of a user.
I have two solutions in mind.
Keeping a table with relationships this way. A closure table:
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
2 3 1
Keeping the 5 levels of ancestors in a table. An "ancestors" table:
user_id ancestor_level1_id ancestor_level2_id ancestor_level3_id ancestor_level4_id ancestor_level5_id
10 9 7 4 3 2
9 7 4 3 2 1
Are these good ideas?
I know about "the adjacency list model" and "the modified preorder tree traversal algorithm", but are these good solutions for a "referral" system?
The queries that I need to perform on this tree are:
frequently adding new users
when a user buys something, their referrers get a percentage commission
every user should be able to find out how many people they've referred (and how many people were referred by people who they referred....) at each level
Closure Table
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
2 3 1
To add user 10, referred by user 3. (I don't think you need to lock the table between these two insertions):
insert into ancestor_table
select ancestor_id, 10, distance+1
from ancestor_table
where descendant_id=3;
insert into ancestor_table values (10,10,0);
To find all users referred by user 3.
select descendant_id from ancestor_table where ancestor_id=3;
To count those users by depth:
select distance, count(*) from ancestor_table where ancestor_id=3 group by distance;
To find the ancestors of user 10.
select ancestor_id, distance from ancestor_table where descendant_id=10;
The drawback of this method is the amount of storage space this table will take.
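To check how these closure-table queries behave together, here is a small sketch that runs them through Python's built-in sqlite3 module (a stand-in for MySQL). It starts from the seven rows shown above and adds user 10 as referred by user 3. The helper names are made up, wrapping the two inserts in one transaction is my own choice, and the lookups filter on distance > 0 so that a user's own self-row is not counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE ancestor_table (
    ancestor_id   INTEGER NOT NULL,
    descendant_id INTEGER NOT NULL,
    distance      INTEGER NOT NULL,
    PRIMARY KEY (ancestor_id, descendant_id)
)""")
# Rows from the table above: self-rows for users 1..6, plus 2 referred 3.
conn.executemany("INSERT INTO ancestor_table VALUES (?, ?, ?)",
                 [(n, n, 0) for n in range(1, 7)] + [(2, 3, 1)])

def add_user(new_id, referrer_id):
    """Copy the referrer's ancestor rows one level deeper, then add the self-row."""
    with conn:  # both inserts commit together or not at all
        conn.execute(
            "INSERT INTO ancestor_table "
            "SELECT ancestor_id, ?, distance + 1 FROM ancestor_table "
            "WHERE descendant_id = ?", (new_id, referrer_id))
        conn.execute("INSERT INTO ancestor_table VALUES (?, ?, 0)", (new_id, new_id))

def ancestors_of(user_id):
    return conn.execute(
        "SELECT ancestor_id, distance FROM ancestor_table "
        "WHERE descendant_id = ? AND distance > 0 ORDER BY distance",
        (user_id,)).fetchall()

def referred_by(user_id):
    return [row[0] for row in conn.execute(
        "SELECT descendant_id FROM ancestor_table "
        "WHERE ancestor_id = ? AND distance > 0 ORDER BY descendant_id", (user_id,))]

def counts_by_depth(user_id):
    return dict(conn.execute(
        "SELECT distance, COUNT(*) FROM ancestor_table "
        "WHERE ancestor_id = ? AND distance > 0 GROUP BY distance", (user_id,)))

add_user(10, 3)
print(ancestors_of(10))
print(referred_by(2))
print(counts_by_depth(2))
```

For the commission use case, ancestors_of gives each referrer together with its level, so the percentage can be chosen by distance.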
Use the OQGRAPH storage engine.
You probably want to keep track of an arbitrary number of levels, rather than just 5 levels. Get one of the MySQL forks that supports the OQGRAPH engine (such as MariaDB or OurDelta), and use that to store your tree. It implements the adjacency list model, but by using a special column called latch to send a command to the storage engine, telling it what kind of query to perform, you get all of the advantages of a closure table without needing to do the bookkeeping work each time someone registers for your site.
Here are the queries you'd use in OQGRAPH. See the documentation at
http://openquery.com/graph-computation-engine-documentation
We're going to use origid as the referrer, and destid as the referred user.
To add user 11, referred by user 10
insert into ancestors_table (origid,destid) values (10,11)
To find all users referred by user 3.
SELECT linkid FROM ancestors_table WHERE latch = 2 AND origid = 3;
To find the ancestors of user 10.
SELECT linkid FROM ancestors_table WHERE latch = 2 AND destid = 10;
To find the number of users at each level, referred by user 3:
SELECT count(linkid), weight
FROM ancestors_table
WHERE latch = 2 AND origid = 3
GROUP BY weight;
Managing Hierarchical Data in MySQL
In general, I like the "nested set" model, especially in MySQL, which doesn't really have language support for hierarchical data.
It's fast, but you'll need to make sure your developers read that article if ease of maintenance is a big deal. It's very flexible, which doesn't seem to matter much in your case.
It seems a good fit for your problem: in the referral model, you need to find the tree of referrers, which is fast in the nested set model; you also need to know who the "children" of a given user are, and the depth of their relationship; this is also fast.
Delimited String of Ancestors
If you're strongly considering the 5-level relationship table, it may simplify things to use a delimited string of ancestors instead of 5 separate columns.
user_id depth ancestors
10 7 9,7,4,3,2,1
9 6 7,4,3,2,1
...
2 2 1
1 1 (empty string)
Here are some SQL commands you'd use with this model:
To add user 11, referred by user 10
insert into ancestors_table (user_id, depth, ancestors)
select 11, depth+1, concat(10,',',ancestors)
from ancestors_table
where user_id=10;
To find all users referred by user 3. (Note that this query can't use an index.)
select user_id
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3' or ancestors = '3';
To find the ancestors of user 10. You need to break up the string in your client program. In Ruby, the code would be ancestorscolumn.split(",").map{|x| x.to_i}. There's no good way to break up the string in SQL.
select ancestors from ancestors_table where user_id=10;
To find the number of users at each level, referred by user 3:
select
depth-(select depth from ancestors_table where user_id=3),
count(*)
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3' or ancestors = '3'
group by depth;
You can avoid SQL injection attacks in the like '%,3,%' parts of these queries by using like concat('%,', ?, ',%') instead and binding an integer for the user number to the placeholder.
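For the client-side splitting mentioned above, a Python counterpart of the Ruby one-liner could look like the sketch below. The function names parse_ancestors and is_descendant_of are made up. Once the string is split into integers, membership is an exact comparison, so 3 is never confused with 13 or 30 and the position in the list does not matter.

```python
def parse_ancestors(ancestors):
    """Split a delimited ancestors string into a list of integer ids."""
    # The root user has an empty string, which must give an empty list.
    return [int(part) for part in ancestors.split(",") if part]

def is_descendant_of(ancestors, user_id):
    """True if user_id appears anywhere in the ancestors string."""
    return user_id in parse_ancestors(ancestors)

print(parse_ancestors("9,7,4,3,2,1"))
print(is_descendant_of("13,30", 3))
print(is_descendant_of("3", 3))
```

This only helps after the rows have been fetched; filtering in SQL still needs the like patterns and still cannot use an index.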