Storing user data efficiently in a MySQL database

I am developing a recommendation engine, which requires storing a lot of data and keeping track of every move the user makes. My website is a product search engine, so the user data will consist of sets of search queries. Here are some examples of the data set:
Example
User1 :
1. Apple Ipod tOuch
2. Samsung Galaxy Ace Plus
3. HArry Porter
User2 :
1. Product1
2. Product2
and so on.
One (naive) way would be to associate an ID with each user and store a single string for that ID, with the queries separated by ~:
Unique ID - Apple IPod TOuch~Samsung Galaxy Ace Plus~HArry Porter
But this method won't be efficient, considering how I will be working with the data later on.
Can anyone suggest an efficient model that is fairly easy to implement in MySQL?
Please comment if my question is unclear.

The classic design is a table for users :
Users(user_id,user_name,reg_date....)
table for products :
Products(prod_id,prod_name,prod_cost....)
table with mapping user-->products :
User_products(user_id,prod_id ....)
Example :
Users :
user_id|user_name
1200 | User1
7856 | User2
Products :
prod_id | prod_name
12900 | Apple Ipod tOuch
45673 | Samsung Galaxy Ace Plus
99876 | HArry Porter
34590 | Product1
56283 | Product2
User_products :
user_id | prod_id
1200 |12900
1200 |45673
1200 |99876
7856 |34590
7856 |56283

Avoid strings separated by a delimiter: you would have to split and parse the stored data every time you work with it, and your search engine will become very slow once you have a really large amount of data.
I think Grisha is absolutely right - user or product searches (numeric id searches), joined with mapping tables will output the result much faster than searches through text/varchar fields, separating the results, etc.

Using the canonical approach as proposed by Grisha, the query 'who has product 1' would be represented thus
select users.user_name
from users inner join user_products on users.user_id = user_products.user_id
inner join products on products.prod_id = user_products.prod_id
where products.prod_name = 'Product1'
This may look complicated but it's actually very simple and very powerful. If there were another field in the user_products table such as purchase date, you could also find out when those users bought product 1, or find all the users who bought the product during a given period, by means of a simple extension to the query.
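The extension described above can be sketched as follows. This is only an illustration under assumptions: SQLite (via Python's standard sqlite3 module) stands in for MySQL so the snippet runs on its own, the purchase_date column and the buyers helper are hypothetical, and the purchase rows are made-up sample data.

```python
import sqlite3

# SQLite in memory stands in for MySQL; the SQL used here is plain and portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
CREATE TABLE products (prod_id INTEGER PRIMARY KEY, prod_name TEXT NOT NULL);
-- purchase_date is the hypothetical extra column mentioned in the answer
CREATE TABLE user_products (
    user_id       INTEGER NOT NULL REFERENCES users(user_id),
    prod_id       INTEGER NOT NULL REFERENCES products(prod_id),
    purchase_date TEXT    NOT NULL
);
INSERT INTO users VALUES (1200, 'User1'), (7856, 'User2');
INSERT INTO products VALUES (34590, 'Product1'), (56283, 'Product2');
-- made-up purchases, for illustration only
INSERT INTO user_products VALUES
    (7856, 34590, '2013-07-30'),
    (7856, 56283, '2013-08-02'),
    (1200, 34590, '2013-09-15');
""")

def buyers(prod_name, start, end):
    """Return (user_name, purchase_date) for purchases of prod_name in [start, end]."""
    return conn.execute(
        """
        SELECT users.user_name, user_products.purchase_date
        FROM users
        INNER JOIN user_products ON users.user_id = user_products.user_id
        INNER JOIN products ON products.prod_id = user_products.prod_id
        WHERE products.prod_name = ?
          AND user_products.purchase_date BETWEEN ? AND ?
        ORDER BY user_products.purchase_date
        """,
        (prod_name, start, end),
    ).fetchall()

july_buyers = buyers("Product1", "2013-07-01", "2013-07-31")
all_buyers = buyers("Product1", "2013-01-01", "2013-12-31")
```

Only the WHERE clause changes between "who bought it" and "who bought it in July"; the joins stay the same.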

Related

Storing combinations of item properties in database

I have a problem of such:
Let's say I have an item, a CUP for example. I want to sell it, but want to allow the user to pick CUP properties, such as Size, Color, Material. When the user will select Size (maybe Large), color (maybe Black) and Material (maybe Glass) then I need to show him, that we have 20 such Cups in warehouse and the cost is $25 each. And now: I don't know how to store those combinations in database.
Here is my ultra stupid solution:
For each combination I would have a pair of columns. However, adding a new combination or removing one would be painful, and I would have to map them somehow:
Id | Product Name | LargeBlackGlassPrice | LargeBlackGlassCount | SmallBlackGlassPrice | SmallBlackGlassCount | Medium...
It's a stupid idea, but so far I haven't hit on anything better :/
Hope it's clear what I want to achieve.
Thank you
Consider the following ERD:
The system administrator maintains a list of product categories, these may include, for example, cups. The administrator also maintains a list of features. These could include size, colour, material, and anything else that they decide is potentially important for any type of product. The administrator can then create an intersection of categories and features to indicate which features matter for a particular product category.
This establishes the "rules" for a catalogue of products: which types of products you have and what is important to know about each of these types of products.
Now, to store the products themselves, you have the SKU table. Each individual product, for example: Large Black Glass Cups is stored in this table. You can store the current price of this product here. You can also store the stock on hand here, although I've recommended elsewhere to never store stock quantity directly. Inventory management is not the basis of your question, however.
For any particular product (SKU) you then have a list of product features where the specific values of each specific product are stored. The features that matter are the ones defined by the product's category as listed in the CATEGORY_FEATURE table.
On your website, when a customer is searching for items in a PRODUCT_CATEGORY, (e.g. Cups) you show them the list of CATEGORY_FEATUREs that apply. For each feature, you can create a drop down list of possible values to choose from by using:
select distinct PF.value
from CATEGORY_FEATURE CF
inner join PRODUCT_FEATURE PF
on CF.product_category_id = PF.product_category_id
and CF.feature_id = PF.feature_id
where CF.product_category_id = CategoryOfInterest
and CF.feature_id = FeatureOfInterest
order by
PF.value
This design gives your administrator the ability to define new product categories and product features without having to make database schema or code changes.
Many people are likely to point out that this design uses the Entity-Attribute-Value (EAV) pattern, and they are equally likely to point out that EAV is EVIL. I agree in principle that EAV is to be avoided in almost all cases, but I have also asserted that in some cases, and in particular in the case of product catalogues, EAV is actually the preferred design.
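To make the design concrete, here is a rough sketch of how a customer's selection (Large, Black, Glass) could be resolved to one SKU with its price and stock. The ERD is not reproduced here, so the column names, the simplified sku and product_feature tables, the sample rows and the find_skus helper are all assumptions, and SQLite (Python's sqlite3 module) stands in for MySQL. Stock is kept on the SKU row only for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed, simplified versions of the tables named in the answer
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sku (
    sku_id              INTEGER PRIMARY KEY,
    product_category_id INTEGER NOT NULL,
    description         TEXT    NOT NULL,
    price               REAL    NOT NULL,
    stock_on_hand       INTEGER NOT NULL
);
CREATE TABLE product_feature (
    sku_id     INTEGER NOT NULL REFERENCES sku(sku_id),
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    value      TEXT    NOT NULL,
    PRIMARY KEY (sku_id, feature_id)   -- one value per feature per SKU
);
INSERT INTO feature VALUES (1, 'Size'), (2, 'Colour'), (3, 'Material');
INSERT INTO sku VALUES
    (10, 1, 'Large Black Glass Cup', 25.0, 20),
    (11, 1, 'Small Black Glass Cup', 18.0, 5);
INSERT INTO product_feature VALUES
    (10, 1, 'Large'), (10, 2, 'Black'), (10, 3, 'Glass'),
    (11, 1, 'Small'), (11, 2, 'Black'), (11, 3, 'Glass');
""")

def find_skus(category_id, choices):
    """Return SKUs in a category that match every chosen feature value."""
    if not choices:
        raise ValueError("at least one feature must be chosen")
    # One (name, value) test per chosen feature; HAVING keeps SKUs matching all of them.
    conditions = " OR ".join("(f.name = ? AND pf.value = ?)" for _ in choices)
    params = [category_id]
    for name, value in choices.items():
        params += [name, value]
    params.append(len(choices))
    sql = f"""
        SELECT s.description, s.price, s.stock_on_hand
        FROM sku AS s
        INNER JOIN product_feature AS pf ON pf.sku_id = s.sku_id
        INNER JOIN feature AS f ON f.feature_id = pf.feature_id
        WHERE s.product_category_id = ? AND ({conditions})
        GROUP BY s.sku_id, s.description, s.price, s.stock_on_hand
        HAVING COUNT(*) = ?
    """
    return conn.execute(sql, params).fetchall()

matches = find_skus(1, {"Size": "Large", "Colour": "Black", "Material": "Glass"})
```

Adding a new feature such as "Handle" would then only need new rows, not new columns.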
Table1 => Cup Master
Fields => Cup Id | Product Name
Example =>
1001 | CUP A
1002 | CUP B
Table2 => Property Master
Fields => Property_Id | Properties
Example =>
1 | LargeBlackGlass
2 | SmallBlackGlass
3 | MediumBlackGlass
Table3 => Inventory Master
Fields => Cup Id | Property_Id | count | price_per_piece
Example =>
CUP A | 1 | 3 | 45/=
CUP A | 2 | 2 | 40/=
CUP A | 3 | 2 | 35/=
CUP A | 1 | 3 | 45/=
CUP A | 2 | 2 | 40/=
NOTE: A cup with a particular property might be available and with other property might not.
Let's try to reason about how to solve your task. I will describe the general concept and split it into steps:
Define the types of products that you are going to sell: cup, plate, pan and so on. Create a table products with fields: id, name, price.
Define the colours of products: black, red, brown. Create a table products_colours with fields: id, name, price.
Define the sizes of products: small, medium, large. Create a table products_sizes with fields: id, name, price.
In the simple case, all products of one type have the same base price, which is stored in the table products.
In the simple case, the additional price for a colour or a size is the same for all types of products and is stored in the tables products_colours and products_sizes.
Create a table customers_products with fields: id, products_id, products_colours_id, products_sizes_id, quantity.
Write a query that joins all the tables together to fetch all products with their colours, sizes and prices from the database.
In the script, iterate through all rows and calculate the price of every product as the sum of the product price, the size price and the colour price.
To sum up: this is a very basic implementation that doesn't include things like brands, discounts and so on. However, it shows how you could scale your system when you add further attributes that affect the final price of a product.
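The steps above might look roughly like this. It is a sketch, not a definitive implementation: the table and field names follow the answer, but the prices and the single customers_products row are invented, and SQLite (Python's sqlite3 module) is used in place of MySQL so that the snippet is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products         (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE products_colours (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE products_sizes   (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE customers_products (
    id                  INTEGER PRIMARY KEY,
    products_id         INTEGER REFERENCES products(id),
    products_colours_id INTEGER REFERENCES products_colours(id),
    products_sizes_id   INTEGER REFERENCES products_sizes(id),
    quantity            INTEGER
);
-- invented sample prices: base price plus surcharges for colour and size
INSERT INTO products         VALUES (1, 'cup', 20), (2, 'plate', 12);
INSERT INTO products_colours VALUES (1, 'black', 2), (2, 'red', 1);
INSERT INTO products_sizes   VALUES (1, 'small', 0), (2, 'large', 3);
INSERT INTO customers_products VALUES (1, 1, 1, 2, 4);
""")

rows = conn.execute("""
    SELECT p.name, c.name, s.name, p.price, c.price, s.price, cp.quantity
    FROM customers_products AS cp
    INNER JOIN products         AS p ON p.id = cp.products_id
    INNER JOIN products_colours AS c ON c.id = cp.products_colours_id
    INNER JOIN products_sizes   AS s ON s.id = cp.products_sizes_id
    ORDER BY cp.id
""").fetchall()

# Final step from the answer: sum the three price components in the script.
priced = []
for product, colour, size, base, colour_price, size_price, quantity in rows:
    unit_price = base + colour_price + size_price
    priced.append((f"{size} {colour} {product}", unit_price, unit_price * quantity))
```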

mysql optimize data content: multi column or simple column hash data

I actually have a table with 30 columns. In one day this table can get around 3000 new records!
The column data looks like this:
IMG Name Phone etc..
http://www.site.com/images/image.jpg John Smith 123456789 etc..
http://www.site.com/images/image.jpg Smith John 987654321 etc..
I'm looking for a way to optimize the size of the table and also the response time of the SQL queries. I was thinking of doing something like this:
Column1
http://www.site.com/images/image.jpg|John Smith|123456789|etc..
And then via php i would store each value into an array..
Would it be faster ?
Edit
So to take an example of the structure, let's say i have two tables :
package
package_content
Here is the structure of the table package :
id | user_id | package_name | date
Here is the structure of the table package_content :
id | package_id | content_name | content_description | content_price | content_color | etc.. > 30columns
The thing is for each package i can get up to 16rows of content. For example :
id | user_id | package_name | date
260 11 Package 260 2013-7-30 10:05:00
id | package_id | content_name | content_description | content_price | content_color | etc.. > 30columns
1 260 Content 1 Content 1 desc 58 white etc..
2 260 Content 2 Content 2 desc 75 black etc..
3 260 Content 3 Content 3 desc 32 blue etc..
etc...
Then in PHP I do something like this:
select * from package
while not EOF {
show package name, date etc..
select * from package_content where package_content.package_id = package.id and package.id = package_id
while not EOF{
show package_content name, desc, price, color etc...
}
}
Would it be faster? Definitely not. If you needed to search by Name or Phone or etc... you'd have to pull those values out of Column1 every time. You'd never be able to optimize those queries, ever.
If you want to make the table smaller it's best to look at splitting some columns off into another table. If you'd like to pursue that option, post the entire structure. But note that the number of columns doesn't affect speed that much. I mean it can, but it's way down on the list of things that will slow you down.
Finally, 3,000 rows per day is about 1 million rows per year. If the database is tolerably well designed, MySQL can handle this easily.
Addendum: partial table structures plus sample query and pseudocode added to question.
The pseudocode shows the package table being queried all at once, then matching package_content rows being queried one at a time. This is a very slow way to go about things; better to use a JOIN:
SELECT
package.id,
user_id,
package_name,
date,
package_content.*
FROM package
INNER JOIN package_content on package.id = package_content.package_id
WHERE whatever
ORDER BY whatever
That will speed things up right away.
If you're displaying on a web page, be sure to limit results with a WHERE clause - nobody will want to see 1,000 or 3,000 or 1,000,000 packages on a single web page :)
Finally, as I mentioned before, the number of columns isn't a huge worry for query optimization, but...
Having a really wide result row means more data has to go across the wire from MySQL to PHP, and
It isn't likely you'll be able to display 30+ columns of information on a web page without it looking terrible, especially if you're reading lots of rows.
With that in mind, you'll be better off picking specific package_content columns in your query instead of picking them all with a SELECT *.
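Putting this advice together, a single JOIN with a narrow column list can replace the query-per-package loop. The sketch below uses Python's sqlite3 module instead of PHP and MySQL purely so that it runs on its own; the table layout follows the question, and the choice of which columns to select is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE package (
    id INTEGER PRIMARY KEY, user_id INTEGER, package_name TEXT, date TEXT
);
CREATE TABLE package_content (
    id INTEGER PRIMARY KEY,
    package_id INTEGER REFERENCES package(id),
    content_name TEXT, content_description TEXT,
    content_price INTEGER, content_color TEXT
);
INSERT INTO package VALUES (260, 11, 'Package 260', '2013-07-30 10:05:00');
INSERT INTO package_content VALUES
    (1, 260, 'Content 1', 'Content 1 desc', 58, 'white'),
    (2, 260, 'Content 2', 'Content 2 desc', 75, 'black'),
    (3, 260, 'Content 3', 'Content 3 desc', 32, 'blue');
""")

# One round trip: only the columns the page needs, filtered by a WHERE clause.
rows = conn.execute("""
    SELECT p.id, p.package_name, p.date, pc.content_name, pc.content_price
    FROM package AS p
    INNER JOIN package_content AS pc ON pc.package_id = p.id
    WHERE p.user_id = ?
    ORDER BY p.id, pc.id
""", (11,)).fetchall()

# Regroup the flat result so each package is shown once with its contents.
packages = {}
for package_id, name, date, content_name, content_price in rows:
    entry = packages.setdefault(
        package_id, {"name": name, "date": date, "contents": []}
    )
    entry["contents"].append((content_name, content_price))
```

The package columns are repeated on every row, which is the price of the single round trip; the regrouping loop hides that from the display code.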
Don't combine any columns; it is of no use and might even be slower in the end.
You should put indexes on the columns you query on. I have a website with a table of about 30 columns that currently holds around 600,000 rows. If you put EXPLAIN in front of a query, you can see whether it uses any indexes. If you have a JOIN on two columns and a WHERE on the same table, you should create a combined index on the three columns, ordered from JOIN to WHERE. If you join the same table more than once, treat each join as needing its own index.
For example:
SELECT p.name, p.id, c.name, c2.name
FROM product p
JOIN category c ON p.cat_id=c.id
JOIN category c2 ON c.parent_id=c2.id AND c2.name='Niels'
WHERE p.filterX='blaat'
You should have a combined index on category:
parent_id,name
AND an index on
id (probably the auto-increment primary key)
And indexes on product:
cat_id
filterX
With this easy solution you can optimize queries from NOT DOABLE to 0.10 seconds, or even faster.
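The effect of an index can be checked with the query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in; MySQL's EXPLAIN output has a different layout, and the index name idx_product_filterx and the plan_for helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, cat_id INTEGER, filterX TEXT)"
)

QUERY = "SELECT id FROM product WHERE filterX = 'blaat'"

def plan_for(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    # The description text is the last column of each EXPLAIN QUERY PLAN row.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan_for(QUERY)   # no usable index yet: a full table scan

# Hypothetical index name; the indexed column comes from the answer's example.
conn.execute("CREATE INDEX idx_product_filterx ON product (filterX)")

plan_after = plan_for(QUERY)    # the plan now names the index
```

Printing plan_before and plan_after side by side shows the switch from a scan to an index search, which is the same check the answer suggests doing with EXPLAIN in MySQL.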
If you use MySQL 5.6 you should switch to InnoDB, because MySQL is better at optimizing JOINs and subqueries there. MySQL will also try to keep the data in memory, which makes it a lot faster as well. Please keep in mind that backing up InnoDB tables might need some extra attention.
You might also think about using MEMORY tables for super fast querying (you still need indexes).
You can also optimize by choosing integer types by their storage size (an INT is 4 bytes; the 11 in INT(11) is only a display width) and by not always using VARCHAR(255).

MySQL, how to repeat same line x times

I have a query that outputs address order data:
SELECT ordernumber
, article_description
, article_size_description
, concat(NumberPerBox,' pieces') as contents
, NumberOrdered
FROM customerorder
WHERE customerorder.id = 1;
I would like the above line to be output NumberOrdered (e.g. 50,000) divided by NumberPerBox (e.g. 2,000) = 25 times.
Is there a SQL query that can do this? I'm not against using temporary tables to join against if that's what it takes.
I checked out the previous questions, however the nearest one:
is to be posible in mysql repeat the same result
Only gave answers that give a fixed number of rows, and I need it to be dynamic depending on the value of (NumberOrdered div NumberPerBox).
The result I want is:
Boxnr Ordernr as_description contents NumberOrdered
------+--------------+----------------+-----------+---------------
1 | CORDO1245 | Carrying bags | 2,000 pcs | 50,000
2 | CORDO1245 | Carrying bags | 2,000 pcs | 50,000
....
25 | CORDO1245 | Carrying bags | 2,000 pcs | 50,000
First, let me say that I am more familiar with SQL Server so my answer has a bit of a bias.
Second, I did not test my code sample and it should probably be used as a reference point to start from.
It would appear to me that this situation is a prime candidate for a numbers table. Simply put, it is a table (usually called "Numbers") that is nothing more than a single PK column of integers from 1 to n. Once you've used a Numbers table and are aware of how it works, you'll start finding many uses for it, such as querying for time intervals, string splitting, etc.
That said, here is my untested response to your question:
SELECT
IV.number as Boxnr
,customerorder.ordernumber
,article_description
,article_size_description
,concat(customerorder.NumberPerBox,' pieces') as contents
,NumberOrdered
FROM
customerorder
INNER JOIN (
SELECT
Numbers.number
,customerorder.ordernumber
,customerorder.NumberPerBox
FROM
Numbers
INNER JOIN customerorder
ON Numbers.number BETWEEN 1 AND customerorder.NumberOrdered / customerorder.NumberPerBox
WHERE
customerorder.id = 1
) AS IV
ON customerorder.ordernumber = IV.ordernumber
As I said, most of my experience is in SQL Server. I reference http://www.sqlservercentral.com/articles/Advanced+Querying/2547/ (registration required). However, there appear to be quite a few resources available when I search for "SQL numbers table".
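A runnable version of the numbers-table idea is sketched below. It is a simplification rather than the exact query above: it joins customerorder to the numbers table directly, without the derived table. SQLite (Python's sqlite3 module) stands in for MySQL, so string concatenation uses || instead of concat, and filling the numbers table up to 100 is an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE numbers (number INTEGER PRIMARY KEY);
CREATE TABLE customerorder (
    id INTEGER PRIMARY KEY,
    ordernumber TEXT,
    article_description TEXT,
    NumberPerBox INTEGER,
    NumberOrdered INTEGER
);
INSERT INTO customerorder VALUES (1, 'CORDO1245', 'Carrying bags', 2000, 50000);
""")
# The numbers table must reach at least the largest box count you expect.
conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(1, 101)])

rows = conn.execute("""
    SELECT n.number AS Boxnr,
           o.ordernumber,
           o.article_description,
           o.NumberPerBox || ' pieces' AS contents,   -- concat(...) in MySQL
           o.NumberOrdered
    FROM customerorder AS o
    INNER JOIN numbers AS n
        ON n.number BETWEEN 1 AND o.NumberOrdered / o.NumberPerBox
    WHERE o.id = 1
    ORDER BY n.number
""").fetchall()
```

Each order row is repeated once per number in the range, so the row count follows NumberOrdered / NumberPerBox. If the division can leave a remainder, decide explicitly whether a partly filled box should count as an extra row.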

MySQL Query Search using Multiple Rows

Firstly I'd like to start by apologizing for the potentially misleading title... I am finding it difficult to describe what I am trying to do here.
With the current project I'm working on, we have setup a 'dynamic' database structure with MySQL that looks something like this.
item_details ( Describes the item_data )
fieldID | fieldValue | fieldCaption
1 | addr1 | Address Line 1
2 | country | Country
item_data
itemID | fieldID | fieldValue
12345 | 1 | Some Random Address
12345 | 2 | United Kingdom
So as you can see, if for example I wanted to lookup the address for the item 12345 I would simply do the statement.
SELECT fieldValue FROM item_data WHERE fieldID=1 and itemID=12345;
But here is where I am stuck... the database is relatively large with around ~80k rows and I am trying to create a set of search functions within PHP.
I would like to be able to perform a query on the result set of a query as quickly as possible...
For example, search for an address within a certain country, i.e. search the fieldValue of the rows that have the same itemIDs as the results of this query:
SELECT itemID FROM item_data WHERE fieldID=2 AND fieldValue='United Kingdom';
Sorry If I am unclear, I have been struggling with this for the past couple of days...
Cheers
You can do this in a couple of ways. One is to use multiple joins to the item_data table with the fieldID limited to whatever it is you want to get.
SELECT *
FROM
Item i
INNER JOIN item_data country
ON i.itemID = country.itemID
and country.fieldID = 2
INNER JOIN item_data address
ON i.itemID = address.itemID
and address.fieldID = 1
WHERE
country.fieldValue= 'United Kingdom'
and address.fieldValue= 'Whatever'
As an aside, this structure is often referred to as an Entity-Attribute-Value (EAV) database.
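The multiple-join approach can be tried out with the sketch below. Several things are assumptions: the question shows no separate item table, so item_data is joined to itself; the search uses LIKE on the address; the extra sample rows are invented; and SQLite (Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_data (
    itemID     INTEGER NOT NULL,
    fieldID    INTEGER NOT NULL,   -- 1 = addr1, 2 = country (from item_details)
    fieldValue TEXT    NOT NULL,
    PRIMARY KEY (itemID, fieldID)
);
INSERT INTO item_data VALUES
    (12345, 1, 'Some Random Address'), (12345, 2, 'United Kingdom'),
    (12346, 1, '10 High Street'),      (12346, 2, 'United Kingdom'),
    (12347, 1, '5 Random Road'),       (12347, 2, 'France');
""")

def search_address(country, address_pattern):
    """Find addresses matching a LIKE pattern among items in one country."""
    return conn.execute(
        """
        SELECT address.itemID, address.fieldValue
        FROM item_data AS country
        INNER JOIN item_data AS address
            ON address.itemID = country.itemID
           AND address.fieldID = 1
        WHERE country.fieldID = 2
          AND country.fieldValue = ?
          AND address.fieldValue LIKE ?
        ORDER BY address.itemID
        """,
        (country, address_pattern),
    ).fetchall()

uk_random = search_address("United Kingdom", "%Random%")
```

Each additional search field costs one more join against item_data under its own alias.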
Sorry in advance if this sounds patronizing, but (as you suggested) I'm not quite clear what you are asking for.
If you are looking for one query to do the whole thing, you could simply nest them. For your example, pretend there is a table named CACHED with the results of your UK query, and write the query you want against that, but replace CACHED with your UK query.
If the idea is that you have ALREADY done this UK query and want to (re-)use its results, you could save the results to a table in the DB (which may not be practical if there are a large number of queries executed), or save the list of IDs as text and paste that into the subsequent query (...WHERE ID in (...) ... ), which might be OK if your 'cached' query gives you a manageable fraction of the original table.
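Both options can be sketched side by side: nesting the UK query as a subquery, and reusing a saved list of IDs in an IN (...) clause. This is an illustration only; SQLite (Python's sqlite3 module) stands in for MySQL, the sample rows are invented, and the IDs are passed as bound parameters rather than pasted in as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_data (itemID INTEGER, fieldID INTEGER, fieldValue TEXT);
INSERT INTO item_data VALUES
    (12345, 1, 'Some Random Address'), (12345, 2, 'United Kingdom'),
    (12346, 1, '10 High Street'),      (12346, 2, 'United Kingdom'),
    (12347, 1, '22 Station Street'),   (12347, 2, 'France');
""")

UK_QUERY = (
    "SELECT itemID FROM item_data "
    "WHERE fieldID = 2 AND fieldValue = 'United Kingdom'"
)

# Option 1: nest the UK query where the imaginary CACHED table would go.
nested = conn.execute(
    f"""
    SELECT itemID, fieldValue
    FROM item_data
    WHERE fieldID = 1 AND fieldValue LIKE ? AND itemID IN ({UK_QUERY})
    ORDER BY itemID
    """,
    ("%Street%",),
).fetchall()

# Option 2: run the UK query once, keep the IDs, and reuse them later.
cached_ids = [row[0] for row in conn.execute(UK_QUERY)]
# One placeholder per ID; guard against an empty list before using this in MySQL.
placeholders = ", ".join("?" for _ in cached_ids)
reused = conn.execute(
    f"""
    SELECT itemID, fieldValue
    FROM item_data
    WHERE fieldID = 1 AND fieldValue LIKE ? AND itemID IN ({placeholders})
    ORDER BY itemID
    """,
    ["%Street%", *cached_ids],
).fetchall()
```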

MySQL: How to pull information from multiple tables based on information in other tables?

Ok, I have 5 tables which I need to pull information from based on one variable.
gameinfo
id | name | platforminfoid
gamerinfo
id | name | contact | tag
platforminfo
id | name | abbreviation
rosterinfo
id | name | gameinfoid
rosters
id | gamerinfoid | rosterinfoid
The 1 variable would be gamerinfo.id, which would then pull all relevant data from gamerinfo, which would pull all relevant data from rosters, which would pull all relevant data from rosterinfo, which would pull all relevant data from gameinfo, which would then pull all relevant data from platforminfo.
Basically it breaks down like this:
gamerinfo contains the gamers' basic information.
rosterinfo contains basic information about the rosters (i.e. name and the game the roster is aimed towards)
rosters contains the actual link from the gamer to the different rosters (gamers can be on multiple rosters)
gameinfo contains basic information about the games (i.e. name and platform)
platforminfo contains information about the different platforms the games are played on (it is possible for a game to be played on multiple platforms)
I am pretty new to SQL queries involving JOINs and UNIONs and such, usually I would just break it up into multiple queries but I thought there has to be a better way, so after looking around the net, I couldn't find (or maybe I just couldn't understand what I was looking at) what I was looking for. If anyone can point me in the right direction I would be most grateful.
There is nothing wrong with querying the required data step by step. If you use JOINs in your SQL over 5 tables, be sure to have useful indexes on all important columns. Also, this could create a lot of duplicate data:
Imagine this: you need 1 record from gamerinfo, maybe 3 from gameinfo, 4 from rosters, and 3 from each of the remaining two tables. This would give you a result of 1*3*4*3*3 = 108 records, which will look like this:
ID Col2 Col3
1 1 1
1 1 2
1 1 3
1 2 1
... ... ...
You can see that you would fetch the ID 108 times, even if you only need it once. So my advice would be to stick with mostly single, simple queries to get the data you need.
There is no need for UNION; multiple JOINs should do the work:
SELECT gameinfo.id AS g_id, gameinfo.name AS g_name, platforminfo.name AS p_name, platforminfo.abbreviation AS p_abb, rosterinfo.name AS r_name
FROM gameinfo
LEFT JOIN platforminfo ON gameinfo.platforminfoid = platforminfo.id
LEFT JOIN rosterinfo ON rosterinfo.gameinfoid = gameinfo.id
LEFT JOIN rosters ON rosters.rosterinfoid = rosterinfo.id
WHERE gameinfo.id = XXXX
This should pull all info about a game based on the game id.
Indexing all the id columns (gameinfoid, platforminfoid, rosterinfoid) will help performance.
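Since the question starts from gamerinfo.id rather than a game id, the same idea can be sketched from that side. The table and column names come from the question; the sample rows, the roster_overview helper and the use of SQLite (Python's sqlite3 module) in place of MySQL are assumptions made to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE platforminfo (id INTEGER PRIMARY KEY, name TEXT, abbreviation TEXT);
CREATE TABLE gameinfo     (id INTEGER PRIMARY KEY, name TEXT, platforminfoid INTEGER);
CREATE TABLE gamerinfo    (id INTEGER PRIMARY KEY, name TEXT, contact TEXT, tag TEXT);
CREATE TABLE rosterinfo   (id INTEGER PRIMARY KEY, name TEXT, gameinfoid INTEGER);
CREATE TABLE rosters      (id INTEGER PRIMARY KEY, gamerinfoid INTEGER, rosterinfoid INTEGER);
-- invented sample rows
INSERT INTO platforminfo VALUES (1, 'PlayStation 3', 'PS3'), (2, 'Personal Computer', 'PC');
INSERT INTO gameinfo     VALUES (1, 'Game A', 1), (2, 'Game B', 2);
INSERT INTO gamerinfo    VALUES (1, 'Alice', 'alice@example.com', 'ALC'),
                                (2, 'Bob', 'bob@example.com', 'BOB');
INSERT INTO rosterinfo   VALUES (1, 'Roster One', 1), (2, 'Roster Two', 2);
INSERT INTO rosters      VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")

def roster_overview(gamer_id):
    """Follow gamerinfo -> rosters -> rosterinfo -> gameinfo -> platforminfo."""
    return conn.execute(
        """
        SELECT gamerinfo.name, rosterinfo.name, gameinfo.name, platforminfo.abbreviation
        FROM gamerinfo
        INNER JOIN rosters      ON rosters.gamerinfoid = gamerinfo.id
        INNER JOIN rosterinfo   ON rosterinfo.id = rosters.rosterinfoid
        INNER JOIN gameinfo     ON gameinfo.id = rosterinfo.gameinfoid
        INNER JOIN platforminfo ON platforminfo.id = gameinfo.platforminfoid
        WHERE gamerinfo.id = ?
        ORDER BY rosterinfo.id
        """,
        (gamer_id,),
    ).fetchall()

alice = roster_overview(1)
```

With INNER JOINs, a gamer who is on no roster returns no rows at all; LEFT JOINs would keep the gamer row with NULLs in the remaining columns.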