I'm making a friends list Chrome extension for an online browser game I play. One feature is that friends can chat with one another. The database I'm using is Firebase, which stores its data in a JSON tree format.
My database has this structure:
Users
|
|_USER1
| |
| |__FRIENDS
|
|_USER2
|
|__FRIENDS
I'm trying to figure out what would be the best way to store chats as part of this database. The option I'm leaning towards right now would just keep a copy of the two users chats in both their section of the Users directory, looking like this:
Users
|
|_USER1
| |
| |__FRIENDS
| |
| |__CHATS
| |
| |__chat w/USER2
|
|_USER2
|
|__FRIENDS
|
|__CHATS
|
|__chat w/USER1
This would mean that on each message send I'd have to update two objects, one in each user's section. Note that since the tree is formatted as key/value pairs, in the CHATS section of each user the keys would be the other user's name, while the value would be the list of messages sent.
Is this a decent way of organizing such a database? The game is pretty small so I'm not expecting huge traffic.
When it comes to the Firebase Database (and most NoSQL data stores), it's often best to flatten your data.
Users
|
|_USER1
| |
| |__FRIENDS
|
|_USER2
|
|__FRIENDS
UserChats
|
|_USER1
| |
| |__chat w/USER2
|
|_USER2
|
|__chat w/USER1
This way you can look up the user's friend list without having to load their list of chats.
Also look at this answer about a convenient scheme for constructing 1:1 chat room identifiers: Best way to manage Chat channels in Firebase
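If I understand that scheme correctly, it boils down to deriving a single room key from the two user IDs (for example by ordering them and concatenating), so both users read and write the same message list. A rough, purely illustrative sketch of what that tree could look like (the node and key names here are mine, not from your data):

Chats
|
|_USER1_USER2
  |
  |__message1 : {from: USER1, text: "...", timestamp: ...}
  |
  |__message2 : {from: USER2, text: "...", timestamp: ...}

Each send is then a single write to that room, and the per-user UserChats entries can just reference the room key rather than holding a duplicate of the whole conversation.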
The Problem
I landed a small gig to develop an online quoting system for an electronics distributor. He has roughly half a million parts - one little screw is considered a part, one little LED, etc. So there are a LOT of parts.
One Important Note: This is only an RFQ (Request for Quote). There are no prices client-side, no totals, nothing to do with money. Just collecting a list of part numbers to send to my client.
I had to collect the part data from multiple sources (vendor website, scanned paper catalog, Excel spreadsheets, CSV files, and even a few JSON files). It was exhausting, but I got it done.
Results
Confusing at first. I had dozens of product categories, and some products had many attributes that were not common to any other products. I could see this project getting very complicated, and given the fact that I bid this job at $900 even, I had to simplify it somehow.
This is what I came up with, and received client approval.
Current Columns
+--------------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------+--------------+------+-----+---------+-------+
| Datasheets | varchar(128) | YES | | NULL | |
| Image | varchar(85) | YES | | NULL | |
| DigiKey_Part_Number | varchar(46) | YES | | NULL | |
| Manufacturer_Part_Number | varchar(47) | YES | | NULL | |
| Manufacturer | varchar(49) | YES | | NULL | |
| Description | varchar(34) | YES | | NULL | |
| Quantity_Available | int(11) | YES | | NULL | |
| Minimum_Quantity | int(11) | YES | | NULL | |
+--------------------------+--------------+------+-----+---------+-------+
So all products will fit this page template (the menu at the bottom is an error in the screenshot):
Autocomplete Off The Table?
Early on in the design, I implemented a nice autocomplete feature:
BUT... given the number of products in the table, is this even practical anymore?
FINAL PRODUCT COUNT: 223,347
What changes do I need to make to the PRODUCTS table so that querying it will not take forever?
These are the only queries the app will be making (not sure if this info will help in your solution advice)...
Get all products by category:
Select * from products where category = 'semiconductors'
Get single product:
Select * from products where Manufacturer_Part_Number = '12345'
Get product count by category:
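(presumably something along these lines, assuming the category column your first query references exists on the table:)

select category, count(*) from products group by category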
I think those three actually cover everything I need to do. Maybe a couple more, but not many.
In closing...
Is there a way to "index" this table with 223000 records where searching by one or more columns can be done efficiently?
I am very new to database design, and know I do need to index SOMETHING, but ... WHAT???
Thank you for taking the time to look at this post.
Regards,
John
Listing the queries is essential for answering your question. Thanks for including them.
INDEX(category)
INDEX(Manufacturer_Part_Number)
But I suggest your second query should include Manufacturer, too. Then this would be better:
INDEX(Manufacturer, Manufacturer_Part_Number)
Everything NULLable? Seems unlikely.
(I've done jobs like yours; I can't imagine bidding only $900 for all that scraping.)
What will you do when there are a thousand items in a single category or manufacturer? A UI with a thousand-item list sucks.
For how to handle "so many attributes", I recommend http://mysql.rjweb.org/doc.php/eav (I should charge you $899 for the research that went into that document. Just kidding.)
Don't they need other lookups, like "Flash drive", which need to match "FLASH DRV"?
223K rows -- no problem. The VARCHARs seem to be too short; were they based on the data?
And the table needs a PRIMARY KEY.
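A rough sketch of the DDL all of that implies (assuming the table is named products and that the category column your first query references actually exists; adjust names to your schema):

alter table products
  add column id int unsigned not null auto_increment primary key,
  add index idx_category (category),
  add index idx_mfr_part (Manufacturer, Manufacturer_Part_Number);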
I'm working on a database structure for a big project and I'm wondering which method to use for the logs table.
I'm using Laravel 5.* with Eloquent.
This table will contain user_id, User-Agent, IP, DNS, lang, ...
Method A :
LOGS_TABLE :
| Id | user_id | dns | ip | user_agent .... |
|-----|----------|-----------------|----------|---------------------|
| 1 | 5 | dns.google.com | 8.8.8.8 | firefox.*........ |
Method B :
LOGS TABLE :
| Id | dns_id | ip_id | user_agent_id | |
|----|--------|-------|---------------|--|
| 1 | 1 | 1 | 1 | |
IP TABLE:
| Id | value |
|----|---------|
| 1 | 8.8.8.8 |
The problem is, there are 10 fields like this and I'm afraid that all the joins will slow down the queries.
Why do we save all the logs?
Our tool provides a complete, high-end IP filtering service. The purpose is to let our customers filter their advertised traffic and choose exactly who sees their website.
The main purpose is to choose exactly which page they want to send Facebook to while advertising on Facebook, for example.
All of the service's traffic comes from visitors clicking on our customers' ads.
Technically we just do a 301 redirect to the right page and log the visitor's data in our database.
Thanks for your help.
What do you want to achieve with the log database? If it is just inserting data, I would go for a denormalized table (Method A).
If you also want to select data on every request, both options will slow your application down. You should maybe take a look at a NoSQL database.
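For the insert-only case, a minimal sketch of what the denormalized table from Method A could look like (column names and lengths here are assumptions, not your actual schema):

create table logs (
  id bigint unsigned not null auto_increment primary key,
  user_id int unsigned null,
  dns varchar(255) null,
  ip varchar(45) null,                -- 45 characters also covers IPv6
  user_agent varchar(255) null,
  lang varchar(10) null,
  created_at timestamp not null default current_timestamp
);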
Partitioning
Another option is to use partitioning; see: https://laracasts.com/discuss/channels/eloquent/partition-table
In this case you can work with a checksum of the unique data and store the corresponding rows in a table whose name carries a prefix of that checksum.
For example: $checksum = 'pre03k3I03fsk34jks354jks35m..'; store it in table logs_p or logs_pr.
Do not forget to put an index on the checksum column.
We have been developing the system at my place of work for some time now and I feel the database design is getting out of hand somewhat.
For example we have a table widgets (I'm spoofing these somewhat):
+-----------------------+
| Widget |
+-----------------------+
| Id | Name | Price |
| 1 | Sprocket | 100 |
| 2 | Dynamo | 50 |
+-----------------------+
*There are about 40+ columns on this table already
We want to add a property to each widget for packaging information. We need to know whether it has packaging information, doesn't have packaging information, or we don't know if it does or doesn't. We then also need to store the type of packaging details (assuming it has any; maybe it doesn't and the info is now redundant).
We already have another table which stores the details information (I personally think this table should be divided up, but that's another issue).
PD = PackageDetails
+--------------------------------+
| System Properties |
+--------------------------------+
| Id | Type | Value |
| 28 | PD | Boxed |
| 29 | PD | Vacuum Sealed |
+--------------------------------+
*There are thousands of rows in this table for all system-wide properties
Instinctively I would create a number of mapping tables to capture this information. I have however been instructed to just add another column onto each table to avoid doing a join.
My solution:
Create tables:
+---------------------------------------------------+
| widgets_packaging |
+---------------------------------------------------+
| Id | widget_id | packing_info | packing_detail_id |
| 1 | 27 | PACKAGED | 2 |
| 2 | 28 | UNKNOWN | NULL |
+---------------------------------------------------+
+--------------------+
| packaging |
+--------------------+
| Id | Name          |
| 1 | Boxed |
| 2 | Vacuum Sealed |
+--------------------+
If I want to know what packaging a widget has I join through to widgets_packaging and join again to packaging if I want to know the exact details. Therefore no more columns on the widgets table.
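For example, under this design the full lookup might look something like the following (table and column names as in the sketches above; treat it as illustrative rather than the exact schema):

select w.Id, w.Name, wp.packing_info, p.Name as packing_detail
from Widget w
join widgets_packaging wp on wp.widget_id = w.Id
left join packaging p on p.Id = wp.packing_detail_id
where w.Id = 1;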
I have however been told to ignore this and store an int value for the packaging information, plus another column as a foreign key to the System Properties table to find the packaging details. That means adding another two columns to the widgets table and creating yet more rows in the system properties table to store package details.
+------------------------------------------------------------+
| Widget |
+------------------------------------------------------------+
| Id | Name |Price | has_packaging | packaging_details |
| 1 | Sprocket |100 | 1 | 28 |
| 2 | Dynamo |50 | 0 | 29 |
+------------------------------------------------------------+
The reason for this is because it's simpler and doesn't involve a join if you only want to know if the widget has packaging (there are lots of widgets). They are concerned that more joins will slow things down.
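Their version of the "does it have packaging?" check would then be a single-table read, roughly (again just a sketch against the columns shown above):

select Id, Name, has_packaging
from Widget
where has_packaging = 1;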
Which is the more correct solution here, and are their concerns about speed legitimate? My gut instinct is that we can't just keep adding columns onto the widgets table, as it is growing and growing with flags for properties at present.
The answer to this really depends on whether the application(s) using this database are read or write intensive. If it's read intensive, the de-normalized structure is a better approach because you can make use of indexes. Selects are faster with fewer joins, too.
However, if your application is write intensive, normalization is a better approach (the structure you're suggesting is a more normalized approach). Tables tend to be smaller, which means they have a better chance of fitting into the buffer. Also, normalization tends to lead to less duplication of data, which means updates and inserts only need to be done in one place.
To sum it up:
Write Intensive --> normalization
smaller tables have a better chance of fitting into the buffer
less duplicated data, which means quicker updates / inserts
Read Intensive --> de-normalization
better structure for indexes
fewer joins means better performance
If your application is not heavily weighted toward reads over writes, then a more mixed approach would be better.
I am making a website. In the database I have a table of articles that kind of looks like this:
id | name | cats | etc.
------------------------------------------------------
1 | "alice" | "this, that, those, them" |
2 | "bob" | "this, that, those" |
3 | "carol" | "this, banana, cupcake" |
4 | "dave" | "other, unrelated, words" |
5 | "errol" | "those, them, fishstick" |
When viewing an article I want to also show some of the most related articles, based on the amount of categories in common.
For example, if I was viewing the Alice article I would want to pick out (in order of preference) Bob (3 cats in common), Errol (2), Carol (1).
I am aware that this would be easier if the data were normalised (I could for example do this), but unfortunately that's not really an option.
I ended up creating a couple of extra tables and populating them with properly normalized data every time something was saved. These run alongside the existing tables so it's not the cleanest of solutions but it works and the query speeds are excellent.
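For anyone taking a similar route, the related-articles lookup over such a side table might look roughly like this (article_cats and its columns are hypothetical names, just to show the shape of the query):

-- article_cats(article_id, cat): one row per article/category pair
select ac2.article_id, count(*) as shared_cats
from article_cats ac1
join article_cats ac2
  on ac2.cat = ac1.cat
 and ac2.article_id <> ac1.article_id
where ac1.article_id = 1            -- the article being viewed, e.g. "alice"
group by ac2.article_id
order by shared_cats desc
limit 5;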
I need to create a large scale DB Model for a web application that will be multilingual.
One doubt that I have every time I think about how to do it is how to handle multiple translations for a field. An example case:
The table for language levels, which administrators can edit from the backend, can have multiple items like: basic, advance, fluent, mattern... In the near future there will probably be one more type. The admin goes to the backend and adds a new level, and it gets sorted into the right position... but how do I handle all the translations for the end users?
Another problem with the internationalization of a database is that, for example, users' studies can differ from the USA to the UK to Germany... every country will have its own levels (each probably roughly equivalent to another, but in the end different). And what about billing?
How do you model this at a large scale?
Here is the way I would design the database:
Visualization by DB Designer Fork
The i18n table only contains a PK, so that any table just has to reference this PK to internationalize a field. The table translation is then in charge of linking this generic ID with the correct list of translations.
locale.id_locale is a VARCHAR(5) to manage both of en and en_US ISO syntaxes.
currency.id_currency is a CHAR(3) to manage the ISO 4217 syntax.
You can find two examples: page and newsletter. Both of these admin-managed entities need to internationalize their fields, respectively title/description and subject/content.
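A minimal sketch of the core tables behind this design, reconstructed from the description above and the query below (exact types and constraints are my assumptions):

create table locale (
  id_locale varchar(5) primary key        -- 'en', 'en_US', ...
);

create table i18n (
  id_i18n int unsigned not null auto_increment primary key
);

create table translation (
  id_i18n int unsigned not null,
  id_locale varchar(5) not null,
  tx_translation text not null,
  primary key (id_i18n, id_locale),
  foreign key (id_i18n) references i18n (id_i18n),
  foreign key (id_locale) references locale (id_locale)
);

create table newsletter (
  id_newsletter int unsigned not null auto_increment primary key,
  i18n_subject int unsigned not null,
  i18n_content int unsigned not null,
  foreign key (i18n_subject) references i18n (id_i18n),
  foreign key (i18n_content) references i18n (id_i18n)
);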
Here is an example query:
select
t_subject.tx_translation as subject,
t_content.tx_translation as content
from newsletter n
-- join for subject
inner join translation t_subject
on t_subject.id_i18n = n.i18n_subject
-- join for content
inner join translation t_content
on t_content.id_i18n = n.i18n_content
inner join locale l
-- condition for subject
on l.id_locale = t_subject.id_locale
-- condition for content
and l.id_locale = t_content.id_locale
-- locale condition
where l.id_locale = 'en_GB'
-- other conditions
and n.id_newsletter = 1
Note that this is a normalized data model. If you have a huge dataset, maybe you could think about denormalizing it to optimize your queries. You can also play with indexes to improve query performance (in some databases, foreign keys are automatically indexed, e.g. MySQL/InnoDB).
Some previous StackOverflow questions on this topic:
What are best practices for multi-language database design?
What's the best database structure to keep multilingual data?
Schema for a multilanguage database
How to use multilanguage database schema with ORM?
Some useful external resources:
Creating multilingual websites: Database Design
Multilanguage database design approach
Propel Gets I18n Behavior, And Why It Matters
The best approach often is, for every existing table, to create a new table into which the text items are moved; the PK of the new table is the PK of the old table together with the language.
In your case:
The table for language levels, which administrators can edit from the backend, can have multiple items like: basic, advance, fluent, mattern... In the near future there will probably be one more type. The admin goes to the backend and adds a new level, and it gets sorted into the right position... but how do I handle all the translations for the end users?
Your existing table probably looks something like this:
+----+-------+---------+
| id | price | type |
+----+-------+---------+
| 1 | 299 | basic |
| 2 | 299 | advance |
| 3 | 399 | fluent |
| 4 | 0 | mattern |
+----+-------+---------+
It then becomes two tables:
+----+-------+ +----+------+-------------+
| id | price | | id | lang | type |
+----+-------+ +----+------+-------------+
| 1 | 299 | | 1 | en | basic |
| 2 | 299 | | 2 | en | advance |
| 3 | 399 | | 3 | en | fluent |
| 4 | 0 | | 4 | en | mattern |
+----+-------+ | 1 | fr | élémentaire |
| 2 | fr | avance |
| 3 | fr | couramment |
: : : :
+----+------+-------------+
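In DDL form the split could look roughly like this (level and level_translation are names chosen for illustration; the important part is the composite primary key):

create table level (
  id int unsigned not null primary key,
  price int not null
);

create table level_translation (
  id int unsigned not null,
  lang varchar(5) not null,
  type varchar(50) not null,
  primary key (id, lang),
  foreign key (id) references level (id)
);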
Another problem with the internationalization of a database is that, for example, users' studies can differ from the USA to the UK to Germany... every country will have its own levels (each probably roughly equivalent to another, but in the end different). And what about billing?
All localisation can occur through a similar approach. Instead of just moving text fields to the new table, you could move any localisable fields - only those which are common to all locales will remain in the original table.