Creating News Feeds - mysql

How would you create a news feed like the one Facebook or LinkedIn has implemented? Is there a generic table in the database, such as:
------------------
| Feed |
------------------
| int Id |
| string type |
| string refId |
| string text |
| Date date |
------------------
Here refId would reference the key of an item in another table, and type would be something like Upload, Delete, Post Comment, etc. You would add a new Feed row to the database for whatever you want to appear in the feed. For example, if you uploaded multiple documents, you would set refId = "101,203,303", which references keys in another table. Any reading material that gives me an idea of how to do this would be appreciated, or even a good solution for small/medium websites.
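One possible shape for the table described above, as a minimal sketch rather than a recommended design. It uses Python's built-in sqlite3 in place of MySQL so that it runs anywhere. The table names feed and feed_ref and the helper add_feed_item are hypothetical. Instead of packing several keys into one refId string, it assumes one row per referenced item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feed (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL,       -- e.g. 'Upload', 'Delete', 'Post Comment'
    text TEXT,
    date TEXT NOT NULL        -- timestamp stored as text
);
CREATE TABLE feed_ref (       -- hypothetical table: one row per referenced item
    feed_id INTEGER NOT NULL REFERENCES feed (id),
    ref_id  INTEGER NOT NULL, -- key of the item in the table implied by feed.type
    PRIMARY KEY (feed_id, ref_id)
);
""")


def add_feed_item(conn, type_, text, date, ref_ids):
    """Insert one feed entry plus one feed_ref row per referenced key."""
    cur = conn.execute(
        "INSERT INTO feed (type, text, date) VALUES (?, ?, ?)",
        (type_, text, date),
    )
    feed_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO feed_ref (feed_id, ref_id) VALUES (?, ?)",
        [(feed_id, ref_id) for ref_id in ref_ids],
    )
    return feed_id


# An upload of three documents becomes one feed row and three feed_ref rows.
feed_id = add_feed_item(
    conn, "Upload", "uploaded 3 documents", "2015-04-29 23:28:52", [101, 203, 303]
)
refs = [
    row[0]
    for row in conn.execute(
        "SELECT ref_id FROM feed_ref WHERE feed_id = ? ORDER BY ref_id", (feed_id,)
    )
]
```

With one row per reference, the referenced keys can be joined and indexed directly, which a string such as "101,203,303" does not allow.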

Related

how to create an elastic search index which emulates my mysql db

I am new to Elasticsearch and I am having a tough time switching from MySQL to Elasticsearch.
My MySQL tables look like this:
table: test_request
+---------+-------------+--------------+-----------+------------+-----------+
| test_id | device_name | ip_address | user_name | time_stamp | show_flag |
+---------+-------------+--------------+-----------+------------+-----------+
| 1 | d1 | 0.0.0.0 | admin | | Y |
+---------+-------------+--------------+-----------+------------+-----------+
table: test_results
+----+---------+-----+-----------------------+-------------------------+----------------------------------+-----------+
| id | test_id | cli | xml | json | another json | show_flag |
+----+---------+-----+-----------------------+-------------------------+----------------------------------+-----------+
| 1 | 1 | c1 | some xml format data | {"some":"json here"} | {"some":" another json here"} | Y |
+----+---------+-----+-----------------------+-------------------------+----------------------------------+-----------+
| 2 | 1 | c2 | some xml format data | {"some":"json here"} | {"some":" another json here"} | Y |
+----+---------+-----+-----------------------+-------------------------+----------------------------------+-----------+
| 3 | 1 | c2 | some xml format data | {"some":"json here"} | {"some":" another json here"} | Y |
+----+---------+-----+-----------------------+-------------------------+----------------------------------+-----------+
The test_id field in the test_request table and the id field in the test_results table are auto-increment. The json and another json fields are of data type JSON.
I am trying to use elasticsearch_dsl to create the index and its mappings. I am going through the docs to figure out how to achieve this, but I couldn't figure out three things:
- How to get test_id to auto-increment
- How to make a field of the JSON data type
- The best way to set up a relationship between the two (I partially understood that nested could help here, but I am looking for the correct way to do this)
The auto-increment id columns play the following roles in the SQL tables:
- they are unique identifiers of the row
- they allow rows to be linked between tables
To achieve this in Elasticsearch you don't need an auto-increment field. When you add a document to an Elasticsearch index without specifying an id, Elasticsearch generates a unique id for it.
For JSON fields, simply use the object datatype.
There are a few options for setting up a relation like a SQL join:
- You can put test_results as nested objects within the test_request document.
- You can use a join datatype field to link test_results documents to a test_request document within the same index.
- You can denormalize and store every test_result in a single document together with its test_request. It is OK that the test_request will be stored many times. Elasticsearch is primarily for searching anyway.
Which option you choose is up to you. It depends on how you are going to use your data and what kind of queries you are going to run. Can you collect all test_results together with the test_request and store them in a single call, or do you need to store the test_request first and then successively add test_results?
Successively updating a nested field would mean reindexing the whole document every time. The join datatype is expensive to query.
Denormalization uses more space, but if the number of test_results per request is not large, it may be the best option.
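To make the three options concrete, here is a rough sketch written as plain Python dictionaries in the shape of Elasticsearch mapping JSON, with no client call involved. The field names come from the question; the names results and request_result, the keyword type chosen for the string fields, and the helper denormalize are assumptions made for illustration.

```python
# Option 1: test_results embedded as nested objects in each test_request document.
nested_mapping = {
    "properties": {
        "device_name": {"type": "keyword"},   # assumed type for a short string
        "results": {                          # hypothetical field name
            "type": "nested",
            "properties": {
                "cli": {"type": "keyword"},
                "json": {"type": "object"},   # JSON column stored as an object
            },
        },
    }
}

# Option 2: a join field linking child test_result documents to a parent
# test_request document within the same index.
join_mapping = {
    "properties": {
        "request_result": {                   # hypothetical field name
            "type": "join",
            "relations": {"test_request": "test_result"},
        }
    }
}


# Option 3: denormalize, producing one document per test_result that
# carries a full copy of its test_request.
def denormalize(request, results):
    return [{**result, "request": request} for result in results]


request = {"test_id": 1, "device_name": "d1", "user_name": "admin"}
results = [
    {"cli": "c1", "json": {"some": "json here"}},
    {"cli": "c2", "json": {"some": "json here"}},
]
docs = denormalize(request, results)
```

Each element of docs could be indexed as its own document, so adding a later test_result does not require touching documents that already exist.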

Defining queries around a designed database

I have a database which contains a lot of data and although I was not involved in setting it up it is what I have to work with.
Within this database is somewhat of a lookup table. However, this table has no link to any other tables. It essentially takes the following form
ID | input | table_name |
-------------------------------------
1 | Movie | movie_tbl |
2 | Cartoon | cartoon_tbl |
3 | Animation | cartoon_tbl |
4 | Audio | audio_tbl |
5 | Picture | picture_tbl |
The table is a lot larger than the above, but the structure is the same. When someone visits my site, they have an input field. Say they enter Movie; the above table is then queried to find the row whose input is Movie, which gives the table I need to look in. I would imagine that the query would be something like
SELECT table_name FROM lookup_table WHERE input LIKE 'Movie';
That should return movie_tbl. I now know that I need to search for Movie within movie_tbl and return all the data for its row. So movie_tbl might look like this (data would be some type of data, and the column names would be different)
ID | input | col_1 | col_2 | col_3 |
----------------------------------------------------
1 | Movie | data | data | data |
2 | Cartoon | data | data | data |
3 | Animation | data | data | data |
4 | Audio | data | data | data |
5 | Picture | data | data | data |
So now my query will be something like this
SELECT * FROM movie_tbl WHERE input LIKE 'Movie';
Now the tables have tens of thousands of rows. My real question is whether the above will be efficient or not. With the database I was given, however, I do not see any other way I could do this (I can't touch the database). Is there anything I can do to make this more efficient?
Any advice appreciated
Thanks
Why are you checking for input in the second table? You have already filtered on the input in the first table:
SELECT table_name FROM lookup_table WHERE input LIKE 'Movie';
In this case you don't have to make two queries; the second one alone should suffice. Alternatively, keep only Movie data in movie_tbl and use separate tables for Cartoon, Animation, etc. Then you don't need the WHERE clause at all, just:
SELECT * FROM movie_tbl;
Second suggestion: use = instead of LIKE. There is no need for pattern matching if you know the exact input string.
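For reference, a sketch of the two-step lookup from the question with = in place of LIKE and with the user input bound as a parameter. It uses Python's sqlite3 as a stand-in for MySQL. The set ALLOWED_TABLES and the function find_rows are hypothetical; they are there because a table name cannot be bound as a query parameter, so the value read from the lookup table is checked against known names before it is placed into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookup_table (id INTEGER PRIMARY KEY, input TEXT, table_name TEXT);
CREATE TABLE movie_tbl (id INTEGER PRIMARY KEY, input TEXT, col_1 TEXT);
INSERT INTO lookup_table (input, table_name) VALUES ('Movie', 'movie_tbl');
INSERT INTO movie_tbl (input, col_1) VALUES ('Movie', 'data'), ('Cartoon', 'data');
""")

# Hypothetical whitelist of tables the lookup is allowed to point at.
ALLOWED_TABLES = {"movie_tbl"}


def find_rows(conn, user_input):
    """Resolve the target table, then fetch the matching rows from it."""
    row = conn.execute(
        "SELECT table_name FROM lookup_table WHERE input = ?", (user_input,)
    ).fetchone()
    if row is None or row[0] not in ALLOWED_TABLES:
        return []
    table_name = row[0]  # safe to interpolate: checked against the whitelist
    return conn.execute(
        f"SELECT * FROM {table_name} WHERE input = ?", (user_input,)
    ).fetchall()


rows = find_rows(conn, "Movie")
missing = find_rows(conn, "Unknown")
```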

Data structure for a set of changes similar to SVN?

So far we have been storing information about changes as follows.
Imagine a changeset table for something that gets changed, which we call object. The object is connected to, say, a foreign element by a foreign key. The object gets created like this
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
Now we change the name. After the name change, the table looks like this
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
2015-04-29 23:30:01 | 2 | null | foo | null
This structure is minimal: it contains exactly the change we made. But to get the current version of the object, we have to add up the changes. E.g.
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
2015-04-29 23:30:01 | 2 | null | foo | null
*2015-04-29 23:30:01 | 2 | 123 | foo | none
The * marks the final version, which does not exist in the DB.
So if we store only the changes, we have more work to do, especially when coming from a foreign object f. If I have a number of objects f and I want to get all changes to the objects in our table that belong to them, I have to write some rather ugly SQL. This obviously gets worse the more foreign objects there are.
Basically I have to do:
Select all F that I want and
Select all objects WHERE foreignKey = foreignId
OR Select all objects that have objectId in (Select all objects that have foreignKey = foreignId)
E.g. I have to select the objects that have foreignKey 123, or rows that have foreignKey null for which an entry with the same objectId and foreignKey 123 exists.
The more dependencies, the uglier this SQL gets.
Did I make myself clear?
Wouldn't it be much easier to always keep all fields in all versions?
E.g. a simple name change becomes:
changesetId (Timestamp) | objectId | foreignKey | name (String) | description (String)
2015-04-29 23:28:52 | 2 | 123 | none | none
2015-04-29 23:30:01 | 2 | 123 | foo | none
Now, to create a diff I have to compare both versions, but I don't have to do the extra work of selecting the right elements or of computing the final version at a given timestamp.
What do you consider the proven best solution?
How does SVN do it?
For your use case, the method you suggest seems better. Key-value stores built on LSM trees do much the same: they write a newer version of the object without deleting the older version in place. If at any point you need the change that was made, I think you can just diff two adjacent versions.
The second method might use more space if you have a lot of variable-length text fields, but that's the trade-off you make for speed and maintainability.
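As a small illustration of diffing two adjacent full versions, here is a Python sketch over the rows from the question. The helper diff_versions is hypothetical, and the rows are assumed to have already been loaded as dictionaries ordered by changesetId.

```python
def diff_versions(older, newer):
    """Return {field: (old_value, new_value)} for every field that changed.

    changesetId is skipped because it differs between any two versions.
    """
    return {
        field: (older.get(field), newer.get(field))
        for field in newer
        if field != "changesetId" and older.get(field) != newer.get(field)
    }


# Full-row versions of object 2, oldest first.
versions = [
    {"changesetId": "2015-04-29 23:28:52", "objectId": 2,
     "foreignKey": 123, "name": "none", "description": "none"},
    {"changesetId": "2015-04-29 23:30:01", "objectId": 2,
     "foreignKey": 123, "name": "foo", "description": "none"},
]

current = versions[-1]                           # latest row is the current state
change = diff_versions(versions[0], versions[1])
```

Here change comes out as {"name": ("none", "foo")}, and the current state is simply the last row, with nothing to accumulate.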

How can I retrieve the number of DISTINCT projects in one table to store a boolean for each user and each project in another table?

I have a table for project announcements with the structure -
**Announcements Table**
| ---------- | ----------------- |
| Project | Announcement |
| ---------- | ----------------- |
| [String] | [String] |
| ---------- | ----------------- |
I want to allow users to subscribe to projects they would like to receive announcements for so that they may receive these announcements by email. Therein lies my problem.
I could structure the database as follows -
**Subscription Table**
| ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| Name | Email | Project1 | Project2 | Project3 | ...etc... |
| ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| [String] | [String] | [Boolean] | [Boolean] | [Boolean] | [Boolean] |
| ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
with a Boolean representing whether or not each user is subscribed to each project.
This has the advantage of being a simple structure, but the disadvantage that any new projects would have to be added manually. This should not be terribly difficult, because fewer than 4 new projects per year are likely to be added to the Announcements table, though obviously it is not ideal.
Obviously to get the number of distinct projects in the announcements table DISTINCT or GROUP BY could be used. But I am not sure how to use this in order to store data / generate data for the second table.
You really shouldn't do that. Having a list of boolean columns will give you problems... a lot of problems. As projects change over time, you will have to do maintenance on the table, and it will be hell.
If you want to start with a good normalized structure, you should look at something like this:
Users
    id INTEGER PRIMARY KEY
    Name VARCHAR
    email VARCHAR
Projects
    id INTEGER PRIMARY KEY
    Name VARCHAR
Announcements
    id INTEGER PRIMARY KEY
    project_id INTEGER REFERENCES Projects (id)
    text VARCHAR
Subscriptions
    user_id INTEGER REFERENCES Users (id)
    project_id INTEGER REFERENCES Projects (id)
    with a UNIQUE or PRIMARY key on (user_id, project_id)
This is how you should handle many-to-many relations. No structure changes are required when you need a new project. It is all reduced to adding or removing rows from one or more tables... rows, not columns.
For example, if you want a list of all announcements that users have subscribed to:
SELECT u.Name, u.email, p.Name, a.text
FROM subscriptions s
JOIN users u ON u.id = s.user_id
JOIN projects p ON p.id = s.project_id
JOIN announcements a ON a.project_id = s.project_id
Separating Projects from Announcements is not strictly necessary, but I do advise it. If you have few projects and many announcements, or if you have different people managing projects and announcements, then you should definitely use two different tables. Using a VARCHAR column to identify groups of rows is not a good habit, for various reasons (spelling mistakes, upper/lower case). Your foreign keys should always be a field the user knows nothing about and will never want to change.
With the project name stored as text in every announcement, changing a project name means updating many rows at once. And if you ever want to add other info about projects, you cannot do it on the announcements table without duplicating data... which is not a good thing, and will bring integrity problems sooner or later.
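A runnable sketch of that normalized layout, using Python's sqlite3 in place of MySQL. The sample rows (Ann, Alpha, Beta, Gamma) are made up for illustration. It shows that a new project is just a new row, and that counting projects needs no DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE announcements (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES projects (id),
    text       TEXT
);
CREATE TABLE subscriptions (
    user_id    INTEGER REFERENCES users (id),
    project_id INTEGER REFERENCES projects (id),
    PRIMARY KEY (user_id, project_id)
);
-- Made-up sample data.
INSERT INTO users VALUES (1, 'Ann', 'ann@example.com');
INSERT INTO projects VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO announcements VALUES (1, 1, 'Alpha released'), (2, 2, 'Beta delayed');
INSERT INTO subscriptions VALUES (1, 1);
""")

# Adding a project is an INSERT, not an ALTER TABLE.
conn.execute("INSERT INTO projects VALUES (3, 'Gamma')")

# Announcements each subscriber should receive by email.
mail = conn.execute("""
    SELECT u.email, p.name, a.text
    FROM subscriptions s
    JOIN users u ON u.id = s.user_id
    JOIN projects p ON p.id = s.project_id
    JOIN announcements a ON a.project_id = s.project_id
""").fetchall()

# The number of projects is a plain row count.
project_count = conn.execute("SELECT COUNT(*) FROM projects").fetchone()[0]
```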

MySQL Multiple references between tables

This question is probably quite easy to answer, but since I haven't got much experience in database design, I'm stuck. I don't even know how to google this because I don't know the terminology ....
I have a MySQL database with two tables, and in the first table I need to make MULTIPLE references to the second table. What should I use? Can I select multiple matches with ENUM? Or should I just use a comma-separated list in a VARCHAR?
|MAIN TABLE
|==========================================
| id (primary index)
| date (tstamp)
| product name (varchar)
| componentids (int, enum, varchar ???)
|===========================================
|COMPONENTS TABLE
|===========================================
| componentid (int)
| name (varchar)
| info (varchar)
|===========================================
so a possible scenario would be this:
|MAIN TABLE
|=====================================================
| id | 1 | 2 |
| date | 34958734958 | 349587123138 |
| product name | A test product | A second product |
| componentids | 2,3 | 1,2 |
|=====================================================
|COMPONENTS TABLE
|========================================================
| componentid | 1 | 2 | 3 |
| name | Component 1 | Component 2 | Component 1 |
| info | info. text | info. text | info. text |
|========================================================
how do I achieve this in an effective way?
thank you very much for your help!
What you're after is a many-to-many relationship. Each component can belong to multiple products, and each product can have multiple components. I'd strongly recommend using a third table for this, maybe called product_components.
Your main table has (id, date, name)
Your components table has (id, name, info)
Your product_components table has (product_id, component_id). Each of these is a foreign key that references the main table and component table respectively.
This maintains "referential integrity", which means that it becomes impossible to have a product referring to a component that doesn't exist (i.e. the database will throw an error if you try).
And yes, you can select the multiple components associated with one product in one go this way.
SELECT components.*
FROM components
JOIN product_components
ON components.id = product_components.component_id
WHERE product_components.product_id = <some product id>
No comma-separated lists or varchar. That's not the relational way.
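A sketch of the three-table layout, again with Python's sqlite3 standing in for MySQL. The main table is called products here, which is an assumed name, and the sample rows loosely follow the scenario in the question. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; MySQL's InnoDB enforces them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, date INTEGER, name TEXT);
CREATE TABLE components (id INTEGER PRIMARY KEY, name TEXT, info TEXT);
CREATE TABLE product_components (
    product_id   INTEGER NOT NULL REFERENCES products (id),
    component_id INTEGER NOT NULL REFERENCES components (id),
    PRIMARY KEY (product_id, component_id)
);
INSERT INTO products VALUES
    (1, 34958734958, 'A test product'),
    (2, 349587123138, 'A second product');
INSERT INTO components VALUES
    (1, 'Component 1', 'info. text'),
    (2, 'Component 2', 'info. text'),
    (3, 'Component 3', 'info. text');
-- Product 1 uses components 2 and 3; product 2 uses components 1 and 2.
INSERT INTO product_components VALUES (1, 2), (1, 3), (2, 1), (2, 2);
""")

# All components of product 1 in one query.
parts = conn.execute("""
    SELECT components.id, components.name
    FROM components
    JOIN product_components
      ON components.id = product_components.component_id
    WHERE product_components.product_id = ?
    ORDER BY components.id
""", (1,)).fetchall()

# Referencing a component that does not exist is rejected by the database.
try:
    conn.execute("INSERT INTO product_components VALUES (1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```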
Which way should the relationship go? Are there many rows in the main table for every one in the components table, or vice versa?
A one-to-many relationship means adding a foreign key to the many table and JOINing the two:
select *
from main
join component
on main.componentid = component.componentid
This will match all the rows in the main table with their component counterpart.