Questions about FriendFeed's MySql SchemaLess Design - mysql

Bret Taylor discussed the schemaless design in this blog post: http://bret.appspot.com/entry/how-friendfeed-uses-mysql
It looks like they store objects of different classes in a single table and then build additional index tables.
My question is how to build an index for just one class.
For example, a user's blog is {id,userid,title,body} and a user's tweet is {id,userid,tweet}.
If I want to build an index for users' blogs, how can I do that?

It's very simple -- perhaps simpler than you expect.
When you store a blog entity, you're going to insert to the main entities table of course. A blog goes like this:
CREATE TABLE entities (
id INT AUTO_INCREMENT PRIMARY KEY,
entity_json TEXT NOT NULL
);
INSERT INTO entities (id, entity_json) VALUES (DEFAULT,
'{"userid": 8675309,
"post_date": "2010-07-27",
"title": "MySQL is NoSQL",
"body": ... }'
);
You also insert into a separate index table for each logical type of attribute. Using your example, the userid for a blog is not the same as a userid for a tweet. Since you just inserted a blog, you then insert into index table(s) for blog attribute(s):
CREATE TABLE blog_userid (
id INT NOT NULL PRIMARY KEY,
userid BIGINT UNSIGNED,
KEY (userid, id)
);
INSERT INTO blog_userid (id, userid) VALUES (LAST_INSERT_ID(), 8675309);
CREATE TABLE blog_date (
id INT NOT NULL PRIMARY KEY,
post_date DATETIME,
KEY (post_date, id)
);
INSERT INTO blog_date (id, post_date) VALUES (LAST_INSERT_ID(), '2010-07-27');
Don't insert into any tweet index tables, because you just created a blog, not a tweet.
You know all rows in blog_userid reference blogs, because that's how you inserted them. So you can search for blogs of a given user:
SELECT e.*
FROM blog_userid u JOIN entities e ON u.id = e.id
WHERE u.userid = 8675309;
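The insert-then-index flow above can be sketched end to end. This is a minimal illustration, not FriendFeed's code: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL, and the helper names (save_blog, save_tweet, blogs_for_user) are made up for the example.

```python
import json
import sqlite3

# SQLite (in-memory) stands in for MySQL; table names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (id INTEGER PRIMARY KEY, entity_json TEXT NOT NULL);
CREATE TABLE blog_userid (id INTEGER NOT NULL PRIMARY KEY, userid INTEGER);
CREATE INDEX blog_userid_idx ON blog_userid (userid, id);
""")

def save_blog(conn, blog):
    """Store the blog as JSON, then record it in the blog-only index table."""
    cur = conn.execute("INSERT INTO entities (entity_json) VALUES (?)",
                       (json.dumps(blog),))
    entity_id = cur.lastrowid
    conn.execute("INSERT INTO blog_userid (id, userid) VALUES (?, ?)",
                 (entity_id, blog["userid"]))
    return entity_id

def save_tweet(conn, tweet):
    """Store a tweet; it gets no row in blog_userid (tweet indexes are omitted here)."""
    cur = conn.execute("INSERT INTO entities (entity_json) VALUES (?)",
                       (json.dumps(tweet),))
    return cur.lastrowid

def blogs_for_user(conn, userid):
    """Find a user's blogs through the index table, as in the SELECT above."""
    rows = conn.execute(
        "SELECT e.entity_json FROM blog_userid u "
        "JOIN entities e ON u.id = e.id WHERE u.userid = ? ORDER BY e.id",
        (userid,))
    return [json.loads(r[0]) for r in rows]

save_blog(conn, {"userid": 8675309, "title": "MySQL is NoSQL", "body": "example body"})
save_tweet(conn, {"userid": 8675309, "tweet": "hello"})
user_blogs = blogs_for_user(conn, 8675309)
```

Both entities belong to the same user, but only the blog comes back, because only blogs are ever written to blog_userid.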
Re your comment:
Yes, you could add real columns to the entities table for any attributes that you know apply to all content types. For example:
CREATE TABLE entities (
id INT AUTO_INCREMENT PRIMARY KEY,
entity_type INT NOT NULL,
creation_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
entity_json TEXT NOT NULL
);
The columns for entity_type and creation_date would allow you to crawl the entities in chronological order (or reverse chronological order) and know which set of index tables matches the entity type of a given row.
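As a rough sketch of that crawl, assuming numeric type codes of my own choosing (1 for blogs, 2 for tweets) and a tweet_userid table that mirrors blog_userid, the loop below walks the entities in creation order and fills the index table that matches each row's type. SQLite via Python's sqlite3 stands in for MySQL.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
  id INTEGER PRIMARY KEY,
  entity_type INTEGER NOT NULL,
  creation_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  entity_json TEXT NOT NULL
);
CREATE TABLE blog_userid (id INTEGER NOT NULL PRIMARY KEY, userid INTEGER);
CREATE TABLE tweet_userid (id INTEGER NOT NULL PRIMARY KEY, userid INTEGER);
""")

# Hypothetical type codes and the index table each one maps to.
BLOG, TWEET = 1, 2
USERID_INDEX = {BLOG: "blog_userid", TWEET: "tweet_userid"}

conn.executemany(
    "INSERT INTO entities (entity_type, entity_json) VALUES (?, ?)",
    [(BLOG, json.dumps({"userid": 8675309, "title": "MySQL is NoSQL"})),
     (TWEET, json.dumps({"userid": 8675309, "tweet": "hello"}))])

def rebuild_userid_indexes(conn):
    """Walk entities oldest first and (re)fill the index table for each type."""
    rows = conn.execute(
        "SELECT id, entity_type, entity_json FROM entities "
        "ORDER BY creation_date, id").fetchall()
    for entity_id, entity_type, entity_json in rows:
        table = USERID_INDEX[entity_type]  # table name comes from our own dict
        userid = json.loads(entity_json)["userid"]
        conn.execute(
            f"INSERT OR REPLACE INTO {table} (id, userid) VALUES (?, ?)",
            (entity_id, userid))

rebuild_userid_indexes(conn)
blog_ids = [r[0] for r in conn.execute("SELECT id FROM blog_userid")]
tweet_ids = [r[0] for r in conn.execute("SELECT id FROM tweet_userid")]
```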

They do not store objects of different classes in the same table. The 'entities' table they refer to stores only one kind of entity.
For example, a typical entity in FriendFeed might look like this:
{
"id": "71f0c4d2291844cca2df6f486e96e37c",
"user_id": "f48b0440ca0c4f66991c4d5f6a078eaf",
"feed_id": "f48b0440ca0c4f66991c4d5f6a078eaf",
"title": "We just launched a new backend system for FriendFeed!",
"link": "http://friendfeed.com/e/71f0c4d2-2918-44cc-a2df-6f486e96e37c",
"published": 1235697046,
"updated": 1235697046,
}
To understand the implementation better, have a look at the example given here: https://github.com/jamesgolick/friendly#readme

Related

What's the best practice to design a table that would have different fields on different conditions?

I need advice in creating tables where there would be different fields based on a condition. I'm pretty new to psql, so I don't really know if I'm going the right way and would appreciate any tips / advice!
Currently I have a table to represent a meeting_note, which can either be a voice recording OR a text.
When the meeting note is of type text, it must have a meeting_content, and can have an optional meeting_summary. audio_source should be null.
When the meeting note is of type audio, it must have an audio_source and the fields meeting_content and meeting_summary should be null.
I was also thinking of creating two tables - one for type audio and another for text, but there is a unique constraint on created_at which represents a date like May 11th. I wasn't sure how to add this constraint between two tables.
Here are the fields for the table meeting_note
id serial PRIMARY KEY,
meeting_id integer REFERENCES meeting(id),
meeting_note_type enum('audio', 'text') NOT NULL,
meeting_content text,
summary varchar(255),
created_at varchar(10) NOT NULL,
recording_source varchar(255)
and the constraints:
UNIQUE (to_char(created_at, 'YYYY-MM-DD')),
CHECK (NOT (meeting_note_type = 'text' AND meeting_content IS NULL)),
CHECK (NOT (meeting_note_type = 'audio' AND audio_source IS NULL)),
CHECK (NOT (meeting_content IS NULL AND audio_source IS NULL)),
CHECK (NOT (meeting_content IS NOT NULL AND audio_source IS NOT NULL)),
CHECK (NOT (audio_source IS NOT NULL AND summary IS NOT NULL))
Appreciate any help on this. Thank you so much in advance!
There are two common approaches to this problem - using a table-per-type and using one table for everything. The approach you describe in the question is one table for everything; your definition is pretty accurate.
Here is how to do a table-per-type solution: make a "master" table for all notes, and then a table for each note sub-type, like this:
create table note_master (
id serial PRIMARY KEY,
meeting_id integer REFERENCES meeting(id),
created_at varchar(10) NOT NULL
);
-- sub-type rows reuse the master id instead of generating their own
create table note_text (
id integer PRIMARY KEY REFERENCES note_master(id),
meeting_content text,
summary varchar(255)
);
create table note_audio (
id integer PRIMARY KEY REFERENCES note_master(id),
recording_source varchar(255)
);
To query for everything you do left-outer joins to note_text and note_audio. This approach lets you skip the enum because you can always figure out what kind of note it is by examining the results of the join.
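The join can be sketched as follows. This is an illustration only: it runs on SQLite through Python's sqlite3 rather than PostgreSQL, leaves out the meeting table, and adds a UNIQUE constraint on note_master.created_at as one possible way to keep a single note per date across both sub-types. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE note_master (
  id INTEGER PRIMARY KEY,
  meeting_id INTEGER,
  created_at TEXT NOT NULL UNIQUE   -- assumption: one note per date, enforced once
);
CREATE TABLE note_text (
  id INTEGER PRIMARY KEY REFERENCES note_master(id),
  meeting_content TEXT NOT NULL,
  summary TEXT
);
CREATE TABLE note_audio (
  id INTEGER PRIMARY KEY REFERENCES note_master(id),
  recording_source TEXT NOT NULL
);
INSERT INTO note_master VALUES (1, 10, '2020-05-11'), (2, 10, '2020-05-12');
INSERT INTO note_text VALUES (1, 'Discussed roadmap', NULL);
INSERT INTO note_audio VALUES (2, 'recordings/may-12.mp3');
""")

# The note type is derived from which sub-type table produced a match.
notes = conn.execute("""
SELECT m.id, m.created_at,
       CASE WHEN t.id IS NOT NULL THEN 'text'
            WHEN a.id IS NOT NULL THEN 'audio' END AS note_type,
       t.meeting_content, a.recording_source
FROM note_master m
LEFT JOIN note_text t ON t.id = m.id
LEFT JOIN note_audio a ON a.id = m.id
ORDER BY m.id
""").fetchall()
```

Nothing in this schema stops a master row from having both a text and an audio row; if that matters, it would need an extra rule on top of this sketch.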

Working with a JSON array of objects in MYSQL

I have an SQL table:
CREATE TABLE pu_events(
eid INT AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(20),
description VARCHAR(255),
start_date INT UNSIGNED,
end_date INT UNSIGNED,
created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
members JSON
);
I plan on populating the members field with a members json object which will be an array of objects containing the user id (uid) and status of attending members, such as:
{members: [{uid:1, status:0}, {uid:2, status:1}]}
But I'm having trouble finding any resources that describe how to correctly reference this object structure in order to manipulate it. For example, if I wish to 'register' a user for the event, I want to append their object to the array of members, like members.push({uid:3, status:0}), or to update the status of a given user once they are confirmed or resign from the event, like update members set status = 2 where uid = 1;.
I understand that the pseudo-code I've used is a combination of JS and MySQL, and I also understand that MySQL now has JSON functions for manipulating this datatype, but I'm unsure of the best way to approach this particular use case.
Many thanks for any advice in advance!
The proper solution is to create another table:
CREATE TABLE pu_event_members (
event_id INT NOT NULL,
user_id INT NOT NULL,
status TINYINT NOT NULL,
PRIMARY KEY (event_id, user_id)
);
Then it's easy to register a new member of the event:
INSERT INTO pu_event_members SET event_id=?, user_id=?, status=?
Or update their status:
UPDATE pu_event_members SET status=? WHERE event_id=? AND user_id=?
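Here is a small runnable sketch of the same two operations, using Python's sqlite3 as a stand-in for MySQL. The function names (register, set_status, members) are illustrative, and the INSERT uses the portable VALUES form rather than MySQL's INSERT ... SET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE pu_event_members (
  event_id INTEGER NOT NULL,
  user_id INTEGER NOT NULL,
  status INTEGER NOT NULL,
  PRIMARY KEY (event_id, user_id)
)""")

def register(conn, event_id, user_id, status=0):
    """Equivalent of members.push({uid, status}) in the question."""
    conn.execute(
        "INSERT INTO pu_event_members (event_id, user_id, status) VALUES (?, ?, ?)",
        (event_id, user_id, status))

def set_status(conn, event_id, user_id, status):
    """Update one member's status for one event."""
    conn.execute(
        "UPDATE pu_event_members SET status = ? WHERE event_id = ? AND user_id = ?",
        (status, event_id, user_id))

def members(conn, event_id):
    """List (user_id, status) pairs for an event."""
    return conn.execute(
        "SELECT user_id, status FROM pu_event_members "
        "WHERE event_id = ? ORDER BY user_id", (event_id,)).fetchall()

register(conn, 1, 1)
register(conn, 1, 2, status=1)
register(conn, 1, 3)
set_status(conn, 1, 1, 2)
event_members = members(conn, 1)
```

Because of the composite primary key, registering the same user twice for one event raises an integrity error instead of silently duplicating the entry.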
A working example, although it only covers SELECT:
select pu_events.*
from pu_events, JSON_TABLE(members, "$.members[*].uid" COLUMNS(f INT PATH '$')) as list
where list.f = 1
MySQL 8.0 supports JSON_TABLE:
https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html
https://dev.mysql.com/doc/refman/8.0/en/json-table-functions.html

Saving records in MySQL which does not pre-exist

Let's say I have a predefined table called cities, with almost all the cities in my country.
When a user registers (user table), the cities_id column in the user table stores the city id from the cities table (a foreign key referencing cities), something like
CREATE TABLE `cities` (
`id` int,
`city_name` varchar(100)
)
CREATE TABLE `user` (
`id` int,
`name` varchar(60)
`****`
`cities_id` FK
)
The user table stores the city id.
But what if I missed a few cities? How does the user then save a city name in the user table, which accepts only IDs and not city names?
Can I add one more column, city_name, right after cities_id in the user table, something like
CREATE TABLE `user` (
`id` int,
`name` varchar(60)
`****`
`cities_id` FK
`city_name` varchar(100)
)
to record the data entered by the user at the time of registration? Can this be done?
You can add a type column to the cities table. If users can't find their city, let them type its name. Then create a corresponding record in the cities table, with type set to a special status (so that operations staff can easily check and correct it), and save the new record's id on the user record.
CREATE TABLE `cities` (
`id` int,
`city_name` varchar(100),
`type` int
)
CREATE TABLE `user` (
`id` int,
`name` varchar(60)
`****`
`cities_id` FK
)
As @Joakim mentioned in the comment, from a DB perspective, since cities_id is a foreign key referencing the cities table, inserting a record into the user table will fail if the city in question is not already in that table.
From a programming perspective, if you want a city that is not yet in the table to be inserted automatically whenever a user registers, that is possible. Assuming you are using Java and Hibernate, and the User entity contains a City entity with cascading enabled on that association, calling saveOrUpdate() on the user entity will insert the city record if it is not already there, and then insert the user record into the User table.
This is how I would quickly solve it.
Create an additional table to store the missing cities that users introduce:
CREATE TABLE `cities_users` (
`id` int,
`city_name` varchar(100),
`added_by` varchar(100),
`added_TS` DATETIME DEFAULT CURRENT_TIMESTAMP
);
Create a VIEW that UNIONs the two cities tables:
CREATE VIEW all_cities AS
SELECT id, city_name FROM `cities`
UNION ALL
SELECT id, city_name FROM `cities_users`;
Whenever a user registers, you query the VIEW to check whether the user's city exists. That way you'll know if the city exists either in your original table or among the cities introduced by users.
If not, you INSERT the new city into the cities_users table (along with the user who created it, for logging purposes).
You should generate a unique ID properly, i.e. one that can never exist in the cities table. You can do this in various ways. Here's a quick example: take the last ID in the cities_users table, starting from 1 million, and add 1 to it. Your cities_users IDs will look like 1000001, 1000002, 1000003.
Finally, you insert the generated cities_users ID into the users table.
Having a separate table for user input should help you keep the database clean:
Your original cities table remains totally unchanged.
You will always know which new cities were added, by whom and when, and you can create a small interface to review and manage them.
Your users are working for you to complete your database.
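The lookup-then-insert flow can be sketched like this, with SQLite (via Python's sqlite3) standing in for MySQL. The function name resolve_city_id, the constant USER_CITY_ID_BASE, and the sample city names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE cities_users (
  id INTEGER PRIMARY KEY,
  city_name TEXT,
  added_by TEXT,
  added_TS TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE VIEW all_cities AS
  SELECT id, city_name FROM cities
  UNION ALL
  SELECT id, city_name FROM cities_users;
INSERT INTO cities (id, city_name) VALUES (1, 'Paris'), (2, 'Lyon');
""")

USER_CITY_ID_BASE = 1_000_000  # assumed offset; must exceed any id in cities

def resolve_city_id(conn, city_name, user_name):
    """Return the id for city_name, creating a cities_users row if needed."""
    row = conn.execute(
        "SELECT id FROM all_cities WHERE city_name = ?", (city_name,)).fetchone()
    if row:
        return row[0]
    last = conn.execute("SELECT MAX(id) FROM cities_users").fetchone()[0]
    new_id = (last or USER_CITY_ID_BASE) + 1
    conn.execute(
        "INSERT INTO cities_users (id, city_name, added_by) VALUES (?, ?, ?)",
        (new_id, city_name, user_name))
    return new_id

known_id = resolve_city_id(conn, "Lyon", "alice")     # found in cities
new_id = resolve_city_id(conn, "Annecy", "bob")       # created in cities_users
repeat_id = resolve_city_id(conn, "Annecy", "carol")  # reuses bob's row
```

One caveat: a foreign key on cities_id that points only at cities would reject the generated ids, so this sketch assumes no such constraint on the users table.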
If a user suggests a new city, you should create a new record in the cities table and store its city_id in the users table. This is the best way to store the records.
I feel it should be pointed out, despite answers to the contrary, that your original suggestion of adding a city_name column to the table will work fairly well.
If you allow both cities_id and city_name to be nullable, you can validate in the application logic that one and only one of them is set.
The benefit of this approach is that it keeps your cities table 'pure' and lets you easily count duplicates of, and analyse, the user-supplied cities.
It would, however, add a very sparse nullable city_name column to your table.
I guess it depends on how you want to get the city from the user (drop-down plus text box for others, text box with suggestions, or just a text box) and on what you plan to do with the cities you have gathered.
You could even change the label to 'city (or nearest city)' with a hard-coded or searchable drop-down, and not allow user-supplied cities at all.
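A minimal sketch of the "exactly one of the two" rule enforced in application code, using Python's sqlite3 as a stand-in database. The add_user helper and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE user (
  id INTEGER PRIMARY KEY,
  name TEXT,
  cities_id INTEGER REFERENCES cities(id),  -- nullable on purpose
  city_name TEXT                            -- nullable free-text fallback
);
INSERT INTO cities VALUES (1, 'Paris');
""")

def add_user(conn, name, cities_id=None, city_name=None):
    """Insert a user with either a known city id or a free-text city, never both."""
    if (cities_id is None) == (city_name is None):
        raise ValueError("set exactly one of cities_id and city_name")
    cur = conn.execute(
        "INSERT INTO user (name, cities_id, city_name) VALUES (?, ?, ?)",
        (name, cities_id, city_name))
    return cur.lastrowid

add_user(conn, "alice", cities_id=1)
add_user(conn, "bob", city_name="Annecy")
add_user(conn, "carol", city_name="Annecy")

# Count how often each user-supplied city was typed in.
supplied_counts = conn.execute(
    "SELECT city_name, COUNT(*) FROM user "
    "WHERE city_name IS NOT NULL GROUP BY city_name").fetchall()
```

The final query is the kind of duplicate analysis this layout makes easy: frequently typed names are natural candidates for promotion into the cities table.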
If you have a buffer table where the raw data is put in, i.e. the relationship between city_name and user_name:
CREATE TABLE `buffer_city_user` (
`buffer_id` int,
`city_name` varchar(100),
`user_name` varchar(100)
);
you can first process the buffer table for new city_names - if found, insert into table cities.
Then insert the user info - any new city-names should already be in the cities table and no foreign key issues will occur.
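The two processing steps might look like the sketch below. It uses SQLite through Python's sqlite3 with foreign keys switched on, and the process_buffer function and sample rows are illustrative assumptions rather than part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE cities (id INTEGER PRIMARY KEY, city_name TEXT UNIQUE);
CREATE TABLE user (
  id INTEGER PRIMARY KEY,
  name TEXT,
  cities_id INTEGER NOT NULL REFERENCES cities(id)
);
CREATE TABLE buffer_city_user (
  buffer_id INTEGER PRIMARY KEY,
  city_name TEXT,
  user_name TEXT
);
INSERT INTO cities (city_name) VALUES ('Paris');
INSERT INTO buffer_city_user (city_name, user_name)
  VALUES ('Paris', 'alice'), ('Annecy', 'bob');
""")

def process_buffer(conn):
    # Step 1: add any city names the cities table has not seen yet.
    conn.execute("""
        INSERT INTO cities (city_name)
        SELECT DISTINCT b.city_name FROM buffer_city_user b
        WHERE NOT EXISTS (SELECT 1 FROM cities c WHERE c.city_name = b.city_name)""")
    # Step 2: insert the users; every city name now resolves to an id.
    conn.execute("""
        INSERT INTO user (name, cities_id)
        SELECT b.user_name, c.id FROM buffer_city_user b
        JOIN cities c ON c.city_name = b.city_name""")
    # Step 3: the buffered rows have been consumed.
    conn.execute("DELETE FROM buffer_city_user")

process_buffer(conn)
users = conn.execute(
    "SELECT u.name, c.city_name FROM user u "
    "JOIN cities c ON c.id = u.cities_id ORDER BY u.name").fetchall()
```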

sql good practice for holding long strings as fields

I am creating a SQL database with a table holding questionnaire answers. The questions are full sentences (about 150 characters each) and I want to know the best method for maintaining that information as the fields. I am still new to SQL, but I see two options:
set each question as a number (1, 2, 3, 4...) and have a separate table holding the actual questions as data that links to the number in the first table.
some method in CREATE TABLE that lets you set the field as a sentence. I thought quotes would work, but they do not.
EDIT:
a quick example of what I am trying to do:
CREATE TABLE survey(
index_id INT PRIMARY KEY,
'between 1 and 10, how do you feel about the transparency of the scientific community?' VARCHAR(5)
);
Thanks!
You are mixing up the data in a table with the creation of the table.
When you create the table, you define the structure of the table.
Then you can add data to the table.
Then you can query the table.
So, for example, create a table:
create table questionanswer (
questionnumber integer,
answer varchar(200)
)
add data to the table
insert into questionanswer (questionnumber, answer)
values (1, 'election day')
query the table for values
select answer
from questionanswer
where questionnumber = 1
Generally using VARCHAR(255) with encoding utf8mb4 is a good default. If you need long-form data, like essays, multiple paragraphs, etc. then use TEXT or LONGTEXT.
This is really a one-table problem:
CREATE TABLE questions (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
questionnaire_id INT NOT NULL,
num INT NOT NULL DEFAULT 0,
question VARCHAR(255) NOT NULL
);
Where if you want you can have multiple questionnaires by adding another questionnaire table, or just use that number as-is for partitioning the questions.
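To show how the question text stays as data rather than as a column name, here is a sketch that pairs the questions table with a hypothetical answers table (the answers table, the respondent id 42, and the answer value are my additions). It runs on SQLite through Python's sqlite3 rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (
  id INTEGER PRIMARY KEY,
  questionnaire_id INTEGER NOT NULL,
  num INTEGER NOT NULL DEFAULT 0,
  question TEXT NOT NULL
);
-- Hypothetical companion table: one row per respondent per question.
CREATE TABLE answers (
  respondent_id INTEGER NOT NULL,
  question_id INTEGER NOT NULL REFERENCES questions(id),
  answer TEXT,
  PRIMARY KEY (respondent_id, question_id)
);
""")

# The full sentence is stored as a value, not used as a column name.
conn.execute(
    "INSERT INTO questions (questionnaire_id, num, question) VALUES (?, ?, ?)",
    (1, 1, "Between 1 and 10, how do you feel about the transparency "
           "of the scientific community?"))
conn.execute("INSERT INTO answers VALUES (?, ?, ?)", (42, 1, "7"))

# Join answers back to their question text for display.
report = conn.execute("""
    SELECT q.num, q.question, a.answer
    FROM answers a JOIN questions q ON q.id = a.question_id
    WHERE a.respondent_id = ?
    ORDER BY q.num""", (42,)).fetchall()
```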

WEBSQL simple insert into populated table

Can't seem to figure out why this simple statement doesn't work
tx.executeSql("INSERT INTO history SELECT * FROM scan");
It works correctly if the history table is empty, which is not of much use. If history already has any data, the insert is not carried out, so I have to do:
tx.executeSql("DELETE FROM history", []);
tx.executeSql("INSERT INTO history SELECT * FROM scan");
Any ideas? Cheers
Edit:
Structures are the same:
tx.executeSql("CREATE TABLE IF NOT EXISTS scan(ID INTEGER NOT NULL PRIMARY KEY, sunum TEXT, binnum TEXT, userid TEXT, added_on DATETIME, upload_on DATETIME)");
tx.executeSql("CREATE TABLE IF NOT EXISTS history(ID INTEGER NOT NULL PRIMARY KEY, sunum TEXT, binnum TEXT, userid TEXT, added_on DATETIME, upload_on DATETIME)");
The problem is that you're attempting to insert a duplicate primary key value into the History table. From your structure, they both have ID listed as a PRIMARY KEY, which cannot contain duplicate values.
Try specifying all columns except for that key:
INSERT INTO History
(sunum, binnum, userid, added_on, upload_on)
SELECT sunum, binnum, userid, added_on, upload_on
FROM Scan
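Note that WebSQL implementations are backed by SQLite, where a column declared INTEGER PRIMARY KEY is an alias for the rowid: if you leave ID out of the INSERT, as above, a value is assigned automatically, so no table change is needed. MySQL's AUTO_INCREMENT keyword is not valid there; the SQLite spelling is ID INTEGER PRIMARY KEY AUTOINCREMENT, and it is only needed if IDs must never be reused.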
If you need to pull the ID over from the other table, you'll have to do an upsert or merge into that table.
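The collision and the fix can be reproduced outside the browser. The sketch below assumes the WebSQL engine behaves like SQLite and uses Python's sqlite3 with the table definitions from the question; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IF NOT EXISTS scan(ID INTEGER NOT NULL PRIMARY KEY, sunum TEXT,
  binnum TEXT, userid TEXT, added_on DATETIME, upload_on DATETIME);
CREATE TABLE IF NOT EXISTS history(ID INTEGER NOT NULL PRIMARY KEY, sunum TEXT,
  binnum TEXT, userid TEXT, added_on DATETIME, upload_on DATETIME);
INSERT INTO scan (sunum, binnum, userid) VALUES ('S1', 'B1', 'u1');
INSERT INTO history (sunum, binnum, userid) VALUES ('S0', 'B0', 'u0');
""")

# Both tables now hold a row with ID 1, so copying the IDs collides.
try:
    conn.execute("INSERT INTO history SELECT * FROM scan")
    copy_all_failed = False
except sqlite3.IntegrityError:
    copy_all_failed = True

# Leaving ID out lets the engine assign the next free value.
conn.execute("""
    INSERT INTO history (sunum, binnum, userid, added_on, upload_on)
    SELECT sunum, binnum, userid, added_on, upload_on FROM scan""")
history_rows = conn.execute(
    "SELECT ID, sunum FROM history ORDER BY ID").fetchall()
```

The copied row gets a fresh ID in history, which is why this only suits cases where the original ID does not need to be preserved.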