How do I prevent users from entering profanities? [duplicate] - language-agnostic

This question already has answers here:
How do you implement a good profanity filter?
Closed 10 years ago.
I have to take a city name from users as input, but I don't want to accept any profanities. Can anyone tell me how I can keep users from typing such words?

You'd have to scan the input for vulgarity, either after entry or during entry, and reject the content then. Inversoft (http://www.inversoft.com/) has a web service that can help; you can also search Google for "java profanity filter" to find similar products.
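This is a minimal sketch of scanning the input after entry. It assumes a server-side check in Python and a small in-memory blocklist; `BLOCKED_WORDS` and its entries are placeholders, not a real list. The input is split into whole words, so a blocked string buried inside a longer, innocent word is not flagged. Deliberate misspellings are not caught, and a maintained list or service may handle those better.

```python
import re

# Placeholder blocklist (hypothetical entries); load a maintained list in practice.
BLOCKED_WORDS = {"badword", "otherbadword"}

def contains_profanity(text: str) -> bool:
    """Return True if any whole word of `text` is on the blocklist.

    Matching is case-insensitive and on whole words only, so a blocked
    string that merely appears inside a longer word is not flagged.
    """
    words = re.findall(r"\w+", text.lower())
    return any(word in BLOCKED_WORDS for word in words)
```

For example, `contains_profanity("BadWord City")` is true, while `contains_profanity("Badwordville")` is false because the blocked string is only part of a longer word.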

Define a list of profanities.
Ask the user for input.
Check the input. If it is equal to any item in your list of profanities, reject it.
Of course users will not be able to enter names of actual cities which are also profanities. By definition, you are excluding such cities.
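The three steps above can be sketched as follows, assuming Python. `PROFANITIES`, `is_acceptable_city`, and `read_city` are hypothetical names. The check compares the whole trimmed, case-folded input against the list, exactly as the answer describes. As a result, a listed word inside a longer name is not rejected.

```python
# Step 1: the list of rejected inputs (hypothetical placeholder entries).
PROFANITIES = {"badword", "otherbadword"}

def is_acceptable_city(name: str) -> bool:
    """Step 3: reject the input only if, as a whole, it equals a listed word."""
    return name.strip().casefold() not in PROFANITIES

def read_city() -> str:
    """Step 2: keep asking until the user enters an acceptable city name."""
    while True:
        name = input("City: ")
        if is_acceptable_city(name):
            return name.strip()
        print("Please enter a different city name.")
```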


Suggestions on Database Design

I am building a sample online examination platform (I'm in the process of learning Ruby on Rails) with the following specifications:
There are 1000 different multiple choice questions.
Each question can have up to 5 possible answers, 1 of them is correct.
A user is presented with 10 random questions at a time (let's call this a test). If a user answers a question correctly 2 times, then this question will not be shown to him again.
A user passes the exam if he has answered every question correctly 2 times, in other words when there are no more questions left to show to him.
A first try:
student (student_id, name)
question (question_id, text)
option (option_id, text, is_correct, question_id)
student_answer (student_id, question_id, option_id)
Although I could store only the correctly answered questions, I've decided to include option_id in the student_answer table in case I need to display statistics (hardest question, etc.) in the future.
So up to this point, a question has many options, every option belongs to a single question and every time a student answers a question a student_answer row is created.
It seems to me that this approach would have performance issues. For each test we'd have to:
select all the answers given by the user;
group them by question_id;
count how many times each question was answered correctly;
build the set of question_ids that shouldn't be displayed;
and finally select 10 random questions from the 1000 minus those just excluded.
Another thought I had was to store a JSON array of the form [0,0,1,...,1] for every user. Each cell would hold the number of correct answers for the question whose id matches the array index, but I find that a bad idea.
Since I'm relatively new to database design, I'd like some feedback on my approach. Please feel free to suggest completely different approaches.
Thank you very much.
I think you may need to include the question_id in the option table.
One approach would be to move some of the processing into Ruby.
Select the 1000 questions.
Select the questions this user has already answered correctly more than once:
SELECT count(*) counter, question_id, option_id
FROM student_answer
JOIN option USING (question_id, option_id)
WHERE student_id = x AND option.is_correct = 1
GROUP BY question_id, option_id
HAVING counter > 1
Randomize the 1000 questions in Ruby, iterate through them, and skip any that were returned by the query above for this student. Stop after 10 questions.
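The answer suggests doing this step in Ruby; the sketch below uses Python's sqlite3 module purely as a stand-in. It uses the question's table names and a hypothetical `pick_test` helper. The query is a simplified variant of the one above: it groups by question only and counts at least two correct answers per question.

```python
import random
import sqlite3

def pick_test(conn, student_id, size=10, rng=random):
    """Application-side selection: shuffle all questions, skip mastered ones."""
    # Step 1: all question ids.
    all_ids = [row[0] for row in conn.execute("SELECT question_id FROM question")]
    # Step 2: ids this student has answered correctly at least twice.
    done = {row[0] for row in conn.execute(
        """SELECT sa.question_id
           FROM student_answer AS sa
           JOIN "option" AS o ON o.option_id = sa.option_id
           WHERE sa.student_id = ? AND o.is_correct = 1
           GROUP BY sa.question_id
           HAVING COUNT(*) >= 2""",
        (student_id,),
    )}
    # Step 3: shuffle, skip finished questions, stop after `size`.
    rng.shuffle(all_ids)
    picked = []
    for question_id in all_ids:
        if question_id not in done:
            picked.append(question_id)
            if len(picked) == size:
                break
    return picked
```

Passing a seeded `random.Random` as `rng` makes the selection reproducible in tests.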
If only one answer can be correct, why store the correctness in the option table? The question record should contain the foreign key of the correct answer.
You describe some entities not addressed by your design. You may not need to store 'a test', but doing so, together with a primary key on student_answer, gives a model that makes it easier to answer different questions about the data.
I think you have a good approach going; I would probably do the same. symcbean does make a good point above, but my solution would be to store a boolean column in the student_answer table recording whether the answer was correct.
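This is a sketch of that suggestion, assuming SQLite via Python. `is_correct` is copied into student_answer when the answer is recorded, so finding the questions a student has finished no longer needs a join with the option table. `record_answer` and `mastered_questions` are hypothetical helpers. The surrogate `answer_id` key is an assumption in the spirit of the primary-key comment above. The trade-off is that correctness is now stored in two places and must be kept consistent.

```python
import sqlite3

# Hypothetical schema: is_correct is denormalized into student_answer.
SCHEMA = """
CREATE TABLE student_answer (
    answer_id   INTEGER PRIMARY KEY,
    student_id  INTEGER NOT NULL,
    question_id INTEGER NOT NULL,
    option_id   INTEGER NOT NULL,
    is_correct  INTEGER NOT NULL CHECK (is_correct IN (0, 1))
);
"""

def record_answer(conn, student_id, question_id, option_id, is_correct):
    """Store one answer, copying whether the chosen option was correct."""
    conn.execute(
        "INSERT INTO student_answer (student_id, question_id, option_id, is_correct)"
        " VALUES (?, ?, ?, ?)",
        (student_id, question_id, option_id, int(is_correct)),
    )

def mastered_questions(conn, student_id):
    """Questions this student has answered correctly at least twice (no join)."""
    rows = conn.execute(
        """SELECT question_id FROM student_answer
           WHERE student_id = ? AND is_correct = 1
           GROUP BY question_id
           HAVING COUNT(*) >= 2""",
        (student_id,),
    )
    return {question_id for (question_id,) in rows}
```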

How to create a small search engine [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 12 months ago.
I'm aiming to create a small in-app search engine (something like the Google Maps address search bar). The requirement is quite simple. Each item consists of many keywords. The user types in a keyword and gets the corresponding results; when the user types in another keyword after that, the results are filtered further.
The first thing that came to mind was to use MySQL. I would create a keywords table to store every keyword and link it to the item table. When the user types in a keyword, the app would search through every record in the keywords table to give results. Am I thinking along the right lines? Could you give me some help? I'm a total novice in MySQL (I only learned it in a high school lesson). Is there any open-source platform for this?
Note: If you don't need to store keyword frequency, go with Marmik Bhatt's LIKE suggestion.
If you have a large amount of data and you only want to do keyword search (i.e. you are not going to search for phrases or use concepts like "near"), you can simply create a keyword table:
CREATE TABLE address
(
id INT(10) PRIMARY KEY,
/* ... */
);
CREATE TABLE keyword
(
word VARCHAR(255),
address_id INT(10),
frequency INT(10),
PRIMARY KEY(word, address_id)
);
You then scan through the text that you are "indexing" and count each word that you find there.
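This is a minimal sketch of that indexing step, assuming SQLite via Python's sqlite3 module. The `text` column on `address` is an assumption standing in for the elided columns, and `create_tables`/`index_address` are hypothetical names. Words are lower-cased and split on non-word characters. Any previous rows for the address are deleted first, so re-indexing does not collide with the primary key.

```python
import re
import sqlite3
from collections import Counter

def create_tables(conn):
    conn.executescript("""
        CREATE TABLE address (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
        CREATE TABLE keyword (
            word       TEXT NOT NULL,
            address_id INTEGER NOT NULL REFERENCES address(id),
            frequency  INTEGER NOT NULL,
            PRIMARY KEY (word, address_id)
        );
    """)

def index_address(conn, address_id, text):
    """Count each word in `text` and store one keyword row per distinct word."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    conn.execute("DELETE FROM keyword WHERE address_id = ?", (address_id,))
    conn.executemany(
        "INSERT INTO keyword (word, address_id, frequency) VALUES (?, ?, ?)",
        [(word, address_id, n) for word, n in counts.items()],
    )
```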
If you want to do several keywords:
SELECT address.*, SUM(frequency) frequency_sum
FROM address
INNER JOIN keyword ON keyword.address_id = address.id
WHERE keyword.word IN ('keyword1', 'keyword2', /*...*/)
GROUP BY address.id;
Here I've computed a frequency sum, which is a crude way to rank the results when many are returned.
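The multi-keyword query can be run from application code with one bound placeholder per keyword, so user input never becomes part of the SQL text. This sketch assumes SQLite via Python and a hypothetical `search` helper. The `ORDER BY frequency_sum DESC` is an addition that puts the highest-scoring addresses first.

```python
import sqlite3

def search(conn, words):
    """Return (address_id, frequency_sum) pairs for addresses matching any word,
    highest frequency sum first."""
    if not words:
        return []
    # One "?" per keyword; only placeholders are formatted into the SQL.
    placeholders = ", ".join("?" for _ in words)
    sql = f"""
        SELECT address.id, SUM(keyword.frequency) AS frequency_sum
        FROM address
        INNER JOIN keyword ON keyword.address_id = address.id
        WHERE keyword.word IN ({placeholders})
        GROUP BY address.id
        ORDER BY frequency_sum DESC, address.id
    """
    return conn.execute(sql, [w.lower() for w in words]).fetchall()
```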
Things to think about:
Do you want to insert all keywords into the database, or only those with a frequency above a specific value? If you insert all of them, your table may become huge. If you insert only the higher-frequency ones, you will not find the one article that mentions a specific word only once.
Do you want to insert all the available keywords for a specific article, or only the "top" ones? In the latter case the danger is that frequent words that add nothing to the meaning will push others out. Consider the word "However": it may appear in your article many more times than "mysql", but it is the latter that defines the article, not the former.
Do you want to exclude words shorter than a specific number of characters?
Do you want to exclude known "meaningless" words?
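Several of these choices can be made in one filtering pass before rows are written. This sketch assumes Python. `STOP_WORDS`, the default `min_length`, and the `select_keywords` name are illustrative assumptions rather than recommended values.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "and", "however", "of", "a"}

def select_keywords(text, min_length=3, min_frequency=1, top_n=None,
                    stop_words=STOP_WORDS):
    """Return {word: frequency} after the filtering choices discussed above:
    drop short words and known meaningless words, keep only words seen at
    least `min_frequency` times, and optionally keep only the `top_n` most
    frequent ones."""
    counts = Counter(
        w for w in re.findall(r"\w+", text.lower())
        if len(w) >= min_length and w not in stop_words
    )
    # most_common(None) returns every word, most frequent first.
    return {w: n for w, n in counts.most_common(top_n) if n >= min_frequency}
```

Putting "however" in the stop-word list is what keeps it from crowding out "mysql" in the example above.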
For a search engine, I use LIKE to match the search parameters.
The query would look like this:
SELECT * FROM tbl_keywords
INNER JOIN tbl_addresses ON tbl_addresses.id = tbl_keywords.address_id
WHERE tbl_keywords.keywords LIKE "% $keyword %";
$keyword is a variable retrieved from the GET or POST request sent by the search bar.
You can also return your search results as JSON so that, using jQuery, you can display them quickly.
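Interpolating `$keyword` straight into the SQL string, as above, is open to SQL injection. This is a sketch of the same search with a bound parameter. It assumes SQLite via Python's sqlite3 module and a `keywords` column holding space-separated words. `search_addresses` and `escape_like` are hypothetical helpers. Padding the column with spaces lets the `% word %` pattern also match the first and last word. Escaping `%` and `_` makes user input match literally.

```python
import sqlite3

def escape_like(term, escape_char="\\"):
    """Escape LIKE wildcards so user input is matched literally."""
    return (term.replace(escape_char, escape_char * 2)
                .replace("%", escape_char + "%")
                .replace("_", escape_char + "_"))

def search_addresses(conn, keyword):
    """Find addresses whose space-separated keywords contain `keyword` as a word."""
    pattern = "% " + escape_like(keyword.strip()) + " %"
    sql = """
        SELECT tbl_addresses.*
        FROM tbl_keywords
        INNER JOIN tbl_addresses ON tbl_addresses.id = tbl_keywords.address_id
        WHERE (' ' || tbl_keywords.keywords || ' ') LIKE ? ESCAPE '\\'
    """
    return conn.execute(sql, (pattern,)).fetchall()
```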
Full Text Search
You can also use full-text search to search for places and related keywords;
see this link: SQL Full Search Tutorial
One thing you can do is split the user's input on spaces, which will fetch you more relevant results.
For example, if the user types "Create search engine", explode it on spaces and then query the DB for each word.
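This is a sketch of splitting the input and querying once per word. It assumes SQLite via Python and a `keyword(word, address_id)` table like the one in the earlier answer. Keeping only the addresses that match every word is an assumption, chosen to mirror the question's "each new keyword filters further" behavior. The REGEXP alternation below matches any of the words instead. `ids_for_word` and `search_all_words` are hypothetical names.

```python
import sqlite3

def ids_for_word(conn, word):
    """One query per word: the ids of addresses indexed under that word."""
    rows = conn.execute(
        "SELECT address_id FROM keyword WHERE word = ?", (word.lower(),))
    return {address_id for (address_id,) in rows}

def search_all_words(conn, user_input):
    """Split the input on whitespace and keep only addresses matching every
    word, so each extra word narrows the result."""
    words = user_input.split()
    if not words:
        return set()
    result = ids_for_word(conn, words[0])
    for word in words[1:]:
        result &= ids_for_word(conn, word)
    return result
```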
A REGEXP might be more efficient, but you'd have to benchmark it to be sure, e.g.
SELECT * from fiberbox where field REGEXP 'Create|search|engine';
Use jQuery Autocomplete to make an auto-suggest search like Google does

Separate table for banned users? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
What would be the "correct" way of organizing banned users?
Should I simply add a new column in the existing users table called is_banned that acts as a boolean or should I create a new table called banned_users that acts as a pivot table with the user_id?
The same question goes for administrators. Should I create a new table for site admins or just create a new column called is_admin?
What about performance of the two options?
Thanks.
What happens with the next type of user? Add another table? Better not.
You could add a new column called type or something similar. One option is for it to contain a number indicating the type, like:
1 = normal user
2 = admin
3 = banned
Or you could even add a separate user_types table that the column refers to, but that is only necessary if the set of types changes over time.
If users can have multiple types at once, you could make the column a bit field.
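With a bit field, each type needs its own power of two. The 1/2/3 numbering above would not work, because 3 is the same bit pattern as 1 | 2. This sketch assumes Python's `enum.IntFlag` with hypothetical flag values. In SQL, a test such as `type & 4 <> 0` would select banned users in MySQL or SQLite.

```python
from enum import IntFlag

class UserType(IntFlag):
    # Hypothetical values: one bit per type so they can be combined.
    NORMAL = 1
    ADMIN = 2
    BANNED = 4

def has_type(stored_value: int, flag: UserType) -> bool:
    """Check one flag in the integer stored in the users.type column."""
    return bool(UserType(stored_value) & flag)

# An admin who is also banned is stored as the integer 6 (2 | 4).
stored = int(UserType.ADMIN | UserType.BANNED)
```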
When do you need separate tables?
When the different kinds of users have different attributes, so the table for each kind of user would differ.
You need to think about how the banning concept will play out in the real world. Do you just want a flag? What about when they were banned, and by whom? Past banning history? A response mechanism for the banned? A list of complaints, with user/date/reason?
Data models are the most difficult part of a system to evolve, so you want to think about all manner of possible futures, even stuff you don't have on the roadmap just yet.
You might decide, for efficiency, that you want a ban table and a banned column. But there's a price to be paid for that too, since you're now capturing the same fact in multiple places.
The issues are subtle and sometimes complex. Don't accept blanket one-size-fits-all answers.
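One way to record who, when, why, and the past history is a separate bans table with one row per ban, rather than a flag. This sketch assumes SQLite via Python. The column names, the `is_banned` helper, and the ISO-8601 text timestamps are all assumptions; the timestamps compare correctly as strings only when they share one format. The response mechanism and the complaints list are left out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE bans (
    ban_id     INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    banned_by  INTEGER NOT NULL REFERENCES users(user_id),  -- the admin
    reason     TEXT,
    banned_at  TEXT NOT NULL,   -- ISO-8601 timestamp
    expires_at TEXT             -- NULL means the ban is permanent
);
"""

def is_banned(conn, user_id, now):
    """A user is banned if any ban has started and has not yet expired.
    Past bans stay in the table, which keeps the history."""
    row = conn.execute(
        """SELECT 1 FROM bans
           WHERE user_id = ? AND banned_at <= ?
             AND (expires_at IS NULL OR expires_at > ?)
           LIMIT 1""",
        (user_id, now, now),
    ).fetchone()
    return row is not None
```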
A scalable solution that satisfies a multitude of criteria would be this:
users, a table that contains user data
roles, a table that contains roles
user2roles, a junction table that connects the two
You keep the user data separate from their actual role in your app. Every user will have at least a first name and last name, and those are not related to their permissions or roles.
You will most likely need to add more roles. For example, one role is being an admin. Another is being banned. Others could be a one-week ban, a two-week ban, etc. Basically, you can add those as you go, without needing to alter your tables to support future functionality.
Your application (php, python, whatever) collects the data and then acts upon those roles.
Now you have a system with established relations that you can scale and that's easy for kids in kindergarten to understand.
This is a simplified system that mixes permissions with roles, you can further expand it but IMO it's better to keep it simple.
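This is a minimal sketch of the three tables, assuming SQLite via Python. The column names and the `roles_of` helper are illustrative. Adding a new role is an INSERT into `roles`, not a schema change.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);
CREATE TABLE roles (
    role_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE            -- e.g. 'admin', 'banned'
);
CREATE TABLE user2roles (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    role_id INTEGER NOT NULL REFERENCES roles(role_id),
    PRIMARY KEY (user_id, role_id)          -- each role at most once per user
);
"""

def roles_of(conn, user_id):
    """Names of all roles held by a user, alphabetically."""
    rows = conn.execute(
        """SELECT r.name
           FROM roles AS r
           JOIN user2roles AS ur ON ur.role_id = r.role_id
           WHERE ur.user_id = ?
           ORDER BY r.name""",
        (user_id,),
    )
    return [name for (name,) in rows]
```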

Is it ever "ok" to use multiple queries instead of one? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
In building a web app recently, I started thinking about the information returned from a query I was making:
Find the user information and (for simplicity's sake) the phone numbers associated with this user. Something as simple as:
SELECT a.fname, a.lname, b.phone
FROM users a
JOIN users_phones b
ON (a.userid = b.userid)
WHERE a.userid = 12345;
No problem here (yes, I'm preventing injection, etc.; that's not the point of this question). When I think about the data that is returned, though, I am (potentially) returning several rows with that user's name on each one. Let's say that single user has 1000 phone numbers. That's the same first name and last name returned 1000 times in a single call. Let's also assume I want to return a lot more than just the user's first and last name, so I'm returning quite a lot of extra data that I really only need once.
Are there circumstances in which it is "more appropriate" to make multiple calls to a database?
e.g.
SELECT fname, lname
FROM users
WHERE userid = 12345;
SELECT phone
FROM users_phones
WHERE userid = 12345;
If the answer is yes, is there a good/proper method of determining when to use multiple queries versus a single one?
I think that really depends on your use case. In the example you gave, it seems to make sense to return it as two queries, especially if you're passing that info back to a mobile device where you want to send as little data as possible (not everyone has unlimited data).
I'd probably stick a DISTINCT in those queries as well if that's going to make a difference based on your tables.
A query with a JOIN may be slower than two independent queries. It really depends on the type of access you're doing.
For your example, I'd go with the two query approach. These queries could be executed in parallel, they could be cached, and there's no real reason to JOIN other than for arbitrary presentation concerns.
You'll also want to be concerned about returning duplicate data. In your example it looks like fname and lname would be repeated for each and every phone number, resulting in a lot of data being transmitted that's actually not useful. This is because of the one-to-many relationship you've described.
Generally you'll want to JOIN if it means sending less data, or if the two queries are not independent.
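This sketch contrasts the two shapes of result, assuming SQLite via Python and the table and column names from the question. `load_user_joined` and `load_user_two_queries` are hypothetical names. With the JOIN, the name columns come back once per phone row; with two queries, they come back once in total.

```python
import sqlite3

def load_user_joined(conn, user_id):
    """One query: fname and lname are repeated on every phone row."""
    return conn.execute(
        """SELECT a.fname, a.lname, b.phone
           FROM users a
           JOIN users_phones b ON a.userid = b.userid
           WHERE a.userid = ?""",
        (user_id,),
    ).fetchall()

def load_user_two_queries(conn, user_id):
    """Two queries: the name is fetched once, the phones separately."""
    name = conn.execute(
        "SELECT fname, lname FROM users WHERE userid = ?", (user_id,)
    ).fetchone()
    phones = [phone for (phone,) in conn.execute(
        "SELECT phone FROM users_phones WHERE userid = ?", (user_id,))]
    return name, phones
```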
This should be driven by the application. Basically, you retrieve in one query all the information needed in one place. Take this question page as an example: you see your user ID, the reputation counter, and the badge counters. There's no need to retrieve other user profile information when the question page is first displayed.
Only when someone clicks the user ID is the rest of the profile queried, and maybe not even all of it, as there are several tabs on the profile page.
However, if your application is guaranteed to access all 1000 phone numbers at once, along with the user's name, then you probably should fetch them all together.

Get specific line from db's text type field [duplicate]

This question already has answers here:
Split string by new line characters
(19 answers)
Closed 9 years ago.
Let's say I have a table in my database that contains a text field. Within this field there are ten lines, which correspond to ten different values. I can already retrieve this field, but I want to get only the fifth line, or each line separately. Does anyone have any idea how to do this?
As previously stated, this does not smell good. It sounds a lot like your data is not normalised, and you'd make life a lot easier if you separated the delimited values...
...HOWEVER I know all too well that sometimes it's just not possible. I frequently have to deal with this kind of thing (thanks to decisions made using legacy technologies being perpetuated beyond my control).
If you too are stuck with someone else's design decision, this should do the trick:
http://blog.fedecarg.com/2009/02/22/mysql-split-string-function/
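If the data cannot be restructured, the simplest route may be to fetch the field and split it in application code. This sketch assumes Python and `\n` or `\r\n` line endings; `get_line` is a hypothetical helper. In MySQL itself, `SUBSTRING_INDEX(SUBSTRING_INDEX(field, '\n', 5), '\n', -1)` returns the fifth `\n`-separated line. With fewer than five lines, however, it returns the last line instead.

```python
def get_line(text, n):
    """Return the n-th line (1-based) of a multi-line text value, or None
    if there is no such line. str.splitlines() handles \\n and \\r\\n."""
    lines = text.splitlines()
    return lines[n - 1] if 1 <= n <= len(lines) else None
```

`text.splitlines()` on its own gives every line separately, as a list.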
EDIT: Downvote without a comment? Nice...
That's easy.
You need to create another table linked to this one.
And have ten different records in this new table corresponding to one record in your old table.
This is how databases work.
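This is a sketch of that normalized layout, assuming SQLite via Python. `record`, `record_line`, `migrate`, and `line_value` are hypothetical names. Each old text value becomes up to ten rows, and `line_no` preserves the original order. Fetching the fifth line then needs only an ordinary WHERE clause.

```python
import sqlite3

SCHEMA = """
CREATE TABLE record (record_id INTEGER PRIMARY KEY);
CREATE TABLE record_line (
    record_id INTEGER NOT NULL REFERENCES record(record_id),
    line_no   INTEGER NOT NULL,       -- 1-based position of the old line
    value     TEXT,
    PRIMARY KEY (record_id, line_no)
);
"""

def migrate(conn, record_id, text):
    """Copy each line of an old text field into its own row."""
    conn.executemany(
        "INSERT INTO record_line (record_id, line_no, value) VALUES (?, ?, ?)",
        [(record_id, i, line) for i, line in enumerate(text.splitlines(), start=1)],
    )

def line_value(conn, record_id, line_no):
    """Fetch a single line, e.g. line_no=5 for the fifth value."""
    row = conn.execute(
        "SELECT value FROM record_line WHERE record_id = ? AND line_no = ?",
        (record_id, line_no),
    ).fetchone()
    return row[0] if row else None
```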