Unfortunately, I have to deal with a lot of user-submitted data, entered in text fields rather than option boxes. I have imported it into my MySQL database as strings so that I can quickly run statistics on it, such as the top 10 most common companies. The problem I have run into is that some rows have slightly different names for the same company. For example:
Brasfield & Gorrie, LLC VS Brasfield and Gorrie
Britt Peters and Associates VS Britt, Peters & Associates Inc.
Is there some fairly straightforward MySQL command or external tool that will allow me to go through and combine these sorts of rows? I know how to use REPLACE(), but I don't think it has the power to do this simply. Correct me if I'm wrong!
Taking this example:
Brasfield & Gorrie, LLC VS Brasfield and Gorrie
Assuming that I want to keep the first one, and that the table holding these titles also has an ID field for each one, I would find all records that reference the ID of the second one and update them to use the ID of the first.
You could create a page in PHP that lets you administer this with mouse clicks, but it will require regular pruning as long as you allow users to enter this data freely. For future entries, you can apply the Levenshtein distance to suggest similar existing matches, guiding users toward something that already exists rather than a new db entry.
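As a rough illustration of the suggestion idea, here is a minimal Python sketch (the thread mentions PHP; Python is used here only to keep the example short and runnable). It normalizes names before comparing them, then uses a plain edit-distance function. The suffix list, the `max_distance` threshold, and the function names are my own assumptions, not anything from the original setup.

```python
import re

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"llc", "inc", "ltd", "co", "corp"}


def normalize(name: str) -> str:
    """Lower-case, treat '&' as 'and', drop punctuation and legal suffixes."""
    name = name.lower().replace("&", " and ")
    words = re.findall(r"[a-z0-9]+", name)
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(entry: str, known: list[str], max_distance: int = 2) -> list[str]:
    """Return known names whose normalized form is within max_distance edits."""
    target = normalize(entry)
    return [k for k in known if levenshtein(target, normalize(k)) <= max_distance]
```

With this normalization, both example pairs from the question collapse to the same string, so they would be flagged as candidates for merging. The threshold probably needs tuning against real data.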
So I am building a swingers site. The users can search other users by their interests. This is only part of a number of parameters used to search a user. The thing is there are like 100 different interests. When searching another user they can select all the interests the user must share. While I can think of ways to do this, I know it is important the search be as efficient as possible.
The backend uses jdbc to connect to a mysql database. Java is the backend programming language.
I have debated using multiple columns for interests, but the thing is that the generated SQL query need not check them all if those columns are not addressed in the JSON object sent to the server with the search criteria. I also worry I may have to make painful modifications to the table later if I add new columns.
Another thing I thought about was having some kind of long byte array, or number (used like a byte array) stored in a single column. I could & this with another number corresponding to the interests the user is searching for but I read somewhere this is actually quite inefficient despite it making good sense to my mind :/
And all of this has to be part of one big sql query with multiple tables joined into it.
One of the issues with using multiple columns would be the computing power used to run statement.setBoolean on what could be 40 columns.
I thought about generating an xml string in the client then processing that in the sql query.
Any suggestions?
I think the correct term is a Bitmask. I could maybe have one table for the bitmask that maps the users id to the bitmask for querying users interests, and another with multiple entries for each interest per user id for looking up which user has which interests efficiently if I later require this?
Basically, it would be great to have a separate table with all the interests, 2 columns: id and interest.
Then, have a table that links users to interests, user_interests, with the following columns: id, user_id, interest_id. Some knowledge of many-to-many relations would help a lot here.
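To make the two-table layout concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL/JDBC (an assumption made purely so the example runs on its own). The sample rows and the `users_with_all` helper are hypothetical. The query finds users who share every selected interest by counting matches per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interests (id INTEGER PRIMARY KEY, interest TEXT NOT NULL);
    CREATE TABLE user_interests (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        interest_id INTEGER NOT NULL REFERENCES interests(id),
        UNIQUE (user_id, interest_id)
    );
    -- Hypothetical sample data.
    INSERT INTO interests (id, interest) VALUES (1, 'hiking'), (2, 'chess'), (3, 'cooking');
    INSERT INTO user_interests (user_id, interest_id) VALUES
        (10, 1), (10, 2), (10, 3),
        (11, 1), (11, 3),
        (12, 2);
""")


def users_with_all(conn, interest_ids):
    """Return ids of users who have every interest in interest_ids."""
    wanted = sorted(set(interest_ids))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT user_id
        FROM user_interests
        WHERE interest_id IN ({placeholders})
        GROUP BY user_id
        HAVING COUNT(DISTINCT interest_id) = ?
        ORDER BY user_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]
```

Only the interests actually selected in the search end up in the query, so adding a new interest later is just a new row in interests rather than a schema change.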
Hope it helps!
I'm building a stock exchange simulation game. I have a table called 'Market_data', and in the game players simulate being at particular dates and are allowed to use SQL queries to retrieve the historical data and plan their course of action. My difficulty is that I need to limit the rows they can access based on the date they are currently playing on, so they can't see rows with a date greater than the current date.
E.g.: a user is running the game and is currently in the year 2010. If he does a simple select like "SELECT * FROM market_data", I don't want him to see rows with Date > 'x-x-2010'.
The only solution that I know of is to parse the user's SQL and add WHERE clauses to remove newer dates, but that seems time consuming and error prone, and I wasn't sure whether there were better alternatives. Any ideas on how to do this right would be appreciated.
The solution is SQL views. Views are used for several different reasons:
1. To hide data complexity. Instead of forcing your users to learn the T-SQL JOIN syntax, you might wish to provide a view that runs a commonly requested SQL statement.
2. To protect the data. If you have a table containing sensitive data in certain columns, you might wish to hide those columns from certain groups of users. For instance, customer names, addresses and social security numbers might all be stored in the same table; however, for lower level employees like shipping clerks, you can create a view that only displays customer name and address. You can grant permissions to a view without allowing users to query the underlying tables. There are a couple of ways you might want to secure your data:
a. Create a view to allow reading of only certain columns from a table. A common example of this would be the salary column in the employee table. You might not want all personnel to be able to read managers' or each other's salaries. This is referred to as partitioning a table vertically and is accomplished by specifying only the appropriate columns in the CREATE VIEW statement.
b. Create a view to allow reading only certain rows from a table. For instance, you might have a view for department managers. This way, each manager can provide raises only to the employees of his or her department. This is referred to as horizontal partitioning and is accomplished by providing a WHERE clause in the SELECT statement that creates a view.
3. To enforce some simple business rules. For example, if you wish to generate a list of customers that need to receive the fall catalog, you can create a view of customers that have previously bought your shirts during the fall.
4. For data exports with BCP. If you are using BCP to export your SQL Server data into text files, you can format the data through views, since BCP's formatting ability is quite limited.
5. To customize data. If you wish to display some computed values or column names formatted differently than the base table columns, you can do so by creating views.
reference taken from http://sqlserverpedia.com.
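For the game in the question, the horizontal partitioning case is the relevant one. Below is a minimal sketch, again using Python's sqlite3 so it runs standalone. The `game_state` table, the `visible_market_data` view and the column names are assumptions for illustration. In MySQL you would additionally grant players SELECT on the view only, which SQLite cannot demonstrate, and a multi-player game would need per-player state rather than the single row used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE market_data (symbol TEXT, trade_date TEXT, price REAL);
    -- Hypothetical one-row table holding the date the player is currently "in".
    CREATE TABLE game_state (current_game_date TEXT NOT NULL);

    INSERT INTO market_data VALUES
        ('ACME', '2009-06-01', 10.0),
        ('ACME', '2010-06-01', 12.5),
        ('ACME', '2011-06-01', 15.0);
    INSERT INTO game_state VALUES ('2010-12-31');

    -- The view exposes only rows dated on or before the current game date.
    CREATE VIEW visible_market_data AS
        SELECT symbol, trade_date, price
        FROM market_data
        WHERE trade_date <= (SELECT current_game_date FROM game_state);
""")


def visible_dates(conn):
    """Dates a player can see when querying the view instead of the base table."""
    rows = conn.execute("SELECT trade_date FROM visible_market_data ORDER BY trade_date")
    return [r[0] for r in rows]
```

Players then write their queries against visible_market_data, and advancing the game is a single UPDATE on game_state; no parsing of the player's SQL is involved.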
1) You can use MySQL Proxy (http://dev.mysql.com/downloads/mysql-proxy/) with custom rules restricting access.
2) You can use stored procedures/functions.
3) You can use views.
The basic way would be:
-> Prevent that user (or group) from accessing the base table.
-> Define a view on top of that table that shows only the rows these users are supposed to see.
-> Give those users SELECT permission on the view.
-> You can also use SQL encryption, decryption, and hashing.
Encryption & Decryption examples can be found here:
http://msdn.microsoft.com/en-us/library/ms179331.aspx
Hashing example can be found here:
http://msdn.microsoft.com/en-us/library/ms174415.aspx
I need to maintain a mass email sender application. The last programmer did a nice job, but the boss feels that a few optimizations could be made to the database handling. When a campaign is finished, the report gives the option to save the selected segment. For instance, we send 50000 emails, and we want to save the segment of people who opened the newsletter (2000). The tool currently creates a new segment by duplicating the contacts (with an INSERT), but I think we could improve the tool by saving only the id of each contact.
I would like to know if saving the contacts with an SQL IN statement would increase the performance of the tool, or if there is another way to do this. Something like:
Create a list of the ids of the contacts:
SELECT * FROM contacts WHERE idContact IN (all_contacts_comma_separated) --> I would save this
Thanks in advance
PS: It's a production environment, so I need to be sure before making any changes :-(
You didn't say where the list of people who opened the email currently resides. If it's not in the database what code/process will you use to generate your IN statement list? If it is in the database why not JOIN your tables to get the information?
Either way I'd not recommend using IN when you have 2000 items in the list.
It might also be worth you reading the following:
SQL Server: JOIN vs IN vs EXISTS - the logical difference
It's written with SQL Server in mind (and I'm not sure it all directly applies to MySQL), but the concepts are interesting, and you should test before changing your production environment, as eggyal's comment suggested.
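If the opened-email list can live in the database, one way to read the JOIN suggestion is a link table holding one (segment, contact) pair per row, so a segment stores ids instead of duplicated contacts. This is only a sketch using Python's sqlite3 as a stand-in for MySQL. The `segment_contacts` table, its columns and the helper names are hypothetical; only `contacts` and `idContact` come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (idContact INTEGER PRIMARY KEY, email TEXT NOT NULL);
    -- Hypothetical link table: one row per (segment, contact) pair.
    CREATE TABLE segment_contacts (
        idSegment INTEGER NOT NULL,
        idContact INTEGER NOT NULL REFERENCES contacts(idContact),
        PRIMARY KEY (idSegment, idContact)
    );
    INSERT INTO contacts VALUES
        (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
""")


def save_segment(conn, segment_id, contact_ids):
    """Store a segment as a set of contact ids rather than copied contact rows."""
    conn.executemany(
        "INSERT INTO segment_contacts (idSegment, idContact) VALUES (?, ?)",
        [(segment_id, cid) for cid in contact_ids],
    )


def segment_emails(conn, segment_id):
    """Resolve a saved segment back to contacts with a JOIN instead of a long IN list."""
    sql = """
        SELECT c.email
        FROM contacts AS c
        JOIN segment_contacts AS sc ON sc.idContact = c.idContact
        WHERE sc.idSegment = ?
        ORDER BY c.idContact
    """
    return [row[0] for row in conn.execute(sql, (segment_id,))]


save_segment(conn, 1, [1, 3])
```

Whether this is actually faster than the current approach on your data is something to measure on a copy of the production database first.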
Within my database I have 3 different tables for different members. When saving a member's details, I use a form that saves all members to the same table, but I would like to save them to a specific table depending on their details. For example, if a member has registered with their school email, I would like them to be saved in the student table; if they have used a freemail email address, they should be saved in the freemail table, etc.
Would this be done with a query, or by sorting the one table using If statements?
You probably should not have three tables, just a field that defines the member type. You may wish to read Fundamentals of Relational Database Design.
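A rough sketch of the single-table idea follows, written in Python with sqlite3 rather than Access/VBA purely so that it is self-contained. The domain lists, the `member_type` values and the table layout are all assumptions for illustration.

```python
import sqlite3

# Hypothetical domain lists; real ones would come from your own data.
FREEMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}
SCHOOL_SUFFIXES = (".edu", ".ac.uk")


def member_type(email: str) -> str:
    """Derive the member type from the domain part of the email address."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain.endswith(SCHOOL_SUFFIXES):
        return "student"
    if domain in FREEMAIL_DOMAINS:
        return "freemail"
    return "other"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, member_type TEXT NOT NULL)"
)


def add_member(conn, email):
    """Every member goes into the same table; the type is just a column."""
    conn.execute(
        "INSERT INTO members (email, member_type) VALUES (?, ?)",
        (email, member_type(email)),
    )


for address in ("ann@uni.edu", "bob@gmail.com", "cat@company.com"):
    add_member(conn, address)
```

Listing only the students is then an ordinary filtered query (WHERE member_type = 'student') instead of a separate table.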
If you really insist on having three tables, even though it is likely to cause ever more tangled scenarios, you will either have to use VBA to gather the data from an unbound form and then fill it into the appropriate table, or ask the user which table they wish to update before you start and set up the form for that table.
It depends on your development environment. You can either change the switch to an If clause at business level or you can implement it as a database procedure. It's up to you.
http://msdn.microsoft.com/en-us/library/aa933214(v=sql.80).aspx explains how to use the If clause in the database.
I want to demonstrate a product to a potential new customer.
The best source data comes from an existing customer.
I want to use the existing customer's data for the demonstration, but without compromising confidentiality in any way.
The best solution I see is to run a script that replaces all of the names, addresses and locations in the database with randomly selected names.
So, now I need to find a list of place names and person names to use as a source. Preferably this would be in a text file so it can be read easily.
This seems like a pretty common problem. Does anyone know of a site that I can download these names from?
Check out: http://infochimps.org/, for example: http://infochimps.org/datasets/d-1990-census-name-files and http://infochimps.org/datasets/word-list-10-000-common-place-names
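Once you have a name list, the replacement script itself can be quite small. Here is a sketch in Python with sqlite3. The `customers` table, its columns and the four placeholder names are hypothetical; in practice the names would be read from the downloaded text file, and you would repeat the same step for addresses and locations.

```python
import random
import sqlite3

# Hypothetical replacement names; in practice, load them from a downloaded text file.
FAKE_NAMES = ["Alex Morgan", "Jamie Lee", "Sam Carter", "Robin Diaz"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO customers (name) VALUES
        ('Real Person One'), ('Real Person Two'), ('Real Person Three');
""")


def anonymize(conn, names, seed=0):
    """Overwrite every customer name with a randomly chosen replacement."""
    rng = random.Random(seed)  # seeded so a demo database can be rebuilt identically
    ids = [row[0] for row in conn.execute("SELECT id FROM customers")]
    conn.executemany(
        "UPDATE customers SET name = ? WHERE id = ?",
        [(rng.choice(names), cid) for cid in ids],
    )


anonymize(conn, FAKE_NAMES)
```

Run this against a copy of the customer's database, never the original, and check free-text columns separately, since names can also hide in notes or comments.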
I know this question is a bit old, but here's my 2 cents.
I also wanted to suggest that you could pull down data from public universities. For example,
http://www.wisc.edu/directories/?q=john+smith. You can find other open directories for all state schools.
All it would take is a script that loops through a list of names, e.g. http://www.behindthename.com/top/lists/us/2010/1000, searches a number of publicly accessible directories for each name, and saves the first 5 results.