Prevent duplicate rows with different queries - MySQL

Let's say I have a products grid. In this grid there's a product called "Scarf XY".
Now a user wants to search for all items with a similar name, so she types the word "Scarf X" into a live-search box, and an async request is performed to retrieve from the DB all rows that match that word.
I would like to prevent the new query from returning the row for "Scarf XY" again.
Is there a way to, let's say, "keep track" of already-returned rows, even across different queries?
(Sorry for my English.)
Forgot to mention: every item returned from the DB is kept in a local array, which is why every new query may produce duplicate entries.

There is a way to do this with MySQL subqueries, but if this is meant for a live site it will be inefficient. For example, a user may type in search terms and then delete them; a system like the one you described would fire eight SQL queries (one per keystroke) for a search of "Scarf XY", which puts unnecessary load on your database server.
A more modern and resource-efficient way of doing this would be to supply the browser with a JSON array and use something like Typeahead.js from Twitter to display the information in a search bar client-side.
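For completeness, the server-side exclusion alluded to above could be sketched like this, assuming a hypothetical products(id, name) table and that the client sends back the IDs it already holds:

-- Sketch only: exclude rows the client already has (table name and IDs are assumptions)
SELECT id, name
FROM products
WHERE name LIKE '%Scarf X%'
  AND id NOT IN (1, 5, 42);   -- IDs already present in the local array

As noted above, though, filtering duplicates in the client-side array (or handing the whole JSON dataset to something like Typeahead.js) avoids the per-keystroke round trips entirely.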

Related

Cache, Database, Over 400k Listings

In my MySQL database I have a table of products which contains almost 625k rows. The table has 162 columns.
Now there is a search box on my home page where you can search for anything, and if your search term matches any of my product titles it gives you a list of 15 products. This is similar to Amazon and other e-commerce websites.
What I did so far was create a JSON file with all the product IDs and title names. When the user inputs a minimum of 3 characters into the search field, an AJAX request is made to fetch the list. But my issue is that the JSON file is almost 12 MB in size, and the AJAX call fires whenever the user types or removes a character. It worked fine while I was on my local machine, but as soon as I made it live it stopped working for users with internet connections slower than 5 Mbps. So I am looking for advice: how do I make search with auto-suggestion across 625k products as fast as Amazon's?
I am really sorry, but there is no better advice to give here than "go do some reading on database design and schema normalization".
If you have 162 columns in a table, you will never be able to search it efficiently. The database (especially MySQL) will not hold the table in memory, and indexes will not help either. Yes, you can throw it all into an Elasticsearch instance and it will fix some of your problems, but honestly that solution does not clean up the mess you have.
You should have a separate table with the searchable information (titles, names, etc.) in one column (plus, if needed, a numeric column for prices and the like). This metadata table should reference the main table, and the text column should be FULLTEXT-indexed. That way you ask for matches, filter the results, and JOIN the relevant rows from the main table. This will work quickly with very few resources used.
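A minimal sketch of that layout, with hypothetical table and column names (product_search, products.id, etc.), might look like this:

-- Narrow, FULLTEXT-indexed metadata table referencing the wide main table (names are assumptions)
CREATE TABLE product_search (
  product_id INT NOT NULL,          -- references products.id
  title      VARCHAR(255) NOT NULL,
  price      DECIMAL(10,2) NULL,
  FULLTEXT KEY ft_title (title)
);

-- Match against the small table first, then JOIN the relevant rows from the main table
SELECT p.*
FROM product_search s
JOIN products p ON p.id = s.product_id
WHERE MATCH(s.title) AGAINST ('scarf')
LIMIT 15;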

SQL query to check how many interests are matched

So I am building a swingers site. Users can search other users by their interests; this is only one of a number of parameters used to search for a user. The thing is, there are around 100 different interests. When searching for another user, they can select all the interests the other user must share. While I can think of ways to do this, I know it is important that the search be as efficient as possible.
The backend uses JDBC to connect to a MySQL database; Java is the backend programming language.
I have debated using multiple columns for interests; the SQL query would not need to check them all if those columns are not addressed in the JSON object sent to the server with the search criteria. But I worry I may have to make painful modifications to the table at a later point if I add new interests.
Another thing I thought about was having some kind of long byte array, or a number (used like a byte array), stored in a single column. I could AND this with another number corresponding to the interests the user is searching for, but I read somewhere that this is actually quite inefficient, despite it making good sense in my mind.
And all of this has to be part of one big SQL query with multiple tables joined into it.
One of the issues with using multiple columns would be the computing power used to run statement.setBoolean on what could be 40 columns.
I thought about generating an XML string in the client and then processing that in the SQL query.
Any suggestions?
I think the correct term is a bitmask. I could maybe have one table that maps the user's ID to a bitmask for querying users' interests, and another table with one entry per interest per user ID for looking up which user has which interests efficiently, if I later require this?
Basically, it would be great to have a separate table with all the interests, with 2 columns: id and interest.
Then have a table that links the user to the interests, user_interests, with the following columns: id, user_id, interest_id. Some knowledge about many-to-many relations would help a lot here.
Hope it helps!
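A minimal sketch of that schema, plus a "must share all selected interests" query, with hypothetical names and placeholder IDs:

CREATE TABLE interests (
  id       INT PRIMARY KEY AUTO_INCREMENT,
  interest VARCHAR(100) NOT NULL
);

CREATE TABLE user_interests (
  id          INT PRIMARY KEY AUTO_INCREMENT,
  user_id     INT NOT NULL,
  interest_id INT NOT NULL
);

-- Users who have ALL of the selected interests (3, 7 and 12 are placeholder IDs)
SELECT user_id
FROM user_interests
WHERE interest_id IN (3, 7, 12)
GROUP BY user_id
HAVING COUNT(DISTINCT interest_id) = 3;   -- number of interests selected

This GROUP BY / HAVING check joins cleanly into a larger search query and avoids both a 100-column table and bitmask arithmetic.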

How can I allow users SQL access to a table limited to certain rows?

I'm building a stock exchange simulation game. I have a table called 'Market_data', and in the game players simulate being at particular dates and are allowed to use SQL queries to retrieve the historical data and plan their course of action. My difficulty is that I need to limit the rows they can access based on the current date they are playing at, so they can't see rows with a date greater than the current date.
E.g.: a user is running the game and is currently in the year 2010; if he does a simple select like "SELECT * FROM market_data", I don't want him to see rows with Date > 'x-x-2010'.
The only solution I know of is to parse the user's SQL and add WHERE clauses to filter out newer dates, but that seems time-consuming and error-prone, and I wasn't sure whether there were better alternatives. Any ideas on how to do this right will be appreciated.
The solution is SQL views. Views are used for several different reasons:
1. To hide data complexity. Instead of forcing your users to learn the T-SQL JOIN syntax, you might wish to provide a view that runs a commonly requested SQL statement.
2. To protect the data. If you have a table containing sensitive data in certain columns, you might wish to hide those columns from certain groups of users. For instance, customer names, addresses and their social security numbers might all be stored in the same table; however, for lower-level employees like shipping clerks, you can create a view that only displays customer name and address. You can grant permissions to a view without allowing users to query the underlying tables. There are a couple of ways you might want to secure your data:
a. Create a view to allow reading of only certain columns from a table. A common example of this would be the salary column in the employee table. You might not want all personnel to be able to read managers' or each other's salaries. This is referred to as partitioning a table vertically and is accomplished by specifying only the appropriate columns in the CREATE VIEW statement.
b. Create a view to allow reading of only certain rows from a table. For instance, you might have a view for department managers. This way, each manager can provide raises only to the employees of his or her department. This is referred to as horizontal partitioning and is accomplished by providing a WHERE clause in the SELECT statement that creates the view.
3. Enforcing some simple business rules. For example, if you wish to generate a list of customers that need to receive the fall catalog, you can create a view of customers that have previously bought your shirts during the fall.
4. Data exports with BCP. If you are using BCP to export your SQL Server data into text files, you can format the data through views, since BCP's formatting ability is quite limited.
5. Customizing data. If you wish to display computed values or column names formatted differently than the base table columns, you can do so by creating views.
Reference taken from http://sqlserverpedia.com.
1) You can use MySQL Proxy (http://dev.mysql.com/downloads/mysql-proxy/) with custom rules restricting access.
2) You can use stored procedures/functions.
3) You can use views.
The basic way would be:
-> Prevent that user (or group) from accessing the base table.
-> Define a view on top of that table that shows only the rows these users are supposed to see (a sketch follows after the links below).
-> Give those users SELECT permission on the view.
-> You can also use SQL encryption, decryption, and hashing concepts.
Encryption & Decryption examples can be found here:
http://msdn.microsoft.com/en-us/library/ms179331.aspx
Hashing example can be found here:
http://msdn.microsoft.com/en-us/library/ms174415.aspx
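To make the view route concrete, here is a minimal sketch for the market_data case. The schema name (game), the cutoff value, and the 'player' account are assumptions for illustration:

-- Hide rows newer than the simulated date (names and cutoff are assumptions)
CREATE VIEW game.market_data_visible AS
SELECT *
FROM game.market_data
WHERE `Date` <= '2010-12-31';   -- or a lookup against a per-player state table

-- Let players query only the view, never the base table
GRANT SELECT ON game.market_data_visible TO 'player'@'%';
-- (and simply never grant any privileges on game.market_data itself)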

How slow is the LIKE query on MySQL? (Custom fields related)

Apologies if this is redundant (it probably is); I had a look but couldn't find a question here that covered what I wanted to know.
Basically, we have a table with about 50,000 rows, and it's expected to grow much bigger than that. We need to allow admin users to add custom data to an item based on its category, and users can pick which of the admin-defined fields they want to add info to.
Initially I had gone with an item_categories_fields table which pairs up entries from item_fields with item_categories, so admins can add custom fields and reuse them across categories for consistency. item_fields has a relationship to item_field_values, which links values with fields; this is how we handled things in .NET. The project is using CakePHP though, and we're just learning as we go, so it can get a bit annoying at times.
I'm however thinking of maybe just adding an item_custom_fields table that is essentially the item_id and a text field that stores XML-ish formatted data. This is just for the values of the custom fields.
No problem if I want to fetch the item by its ID, as the required data is stored in the items table, but what if I wanted to do a search based on a custom field? Would a
SELECT * FROM item_custom_fields
WHERE custom_data LIKE '%<material>Plastic</material>%'
(user-input-related issues aside) be practical if I wanted to fetch items made of plastic in this case? How slow would that be?
Thanks.
Edit: I was afraid of that; realistically this thing will be around 400k rows for that one table at launch. Thanks, guys.
Any LIKE query whose pattern starts with % will not use any index you have on the column, so the query will scan the whole table to find results.
The response time for that depends highly on your machine and the size of the table, but it definitely won't be efficient in any shape or form.
Your previous/existing solution (if well indexed) should be quite a bit faster.
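For comparison, a search over the normalized layout the question already has (item_fields / item_field_values) would be a plain indexed lookup; the column names below are assumptions:

-- Items whose "Material" custom field is "Plastic" (column names are assumptions)
SELECT v.item_id
FROM item_field_values v
JOIN item_fields f ON f.id = v.field_id
WHERE f.name = 'Material'
  AND v.value = 'Plastic';

With indexes on item_fields.name and item_field_values (field_id, value), this avoids scanning an XML blob row by row.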

Save a list of user IDs to a MySQL table

I need to save a list of the user IDs that viewed a page, streamed a song, and/or downloaded it. All I do with the list is add to it and show it. I don't really need to save more info than that, and I came up with two solutions. Which one is better, or is there an even better solution I missed?
The KISS solution - one table with the song ID as the primary key and a text field for each of the three interactions above (view, download, stream), each holding a comma-separated list of user IDs. Adding to it would just be a concatenation operation.
The "best practice" solution - have 3 tables keyed by song ID, each with a field for the user ID that performed the interaction. Each row holds one user ID, and I could add things like the date and other details.
One thing that makes me lean towards option 2 is that it may be easier to check whether a user has already voted on a song?
tl;dr version - Is it better to use a text field to save arrays as comma-separated values, or to have each item of the array in a separate table row?
Definitely the 2nd:
You'll be able to scale your application as it grows
It will be less programming language dependent
You'll be able to make queries faster and cleaner
It will be less painful for any other programmer coding / debugging your application later
Additionally, I'd add a new table called "operations" with an ID for each operation, so you can add different operations later if you need to, and store the operation ID on each row instead of a string ("view", "download", "stream").
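A minimal sketch of that design, with hypothetical table and column names:

CREATE TABLE operations (
  id   TINYINT PRIMARY KEY,
  name VARCHAR(20) NOT NULL          -- 'view', 'download', 'stream'
);

CREATE TABLE song_interactions (
  song_id      INT NOT NULL,
  user_id      INT NOT NULL,
  operation_id TINYINT NOT NULL,     -- references operations.id
  created_at   DATETIME NOT NULL,
  PRIMARY KEY (song_id, user_id, operation_id)   -- also makes "has this user already done X?" a keyed lookup
);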
It's definitely better to have each item in a separate row. Manipulating text fields has performance disadvantages in itself, but if you ever want to find out which songs user 1234 has viewed/listened to/etc., you'd have to do something like
SELECT * FROM songactions WHERE userlist LIKE '%,1234,%' OR userlist LIKE '1234,%' OR userlist LIKE '%,1234' OR userlist='1234';
It'd be just horribly, horribly painful.
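Compare that with the row-per-item layout (using the hypothetical song_interactions table sketched above), where the same lookup is a simple indexed query:

SELECT song_id
FROM song_interactions
WHERE user_id = 1234;   -- add AND operation_id = ... to filter by interaction type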