How to convert "top 10" list from SQL to NoSQL - mysql

I have an online game currently using MySQL. I have a Player table looking like this:
create table player (
id integer primary key,
name varchar(50),
score integer
);
I have an index on the "score" column and display the rankings like this:
select id, name, score from player order by score desc limit 100
I'd like to migrate my system to Redis (or, if some other NoSQL store is more applicable to this kind of problem, please tell me). So I wonder: what is the way to display this kind of ranking table efficiently?
AFAICT, this could be a Map/Reduce job? I know next to nothing about Map/Reduce; although I've read some docs, I still don't quite understand it, as I haven't been able to find any real-life examples.
Can someone please give me a rough example of how to do the above query in Redis?

In Redis you can use sorted sets ( http://redis.io/commands#sorted_set )
Once your players are stored in a sorted set with their scores, you can get the top N (highest score first) with ZREVRANGE players 0 N-1. Plain ZRANGE returns members in ascending score order, so it would give you the bottom of the table instead.
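A minimal redis-cli sketch, assuming a sorted set named players with player names as members (in practice you would probably store player IDs as members and keep the names in a separate hash):
ZADD players 1500 "bob"
ZADD players 1000 "alice"
ZADD players 1200 "carol"
ZREVRANGE players 0 99 WITHSCORES
The last command returns the 100 highest-scoring members together with their scores, best first.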

Good question - In MongoDB you would have to use the group() function to return this type of query:
select id, name, score from player order by score desc limit 100
Might look something like this:
db.player.group({
  key: { id: true, name: true },
  reduce: function(obj, prev) { if (prev.cmax < obj.score) prev.cmax = obj.score; },
  initial: { cmax: 0 } // some initial value
});
Using a MapReduce based approach is probably best, see:
http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Group
http://cookbook.mongodb.org/patterns/finding_max_and_min_values_for_a_key/

Related

Checking multiple columns for a concatenated match MySQL

Hi, I've hit a problem with my SQL queries. I have a table that contains 3 columns: one for vehicle brands, one for models and one for model versions.
So my data is split like
BRAND || MODEL || MODEL VERSION
RENAULT || R4 || R4 1.1 GTL
I've been asked to replace our current dropdown system with an input to make it easier for users to select their vehicle.
I'm using jQuery Autocomplete and my query looks something like this.
SELECT DISTINCT CONCAT(brand, ' ', model, ' ', version) as data from vehicles WHERE brand LIKE '%Golf%' OR model LIKE '%Golf%' OR version LIKE '%Golf%' LIMIT 5
So far so good; this will output "RENAULT R4 R4 1.1 GTL" if I type in "RENAULT"... The problem comes when the user types something like "Renault R4" instead of just "Renault".
As they've included the model name as well as the brand, it doesn't really match any single column in the database, and my Ajax call returns no results.
I need to query the actual result of that CONCAT instead, so that anything the users type in will match the results, but I have no idea how I can do this.
In desperation I tried WHERE data LIKE '%RENAULT R4%', but as expected this also doesn't work (the alias isn't available in the WHERE clause)... What can I do in this situation? Any help would be appreciated.
Easy and slow way: Split the string by spaces and ask for each word.
SELECT ...
WHERE
(brand LIKE '%Renault%' OR model LIKE '%Renault%' OR version LIKE '%Renault%')
AND (brand LIKE '%R4%' OR model LIKE '%R4%' OR version LIKE '%R4%')
LIMIT 5
Keep in mind that a query like this one cannot use any index, so it will be slow on a large table.
The more complicated but much faster approach is to use a fulltext index. You need a recent version of MySQL (5.6 or newer); older versions support fulltext indexes only on MyISAM tables, which are not really a proper database engine.
CREATE FULLTEXT INDEX idx ON vehicles(brand, model, version);
SELECT ... FROM vehicles
WHERE MATCH(brand, model, version) AGAINST('Renault R4')
LIMIT 5;
(Query not tested, but you should get the idea.)
I can only think of this one, but I believe there are better ways to do it.
OR CONCAT(brand, ' ', model, ' ', version) LIKE '%RENAULT R4%'
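For completeness, here is roughly how that condition could be used on its own (same caveat as the other answer: a LIKE against a CONCAT expression can't use an index, so the whole table gets scanned):
SELECT DISTINCT CONCAT(brand, ' ', model, ' ', version) AS data
FROM vehicles
-- Match the user's input against the concatenated brand/model/version string.
WHERE CONCAT(brand, ' ', model, ' ', version) LIKE '%RENAULT R4%'
LIMIT 5;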

Write query to search inside json record, inside another json in postgreSQL

I have a table which records the games of each day. The TEAMS table contains a column holding JSON; each JSON string also contains another one nested inside it, something like below:
---------------------------------------------------------------------
| id | doc                                                           |
---------------------------------------------------------------------
| 1  | {'team1':{'num':3, 'players':{'bob', 'eli', 'jack'}, 'color':'red'},
|    |  'team2':{'num':3, 'players':{'a', 'eli', 'x'}, 'color':'blue'}}
---------------------------------------------------------------------
This says that team1 and team2 had a game on a day.
Can I write a query which retrieves all records that have 'eli' as a player?
Thanks for your help
Caveat - I've only just begun working with JSON on Postgresql, so the following works, but might be sub-optimal...
Is that your actual JSON? Because Postgresql 9.3 coughed when I tried to import it; two problems: single quotes instead of double quotes, and your players were surrounded by curly braces instead of square brackets.
Anyway. If I got this right, I used this JSON:
create table json_table(data json);
insert into json_table(data)
values('{"team1":{"num":3, "players":["bob", "eli", "jack"], "color":"red"},
"team2":{"num":3, "players":["a", "eli", "x"], "color":"blue"}}')
The following query will work. Apparently, 9.4 has some additional functions that might make your life easier.
SELECT DISTINCT teamname FROM
  (SELECT
     json_data.key AS teamname,
     json_array_elements(json_data.value->'players')::text AS players
   FROM
     json_table, json_each(data) AS json_data) a
WHERE players = '"eli"'
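Following up on the 9.4 remark above: if the column is stored as (or cast to) jsonb, the ? existence operator can test for an array element directly, without expanding the players arrays. A sketch against the same sample table, assuming the data fits in jsonb:
-- For each top-level team entry, check whether 'eli' appears in its players array.
SELECT t.key AS teamname
FROM json_table,
     jsonb_each(data::jsonb) AS t
WHERE (t.value -> 'players') ? 'eli';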
Stale question, sure, but it's got a much better answer nowadays (PostgreSQL 12+ and its SQL/JSON path support).
There's a simple version, where you don't care about the top-level key, only the top-level value:
select jsonb_path_query(
  doc::jsonb,
  '$.* ? (@.players[*] == $player)',
  '{"player": "eli"}'
) from TEAMS
If you want the top-level key, I don't have a good suggestion that doesn't involve conversion and/or expansion.
select jsonb_path_query(
  (team).value,
  '$ ? (@.players[*] == $player)',
  '{"player": "eli"}'
) from (
  select jsonb_each(doc::jsonb) as team from TEAMS
) schedule
As this is an XY problem, most definitely, you might be interested in knowing whether eli's brother, "ERROR:ROOT", played that day too. You can always pass an array of values as that third argument to jsonb_path_query(), like so:
select jsonb_path_query(
  doc::jsonb,
  '$.* ? (@.players[*] == $player[*])',
  '{"player": ["eli","ERROR:ROOT"]}'
) from TEAMS
Oh! That raises a really good possibility. Say you have a team, and you want to know what other days any of those players are up. You might do something like so:
select jsonb_path_query(
  doc::jsonb,
  '$.* ? (@.players[*] == $players[*])',
  '{"num":3, "players": ["a", "eli", "x"], "color":"blue"}'
) from TEAMS
If you want to know more, the PostgreSQL docs are expansive but fantastically detailed. There are tons of great examples in there, both on the page about the JSON datatypes and jsonpath and on the page about the JSON functions.
The JSON datatype: https://www.postgresql.org/docs/13/datatype-json.html
JSON functions: https://www.postgresql.org/docs/13/functions-json.html

ORDERBY "human" alphabetical order using SQL string manipulation

I have a table of posts with titles that are in "human" alphabetical order but not in computer alphabetical order. These are in two flavors, numerical and alphabetical:
Numerical: Figure 1.9, Figure 1.10, Figure 1.11...
Alphabetical: Figure 1A ... Figure 1Z ... Figure 1AA
If I order by title, the result is that 1.10-1.19 come between 1.1 and 1.2, and 1AA-1AZ come between 1A and 1B. But this is not what I want; I want "human" alphabetical order, in which 1.10 comes after 1.9 and 1AA comes after 1Z.
I am wondering if there's still a way in SQL to get the order that I want using string manipulation (or something else I haven't thought of).
I am not an expert in SQL, so I don't know if this is possible, but if there were a way to do conditional replacement, then it seems I could impose the order I want by doing this:
delete the period (which can be done with replace, right?)
if the remaining figure number is fewer than three characters, add a 0 (zero) after the first character.
This would seem to give me the outcome I want: 1.9 would become 109, which comes before 110; 1Z would become 10Z, which comes before 1AA. But can it be done in SQL? If so, what would the syntax be?
Note that I don't want to modify the data itself—just to output the results of the query in the order described.
This is in the context of a WordPress installation, but I think the question is really an SQL question, because various things (such as pagination) depend on the ordering happening at the MySQL query stage rather than in PHP.
My first thought is to add an additional column that is updated by a trigger or other outside mechanism.
1) Use that column to do the order by
2) Whatever mechanism updates the column will have the logic to create an acceptable order by surrogate (e.g. it would turn 1.1 into AAA or something like that).
Regardless... this is going to be a pain. I do not envy you.
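If the logic is simple enough, it can also live directly in the ORDER BY clause so that nothing extra is stored. Here is a minimal MySQL sketch of the zero-padding idea from the question; wp_posts and post_title are the usual WordPress names and are assumptions here:
SELECT post_title
FROM wp_posts
ORDER BY
  CASE
    -- Strip "Figure " and ".", then left-pad two-character figure numbers with a zero,
    -- so 1.9 sorts as 109 (before 110) and 1Z as 10Z (before 1AA).
    WHEN CHAR_LENGTH(REPLACE(REPLACE(post_title, 'Figure ', ''), '.', '')) = 2
      THEN CONCAT(
        LEFT(REPLACE(REPLACE(post_title, 'Figure ', ''), '.', ''), 1),
        '0',
        SUBSTRING(REPLACE(REPLACE(post_title, 'Figure ', ''), '.', ''), 2))
    ELSE REPLACE(REPLACE(post_title, 'Figure ', ''), '.', '')
  END;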
You can create a function which has the logic for the human sort order, like:
ALTER FUNCTION [dbo].[GetHumanSortOrder] (@ColumnName VARCHAR(50))
RETURNS VARCHAR(20)
AS
BEGIN
    DECLARE @HumanSortOrder VARCHAR(20)
    SELECT @HumanSortOrder =
        CASE
            WHEN (LEN(REPLACE(REPLACE(<Column_Name>, '.', ''), 'Figure ', ''))) = 2
                THEN CONCAT(
                    SUBSTRING(REPLACE(REPLACE(<Column_Name>, '.', ''), 'Figure ', ''), 1, 1),
                    '0',
                    SUBSTRING(REPLACE(REPLACE(<Column_Name>, '.', ''), 'Figure ', ''), 2, 2))
            ELSE REPLACE(REPLACE(<Column_Name>, '.', ''), 'Figure ', '')
        END
    FROM <Table_Name> AS a (NOLOCK)
    WHERE <Column_Name> = @ColumnName
    RETURN @HumanSortOrder
END
This function gives you values like 104, 107, 119, 10A, 10B etc., as desired.
And you can use this function in the ORDER BY:
SELECT * FROM <Table_Name> ORDER BY dbo.GetHumanSortOrder(<Column_Name>)
Hope this helps

Query on custom metadata field?

This is a request from my client to tweak an existing Perl script. However, it is the actual database structure on their end that confuses me.
The requirement looks pretty simple:
only pull records where _X begins with 1, 2, or 9.
However, the underlying database is not that simple, here is the guideline from their DBA:
"_X is a custom metadata field. The database stores this data in rows, not columns, within the customData table. In order to query the custom data table in an efficient manner you need to know the Field_ID for the custom field you get that from the fielddef table:
SELECT Field_ID FROM FieldDef WHERE Name = "_X";
This returns:
10012
"Now you can query CustomData. For example:
SELECT Record_ID FROM CustomData where Field_ID="10012" AND StringValue="2012-04";
He also suggests that in my case, probably it would be:
"SELECT Record_ID FROM CustomData where Field_ID="10012" AND (StringValue LIKE '1%' || StringValue LIKE '2%' || StringValue LIKE '9%')
The weird thing is that the existing Perl script doesn't contain anything like "SELECT Record_ID FROM"; it's all "SELECT StringValue FROM".
So that is why I am very confused here: what does "stores this data in rows, not columns" mean? Why query the FieldDef table first and then CustomData? I won't be able to communicate with any of them over the weekend, but I'd really like to get some idea of how the whole thing fits together; I hope the experts here can help me sort out the structure.
More info (table schema):
http://pastebin.com/ZiDTCCC0
The existing Perl script (focus on lines 72-136):
http://pastebin.com/JHpikTeZ
Thanks in advance.
What they seem to be using is some kind of Entity-Attribute-Value model, with the attributes identified by ints and defined in another table (FieldDef).
You explained pretty well how you queried it (although you can do it in one query, with a join or a subquery), and your problem seems to be that you don't know how the Perl script does it. Unfortunately, without us seeing the Perl script, we can't either :]
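For reference, the two steps the DBA described can be combined into the single query mentioned above, reusing the table and column names from his examples:
SELECT cd.Record_ID, cd.StringValue
FROM CustomData AS cd
JOIN FieldDef AS fd ON fd.Field_ID = cd.Field_ID   -- resolve the custom field by name instead of hard-coding 10012
WHERE fd.Name = '_X'
  AND (cd.StringValue LIKE '1%'
       OR cd.StringValue LIKE '2%'
       OR cd.StringValue LIKE '9%');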

Can we control LINQ expression order with Skip(), Take() and OrderBy()

I'm using LINQ to Entities to display paged results. But I'm having issues with the combination of Skip(), Take() and OrderBy() calls.
Everything works fine, except that OrderBy() is applied too late: it's executed after the result set has been cut down by Skip() and Take().
So each page of results has its items in order, but the ordering is done on a single page's worth of data instead of ordering the whole set and then limiting those records with Skip() and Take().
How do I set precedence with these statements?
My example (simplified)
var query = ctx.EntitySet.Where(/* filter */).OrderByDescending(e => e.ChangedDate);
int total = query.Count();
var result = query.Skip(n).Take(x).ToList();
One possible (but bad) solution
One possible solution would be to apply a clustered index to the order-by column, but this column changes frequently, which would slow database performance on inserts and updates. And I really don't want to do that.
EDIT
I ran ToTraceString() on my query, where we can actually see when the ORDER BY is applied to the result set. Unfortunately, at the very end. :(
SELECT
-- columns
FROM (SELECT
-- columns
FROM (SELECT -- columns
FROM ( SELECT
-- columns
FROM table1 AS Extent1
WHERE EXISTS (SELECT
-- single constant column
FROM table2 AS Extent2
WHERE (Extent1.ID = Extent2.ID) AND (Extent2.userId = :p__linq__4)
)
) AS Project2
limit 0,10 ) AS Limit1
LEFT OUTER JOIN (SELECT
-- columns
FROM table2 AS Extent3 ) AS Project3 ON Limit1.ID = Project3.ID
UNION ALL
SELECT
-- columns
FROM (SELECT -- columns
FROM ( SELECT
-- columns
FROM table1 AS Extent4
WHERE EXISTS (SELECT
-- single constant column
FROM table2 AS Extent5
WHERE (Extent4.ID = Extent5.ID) AND (Extent5.userId = :p__linq__4)
)
) AS Project6
limit 0,10 ) AS Limit2
INNER JOIN table3 AS Extent6 ON Limit2.ID = Extent6.ID) AS UnionAll1
ORDER BY UnionAll1.ChangedDate DESC, UnionAll1.ID ASC, UnionAll1.C1 ASC
My workaround solution
I've managed to work around this problem. Don't get me wrong here: I haven't solved the precedence issue as such, but I've mitigated it.
What did I do?
This is the code I'll be using until I get an answer from Devart. If they aren't able to overcome this issue, I'll have to stick with this code in the end.
// get ordered list of IDs
List<int> ids = ctx.MyEntitySet
.Include(/* Related entity set that is needed in where clause */)
.Where(/* filter */)
.OrderByDescending(e => e.ChangedDate)
.Select(e => e.Id)
.ToList();
// get total count
int total = ids.Count;
if (total > 0)
{
// get a single page of results
List<MyEntity> result = ctx.MyEntitySet
.Include(/* related entity set (as described above) */)
.Include(/* additional entity set that's neede in end results */)
.Where(string.Format("it.Id in {{{0}}}", string.Join(",", ids.ConvertAll(id => id.ToString()).Skip(pageSize * currentPageIndex).Take(pageSize).ToArray())))
.OrderByDescending(e => e.ChangedOn)
.ToList();
}
First of all I get the ordered IDs of my entities. Getting only IDs performs well even with a larger set of data. The MySQL query is quite simple and performs really well. In the second part I page these IDs and use them to get the actual entity instances.
Thinking about it, this should perform even better than the way I was doing it at the beginning (as described in my question), because getting the total count is much quicker due to the simplified query. The second part is practically the same, except that the entities are fetched by their IDs instead of being paged with Skip and Take...
Hopefully someone may find this solution helpful.
I haven't worked directly with LINQ to Entities, but it should have a way to hook specific stored procedures into certain locations when needed. (LINQ to SQL did.) If so, you could turn this query into a stored procedure, doing exactly what is required, and doing it efficiently.
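For illustration only, a rough sketch of what such a stored procedure could look like on the MySQL side (the trace above suggests MySQL); the table and column names (table1, table2, ID, userId, ChangedDate) are taken from that trace, and the procedure name is made up:
DELIMITER //
CREATE PROCEDURE GetChangedPage(IN p_user_id INT, IN p_offset INT, IN p_page_size INT)
BEGIN
  -- Order the full filtered set first, then page it,
  -- so the LIMIT is applied after the ORDER BY rather than before it.
  SELECT t1.*
  FROM table1 AS t1
  WHERE EXISTS (SELECT 1
                FROM table2 AS t2
                WHERE t2.ID = t1.ID
                  AND t2.userId = p_user_id)
  ORDER BY t1.ChangedDate DESC
  LIMIT p_offset, p_page_size;
END //
DELIMITER ;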
Assuming from your comment that persisting the values in a List is not acceptable:
There's no way to completely minimize the iterations, as you intended (and as I would have tried too, living in hope). Cutting the iterations down by one would be nice. Is it possible to just get the Count once and cache/session it? Then you could:
int total = ctx.EntitySet.Count; // Hopefully you can not repeat doing this.
var result = ctx.EntitySet.Where(/* filter */).OrderBy(/* expression */).Skip(n).Take(x).ToList();
Hopefully you can cache the Count somehow, or avoid needing it every time. Even if you can't, this is the best you can do.
Could you please create a sample illustrating the problem and send it to us (support * devart * com, subject "EF: Skip, Take, OrderBy")?
Hope we will be able to help you.
You can also contact us using our forums or contact form.
Are you absolutely certain the ordering is off? What does the SQL look like?
Can you reorder your code as follows and post the output?
// Redefine your queries.
var query = ctx.EntitySet.Where(/* filter */).OrderBy(e => e.ChangedDate);
var skipped = query.Skip(n).Take(x);
// let's look at the SQL, shall we?
var querySQL = query.ToTraceString();
var skippedSQL = skipped.ToTraceString();
// actual execution of the queries...
int total = query.Count();
var result = skipped.ToList();
Edit:
I'm absolutely certain. You can check my "edit" above to see the trace result of my query; the skipped trace result is the important one in this case. Count is not really important.
Yeah, I see it. Wow, that's a stumper. Might even be an outright bug. I note you're not using SQL Server... what DB are you using? Looks like it might be MySQL.
One way:
var query = ctx.EntitySet.Where(/* filter */).OrderBy(/* expression */).ToList();
int total = query.Count;
var result = query.Skip(n).Take(x).ToList();
Convert it to a List before skipping. It's not too efficient, mind you...