How to avoid infinite composite indexes in Firestore - json

I'm building a social media app, and my posts have 'tags' (practically unlimited in number, as there are already more than 200). I want my users to be able to filter by tag and sort by date, for example:
myRef.where("tag", "==", "tagName").orderBy('date', 'asc')
BUT... the number of tagNames is practically unlimited, which came as a shock, and I don't know how to handle it.
Should I create a custom map with sections of 1m size?
Should I create a custom ID with data in it?
How will I be able to combine ascending date order with these queries, or mix two or more filter types together?

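The query you have requires an index on tag + date, not on tagName + date. The composite index is defined on the fields, not on each value, so a single index covers every tagName.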
But if you want to keep a list of tags for each document, you'll want to store those in an array, and then use array-contains to check whether the document has a certain tag. To see if tagName exists in the array of String values tag, you'd query for:
myRef.where("tag", "array-contains", "tagName").orderBy('date', 'asc')
For more on this, see Better Arrays in Cloud Firestore!

Semantic Mediawiki: aggregation similar to SQL GROUP BY like #ask query

I've implemented a page with a long list of subobjects.
Every object contains one article (title + url) and N tags. I'd like to group by tag and show the count of articles related to that tag.
Something like:
SELECT tag, count(distinct article)
GROUP BY tag
I found an answer but it's very generic, and I'd also like to document the solution for other users with the same problem.
As you know from previous answers to this question, you cannot have a "distinct" function from an SMW ask query.
My preferred solution is to use the "arrays" extension, which lets you access PHP array manipulation functions in wiki code. Beyond producing a "distinct" list of values, it's an irreplaceable tool for handling semantic data from queries.
You can create an array with the following function:
{{#arraydefine: *identifier* | *data* | *delimiter* | *parameters* }}
Identifier is the variable name you want.
Data is the array content; in an SMW context, you load it with the result of a query.
Delimiter specifies the separator used in data. This has to match the separator chosen in the ask query.
Parameters is where the magic happens. You can set a "unique" parameter, which reduces the data list to unique values, thus emulating the "distinct" function.
In your case, you may do something like:
{{#arraydefine:tags
| {{#ask:[[-Has subobject::{{FULLPAGENAME}}]]
|?Tags#-=
| mainlabel=-
|limit = 1000
}}
|,
|unique
}}
Note that SMW ask queries are, by default, limited to 50 results. Adding "limit=" adjusts the maximum result size.
At this point, you defined an array called "tags" containing all distinct values of this property.
You can use arrayprint function for any further data treatment or display.

yii pagination issue trying to use 2 criterias

Disclaimer I'm self taught. Got my rudimentary knowledge of php reading forums. I'm an sql newb, and know next to nothing about yii.
I've got a controller that shows the products on our webstore. I would like the out of stock products to show up on the last pages.
I know I could sort by stock quantity but would like the in stock products to change order every time the page is reloaded.
My solution (probably wrong but kinda works) is to run two queries. One for the product that has stock, sorted randomly. One for the out of stock product also ordered randomly. I then merge the two resulting arrays. This much has worked using the code below (although I feel like there must be a more efficient way than running two queries).
The problem is that this messes up the pagination. Every product returned is listed on the same page and changing pages shows the same results. As far as I can tell the pagination only works for 1 CDbCriteria at a time. I've looked at the yii docs for CPagination for a way around this but am not getting anywhere.
$criteria=new CDbCriteria;
$criteria->alias = 'Product';
$criteria->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria->addCondition('Product.parent IS NULL');
$criteria->addCondition('web=1');
$criteria->addCondition('current=1');
$criteria->addCondition('sell>sell_web');
$criteria->order = 'RAND()';
$criteria2=new CDbCriteria;
$criteria2->alias = 'Product';
$criteria2->addCondition('(inventory_avail<1 AND inventoried=1)');
$criteria2->addCondition('Product.parent IS NULL');
$criteria2->addCondition('web=1');
$criteria2->addCondition('current=1');
$criteria2->addCondition('sell>sell_web');
$criteria2->order = 'RAND()';
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
//I know there is something wrong here, no idea how to fix it..
$count=Product::model()->count($criteria);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria);
$this->render('index', array(
'models' => $models,
'pages' => $pages
));
Clearly I am in over my head. Any help would be much appreciated.
Edit:
I figured that a third CDbCriteria that includes both the in stock and out of stock items could be used for the pagination (as it would include the same number of products as the combined results of the first 2). So I tried adding this (criteria1 and criteria2 remain the same):
$criteria3=new CDbCriteria;
$criteria3->alias = 'Product';
//$criteria3->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria3->addCondition('Product.parent IS NULL');
$criteria3->addCondition('web=1');
$criteria3->addCondition('current=1');
$criteria3->addCondition('sell>sell_web');
//$criteria3->order = 'RAND()';
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
$count=Product::model()->count($criteria3);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria3);
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
$this->render('index', array(
'models' => $models,
'pages' => $pages
));
I'm sure I'm missing something super obvious here... Been searching all day getting nowhere.
So you are running into what is IMO one of the potential drawbacks of natural language query builder frameworks. Trying to work with the "out of the box" methods for building queries can send your thinking about a SQL problem down a bad path. Sometimes you need to consider the raw SQL query capabilities that almost every framework provides in order to best address your problem.
So let's start with the basic SQL for how I would suggest you approach your problem. You can either work this into your query builder style (if possible) or make a raw query.
You could easily form a calculated field representing binary inventory status for sorting, and then sort by another criterion secondarily.
SELECT
field1,
field2,
/* other fields */
IF(inventory_avail > 0, 1, 0) AS in_inventory
FROM product
WHERE /* where conditions */
ORDER BY
in_inventory DESC, /* sort items in inventory first */
other_field_to_sort ASC /* other sort criteria */
LIMIT ?, ? /* pagination: offset first, then row count */
Note that this approach only returns the rows of data you need to display. You move away from your current approach of doing a lot of work in the application to merge record sets and such.
I do question the use of RAND() for pagination, because every page request reshuffles the results: a product can appear on more than one page as the user pages through, while other products may never show up at all. The alternative is to add complexity to your application to track the "randomized" version of the entire result set for each specific user. For this reason, it is really unusual to see randomized ordering in paginated result displays.
I know you mentioned you might like to present a randomized view to the user on a "first page". If that is what you want, that is OK, but consider decoupling that specific view from the wider paginated product listing so as not to confuse the end user with a seemingly unpredictable pagination interface.
In your ORDER BY clause, you should always have enough sorting conditions that the final (most specific) condition guarantees a predictable result order. Oftentimes this means including an auto-incrementing primary key field, or a similar field that is unique per row.
So let's say, for example, that you let the user sort items by price, but you still want to show all in-stock items first. Now let's say you have 100K products, so that many "pages" of products share a common price when ordered by price.
If you used this for ordering:
ORDER BY in_inventory DESC, price ASC
You could still have the problem of a user seeing the same product repeated when navigating between pages, because no criterion more specific than price was given, and ordering beyond that criterion is not guaranteed.
You would probably want to do something like:
ORDER BY in_inventory DESC, price ASC, unique_id ASC
Such that the order is totally predictable (even though the user may not even know there is sorting being applied by unique id).
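As a rough illustration of the tie-breaker idea, here is a minimal Python sketch using the standard-library sqlite3 module rather than Yii or MySQL. The table layout, the sample rows, and the helper name fetch_page are assumptions made up for the example; SQLite has no IF(), so the calculated field uses CASE instead.

```python
import sqlite3

def fetch_page(conn, page, page_size):
    """Return the product ids for one page, in-stock rows first.

    The final `id ASC` term makes the order fully deterministic, so
    consecutive pages never repeat or skip a row.
    """
    offset = (page - 1) * page_size
    rows = conn.execute(
        """
        SELECT id,
               CASE WHEN inventory_avail > 0 THEN 1 ELSE 0 END AS in_inventory
        FROM product
        ORDER BY in_inventory DESC, price ASC, id ASC
        LIMIT ? OFFSET ?
        """,
        (page_size, offset),
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical sample data: several products share the same price.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, price INTEGER, inventory_avail INTEGER)"
)
conn.executemany(
    "INSERT INTO product (id, price, inventory_avail) VALUES (?, ?, ?)",
    [(1, 10, 0), (2, 10, 5), (3, 10, 2), (4, 5, 0), (5, 10, 1), (6, 5, 3)],
)

pages = [fetch_page(conn, p, 2) for p in (1, 2, 3)]
print(pages)
```

Every id shows up exactly once across the three pages, which is the property the extra unique sort key is there to guarantee.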

Find column values that are a start string of given string.

I have a database table that contains URLs in a column. I want to show certain data depending on what page the user is on, defaulting to a 'parent' page if there is no direct match. How can I find the rows where the column value is the leading part of the submitted URL?
Eg. I have www.example.com/foo/bar/baz/here.html; I would expect to see (after sorting on length of column value):
www.example.com/foo/bar/baz/here.html
www.example.com/foo/bar/baz
www.example.com/foo/bar
www.example.com/foo
www.example.com
if all those URLs are in the table of course.
Is there a built in function or would I need to create a procedure? Googling kept getting me to LIKE and REGEXP, which is not what I need. I figured that a single query would be much more efficient than chopping the URL and making multiple queries (the URLs could potentially contain many path components).
Simply turn the LIKE operator around:
SELECT * FROM urls WHERE "www.example.com/foo/bar/baz/here.html" LIKE CONCAT(url, "%");
http://sqlfiddle.com/#!2/ef6ee/1
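For anyone who wants to try the reversed LIKE without a MySQL server, here is a small Python sketch using the standard-library sqlite3 module. The table contents and the helper name matching_prefixes are invented for the example, and SQLite spells the concatenation as url || '%' instead of CONCAT(url, "%").

```python
import sqlite3

def matching_prefixes(conn, page_url):
    """Return stored URLs that are a leading part of page_url, longest first."""
    rows = conn.execute(
        "SELECT url FROM urls WHERE ? LIKE url || '%' ORDER BY LENGTH(url) DESC",
        (page_url,),
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT)")
conn.executemany(
    "INSERT INTO urls (url) VALUES (?)",
    [
        ("www.example.com/foo",),
        ("www.example.com/foo/bar",),
        ("www.example.com",),
        ("www.example.com/other",),
        ("www.example.org",),
    ],
)

matches = matching_prefixes(conn, "www.example.com/foo/bar/baz/here.html")
print(matches)
```

Two caveats worth keeping in mind: a stored value such as www.example.com/fo would also match, because the comparison is purely textual and knows nothing about path boundaries, and any % or _ characters inside a stored URL are treated as LIKE wildcards.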

Rows count of Couchbase view subset

When I query some view in Couchbase I get the response that has following structure:
{
"total_rows":100,
"rows":[...]
}
'total_rows' is a very useful property that I can use for paging.
But let's say I select only a subset of the view using 'start_key' and 'end_key', and of course I don't know how big this subset will be. 'total_rows' is still the same number (as I understand it, it's just the total for the whole view). Is there any easy way to know how many rows were selected in the subset?
You can use the built-in reduce function _count to get the total count for your query.
Just add _count as reduce function for your view. After that, you will need to make two calls to couchbase:
In one call, you'll set the query param reduce=true (along with either group=true or group_level=n, depending upon how you're sending your key(s)). This will give you the total count of your filtered rows.
In the other call, you'll disable the reduce function with reduce=false because you now need the actual rows.
You can find more details about map and reduce at http://docs.couchbase.com/admin/admin/Views/views-writing.html
You can just use an array count/total/length in whatever language you are using.
For example in PHP:
$result = $cb->view("dev_beer", "beer_by_name", array('startkey' => 'O', 'endkey'=>'P'));
echo "total = >>".count($result["rows"]);
If you're actually wanting to paginate your data then you should use limit and skip:
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views-writing-querying-pagination.html
If you have to paginate the view efficiently, you don't actually need to specify both the start and the end.
Generally you can use startkey/startkey_docid and limit. The limit guarantees that a page won't be bigger than a known size.
Both cases are described in the CouchDB book: http://guide.couchdb.org/draft/recipes.html#pagination
Here is how it works:
Request rows_per_page + 1 rows from the view
Display rows_per_page rows, store the extra (+1) row as next_startkey and next_startkey_docid
As page information, keep startkey and next_startkey
Use the next_* values to create the next link, and use the others to create the previous link
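The recipe above can be sketched in Python against an in-memory list that stands in for the view. Nothing here talks to Couchbase: the (key, doc_id) tuples, the sample data, and the function name fetch_page are assumptions for illustration only.

```python
from bisect import bisect_left

def fetch_page(rows, rows_per_page, start=None):
    """Return (page, next_start) from a sorted list of (key, doc_id) tuples.

    `start` plays the role of startkey + startkey_docid; None means
    "begin at the first row".
    """
    begin = 0 if start is None else bisect_left(rows, start)
    # Request one row more than we intend to display.
    chunk = rows[begin:begin + rows_per_page + 1]
    page = chunk[:rows_per_page]
    # The extra row, if present, is where the next page starts.
    next_start = chunk[rows_per_page] if len(chunk) > rows_per_page else None
    return page, next_start

# Hypothetical view contents, sorted by key and then by document id.
view = [("a", "1"), ("b", "2"), ("b", "3"), ("c", "4"), ("d", "5")]

page1, next1 = fetch_page(view, 2)
page2, next2 = fetch_page(view, 2, next1)
page3, next3 = fetch_page(view, 2, next2)
print(page1, page2, page3, next3)
```

Carrying the document id alongside the key is what lets the second page start at ("b", "3") instead of repeating ("b", "2"), and a next_start of None signals that there is no further page.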

CakePHP Find - Order By String-To-Int?

I want to use CakePHP to pull an array of photos from a database, sorted by photo title (0, 1, 2, 3...). My query currently looks something like:
$ss_photos = $this->Asset->find('all',array(
'conditions'=>array('kind'=>'photo'),
'order'=>'title'
));
Unfortunately the titles seem to be in string format, leading to an undesirable sort order (2.jpg after 19.jpg, etc). Is there a quick way to cast 'title' as an int for ordering purposes within a Cake query of this type?
Not sure if this is "recommended practice", but on a first pass it seems to work:
$ss_photos = $this->Asset->find('all',array(
'conditions'=>array('kind'=>'photo'),
'order'=>'Asset.title + 0'
));
Any opinions?
The solution is to create a hidden column that is responsible for ordering. In your example, its values should be zero-padded image names such as 00002.jpg and 00019.jpg; this way string ordering works properly.
If there are not too many results, I think it's easier to sort them in PHP (if you use it, of course :)). See natsort(): you just need to extract the list of image names and sort it.
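To show what a natural sort does with such titles, here is a small Python sketch. It is an analogue of the idea behind natsort(), not its implementation, and the helper name natural_key is made up for the example.

```python
import re

def natural_key(name):
    """Split a name into text and number chunks so digits compare numerically."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so the chunks at the same position always have comparable types.
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r"(\d+)", name)
    ]

titles = ["19.jpg", "2.jpg", "10.jpg", "1.jpg"]
plain = sorted(titles)                       # plain string order
natural = sorted(titles, key=natural_key)    # numeric-aware order
print(plain)
print(natural)
```

The plain sort places 19.jpg before 2.jpg, which is the ordering problem from the question, while the keyed sort compares the numeric parts as integers.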