How to implement a full text search customized by user relations - mysql

I am currently working on a Laravel application where users can add content such as articles, and those articles are searchable via a search engine. I would like to implement a modern full-text search solution.
However, a user can mark an article as private, making it readable only by their followers or friends.
Implementing this in plain SQL would be simple, using a where clause on a pivot table relationship, but it is anything but performant on large databases.
I have researched and experimented with Elasticsearch and other search engines, but the limitation is that the whole dataset is searchable and I cannot customize the filters according to a user-defined relationship.
Should I create one index per user instead of having a global index? That also seems to have a huge impact.
I would really appreciate any thoughts on this. Thanks in advance.

Try changing the perspective when looking at the problem.
Instead of thinking in terms of an article being accessible by certain groups of users, think in terms of a user and what articles she/he can access.
The search is always performed by a specific user, so it's known whom she/he follows (followed_user_ids) and is friends with (friend_ids). This information can be used at query build time.
The example query could look like this:
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "private": false
          }
        },
        {
          "bool": {
            "filter": [
              {
                "term": {
                  "private": true
                }
              }
            ],
            "should": [
              {
                "terms": {
                  "author_id": followed_user_ids
                }
              },
              {
                "terms": {
                  "author_id": friend_ids
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}
It would find articles that:
are not private (visible to all); or
are private, but authored by friends or users followed by the current user
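Building the query per user can be sketched in application code. The following is a minimal Python sketch, not a definitive implementation: the function name build_article_query and the searched fields title and body are assumptions that do not appear in the question, and the two author_id clauses are merged into a single terms clause, which should match the same documents. The visibility rules are placed in filter context so that they restrict the results without influencing the relevance score.

```python
from typing import Any


def build_article_query(
    text: str, followed_user_ids: list[int], friend_ids: list[int]
) -> dict[str, Any]:
    """Build an Elasticsearch request body for a search run by one specific user."""
    # Authors whose private articles the searching user is allowed to read.
    allowed_authors = sorted(set(followed_user_ids) | set(friend_ids))

    visibility = {
        "bool": {
            "should": [
                # Public articles are visible to everyone.
                {"term": {"private": False}},
                # Private articles are visible only if written by an allowed author.
                {
                    "bool": {
                        "filter": [
                            {"term": {"private": True}},
                            {"terms": {"author_id": allowed_authors}},
                        ]
                    }
                },
            ],
            "minimum_should_match": 1,
        }
    }

    return {
        "query": {
            "bool": {
                # Hypothetical field names; adjust them to the real index mapping.
                "must": [{"multi_match": {"query": text, "fields": ["title", "body"]}}],
                # Filter context: restricts results without affecting the score.
                "filter": [visibility],
            }
        }
    }
```

The returned dictionary can be serialized to JSON and sent as the body of a _search request, with the two id lists loaded from the relational database for the user who is searching.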

You can do this in Elasticsearch in a few ways that I can think of:
Ideally, use document level security (DLS), as it's the most secure approach, but don't use filtered aliases with it.
Add a boolean private field that you can then filter on. This is far less secure than DLS.
If you've tried the second approach, sharing what you did and what didn't work would help.

Related

Trying to get filtered response from query with multiple terms from elasticsearch

As the title states, I'm trying to make a query that doesn't return the entire document, but only certain fields, while matching on multiple exact terms.
I'm using Guzzle from Laravel to construct my query:
$response = $client->post('cvr-permanent/_search', [
    'auth' => ['USERNAME', 'PASSWORD'],
    'json' => [
        "_source" => [
            "Vrvirksomhed.attributter",
            "Vrvirksomhed.deltagerRelation.organisationer.medlemsData.attributter",
        ],
        "query" => [
            "bool" => [
                "must" => [
                    ["term" => ["Vrvirksomhed.cvrNummer" => $vat]],
                ],
                "must_not" => [],
                "should" => [],
            ],
        ],
        "from" => 0,
        "size" => 10,
        "sort" => [],
    ],
]);
I want the document matching Vrvirksomhed.cvrNummer, and from it I want the data where Vrvirksomhed.attributter.type => "KAPITAL", plus Vrvirksomhed.deltagerRelation.deltager.navne, plus the data where Vrvirksomhed.deltagerRelation.organisation.attributter.type = "EJERANDEL_PROCENT".
I'm very confused about how to make this query work, because it looks like multiple terms but isn't really. I'm also very new to Elasticsearch.
I tried "terms" but couldn't really get it to work.
The query I have made above returns far too much data that I don't need, and not all the data I DO need.
Hope you can help.
EDIT:
Something like this maybe, but translated to Elasticsearch:
SELECT attributter.type: "KAPITAL" AND deltagerRelation.deltager.navne AND deltagerRelation.organisation.attributter.type: "EJERANDEL_PROCENT" FROM Vrvirksomhed WHERE cvrNummer = $vat
EDIT 2:
Hopefully more clarification:
Okay, sorry, I'll try to make it clearer. The object I want is a company with a certain VAT number. Vrvirksomhed.cvrNummer is that number, and it has to be the term. The query returns a gigantic object with many arrays nested in arrays. I do not want all of this data, only some of it. The data I need from this big object is the object in the array Vrvirksomhed.attributter that has the type : "KAPITAL" field, not all of the attributter. Then I want Vrvirksomhed.deltagerRelation.deltager.navne, which I can get by just putting it in _source because I want all of these objects. But then I want Vrvirksomhed.deltagerRelation.organisation.attributter, which again is a bunch of objects in the array attributter, but I only want the ones with the type : "EJERANDEL_PROCENT".
So I can't really add them as additional "terms", because the only real term is the "cvrNummer"; everything else is just filtering the response. I tried with filters etc., but to no avail.
Here's a pastebin so you can see the mess I am dealing with. This is what I have been able to narrow it down to so far by putting the fields in _source, but without the extra "filtering" on "KAPITAL" and "EJERANDEL_PROCENT":
https://pastebin.com/b8hWWz1R
You want to get only documents which match several conditions, and you need only a subset of fields from those documents, correct?
In SQL (taking some liberties with the field names and structure), your query would be something like:
SELECT cvrNummer
FROM Vrvirksomhed
WHERE attributter_type = 'KAPITAL'
AND deltagerRelation_deltager_navne = 'you left this out in your question'
AND deltagerRelation_organisation_attributter_type = 'EJERANDEL_PROCENT'
As explained in the Elasticsearch Guide†, the equivalent to this in Elasticsearch is a query with a bool clause that contains all your conditions, and a _source parameter which says what fields you want to get back in the response. Something like the following:
{
  "_source": ["cvrNummer"],
  "query": {
    "bool": {
      "must": [
        { "term": { "attributter.type": "KAPITAL" } },
        { "term": { "deltagerRelation.deltager.navne": "you left this out in your question" } },
        { "term": { "deltagerRelation.organisation.attributter.type": "EJERANDEL_PROCENT" } }
      ]
    }
  }
}
† Do note that the syntax in this guide is for Elasticsearch 2.x. The current version is 7.x, and many things have changed since then!
See https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html for how to construct a bool query using the new syntax;
see https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html for how to use the term-level queries, which you probably want;
also see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html and consider using filter context, since you probably don't care about the score of your query.
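To illustrate the filter-context suggestion, here is a minimal Python sketch that builds a request body matching only on the VAT number in filter context. Since _source filtering selects whole fields rather than individual array elements, the sketch also includes a small helper that trims an array by its type value in application code after the response arrives. The names build_company_query and keep_by_type are hypothetical, and the helper assumes each array element is an object with a "type" key, as described in the question.

```python
from typing import Any, Iterable


def build_company_query(vat: str, source_fields: Iterable[str]) -> dict[str, Any]:
    """Request body: exact match on the VAT number, trimmed _source."""
    return {
        "_source": list(source_fields),
        "query": {
            "bool": {
                # Filter context: a yes/no match with no relevance scoring.
                "filter": [{"term": {"Vrvirksomhed.cvrNummer": vat}}]
            }
        },
        "size": 10,
    }


def keep_by_type(items: Iterable[dict], wanted_type: str) -> list[dict]:
    """Keep only the array elements whose "type" equals wanted_type."""
    return [item for item in items if item.get("type") == wanted_type]
```

After the search, something like keep_by_type(hit["_source"]["Vrvirksomhed"]["attributter"], "KAPITAL") would leave only the KAPITAL entries, assuming the response has that shape.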

Which Database back end should I use for Faculty Feedback System for an Engineering College?

I'm working on a Faculty Feedback System for an engineering college that will take feedback from students over the intranet.
It needs functions like these:
Students give feedback on the individual faculty members of their class for their respective subjects.
Students are divided into batches (A, B, C, D, E) for practicals, and there should also be individual feedback for the respective lab faculty.
Subject entries with their respective faculty need to be stored before feedback collection starts, so that they can be retrieved automatically for student feedback.
A list of questions is stored and displayed to the student as a feedback form with categories like (Excellent, Good, Okay, Poor), and when a student submits his/her feedback, it needs to be stored in the back-end DB.
For all this, I first thought about a document-based JSON DB and worked out a structure by dividing Semester, Class, Lab, User, Student, Faculty, Admin, and Feedback into individual JSON nodes with individual entries/fields, as below:
{
  Semester: { sem01: { name: "", year: "2017" } },
  Class: { class01: { name: "classo1", semid: "sem01" } },
  Lab: { lab01: { name: "labA", classid: "class01" } },
  User: { user01: { name: "faculty01", pwd: "****", role: "faculty" } },
  Faculty: { faculty01: { userid: "user01", fname: "NAME", lname: "" } },
  Student: { student01: { userid: "user02" } },
  Feedback: { feedbackid: { feedback: {}, user: "user02" } }
}
This is only on paper so far; I have just been thinking the idea through.
I am confused about how to store the questions in this structure for each feedback. Otherwise I will switch to a table-based DB for the whole system.
If anyone has an idea, please help me out. Thanks in advance.
Given the amount of data involved in your use case, you can comfortably go with a relational database.
NoSQL is worth considering when joins become too expensive at scale or the data is too large to fit on a single machine. Many NoSQL databases have sharding built in, which makes it easy to shard the data when required.
With NoSQL, data needs to be stored in a precomputed (denormalized) form.
So while designing the structure for a document-based database, you need to identify:
Areas where you can tolerate duplicate entries but want to retrieve the data in O(1).
For example, in your case, if we want the data for teachers, we can store it in the following way:
{
  "name": "xyz",
  "feedbacks": {
    "Q1": [
      {
        "cand1": "good",
        "cand2": "bad"
      }
    ],
    "Q2": [
      {
        "cand1": "good",
        "cand2": "bad"
      }
    ]
  }
}
By storing the data in this format, you can get the data for a teacher without referring to any other collection.
You can store the data for students in the same way.
In the end there will be redundant data, but retrieval will be fast.
So it's a storage vs. retrieval time trade-off.
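For comparison, the relational option mentioned at the start of this answer could look roughly like the sketch below, which also shows one way to store the questions. It is a minimal example using Python's built-in sqlite3 module; all table and column names are hypothetical, and classes, semesters, and lab batches are left out for brevity.

```python
import sqlite3

# Hypothetical schema: one row per (student, faculty, question) answer.
SCHEMA = """
CREATE TABLE faculty  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE question (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
CREATE TABLE feedback (
    id          INTEGER PRIMARY KEY,
    student_id  INTEGER NOT NULL,
    faculty_id  INTEGER NOT NULL REFERENCES faculty(id),
    question_id INTEGER NOT NULL REFERENCES question(id),
    rating      TEXT NOT NULL
                CHECK (rating IN ('Excellent', 'Good', 'Okay', 'Poor')),
    UNIQUE (student_id, faculty_id, question_id)
);
"""


def rating_counts(conn: sqlite3.Connection, faculty_id: int) -> list[tuple]:
    """Count how often each rating was given per question for one faculty member."""
    return conn.execute(
        """
        SELECT q.text, f.rating, COUNT(*)
        FROM feedback AS f
        JOIN question AS q ON q.id = f.question_id
        WHERE f.faculty_id = ?
        GROUP BY q.id, f.rating
        ORDER BY q.id, f.rating
        """,
        (faculty_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO faculty (id, name) VALUES (1, 'xyz')")
conn.execute("INSERT INTO question (id, text) VALUES (1, 'Q1')")
conn.executemany(
    "INSERT INTO feedback (student_id, faculty_id, question_id, rating) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "Good"), (2, 1, 1, "Good"), (3, 1, 1, "Poor")],
)
print(rating_counts(conn, 1))  # [('Q1', 'Good', 2), ('Q1', 'Poor', 1)]
```

Here the questions live in their own table and each answer references one question, so the question list can change without touching the stored feedback.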

CouchDB get all databases and their last changes

I know that I can get all databases with
GET /_all_dbs
and the last change of a database with
GET /{db}/_changes?descending=true&limit=1
The result looks like:
{
  "results": [
    {
      "seq": 112,
      "id": "20e3480f5db4802d94a8193ac2246ae7",
      "changes": [
        {
          "rev": "2-fb8204608047ce016282acbf3239cd01"
        }
      ],
      "deleted": true
    }
  ],
  "last_seq": 112
}
Now is it possible to combine these statements to get something like:
{
  "results": [
    {
      "db1": "1-fb8204608047ce016282acbf3239cd01"
    },
    {
      "db2": "2-fb8204608047ce016282acbf3239cd02"
    },
    {
      "db3": "2-fb8204608047ce016282acbf3239cd03"
    },
    {
      "db4": "2-fb8204608047ce016282acbf3239cd04"
    }
  ]
}
where "db1" is a database name and the value (e.g. "2-fb8204608047ce016282acbf3239cd04") is the last _rev of that database.
There is no mechanism for querying across multiple databases in CouchDB.
You can, however, do this from your application by joining the results of multiple queries.
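The application-side join could be sketched as follows in Python, using only the two endpoints from the question. This is a minimal sketch under stated assumptions: the function names are hypothetical, the base URL (for example http://localhost:5984) is an assumption, authentication is not handled, and the "last rev" is taken to be the revision of the most recently changed document in each database.

```python
import json
from typing import Any, Callable
from urllib.parse import quote
from urllib.request import urlopen


def http_get_json(url: str) -> Any:
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)


def last_change_per_db(
    base_url: str, get_json: Callable[[str], Any] = http_get_json
) -> dict[str, Any]:
    """Map each database name to the rev of its most recent change, or None."""
    summary = {}
    for db in get_json(f"{base_url}/_all_dbs"):
        # Database names may contain characters such as "/" that need escaping.
        db_path = quote(db, safe="")
        changes = get_json(f"{base_url}/{db_path}/_changes?descending=true&limit=1")
        results = changes["results"]
        # An empty database has no changes yet.
        summary[db] = results[0]["changes"][0]["rev"] if results else None
    return summary
```

This issues one request per database, so on a server with many databases it may be worth running the requests concurrently or caching the result.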

Return list of posts in order of time 'liked' as stored in User collection array

Okay, so here's my problem. I now have an array like this that is added to when a user 'likes' a post, which also captures the time that they have liked it.
"liked_times" : [
{
"postId" : "5CeN95hYZNK5uzR9o",
"likedAt" : ISODate("2015-09-28T02:55:08.803Z")
},
{
"postId" : "vN7uZ2d6FSfzYJLmm",
"likedAt" : ISODate("2015-09-28T02:55:26.118Z")
},
{
"postId" : "b5JEtHb9hCsQPeEQB",
"likedAt" : ISODate("2015-09-28T02:55:31.359Z")
}
]
We used to capture just the postId, but changed to also store the time, following the post "How would I return the order of MongoDB Posts by time Favourited by user?"
So then we used to be able to retrieve the posts with:
Posts.find({ _id: { $in: user.liked } },
{sort: {createdAt: -1} });
But now I'm trying to work out how I would return the posts a user has liked in the order that he liked them using this new data. I'm sure there must be a relatively easy way.
Edit as per comments: I also have a simple array or list of docs with just the postId
"liked" : [
"D4turxiezBvPtNjDr",
"Hk2YpqgwRM4QCgsLv",
"vN7uZ2d6FSfzYJLmm",
"beNdpXhYLnKygD3yd",
"EBMKgrD4DjZxkxvfY"
],
When a post is liked by the user, the Id is added to the end of this list. However, when I return posts with return Quotes.find({ _id: { $in: user.liked } }); they are not returned in the same order (I'm thinking due to a Minimongo local cache behaviour).
Is there another way I should be calling these docs to get them in the specific order that the array stores them? Or should I just go ahead and try to solve this using the likedAt times programmatically in the code?
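The programmatic option mentioned above can be sketched as follows. An $in query does not return documents in the order of the supplied array, so one approach is to fetch the liked posts first and reorder them in application code using the likedAt values. This is a minimal Python sketch of that logic only (the same idea carries over to JavaScript); the function name order_by_liked_time is hypothetical and the documents are represented as plain dictionaries.

```python
from datetime import datetime


def order_by_liked_time(
    posts: list[dict], liked_times: list[dict], newest_first: bool = True
) -> list[dict]:
    """Return the liked posts sorted by the time the user liked them."""
    # Lookup table: postId -> time the user liked that post.
    liked_at = {entry["postId"]: entry["likedAt"] for entry in liked_times}
    # Ignore any fetched post that has no recorded like time.
    liked_posts = [post for post in posts if post["_id"] in liked_at]
    return sorted(
        liked_posts, key=lambda post: liked_at[post["_id"]], reverse=newest_first
    )


liked_times = [
    {"postId": "5CeN95hYZNK5uzR9o", "likedAt": datetime(2015, 9, 28, 2, 55, 8)},
    {"postId": "vN7uZ2d6FSfzYJLmm", "likedAt": datetime(2015, 9, 28, 2, 55, 26)},
]
# Posts as they might come back from the database, in arbitrary order.
posts = [{"_id": "5CeN95hYZNK5uzR9o"}, {"_id": "vN7uZ2d6FSfzYJLmm"}]
ordered = order_by_liked_time(posts, liked_times)
```

With newest_first left at its default, the most recently liked post comes first; passing newest_first=False gives the order in which the posts were liked.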

Social media profiles in h-card HTML microformats

I want to use the h-card microformats on my new website.
How do I add social media profiles like twitter, facebook, etc to my h-card in a correct way?
For phone numbers it is possible to add a type attribute. Like 'cell', 'home', or whatever you want.
Is it OK when I do this with social media profiles too? Like:
<span class="u-url">
<span class="type">Twitter</span>:
<span class="value">http://twitter.com/blabla</span>
</span>
According to this page of the documentation this should be possible. But all the tutorials I've found about h-card or hcard just add all the social media profiles without a type attribute. So I'm not sure what's the right way to do it.
There is a note on the microformats2 site at the microformats wiki:
Note: use of 'value' within 'tel' should be automatically handled by
the support of the value-class-pattern. And for now, the 'type'
subproperty of 'tel' is dropped/ignored. If there is demonstrable
documented need for additional tel types (e.g. fax), we can introduce
new flat properties as needed (e.g. p-tel-fax).
That means the 'type' subproperty is currently ignored in mf2, but you can try vendor-prefixed (experimental) properties and use, for example, u-x-twitter u-url instead of u-url.
This for example:
<div class="h-card"><a class="u-x-twitter u-url" href="http://twitter.com/blabla">Twitter</a></div>
would be interpreted like:
{
  "items": [
    {
      "type": [
        "h-card"
      ],
      "properties": {
        "x-twitter": [
          "http:\/\/twitter.com\/blabla"
        ],
        "url": [
          "http:\/\/twitter.com\/blabla"
        ],
        "name": [
          "Twitter"
        ]
      }
    }
  ],
  "rels": {}
}
BTW: You can try/validate your code here: http://pin13.net