I have a bucket containing documents like this:
{
"x" : "x",
"y" : "y",
"z" : "z",
//...
}
And I have a GSI index on 2 properties (x,y).
I'd like to know which of these scenarios is supposed to be more performant if I want to query the document by x and y:
1- SELECT * from bucket where x = "x" and y = "y"
2-
* SELECT meta().id from bucket where x = "x" and y = "y"
* SELECT * from bucket USE KEYS [The keys returned by previous query]
As @deniswsrosa suggested, Option 1 performs better.
The index has all the information about the query predicate, so the IndexScan can produce the exact set of documents (no false positives). Check out https://blog.couchbase.com/create-right-index-get-right-performance/
Depending on how many documents the query qualifies, there might be better options.
As the query requires a data fetch, the data needs to go through 2 hops (data node to query service, query service to client). If the number of resultant documents is low and their size is small, option 1 works fine.
If the resultant document count and size are high, you can explore the following option.
Use a covering query to produce the document keys:
SELECT meta().id from bucket where x = "x" and y = "y"
Then use the Couchbase SDK's asynchronous API to get the documents directly from the data node.
The first one should be significantly faster, essentially because you are running one query instead of two. However, as you are using SELECT * in both queries, you will trigger a data fetch, which is OK in most cases, but if you need maximum performance you should try to use covering indexes instead: https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/covering-indexes.html
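For illustration, a minimal sketch of such a covering setup in N1QL, assuming the bucket is literally named bucket and the two fields are x and y:

-- GSI on the two predicate fields (backticks in case the bucket name collides with a reserved word)
CREATE INDEX idx_x_y ON `bucket`(x, y);

-- Covered: only meta().id and the indexed fields are referenced,
-- so the index scan alone can answer it (no data fetch)
SELECT meta().id FROM `bucket` WHERE x = "x" AND y = "y";

-- Not covered: SELECT * forces a fetch of the full documents
SELECT * FROM `bucket` WHERE x = "x" AND y = "y";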
I have a view where the user selects fields to make an advanced search; the fields are:
name - age - location - isPaid - date_subscription - date_expiration
The user might choose one column or a combination of multiple columns. I am confused about whether I should use if statements to detect which columns are selected and then run a query that depends on the selected columns, but that way I would need to write out every valid combination of conditions. Is there another way to execute such queries?
I was writing this query, but I stopped because I just realized how long it will be:
SELECT * FROM internetclientdetails icd INNER JOIN internetclient ic on ic.id = icd.icid WHERE
(icd.date_sub <= ".$start_dateSubsc." and icd.date_sub >= ".$end_dateSubsc.")
OR
(icd.date_exp <= ".$start_dateExp." and icd.date_exp >= ".$end_dateExp.")
OR
(
(icd.date_sub <= ".$start_dateSubsc." and icd.date_sub >= ".$end_dateSubsc.")
AND
(icd.date_exp <= ".$start_dateExp." and icd.date_exp >= ".$end_dateExp.")
)
OR
.
.
.
but this is too long to write, since I still have 4 more fields for which to set OR and AND operators.
Generally you add an existing query-building library, which can construct valid SQL from the input you select.
There are a number of them, for example the Doctrine database abstraction layer, Medoo, and a number of others.
For example, in Medoo a rather complex query such as:
SELECT user_name FROM account
WHERE user_id IN (2,123,234,54) OR
email IN ('foo@bar.com','cat@dog.com','admin@medoo.in')
will be written in PHP as:
$database->select("account", "user_name", [
    "OR" => [
        "user_id" => [2, 123, 234, 54],
        "email" => ["foo@bar.com", "cat@dog.com", "admin@medoo.in"]
    ]
]);
So all you have to do when using medoo is to pass it the correct input from the form.
As regards your question about how the user selects different columns, you can use something like this:
$mapping=array("start_dateSubsc"=>"date_sub", "end_dateSubsc"=>"date_sub",...);
where you list all the possible fields the user can enter on the web page, mapped to the real database table column names.
Then you do something like this, when you process the page:
$wherequery["OR"]=array();
foreach ($mapping as $userfield => $dbfield)
{
if array_key_exists($userfield, $_REQUEST)
array_push($wherequery["OR"], $dbfield => $_REQUEST[$userfield]);
}
$database->select("your columns"),[ $wherequery ]);
This will work for fields that need to be equal to what the user entered, where any of the fields may match.
You would have a bit more to do with fields that can be in a range, and process them separately, as well as processing fields combined with "AND", but that depends on the ranges and possibilities of your actual fields; a sketch of the SQL you would want generated for the date-range fields follows below.
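For the two date fields from the question, the generated SQL might look roughly like this (a sketch only; :start_dateSubsc and the other placeholders stand for the bound user input, and it assumes the start values are the lower bounds):

SELECT icd.*, ic.*
FROM internetclientdetails icd
INNER JOIN internetclient ic ON ic.id = icd.icid
WHERE (icd.date_sub BETWEEN :start_dateSubsc AND :end_dateSubsc)
   OR (icd.date_exp BETWEEN :start_dateExp AND :end_dateExp);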
I'm trying to set up the ability to get some numbers from my Sphinx indexes, but I'm not sure how to get the info I want.
I have a MySQL DB with articles, and a Sphinx index with full-text search set up for that DB, all working. What I want is to get some numbers:
How many times a search text (keyword or key phrase) appears across all articles for all time (more likely limited to "articles from a time interval from X to Y")
Same as the previous, but for how many times 2 keywords or key phrases (so "x AND y") appear in the same articles
I was doing something similar to the first one manually, using a bat file I made:
indexer ind_core -c c:\%SOME_PATH%\development.sphinx.conf --buildstops stats.txt 10000 --buildfreqs
Which generated a txt file with all repeating keywords and how often they appear, which at the early development stages helped to form a list of keywords I'm interested in. Now I'm trying to do the same, but just for a finite list of predetermined keywords, and integrated into my Rails project so I can build charts in the future.
I tried running some queries like
@testing = Article.search 'Keyword1 AND Keyword2', :ranker => :wordcount
but I'm not sure how it works or how to process the result, nor whether that's what I'm looking for.
Another approach I tried was manual MySQL queries, such as
SELECT id,title,WEIGHT() AS w FROM ind_core WHERE MATCH('#title keyword1 | keyword2') OPTION ranker=expr('sum(hit_count)');
but I'm not sure how to process the results from here either (or how to actually implement it in my existing Rails project), and it's limited to 20 rows per query (which I think I can change somewhere in the settings?). But at least, looking at the MySQL results, what I'm interested in is hit_count over all articles (or all articles from a set timeframe).
Any ideas on how to do this?
UPDATE:
The current way I found was to add
@testing = Article.search params[:search], :without => {:is_active => false}, :ranker => :bm25
to the controller with some conditions (so it doesn't bug out from a nil search). :is_active is my soft-delete flag; I don't want to search deleted entries, so don't mind it. And in the view I simply displayed
<%= @testing.total_entries %>
which, if I understand it correctly, shows me the number of matches Sphinx found (so pretty much what I was looking for).
So, to figure out the number of hits per document, you're pretty much on the right track, it's just a matter of getting it into Ruby/Thinking Sphinx.
To get the raw Sphinx results (if you don't need the ActiveRecord objects):
search = Article.search "foo",
:ranker => "expr('SUM(hit_count)')",
:select => "*, weight()",
:middleware => ThinkingSphinx::Middlewares::RAW_ONLY
… this will return an array of hashes, and you can use the weight() string key for the hit count, and the sphinx_internal_id string key for the model's primary key (id is Sphinx's own primary key, which isn't so useful).
Or, if you want to use the ActiveRecord objects, Thinking Sphinx has the ability to wrap each search result in a helper object which passes appropriate methods through to the underlying model instances, but lets weight respond with the values from Sphinx:
search = Article.search "foo",
:ranker => "expr('SUM(hit_count)')",
:select => "*, weight()"; ""
search.context[:panes] << ThinkingSphinx::Panes::WeightPane
search.each do |article|
  puts article.weight
end
Keep in mind that panes must be added before the search is evaluated, so if you're testing this in a Rails console, you'll want to avoid letting the console inspect the search variable (which I usually do by adding ; "" at the end of the initial search call).
In both of these cases, as you've noted, the search results are paginated - you can use the :page option to determine which page of results you want, and :per_page to determine the number of records returned in each request. There is a standard limit of 1000 results overall, but that can be altered using the max_matches setting.
Now, if you want the number of times the keywords appear across all Sphinx records, then the best way to do that while also taking advantage of Thinking Sphinx's search options, is to get the raw results of an aggregate SUM - similar to the first option above.
search = Article.search "foo",
:ranker => "expr('SUM(hit_count)')",
:select => "SUM(weight()) AS count",
:middleware => ThinkingSphinx::Middlewares::RAW_ONLY
search.first["count"]
I have a design problem with an SQL request:
I need to return data looking like:
listChannels:
  - idChannel
    name
    listItems:
      - data
      - data
  - idChannel
    name
    listItems:
      - data
      - data
The solution I have now is to send a first request:
*"SELECT * FROM Channel WHERE idUser = ..."*
and then, in the loop fetching the result, I send for each row another request to fill the nested list:
"SELECT data FROM Item WHERE idChannel = ..."
This is going to kill the app and is obviously not the way to go.
I know how to use the JOIN keyword, but it's not exactly what I want, as it would return one row per data item of each channel, repeating all the channel information.
How do I solve this common problem in a clean and efficient way?
The "SQL" way of doing this produces of table with columns idchannel, channelname, and the columns for item.
select c.idchannel, c.channelname, i.data
from channel c join
item i
on c.idchannel = i.idchannel
order by c.idchannel, i.item;
Remember that a SQL query returns a result set in the form of a table. That means that all the rows have the same columns. If you want a list of columns, then you can do an aggregation and put the items in a list:
select c.idchannel, c.channelname, group_concat(i.data) as items
from channel c join
item i
on c.idchannel = i.idchannel
group by c.idchannel, c.channelname;
The above uses MySQL syntax, but most databases support similar functionality.
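If your MySQL version has the JSON functions (5.7.22+ / 8.0), an aggregate such as JSON_ARRAYAGG gets the result even closer to the nested shape you described; a sketch, assuming the same tables and columns as above:

select c.idchannel, c.channelname,
       json_arrayagg(i.data) as items  -- one JSON array of item data per channel
from channel c join
     item i
     on c.idchannel = i.idchannel
group by c.idchannel, c.channelname;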
SQL is made for accessing two-dimensional data tables. (There are more possibilities, but they are very complex and possibly not standardized.)
So the best way to solve your problem is to use multiple requests, for example as sketched below. Please also consider using transactions, if possible.
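A minimal sketch of that approach with just two requests, reusing the table names from the question (the IN list of the second query is filled from the first query's results in application code, where you then group the items by idChannel):

-- 1st request: all channels of the user
SELECT idChannel, name FROM Channel WHERE idUser = ...;

-- 2nd request: all items of those channels in one go
SELECT idChannel, data FROM Item WHERE idChannel IN (...);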
I have a challenge with the following database structure:
A HEADER table called 'DOC', containing document details, among which the document ID.
A DETAIL table called 'DOC_SET', containing data related to the document.
The header table has approximately 16,000 records. The detail table contains on average 75 records per header record (1.2 million records in total).
I have one source document and its related set (the source set). I would like to compare this source set to the other documents' sets (which I refer to as destination documents and sets). Through my application I have the list of IDs of the source set available, and therefore also its length (in the example below a list of 46 elements), which I can use in the query directly.
What I need per destination document, for display, is the length of the intersection (the number of shared elements) of the source and destination sets, and the length of the difference (the number of elements that are in the source set but not in the destination set). I also need a filter to retrieve only records for which the intersection between source and destination reaches 75% of the source set.
Currently I have a query which does this by using sub-selects containing expressions, but it is utterly slow, and the results need to be available at page refresh in a web application. The point is that I only need to display about 20 results at a time, but when sorting on the calculated fields I need to calculate every destination record before being able to sort and paginate.
The query is something like this:
select
DOC.id,
calc_subquery._calcSetIntersection,
calc_subquery._calcSetDifference
from
DOC
inner join
(
select
DOC.id as document_id,
(
select
count(*)
from
DOC_SET
where
DOC_SET.doc_id = DOC.id and
DOC_SET.element_id in (60,114,130,187,267,394,421,424,426,603,604,814,909,1035,1142,1223,1314,1556,2349,2512,4953,5134,6318,6339,6344,6455,6528,6601,6688,6704,6705,6731,6894,6895,7033,7088,7103,7119,7129,7132,7133,7137,7154,7159,7188,7201)
) as _calcSetIntersection
,46-(
select
count(*)
from
DOC_SET
where
DOC_SET.doc_id = DOC.id and
DOC_SET.element_id in (60,114,130,187,267,394,421,424,426,603,604,814,909,1035,1142,1223,1314,1556,2349,2512,4953,5134,6318,6339,6344,6455,6528,6601,6688,6704,6705,6731,6894,6895,7033,7088,7103,7119,7129,7132,7133,7137,7154,7159,7188,7201)
) as _calcSetDifference
from
DOC
where
DOC.id = 2599
) as calc_subquery
on
DOC.id = calc_subquery.document_id
where
DOC.id = 2599 and
_calcSetIntersection / 46 > 0.75;
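Per destination document, the calculation essentially boils down to a single pass over DOC_SET, roughly like this (a sketch only, with the same element list and threshold; it still needs to be joined back to DOC for the other columns):

select
    DOC_SET.doc_id,
    count(*) as _calcSetIntersection,
    46 - count(*) as _calcSetDifference
from
    DOC_SET
where
    DOC_SET.element_id in (60,114,130,187,267,394,421,424,426,603,604,814,909,1035,1142,1223,1314,1556,2349,2512,4953,5134,6318,6339,6344,6455,6528,6601,6688,6704,6705,6731,6894,6895,7033,7088,7103,7119,7129,7132,7133,7137,7154,7159,7188,7201)
group by
    DOC_SET.doc_id
having
    count(*) / 46 > 0.75;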
I'm wondering:
whether this is possible while being performed in < 100 msec or so on MySQL, on an average-spec server running MySQL fully in memory (24 GB);
whether I should use a better-suited solution for this, perhaps a NoSQL solution;
whether I should use some sort of temporary table or cache containing calculated values. This is an issue for me, as the source set of IDs might change between queries and the whole thing then needs to be calculated again.
Anyway, any thoughts or solutions are really appreciated.
Kind regards,
Eric
I am trying to make use of the mobile device lookup data in the WURFL database at http://wurfl.sourceforge.net/smart.php but I'm having problems getting my head around the MySQL code needed (I use ColdFusion for the server backend). To be honest it's really doing my head in, but I'm sure there is a straightforward approach to this.
The WURFL is supplied as XML (approx. 15,200 records to date); I have already written the method that saves the data to a MySQL database. Now I need to get the data back out in a useful way!
Basically it works like this: first, run a SELECT using the userAgent data from a CGI pull to match against a known mobile device (row 1) using LIKE; if found, then use the resultant fallback field to look up the default data for the mobile device's 'family root' (row 2). The two rows need to be combined by overwriting the contents of (row 2) with the specific mobile device's features from (row 1). Both rows contain NULL entries, and not all the features are present in (row 1).
I just need the fully populated row of data returned if a match is found. I hope that makes sense; I would provide what I think the SQL should look like, but I would probably confuse things even more.
Really appreciate any assistance!
This would be my shot at it in SQL Server. You would need to use IFNULL instead of ISNULL:
SELECT
    ISNULL(row1.Feature1, row2.Feature1) AS Feature1
    , ISNULL(row1.Feature2, row2.Feature2) AS Feature2
    , ISNULL(row1.Feature3, row2.Feature3) AS Feature3
FROM
    featureTable row1
    LEFT OUTER JOIN featureTable row2 ON row1.fallback = row2.familyroot
WHERE row1.userAgent LIKE '%Some User Agent String%'
This should accomplish the same thing in MySQL:
SELECT
    IFNULL(row1.Feature1, row2.Feature1) AS Feature1
    , IFNULL(row1.Feature2, row2.Feature2) AS Feature2
    , IFNULL(row1.Feature3, row2.Feature3) AS Feature3
FROM
    featureTable AS row1
    LEFT OUTER JOIN featureTable AS row2 ON row1.fallback = row2.familyroot
WHERE row1.userAgent LIKE '%Some User Agent String%'
So what this does is take your feature table, aliased as row1, to get your specific model's features. We then join it back to itself as row2 to get the family features. Then the ISNULL/IFNULL function says "if there is no Feature1 value in row1 (it's null), then get the Feature1 value from row2".
Hope that helps.