I am currently using Sphinx for indexing a MySQL query with 20+ million records.
I am using a delta index to update the main index and add all new records.
Unfortunately allot of changes to the tables are deleted.
I understand that I can use sql_query_killlist to get all document ID's that need to be deleted or updated. Unfortunately I don't understand how this actually works and the documentation from Sphinx does not have a good enough example for me to understand.
If I use the following example, how could I implement the killlist?
in MySQL
CREATE TABLE sph_counter
(
counter_id INTEGER PRIMARY KEY NOT NULL,
max_doc_id INTEGER NOT NULL
);
in sphinx.conf
source main
{
# ...
sql_query_pre = SET NAMES utf8
sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
sql_query = SELECT id, title, body FROM documents \
WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}
source delta : main
{
sql_query_pre = SET NAMES utf8
sql_query = SELECT id, title, body FROM documents \
WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}
index main
{
source = main
path = /path/to/main
# ... all the other settings
}
note how all other settings are copied from main,
but source and path are overridden (they MUST be)
index delta : main
{
source = delta
path = /path/to/delta
}
The specifics depend a lot on how you mark deleted documents. But would just add something like
sql_query_killist = SELECT id FROM documents
WHERE status='deleted'
AND id<=( SELECT max_doc_id FROM sph_counter
WHERE counter_id=1 )
to the delta index. That would capture ids of deleted record that are in the main index, and add them to the killlist so they would never appear in search results.
If want to capture updated records, need to arrange for the new rows to be included in the main sql_query of the delta, AND their ids to be in the kill-list.
Related
I've got a machine log available in an SQL table. I can do a bit in SQL, but I'm not good enough to process the following:
In the data column there are entries containing "RUNPGM: Recipe name" and "RUNBRKPGM: Recipe name"
What I want is a view containing 4 columns:
TimeStamp RUNPGM
TimeStamp RUNBRKPGM
Recipe Name
Time Difference in seconds
There is a bit of a catch:
Sometimes the machine logs an empty RUNBRKPGM that should be ignored
The RUNBRKPGM is sometimes logged with an error message. This entry should also be ignored.
It's always the RUNBRKPGM entry with just the recipe name that's the actual end of the recipe.
NOTE: I understand this is not a full/complete answer, but with info available in question as of now, I believe it at least helps give a starting point since this is too complicated (and formatted) to put in the comments:
If Recipe is everything in the DATA field except the 'RUNPGM = ' part you can do somethign similar to this:
SELECT
-- will give you a col for TimeStamp for records with RUNPGM
CASE WHEN DATA LIKE 'RUNPGM%' THEN TS ELSE '' END AS RUNPGM_TimeStamp,
-- will give you a col for TimeStamp for records with RUNBRKPGM
CASE WHEN DATA LIKE 'RUNBRKPGM%' THEN TS ELSE '' END AS RUNBRKPGM_TimeStamp,
-- will give you everything after the RUNPGM = (which I think is the recipe you are referring to)
CASE WHEN DATA LIKE 'RUNPGM%' THEN REPLACE(DATA, 'RUNPGM = ', '' AS RUNPGM_Recipe,
-- will give you everything after the RUNBRKPGM = (which I think is the recipe you are referring to)
CASE WHEN DATA LIKE 'RUNBRKPGM:%' THEN REPLACE(DATA, 'RUNBRKPGM = ', '' AS RUNPGM_Recipe
FROM TableName
Im not sure what columns you want to get the Time Difference on though so I dont have that column in here.
Then if you need to do additional logic/formatting on the columns once they are separated you can put the above in a sub select.
As a first swing, I'd try the following:
Create a view that uses string splitting to break the DATA column into a its parts (e.g. RunType and RecipeName)
Create a simple select that outputs the recipe name and tstamp where the runtype is RUNPGM.
Then add an OUTER APPLY:
Essentially, joining onto itself.
SELECT
t1.RecipeName,
t1.TimeStamp AS Start,
t2.TimeStamp AS Stop
--date func to get run time, pseudo DATEDIFF(xx,t1.TimeStamp, t2.TimeStamp) as RunTime
FROM newView t1
OUTER APPLY ( SELECT TOP ( 1 ) *
FROM newView x
WHERE x.RecipeName = t1.RecipeName
AND RunType = 'RUNBRKPGM'
ORDER BY ID DESC ) t2
WHERE t1.RunType = 'RUNPGM';
I have a couchbase document with id x
x has no subdocument , As all were deleted in some subdoc operation , it is something like this {}
I want to delete all such empty docs with no subdocs. Is it possible in couchbase ,using an N1QL query or otherwise? I tried googling for solutions, but I found no relevant solutions.
Thanks for taking the time to read the question till the end.
The following query requires primary index.
DELETE FROM default AS d
WHERE d = {};
The following query uses ix10 as covered index. Index contain only empty objects.
CREATE INDEX ix10 ON default(OBJECT_LENGTH(self))
WHERE OBJECT_LENGTH(self) = 0;
DELETE FROM default AS d
WHERE OBJECT_LENGTH(d) = 0;
You can verify with the following data. The select should only give "k003"
INSERT INTO default VALUES ("k001",1), VALUES ("k002",{"a":1}), VALUES ("k003",{});
SELECT META(d).id
FROM default AS d
WHERE OBJECT_LENGTH(d) = 0;
I have a file with some empty fields like this:
(first column being primary key- a1,b1,b2)
a1,b,,d,e
b1,c,,,,e
b2,c,c,,
I have already present in table like
a1,c,f,d,e
Now for this key a1 using replace option and lad data infile I want final output like:
a1,b,f,d,e
Here c in second column has been replaced by b,
but f has not been replaced by empty string.
To make it clear: Replace field if an actual value is present in file
if an empty field is present, retain the old value.
Let consider 2 tables having 5 columns present
in t1 table -columns are c1,c2,c3,c4,c5
in t2 table -columns are d1,d2,d3,d4,d5
so query will become like this:
select c1 as e1
ifnull(c2,d2) as e2,
ifnull(c3,d3) as e3,
ifnull(c4,d4) as e4,
ifnull(c5,d5) as e5
from t1
inner join t2 on c1 = d1;
hope it will helpful to you.
Please try the following...
CREATE TABLE tempTblDataIn LIKE tblTable;
/* Read your data into tempTblDataIn here */
UPDATE tblTableName
JOIN tempTblDataIn ON tblTableName.fldID = tempTblDataIn.fldID
SET tblTableName.fldField1 = COALESCE( tempTblDataIn.fldField1, tblTableName.fldField1 ),
tblTableName.fldField2 = COALESCE( tempTblDataIn.fldField2, tblTableName.fldField2 ),
tblTableName.fldField3 = COALESCE( tempTblDataIn.fldField3, tblTableName.fldField3 ),
tblTableName.fldField4 = COALESCE( tempTblDataIn.fldField4, tblTableName.fldField4 );
DROP TABLE tempTblDataIn;
This Answer is based on Eric's Answer at MySQL - UPDATE query based on SELECT Query.
It is also based on the assumption that the data file will contain update data only rather than update data and new records.
Yes, you will need to do a COALESCE() line for each field. You will probably have to code each line yourself. You could use a PROCEDURE if there are many fields with a repeated structure to programmatically produce the above statements, but you may find the above simpler.
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
https://dev.mysql.com/doc/refman/5.7/en/create-table-like.html (on MySQL's CREATE TABLE ... LIKE)
https://dev.mysql.com/doc/refman/5.7/en/update.html (on MySQL's UPDATE statement)
I have a table with the following fields:
id | domainname | domain_certificate_no | keyvalue
An example for the output of a select statement can be as:
'57092', '02a1fae.netsolstores.com', '02a1fae.netsolstores.com_1', '55525772666'
'57093', '02a1fae.netsolstores.com', '02a1fae.netsolstores.com_2', '22225554186'
'57094', '02a1fae.netsolstores.com', '02a1fae.netsolstores.com_3', '22444356259'
'97168', '02aa6aa.netsolstores.com', '02aa6aa.netsolstores.com_1', '55525772666'
'97169', '02aa6aa.netsolstores.com', '02aa6aa.netsolstores.com_2', '22225554186'
'97170', '02aa6aa.netsolstores.com', '02aa6aa.netsolstores.com_3', '22444356259’
I need to sanitize my db such that: I want to remove the domain names that have repeated keyvalue for the first domain_certificate_no (i.e, in this example, I look for the field domain_certificate_no: 02aa6aa.netsolstores.com_1, since it is number 1, and has repeated value for the key, then I want to remove the whole chain which is 02aa6aa.netsolstores.com_2 and 02aa6aa.netsolstores.com_3 and this by deleting the domain name that this chain belongs to which is 02aa6aa.netsolstores.com.
How can I automate the checking process for the whole DB. So, I have a query that checks any domain name in the pattern ('%.%.%) EDIT: AND they have share domain name (in this ex: netsolstores.com) , if it finds cert no. 1 that belongs to this domain name has a repeated key value, then delete. Otherwise no. Please, note tat, it is ok for domain_certificate_no to have repeated value if it is not number 1.
EDIT: I only compare the repeated valeues for the same second level domain name. Ex: in this question, I compare the values that share the domain name: .netsolstores.com. If I have another domain name, with sublevel domains, I do the same. But the point is that I don't need to compare the whole DB. Only the values with shared domain name (but different sub domain).
I'm not sure what happens with '02aa6aa.netsolstores.com_1' in your example.
The following keeps only the minimum id for any repeated key:
with t as (
select t.*,
substr(domain_certificate_no,
instr(domain_certificate_no, '_') + 1, 1000) as version,
left(domain_certificate_no, instr(domain_certificate_no, '_') - 1) as dcn
from t
)
select t.*
from t join
(select keyvalue, min(dcn) as mindcn
from t
group by keyvalue
) tsum
on t.keyvalue = tsum.keyvalue and
t.dcn = tsum.mindcn
For the data you provide, this seems to do the trick. This will not return the "_1" version of the repeats. If that is important, the query can be pretty easily modified.
Although I prefer to be more positive (thinking about the rows to keep rather than delete), the following should delete what you want:
with t as (
select t.*,
substr(domain_certificate_no,
instr(domain_certificate_no, '_') + 1, 1000) as version,
left(domain_certificate_no, instr(domain_certificate_no, '_') - 1) as dcn
from t
),
tokeep as (
select t.*
from t join
(select keyvalue, min(dcn) as mindcn
from t
group by keyvalue
) tsum
on t.keyvalue = tsum.keyvalue and
t.dcn = tsum.mindcn
)
delete from t
where t.id not in (select id from tokeep)
There are other ways to express this that are possibly more efficient (depending on the database). This, though, keeps the structure of the original query.
By the way, when trying new DELETE code, be sure that you stash a copy of the table. It is easy to make a mistake with DELETE (and UPDATE). For instance, if you leave out the WHERE clause, all the rows will disappear, after the long painful process of logging all of them. You might find it faster to simply select the desired results into a new table, validate them, then truncate the old table and re-insert them.
I'm a bit embarassed asking this here, but here goes:
I've got two tables, which you can see here:
http://img411.imageshack.us/img411/4562/query.jpg
I need to copy the effortid from the one table into the other, making sure that the values still maintain the correction relationships. The primary key for each is a combination of loggerid & datetime. What's the best way to do this?
Thanks in advance, and don't make fun :)
Change it to an Update Query instead. The joins should function correctly, but will not add missing rows. To do that, you would use an Append Query, like you have setup, but with a left join and a check for nulls. The sample below updates the LogID table with information residing in LogSiteID table.
Append Missing Records from Logger Site ID to LogID
INSERT INTO logID ( [Datetime], loggerid, temp, effortid )
SELECT ls.datetime, ls.loggerid, ls.temp, ls.effortid
FROM logID AS l RIGHT JOIN [Logger Site ID] AS ls ON (l.temp = ls.temp) AND (l.loggerid = ls.loggerid) AND (l.Datetime = ls.datetime)
WHERE (((l.loggerid) Is Null));
Update effortids from Logger Site ID to LogID
UPDATE logID AS l INNER JOIN [Logger Site ID] AS ls ON (l.Datetime = ls.datetime) AND (l.temp = ls.temp) AND (l.loggerid = ls.loggerid) SET l.effortid = [ls].[effortid];