How to remove duplicate results in Splunk - duplicates

Is there any way to remove duplicate values in Splunk Enterprise?
I have tried using dedup, but it only gives me what is duplicated.
I want to grab only the results that are unique.
Thanks in advance.

| dedup _raw
Do this after your base search; it will remove duplicate raw events.
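For example, after a hypothetical base search (the index and sourcetype here are placeholders for your own):
index=main sourcetype=access_combined
| dedup _raw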

If you have permission to use the delete command, you can remove duplicate data by piping (|) a search to it.
This should be the opposite of dedup:
... | eventstats max(_indextime) AS latestIndexTime by source | where _indextime < latestIndexTime
Or you can just search for latest results.
... | eval _time=(_indextime) | stats latest(*) by source
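Putting the delete approach together, a minimal sketch (the base search is a placeholder; note that delete requires the can_delete role and only hides events from searches, it does not free disk space):
index=main sourcetype=my_sourcetype
| eventstats max(_indextime) AS latestIndexTime by source
| where _indextime < latestIndexTime
| delete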

Related

Splunk: How to extract field directly in Search command using regular expressions?

I have some log files which look like this one:
2020-11-18 00:11:22.333 INFO [ABC_service,[{"method":"doSomething","id":"123456789","jsonrpc":"2.0","params":{"taskType":"certainType","clientNotificationInfo":{"priority":xy,"expirationDate":111111111},"priority":xy,"deviceId":"000000000000000","taskPayload":{},"timeout":22222222}}, XYZ]
I would now like to extract fields directly in my search and make a table of the extracted values. I want to extract the taskType, here certainType. How can I do this?
I tried this command:
source="/log/ABCDE/ABCDE_service.log" doSomething | rex field=_raw "taskType: (?<taskType>.*)" | table taskType
But I got an empty table. What is wrong here?
You have the right idea, but the regular expression in the rex command does not match the sample data. Try this.
source="/log/ABCDE/ABCDE_service.log" doSomething
| rex field=_raw "taskType\\\":\\\"(?<taskType>[^\\\"]+)"
| table taskType
The extra backslashes handle the multiple layers of escaping required to get the quotation marks through to the regex processor.
BTW, I like to use regex101.com to test regular expressions.
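If you want to sanity-check the escaping yourself, here is a self-contained sketch using makeresults and a cut-down, hypothetical version of the JSON fragment:
| makeresults
| eval _raw="{\"params\":{\"taskType\":\"certainType\"}}"
| rex field=_raw "taskType\\\":\\\"(?<taskType>[^\\\"]+)"
| table taskType
This should return a single row with taskType=certainType.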

Get JSON key of Postgres column containing a specific word

I'm trying to select keys and their values from a json column in my Postgres db, where the key names end with "_alert".
So in my db I have a column named data, typed as json, and I just want the keys ending with "_alert", like "ram_alert", "temperatures_alert", "disk_alert", "cpu_alert".
I need to get the key and the value to compare with the data I have in my backend app, to decide whether I need to update the value or not.
How can I do this?
I can get all the keys with select json_object_keys(data) from devices, but how do I get the key/value pairs? Is there a way to use a "like" expression here?
First off, note that your current query will only work if you have one tuple in your 'devices' table. Try inserting another row and you'll get:
ERROR: cannot call json_object_keys on an array
If you're certain that you're only ever going to have ONE result from this table, then the following query should give you what you want:
SELECT key,value FROM devices,json_each(devices.data) where key ~ '_alert$';
I'd still throw something like "LIMIT 1" onto your query to be safe.
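A self-contained sketch of the idea (the table definition and sample row are hypothetical, just to show the shape of the query):
CREATE TABLE devices (id serial PRIMARY KEY, data json);
INSERT INTO devices (data) VALUES ('{"ram_alert": true, "cpu_alert": false, "hostname": "srv1"}');
SELECT key, value FROM devices, json_each(devices.data) WHERE key ~ '_alert$';
-- returns: ram_alert | true, cpu_alert | false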

Delete duplicate rows in SAS

I am trying to delete duplicate rows from a CSV file using SAS but haven't been able to do so. My data looks like:
site1,variable1,20151126000000,22.8,140,1
site1,variable1,20151126010000,22.8,140,1
site1,variable2,20151126000000,22.8,140,1
site1,variable2,20151126000000,22.8,140,1
site2,variable1,20151126000000,22.8,140,1
site2,variable1,20151126010000,22.8,140,1
The 4th row is a duplicate of the 3rd one. This is just an example; I have more than a thousand records in the file. I tried doing this by creating subsets but didn't get the desired results. Thanks in advance for any help.
I think you can use nodupkey for this; just reference your key, or you can use _all_:
proc sort data = file nodupkey;
by _all_;
run;
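If the data is still sitting in the CSV, you could read it in first; a sketch, assuming a hypothetical file path and no header row in the file:
proc import datafile="/path/to/data.csv" out=work.have dbms=csv replace;
getnames=no; /* columns get default names VAR1-VAR6 */
run;
proc sort data=work.have nodupkey;
by _all_; /* drop rows that are duplicates across every column */
run;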
This paper describes different options for removing duplicate rows: https://support.sas.com/resources/papers/proceedings17/0188-2017.pdf
If all columns are sorted, the easiest way is to use the option noduprecs:
proc sort data = file noduprecs;
by some_column;
run;
In contrast to the option nodupkey, no matter which column or columns you state after the by, it will always remove duplicate rows based on all columns.
Edit: apparently, all columns do have to be sorted.

Bash - Replacing or removing the first occurrence in a string

I know this is a very common question, but I've looked through SO and Google and I'm still unable to remove the first occurrence or the first few characters of a string which I retrieved from MySQL.
I've got a table like this.
Name | Age
James| 27
My SQL query is (Select name from Human).
Once I execute the command from my bash script, it retrieves the column name (Name) and the entry (James) in the form "Name James", when I only need "James". How do I remove the word Name (including the space after it)? I want to store the result in a string and not echo it.
I think what you want is the mysql command-line option --skip-column-names, which will prevent the first line containing the column names from being returned with the SELECT query. Try that.
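A minimal sketch of how that could look in the script (the database name and the LIMIT are assumptions, and connection/credential flags are omitted; -N is the short form of --skip-column-names):
# store the value in a variable instead of echoing it
name=$(mysql -N -e "Select name from Human LIMIT 1" my_database)
# "$name" now contains just: James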

How to extract all IDs accessed from a mysql general log using the linux commandline?

This should be a trivial question for anyone who's good with bash/sed/awk. Unfortunately, I'm not, yet :)
I've got a general log from MySQL which contains some queries that have a common parameter: they query on a specific id field.
The queries look like
update tbl set col='binary_values' where id=X;
I need to process the log and extract all the IDs that these queries touched, each on its own line.
The purpose of this is to figure out how many times each ID is accessed. Eventually I'd group and count the values.
The binary values are indeed binary junk, so they kinda messed up some things I've been trying to do.
Eventually we solved the problem temporarily using a Python script, but I'm sure the Linux command-line toolset can do it too. How would you do it?
Update (example of a query in the log):
5999 Query update tbl set col='<AC><ED>\0^Ez\0\0^AaESC\0\0\0^D}k<85><F4>\0\0
c\0\0\0\0\0\0\0\0\0\0\0\0\0^A\0\0\0^A\0^A\0\0\0^A\0^A\0\0\0^A\0\0\0\0\0\0\0\0\0\0\0^A\0\0\0^Z^E^A<F6><DE>^A\0^A<F7><DE>^A\0^A<F8><DE>^A\0^A<F9><DE>^A\0^A<FE><DE>^A\0\0\0\0\0^A\0\0\0Q^E^C<C4>^O^A\0?<80>\0\0\0�°<C2><EA><D2>%^C<CB>^O^A\0?<80>\0\0\0�«<9C><CD><CC>%^C<EA>^Y^A\0?<80>\0\0\0�°<C2><EA><D2>%^C<90>^L^A\0?<80>\0\0\0�°<C2><EA><D2>%^C<F6>^L^A\0?<80>\0\0\0�«<9C><CD><CC>%\0^A\0\0\0T^E^D^A\0^A<83><D2>|^A<C4>^O\0�<D3>�³%^D^B\0^A�<B5>^B^A<F5>^K^A^R�<B2>�³%^D^A\0^A<FA>^L\0\0<AE><96><B1>�³%^D^A\0^A<F7>^W^A<90>^L^AESC<96><FC><B1>�³%^D^A\0^A^T^A<EA>^Y^A^F<F5>�±�³%\0\0\0\0\0\0\0^A\0\0\0^U^A^B\0\0\0\0\0\0^O9\0\0^A+<<87>u<E0>^A<85>^B^A\0\0\0^_^B^A^F^A\0?<80>\0\0\0�°<C2><EA><D2>%^AESC^A\0?<80>\0\0\0�°<C2><EA><D2>%\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0' where id=19284067828
The binary junk contains line breaks as well as "=" characters, which makes it harder to use "cut".
Based on your example log entry, this might work:
sed -n 's/.*update tbl set col=.*where id=\([0-9]\+\)$/\1/p' file.log
To count the occurrence of each id, you can pipe the output to sort and uniq
sed -n 's/.*update tbl set col=.*where id=\([0-9]\+\)$/\1/p' file.log | sort | uniq -c
It'd help if you copy-pasted a couple of these queries, especially regarding the binary junk you talk about, but e.g.:
grep "update tbl set col" yourfile.log | cut -d '=' -f 3 | sort | uniq -c
I would suggest adding a mod_date timestamp column, default 0, set to current_timestamp on update.
To get the list of updated records, you can easily filter on mod_date:
mod_date != 0 --> blindly get the count
mod_date != 0 and mod_date >= 'since last log date'
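A rough sketch of that idea in SQL (table and column names follow the example above; the zero default assumes a sql_mode that still allows zero dates, otherwise use DEFAULT CURRENT_TIMESTAMP):
ALTER TABLE tbl ADD COLUMN mod_date TIMESTAMP NOT NULL DEFAULT 0 ON UPDATE CURRENT_TIMESTAMP;
-- blindly get the count of updated rows
SELECT COUNT(*) FROM tbl WHERE mod_date != 0;
-- only rows updated since the last log date (placeholder timestamp)
SELECT id, mod_date FROM tbl WHERE mod_date != 0 AND mod_date >= '2015-11-26 00:00:00';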