How to count number of ids in a file - json

So I have a huge file containing hundreds of thousands of lines. I want to know how many different sessions or ids it contains. I really thought it wouldn't be that hard to do, but I've been unable to find a way.
Sessions look like this:
"session":"1425654508277"
There will be a few thousand lines with that session, then it will switch to another value; it doesn't necessarily increment by one, and I don't know the pattern, if there is one. So I just want to know how many distinct sessions appear in the document (they SHOULD be consecutive, but that's not a requirement, just something I noticed).
Is there an easy way to do this? The only things I've found that are even remotely close are Excel macros and scripts, which leads me to think I'm not asking the right questions. I also found this: Notepad++ incrementally replace, but it does not help in my case.
Thanks in advance.

Consider using jq. You can extract session with [.session], then apply unique, then length.
https://stedolan.github.io/jq/manual/
I am no jq expert, and have not tested this, but it seems that the program
unique_by(.session) | length
run with the objects slurped into an array (i.e. something like jq -s 'unique_by(.session) | length' yourfile) might give you what you want.

According to your profile, you know JavaScript, so you can use that:
Load the file.
Look for session. (If this is JSON, this could be as simple as myJson['session'].)
Keyed on session value, add to a map, e.g. myCounts[sessionValue] = doesNotMatter.
Count the number of keys in the map.
There are easier ways, like torazaburo's suggestion to use cat data | uniq | wc, but it doesn't sound like you want to learn Unix, so you may as well practice your JavaScript (I do this myself when learning programming languages: use it for everything).
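A minimal sketch of that approach in Node.js (untested; the file name sessions.txt and the regex are assumptions based on the line format shown in the question, and a Set plays the role of the map, since only the keys matter):

// count-sessions.js - count distinct "session" values in a log file
const fs = require('fs');

const text = fs.readFileSync('sessions.txt', 'utf8'); // hypothetical file name
const seen = new Set();

// Match every "session":"<digits>" pair instead of parsing each line,
// so lines that aren't valid JSON don't break the count.
for (const match of text.matchAll(/"session":"(\d+)"/g)) {
  seen.add(match[1]);
}

console.log('distinct sessions: ' + seen.size);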

You won't be able to achieve this with Notepad++, but you can use a Linux shell command. Note that uniq only collapses adjacent duplicates and wc alone prints line, word and byte counts, so extract the session values first and count lines:
grep -o '"session":"[^"]*"' sessions.txt | sort -u | wc -l

Adding to my own question: if you manage to get the strings you want separated into columns in Excel, Excel has a Filter option which automatically lists the distinct values a column can be filtered by.
Applied to my case: if I get the key-value pairs ("session":"idSession", the 100,000 values, one per row) all in one column, apply the filter, and count the listed values manually, I get the number of different values.
I didn't get to try the wc/unix option because I found this while trying to apply the other method.

Related

Analysing a number sequence in MySQL

This is certainly not a programming question, but I hope someone is able to help.
I've been working with a dataset / MySQL database with around 1,000,000 records, looking for patterns in it, average values, and generally any kind of indicator.
After doing queries manually for around 2 months, I'm wondering if there's any kind of software that would execute different queries by itself?
Such as:
Make groups of 5 rows and sum values.
Look for the lowest value in a group,
Look for specific sequences...
Does anyone have any clue how I could approach this search?
I think it's important to mention that I have already built a few algorithms based on query results, using PHP, and Python with Keras. But I'm trying to find new patterns rather than remake the ones I already have.
Kind regards,
Chris

Obtaining the average of one field with comma delimited values (InfoPath)

I have a field where the user enters multiple values each separated by a comma eg "1.8, 2, 3".
I want to find the average of those values. Is there a way to utilise avg() to accommodate for stripping the comma and producing the mean?
Unfortunately you can't do that with the built-in InfoPath functions (there is no traditional split method for strings).
If you are willing to tackle it, managed code behind the form will solve your problem very easily (only about 4 lines of code). Basic math and string manipulation should not impose any security restrictions on the form. However, you will have to set up code-behind, which is easy but can seem like somewhat of a hassle the first time you try it. There are good MSDN articles on how to go about that.
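The split-and-average logic itself is tiny; here is an illustrative sketch in JavaScript (the C# code-behind would follow the same shape; averageOf is a made-up name and the input string is the one from the question):

// Average a comma-separated list of numbers, e.g. "1.8, 2, 3"
function averageOf(csv) {
  const nums = csv.split(',').map(s => parseFloat(s.trim()));
  return nums.reduce((a, b) => a + b, 0) / nums.length;
}

console.log(averageOf('1.8, 2, 3')); // 2.2666...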
Alternatively, if you can change your data entry from comma separated to a repeating table you can use the built in avg() function.

parse rows based on columns

I have a column which stores row information in a concatenate format such as:
LOCATION_COLUMN
[Country]*[city]*[town]
I want to extract the country (the part before the first asterisk), [Country], while eliminating [city] and [town]. I could do this in a scripting language such as PHP, which has great parsing tools, but if possible I would like to complete the task inside of MySQL. Looking at the documentation I can't find any parsing techniques (there seem to be some parsing plugins for MySQL), but I would like to keep the database as it is, since it is used for many other things. Native SQL commands such as LIKE can't be used, I believe, because you need to know the characters you are looking for, and in my case I am not looking for specific characters, I just want to extract the country part. Is this somehow possible?
Assuming you mean that the components are separated by asterisks, you can use:
select substring_index(location_column, '*', 1) as country
from your_table
(your_table is a placeholder). SUBSTRING_INDEX returns everything before the first occurrence of the delimiter, so '[Country]*[city]*[town]' yields '[Country]'.

Store Miscellaneous Data in DB Table Row

Let's assume I need to store some data of unknown amount within a database table. I don't want to create extra tables, because that will take more time to get the data. The amount of data can vary.
My initial thought was to store data in a key1=value1;key2=value2;key3=value3 format, but the problem here is that a value can contain ; in its body. What is the best separator in this case? What other methods can I use to store various data in a single row?
An example row would be data=2012-05-14 20:07:45;text=This is a comment, but what if I contain a semicolon?;last_id=123456, from which I could then get, through PHP, an array with the corresponding keys and values after exploding the row text on a correct separator.
First of all: you never, ever store more than one piece of information in a single field if you need to access the pieces separately or search by one of them. This has been discussed here quite a few times.
Assuming you always want to access the complete collection of information at once, I recommend using the native serialization format of your development environment: e.g. if it is PHP, use serialize().
If it is cross-platform, JSON might be the way to go: good JSON encoding/decoding libraries exist for virtually all environments out there. The same is true for XML, but in this context the textual overhead of XML is going to bite a bit.
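For illustration, a quick JavaScript sketch of the JSON round trip (the keys are just the ones from the question; json_encode/json_decode in PHP behave the same way):

// Encode a bag of key/value pairs; embedded semicolons are no problem,
// because JSON delimits and escapes values properly.
const row = {
  data: '2012-05-14 20:07:45',
  text: 'This is a comment, but what if I contain a semicolon?',
  last_id: 123456
};

const stored = JSON.stringify(row); // string to store in a TEXT column
const back = JSON.parse(stored);    // object you get when reading it back

console.log(back.text); // the semicolon survives intact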
On a side note: are you sure that storing the data in additional tables is slower? You might want to benchmark that before deciding.
Edit:
After reading, that you use PHP: If you don't want to put it in a table, stick with serialize() / unserialize() and a MEDIUMTEXT field, this works perfectly, I do it all the time.
EAV (cringe) is probably the best way to store arbitrary values like you want, but it sounds like you're firmly against additional tables for whatever reason. In light of that, you could just save the result of json_encode in the table. When you read it back, just json_decode to get it back into an array.
Keep in mind that if you ever need to search for anything in this field, you're going to have to use a SQL LIKE. If you never need to search this field or join it to anything, I suppose it's OK, but if you do, you've totally thrown performance out the window.
You could use quotes to delimit the values:
key1='value1';key2='value2';key3='value3'
If that doesn't fit your case, post your SQL example and we can see how to do it.

Pathing in a non-geographic environment

For a school project, I need to create a way to build personalized queries based on end-user choices.
Since the user can choose basically any fields from any combination of tables, I need to find a way to map the tables in order to make a join and not pull in extraneous data (this may lead to incoherent reports, but we're willing to live with that).
For up to two tables, I have already designed an algorithm that works fine. However, when I add another table, I can't find a way to path through my database. All tables available for the personalized reports can be linked together, so it really all falls down to finding which path to use.
You might be able to try some form of an A* algorithm. Basically, it looks at each of the possible next options and applies a heuristic to each: a function that estimates roughly how far that node is from your goal. It then chooses the closest one and repeats. The hardest part of implementing A* is designing a good heuristic.
Without more information on how the tables fit together, or what you mean by a 'path' through the tables, it's hard to recommend something though.
Looks like it didn't like my link, probably the * in it, try:
http://en.wikipedia.org/wiki/A*_search_algorithm
Edit:
If that is the whole database, I'd go with a depth-first exhaustive search.
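A minimal sketch of such a depth-first search in JavaScript (untested; the joins adjacency map and the table names are made-up stand-ins for your real foreign-key relationships):

// Which tables each table can join to (hypothetical schema).
const joins = {
  A: ['B'],
  B: ['A', 'C', 'D'],
  C: ['B'],
  D: ['B']
};

// Depth-first search for a join path between two tables.
function findPath(from, to, visited = new Set()) {
  if (from === to) return [to];
  visited.add(from);
  for (const next of joins[from] || []) {
    if (visited.has(next)) continue;
    const rest = findPath(next, to, visited);
    if (rest) return [from, ...rest];
  }
  return null; // no path avoiding already-visited tables
}

console.log(findPath('A', 'C')); // [ 'A', 'B', 'C' ]

To cover more than two tables, you could union the pairwise paths between the tables the user selected.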
I thought about using A* or a similar algorithm, but as you said, the hardest part is designing the heuristic.
My tables are centered around something of a backbone, with quite a few branches each leading to at most a single leaf node. Here is the actual map (table names removed because I'm paranoid). Assuming I want to view data from the A, B and C tables, I need an algorithm to find the blue path.