I'm trying to index a very small (6 rows) table into Solr, and it says that it's added/updated 6 documents, but it doesn't return anything when I search for a field. My table is as follows:
League:
field      | type
-----------|--------
id         | int
leaguename | string
and here is what Solr prints when I try to do the full-import:
config: data-config.xml, command: full-import, status: idle
Indexing completed. Added/Updated: 6 documents. Deleted 0 documents.
Started: 2011-07-13 19:11:42, ended: 2011-07-13 19:11:42, time taken: 0:0:0.120
This response format is experimental. It is likely to change in the future.
Is there a way that I can view the values the index is holding? I've tried looking in the data folder in Solr, but all the files just seem to be full of strange non-alphanumeric characters.
Luke is a desktop app which allows you to examine an index, run queries and generally muck around.
If the index is remote you will first need to transfer it to your desktop, then just open it in Luke.
http://code.google.com/p/luke/
Luke rocks!
Assuming you are running Solr on localhost and port 8983, as per the standard example, you can do a wildcard query like
http://localhost:8983/solr/select?q=*:*
This will return all the documents with all stored fields.
From the admin screen, query a wildcard on whatever your unique key field is:
uniquekeyfieldname:*
That will get you a count to see if something got indexed. If you want to see all fields too, then specify the field list at the end of the query string:
&fl=*
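Putting the two together (uniquekeyfieldname standing in for whatever your schema's unique key actually is), this request returns every matching document with every stored field:
http://localhost:8983/solr/select?q=uniquekeyfieldname:*&fl=*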
I have an application that lets devices communicate over MQTT.
When two (or more) devices are paired, they are in a session (with a session-id).
The topics are for example:
session/<session-id>/<sender-id>/phase
with a payload like
{'phase': 'start', 'othervars': 'examplevar'}
Every session is logged to a MySQL database in the following format:
| id | session-id | sender | topic (example: phase) | payload | entry-time | ...
Now, when I just want to get a whole session I can just query by session-id.
Another view I want to achieve looks like this:
| session-id (distinct) | begin time | end time | duration | success |
Success is a boolean: true when the current session contains an entry whose payload has 'phase':'success'; otherwise the session was not successful.
Now I have the problem that this query is very slow. Every time I want to access it, it has to calculate, for each session, whether it was successful, along with the time calculations.
Should I run a script at the end of each session to calculate this information and put it in another table? The problem I have with that solution is that I will end up with duplicate data.
Can I make this faster with indexes? Or did I just make a huge design mistake from the beginning?
Thanks in advance
Indexes? Yes. YES!
If session-id is unique, get rid of id and use PRIMARY KEY(session_id).
success could be TINYINT NOT NULL with values 0 or 1 for fail or success.
If the "payload" is coming in as JSON, then I suggest storing the entire string in a column for future reference, plus pull out any columns that you need to search on and index them. In later versions of MySQL, there is a JSON datatype, which could be useful.
Please provide some SELECTs so we can further advise.
Oh, did I mention how important indexes are to databases?
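To illustrate: once a phase column like the one sketched above exists, the summary view you describe becomes a single GROUP BY that the indexes can support (again a sketch; entry_time is assumed to be a DATETIME):

SELECT session_id,
       MIN(entry_time) AS begin_time,
       MAX(entry_time) AS end_time,
       TIMESTAMPDIFF(SECOND, MIN(entry_time), MAX(entry_time)) AS duration,
       MAX(phase = 'success') AS success
FROM session_log
GROUP BY session_id;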
I'd like to create custom primary keys in my Access database.
The database is going to be multi-user, so I need a method that ensures each key is unique even when multiple users are trying to add new records to the same tables.
The reason I need to create custom primary keys is that my database starts off an audit trail that goes into another, external system that I have no control over.
This other system does, however, allow the use of a single 12-character user-defined field for us to pass data of our choice through.
I'd like to use that user-defined field to record a 12-character code that has various abbreviations I can extrapolate later (e.g. first 2 characters relate to a department in our organisation, next 3 characters relate to a product and so on...)
From the reading I've done so far, custom keys in Access seems to be something of a minefield.
For my purposes though, I can see at least a compromise in using Access's AutoNumber field to essentially help build the primary key I want.
Here's what I was thinking:
The parts of the code that I would want to extrapolate later can be built by our users, so for example, if the Department was Human Resources, the first 2 characters could always be "HR".
Then let's say I let the AutoNumber in Access run for a field in the same table in which my "HR" entry was populated... could I get a third field to automatically concatenate the two in the same table (not a query)? i.e. like this:
| Department | AutoNumber | CustomPrimaryKey |
| HR | 1 | HR1 |
If that's something that can be done on some event in VBA, then that would be great (show me the code! :))
The second part is whether I can get the AutoNumber to concatenate with leading zeros, ensuring the "unique number" part of the custom primary key runs from 00001 to 99999, i.e. always occupying the same 5-character space, like this:
| Department | AutoNumber | CustomPrimaryKey |
| HR | 1 | HR00001 |
| HR | 2 | HR00002 |
It is highly unlikely that I would need more than 100,000 entries.
I hope this is possible and safe!
I'd rather leave this as a comment than an answer as I don't think you're totally clear on what you need, but I'll try to answer as best as possible. Also, I'm not going to "Show you the code!" as you suggest as it teaches nothing.
In the first question of automatically concatenating the third field, it's really a question of how the fields are being populated.
If it's through form input, then you can concatenate all of the component fields into the key field during the update events of the controls through which those component fields are populated. In VBA you can easily reference members of the record by accessing the form's recordset.
If you're populating the field through a file import where you already have import specs, then you would perform the import excluding your key field, then open a recordset on the table you imported into and iterate through it, setting the key on each record. You can learn about ADO recordsets here. Again, I'm not just going to write the code because I don't really know what you need this for.
If you're populating the field through your own parser, then I probably don't have to explain how to do this.
To your second question: you can easily right-align a number in a string using the Format() function. For example, Format(2, "00000") would yield "00002" and Format(210, "0000") would yield "0210". You can also make the number of 0s you pad with variable using the String() function. For example, Format(2054, String(12 - Len("HR"), "0")) would give you "0000002054".
One additional note I would leave you on: it's never a good idea to say something like "It is highly unlikely that I would..." and not prepare for it. Murphy's Law is a pain in the B. You should consider handling the condition where you exceed the limit your key can handle.
I am looking for a means to find similarities between MS SQL full-text documents, either within the same table/column or against documents in a different table. I already have the table full-text indexed, but now I need to understand how I might go about finding the documents that are similar. Consider the following example:
I have a ticketing system that stores customer issues:
TicketID | TextColumn
---------|--------------------------------
1234     | My computer has a virus.
1235     | My mouse is broken.
1236     | There is a virus in my computer
As a new ticket is filed, I want to take its TicketID, say 1236, pass it into a stored procedure, and see which of the other documents in the table are similar, returned in some ranking order based on highest similarity. So in this case 1236 most closely matches 1234.
All I am finding in my research is the ability to search on specific words or phrases, but I want to pass in the entire document's contents. Does anyone know the method to do this?
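For what it's worth, here is a rough sketch of the shape this could take with SQL Server's FREETEXTTABLE, which ranks rows against the words of whatever text you pass it (not true document-to-document similarity, so treat it only as a starting point; the table name Tickets is an assumption):

DECLARE @doc NVARCHAR(MAX);
SELECT @doc = TextColumn FROM Tickets WHERE TicketID = 1236;

-- Rank every other ticket against the full text of ticket 1236
SELECT t.TicketID, ft.[RANK]
FROM FREETEXTTABLE(Tickets, TextColumn, @doc) AS ft
JOIN Tickets AS t ON t.TicketID = ft.[KEY]
WHERE t.TicketID <> 1236
ORDER BY ft.[RANK] DESC;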
I'm working on a URL shortener project with PHP & MYSQL which tracks visits of each url. I've provided a table for visits which mainly consists of these properties :
time_in_second | country | referrer | os | browser | device | url_id
#####################################################################
1348128639 | US | direct | win | chrome | mobile | 3404
1348128654 | US | google | linux | chrome | desktop| 3404
1348124567 | UK | twitter| mac | mozila | desktop| 3404
1348127653 | IND | direct | win | IE | desktop| 3465
Now I want to make queries on this table. For example, I want to get the visit data for the URL with url_id=3404. Because I need to provide statistics and draw graphs for this URL, I need data such as:
Number of visits per OS for this URL, for example 20 Windows, 15 Linux, ...
Number of visits in each desired period of time, for example every 10 minutes over the past 24 hours
Number of visits for each country
...
As you see, some data like country may accept lots of different values.
One idea I can imagine is to write a query which outputs the number of each unique value in each column; for example, in the country case for the data given above, one column for num_US, one for num_UK, and one for num_IND.
Now the question is how to implement such a high-performance query in SQL (MySQL)?
Also if you think this is not an efficient query for performance, what's your suggestion?
Any help will be appreciated deeply.
UPDATE: Look at this question: SQL; Only count the values specified in each column. I think that question is similar to mine, but the difference is the variety of values possible for each column (lots of values are possible for the country property), which makes the query more complex.
It looks like you need to do more than one query. You probably could write one query with different parameters but that would make it complex and hard to maintain. I would approach it as multiple small queries. So for each requirement I make a query and call them separately or individually. For example, if you want the country query you mentioned, you could do the following
SELECT country, COUNT(*) FROM <TABLE_NAME> WHERE url_id = 3404 GROUP BY country;
By the way, I have not tested this query, so it may be inaccurate, but this is just to give you an idea. I hope this helps.
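The other statistics follow the same GROUP BY pattern. Two more sketches, assuming the table is named visits (also untested):

-- Visits per OS for one URL
SELECT os, COUNT(*) FROM visits WHERE url_id = 3404 GROUP BY os;

-- Visits per 10-minute bucket over the last 24 hours
SELECT FLOOR(time_in_second / 600) * 600 AS bucket_start, COUNT(*)
FROM visits
WHERE url_id = 3404
  AND time_in_second >= UNIX_TIMESTAMP() - 86400
GROUP BY bucket_start;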
Also, another suggestion is to look into Google Analytics: they already have a lot of what you are implementing, so maybe that helps as well.
Cheers.
Each of these graphs you want to draw represents a separate relation, so my off-the-cuff response is that you can't build a single query that gives you exactly the data you need for every graph you want to draw.
From this point, your choices are:
Use different queries for different graphs
Send a bunch of data to the client and let it do the required post-processing to create the exact sets of data it needs for different graphs
Farm it all out to Google Analytics (a la #wahab-mirjan)
If you go with option 2 you can minimize the amount of data you send by counting hits per (10-minute, os, browser, device, url_id) tuple. This essentially removes all duplicate rows and gives you a count. The client software would take these numbers and further reduce them by country (or whatever) to get the numbers it needs for a graph. To be honest though, I think you're buying yourself extra complexity for not very much gain.
If you insist on doing this yourself (instead of using a service), then go with a different query for each kind of graph. Start with a couple of reasonable indexes (url_id and time_in_second are obvious starting points). Use the EXPLAIN statement (or whatever your database provides) to understand how each query is executed.
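For instance (a sketch, again assuming the table is named visits):

ALTER TABLE visits ADD INDEX idx_url_time (url_id, time_in_second);

-- Check how the country query is executed and whether the index is used
EXPLAIN SELECT country, COUNT(*) FROM visits WHERE url_id = 3404 GROUP BY country;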
Sorry, I am new to Stack Overflow and was having a problem with comment formatting. Here is my answer again; hopefully it works now:
Not sure how it is poor in performance. The way I am thinking, you will end up with a table that looks like this:
country | count
#################
US | 304
UK | 123
IND | 23
So when you group by country and count, it will be one query. I think this will get you going in the right direction. In any case, it is just an opinion, so if you find another approach, I am interested in knowing it as well.
Apologies about the comment mess-up up there.
Cheers
I have the following data which I want to save in my DB (this is used for sending text messages via a 3rd party API)
text_id, text_message, text_time, (array)text_contacts
text_contacts contains a normal array with all the contact_ids
How should I properly store the data in a MySQL database?
I was thinking myself either on 2 ways:
Store the contact_ids as a json_encoded string (no need for serializing since it's not multi-dimensional) in a TEXT field in the DB
Make a second table with the text_id and each contact_id on a new row.
note: The data stored in the text_contacts array does not need to be changed at any time.
note2: Each individual contact_id is used to get the contact's phone number and to check whether the text message has actually been sent (with a combination of text_id and phone number).
What is more efficient, and why?
This is completely dependent upon your expected usage characteristics. If you will have a near-term need to query based upon the contact_ids, then store them independently as in your second solution. If you're storing them for archival purposes, and don't expect them to be used dynamically, you're as well off saving the time and storing them in a JSON string. It's all about the usage.
IMO, go with the second table, mapping text_ids to contact_ids. It will be easier to manipulate than storing all the contacts in one field.
This topic will bring in quite a few opinions, but my belief: second table, by all means.
If you ever have a case where you actually need to search by that data, it will not require you to parse it before using it.
It is a heck of a lot easier to debug (for the same reason)
json_encode and json_decode (or equivalent) take far more time than a join does.
Lazy loading is easier, even if not necessary in most cases.
Others will find it more readable and, with a good schema definition, easier to conceptualize and maintain.
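A minimal sketch of that mapping table (the name text_contacts_map is made up; adapt it to your schema):

CREATE TABLE text_contacts_map (
  text_id    INT NOT NULL,
  contact_id INT NOT NULL,
  PRIMARY KEY (text_id, contact_id)
);

-- All contacts a given text was sent to
SELECT contact_id FROM text_contacts_map WHERE text_id = 1;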
Almost all implementations would use one table to store the contacts, and then a second table that uses a foreign key to reference the text_contacts table. So if, say, you had a text_contacts table that looked like this:
contact_id | name
1 | someone
2 | someone_else
And a text message table that looked like this:
text_id | text_message | text_time | text_contact
1 | "Hey" | 12:48 | 1
2 | "Hey" | 12:48 | 2
Each contact that has been sent a message gets a new entry in the text message table, with the last column referencing the contact_id field of the text_contacts table. This makes it much easier to retrieve messages by contact, because you can say "select * from text_messages where text_contact = 1" instead of searching through the arrays in the single-table design to find the messages sent to a specific contact.
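In DDL, that design might look something like this (a sketch; the column types are guesses based on the sample rows above):

CREATE TABLE text_contacts (
  contact_id INT PRIMARY KEY,
  name       VARCHAR(100)
);

CREATE TABLE text_messages (
  text_id      INT PRIMARY KEY,
  text_message TEXT,
  text_time    TIME,
  text_contact INT NOT NULL,
  FOREIGN KEY (text_contact) REFERENCES text_contacts (contact_id)
);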