I'm trying to remember the word for a function that can be applied multiple times to the same data set but only makes its change once. I've got a Rails migration that sets some data, but only if that data isn't set. So if it is run multiple times it only does work once.
Are you thinking about the word "idempotent"? (I'm frantically looking for my Category Theory book ...)
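For what it's worth, the guard in the migration is exactly what makes it idempotent: the statement only touches rows that haven't been set yet, so a second run finds nothing left to change. A minimal SQL sketch of the pattern, with made-up table and column names:

UPDATE users
SET    default_role = 'member'
WHERE  default_role IS NULL;

Running this once sets the value; running it again is a no-op, which is the property the word describes.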
Okay, so first of all let me tell you a little about what I'm trying to do. Basically, during my studies I wrote a little web service in PHP that calculates how similar movies are to each other based on some measurable attributes like length, actors, directors, writers, genres, etc. The data I used for this was basically a collection acquired from omdbapi.com.
I still have that database, but it is technically just a SINGLE table that contains all the information for each movie. This means that for each movie, all the above-mentioned parameters are stored as comma-separated strings. So far I have therefore used a query that covers all these things with LIKE statements. The query can become quite large, as I pretty much query for every parameter in the table, sometimes with 5 different LIKE statements for different actors, and the same again for directors and writers. Back when I last used this, it took about 30 to 60 seconds to enter a single movie and receive a list of 15 similar ones.
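Just to give a rough idea of what I mean (the real table isn't shown here, so the column names and values are only illustrative), the queries look something like this:

SELECT title
FROM   movies
WHERE  actors    LIKE '%Harrison Ford%'
  AND  actors    LIKE '%Carrie Fisher%'
  AND  directors LIKE '%George Lucas%'
  AND  genres    LIKE '%Sci-Fi%';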
Now I have started my first job, and to teach myself something in my free time, I want to work on my own website. Because I have no real concept of what I want to do with it, I thought I'd get out my old "movie finder" again and use it differently this time.
Now, to challenge myself, I want the whole thing to be faster. Understand that the data is NEVER changed, only read. It is also not "really" relational, as actor names and such are just strings and have no real entry anywhere else, which essentially means that having the same name will be treated as being the same actor.
Now here comes my actual question:
Assuming I want my SELECT queries to run faster, would it make sense to run a script that splits the comma-separated strings out into extra tables (these are n-to-m relations, see my attempt below) and then JOIN all these tables (there will be 8 or more of them), or will using LIKE as I currently do be about the same speed? The ONLY thing I am trying to achieve is faster SELECT queries, as there is nothing else to really do with the data.
This is what I currently have. Keep in mind, I would still have to create tables for the relation between movies and each of these tables. After doing that, I could remove the columns from the movie table and would end up having to join a lot of tables with EACH query. The only real advantage I can see here is that it would be easier to create an index on the individual tables, rather than one (or a few) covering the one big movie table.
I hope all of this even makes sense to you. I appreciate any answer short or long, like I said this is mostly for self studies and as such, I don't have/need a real business model.
I don't understand what you currently have. It seems that you only showed the size of the tables but not their internal structure. You need to separate the data into separate tables using normalization rules and then add the correct indexes; the indexes will make your queries very fast. What does the sizing above your query mean? Have you ever run EXPLAIN ANALYZE on your queries? Please post the query itself, as I cannot guess it from the result. There are also a lot of optimization videos on YouTube.
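To make the normalization suggestion concrete, here is a rough sketch (the table and column names are only placeholders) of how one of the comma-separated columns could be split out into an n-to-m relation and indexed:

CREATE TABLE actors (
    actor_id INT PRIMARY KEY,
    name     VARCHAR(255) NOT NULL
);

CREATE TABLE movie_actors (
    movie_id INT NOT NULL,
    actor_id INT NOT NULL,
    PRIMARY KEY (movie_id, actor_id)
);

-- Supporting indexes for lookups by actor and by name.
CREATE INDEX idx_movie_actors_actor ON movie_actors (actor_id);
CREATE INDEX idx_actors_name ON actors (name);

-- Finding movies for a given actor is then an indexed join
-- instead of a LIKE scan over a comma-separated string:
SELECT m.title
FROM movies m
JOIN movie_actors ma ON ma.movie_id = m.movie_id
JOIN actors a ON a.actor_id = ma.actor_id
WHERE a.name = 'Harrison Ford';

The same pattern repeats for directors, writers, genres and so on.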
I have a tutoring website with a search feature. I want tutors to appear on the list according to several weighted criteria, including whether or not they are subscription holders, if they have submitted a profile photo, if they have included a lot of information about themselves, etc...
Basically, I have a lot of criteria by which I would like to weigh their rank.
Instead of writing a complicated SQL query with multiple ORDER BYs (if this is even possible), I was thinking of creating a table (maybe a temporary one), that assigns numerical values based on several criteria to come up with a final search rank.
I'm not entirely sure about how to go about this, or if this is a good idea, so I would like to know what the community thinks about a) this method, and b) possible ways of implementing this in SQL.
I would add a field to one of the existing tables that is more or less a representation of their "weight" for sorting purposes. I would then populate this column with a database procedure that runs every so often (you could keep a queue so it only runs on records that have been updated, or just run it on all records if you want). That way, you can just pull back the data and order by one column instead of multiple ones.
You could also use a view. It really depends on whether you want the number crunching to be done by the procedure or by the database every time you pull data (for a search feature, and for speed's sake, I'd suggest the database procedure).
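As a rough sketch of what I mean (the table, column names and weights here are all invented for the example), the periodic procedure could boil down to a single UPDATE, and the search then orders by one precomputed column:

-- Precompute one sortable score per tutor; the weights are arbitrary.
UPDATE tutors
SET search_rank =
      (CASE WHEN is_subscriber = 1         THEN 50 ELSE 0 END)
    + (CASE WHEN profile_photo IS NOT NULL THEN 20 ELSE 0 END)
    + (CASE WHEN LENGTH(bio) > 200         THEN 10 ELSE 0 END);

-- The search query then only needs one ORDER BY.
SELECT *
FROM tutors
ORDER BY search_rank DESC;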
I'll try to make it easy by explaining an example.
So, I consolidate data from two sources, namely 1 and 2. Each source has a column "number" whose values are unique within that source. But when 1 and 2 are consolidated (they have to be), uniqueness can no longer be guaranteed. So when consolidating 1 and 2, I created a column named "source" and tagged each row with its source name (1 or 2). Therefore, if I want to look up a specific "number", I submit a query that looks for the desired number AND source.
Is there a better way to do this? It is working just fine because my database is small, but will this work well (i.e. fast, efficiently, etc.) as the DB grows? I mean, it won't have one million entries in the next few years, but I'd still like to do it in an optimal manner.
The only other way I can think of is to keep separate "number" columns for the different sources and query the appropriate column, but that would require additional columns to be added as I get additional sources. Hm... what to do?
Your method should work just fine without causing any perceivable slowdowns, if any at all.
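If you want to stay on the safe side as the table grows, a composite index (or a composite primary key) over the two columns keeps that lookup an index seek even with millions of rows. A sketch with made-up table and column names:

-- Uniqueness holds per (source, number), so the index can be unique.
CREATE UNIQUE INDEX idx_source_number ON consolidated (source, number);

SELECT *
FROM consolidated
WHERE source = '1'
  AND number = 12345;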
In my Rails project, I have several kinds of content types, let's say "Article," "Interview," and "Review."
In planning my project, I decided to go with separate tables for all of this (multiple tables vs single table), namely because while these content types do share some fields in common (title, teaser, etc), they also have fields unique to each content type.
I managed to get all of this working fine in my project, but hit some roadblocks doing so. Namely let's say I want to show all of these content types TOGETHER in order of their publish date (like on a categories page) and I just need to display the fields (title, teaser, etc) that they all have in common...this is problematic.
I went with the has_many_polymorphs plugin for a little while, but it was never officially updated for Rails 3, and I was using a forked branch 'hacked' together for R3. Anyway, it caused problems and spat out weird error messages now and then, and after one of my updates broke the plugin, I decided it was a good idea to abandon it.
Then I looked for a plugin that could perform SQL like UNION joins, but that plugin hasn't been updated since 2009 so I passed that by.
What I ended up doing was making three different queries, one per table, then using Ruby to combine the results. (@articles + @reviews + ..., etc.) Not efficient, but I need it to work (and these areas will likely be cached eventually).
However, it 'would' be easier if I could just make queries on a single table, and there is still time to restructure my schema to roll out with the most efficient way to do this. So I'm now wondering if STI is indeed the way to go, and if so, how would I get away with this with different columns on both sides?
UPDATE
I've been thinking heavily about this...namely how I can use STI but have some unique fields for each type. Let's say I have a 'Content' table where I have my shared columns like title, teaser, etc...then I have separate tables like Article, Review where those tables would have my unique columns. Couldn't I technically make those nested attributes of the Content table, or actually the Content table's specific types?
UPDATE 2
So apparently according to what I researched, STI is a bad idea if you need different columns on the different tables, or if you need different controllers for the different content types. So maybe sticking with multiple tables and polymorphs is the right way. Just not sure I'm doing it as effectively as I should be.
UPDATE 3
Thinking mu has the right idea and this is one of those times where the query is so complex that it requires find_by_sql, I created a 'Content' model and made the following call...
@recent_content = Content.find_by_sql("SELECT * FROM articles UNION SELECT * FROM reviews ORDER BY published_at DESC")
And it sort of works...it's returning all types with a single query indeed. The remaining trouble is that because I'm calling ActiveRecord on Content, the following doesn't work in my collection partial that returns recent content...
<%= link_to content.title, content %>
Namely the link path: "content_path doesn't exist." It's tricky because each result returns as a "Content" object, rather than what kind of object it really is. That's because I'm starting the above query with 'Content' of course, but I have to put "something" there to make the ActiveRecord query work. I have to figure out how to return the appropriate path/object type for the different content types, /articles/, /reviews/, etc...
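One idea I'm toying with (untested, and assuming the shared columns really are id, title, teaser and published_at) is to select a literal type column in each arm of the UNION, so each row carries the name of the table it came from:

SELECT id, title, teaser, published_at, 'Article' AS content_type FROM articles
UNION ALL
SELECT id, title, teaser, published_at, 'Review' AS content_type FROM reviews
ORDER BY published_at DESC

Listing the shared columns explicitly also sidesteps the different-columns problem that SELECT * runs into, and the partial could then branch on content_type to build the /articles/ or /reviews/ links instead of relying on content_path.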
Querying them separately is not as bad as you think. If they have different columns and you need to show those different columns in your view, I don't think you can combine them using an SQL UNION. As for performance, you can use caching or optimize your tables.
I recently forked a promising project to implement multiple table inheritance and class inheritance in Rails. I have spent a few days subjecting it to rapid development, fixes, commenting and documentation and have re-released it as CITIER (Class Inheritance and Table Inheritance Embeddings for Rails).
I think it would do exactly what you need.
Consider giving it a look: http://peterhamilton.github.com/citier
I am finding it so useful! I would (by the way) welcome any help from the community with issues, testing, code cleanup, etc.! I know this is something many people would appreciate.
Please make sure you update regularly, however, because like I said, it has been improving/evolving by the day.
Hello all.
I need to run 'Replace([column], [old], [new])' in a query executing on an Access 2003 DB. I know of all the equivalent stuff I could use in SQL, and believe me I would love to, but I don't have that option now. I'm trying to write a query that strips all the non-digit chars out of a column, i.e. '(111) 111-1111' simply becomes '1111111111'. I could also write an awesome custom VBA function and execute the query using that, but once again, such functions can't be used through Jet. Any ideas?
Thanks for the replies, guys. OK, let me clarify the situation. I'm running a .NET web application. This app uses an Access 2003 DB. I'm trying to do an upgrade where I incorporate a type of search page. This page executes a query like: SELECT * FROM [table] WHERE replace([telnumber], '-', '') LIKE '1234567890'. The problem is that there are many records in the [telnumber] column that have non-digit chars in them, for instance '(123) 123-1234'. These I need to filter out before I do the comparison. So the query using a built-in VBA function executes fine when I run it in a testing environment IN ACCESS, but when I run the query from my web app, it throws an exception stating something like "Replace function not found". Any ideas?
Based on the sample query from your comment, I wonder if it could be "good enough" to rewrite your match pattern using wildcards to account for the possible non-digit characters?
SELECT * FROM [table] WHERE telnumber LIKE '*123*456*7890'
Your question is a little unclear, but Access does allow you to use VBA functions in Queries. It is perfectly legal in Access to do this:
SELECT replace(mycolumn,'x','y') FROM myTable
It may not perform as well as a query without such functions embedded, but it will work.
Also, if it is a one off query and you don't have concerns about locking a bunch of rows from other users who are working in the system, you can also get away with just opening the table and doing a find and replace with Control-H.
As JohnFx already said, using VBA functions (no matter if built in or written by yourself) should work.
If you can't get it to work with the VBA function in the query (for whatever reason), maybe doing it all per code would be an option?
If it's a one-time action and/or not performance critical, you could just load the whole table in a Recordset, loop through it and do your replacing separately for each row.
EDIT:
Okay, it's a completely different thing when you query an Access database from a .NET application.
In this case it's not possible to use any built-in or self-written VBA functions, because .NET doesn't know them. No way.
So, what other options do we have?
If I understood you correctly, this is not a one-time action...you need to do this replacing stuff every time someone uses your search page, correct?
In this case I would do something completely different.
Even if doing the replace in the query would work, performance-wise it's not the best option, because it will likely slow down your database.
If you don't write that often to your database, but do a lot of reads (which seems to be the case according to your description), I would do the following:
Add a column "TelNumberSearch" to your table
Every time you save a record, save the phone number in the "TelNumber" column, do the replacing on the phone number, and save the stripped number in the "TelNumberSearch" column
That way, when you do a search, you already have the TelNumberSearch column with all the stripped numbers...no need to strip them again for every single search. And you still have the column with the original, formatted number for display purposes (see the sketch below).
Of course you need to fill the new column once, but this is a one-time action, so looping through the records and doing a separate replace for each one would be okay in this case.
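Just to sketch the idea (the table name Customers is made up; only the TelNumberSearch column itself comes from the suggestion above), the schema change could be done once in Access SQL:

ALTER TABLE Customers ADD COLUMN TelNumberSearch TEXT(20)

The search page then compares against the pre-stripped column, so no Replace() call has to go through Jet at all:

SELECT * FROM Customers WHERE TelNumberSearch = '1234567890'

The per-save stripping and the one-time fill of the new column would happen in your .NET code or inside Access itself, where Replace() is available.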