Query CSV File/general database questions

OK so I'm kinda new to databases in general. I understand the basic theory behind them and have knocked up the odd Access DB here and there.
One thing I'm struggling to learn about is the specifics of how e.g. an SQL query accesses a database.
So say you have a scenario where there's a database on a LAN server (let's say it's MS Access for argument's sake). You run some SQL query or other on it from a client machine. Does the client machine have to download the entire database to run said query (even if the result of the query is just one line)? Or does it somehow manage to get just the data it wants to come down the ol' CAT5? Does the server have to be running anything to do that? I can't quite understand how the client could get JUST the query results without the server having to do some of the work...
I'm seeing two conflicting stories on this matter when googling stuff.
And so this leads on to the next question (which may already be answered): if you CAN query a DB without having to get the whole damn thing, and without the server running any other software, can the same be done with a CSV? If not, why not?
Reason I ask is I'm developing an app for a mobile device that needs to talk to a DB or CSV file of some kind, and it'll be updating records at a pretty high rate (barcode scanning), so I don't want the network to grind to a halt (it's a slow bag of [insert relevant insult] as it is). The less data travelling from device to server, the better.
Thanks in advance

The various SQL servers are just that: servers. A server is a program that listens for client queries and sends back a response. It is more than just its data.
A CSV file, or "flat file", is just data. There is no way for it to respond to a query by itself.
So, when you are on a network, your query is sent to the server, which does the work of finding the appropriate results. When you open a flat file, you're using the network and/or file system to read/write the entire file.
Edit to add a note about your specific usage. You'll probably want to use a database engine, as the queries themselves generate the least amount of network traffic. For example, when you scan a barcode, your query may be as simple as the following text:
INSERT INTO barcode_table (code, scan_date, user) VALUES ('1234567890', '2011-01-24 12:00:00', '1');
The above string is handled by the database engine and the code (along with whatever relevant support data) is stored. No need for your application to open a file, append data to it, and close it. The latter becomes very slow once files get to a large size, and concurrency can become a problem with many users accessing it.
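To make that concrete, here is a minimal client-side sketch, assuming a MySQL backend and the mysql-connector-python driver (the host, credentials, and database name are placeholders; the table is the one from the example above):

import mysql.connector

# Connect once at startup; only short query strings and their results
# travel over the network, never the database file itself.
conn = mysql.connector.connect(host="192.168.0.10", user="scanner",
                               password="secret", database="warehouse")
cur = conn.cursor()

# Parameterized insert for one scan; the driver sends roughly the same
# short INSERT statement shown above.
cur.execute(
    "INSERT INTO barcode_table (code, scan_date, user) VALUES (%s, %s, %s)",
    ("1234567890", "2011-01-24 12:00:00", "1"),
)
conn.commit()

cur.close()
conn.close()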
If your application needs to display some data to your user, it would request specific information the same way, and the server would generate the relevant results. So, imagine a scenario in which the user wants a list of products that match some filter. If your products were books, suppose the user requested a list by a specific author:
SELECT products.title, barcode_table.code
FROM products
JOIN barcode_table ON barcode_table.product_id = products.id  -- join key assumed
WHERE products.author = 'Anders Hejlsberg'
ORDER BY products.title ASC;
In this example, only those product titles and their barcodes are sent from the server to the mobile application.
Hopefully these examples help make the case for using a structured database engine of some kind, rather than a flat file. The specific flavor and implementation of database, however, is another question unto itself.

Generally speaking, relational databases are stored on a remote server, and you access them via a client interface. Each database vendor has client software that you'd install on your own machine to access the database on the server. The entire DB is not sent back to the client when a query is executed, although the server can send very large result sets if you are not careful about how you structure your query. The flow generally looks like this:
1. A database server listens for clients to connect.
2. A client connects and issues a SQL command to the database.
3. The database builds a query plan to figure out how to get the result.
4. The plan is executed and the results are sent back to the client.
CSV is simply a file format, not a fully functional platform like a relational database.
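As a sketch of that flow from the client's side (assuming a MySQL server and the mysql-connector-python driver; the host and names are invented):

import mysql.connector

# Steps 1-2: connect to the listening server and issue a SQL command.
conn = mysql.connector.connect(host="db.example.com", user="app",
                               password="secret", database="catalog")
cur = conn.cursor()
cur.execute("SELECT title FROM products WHERE author = %s",
            ("Anders Hejlsberg",))

# Steps 3-4 happen on the server: it plans and executes the query,
# then sends back only the matching rows; the client never sees the rest.
for (title,) in cur.fetchall():
    print(title)

cur.close()
conn.close()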

Sync multiple local databases to one remotely

I need to create a system with local webservers on Raspberry Pi 4 running Laravel for API calls, websockets, etc. Each RPi will be installed at multiple customers' premises.
For this project I want the ability to save/sync the database to a remote server (when the local system is connected to the internet).
Multiple locale databases => one remote, customer-based database
The question is how to synchronize the databases, properly identify each customer's data, and render it all in a shared remote dashboard.
My first thought was to set a customer_id or a team_id on each table, but it seems dirty.
The other way is to create multiple databases on the remote server for the synchronization, plus one extra database to store customer IDs and database connection information...
Has anyone already experimented with something like this? Is there a reliable and clean way to do it?
You refer to locale but I am assuming you mean local.
From what you have said you have two options at the central site. The central database can either store information from the remote databases in a single table with an additional column that indicates which remote site it's from, or you can set up a separate table (or database) for each remote site.
How do you want to use the data?
If you only ever want to work with the data from one remote site at a time it doesn't really matter - in both scenarios you need to identify what data you want to work with and build your SQL statement either to filter by the appropriate column or to address the appropriate table(s).
If you want to work on data from multiple remote sites at the same time, then using different tables requires that you use UNION queries to extract the data, and this is unlikely to scale well. In that case you would be better off using a column to mark each record with the remote site it references.
I recommend that you consider using UUIDs as primary keys - it may be that key collision will not be an issue in your scenario, but if it becomes one, trying to alter the design retrospectively is likely to be quite a bit of work.
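As a sketch of both suggestions together (the record shape and names are invented for illustration): generate UUID primary keys on each Pi, and tag every record with the remote site it came from:

import uuid

def make_reading(site_id, sensor, value):
    return {
        # UUIDv4 keys can be generated independently on every Pi with no
        # realistic risk of collision when rows are merged centrally.
        "id": str(uuid.uuid4()),
        # Marks which customer/site the record belongs to, so one central
        # table can hold all sites and be filtered with WHERE site_id = ...
        "site_id": site_id,
        "sensor": sensor,
        "value": value,
    }

print(make_reading(site_id=42, sensor="temp_1", value=21.5))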
You also asked about how to synchronize the databases. That will depend on what type of connection you have between the sites and the capabilities of your software, but typically you would have the local system periodically talk to a webservice at the central site. Assuming you are collecting sensor data or some such the dialogue would be something like:
Client - Hello Server, my last sensor reading is timestamped xxxx
Server - Hello Client, [ send me sensor readings from yyyy | I don't need any data ]
You can include things like a signature check (for example an MD5 sum of the records within a time period) if you want to but that may be overkill.
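A minimal client-side sketch of that dialogue, assuming a hypothetical /sync endpoint on the central server and the requests library (the URL and payload shape are invented):

import requests

SYNC_URL = "https://central.example.com/api/sync"  # hypothetical endpoint

def sync(site_id, last_timestamp, pending_rows):
    # Client: "Hello Server, my last sensor reading is timestamped xxxx"
    resp = requests.post(SYNC_URL, json={"site_id": site_id,
                                         "last_timestamp": last_timestamp})
    resp.raise_for_status()
    # Server: "send me readings from yyyy" (or null for "I don't need any")
    since = resp.json().get("send_since")
    if since is not None:
        rows = [r for r in pending_rows if r["timestamp"] >= since]
        requests.post(SYNC_URL + "/rows",
                      json={"site_id": site_id, "rows": rows}).raise_for_status()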

How can I connect and fetch data from multiple MySQL databases on multiple servers?

I want to fetch data from multiple MySQL databases which are on multiple servers.
I'm using phpMyAdmin (MySQL). All the databases will be MySQL databases (same vendor) on multiple servers. First I want to connect to those server databases, then I want to fetch data from them, and then put the results in a central database.
For example: remote_db_1 on server 1, remote_db_2 on server 2, remote_db_3 on server 3, and I have a central database where I want to store the data that comes from the different databases.
Query: SELECT COUNT(user) FROM user WHERE profile != 2; (the same query will be run against all the databases).
central_db.school_distrct_info_table:

id  school_district_id  total_user
1   2                   50
2   55                  100
3   100                 200
I've tried the FEDERATED engine but it doesn't fit our requirements. What can be done in this situation: any tool, alternative method, or anything?
In the future the number of databases on different servers will increase; it might be 50, 100, maybe more. Exporting the tables from each source server and then loading them into the central DB will be a hard task, so I'm also looking for some kind of ETL tool which can directly fetch data from multiple source databases and then send the data to the destination database. In the central DB the table structure, datatypes, and columns will all be different, and sometimes we might need to add an extra column to store some data. I know this can be achieved with an ETL tool; in the past I've used SSDT, which works with SQL Server, but this is MySQL.
The easiest way to handle this problem is with federated servers. But, you say that won't work for you.
So, your next best way to handle the problem is to export the tables from the source servers and then load them into your central server. But that's much harder. This sort of operation is sometimes called extract / transform / load or ETL.
You'll write a program in the programming language of your choice (Python, PHP, Java, Perl, Node.js?) to connect to each database separately, then query it, then put the information into a central database.
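For example, here's a bare-bones sketch of such a program in Python, assuming the mysql-connector-python driver (hosts, credentials, and the district mapping are placeholders, and it assumes the central table has a unique key on school_district_id):

import mysql.connector

SOURCES = [
    {"host": "server1", "db": "remote_db_1", "school_district_id": 2},
    {"host": "server2", "db": "remote_db_2", "school_district_id": 55},
    {"host": "server3", "db": "remote_db_3", "school_district_id": 100},
]

central = mysql.connector.connect(host="central-host", user="etl",
                                  password="secret", database="central_db")

for src in SOURCES:
    # Extract: run the same count query against each source server.
    conn = mysql.connector.connect(host=src["host"], user="etl",
                                   password="secret", database=src["db"])
    cur = conn.cursor()
    cur.execute("SELECT COUNT(user) FROM user WHERE profile != 2")
    (total_user,) = cur.fetchone()
    conn.close()

    # Load: upsert the count into the central table.
    ccur = central.cursor()
    ccur.execute(
        "INSERT INTO school_distrct_info_table (school_district_id, total_user) "
        "VALUES (%s, %s) "
        "ON DUPLICATE KEY UPDATE total_user = VALUES(total_user)",
        (src["school_district_id"], total_user),
    )
    central.commit()

central.close()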
Getting this working properly is, sad to say, incompatible with "really urgent". It's tricky to get working and to test.
May I suggest you write another question explaining why server federation won't meet your needs, and asking for help? Maybe somebody can help you configure it so it does. Then you'll have a chance to finish this project promptly.

How to store sensitive data of different clients in SQL server?

I work at a small company and I am trying to figure out a solution for storing sensitive data of multiple clients in Microsoft SQL server. Actually, I feel like this is a general database question and it is not specific to MSSQL.
Until now we have been using a proprietary database where the client data is stored as db files (flat files) in each client's root directory in the file system. So the operating system permissions guarantee that the application used by client X can never fetch data from client Y's database. Please note that there is no database server/instance/engine here…
However, for my project I want to use a SQL database. But the security folks are expressing concerns over putting data of different clients in a single database.
One option is to create separate database instances for different clients. However, I am not sure if this idea is scalable.
So my questions are:
1) Is there any mechanism in MSSQL that enables you to store databases ‘separately’ in different files used by the SQL server?
2) Let’s say I have only one database instance where I have databases of client X and client Y. How can I make sure that client X’s requests can never (accidentally) get misdirected to client Y’s database? I do not want to rely on some parameter in my code to determine which database to fetch from! :)
So, is there any solid authentication scheme to guarantee that my queries could not be misdirected to fetch from an incorrect client table?
I think this is a very common problem and there has to be a good solution for this. What are other companies doing?
Please let me know if there are any good articles to read up on this.
Different databases are always stored in different files in SQL Server so you don't even have to do anything special for this. However, NTFS permissions will not help you in this case as the clients aren't ever accessing the files directly on disk.
One possible solution in SQL Server is to create separate sets of Windows user IDs and map those to separate SQL Logins for each customer. You could then only assign those logins access to the appropriate databases. For example, if you were hosting web sites for client X and client Y, you would set up the connection string(s) in the web.config for client X's web site to use the appropriate login(s) for client X's database. Vice versa for client Y. This guarantees that no matter what (barring a hard-coded login), the code from client X's site will never access client Y's database.
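As a sketch of that idea on the application side, using pyodbc (the server, database, and login names are invented):

import pyodbc

# Each deployment is configured with only its own client's SQL login.
# Because clientx_login has no permissions on ClientY_DB (and vice
# versa), code holding client X's connection string can never read
# client Y's data, no matter what query it runs.
CONN_STRINGS = {
    "client_x": ("DRIVER={ODBC Driver 17 for SQL Server};SERVER=dbhost;"
                 "DATABASE=ClientX_DB;UID=clientx_login;PWD=secret"),
    "client_y": ("DRIVER={ODBC Driver 17 for SQL Server};SERVER=dbhost;"
                 "DATABASE=ClientY_DB;UID=clienty_login;PWD=secret"),
}

conn = pyodbc.connect(CONN_STRINGS["client_x"])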
You can have up to 32,767 databases on a single instance of SQL Server, and having separate databases enables a number of improved serviceability scenarios (such as restoring a single customer's DB in case of a data problem without affecting all of your other customers).
http://technet.microsoft.com/en-us/library/ms143432.aspx

Best way to report events / read events (also MySQL)

So I'm going to attempt to create a basic monitoring tool in VB.NET. I'd like some advice on how to tackle the logging and reporting side of things, so I'd appreciate responses from users who I'm sure have a better idea than me and can tell me far more efficient ways of doing things.
So my plan is to have a client tool, which will read values from a MySQL database and basically refresh every x interval; I'm thinking 10/15 minutes at the moment. This side of the application is quite easy; I mean, I can get something to read a database every x amount of time and then change labels and display alerts based on it. This is all well documented and I am probably okay with that.
The second part is to have a client that sits in the system tray of the server gathering the required information. Now the system tray part will probably be the trickiest bit of this, though that's not really part of my question.
So I assume I can use the normal information-gathering commands and store the results, perhaps as strings, and then connect to the same database and add them to the relevant fields. For example, if I had a MySQL table called "server" and a column titled "Connection", I could check whether the server has an internet connection, store the result as 1 for yes or 0 for no, and then send a MySQL command to update the "Connection" value to 0/1.
Then in the monitoring tool I assume I can run a MySQL query to check the "Connection" column: if the value is 0, change a label or flag an error, and if it's 1, report that connectivity is okay?
My main questions about the above are listed below.
Is using a MySQL database the most efficient way of doing something like this?
Obviously if my database goes down there's no more reporting; I still think that's a con I'll have to live with, though.
Is storing everything as values within the code the best way to store my data?
Is there any particular type/format I should use in the MySQL column? I was thinking maybe tinyint(9)?
Is the above method redundant and pointless?
I assume all these database connections could cause some unwanted server load; however, the 15-minute refresh time should combat that.
Is there a way to properly combat delays, where perhaps the client doesn't update in time for the reporter and it picks up false data? Perhaps a fail-safe such as a column containing the last-updated time?
You probably don't need the tool that gathers information per se. The web app (real time monitor) can do that, since the clients are storing their information in the same database. The web app can access the database every 15 minutes and display the data, without the intermediate step of saving it again. This will provide the web app with the latest information instead of a potential 29-minute delay.
In other words, the clients are saving the connection information once. Don't duplicate it in the database.
MySQL should work just about as well as anything.
It's a bad idea to hard code "everything". You can use application settings or a MySQL table if you need to store IPs, etc.
In an application like this, the conversion will more than offset the data savings of a tinyint. I would use the most convenient data type.
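As a sketch of the reading side, including the last-updated fail-safe the question mentions (this assumes the "server" table gains name and updated_at columns, which are my invention, and uses the mysql-connector-python driver):

from datetime import datetime, timedelta
import mysql.connector

conn = mysql.connector.connect(host="dbhost", user="monitor",
                               password="secret", database="monitoring")
cur = conn.cursor()
cur.execute("SELECT connection, updated_at FROM server WHERE name = %s",
            ("web01",))
connection, updated_at = cur.fetchone()
conn.close()

# Fail-safe: if the client hasn't written for more than one refresh
# cycle, treat the row as stale rather than trusting the 0/1 flag.
if datetime.now() - updated_at > timedelta(minutes=20):
    print("STALE: client has not reported recently; data may be false")
elif connection == 1:
    print("Connectivity OK")
else:
    print("ALERT: no internet connection")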

Difference between filter and a where clause

As always, I'm working with my Access app..
As far as I know, when I execute a SQL statement against my back end (an accdb file), say
SELECT * FROM tbl WHERE id=1;
It gets filtered on the back end, then just one record is transmitted over the network.
My question is, when I open a form bound to a query (no WHERE clause) using a filter parameter, like
DoCmd.OpenForm "Form",,, strFilter
how many records are transmitted over the network? Do they get filtered like that SQL clause, or do they get filtered locally, meaning a big pile of data has to be sent over the network?
I'm concerned about this because I have many subforms bound to queries, which I then open in the main forms with a filter parameter. And of course, the network here is not very good.
EDIT: The environment of my app is a factory with no local server. All the network/IT infrastructure is at the company's headquarters 300 km away, probably over a WAN.
Apart from upgrading to SQL Server or the like, do I have other options to make it more reliable? I've heard of something called 'Citrix'; I happen to have a 'Citrix Neighborhood Agent Program' in my system tray. Can it host my app to make it faster?
DoCmd.OpenForm "Form",,, strFilter
how many records are transmitted on the network?
As many as match your strFilter condition. So, if WHERE id=1 returns one row in the earlier SELECT query, and strFilter = "id=1", that OpenForm will open the form with that single row as its record source.
The WhereCondition parameter is also available for DoCmd.OpenReport, and operates the same way as with OpenForm, which you also may find useful.
Edit: You should have an index to support the WHERE criteria whether you build it into the query or do it "ad hoc" with OpenForm WhereCondition. With an index the database engine will read the index to find which rows match, then retrieve those rows. So retrieval will be more efficient, and therefore faster, than forcing the engine to read every row to determine which of them include matches.
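For example, you could create that index once with a DDL statement against the back end; here's a sketch using Python and the Microsoft Access ODBC driver (the path, table, and field names are placeholders, and you could just as well create the index from the Access UI):

import pyodbc

conn = pyodbc.connect(r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
                      r"DBQ=\\server\share\backend.accdb")
cur = conn.cursor()
# One-time setup: index the column used in the WHERE/WhereCondition
# criteria so the engine reads index pages instead of every row.
cur.execute("CREATE INDEX idx_tbl_id ON tbl (id)")
conn.commit()
conn.close()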
When Jet/ACE requests data from a file server, the first thing it needs is the database header information, which has data structures describing the structure of the data file. This information is requested once per Access session, so it's really only an issue at startup.
When you then request a record, Jet/ACE uses the metadata it has about the file to request the relevant index pages for the table(s) involved, then uses those indexes to determine the minimum number of data pages to request.
With properly structured indexing and filters on primary keys the amount of data retrieved is actually quite minimal.
However, it's still going to be more than will allow proper response times across a WAN. Access was designed for use across a wired LAN, back in the days when the networking standard was 10BaseT (10 Mbps). Anything less than that and you'll have problems. WiFi is right out as well, not because of bandwidth but because of the unreliability of the connections.
When you need to support users remotely, the easiest solution is to host the Access application on a Windows Terminal Server. WTS is built on technology licensed from Citrix, so you'll often see the whole concept described as Citrix, but your default WTS setup is quite different from a Citrix installation. You have to pay extra for Citrix, and it gives you a lot of different features.
I've used WTS without Citrix in many environments and frankly can't see what the justification would be for Citrix (except when you have to support large numbers of remote users, i.e., in the range of 100 or more). WTS is installed on every Windows Server starting with Windows 2000 and is very easy to set up and configure.
The second easiest solution, in my opinion, is to upsize the back end to a server database and then rewrite for efficiency to ensure you're using the server as much as possible and not pulling too much data across the wire.
A third solution would be SharePoint, but I'm not experienced with that. It is definitely the direction MS is pushing for Access apps in distributed setups, but it's quite complex and has a whole lot of features. I wouldn't recommend plunging into it without lots of preparation and significant corporate support.
Actually, with Access, there is not really a true back-end as there is with a bona-fide client-server engine like SQL Server or Oracle or Postgres. Access uses a shared-file architecture where the client program itself "owns" chunks of the file on disk, as distinct from a message-passing architecture where the client program sends requests for data to a back-end engine process running on a server where that process "owns" the data. With shared-file, all work occurs on the client, so it is possible for freight-train-loads of data to be brought across the wire if the database file resides on a different machine.
When you ask Access for data, it often reads a lot more data from the MDB file on disk and caches at the local client a lot more data than what your statement has asked for. Access tries to do this intelligently, anticipating your needs. "Now that I'm here", Access says, "I might as well make the expensive trip to disk worthwhile and grab a sh*tload of data". Don't get me wrong. I'm not an Access basher and have been using it for more than 10 years, from back in the days when LAN bandwidth was 10mbit/sec. Access is very good for some things. But Access can gobble up bandwidth like you wouldn't believe.
Read up on "keysets" in Access.
P.S. I am not the same Tim as the Tim who left you a comment.
Some useful links:
http://msdn.microsoft.com/en-us/library/dd942824(v=office.12).aspx
http://support.microsoft.com/kb/209126
http://support.microsoft.com/kb/112112
http://support.microsoft.com/kb/128808