I am building a web application that will run off data published by a governmental agency. The issue is that the CSV file that houses the data I need is a 2,000-column beast. The file is what it is; I need to find the best way to take it and modify it. I know I need to break this data up into much smaller tables within MySQL, but I'm struggling with the best way to do this. I need to make this as easy as possible to replicate next year when the data file is produced again (and every year after). I've searched for programs to help, and everything I've seen deals with a huge number of rows, not columns. Has anyone else dealt with this problem before? Any ideas? I've spent the last week color-coding columns in Excel and moving data to new tabs, but this is time-consuming, will be super difficult to replicate, and I worry it leaves me open to copy-and-paste errors. I'm at a complete loss here!
Thank you in advance!
I suggest that you use functions in Excel to give every column an automatic name: "column1", "column2", "column3", etc.
After that, import the entire CSV file into MySQL.
Decide on which columns you want to group together into separate tables. This is the longest step and no program can help you manage this part.
Query your massive SQL table to get just the columns you want for each group. Export these queries to CSV and then import them as new tables in your database.
At the end, if you want, query all the columns you didn't put into separate groups. Make this a new table in the database and delete the original table to save on storage space.
Does this government CSV file get updated and republished in the same format every time? If so, you'll need to write a script to do all of the above automatically.
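As a rough illustration of what such a script could look like, here is a minimal Python sketch. It takes a different route from the steps above: it splits the wide file into one small CSV per column group before anything is imported, which also sidesteps MySQL's per-table column limits (InnoDB, for example, allows at most 1017 columns per table). The group names, column names, and the shared "id" key column are all hypothetical; the real mapping is the part you have to decide by hand, and it should live in one place so it can be reused next year.

```python
import csv
import io

# Hypothetical grouping: output table name -> source column names.
# The key column ("id" here, an assumption) is repeated in every group
# so the resulting tables can be joined back together.
COLUMN_GROUPS = {
    "demographics": ["id", "age", "county"],
    "finance": ["id", "income", "tax"],
}

def split_wide_csv(source, groups):
    """Read one wide CSV and return {group name: CSV text} per column group."""
    reader = csv.DictReader(source)
    wanted = {col for cols in groups.values() for col in cols}
    missing = wanted - set(reader.fieldnames or [])
    if missing:
        # Fail loudly if next year's file renames or drops a column.
        raise ValueError(f"columns not found in source: {sorted(missing)}")

    buffers = {name: io.StringIO() for name in groups}
    writers = {}
    for name, cols in groups.items():
        writer = csv.DictWriter(
            buffers[name], fieldnames=cols,
            extrasaction="ignore", lineterminator="\n",
        )
        writer.writeheader()
        writers[name] = writer

    # Single pass over the source: each row is written once per group,
    # and each writer keeps only its own columns.
    for row in reader:
        for writer in writers.values():
            writer.writerow(row)
    return {name: buf.getvalue() for name, buf in buffers.items()}

# Tiny stand-in for the real file.
sample = io.StringIO(
    "id,age,county,income,tax\n"
    "1,34,Polk,50000,4000\n"
    "2,51,Linn,72000,9000\n"
)
tables = split_wide_csv(sample, COLUMN_GROUPS)
print(tables["finance"])
```

With a real file you would pass an open file object instead of the in-memory sample and write each result to its own .csv for import. The check for missing columns is there because the whole point is repeating this every year.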
On our WordPress site, we use a plugin called s2member, which stores our clients' levels (roles) in our database along with the times each level was assigned. I would like to create a table that shows when a user was assigned a specific level. I'm having trouble getting the data I need because of the way it is stored: all of the levels, along with the dates and times when the user's level changed, are kept in a single field. In addition, all of the times are stored as Unix timestamps. Here's an example of a typical field associated with a client:
a:20:{s:15:"1562695223.0001";s:6:"level0";s:15:"1562695223.0002";s:6:"level1";s:15:"1562695223.0003";s:6:"level2";s:15:"1562695223.0004";s:6:"level3";s:15:"1577906312.0001";s:11:"ccap_prepay";s:15:"1596575898.0001";s:12:"-ccap_prepay";s:15:"1596575898.0002";s:13:"ccap_graduate";s:15:"1596575898.0003";s:11:"ccap_prepay";s:15:"1596575898.0004";s:7:"-level3";s:15:"1597196952.0001";s:14:"-ccap_graduate";s:15:"1597196952.0002";s:12:"-ccap_prepay";s:15:"1597196952.0003";s:13:"ccap_graduate";s:15:"1597196952.0004";s:11:"ccap_prepay";s:15:"1598382433.0001";s:14:"-ccap_graduate";s:15:"1598382433.0002";s:12:"-ccap_prepay";s:15:"1598382433.0003";s:11:"ccap_prepay";s:15:"1598382433.0004";s:6:"level3";s:15:"1605290551.0001";s:12:"-ccap_prepay";s:15:"1605290551.0002";s:11:"ccap_prepay";s:15:"1605290551.0003";s:13:"ccap_graduate";}
There are four columns in this table: umeta_id; user_id; meta_key; meta_value. The data above is stored in the meta_value column.
You'll notice that it also has multiple ccap_* entries. CCAP stands for custom capability, and I would like to be able to chart those assignments and their associated times as well.
Do you have any idea how I can accomplish this?
Thank you for any help you can give.
I talked to an engineer about this, and he told me that I would need to learn Python. I believe he said I would also need to learn Pandas and NumPy to extract the data I need, but he wasn't exactly sure. I started taking a data analyst course on Coursera, but after several months I still haven't learned what I need. It would be great if someone could provide a solution that I could implement more quickly and use on an ongoing basis.
If there's a way to accomplish my goal by exporting this table to a CSV file and using Microsoft Excel or Google Sheets, I'm open to that too.
Here's an image of the table (if it helps):
Database table
Here's an example of my desired output:
Desired output
In my desired output, I used Excel and created a column that converts the Unix timestamp to a short date and another column where I used a nested IF statement to convert the CCAP or level to its meaning that we understand internally.
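For what it's worth, a value shaped like the example above can be pulled apart with the Python standard library alone, without Pandas or NumPy. The sketch below is a shortcut that relies on the field being a flat list of "timestamp.sequence" keys followed by a level or ccap name; it is not a general PHP unserializer. Reading a leading "-" as "removed" is an assumption based on the sample, and the function name is made up for illustration.

```python
import re
from datetime import datetime, timezone

# One key/value pair: s:15:"<seconds>.<sequence>";s:N:"<optional ->name";
PAIR = re.compile(r's:\d+:"(\d+)\.(\d+)";s:\d+:"(-?)([^"]*)";')

def parse_access_log(meta_value):
    """Return (date, sequence, action, name) tuples from one meta_value."""
    rows = []
    for seconds, sequence, minus, name in PAIR.findall(meta_value):
        # Convert the Unix timestamp to a calendar date (UTC).
        when = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
        # Assumption: a leading "-" marks a level/ccap being taken away.
        action = "removed" if minus else "added"
        rows.append((when.strftime("%Y-%m-%d"), int(sequence), action, name))
    return rows

# Shortened version of the field shown in the question.
sample = (
    'a:3:{s:15:"1562695223.0001";s:6:"level0";'
    's:15:"1577906312.0001";s:11:"ccap_prepay";'
    's:15:"1596575898.0001";s:12:"-ccap_prepay";}'
)
rows = parse_access_log(sample)
for row in rows:
    print(row)
```

Each tuple can then be written out with `csv.writer`, together with the user_id, and opened in Excel or Google Sheets, where the existing nested IF (or a lookup table) can translate the level and ccap names.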
I'd like to automate a procedure somewhat. Basically, what I do is import a few spreadsheets from Excel, delete the old spreadsheets that I previously imported, and then change a few queries to reflect the titles of the new imports. Then I rename the queries to reflect that I've changed them.
I suppose I could make this a bit easier by keeping the imported documents the same name as the old ones, so I'm open to doing that, but that still leaves changing the queries. That's not too difficult, either. The name stays pretty much the same, except the reports I'm working with are dated. I wish I could just do a "find and replace" in the SQL editor, but I don't think there's anything like that.
I'm open to forms, macros, or visual basic. Just about anything.
I've just been doing everything manually.
Assuming I have correctly understood the setup, there are a few ways in which this could be automated, without the need to continually modify the SQL of the queries which operate on the imported spreadsheet.
Following the import, you could execute an append query to transfer the data into a known, pre-existing table (after deleting any existing data from that table), which avoids the need to modify any of your other queries. Alternatively, you could rename the imported table to a fixed name.
The task is then reduced to identifying the name of the imported table, given that it will vary for each import.
If the name of the spreadsheet follows logical rules (you mention that the sheets are dated), then perhaps you could calculate the anticipated name based on the date on which the import occurs.
Alternatively, you could store a list of the tables present in your database and then query this list for additions following the import to identify the name of the imported table.
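Both ways of identifying the imported table come down to very little logic. The sketch below shows that logic in Python purely as an illustration; in Access it would be written in VBA. The "Report_" prefix, the YYYYMMDD date format, and the table names are hypothetical.

```python
from datetime import date

def expected_table_name(import_date, prefix="Report_"):
    """Option 1: derive the name from the import date (naming rule is assumed)."""
    return f"{prefix}{import_date:%Y%m%d}"

def newly_imported(tables_before, tables_after):
    """Option 2: compare the table list before and after the import."""
    return sorted(set(tables_after) - set(tables_before))

# Hypothetical table lists captured before and after an import.
before = ["Customers", "Report_20240105"]
after = ["Customers", "Report_20240105", "Report_20240112"]

new_tables = newly_imported(before, after)
print(new_tables)
print(expected_table_name(date(2024, 1, 12)))
```

Once the name is known, the append query or the rename can use it, and the remaining queries never need to change.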
So I'm sure all of you know what a wage report is: it's basically a piece of paper where you write how many hours you worked that day and how much you make per hour, and then you sign and date it. I have an Excel version of a physical wage report, and I want to be able to take data from a SQL server and insert certain pieces of it into certain cells in the Excel sheet, because it needs to be laid out in a certain format. I already know how to insert data from the SQL server into a table in an Excel file, but my question is how to do it when the info from SQL needs to be displayed in a certain way on the Excel sheet.
I haven't tried anything, because I've been searching for a way to do this, but I've come up with nothing. I'm having a hard time knowing where to start, if this is even possible at all.
Good Morning,
I'm fairly new to Access VBA and I've been trying to find a solution to a problem:
I've created a form from which users upload an Excel file to a database. A file-open prompt appears, the user selects the file, a temp table gets created, and the data gets pulled into this table. From there a set of macros populates the required fields and pushes the complete set to a perm table, and then the temp table gets deleted. Now I would like to take it a step further and count how many times a value has been uploaded to the table...
Let's say that the value appears in the table twice already; then if a user tries to upload the same value for the third time, it will be uploaded to a different table. Bear in mind that the file which users upload may contain values that are being uploaded for the first, second, third, etc. time.
Do you have any suggestions or solutions to my problem? Is it even possible? If yes, then how can I make Access distinguish which records are being uploaded for the first, second, third, etc. time and follow the appropriate paths?
I've been scouring the internet for several days now, but no one seems to have had this issue.
Thank you in advance for replies.
I'm not sure I follow. Are you essentially trying to prevent inserting duplicate data into a production table and, if a duplicate is encountered, add the record to a different table?
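If the rule is the one described in the question (a value may go into the permanent table at most twice, and any later upload of that value goes to a different table), the routing logic could be sketched like this. It is written in Python only to show the counting; in Access the same idea would be a count of existing rows per value followed by two append queries. The function name and the limit of two are assumptions taken from the example in the question.

```python
from collections import Counter

def route_uploads(existing_values, uploaded_values, limit=2):
    """Split an upload into rows for the perm table and rows for the overflow table."""
    # How many times each value is already present in the perm table.
    counts = Counter(existing_values)
    to_perm, to_overflow = [], []
    for value in uploaded_values:
        if counts[value] < limit:
            to_perm.append(value)
            # Count it, so repeats inside the same file are handled too.
            counts[value] += 1
        else:
            to_overflow.append(value)
    return to_perm, to_overflow

# "A" is already stored twice, "B" once, "C" never.
to_perm, to_overflow = route_uploads(["A", "A", "B"], ["A", "B", "B", "C"])
print(to_perm)
print(to_overflow)
```

Updating the count while looping matters because, as the question notes, one file can contain the same value several times.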
I have a flat file that is structured in a hierarchical format that looks something like this:
Area|AreaCode|AreaDescription
Region|RegionCode|RegionDescription
Zone|ZoneCode|ZoneDescription
District|DistrictCode|DistrictDescription
Route|RouteCode|RouteDescription
Record|Name|Address|Etc
RouteFooter
Route|RouteCode|RouteDescription
Record|Name|Address|Etc
RouteFooter
DistrictFooter
District|DistrictCode|DistrictDescription
Route|RouteCode|RouteDescription
Record|Name|Address|Etc
Record|Name|Address|Etc
RouteFooter
Route|RouteCode|RouteDescription
Record|Name|Address|Etc
RouteFooter
DistrictFooter
ZoneFooter
RegionFooter
AreaFooter
I have to bring this into SSIS and consume information about each Record row and about the headers that apply to that row. I also need to combine it with information from several other sources and output a simpler flat file as a result.
I would like to read the flat file above into a structure in which each row contains a record with the appropriate header information included.
My question is, what is the best way to do this if it is even possible?
First, how do you tell what type of line you are on if you are on, say, line 3,987,986? How do you tell what is related to what? Is there a possibility you could get this in a better format? Before spending lots of time (and don't kid yourself, this will take lots of time to set up and test properly), I would kick the file back to the provider and request it in a different format. You won't always get it, but you should at least try.
When I have done this in the past in DTS, the first characters of each line told me which structure the line referred to. I imported everything into a staging table with two columns, one for the record type and one for the rest of the line. Then I parsed the rest into per-type staging tables with the correct column structure for that type of record (plus any fields you might need for the relationships), did cleanup, and then imported into the prod tables. As you also have a different number of columns per record type, I would try that approach (only you may have to populate some columns manually instead of deriving them directly from the file). Also give each record an identity field in the staging tables; I think this will help you figure out the relationships.
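As an alternative to staging tables, the relationships can also be recovered in a single pass outside SSIS, because the file order itself says which headers apply to each Record line. Here is a minimal Python sketch under the assumption that the first field of every line is one of the type names shown in the question and that the second field of a header is its code. The output column names and the sample values are made up.

```python
# Header types from outermost to innermost, as shown in the question.
LEVELS = ["Area", "Region", "Zone", "District", "Route"]

def flatten(lines):
    """Return one dict per Record line, carrying the codes of its current headers."""
    context = {}  # level name -> code of the header currently in effect
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("|")
        kind = fields[0]
        if kind in LEVELS:
            context[kind] = fields[1] if len(fields) > 1 else ""
            # A new header at this level invalidates anything nested deeper.
            for deeper in LEVELS[LEVELS.index(kind) + 1:]:
                context.pop(deeper, None)
        elif kind.endswith("Footer"):
            # A footer closes its level, e.g. "RouteFooter" closes "Route".
            context.pop(kind[:-len("Footer")], None)
        elif kind == "Record":
            row = {level + "Code": context.get(level, "") for level in LEVELS}
            row["Record"] = fields[1:]
            rows.append(row)
    return rows

sample = """Area|A1|North
Region|R1|Upper
Zone|Z1|Zone one
District|D1|First
Route|RT1|Route one
Record|Ann|1 Main St
RouteFooter
Route|RT2|Route two
Record|Bob|2 Oak Ave
RouteFooter
DistrictFooter
ZoneFooter
RegionFooter
AreaFooter
""".splitlines()

rows = flatten(sample)
for row in rows:
    print(row)
```

The running `context` dictionary plays the same role as the identity column in the staging approach: it ties each record to the most recent header at every level. The flattened rows could be written to a plain delimited file that SSIS then reads as an ordinary flat-file source.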