I have an LDAP CSV file that is imported nightly and dumped into my MySQL database. It has about 70000 employee records.
Each record includes empl#, email, group, supervisor, etc.
I have reports that are being generated from various web sites. We are dumping these reports in the database once a month. These reports usually have empl#, email, hits, logins, whatever...
My goal is to combine the report data and add in things like group, supervisor, etc based on empl#... Speed is a big concern because of the size of the database and number of users.
At first I thought of making a simple left join, with the report data on the left, since not everyone in a report is necessarily an employee. The problem is that this does not take a snapshot in time. If report data from 6 months ago is viewed, I don't want it mixed with current employee data; I want it to stay a snapshot in time.
What is the best way to handle this?
You will need a date column of some kind in both sets of data on which to join. Once you have that, you can add a condition to the WHERE clause that restricts the selection to the snapshot date you want.
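A minimal sketch of that idea, using Python's built-in sqlite3 in place of MySQL so it runs on its own. The table and column names (employee_snapshot, report, snapshot_date, report_date) are assumptions for illustration, and it assumes a dated copy of the employee data is kept for every date that has a report:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_snapshot (      -- hypothetical: one dated copy of the LDAP data
    snapshot_date TEXT NOT NULL,
    empl_no       INTEGER NOT NULL,
    grp           TEXT,
    supervisor    TEXT,
    PRIMARY KEY (snapshot_date, empl_no)
);
CREATE TABLE report (                 -- hypothetical: monthly report rows
    report_date TEXT NOT NULL,
    empl_no     INTEGER NOT NULL,
    hits        INTEGER
);
""")
conn.executemany("INSERT INTO employee_snapshot VALUES (?, ?, ?, ?)", [
    ("2013-01-01", 1001, "Sales", "Ann"),
    ("2013-07-01", 1001, "Marketing", "Bob"),   # same person, moved groups later
])
conn.executemany("INSERT INTO report VALUES (?, ?, ?)", [
    ("2013-01-01", 1001, 40),
    ("2013-01-01", 9999, 5),    # appears in the report but is not an employee
    ("2013-07-01", 1001, 12),
])

def report_as_of(conn, report_date):
    """Report rows for one date, enriched with employee data from the same date."""
    return conn.execute("""
        SELECT r.empl_no, r.hits, e.grp, e.supervisor
        FROM report r
        LEFT JOIN employee_snapshot e
               ON e.empl_no = r.empl_no
              AND e.snapshot_date = r.report_date
        WHERE r.report_date = ?
        ORDER BY r.empl_no
    """, (report_date,)).fetchall()

january = report_as_of(conn, "2013-01-01")
july = report_as_of(conn, "2013-07-01")
```

The LEFT JOIN keeps report rows for people who are not employees, and the date condition in the join keeps the January report tied to January's group and supervisor even after the employee has moved.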
Related
I am looking for a way to store auto-generated reports. There are about 10-15 columns and 100-3000 rows depending on the report but each report is consistent in column count.
I am looking for a way to organise and store these reports as one large group without creating an entire new database and 1000s of tables to store each individual report.
The reports need to be queryable so they can be subdivided by team/area/person etc as each report can be a combination of 3-4 different sub-reports depending on how you split/sort the data.
I am using Python to collect and sort the data from the database, so using MariaDB/MySQL would be preferred, but I'm happy to use something else if there is a pre-existing connection library for it.
To sum up, I need something similar to an Excel spreadsheet, with each table being a sheet and the sheet name being the date it was generated, so I can select by the date generated.
Think through the goals.
Is this a legal issue -- you need to produce an unalterable report as something "official". A la a non-editable .pdf?
(at the opposite extreme) Be able to generate (or regenerate) any report for any timeframe.
Is performance an issue? (Either perceived or real)
I like to build and maintain Summary Table(s) for any "Data Warehouse" application. And build "reports" that take as a parameter a date range and a small number of other things. And have the report generation so fast that it does not matter if multiple people are pulling reports at random times.
15 columns and 3000 rows is usually excessive. If pulling a report is trivial enough, it can be less 'massive'; just get the parts you want, without such bulk.
http://mysql.rjweb.org/doc.php/summarytables
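A rough sketch of the summary-table approach, using Python's built-in sqlite3 so it runs without a server; the same SQL shape applies in MariaDB/MySQL. The names fact_hits, daily_team_summary, refresh_summary and team_report are assumptions for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_hits (              -- hypothetical raw data, one row per person per day
    day TEXT, team TEXT, person TEXT, hits INTEGER
);
CREATE TABLE daily_team_summary (     -- hypothetical summary table, one row per team per day
    day        TEXT NOT NULL,
    team       TEXT NOT NULL,
    total_hits INTEGER NOT NULL,
    row_count  INTEGER NOT NULL,
    PRIMARY KEY (day, team)
);
""")
conn.executemany("INSERT INTO fact_hits VALUES (?, ?, ?, ?)", [
    ("2024-01-01", "A", "ann", 3),
    ("2024-01-01", "A", "bob", 4),
    ("2024-01-01", "B", "cat", 5),
    ("2024-01-02", "A", "ann", 1),
])

def refresh_summary(conn, day):
    """Rebuild the summary rows for one day from the raw data."""
    with conn:  # commit both statements together, or roll back on error
        conn.execute("DELETE FROM daily_team_summary WHERE day = ?", (day,))
        conn.execute("""
            INSERT INTO daily_team_summary (day, team, total_hits, row_count)
            SELECT day, team, SUM(hits), COUNT(*)
            FROM fact_hits
            WHERE day = ?
            GROUP BY day, team
        """, (day,))

def team_report(conn, start, end):
    """Report for an inclusive date range, read only from the summary table."""
    return conn.execute("""
        SELECT team, SUM(total_hits)
        FROM daily_team_summary
        WHERE day BETWEEN ? AND ?
        GROUP BY team
        ORDER BY team
    """, (start, end)).fetchall()

for d in ("2024-01-01", "2024-01-02"):
    refresh_summary(conn, d)

result = team_report(conn, "2024-01-01", "2024-01-02")
```

The report query touches only the small summary table, which is what keeps it fast when several people pull reports at once. Splitting by area or person would need those columns in the summary table as well.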
I'm trying to develop a new reporting module for a resource management tool (PHP + MySQL).
I am trying to extract data in the following format from mysql:
I have a table that holds the date and location of multiple people (i.e. Office, Home or Client).
Sample Data as in DB.
Here date_plotted is the date on which the user is engaged, and plotting_date is the date on which this particular entry was made in the system. So the user was plotted to be in the office on 30th Oct, and that entry was made on 30th Oct.
Data as in resource table
The resource table represents the user table.
Any suggestions on how to do the same in mysql?
These are the primary tables which need to be used.
The above table is done in Excel for now to represent the outcome.
I'm new to SQL so haven't tried anything yet.
There is a tool for Windows that might simplify this operation. It's made by MySQL and is called MySQL for Excel. In theory it should allow you to structure and change MySQL databases, as well as run queries that return their results as spreadsheets.
It is hard to say more without knowing more about your data, for example an actual CSV file to work with, and the parameters of the pull: whether the dates are always fixed or the pull is dynamic, based on a range. As it stands, this question could result in 100 different implementations that return visually similar results but differ massively in implementation overhead.
I receive CSV files at the end of each month from my customer for each of their KPIs (for example, CSVs for resumes received, candidates joined, candidates resigned, sales, profits, loss, etc.) for that specific month.
I want to be able to query this data in order to generate reports for any day, month or year. The report will be generated dynamically, i.e. the admin specifies which rows he would like in a report (e.g. applications received, applications shortlisted, and candidates shortlisted after the 1st interview, for the period of Jan to July) for any period of time.
What would be the best way to store the data in my database in order to generate such reports? I am using MySQL as my database.
I am not sure if I would need to flush out the old data from my tables currently. So considering that I keep all the data persistent, what would be the best suited database design for this?
Currently I have a table for each of their KPIs. Each table has a date field, which I use to generate the report. But I am looking for a more optimized way.
Thanks in advance.
It is better to store those values (month- or year-related values) in DATE-type fields, which need no further manipulation when building reports. The conditions or logic for the specific period of time should be handled in your front end. In this case, using the date field is already the optimized way.
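As a rough illustration, here is one KPI table with a date column and a report query over an arbitrary period. It uses Python's built-in sqlite3 so it runs on its own; the table name resumes_received and its columns are assumptions, and strftime is SQLite's function (in MySQL, DATE_FORMAT(received_on, '%Y-%m') plays the same role):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical KPI table: one row per day, with a real date column.
conn.execute("""
    CREATE TABLE resumes_received (
        received_on TEXT NOT NULL,    -- ISO date; a DATE column in MySQL
        resumes     INTEGER NOT NULL
    )
""")
# An index on the date column keeps range filters cheap as data accumulates.
conn.execute("CREATE INDEX idx_resumes_date ON resumes_received (received_on)")
conn.executemany("INSERT INTO resumes_received VALUES (?, ?)", [
    ("2014-01-15", 10),
    ("2014-01-20", 5),
    ("2014-02-03", 7),
    ("2014-08-01", 9),
])

def monthly_totals(conn, start, end_exclusive):
    """Totals per month for start <= received_on < end_exclusive."""
    return conn.execute("""
        SELECT strftime('%Y-%m', received_on) AS month, SUM(resumes)
        FROM resumes_received
        WHERE received_on >= ? AND received_on < ?
        GROUP BY month
        ORDER BY month
    """, (start, end_exclusive)).fetchall()

# The period chosen by the admin in the front end: January up to the end of July.
jan_to_july = monthly_totals(conn, "2014-01-01", "2014-08-01")
```

The front end only supplies the two boundary dates; grouping by day or year instead of month is a change to the format string, not to the stored data.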
I'm a bit of a newbie in Access and I hope I'm not asking a stupid question. I have recently had to move an inventory system from Excel to Access. Each product is receipted in tbl.rct and has an order number, a lot number, a quantity and an expiry.
Each individual lot number needs to be verified before it can be receipted; this information is in tbl.lot.
While making a form to receipt products, I noticed that I couldn't add any products unless their lot number was already in tbl.lot. Is there a way to get around this?
http://imgur.com/kCc7G39
Attached relationships
I think you are mixing up Excel and Access. These tables were imported directly from Excel without any changes to fit how Access works. A database is meant to reduce repetitive routine work. The tables should be Products, Orders, Receipts and Stock (in place of Lot), where Stock collects the quantity, lot number, expiry and damage data. Next, define the sequence for inserting into the database: open a new receipt that adds the product (linked by ID) and its details to stock. The product is now in the warehouse. For selling, you create an invoice; when you select a product, it shows the available lots and their expiry dates, and you can apply a filter for FIFO or LIFO.
You can send me the Excel file to convert to a database; if so, please provide more information, because the flow is not very clear.
I'm going to do my best to try to explain this. I currently have a data flow task that has an OLE DB Source transferring data from a table from a different database to a table to another database. It works fine but the issue I'm having is the fact that I keep adding duplicate data to the destination table.
So a CustomerID of '13029' with an amount of '$56.82' on Date '11/30/2012' is seen in that table multiple times. How do I make it so I can only have unique data transferring over to that destination table?
In the data flow task where you transfer the data, you can insert a Lookup transformation. In the lookup, you can specify a data source (a table or a query, whichever serves you best). Once you have chosen the data source, go to the Columns view and create a mapping that connects the CustomerID, Date and Amount columns of both tables.
In the General view, you can configure what happens with matched and unmatched rows. Simply take the no-match output and direct it to the DB destination.
You will need to identify what makes that data unique in the table. If it's a customer table, then it's probably the customerid of 13029. However if it's a customer order table, then maybe it's the combination of CustomerId and OrderDate (and maybe not, I have placed two unique orders on the same date). You will know the answer to that based on your table's design.
Armed with that knowledge, you will want to write a query to pull back the keys from the target table:
SELECT CO.CustomerId, CO.OrderId FROM dbo.CustomerOrder CO
If you know the process only transfers data from the current year, add a filter to the above query to restrict the number of rows returned. The reason for this is memory conservation: you want SSIS to run fast, so don't bring back extraneous columns or rows it will never need.
Inside your dataflow, add a Lookup Transformation with that query. You don't specify 2005, 2008 or 2012 as your SSIS version and they have different behaviours associated with the Lookup Transformation. Generally speaking, what you are looking to do is identify the unmatched rows. By definition, unmatched means they don't exist in the target database so those are the rows that are new. 2005 assumes every row is going to match or it errors. You will need to click the Configure Error Output... button and select "Redirect Rows". 2008+ has an option under "Specify how to handle rows with no matching entries" and there you'll want "Redirect rows to no match output."
Now take the No match output branch (2008+) or the error output branch (2005) and plumb that into your destination.
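The Lookup itself is configured in the SSIS designer, but the logic it applies can be sketched in a few lines. This is only an illustration in Python, not SSIS code; the function name split_by_lookup and the sample rows are made up, and the key columns follow the CustomerId/OrderId query above:

```python
def split_by_lookup(source_rows, target_keys, key_columns):
    """Split source rows into (matched, unmatched) by key presence in the target.

    target_keys plays the role of the Lookup's reference query result:
    only the key columns of the destination table, held in memory.
    """
    matched, unmatched = [], []
    for row in source_rows:
        key = tuple(row[col] for col in key_columns)
        (matched if key in target_keys else unmatched).append(row)
    return matched, unmatched

# Keys already present in the destination (hypothetical values).
target_keys = {(13029, 1), (13029, 2)}

source_rows = [
    {"CustomerId": 13029, "OrderId": 1, "Amount": "56.82"},  # already loaded
    {"CustomerId": 13029, "OrderId": 3, "Amount": "10.00"},  # new
    {"CustomerId": 14000, "OrderId": 1, "Amount": "22.38"},  # new
]

matched, unmatched = split_by_lookup(
    source_rows, target_keys, ("CustomerId", "OrderId")
)
# Only `unmatched` would be sent on to the destination.
```

Running the same load twice then inserts nothing the second time, because every source key is already in the target key set.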
What this approach doesn't cover is detecting and handling when the source system reports $56.82 and the target system has $22.38 (updates). If you need to handle that, then you need to look at some change detection system. Look at Andy Leonard's Stairway to Integration Services series of articles to learn about options for detecting and handling changes.
Have you considered using the T-SQL MERGE statement? http://technet.microsoft.com/en-us/library/bb510625.aspx
It will compare both tables on defined fields, and take an action if matched or not.
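MERGE is specific to SQL Server and needs a server to run, so the sketch below only imitates its two branches in Python's built-in sqlite3, which has no MERGE: an UPDATE for rows that match on the key, then an INSERT for rows that do not. The table and column names are assumptions, and amounts are stored as integer cents to keep the comparison exact:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_payment (customer_id INTEGER, paid_on TEXT, amount_cents INTEGER);
CREATE TABLE target_payment (
    customer_id INTEGER, paid_on TEXT, amount_cents INTEGER,
    PRIMARY KEY (customer_id, paid_on)
);
INSERT INTO source_payment VALUES (13029, '2012-11-30', 5682);
INSERT INTO source_payment VALUES (13030, '2012-11-30', 1000);
INSERT INTO target_payment VALUES (13029, '2012-11-30', 2238);  -- stale amount
""")

def merge_payments(conn):
    """Imitate MERGE: update matched rows, insert unmatched ones."""
    with conn:
        # "WHEN MATCHED" branch: overwrite the target amount with the source amount.
        conn.execute("""
            UPDATE target_payment
            SET amount_cents = (
                SELECT s.amount_cents FROM source_payment s
                WHERE s.customer_id = target_payment.customer_id
                  AND s.paid_on = target_payment.paid_on)
            WHERE EXISTS (
                SELECT 1 FROM source_payment s
                WHERE s.customer_id = target_payment.customer_id
                  AND s.paid_on = target_payment.paid_on)
        """)
        # "WHEN NOT MATCHED" branch: insert source rows missing from the target.
        conn.execute("""
            INSERT INTO target_payment (customer_id, paid_on, amount_cents)
            SELECT s.customer_id, s.paid_on, s.amount_cents
            FROM source_payment s
            WHERE NOT EXISTS (
                SELECT 1 FROM target_payment t
                WHERE t.customer_id = s.customer_id
                  AND t.paid_on = s.paid_on)
        """)

merge_payments(conn)
merge_payments(conn)  # a second run adds no duplicates
rows = conn.execute(
    "SELECT * FROM target_payment ORDER BY customer_id"
).fetchall()
```

In T-SQL, the single MERGE statement expresses both branches at once; the linked documentation describes its exact syntax.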