Getting a list of random place names and person names - language-agnostic

I want to demonstrate a product to a potential new customer.
The best source data comes from an existing customer.
I want to use the existing customer's data for the demonstration, but without compromising confidentiality in any way.
The best solution I see is to run a script that replaces all of the names, addresses and locations in the database with randomly selected names.
So, now I need to find a list of place names and person names to use as a source. Preferably this would be in a text file so it can be read easily.
This seems like a pretty common problem. Does anyone know of a site that I can download these names from?

Check out: http://infochimps.org/, for example: http://infochimps.org/datasets/d-1990-census-name-files and http://infochimps.org/datasets/word-list-10-000-common-place-names
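For the replacement script itself, here is a minimal sketch in Python, assuming the downloaded names end up in a plain list (one name per line in a text file would load the same way). The function names, the sample rows and the `name` column are all hypothetical. Each distinct real name maps to one fake name, so repeated values stay consistent across rows.

```python
import random


def build_replacements(originals, fake_pool, seed=None):
    """Map each distinct original value to a distinct fake value."""
    rng = random.Random(seed)
    distinct = sorted(set(originals))
    if len(fake_pool) < len(distinct):
        raise ValueError("need at least as many fake names as distinct originals")
    chosen = rng.sample(list(fake_pool), len(distinct))
    return dict(zip(distinct, chosen))


def anonymize(rows, column, mapping):
    """Return copies of the rows with `column` replaced via `mapping`."""
    return [{**row, column: mapping[row[column]]} for row in rows]


# Hypothetical stand-ins for a downloaded name list and a customer table.
fake_names = ["Alice Archer", "Bob Baker", "Carol Carter", "Dan Draper"]
customers = [
    {"id": 1, "name": "Real Person One"},
    {"id": 2, "name": "Real Person Two"},
    {"id": 3, "name": "Real Person One"},
]

mapping = build_replacements([c["name"] for c in customers], fake_names, seed=42)
masked = anonymize(customers, "name", mapping)
```

The same mapping idea should carry over to addresses and place names; only the source list changes.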

I know this question is a bit old, but here's my 2 cents.
I wanted to also suggest that you could pull down data from public universities. For example,
http://www.wisc.edu/directories/?q=john+smith. You can find other open directories for all state schools.
All it would take is writing a script that loops through a list of names, e.g. http://www.behindthename.com/top/lists/us/2010/1000, and for each name searches a number of publicly accessible directories and saves the first 5 results.

Using a MySQL query and BASH, how can I Delete, Rename, or Move all image files used by Drupal nodes before a certain date?

BACKSTORY IF YOU'RE INTERESTED: A friend of mine owns a magazine and has been publishing a corresponding Drupal 7 website since 2011. The site has thousands of articles and around 50,000 images supporting those articles. Unfortunately, due to copyright trolling attorneys, he's already been hit with a couple of copyright infringement lawsuits over images that he thought were from "creative commons." Since his first lawsuit in 2016, he's made sure all images are from a stock image company. But apparently, very recently, yet another image from before 2016 has caused another copyright troll to seek $18,000 (it's literally a photo of a hotdog by the way). Nevertheless, his business insurance company just wants to pay the settlement fees rather than risk anything in court, but has demanded that all potentially suspect images be deleted from the site going forward. Since 95% of the stories that have been published on his site have had fewer than 1000 views anyway (they are worth less than 50 cents from advertisers), he has agreed to take all those images down because $.50 is definitely not worth the risk of feeding any more trolls.
QUESTION: What's the best way to delete, rename or move all the images that are connected to a story node before a certain date in 2016? It would be nice if we could temporarily just change the filenames on the filesystem from "trollfood.jpg" to "trollfood.jpg.bak" (or something) so that if/when he can confirm that an image is in fact in the public domain, he can revive it. It would also be nice if we could replace all the potentially suspect image links (in the db) with placeholder image links for the time being, so that people can still read the article without wondering where the images have gone (perhaps the placeholder image will be a brief explanation of the trolling situation). Anyway, it's been a minute since I've done anything with Drupal, so I've forgotten how Drupal links files to nodes (and he has some custom content types powering his main articles).
I've been able to get all the potentially suspect images in a list via mysql:
SELECT fid, filename, timestamp, from_unixtime(timestamp, "%Y-%m-%e")
FROM drupal7_therooster.file_managed
where timestamp between unix_timestamp('2011-01-01') and unix_timestamp('2017-01-01');
// here's sample output:
# fid filename timestamp from_unixtime(timestamp, "%Y-%m-%e")
6154 _MG_5147.jpg 1373763148 2013-07-14
6155 _MG_5179.jpg 1373763148 2013-07-14
6161 The Lone Bellow (4 of 5).jpg 1373866156 2013-07-15
6162 The Lone Bellow (1 of 5).jpg 1373866156 2013-07-15
Now, how can I use this to find the potentially offending stories that use these images, and perform the following:
Create a list of all the stories that use these images so I can save this in case he ever wants to revive these images. I know SQL well enough...I just don't know which tables keep which data.
Create a query that replaces these image associations in these stories with a placeholder image (so if a story uses "trollfood.jpg", that story now uses "safetyimageplaceholder.jpg" instead). Some stories have multiple images attached to them.
Once all the potentially offending articles reference a placeholder image instead, I still need to move all the offending files so they can't be accessed by lawyers...I have access via ssh by the way. Are there any good ways of using bash commands to ONLY move/rename files that match the list I generate from an SQL query? I just want to be careful not to delete/rename/move any images that were NOT part of the query. Bear in mind the file creation date in the filesystem is all 2017+ on the server because the server was moved (or copied) in 2017 so the file system's original creation dates are inaccurate.
I know this is a long question...and it involves a Drupal site, but I think I might need the help of proper SQL and bash experts, so I've posted it here instead of the Drupal specific stackexchange. I'm totally open to any suggestions if another completely different approach is better suited for this problem. Cheers!
I was able to answer my own question. I had to do three main things:
STEP ONE: Create a query for Drupal's MySQL database that would give me a list of all potential copyright infringing files that were being used by nodes created between 2012 and 2017:
SELECT fm.fid, fm.filename,
n.title, n.nid, from_unixtime(n.created, "%Y-%m-%d") as 'node_date'
FROM file_managed fm
JOIN file_usage fu ON fm.fid = fu.fid
-- file_usage.id is only a node ID when the usage type is 'node'
JOIN node n ON fu.id = n.nid AND fu.type = 'node'
WHERE n.created BETWEEN unix_timestamp('2012-01-01') AND unix_timestamp('2017-01-01')
ORDER BY node_date;
This is a moderately complex query, but basically it joins columns from three tables (Drupal 7's file_managed, node, and file_usage tables). The file_usage table is a shared key register of which files (via fid) are used on which nodes (via nid).
STEP TWO: Organize and filter the data to create a list of files.
I filtered and ordered the results by node created dates. I got about 48K records from the join query in step one, and then I created a Google spreadsheet to clean up and sort the data. Here's a sample of the Google spreadsheet. This sheet also includes data from the node_counter table, which tracks page views for each node. A simple VLOOKUP function matches the total page views to each nid on the main sheet, so the main sheet can be sorted by page views. I did this so I could prioritize which images attached to each node/article I should check first. This is the SQL query I used to get that data from the db, by the way:
SELECT nid, totalcount, daycount, from_unixtime(timestamp, "%Y-%m-%d") as 'date'
FROM node_counter
ORDER BY totalcount DESC
STEP THREE: Write a Shell Script that will take our filtered list of files, and move them somewhere safe (and off the public webserver).
Basically, I needed a simple BASH script that would use the list of files from step two to move them off the web server. Bear in mind, when each image file is uploaded to the server, Drupal can (and did) create about a dozen copies in different aspect ratios and sizes, and place each one of these copies into corresponding folders. For example, one image filename could be copied and resized into:
files/coolimage.jpg
files/large/coolimage.jpg
files/hero/coolimage.jpg
files/thumbnails/coolimage.jpg
files/*/coolimage.jpg etc etc.
So, I have to take a list of ~50K filenames, and check for those filenames in a dozen different subfolders, and if they are present in a subfolder, move each of them to an archived folder all while preserving the folder/file tree structure and leaving behind files that are "safe" to keep on the public web server. So...I ended up writing THIS simple script and open sourced it on Github in case anyone else might benefit from it.
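For anyone who can't get at the script, here is a rough sketch of the same idea in Python rather than bash. It is not the script mentioned above, just an illustration under some assumptions: the function name and arguments are hypothetical, files are matched by base name only, and the archive folder lives outside the public files folder. It defaults to a dry run so you can review the planned moves first.

```python
import shutil
from pathlib import Path


def archive_listed_files(files_root, archive_root, filenames, dry_run=True):
    """Move every file under files_root whose base name is in `filenames`
    to the same relative path under archive_root.

    Returns the list of (source, target) pairs. With dry_run=True nothing
    is moved, so the plan can be checked before touching the server.
    """
    files_root = Path(files_root)
    archive_root = Path(archive_root)
    wanted = set(filenames)
    moves = []
    # Build the full listing first so moving files does not disturb the walk.
    for source in sorted(p for p in files_root.rglob("*") if p.is_file()):
        if source.name not in wanted:
            continue  # not in the SQL-generated list, leave it alone
        target = archive_root / source.relative_to(files_root)
        moves.append((source, target))
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(source), str(target))
    return moves
```

Because only names in the list are touched, everything else stays on the web server. One caveat: Drupal may rename an uploaded file to avoid overwriting an existing one, so the base name of file_managed.uri is probably a safer key than file_managed.filename.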
That's it! Thankfully I knew some SQL and how to use google spreadsheets...and some basic bash..and well, how to use google and solve problems. If google users are able to find this helpful in the future...cheers!

Mapping users to all of their files(URLs) in a mysql database.

What I want is that when I have looked up a user in a table, I can list all the file URLs that the user has access to. My first thought was to have a field in the table with a list of file URLs. However, I have now understood that there is no such field type.
I was then thinking that maybe foreign keys might work, but I am having trouble getting my head around them.
Another solution might be to have one table for each user, with each row representing one file.
What would you say is best practice in this case?
I am also going to expand into having shared files, but thought that I'd address this issue first.
Suggest you explore the JSON Data Type
2 tables: user and user_uri_permission? 2 columns in the second: userID and URI. When the User-URI pair is in the table, the user has access.
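A minimal sketch of that two-table layout, using Python's built-in sqlite3 in place of MySQL so it runs anywhere. The column names and helper functions are assumptions for illustration. The composite primary key keeps each user/URI pair unique, and a shared file is simply the same URI granted to more than one user.

```python
import sqlite3


def create_schema(conn):
    """Create the user table and the user-to-URI permission table."""
    conn.executescript("""
        CREATE TABLE user (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE user_uri_permission (
            user_id INTEGER NOT NULL REFERENCES user(id),
            uri     TEXT    NOT NULL,
            PRIMARY KEY (user_id, uri)
        );
    """)


def grant(conn, user_id, uri):
    """Give a user access to a URI; granting twice is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO user_uri_permission (user_id, uri) VALUES (?, ?)",
        (user_id, uri),
    )


def uris_for(conn, user_id):
    """Return every URI the user can access, sorted."""
    rows = conn.execute(
        "SELECT uri FROM user_uri_permission WHERE user_id = ? ORDER BY uri",
        (user_id,),
    )
    return [row[0] for row in rows]
```

This also avoids the one-table-per-user idea: adding a user means adding rows, not tables.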

Can I store multiple add-ons for a product in a table column as JSON?

Problem: I have a table which contains a list of toys (dolls). While purchasing, the user can choose options (black dress or white dress) and accessories (earring and anklet).
Option 1: First I thought of adding 2 different tables for options and accessories.
http://imgur.com/a/FnE6O
But as I am never going to filter/search on these tables, I thought of putting these options as JSON in a separate column, so that the frontend can easily render the JSON as options to the user.
Question: I have to add up the total price from the options and accessories.
Is it OK to store these types of details in JSON format?
I am also open to other suggestions that are easily maintainable.
If you are using an sql-database (and it looks like you are) I really advise you against keeping the data as JSON inside.
Even though MySQL 5.7.8 and later give you a good way to access data inside JSON types, you never know when you are going to need to filter/search/group based on that data.
A quick example: suppose you want to show a user who buys a doll (productid == 1) a list of all the add-ons that other buyers of that doll (productid == 1) purchased in the last week (or month). If you save that data as JSON, it will be very hard to query.
I know that your example shows only the meta-data of the add-ons and options, but it's "easy" to go and also save the actual purchases that way (just as add-on to the purchase-row itself).
I advise against it.
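To make the relational alternative concrete, here is a small sketch using Python's sqlite3 as a stand-in for MySQL. The table and column names are assumptions, not the asker's schema, and prices are stored as integer cents to avoid rounding surprises. It shows both the total-price calculation from the question and the "add-ons bought with this doll" query from the example above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE product (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, price_cents INTEGER NOT NULL
);
CREATE TABLE addon (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    kind TEXT NOT NULL,               -- 'option' or 'accessory'
    name TEXT NOT NULL,
    price_cents INTEGER NOT NULL
);
CREATE TABLE purchase (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id)
);
CREATE TABLE purchase_addon (
    purchase_id INTEGER NOT NULL REFERENCES purchase(id),
    addon_id INTEGER NOT NULL REFERENCES addon(id),
    PRIMARY KEY (purchase_id, addon_id)
);
"""


def purchase_total(conn, purchase_id):
    """Base product price plus all chosen add-ons, in cents (None if unknown)."""
    row = conn.execute("""
        SELECT p.price_cents + COALESCE(SUM(a.price_cents), 0)
        FROM purchase pu
        JOIN product p ON p.id = pu.product_id
        LEFT JOIN purchase_addon pa ON pa.purchase_id = pu.id
        LEFT JOIN addon a ON a.id = pa.addon_id
        WHERE pu.id = ?
        GROUP BY pu.id, p.price_cents
    """, (purchase_id,)).fetchone()
    return row[0] if row else None


def addons_bought_with(conn, product_id):
    """Add-ons chosen by buyers of a product, most popular first."""
    return conn.execute("""
        SELECT a.name, COUNT(*) AS times
        FROM purchase pu
        JOIN purchase_addon pa ON pa.purchase_id = pu.id
        JOIN addon a ON a.id = pa.addon_id
        WHERE pu.product_id = ?
        GROUP BY a.id, a.name
        ORDER BY times DESC, a.name
    """, (product_id,)).fetchall()
```

The frontend can still receive JSON: build it from these rows when rendering, rather than storing it in the column.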

Need help starting simple MySQL database using data from Excel

I'm an intern and I've been tasked with something I'm pretty unfamiliar with. My manager has requested I create a simple MySQL database using data from one or more Excel files and I have no idea where to start. I would normally ask someone here for help but everyone seems to be really busy. Basically, the purpose of the database is to see how different object-groups relate to one another so as to keep things standardized. I'm trying not to go into detail about things that aren't really relevant.
I was asked to first design a schema for the database and then I would get an update on how to implement it. Would I just start by writing queries to create tables? I'm assuming I would need to convert the Excel files to .csv, how do I read this data and send it to the correct table based on Object Type (an attribute of each object, represented in a column)?
I don't want to ask too much right now, but if someone could help me understand what I need to do to get started I would really appreciate it.
Look at the column headers in your spreadsheet.
Decide which columns relate to Objects and which columns relate to Groups.
The columns that relate to just Objects will become your field names for the Object table. Give this table an ID field so you can uniquely identify each Object.
The columns that relate to the Groups will become field names for a Group table. Give this table an ID field so you can uniquely identify each Group.
Consider whether an Object can be in more than one Group. If so, you will probably need an Object-Group table. This table would most likely contain an ObjectID and a GroupID.
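The steps above could be sketched like this, using Python's csv and sqlite3 modules as a stand-in for a MySQL import. The CSV column names (object_name, object_type, group_name) are made up for illustration, and the group table is called grp because GROUP is a reserved word in SQL. Your real headers will differ.

```python
import csv
import sqlite3

SCHEMA = """
CREATE TABLE object (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    object_type TEXT NOT NULL
);
CREATE TABLE grp (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE object_group (
    object_id INTEGER NOT NULL REFERENCES object(id),
    group_id INTEGER NOT NULL REFERENCES grp(id),
    PRIMARY KEY (object_id, group_id)
);
"""


def load_csv(conn, csv_file):
    """Load rows exported from the spreadsheet into the three tables.

    Repeated objects or groups are inserted once; each row adds one
    object-to-group link.
    """
    for row in csv.DictReader(csv_file):
        conn.execute(
            "INSERT OR IGNORE INTO object (name, object_type) VALUES (?, ?)",
            (row["object_name"], row["object_type"]),
        )
        conn.execute(
            "INSERT OR IGNORE INTO grp (name) VALUES (?)",
            (row["group_name"],),
        )
        conn.execute(
            """
            INSERT OR IGNORE INTO object_group (object_id, group_id)
            SELECT o.id, g.id FROM object o, grp g
            WHERE o.name = ? AND g.name = ?
            """,
            (row["object_name"], row["group_name"]),
        )
```

Open the exported file with `open(path, newline="")` and pass the file object to `load_csv`.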

How can I replace a field with a similar result in MySQL

Unfortunately, I have to deal with a lot of user submitted data, text fields rather than option boxes. I have imported it into my MySQL database as strings. I do all this to be able to run statistics quickly on the data like top 10 most common companies. The problem I have run into is that some of the rows have slightly different names for the same companies. For example:
Brasfield & Gorrie, LLC VS Brasfield and Gorrie
Britt Peters and Associates VS Britt, Peters & Associates Inc.
Is there some fairly straightforward MySQL command or external tool that will allow me to go through and combine these sorts of rows? I know how to use REPLACE(), but I don't think it has the power to do this simply. Correct me if I'm wrong!
Taking this example:
Brasfield & Gorrie, LLC VS Brasfield and Gorrie
Assuming that I want to keep the first one, I would find all records that have the ID of the second one and update them to use the first, assuming that this table that has these titles also has an ID field for each one.
You could create a page in PHP that lets you administer this with mouse clicks, but it will require regular pruning as long as users can enter this data freely. For future entries, you can apply the Levenshtein distance to suggest similar existing matches, guiding users toward something that already exists rather than a new db entry.
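Here is a minimal sketch of the suggestion idea in Python (the answer mentions PHP, which has its own built-in levenshtein() function). The normalization step is my own assumption, not part of the answer: it lowercases, turns "&" into "and", drops punctuation, and strips a small hypothetical list of company suffixes before comparing.

```python
import re

# Hypothetical list of suffixes to ignore when comparing company names.
SUFFIXES = {"llc", "inc", "co", "corp", "ltd"}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def normalize(name):
    """Reduce a company name to a comparable form."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(w for w in name.split() if w not in SUFFIXES)


def suggest(entry, known, max_distance=2):
    """Return known names within max_distance edits of the new entry."""
    target = normalize(entry)
    scored = [(levenshtein(target, normalize(k)), k) for k in known]
    return [k for distance, k in sorted(scored) if distance <= max_distance]
```

With this normalization, both example pairs from the question reduce to the same string, so they match at distance 0. The threshold of 2 is arbitrary and would need tuning against real data.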