Before you begin
You need:- A project with a MySQL database.
- A command interface installed:
- Sanitizing NoSQL Databases (such as MongoDB)
- Input validation and input sanitization, which both help prevent security vulnerabilities
Sanitize the database
Make sure that you only sanitize preview environments and never the production environment. Otherwise you may lose most or even all of the relevant data stored in your database. First, take a database dump of your preview environment. This is just a safety precaution. Production data isn’t altered. To get a database dump, run the following command:upsun db:dump -e DEVELOPMENT_ENVIRONMENT_NAME.
- Manually
- With Drupal and Drush
Assumptions:
usersis the table where all of your PII is stored in thestagingdevelopment database.stagingis an exact copy of your production database.
-
Connect to the
stagingdatabase by runningupsun sql -e staging. -
Display all fields from your
userstable, to select which ones need to be redacted. Run the following query:You see output like the following: -
Change the fields where PII is contained with the
UPDATEstatement. For example, to change the display name of users with an email address not in your company’s domain to a random value, run the following query:Adapt and run that query for all fields that you need to sanitize. If you modify fields that you shouldn’t alter, you can restore them from the dump you took in step 1. You can create a script to automate the sanitization process to be run automatically on each new deployment. Once you have a working script, add your script to sanitize the database to adeployhook:
What’s next
You learned how to remove sensitive data from a database. To replace sensitive data with other meaningful data, you can add afaker to the process.
A faker is a program that generates fake data that looks real.
Having meaningful PII-free data allows you to keep your current Q&A, external reviews, and other processes.
To add a faker, adapt your sanitizing queries to replace each value that contains PII with a new value generated by the faker.
You might also want to make sure that you implement input validation.
If your database contains a lot of data, consider using the OPTIMIZE TABLE statement
to reduce its size and help improve performance.