File Management
Almost all web apps handle user-uploaded files, from documents that need to be linked to models in a database to images used as users’ avatars. The more users your service has, the more files your server will accumulate. If you neglect file management, you will soon have a serious disk space problem.
Where to store users’ files?
There are two main ways of storing user-uploaded files: in a filesystem or in a database. As with everything, both of these solutions have their pros and cons.
If your application handles a small number of fairly small files that need to be tightly coupled with objects stored in the database, or you need a more secure way of storing sensitive files, then storing them in the database is a reasonable choice. However, this solution significantly increases the size of your database backups and may require you to store files as BLOBs.
Storing files in a filesystem makes more sense when you are dealing with large files or a large number of users. Expanding a filesystem is usually much cheaper than buying more database space. Another advantage of this approach is ease of migration: if at any point you need to move user-uploaded files to another server or to S3, it is much easier to copy them from a filesystem than to extract them from your database first.
How to manage files in a filesystem?
When adding file management to your server application, you need to keep a few things in mind:
- all saved files need to be easily identifiable
- you have to delete unused files
- you have to prevent users from reaching files they were not given access to
File identification
Since you will probably need to connect data from a database to specific files in the filesystem (for example, connecting a user to their avatar) and you will need to store some data about each file for ease of use, it’s common practice to create a File model and store it in your database. It could look like this:
import java.util.UUID
case class File(id: UUID, contentType: String, path: String, filename: String)
We will be able to connect the User to their avatar by saving the file’s ID as a foreign key. In the path variable we will store a relative path to the image, contentType will hold the MIME type of the file, and filename will store the file’s original name.
Let’s think of a scenario where two users try to upload an avatar image called “image.jpg”. What would happen if we just saved both of these files to the same directory? One would overwrite the other. To prevent this, we will create a separate directory for every uploaded file. To keep the directory names distinct, we will use the current timestamp as the name.
Let’s come back to the File model. Our app will have a designated directory for saving all files (base_file_dir). Since we know that every uploaded file will be stored in a separate subdirectory, we can set path to just “timestamp/file_name”. Now when we want to access this file, we can retrieve its relative location from our database and join it with the base directory to get an absolute path.
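The storing and resolving steps described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the object name UploadStorage and its method names are hypothetical, the base directory is passed in as a parameter, and the timestamp is taken in milliseconds. Two uploads arriving in the same millisecond would get the same directory name, so the sketch fails loudly in that case instead of silently overwriting.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper; names are illustrative and not taken from the article.
object UploadStorage {
  // Saves `content` under baseDir/<timestamp>/<fileName> and returns the
  // relative path ("timestamp/file_name") that would be stored in File.path.
  def store(baseDir: Path, fileName: String, content: Array[Byte]): String = {
    val timestamp = System.currentTimeMillis().toString
    Files.createDirectories(baseDir)
    // createDirectory throws FileAlreadyExistsException if the directory
    // already exists, so a timestamp collision cannot overwrite a file.
    val targetDir = Files.createDirectory(baseDir.resolve(timestamp))
    Files.write(targetDir.resolve(fileName), content)
    s"$timestamp/$fileName"
  }

  // Rebuilds the absolute location from the base directory and the
  // relative path kept in the database.
  def absolutePath(baseDir: Path, relativePath: String): Path =
    baseDir.resolve(relativePath).toAbsolutePath.normalize()
}
```

Here fileName is assumed to have been cleaned already before it reaches store.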
Important!
When you read the filename from data sent by a user, you need to sanitize it. Keep only the last segment of the submitted path (the name and its extension) and reject or strip any “..” components. This prevents users from saving files outside the designated directory.
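One possible way to do this cleaning is sketched below. The object FileNames and the whitelist of allowed characters are assumptions made for the example, not rules from the article; a real application may want to allow more characters or enforce a set of permitted extensions.

```scala
// Hypothetical helper; the allowed character set is an assumption.
object FileNames {
  // Returns only the last path segment of a client-supplied name, or None
  // when nothing usable remains (empty name, "." or "..").
  def sanitize(rawName: String): Option[String] = {
    // Treat both separators alike, since clients may send Windows-style paths.
    val lastSegment =
      rawName.replace('\\', '/').split('/').lastOption.getOrElse("").trim
    // Keep letters, digits, dot, dash and underscore; replace anything else.
    val cleaned = lastSegment.replaceAll("[^A-Za-z0-9._-]", "_")
    if (cleaned.isEmpty || cleaned == "." || cleaned == "..") None
    else Some(cleaned)
  }
}
```

With this sketch, a submitted name such as "../../etc/passwd" is reduced to "passwd", and a bare ".." is rejected.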
File cleanup
Let’s say that a user has sent a form with a file attached to it, but the form is not valid and our server returns an error. What should happen to the file? You can save it before or after form validation. If you save it after, the user will need to resend it after correcting the form. If you save it before validation, the user won’t have to send it again, but if they never send a corrected form, you’re stuck with an unused file. To prevent this, we should implement some form of temporary file storage.
The temporary storage will be a separate directory within base_file_dir; let’s call it tmp. All files sent to the server will initially be saved to tmp (each within a separate directory with a timestamp as its name), and no File object will be created yet. Once the server determines that the form is valid, the saved file will be moved from tmp to base_file_dir and the File object will be created with path set to its new location.
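The move out of tmp could look roughly like the sketch below. It assumes that each tmp subdirectory holds exactly one uploaded file and that the same timestamp is reused as the directory name in the permanent location; the object TempStorage and the method promote are hypothetical names.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper; names and directory layout are assumptions.
object TempStorage {
  // Moves baseDir/tmp/<timestamp>/<fileName> to baseDir/<timestamp>/<fileName>
  // and returns the relative path to store in the File object.
  def promote(baseDir: Path, timestamp: String, fileName: String): String = {
    val source = baseDir.resolve("tmp").resolve(timestamp).resolve(fileName)
    val targetDir = Files.createDirectories(baseDir.resolve(timestamp))
    // Without REPLACE_EXISTING, move fails if the target already exists.
    Files.move(source, targetDir.resolve(fileName))
    // The tmp subdirectory is assumed to hold only this file, so it is empty now.
    Files.deleteIfExists(source.getParent)
    s"$timestamp/$fileName"
  }
}
```

The File object would then be created with the returned value as its path, after the move has succeeded.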
For now there is no difference between temporary storage and base storage. To make tmp truly temporary, we need a method that runs periodically and deletes some files from tmp. We can’t delete all of them: some may belong to forms that are currently being validated, or to forms that the user is correcting right now. To avoid removing those, we delete only files older than some threshold, let’s say 30 minutes. To do this we will create a method like the one below and run it using Play ScheduleModule.
import java.time.Instant
import scala.reflect.io.Directory
import scala.util.Try
import scala.util.control.NonFatal

def clearTempFiles(): Unit = {
  // Subdirectories whose timestamp name is lower than this cutoff (now - 30 min) are deleted
  val cutoff = Instant.now().minusSeconds(1800).toEpochMilli
  val tmpDirectory = new java.io.File(s"${base_file_dir}/tmp") // base tmp file directory
  // list() returns null when the directory does not exist or cannot be read
  Option(tmpDirectory.list()).getOrElse(Array.empty[String])
    .filter(name => Try(name.toLong).toOption.exists(_ < cutoff)) // skip names that are not timestamps
    .foreach { subDirectory =>
      val fullDirPath = s"${tmpDirectory.toPath}/${subDirectory}"
      try {
        val dir = new Directory(new java.io.File(fullDirPath))
        if (!dir.deleteRecursively()) {
          logger.error(s"Unable to delete directory: ${fullDirPath}")
        }
      } catch {
        case NonFatal(e) => logger.error(s"Error while deleting directory: ${fullDirPath}", e)
      }
    }
}
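How the periodic run is wired up depends on your Play setup, so the sketch below deliberately does not use any Play API. It swaps in the JDK's ScheduledExecutorService instead, purely to illustrate running a cleanup task at a fixed interval. The object CleanupScheduler, the method start, and the period parameter are hypothetical.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import scala.util.control.NonFatal

// Hypothetical helper built on the JDK scheduler, not on Play.
object CleanupScheduler {
  // Runs `task` once immediately and then every `periodMinutes` minutes
  // on a single background thread.
  def start(task: () => Unit, periodMinutes: Long): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val runnable: Runnable = () =>
      // An exception escaping a scheduled run cancels all later runs,
      // so it is caught and reported here.
      try task()
      catch { case NonFatal(e) => System.err.println(s"Cleanup failed: $e") }
    scheduler.scheduleAtFixedRate(runnable, 0L, periodMinutes, TimeUnit.MINUTES)
    scheduler
  }
}
```

It could be started with something like CleanupScheduler.start(() => clearTempFiles(), 10) and stopped with shutdown() on the returned scheduler when the application stops.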
Serving files
After saving a file, it’s time to serve it to users. One of the most secure ways of accessing it is by its ID. The server looks up the File object with the given ID in the database and then reads the file from base_file_dir using the relative path saved in the database. Since users never supply any part of a file path, they cannot point the server at an arbitrary location on disk.
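A minimal sketch of that lookup is shown below, assuming the database query is represented by a plain function from ID to an optional File. The object FileServing, the method resolve, and the lookup parameter are hypothetical. The extra check that the resolved path stays inside the base directory is an added precaution for the case where a stored path is ever wrong; it is not something the approach above requires.

```scala
import java.nio.file.{Files, Path}
import java.util.UUID

// Hypothetical helper; `lookup` stands in for the real database query.
object FileServing {
  final case class File(id: UUID, contentType: String, path: String, filename: String)

  // Returns the File record and its absolute location, or None when the ID is
  // unknown, the file is missing, or the path would leave the base directory.
  def resolve(baseDir: Path, lookup: UUID => Option[File], id: UUID): Option[(File, Path)] = {
    val root = baseDir.toAbsolutePath.normalize()
    for {
      file <- lookup(id)
      absolute = root.resolve(file.path).normalize()
      if absolute.startsWith(root) && Files.isRegularFile(absolute)
    } yield (file, absolute)
  }
}
```

A controller could then send the file at the returned location with contentType as the response's content type.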