We use many technologies on the Web & backend team here at Raizlabs. Like most projects that involve a user-facing website, we have static files and assets that get served with the page. In cases where Django (and associated libraries) are the basis for the web portion of the project, we like to ensure that our static files are not hosted on the same machine as the Python process itself.
What are static files?

They are the resources that a web page needs in order to function: CSS, JavaScript, fonts, images, etc. HTML is a resource too, but because our sites are not usually static (never changing), we do not include it in the static files grouping.
Why store them separately?

By default, the Django web framework that provides data to the page will store these files on the same machine. That means every time a web request comes through and Django gives back data, that same machine is also being asked for all of these files, and there are more of them than you would think, and they are bigger than you’d expect.
So, we want these files stored elsewhere. To the rescue: django-storages, an open-source Python library for Django. This library, with some configuration, will push all of these static files up to your storage area of choice. Our preference is Amazon Web Services’ (AWS) S3 storage.
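To give a sense of the configuration involved, here is a minimal sketch of the boto-backed django-storages settings (the credential and bucket values are placeholders):

```python
# settings.py -- minimal django-storages S3 setup (values are placeholders)
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
AWS_ACCESS_KEY_ID = 'your-access-key-id'
AWS_SECRET_ACCESS_KEY = 'your-secret-access-key'
AWS_STORAGE_BUCKET_NAME = 'your-bucket-name'
```

With these set, running python manage.py collectstatic pushes the static files to the bucket instead of the local filesystem.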
For this post, we’re going to focus primarily on setting the S3 bucket name, which names the place on S3 where our files will be stored. We’ll also cover some basic access policy rules for requests to the storage. This post also assumes that the server running Django is hosted on AWS’s EC2 cloud.
Fetching the environment name

For our projects, we generally have a couple of different environments. For example, we may have a development and a production environment, in which development changes often while production only changes when we have the next stable build ready. In this case, we want the bucket name to be different for each environment. Sharing is caring, but sometimes individuality is a good thing!
We could set an environment variable on the AWS machine and read it on startup, but that is both tedious and error-prone (typos, forgetting, re-using names).
But wouldn’t it be great if we could ask the system, from the machine itself, what our environment name is? It has to be unique, and it is something we chose already. Unfortunately, there is no particularly direct way to do this.
Thankfully, we can use Amazon’s boto library, which is the backbone of their client scripting. It is written in Python…and hey, so is Django!
In this way, we can actually use boto to help us learn something about our system while running the script on it. We query a private URI from our machine to ask AWS what our instance_id is. But, yuck! The ID is a bunch of letters and numbers that we don’t get to choose, and it is not particularly meaningful on its own.
But this ID can be used to get meaningful information about the instance. We tell boto to connect to the EC2 cloud, and we then ask it to return instance information for our ID. Then we get to ask the instance questions, like its environment name.
Here’s an example of how to get the instance_id and then the environment_name:
```python
import requests

from boto.ec2 import connection as ec2_connection

# Connect to the EC2 API with our AWS credentials
ec2_conn = ec2_connection.EC2Connection(
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY)

# Ask the instance metadata service for this machine's instance ID
response = requests.get('http://instance-data/latest/meta-data/instance-id')
if response.status_code == 200:
    instance_id = response.content.decode(encoding='utf-8', errors='strict')
    instance = ec2_conn.get_only_instances(instance_ids=[instance_id, ])
    environment_name = 'elasticbeanstalk:environment-name'
    if instance and instance[0].tags.get(environment_name):
        env_name = instance[0].tags[environment_name]
```

Set this env_name into AWS_STORAGE_BUCKET_NAME, and you’re set! So long as this variable is set before you call collectstatic (which puts the static files where they need to be), the upload will work perfectly.
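With the environment name in hand, one way to wire it in is to build the bucket name from it in your settings. A minimal sketch, assuming a per-environment naming scheme (the myproject-static prefix is hypothetical, not something django-storages requires):

```python
# settings.py -- build the bucket name from the environment name we found
# (the 'myproject-static' prefix is a hypothetical naming scheme)
AWS_STORAGE_BUCKET_NAME = 'myproject-static-{}'.format(env_name)
```

With that in place, collectstatic uploads to the environment-specific bucket.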
Allowing Cross Origin Resource Sharing

Wait! We’re not done quite yet. We have the bucket name all checked off, but browsers don’t like to load fonts from domains that aren’t theirs (aws-s3.com vs. yourdomain.com). We need to tell S3 to allow Cross-Origin Resource Sharing (CORS) from this new domain. We could do this by hand, logging into S3 and making that rule active…or we could use boto and do it programmatically on startup. Yeah, that sounds better.
The following code shows how to set up the S3 CORS configuration:
```python
from boto.s3 import bucket, connection, cors

# Build a CORS configuration: allow GET requests from any origin
cors_conf = cors.CORSConfiguration()
cors_conf.add_rule(allowed_method=['GET', ],
                   allowed_origin=['*', ],
                   allowed_header=['Authorization', ],
                   max_age_seconds=3000)

# Connect to S3 and apply the configuration to our static files bucket
conn = connection.S3Connection(
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
bucket_name = settings.AWS_STORAGE_BUCKET_NAME
b = bucket.Bucket(connection=conn, name=bucket_name)
b.set_cors(cors_conf)
```

So long as this is done before the server begins serving the web page, you’re fine; otherwise, parts of the page (like fonts) won’t show up. You can make a custom Django management command to run this whenever you need.
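As a sketch, such a management command might look like the following (the setup_cors name and module path are hypothetical):

```python
# myapp/management/commands/setup_cors.py -- hypothetical command name
from django.conf import settings
from django.core.management.base import BaseCommand

from boto.s3 import bucket, connection, cors


class Command(BaseCommand):
    help = 'Apply our CORS configuration to the static files bucket'

    def handle(self, *args, **options):
        # Same rule as above: allow GETs from any origin
        cors_conf = cors.CORSConfiguration()
        cors_conf.add_rule(allowed_method=['GET', ],
                           allowed_origin=['*', ],
                           allowed_header=['Authorization', ],
                           max_age_seconds=3000)
        conn = connection.S3Connection(
            aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
            aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
        b = bucket.Bucket(connection=conn,
                          name=settings.AWS_STORAGE_BUCKET_NAME)
        b.set_cors(cors_conf)
```

Run it with python manage.py setup_cors before starting the server.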
We connect to the S3 bucket, and add that rule. And then, we’re done! Programmatic, and The Right Way.