Wednesday, August 17, 2016

Drupal Storage API and AWS S3 tutorial

By default, Drupal supports a local public and private file system for storing user-uploaded files (images, PDFs, etc.).  While this works well for most use cases, there are disadvantages.  For example, it's nearly impossible to switch from public to private or vice versa; once you make your choice, you're stuck with it.  It's also somewhat limiting in the modern cloud era, with cheap 3rd party cloud storage readily available.  If you have disk space constraints and are considering moving your files to the cloud, the two most popular actively maintained Drupal options are S3 File System (s3fs) and Storage API (storage).
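
For reference, the stock Drupal 7 file system settings are just site variables, so they can also be overridden in settings.php.  A minimal sketch (the paths below are illustrative, adjust for your own install):

    // sites/default/settings.php
    $conf['file_public_path']    = 'sites/default/files';
    $conf['file_private_path']   = 'sites/default/files/private';
    $conf['file_default_scheme'] = 'public'; // or 'private'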

s3fs module

    Pros:
  • Active development (three main developers)
  • Uses the official AWS SDK (version 2 only, though)
  • Easy to set up and use
  • Provides a migrate mechanism
    Cons:
  • Vendor lock-in (only supports AWS S3)
  • Name conflicts with other unrelated s3fs projects
  • Doesn't support advanced CSS/JS aggregation
  • Manual file metadata cache refreshing required (issues with modules like imce)
  • AWS S3 has noticeable performance issues (thumbnail creation, initial page load, etc.)
  • No documented recommendations for dev/stage/prod workflow (all sites sharing one bucket)
  • Bucket tagging via the Drupal config not supported

storage module

    Pros:
  • Supports multiple vendors (AWS S3, Rackspace, FTP)
  • Supports CSS/JS aggregation natively 
  • Provides a migrate mechanism
  • More flexible, modular architecture allows for future growth and enhancements
  • De-duplication saves disk space and money
  • Smart cron workflow allows fast local thumbnail creation/image viewing and lazy uploading to AWS S3
  • Support for AWS CloudFront (CDN) so no performance problems
    Cons:
  • Harder to initially set up (somewhat confusing terminology and outdated documentation)
  • Semi-actively developed by only two main developers
  • Custom module-coded implementation of the AWS S3/CloudFront API (instead of the official SDK) so future Amazon API updates may break the module until updated
  • xmlsitemap support currently requires a patch
  • Dynamic image styles (e.g. responsive images) requires additional setup
  • No imce support
  • No documented recommendations for dev/stage/prod workflow (all sites sharing one bucket)
  • Doesn't work with PostgreSQL
  • Bucket tagging via the Drupal config not supported

    After weighing the pros and cons, I eventually decided to go with Storage API.  Here's how to migrate an existing site's file system to AWS S3 using that module:

    1.  Download the necessary modules: drush dl imageinfo_cache storage_api-7.x-1.x-dev storage_api_stream_wrapper-7.x-1.x-dev storage_api_populate-7.x-1.x-dev

    2.  If needed, apply patch for xmlsitemap

    3.  Optionally apply this fix to suppress a false positive nag error

    4.  Enable the modules: drush en storage storage_stream_wrapper storage_api_populate imageinfo_cache

    5.  Go to /admin/config/media/file-system and change the default download method to Storage API (public or private depending on your site's needs)
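
    If you prefer the command line, you can confirm the change took effect by inspecting the default scheme variable (the exact value will depend on which Storage API option you picked):

    drush vget file_default_scheme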



    6.  Now, for the somewhat labor-intensive step: update all your content type fields that rely on the file system to use Storage API.  For example, edit the Article content type's fields (/admin/structure/types/manage/article/fields), edit the Image field, and change the upload destination to Storage API (public or private depending on your site's needs)
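
    If you have a lot of content types, a quick audit script like this (a sketch, not part of the module) can list every file/image field and its current upload destination so you can spot any still pointing at the local file system:

    drush php-eval '
      foreach (field_info_fields() as $name => $field) {
        if (in_array($field["type"], array("file", "image")) && isset($field["settings"]["uri_scheme"])) {
          print $name . " => " . $field["settings"]["uri_scheme"] . PHP_EOL;
        }
      }
    '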

    7.  Once all your content types are updated to use Storage API, you're ready to have your existing files managed by Storage API.  Go to /admin/structure/storage/populate, check Migrate all local files and Confirm, and then click Start
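
    You can optionally spot-check the managed file records afterwards to confirm URIs are being migrated (the exact scheme names you see will depend on your stream wrapper setup):

    drush sqlq "SELECT fid, uri FROM file_managed ORDER BY fid DESC LIMIT 10"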

    8.  After the migration completes, you can disable the populate module: drush dis storage_api_populate

    9.  Now that all your static files are managed by Storage API, you need to generate your dynamic image styles: drush image-generate  Choose all for any prompts:


    10.  Once the image styles have been generated (which may take a while to complete), you're ready to verify the migration.  Move everything in the site's files directory except the storage folder and the .htaccess file to a temporary backup location, and then run drush cc all && drush cron
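
    For example, assuming a standard sites/default/files layout (adjust the paths for your install), the move could look like this:

    cd sites/default/files
    mkdir -p ../files_backup
    # move everything except the storage folder and .htaccess to the backup location
    find . -mindepth 1 -maxdepth 1 ! -name storage ! -name .htaccess -exec mv {} ../files_backup/ \;
    drush cc all && drush cron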

    11.  Now, verify the site functions normally.

    Congratulations, you've updated your site to use Storage API!

    ...But you're probably thinking, "Okay, so what's the big deal?  The site looks the same and it just seems like all the files moved into a new folder called storage.  So what?!"

    Well, get ready to experience the awesome power of Storage API by migrating your file system to AWS S3!  (or you could just as easily move them to Rackspace, etc. using the same process...)

    1.  First, you'll need an AWS account with IAM permissions to create S3 buckets and use CloudFront:

    {
        "Statement": [{
            "Sid": "ModifyAssets",
            "Action": [
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
                "s3:GetObject",
                "s3:GetObjectAcl",
                "s3:GetObjectVersion",
                "s3:GetObjectVersionAcl",
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:PutObjectVersionAcl"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::yourbucketname/*"
            ]
        }, {
            "Sid": "BucketRights",
            "Action": [
                "s3:ListBucket",
                "s3:ListAllMyBuckets"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::*"
            ]
        }]
    }


    ...plus a second statement for the CloudFront permissions:

    {
        "Sid": "Stmt1450391402000",
        "Effect": "Allow",
        "Action": [
            "cloudfront:CreateDistribution",
            "cloudfront:CreateInvalidation",
            "cloudfront:DeleteDistribution",
            "cloudfront:GetDistribution",
            "cloudfront:ListDistributions",
            "cloudfront:UpdateDistribution",
            "cloudfront:ListInvalidations",
            "cloudfront:ListStreamingDistributions"
        ],
        "Resource": [
            "*"
        ]
    }

    2.  Once the account is created with the necessary IAM permissions, you'll need to create an access key:



    3.  Once you have your access key ID and Secret, go to your Drupal site and browse to /admin/structure/storage/create-container

    4.  Choose Amazon S3 from the service dropdown and click Next

    5.  Provide your access key ID, Secret, and a globally unique bucket name (I recommend a name that does NOT include a dot [.] since that's interpreted as a subdomain).  In addition, select the AWS region you want to create the bucket in.  Finally, make sure to check the Serve with CloudFront checkbox (note: streaming with CloudFront is out of scope for this tutorial).  You can optionally select the Reduced redundancy checkbox for cheaper storage with 99.99% durability.  Then click Create.
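
    If you happen to have the AWS CLI configured with the same credentials, you can optionally confirm the bucket was created in the expected region:

    aws s3api get-bucket-location --bucket yourbucketname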


    Note: it may take up to 20 minutes for the CloudFront processing to complete on the AWS backend, but you can continue the setup process below immediately:

    6.  Go to /admin/structure/storage/create-class, give it a descriptive name like "Cloud" (keep the Initial container set to Filesystem for performance reasons), and then click Create class


    Note: like others, I have no idea what the other checkboxes do, so leave them unchecked.

    7.  On the subsequent screen, choose Amazon S3 (the container you created in the step above) from the dropdown and then click Add


    8.  Now, go to /admin/structure/storage/stream-wrappers and click edit for Public, Private, or both (depending on your use case) and change the Storage class to Cloud




    9.  Finally, run drush cron to actually push your local files to the AWS S3 bucket.  This may take a while, so I strongly recommend using drush instead of the Drupal web interface to run cron.
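
    While cron is running (or afterwards), you can optionally watch the objects land in the bucket with the AWS CLI:

    aws s3 ls s3://yourbucketname --recursive | head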

    10.  Verify the site functions as expected.  The images should now be served from amazonaws.com or cloudfront.net.

    11.  Celebrate faster page load times and more file system redundancy!  Also, now that your files are in S3, you can even set up a backup strategy using Infrequent Access or Glacier (a rough sketch follows).
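
    As a rough sketch of what that could look like (the bucket name and the backups/ prefix are hypothetical, and you'd only want Glacier transitions on copies you don't serve live):

    aws s3api put-bucket-lifecycle-configuration --bucket yourbucketname --lifecycle-configuration '{
      "Rules": [{
        "ID": "ArchiveBackups",
        "Status": "Enabled",
        "Filter": {"Prefix": "backups/"},
        "Transitions": [
          {"Days": 30, "StorageClass": "STANDARD_IA"},
          {"Days": 365, "StorageClass": "GLACIER"}
        ]
      }]
    }'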

