Rails + Paperclip + Open-Uri – Downloading files from the internet and saving them with Paperclip

I don't like hotlinking. Being dependent on someone else's infrastructure makes me a bit worried, especially when I don't know who this "someone" is. That's why I've decided to download all the images that users embed in comments. After downloading them, I attach them to their comments with Paperclip. Thanks to that, I can be sure that they will always be served from my own servers (and I can also use them later for some fancy stuff).

In theory, we could do something like this:

# We will use open-uri to download embedded images
require "open-uri"

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs
file = URI.open(image_url)
current_comment.pictures.create!(file: file)

Unfortunately, this only "almost" works. If we try to get the file URL:

current_picture.file.url #=> '/images/pictures/12313/file.'

we get it without a file extension. This happens because open-uri stores the download in a Tempfile without a file extension, and Paperclip uses the file name to determine the extension. To get around this issue, we need to create a new Tempfile with a valid extension and copy the content of the open-uri Tempfile into it:
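To see the problem in isolation, here is a small sketch that uses only the Ruby standard library. It assumes, as the workaround does, that the extension can be taken from the image URL; the helper reads it from the URI path only, so a query string such as `?v=2` is not mistaken for part of the extension. The name `extension_from_url` is hypothetical. Note also that open-uri returns a `StringIO` rather than a `Tempfile` for small responses (roughly 10 KB or less), so in that case there is no file name at all to take an extension from.

```ruby
require "tempfile"
require "uri"

# Hypothetical helper: derive ".jpg", ".png", ... from the URL path only,
# so "http://example.com/a.jpg?v=2" yields ".jpg" rather than "jpg?v=2".
# The result is downcased; an URL without an extension yields "".
def extension_from_url(url)
  File.extname(URI.parse(url).path.to_s).downcase
end

# A Tempfile created with a plain basename has no extension,
# similar to the one open-uri creates for larger downloads.
plain = Tempfile.new("open-uri")
# Passing [basename, suffix] makes the generated path end with the suffix.
named = Tempfile.new(["", extension_from_url("http://example.com/a.jpg?v=2")])

p File.extname(plain.path) # => ""
p File.extname(named.path) # => ".jpg"

# Close and delete both temporary files.
plain.close!
named.close!
```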

# We will use open-uri to download embedded images
require "open-uri"
require "tempfile"

# Take the extension from the URL path, so query strings are ignored
extension = File.extname(URI.parse(image_url).path)
file = Tempfile.new(["", extension])
file.binmode # note that our tempfile must be in binary mode
file.write URI.open(image_url).read
file.rewind
current_comment.pictures.create!(file: file)
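The copy step can also be wrapped in a reusable helper. This is a sketch rather than the author's code: the name `copy_to_tempfile` is hypothetical, and the source is any IO-like object (for example, the `StringIO` or `Tempfile` that `URI.open` returns), which also makes the helper easy to exercise without a network connection.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper: copy an IO-like source (e.g. whatever URI.open returns)
# into a binary-mode Tempfile whose name ends with `extension`, so that
# Paperclip can infer the extension from the file name.
def copy_to_tempfile(source, extension)
  file = Tempfile.new(["", extension])
  file.binmode            # image data must not be transcoded
  file.write(source.read) # copy the whole body
  file.rewind             # let the next reader start at byte 0
  file
end

# Usage with an in-memory source standing in for a downloaded image:
png_header = "\x89PNG\r\n\x1A\n".b
tmp = copy_to_tempfile(StringIO.new(png_header), ".png")
p tmp.path.end_with?(".png") # => true
p tmp.read == png_header     # => true
tmp.close!
```

In the Rails app, this would be called as `current_comment.pictures.create!(file: copy_to_tempfile(URI.open(image_url), extension))`.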

Update: Paperclip can do this on its own!

Posted by Aditya Sanghi (thanks a lot!):

current_comment.pictures.create!(file: URI.parse(image_url))

Keep in mind, though, that you still need to handle HTTP errors such as 404 and 500 (Paperclip can raise them).
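A minimal sketch of that error handling, assuming the download goes through open-uri, which raises `OpenURI::HTTPError` when the server answers with an error status such as 404 or 500 and lets network errors propagate. The method name `fetch_image`, the `DOWNLOAD_ERRORS` list, and the injectable `opener` are hypothetical; the opener exists only so the logic can be exercised without a real request, and which exceptions to treat as "not available right now" is a judgement call.

```ruby
require "open-uri"
require "net/http"
require "socket"
require "stringio"

# Assumed set of errors that mean "the image could not be fetched".
DOWNLOAD_ERRORS = [
  OpenURI::HTTPError, # 404, 500, ... (e.io.status holds e.g. ["404", "Not Found"])
  SocketError,        # DNS failure, unknown host
  SystemCallError,    # e.g. Errno::ECONNREFUSED
  Net::OpenTimeout,
  Net::ReadTimeout
].freeze

# Hypothetical helper: returns the downloaded body, or nil on failure.
def fetch_image(url, opener: ->(u) { URI.open(u, &:read) })
  opener.call(url)
rescue *DOWNLOAD_ERRORS => e
  warn "Could not download #{url}: #{e.class}: #{e.message}"
  nil
end

# Usage without network access: simulate a 404 and a successful download.
not_found = ->(_u) { raise OpenURI::HTTPError.new("404 Not Found", StringIO.new) }
p fetch_image("http://example.com/missing.png", opener: not_found)      # => nil
p fetch_image("http://example.com/ok.png", opener: ->(_u) { "bytes" }) # => "bytes"
```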


6 Comments

  1. I think you substituted `file` for `tmp` a few times in your second example. Otherwise, good write-up of a useful technique!

  2. True :) I was just copy-pasting ;) Thanks – updated!

  3. You’re underestimating Paperclip a bit. It’s as easy as this:

    current_comment.pictures.create!(file: URI.parse(image_url))

    This has been present in Paperclip since version 3.1.4.

    See my answer on SO, http://stackoverflow.com/a/11584741/523692

  4. I really like this idea. I have a few blogs that are years old now and ancient broken content is definitely prevalent in my blogs’ comments! Thumbs up for a great idea: http://i.imgur.com/FVVSUY3.jpg

    I was wondering how your code looks when defending against hosts being down, 500 errors, 404s, etc. What kind of stuff are you doing to protect against the images not being present at the time you try to DL them?

  5. Nah, I never underestimate Paperclip, but sometimes it is just easier to write your own small feature than to check whether it already exists ;) Although I must say – having this on the Paperclip side means less code, which is almost always good. Thanks a lot!

  6. Yeah – I have broken content as well, especially when the images have their own context and it is hard to understand the whole discussion in the comments without them. How do I handle 500, 404, etc.? Well, I try to download the content 3 times at 12-hour intervals. After 3 attempts, I just replace the original image with a stub that informs users that the image is no longer available.
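The retry policy described in the last comment (up to three attempts, 12 hours apart, then a stub image) could be expressed as a small, framework-free decision function. This is only a sketch: the method name `next_download_step`, its arguments, and the returned symbols are hypothetical, and where the attempt count and timestamp are stored (for example, columns on the picture record) is left open.

```ruby
MAX_ATTEMPTS = 3
RETRY_INTERVAL = 12 * 60 * 60 # 12 hours, in seconds

# Hypothetical decision helper for a failed image download.
#   attempts        - how many download attempts have already failed
#   last_attempt_at - Time of the most recent failed attempt
#   now             - current Time (injectable for testing)
# Returns :stub when attempts are exhausted, :retry when the interval has
# elapsed, and :wait otherwise.
def next_download_step(attempts, last_attempt_at, now = Time.now)
  return :stub if attempts >= MAX_ATTEMPTS
  return :retry if now - last_attempt_at >= RETRY_INTERVAL
  :wait
end

t0 = Time.at(0)
p next_download_step(1, t0, t0 + 60)             # => :wait
p next_download_step(1, t0, t0 + RETRY_INTERVAL) # => :retry
p next_download_step(3, t0, t0 + RETRY_INTERVAL) # => :stub
```

Keeping the decision separate from the download itself makes it easy to run from a scheduled job and to test without touching the network.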


Copyright © 2024 Closer to Code
