When using cURL, most code examples have you fetch the whole remote response in one go and store it in a variable. This is fine for small requests, like a web page, but as you scale up you will run into issues with memory consumption, timeouts, and a poor user experience.
Imagine the scenario: you have a music website where people can download sheet music in the plain text ABC notation format. The music files themselves come from a variety of different third party websites, and you make it all available from one place on your own site.
You have decided that only paying members can access them. So what do you do? You write a thin layer in PHP to verify a member and only show the music in the browser if they pass verification.
A Simple Proxy
You might find yourself doing something like this:
function download_music($song_name)
{
    if (is_authenticated_user())
    {
        $url = "http://tests.local/proxy/$song_name.abc";
        $ch = curl_init($url);
        curl_setopt_array($ch,
            [
                // Return the response from curl_exec() instead of printing it
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_CONNECTTIMEOUT => 30,
            ]
        );
        header("Content-Type: application/octet-stream");
        header("Content-Description: File Transfer");
        header("Content-Disposition: attachment; filename=\"$song_name.abc\"");
        // The whole remote file is held in $response before any of it is sent
        $response = curl_exec($ch);
        curl_close($ch);
        echo $response;
        exit;
    }
}
This works perfectly for the problem at hand: the music files are small, so you've got no problems with memory consumption, and the downloads will be virtually as fast as if you were accessing the file directly on the server (there is a difference, but it is too small for a human to perceive).
Extending the Functionality
Your website is a huge success, everyone loves the music that you're offering, and you're looking at new ways to expand your membership numbers.
One user has suggested in the comments section that you could offer recordings of your music on the site to download. It's a great idea. So you just alter your proxy above to output the right filename and MIME type for the recordings and you're done, right?
Oh dear, what happened‽ Suddenly, the web server is running out of memory all the time, and members are reporting that sometimes the longer recordings don't work at all and they get an error page! How did this happen?
Well, you might have already guessed from looking at the code. That curl_exec() call loads the whole file into a variable. That is fine when the file is a nice, small 259KB music notation file, but when it's a 20MB audio file, every download consumes that much memory plus whatever the PHP parser needs for a typical request.
Stream the Response
The answer is simple: don't load that file into memory, stream it down to the browser instead. Luckily, cURL has a couple of little-used options that are perfect for this:
CURLOPT_HEADERFUNCTION
CURLOPT_WRITEFUNCTION
They both operate in a similar manner. Each takes a callback that cURL calls with every chunk of the remote response as it receives it, so you can act on the chunks one at a time. Here is how they're used:
function download_music($song_name, $type)
{
    if (is_authenticated_user())
    {
        $url = "http://tests.local/proxy/$song_name.$type";
        $ch = curl_init($url);
        // No CURLOPT_RETURNTRANSFER here: the write callback below handles the body
        curl_setopt_array($ch,
            [
                CURLOPT_CONNECTTIMEOUT => 30,
            ]
        );
        header("Content-Type: application/octet-stream");
        header("Content-Description: File Transfer");
        header("Content-Disposition: attachment; filename=\"$song_name.$type\"");
        // Called once for each response header line
        curl_setopt($ch, CURLOPT_HEADERFUNCTION, function($curl, $header)
            {
                header($header);
                return strlen($header);
            }
        );
        // Called for each chunk of the body as it arrives
        curl_setopt($ch, CURLOPT_WRITEFUNCTION, function($curl, $body)
            {
                echo $body;
                return strlen($body);
            }
        );
        curl_exec($ch);
        curl_close($ch);
        exit;
    }
}
The first thing we've done is remove the hard-coded extension from the song file request. Obviously this is very simplified for the purposes of this article. In real code you would definitely want to secure how that URL is built, because passing user input straight into it is a gaping security hole!
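One way to close that hole is to validate both parts before the URL is built. What follows is a minimal sketch rather than a complete defence: the function name build_song_url(), the list of allowed extensions, and the pattern for song names are all assumptions made for this example.

```php
<?php
// Hypothetical helper: builds the upstream URL only from validated parts.
// The base URL matches the examples in this article; the extension
// allow-list and the name pattern are assumptions for illustration.
function build_song_url(string $song_name, string $type): ?string
{
    $allowed_types = ['abc', 'mp3', 'ogg'];

    // Reject any extension that is not explicitly allowed.
    if (!in_array($type, $allowed_types, true)) {
        return null;
    }

    // Allow only letters, digits, underscores and hyphens, so the name
    // cannot contain slashes, dots, or line breaks.
    if (preg_match('/\A[A-Za-z0-9_-]+\z/', $song_name) !== 1) {
        return null;
    }

    return "http://tests.local/proxy/$song_name.$type";
}
```

download_music() could then call build_song_url() first and refuse the request when it returns null, instead of interpolating the raw values into the URL.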
Next, we add the two curl_setopt() calls with our callback functions after the headers. These behave in similar ways: they accept a chunk, do something with it, and then return the length of the chunk. It's very important that you return the correct length in these callbacks, even if you do nothing with the chunk (for example, if you discard a particular header). If you don't, cURL treats it as an error and aborts the transfer.
Note that if the remote resource (in this case an audio file) has its own Content-Type, Content-Description, or Content-Disposition headers, then yours will be overwritten. If that's the case, you can filter them out in the header callback function, as long as you still return the original header length.
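Such a filter might look something like the sketch below. The helper name should_forward_header() and the list of blocked header names are assumptions for this example. Note that this version also skips the status line and the blank line that ends the header block, since neither contains a colon.

```php
<?php
// Hypothetical helper: returns true when an upstream header line should be
// passed on to the browser. The list of blocked names is an assumption.
function should_forward_header(string $header): bool
{
    $blocked = ['content-type', 'content-description', 'content-disposition'];

    // Header lines look like "Name: value\r\n". The status line and the
    // final blank line have no colon, so this sketch skips them as well.
    $colon = strpos($header, ':');
    if ($colon === false) {
        return false;
    }

    // Header names are case-insensitive, so compare in lower case.
    $name = strtolower(trim(substr($header, 0, $colon)));
    return !in_array($name, $blocked, true);
}

// Callback in the shape CURLOPT_HEADERFUNCTION expects.
$header_callback = function ($curl, $header) {
    if (should_forward_header($header)) {
        header($header);
    }
    // Always report the full length, even for headers that were dropped.
    return strlen($header);
};
```

The callback would be registered with curl_setopt($ch, CURLOPT_HEADERFUNCTION, $header_callback) in place of the inline closure.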
Of interest here is how the $header is directly usable to pass straight to a header() call. This is because cURL calls the header callback once for each complete header line, so every chunk it hands you is a single, whole header ending in a carriage return followed by a new line. This means you'll be able to use those headers without alteration.
The body callback then just outputs the bytes as it receives them.
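Depending on how the server is set up, echoed data may sit in a buffer for a while before it reaches the browser. The sketch below is one possible variation on the body callback, not the code used for the tests in this article: it calls flush() after every chunk and keeps a running byte count. The factory name make_body_callback() and the counter are assumptions for this example.

```php
<?php
// Hypothetical factory for a CURLOPT_WRITEFUNCTION callback that pushes each
// chunk towards the client straight away and keeps a running byte count.
function make_body_callback(int &$bytes_sent): callable
{
    return function ($curl, $body) use (&$bytes_sent) {
        echo $body;
        // Ask PHP to hand what it has so far to the web server.
        flush();
        $bytes_sent += strlen($body);
        // cURL expects the number of bytes that were handled.
        return strlen($body);
    };
}
```

It would be used as $sent = 0; curl_setopt($ch, CURLOPT_WRITEFUNCTION, make_body_callback($sent)); after which $sent holds the total number of bytes streamed once curl_exec() returns.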
The Result
I tested the initial script running under PHP 7 using a 14MB audio file, and the peak memory usage for the PHP script came in at 18.5MB overall, whereas memory usage when using the callbacks was a much more comfortable 2MB! In fact, that 2MB usage barely changed at all, no matter the files I threw at it. Here is a table of memory usage before and after for files of various sizes (rounded to the nearest half MB):
Original File Size | Unstreamed Memory Consumption | Streamed Memory Consumption |
---|---|---|
259B | 2MB | 2MB |
4MB | 8.5MB | 2MB |
14MB | 18MB | 2MB |
20.5MB | 25.5MB | 2MB |
1.5GB | Out of memory after 134MB | 2MB |
That last one was a video I threw in for fun. As you can see, the streaming version wins hands down, and consumes only as much memory as it needs for the PHP parser and the chunk it's dealing with. When you're dealing with very small files, there's no noticeable difference, so you won't gain anything from streaming but you will have to write more code.
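If you want to take similar measurements on your own setup, PHP's memory_get_peak_usage() is one way to do it. The sketch below is an assumption about how that could be done, not a record of how the figures above were collected. Both helper names are made up for this example, and the rounding mirrors the nearest half MB used in the table.

```php
<?php
// Hypothetical helper: formats a byte count to the nearest half megabyte,
// the same rounding the table above uses.
function format_half_mb(int $bytes): string
{
    $mb = round($bytes / (1024 * 1024) * 2) / 2;
    return $mb . 'MB';
}

// Hypothetical helper: logs the peak memory of the current request when the
// script ends. Shutdown functions still run after a call to exit.
function log_peak_memory_on_shutdown(): void
{
    register_shutdown_function(function () {
        // Passing true reports memory allocated from the system, not just
        // the memory the script itself asked for.
        error_log('Peak memory: ' . format_half_mb(memory_get_peak_usage(true)));
    });
}
```

Calling log_peak_memory_on_shutdown() at the top of the download script writes one line to the error log per request, which makes it easy to compare the buffered and streamed versions.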