
In PHP is there a reliable way to determine if a remote file exists?

On softwarelode, I coded up the PHP to detect if a program screenshot exists at the URL given in the PAD file. The problem is that sometimes the code works and sometimes it doesn’t. Google Webmasters likes to delay giving me errors but in the last few weeks it’s started complaining of about 100 cases where I give the URLs for the softwarelode screenshot pages but the screenshots don’t exist.

What code, if any, do you use on SoftTester?

{ 17 } Comments

  1. Dipsy | March 30, 2009 at 12:10 pm | Permalink

    Yes, missing remote images / files are a problem.

    That’s one of the reasons I used to store screenshots, but this became impracticable. I’ve always had this problem with the icons too, which is most obvious when they are missing on the front page.

    SoftTester uses a vb.net program, which runs hourly during the night. I do checks with that.

    There’s still room for missing images, though, if they delete them before I get round to updating, and that can take a long time.

    Hmmm.
    I think it should be possible to write a PHP script which loads the image remotely and, if it isn’t found, loads the default no-image.

  2. Blueskimonkey | March 30, 2009 at 12:44 pm | Permalink

    The only other problem with remotely pulling images is
    that your site is then relying on a remote server for the
    load time.

    For example Dipsy’s site would load in less than 2s
    however certain images (slow connections) cause pages
    to take over 5s to load.

    Perhaps you guys need to start storing these images
    again, but keep them as up to date as possible and
    always purge deleted items.

    You could write a script to check the remote image. If
    you’re using PHP you might want to write it with CURL
    and set the timeout very low, otherwise your page could
    hang while waiting for a remote connection to the
    image.

  3. Dipsy | March 30, 2009 at 12:50 pm | Permalink

    I really don’t like the idea of storing images again.

    CURL ?

    What about this ?

    <?php
    // Try to fetch the remote image; @ suppresses the warning on failure.
    $img = @file_get_contents("http://www.any.com/anygif");
    if ($img !== FALSE) {
        header("content-type: image/gif");
        echo $img;
    } else {
        echo "";
    }
    ?>

  4. Blueskimonkey | March 30, 2009 at 12:57 pm | Permalink

    Check your sites using pingdom tool:

    http://tools.pingdom.com/

    Dipsy’s site / homepage takes 8.3s to load. Without all
    the extra (offsite) images your load time would be 4s.

    You could use file_get_contents, but if the remote server
    is down the time it takes to finish will be the maximum
    PHP server timeout, around 30s.
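
For what it's worth, with the http wrapper the wait is also bounded by the default_socket_timeout ini setting (60 seconds unless changed), and it can be lowered per call through a stream context. A minimal sketch, assuming a short wait is acceptable; the name fetch_with_timeout and the 3-second default are illustrative only:

```php
<?php
// Hypothetical helper: fetch a remote file, giving up after $seconds.
function fetch_with_timeout(string $url, float $seconds = 3.0)
{
    $context = stream_context_create([
        'http' => [
            'timeout'       => $seconds, // per-call timeout in seconds
            'ignore_errors' => false,    // 4xx/5xx responses count as failure
        ],
    ]);

    // Returns the body as a string, or false on failure or timeout.
    return @file_get_contents($url, false, $context);
}
```

A script like the one Dipsy pasted could call this in place of the bare file_get_contents and fall back to the no-image when it returns false.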

    http://uk.php.net/curl

    Personally I would store the images: it makes more sense,
    gives a faster site and no horrible image-not-found.

    On another topic any ideas why when adding a comment
    to a post the line wraps are funny in WP?

  5. Dipsy | March 30, 2009 at 1:02 pm | Permalink

    Why’s CURL better?

    I dunno about wrapping, but my first comment was ok, but our later ones are not, weird.

  6. Blueskimonkey | March 30, 2009 at 1:04 pm | Permalink

    CURL is better as you can tweak the timeout

    curl_setopt($process, CURLOPT_TIMEOUT, 30); // 30 seconds
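
A minimal sketch of a complete existence check built on that option, assuming a HEAD request is enough to tell whether the image is there (some servers answer HEAD differently from GET). The function name remote_file_exists and the 3-second default are illustrative, not code from either site:

```php
<?php
// Hypothetical helper: true if $url answers a HEAD request with a 2xx status.
function remote_file_exists(string $url, int $timeoutSeconds = 3): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }

    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,            // send HEAD, skip the body
        CURLOPT_RETURNTRANSFER => true,            // do not echo the response
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // max time to connect
        CURLOPT_TIMEOUT        => $timeoutSeconds, // max time for the whole request
    ]);

    $ok     = curl_exec($ch) !== false;
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return $ok && $status >= 200 && $status < 300;
}
```

With a low timeout a dead server costs a few seconds per image at worst, rather than the full default wait.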

  7. Dipsy | March 30, 2009 at 1:09 pm | Permalink

    LOL my wrap is ok with my last comment :)

    So wouldn’t a function like what I pasted above with curl be ok?

    Actually, if the function is run a lot for all the images on the page, wouldn’t that all add to the load time of the page?

  8. Blueskimonkey | March 30, 2009 at 1:19 pm | Permalink

    Yes, the load time of your actual page would be much
    longer, so if you display 30 images that function / routine
    would have to run that many times, each with up to the
    max wait time you specified.

    I can’t remember how big your image repository was
    before. I know one of the problems we had was that a
    single folder contained thousands of images, which made
    explorer take its time showing the contents.

    So if you were to store locally, perhaps use a more
    structured image store.

    I guess the other issue for you guys is bandwidth, but
    perhaps if you were to resize grabbed images when
    processing pads then your local store would be much
    smarter.
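
One way a more structured store could look, as a sketch only: spread the files over nested folders derived from a hash of the record id, so no single directory grows to thousands of entries. The function name screenshot_path, the md5-based layout and the numeric PAD id are all assumptions:

```php
<?php
// Hypothetical helper: map a PAD record id to a sharded local path.
function screenshot_path(string $baseDir, int $padId, string $ext = 'png'): string
{
    $hash = md5((string) $padId);

    // The first two hex pairs of the hash pick the sub-folders,
    // giving up to 256 x 256 buckets.
    return sprintf(
        '%s/%s/%s/%d.%s',
        rtrim($baseDir, '/'),
        substr($hash, 0, 2),
        substr($hash, 2, 2),
        $padId,
        $ext
    );
}
```

The same id always maps to the same path, so purging a DB record and deleting its image can be done together.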

  9. Dipsy | March 30, 2009 at 1:23 pm | Permalink

    I did resize mine, to a max of 250 by 250.

    Think I had about 250 or 300 meg or maybe more. Also I wasn’t storing the icon images at all.

    Thinking about it, if the code is called from img src, then wouldn’t they run at the same time?

  10. Blueskimonkey | March 30, 2009 at 1:32 pm | Permalink

    Running the code is server load time, whereas what you
    are doing today is loaded by the browser.

    What we could do is enable content expiration on your
    screenshots dir and set this to x days, which means if you
    have a repeat visitor they will only download an image
    once, not every time, because the browser will know it’s
    from cache.
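
If the images end up being served through a PHP script rather than straight from the directory, the same effect can be had by sending the caching headers from the script. A sketch under that assumption; cache_headers is a made-up helper name:

```php
<?php
// Hypothetical helper: build caching headers for an image served by a script.
// $now is passed in (normally time()) so the result is easy to check.
function cache_headers(int $days, int $now): array
{
    $maxAge = $days * 86400; // seconds in the caching window

    return [
        'Cache-Control: public, max-age=' . $maxAge,
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
    ];
}

// Usage inside an image-serving script, before any output:
// foreach (cache_headers(7, time()) as $h) { header($h); }
```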

    This will help reduce bandwidth used, I’m also installing
    hot link protection software soon which helps prevent
    external sites stealing / linking to your images / media
    content.

    Aaron

  11. Dipsy | March 30, 2009 at 1:37 pm | Permalink

    In PHP you can also use set_time_limit(10) or less than 10.

    If I have a script which is called from img src, would that be running simultaneously?

  12. Blueskimonkey | March 30, 2009 at 1:42 pm | Permalink

    Yes, you could set that, but if your whole page doesn’t
    process in less than 10s then your users will get an ugly
    message.

    I guess you could have separate scripts which run in the
    img tag. I’m sure, though, this will still affect your whole
    page load time, as the main page will be waiting for the
    sub scripts’ responses.

    Perhaps you just need to give it a test. I’m still convinced
    storing optimised sized images is better for all.

    Is there any reason why you don’t want to store images?

  13. Dipsy | March 30, 2009 at 1:48 pm | Permalink

    What about if I included the full url ? So it wouldn’t run like a sub page.

    >Is there any reason why you don’t want to store images?

    Plenty.
    1) Disk space, which I’m running low on.
    2) It’s difficult for me to keep images up-to-date, although that isn’t just a problem with images, it applies to all the pad file info.
    3) Hot linking, but as you say you could fix this now.

    I guess the inline img src route would also incur more bandwidth, unless this is another thing you don’t track ;) ?

  14. Blueskimonkey | March 30, 2009 at 2:03 pm | Permalink

    1) I guess this and BW are the biggest issues, which of
    course can be upgraded, but it always comes down to
    costs.

    2) Perhaps rename the images to be unique and match
    the Author / PAD ID, then when you purge the DB record
    purge the files.

    3) Yep, Hot Link Protector will be coming in the next month.

    I don’t think Helm / the server tracks or has any way to
    track bandwidth used by scripts.

  15. Dipsy | March 30, 2009 at 2:12 pm | Permalink

    2) I meant that there’s so many records and I only do 10 or 20 at the end of each hour during the night, after the new additions / updates are dealt with. So to get round all the records must take 6 months +, I guess.

    Having said this, I haven’t checked the update flag to see.

    It’s one of these things where I could keep improving the program, but I’d prefer to put the time into PHP instead. Although SoftTester development isn’t my top priority atm. My next SoftTester task will be to upgrade / improve the search script.

    I suppose missing images wouldn’t be such a big problem if my database were up to date and kept that way.

    Do you get many missing images Mike? Is it due to old records? I’m guessing that as your site is quite new, it can’t be old records. Maybe slow / down sites?

  16. MikeL | March 30, 2009 at 4:17 pm | Permalink

    Lots of comments!

    The comment wrapping thing has always wrapped my
    lines earlier.

    Here’s the code I was using:

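    function url_exists($url) {
        // Fetch only the response headers; @ suppresses warnings on failure.
        $hdrs = @get_headers($url);
        // True when the first status line reports a 2xx code.
        return is_array($hdrs) ? preg_match('/^HTTP\/\d+\.\d+\s+2\d\d\s+.*$/', $hdrs[0]) === 1 : false;
    }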

    Figure that out!

    I do get some missing images BUT what I find confusing
    is that sometimes the code works and at others it
    doesn’t. I need to investigate more.
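
One possible cause, offered as a guess rather than a diagnosis: get_headers follows redirects by default and returns the headers of every hop, so $hdrs[0] is the status line of the first response. A screenshot URL that redirects (301/302) to a perfectly good image would then be reported as missing. A sketch that looks at the last status line instead; final_status_code and url_exists_following_redirects are made-up names:

```php
<?php
// Hypothetical helper: return the status code of the LAST status line in a
// header list as returned by get_headers(), or null if there is none.
function final_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Matches "HTTP/1.1 200 OK" as well as a bare "HTTP/2 200".
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Variant of url_exists() that judges the final response, not the first.
function url_exists_following_redirects(string $url): bool
{
    $hdrs = @get_headers($url);
    if (!is_array($hdrs)) {
        return false;
    }
    $code = final_status_code($hdrs);
    return $code !== null && $code >= 200 && $code < 300;
}
```

Slow or briefly unreachable servers would still produce intermittent failures, so this only covers the redirect case.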

    There is another strategy that could be used to reduce
    the page load time a bit:-

    a. Record in the DB whether an image is available when
    the PAD is submitted.
    b. When a page is loaded, if an image should be
    available according to the DB, check if it really is using
    code like the above. If the image is no longer available
    update the DB to say so and avoid a check next time the
    page is loaded.
    c. Go back to a. when an updated PAD is available.
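
Steps a to c could be sketched roughly like this. An array stands in for the DB row, and the check itself is passed in as a callable so any existence test can be plugged in; the has_image and image_url fields and both function names are assumptions:

```php
<?php
// Steps a and c: record whether the image is available when a PAD is
// submitted or updated. $record stands in for a DB row.
function on_pad_submitted(array &$record, string $imageUrl, callable $checker): void
{
    $record['image_url'] = $imageUrl;
    $record['has_image'] = (bool) $checker($imageUrl);
}

// Step b: on page load, only check remotely when the DB says an image
// should exist; remember a failure so the next page load skips the check.
function image_url_for_page(array &$record, callable $checker, string $placeholder): string
{
    if (empty($record['has_image'])) {
        return $placeholder; // no remote request at all
    }
    if ($checker($record['image_url'])) {
        return $record['image_url'];
    }
    $record['has_image'] = false; // would be an UPDATE in a real DB
    return $placeholder;
}
```

Once an image has been marked unavailable, later page loads cost nothing until the next PAD update resets the flag.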

  17. Dipsy | March 30, 2009 at 4:35 pm | Permalink

    With the code you are using, the file is being got twice, once by you and once by the user.

    I think I do check already if the screen shot is available when the pad is submitted.

    I guess the DB approach you mention above could help, but you don’t know whether their server is down, the internet connection is slow or any number of factors.
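
One way to soften that, as a sketch: only flip the DB flag on a definite "not found" answer, and treat timeouts and server errors as "don't know yet". The function name classify_check and the three labels are assumptions:

```php
<?php
// Hypothetical helper: turn the outcome of a remote check into a verdict.
// $statusCode is null when no HTTP response arrived at all
// (timeout, DNS failure, connection refused).
function classify_check(?int $statusCode): string
{
    if ($statusCode === null) {
        return 'unknown'; // server down or connection too slow
    }
    if ($statusCode >= 200 && $statusCode < 300) {
        return 'present';
    }
    if ($statusCode === 404 || $statusCode === 410) {
        return 'missing'; // the server answered and the file is gone
    }
    return 'unknown'; // 5xx, 403 and the like may be temporary
}
```

Only 'missing' would update the DB; 'unknown' would show the no-image for this page load and try again next time.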
