Spam, spam and more spam

I’m not sure what’s been going on over the last few weeks, but the number of submissions on all of my sites has dropped. The drop isn’t drastic, but what’s more worrying is that the percentage of spam submissions has increased dramatically. I now end up rejecting 40 to 50% of submissions.

There seem to be a large number of sites/blogs using free DVD converter packages for submission purposes, and an increasing number of other software download sites using similar programs to make submissions. I’ve also noticed a raft of new download sites being created with random names. This happened some years ago in the general web directory arena and resulted in the whole sector being devalued. I hope software download sites aren’t going the same way. Even if you aren’t too worried about dubious submissions – content is content, after all – the majority of them use very nearly the same text, so it doesn’t take long to rack up a hundred or more pages with virtually identical content.

  1. Dipsy | February 21, 2010 at 4:58 pm

    I don’t know whether I mentioned it (I probably did), but a few weeks ago I was doing some fixes to the softtester robot software after I was made aware that I was rejecting PAD files which relied purely on the new PAD spec category.

    At the time I was probably only accepting 20 or 30 updates a day. Since the fixes, I get about 100 valid transactions a day. Apart from your black list, I limit the number of programs an author
    can have and the number of programs in a category.
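
    A minimal sketch of the per-author and per-category caps, in Python. The Listing type, the can_accept helper and both limit values are hypothetical; the actual numbers softtester uses aren't given here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical caps: the real values aren't stated in the discussion.
MAX_PER_AUTHOR = 20
MAX_PER_CATEGORY = 500


@dataclass
class Listing:
    """Hypothetical minimal record for a submitted program."""
    author: str
    category: str
    title: str


def can_accept(new: Listing, existing: list[Listing],
               max_per_author: int = MAX_PER_AUTHOR,
               max_per_category: int = MAX_PER_CATEGORY) -> bool:
    """Accept a submission only if neither its author nor its category
    has already reached its cap."""
    author_count = sum(1 for item in existing if item.author == new.author)
    category_count = sum(1 for item in existing if item.category == new.category)
    return author_count < max_per_author and category_count < max_per_category
```

    On a live site the two counts would more likely come from COUNT queries against the listings table than from scanning a list in memory.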

    “very nearly the same text”…
    Do you mean the description text? A text search on that could be time-consuming on our databases…

    The only thing I can think of to filter out spam is to store a CRC of the download file and only accept files whose CRC I haven’t seen before. But downloading all the programs each time would cause me problems.
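
    A sketch of the CRC idea, assuming the download has already been fetched to a local file and the CRCs seen so far are kept in a set (on a real site, a database column). file_crc32 and is_new_download are hypothetical names; zlib.crc32 is Python's standard CRC-32 function.

```python
from __future__ import annotations

import zlib


def file_crc32(path: str, chunk_size: int = 1 << 16) -> int:
    """CRC-32 of a file, read in chunks so large installers need not fit in memory."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
    return crc


def is_new_download(path: str, seen_crcs: set[int]) -> bool:
    """Return True (and remember the CRC) if no earlier file had this CRC."""
    crc = file_crc32(path)
    if crc in seen_crcs:
        return False
    seen_crcs.add(crc)
    return True
```

    This doesn't remove the cost of downloading each file. CRC-32 is also only 32 bits, so a hash such as SHA-256 from hashlib is safer if accidental collisions matter, and a spammer who repackages the same program with a trivial change still gets a fresh CRC.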

    I agree that this could cause problems.

  2. MikeL | February 21, 2010 at 5:08 pm

    Good point about the PAD spec. I don’t think this is the
    problem – though I can’t be certain. I do get about 100
    submissions a day on each of my sites. It’s just that a lot
    are spam.

    By similar, I mean that, give or take a word, the titles,
    descriptions and images are identical. They are
    almost always very short descriptions too.
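
    One way to catch "give or take a word" duplicates is to compare the sets of words in two descriptions (Jaccard similarity). This is a swapped-in technique, not something either site is known to use; the helper names and the 0.7 threshold are assumptions to be tuned.

```python
from __future__ import annotations

import re


def word_set(text: str) -> set[str]:
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by total distinct words (1.0 = same word set)."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def near_duplicates(new_text: str, existing: list[str],
                    threshold: float = 0.7) -> list[str]:
    """Existing texts whose word overlap with new_text reaches the
    (assumed) threshold."""
    return [t for t in existing if jaccard(new_text, t) >= threshold]
```

    Because the spam descriptions are short, one swapped word matters a lot: "Free DVD to AVI converter, fast and easy" against the same text with "MP4" shares 7 of 9 distinct words (about 0.78), so the threshold can't sit too close to 1.0.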

  3. Dipsy | February 21, 2010 at 5:24 pm

    I have a query which shows 10 relevant programs at the bottom of my program listing page. Maybe deny a submission if there are too many relevant programs?

    Not sure if it produces a number, e.g. % relevance…

    I actually get about 180 submissions a day, but about 80 are either spam emails or ones I reject due to PAD file problems.

    Maybe deny short descriptions?

    Having said that, my own programs might get rejected 🙂
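
    A sketch of the short-description rule that holds a submission for manual review instead of rejecting it outright, so genuine programs with terse descriptions aren't lost. The review_reason name and the 15-word minimum are hypothetical; nobody in the thread suggests a number.

```python
from __future__ import annotations

# Hypothetical minimum word count for a description.
MIN_DESCRIPTION_WORDS = 15


def review_reason(description: str,
                  min_words: int = MIN_DESCRIPTION_WORDS) -> str | None:
    """Return a reason to hold the submission for review, or None if it passes."""
    words = len(description.split())  # whitespace-separated words
    if words < min_words:
        return f"description has only {words} words (minimum {min_words})"
    return None
```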

  4. MikeL | February 21, 2010 at 5:33 pm

    It could be perfectly valid that there are lots of relevant
    programs so I wouldn’t use that as a reason to reject.

    If you have a list of submissions with titles etc. it soon
    becomes easier to spot the crap. All I then do is hunt
    down programs by the same authors or with similar titles
    and delete the lot.

    In my admin backend, I have a duplicate set of the main
    site’s pages, except that when the pages are accessed
    there, extra DELETE links appear, making it a doddle to
    delete listings or everything by an author. For a while
    now, I’ve been stuck at about 27000 listings. As soon as
    the number gets to 27500, I seem to find a load of
    rubbish I can delete. It’s weird how, after a while,
    something slips past you and you discover 2% of your
    listings are screensavers with almost identical names.
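
    Clusters of almost identical names can be surfaced by normalising each name (lower-case, strip digits, version numbers and punctuation) and grouping on the result. A sketch only; name_key, suspicious_clusters and the minimum cluster size of 3 are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def name_key(name: str) -> str:
    """Normalise a program name so trivial variants collide."""
    letters_only = re.sub(r"[^a-z ]+", " ", name.lower())  # drop digits/punctuation
    return " ".join(letters_only.split())  # collapse runs of spaces


def suspicious_clusters(names: list[str], min_size: int = 3) -> dict[str, list[str]]:
    """Group names by normalised key; keep groups with at least min_size members."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) >= min_size}
```

    For example, "Aquarium Screensaver 1.0", "Aquarium Screensaver 2" and "aquarium-screensaver 3" all reduce to the key "aquarium screensaver".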

  5. Dipsy | February 21, 2010 at 5:40 pm

    I really must write something to scan existing listings against your black list. It’s been on my to-do list for quite some time.
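
    A sketch of that scan, assuming (hypothetically) that the black list holds author names and download domains and that each listing has author and download_url fields; the actual black list format isn't described here.

```python
from __future__ import annotations

from urllib.parse import urlparse


def blacklisted_listings(listings: list[dict], blacklisted_authors: list[str],
                         blacklisted_domains: list[str]) -> list[dict]:
    """Listings whose author, or download-URL host, is on the (assumed) black list."""
    authors = {a.lower() for a in blacklisted_authors}
    domains = {d.lower() for d in blacklisted_domains}
    hits = []
    for listing in listings:
        host = (urlparse(listing["download_url"]).hostname or "").lower()
        # A domain entry also matches its subdomains (dl.example.com for example.com).
        domain_hit = any(host == d or host.endswith("." + d) for d in domains)
        if listing["author"].lower() in authors or domain_hit:
            hits.append(listing)
    return hits
```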

