Skip to content

How, House keeping process to remove blacklisted items from DB?

Theres now a blacklisting feature in place at the point where authors / spammers submit their pad files to softtester, behind the authors page.

This stops quite a bit of spam.

But I’ve already got spam and things could be determined as spam at a later date.

So I need a process which will look at the softtester MySQL database and deal with them.

However, there are various problems.
1. The black list is a list of domains, the fields in the database are full urls, so I’d have to do a like % … % query rather than an equals, which will be slow.
2. These fields don’t have indexes.
3. Its going to take a long time to scan the database / fields for each of these domains on a regular basis.

Any solution has to automated and require no involvement from me, when its up and running.

Solutions.
1. Create an offline solution which works on a copy of the database.
Pros.
Quick and won’t affect the web server
Cons.
To be fully automated it would need to grab a copy of the database, import it to a test database then perform the actual process, then action the live database.

2. Do the process manually a couple of times a year.
Pros.
Erm, not sure.
Cons.
Lots of effort, time and lots of transactions on the live database.

Thoughts?

by JM

Similar Posts:

{ 4 } Comments

  1. MikeL | January 9, 2009 at 3:24 pm | Permalink

    I must admit that although I blacklist NEW submissions I don’t go back and delete all existing listings from the same URL.

    The reason for this is that sometimes I blacklist sites submitting screensavers if I think they are taking the mickey. There was one site that started submitting Bald Eage ScrSvr 1, Bald Eagle ScrSvr 2. By the time he’d got to number 40 I had enough and blacklisted him. However, I’m still listing all his other screensavers.

    I could go back and do things manually but I’d probably update my blacklist management page to add a “Delete All From This URL” type of link/ command. I’d then spend a few minutes a day going through the list.

  2. Dipsy | January 9, 2009 at 3:31 pm | Permalink

    But there are some which will have slipped through the net.

  3. MikeL | January 9, 2009 at 4:16 pm | Permalink

    Agreed. But you can’t win them all and at least the ratio
    of spam to real software listings should reduce in the
    future.

  4. Dipsy | January 9, 2009 at 4:35 pm | Permalink

    Well…

    For me, there will be a gap for the period between when I receive and accept (automatically) and when you update the black list.

    There will be some getting through there too.

    I do need something to check afterwards too.

Post a Comment

You must be logged in to post a comment.