Joy another softtester bug and cleanup for me :(

While submitting my new PAD file I’ve obviously been examining it on softtester.

I do look at PAD file addition submissions and convert them to update submissions if they are re-submitted, yet somehow I’ve now got 3 copies of mdbsecure on softtester: one from my old site and 2 from the new.

Although the 2 new ones are using the same author record.

Joy, just wonder how many duplicates other authors have :(

by JM

{ 5 } Comments

  1. Dipsy | August 23, 2008 at 8:06 am | Permalink

    Ahh I’m actually determining unique programs / pads incorrectly, by PAD URL and what I call PAD CRC, which is a CRC value of the string contents of the pad file.

    So obviously this is wrong, as the reason a pad file gets re-submitted is that it’s been updated, and therefore the PAD CRC will be different.

    So what makes a program unique?
    I would, off the cuff, say its name, but names can change over time, like my two versions, one of which includes the year.

    Suggestions?
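
The problem described in comment 1 can be shown with a minimal sketch. It assumes that "PAD CRC" means a CRC-32 over the PAD file's text; the `pad_crc` helper and the tiny XML strings are hypothetical stand-ins, not softtester's actual code or a complete PAD file.

```python
import zlib


def pad_crc(pad_text: str) -> int:
    """CRC-32 of the PAD file's text contents (assumed meaning of 'PAD CRC')."""
    return zlib.crc32(pad_text.encode("utf-8"))


# Simplified stand-in for a PAD file; a real one carries many more fields.
old_pad = "<pad><name>MDBSecure</name><version>1.0</version></pad>"
# The author bumps the version and re-submits the same PAD URL.
new_pad = old_pad.replace("1.0", "1.1")

# An unchanged file always yields the same CRC.
same_file_matches = pad_crc(old_pad) == pad_crc(old_pad)
# An updated file yields a different CRC, so a key that includes the CRC
# treats the update as a brand-new program.
updated_file_matches = pad_crc(old_pad) == pad_crc(new_pad)
```

Because any edit to the file changes the CRC, it works as a "has this file changed?" check but not as part of a program's identity.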

  2. MikeL | August 23, 2008 at 9:17 am | Permalink

    I think my site does suffer from duplicates but I added a duplicates deletion page to the admin section. It goes through and deletes anything it thinks is a duplicate. When I first ran it, it deleted about 500 entries out of 16,000. I’m now up to nearly 21000 entries.

    When a submission is made, the admin pages also flag the submission as a duplicate before I ever accept the item into the archive.

    Duplicates are detected in 2 ways:-

    1. A link to the same download URL but in a different PAD file (PAD file URL) in the database.

    2. The same program title and company name in 2 different PAD files.

    First entries matching point 1 are deleted and then entries matching point 2 are deleted. When a PAD is deleted, it’s always the later occurrence that is deleted.

    You have a good point about names changing over the years. Match type 2 obliterates genuine name change PADs. Authors probably change names rarely though. If there is a genuine name change, authors have the option of telling me so they can properly remove the old branded product.

    Note that MDBSecure 2009 would be considered as a separate title so please submit it ….
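
The two-pass clean-up MikeL describes in comment 2 might look roughly like the sketch below. The `Entry` fields, the `remove_duplicates` function and the example URLs are all hypothetical; the only things taken from the comment are the two match rules, their order, and the rule that the later occurrence is the one deleted. Exact string matching is assumed for titles and company names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    pad_url: str
    download_url: str
    title: str
    company: str


def remove_duplicates(entries):
    """Take entries in submission order and return (kept, deleted)."""
    deleted = []

    # Pass 1: same download URL reached from a different PAD file URL.
    first_pad_for_download = {}
    survivors = []
    for e in entries:
        first_pad = first_pad_for_download.setdefault(e.download_url, e.pad_url)
        if first_pad != e.pad_url:
            deleted.append(e)  # later occurrence loses
        else:
            survivors.append(e)

    # Pass 2: same program title AND company name in different PAD files.
    first_pad_for_title = {}
    kept = []
    for e in survivors:
        first_pad = first_pad_for_title.setdefault((e.title, e.company), e.pad_url)
        if first_pad != e.pad_url:
            deleted.append(e)
        else:
            kept.append(e)
    return kept, deleted


# Hypothetical data loosely modelled on the three mdbsecure copies.
entries = [
    Entry("http://old.example/mdbsecure.xml", "http://old.example/mdbsecure.zip", "MDBSecure", "Example Soft"),
    Entry("http://new.example/mdbsecure.xml", "http://new.example/mdbsecure.zip", "MDBSecure", "Example Soft"),
    Entry("http://new.example/pad/mdbsecure.xml", "http://new.example/mdbsecure.zip", "MDBSecure", "Example Soft"),
    Entry("http://new.example/mdbsecure2009.xml", "http://new.example/mdbsecure2009.zip", "MDBSecure 2009", "Example Soft"),
]
kept, deleted = remove_duplicates(entries)
```

With this data, pass 1 drops the third entry (same download URL as the second) and pass 2 drops the second (same title and company as the first), while "MDBSecure 2009" survives as a separate title.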

  3. Dipsy | August 23, 2008 at 9:33 am | Permalink

    I guess using my program / pad file as an example to talk about is a good starting point.

    I think your point 2 is flawed as a company can have many products.

    OK, let us try and come up with a definitive list of things which we need to check.

    1) unique pad url in database, I guess this would solve my new mdbsecure problem straight away.

    2) Unique download url in DB, again this would solve….

    3) What about program names / pro version etc, I’m not sure if there’s an easy way to solve this. I mean we could remove spaces, numbers, full stops and some keywords like PRO / LITE etc. But I don’t think this would be definitive.

    I guess point 3 would mean I would only have 1 MDBSecure in my DB. Which I guess strictly speaking should be correct.

    Thoughts?
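
Point 3 in comment 3 (strip spaces, numbers, full stops and edition keywords before comparing names) could be sketched as follows. The `normalize_name` function and the `EDITION_KEYWORDS` set are hypothetical, and the keyword list is only the two examples given in the comment.

```python
import re

# Edition keywords to ignore; only the two mentioned in the comment.
EDITION_KEYWORDS = {"pro", "lite"}


def normalize_name(name: str) -> str:
    """Reduce a program name to a rough comparison key."""
    # Split on whitespace and full stops, ignoring case.
    words = re.split(r"[\s.]+", name.lower())
    kept = []
    for word in words:
        word = re.sub(r"\d+", "", word)  # drop digits such as years or versions
        if word and word not in EDITION_KEYWORDS:
            kept.append(word)
    return "".join(kept)


key_plain = normalize_name("MDBSecure")
key_year = normalize_name("MDBSecure 2009")
key_pro = normalize_name("MDB Secure Pro 2.0")

# A false positive: two different (made-up) products collapse to one key.
key_2d = normalize_name("Photo 2D")
key_3d = normalize_name("Photo 3D")
```

All three MDBSecure variants reduce to the same key, which gives the "only 1 MDBSecure" outcome from the comment. The "Photo 2D" / "Photo 3D" pair also collapses to a single key, which illustrates why this rule on its own would not be definitive.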

  4. Dipsy | August 23, 2008 at 12:22 pm | Permalink

    Well, I’ve fixed my problem with duplicate pad urls, which is OK for now’ish.
    I already had a delete-duplicates PHP script from earlier this year, which I had forgotten about. But I shouldn’t get any more, as I’ve done a program update.

    However, I think it’s a good idea if we discuss this further and come up with a near bulletproof solution.

  5. MikeL | August 23, 2008 at 8:13 pm | Permalink

    Sure thing. I’ll read my code more closely and double-check the duplicate detection algorithm.

    I think point 2 in my first comment is OK. It will only say a PAD is a duplicate if both the company and program name are the same, so a company can list many products without any of them being deleted.
