Tutorial :What is the best way to programmatically detect porn images? [closed]



Question:

Akismet does an amazing job at detecting spam comments. But comments are not the only form of spam these days. What if I wanted something like akismet to automatically detect porn images on a social networking site which allows users to upload their pics, avatars, etc?

There are already a few image based search engines as well as face recognition stuff available so I am assuming it wouldn't be rocket science and it could be done. However, I have no clue regarding how that stuff works and how I should go about it if I want to develop it from scratch.

How should I get started?

Is there any open source project for this going on?


Solution:1

This was written in 2000, not sure if the state of the art in porn detection has advanced at all, but I doubt it.

http://www.dansdata.com/pornsweeper.htm

PORNsweeper seems to have some ability to distinguish pictures of people from pictures of things that aren't people, as long as the pictures are in colour. It is less successful at distinguishing dirty pictures of people from clean ones.

With the default, medium sensitivity, if Human Resources sends around a picture of the new chap in Accounts, you've got about a 50% chance of getting it. If your sister sends you a picture of her six-month-old, it's similarly likely to be detained.

It's only fair to point out amusing errors, like calling the Mona Lisa porn, if they're representative of the behaviour of the software. If the makers admit that their algorithmic image recogniser will drop the ball 15% of the time, then making fun of it when it does exactly that is silly.

But PORNsweeper only seems to live up to its stated specifications in one department - detection of actual porn. It's half-way decent at detecting porn, but it's bad at detecting clean pictures. And I wouldn't be surprised if no major leaps were made in this area in the near future.


Solution:2

This is actually reasonably easy. You can programatically detect skin tones - and porn images tend to have a lot of skin. This will create false positives but if this is a problem you can pass images so detected through actual moderation. This not only greatly reduces the the work for moderators but also gives you lots of free porn. It's win-win.

#!python      import os, glob  from PIL import Image    def get_skin_ratio(im):      im = im.crop((int(im.size[0]*0.2), int(im.size[1]*0.2), im.size[0]-int(im.size[0]*0.2), im.size[1]-int(im.size[1]*0.2)))      skin = sum([count for count, rgb in im.getcolors(im.size[0]*im.size[1]) if rgb[0]>60 and rgb[1]<(rgb[0]*0.85) and rgb[2]<(rgb[0]*0.7) and rgb[1]>(rgb[0]*0.4) and rgb[2]>(rgb[0]*0.2)])      return float(skin)/float(im.size[0]*im.size[1])    for image_dir in ('porn','clean'):      for image_file in glob.glob(os.path.join(image_dir,"*.jpg")):          skin_percent = get_skin_ratio(Image.open(image_file)) * 100          if skin_percent>30:              print "PORN {0} has {1:.0f}% skin".format(image_file, skin_percent)          else:              print "CLEAN {0} has {1:.0f}% skin".format(image_file, skin_percent)  

This code measures skin tones in the center of the image. I've tested on 20 relatively tame "porn" images and 20 completely innocent images. It flags 100% of the "porn" and 4 out of the 20 of the clean images. That's a pretty high false positive rate but the script aims to be fairly cautious and could be further tuned. It works on light, dark and Asian skin tones.

It's main weaknesses with false positives are brown objects like sand and wood and of course it doesn't know the difference between "naughty" and "nice" flesh (like face shots).

Weakness with false negatives would be images without much exposed flesh (like leather bondage), painted or tattooed skin, B&W images, etc.

source code and sample images


Solution:3

I would rather allow users report on bad images. Image recognition development can take too much efforts and time and won't be as much as accurate as human eyes. It's much cheaper to outsource that moderation job.

Take a look at: Amazon Mechanical Turk

"The Amazon Mechanical Turk (MTurk) is one of the suite of Amazon Web Services, a crowdsourcing marketplace that enables computer programs to co-ordinate the use of human intelligence to perform tasks which computers are unable to do."


Solution:4


Solution:5

BOOM! Here is the whitepaper containing the algorithm.

Does anyone know where to get the source code for a java (or any language) implementation?

That would rock.

One algorithm called WISE has a 98% accuracy rate but a 14% false positive rate. So what you do is you let the users flag the 2% false negatives, ideally with automatic removal if a certain number of users flag it, and have moderators view the 14% false positives.


Solution:6

Nude.js based on the whitepaper by Rigan Ap-apid from De La Salle University.


Solution:7

There is software that detects the probability for porn, but this is not an exact science, as computers can't recognize what is actually on pictures (pictures are only a big set of values on a grid with no meaning). You can just teach the computer what is porn and what not by giving examples. This has the disadvantage that it will only recognize these or similar images.

Given the repetitive nature of porn you have a good chance if you train the system with few false positives. For example if you train the system with nude people it may flag pictures of a beach with "almost" naked people as porn too.

A similar software is the facebook software that recently came out. It's just specialized on faces. The main principle is the same.

Technically you would implement some kind of feature detector that utilizes a bayes filtering. The feature detector may look for features like percentage of flesh colored pixels if it's a simple detector or just computes the similarity of the current image with a set of saved porn images.

This is of course not limited to porn, it's actually more a corner case. I think more common are systems that try to find other things in images ;-)


Solution:8

The answer is really easy: It's pretty safe to say that it won't be possible in the next two decades. Before that we will probably get good translation tools. The last time I checked, the AI guys were struggling to identify the same car on two photographs shot from a slightly altered angle. Take a look on how long it took them to get good enough OCR or speech recognition together. Those are recognition problems which can benefit greatly from dictionaries and are still far from having completely reliable solutions despite of the multi-million man months thrown at them.

That being said you could simply add an "offensive?" link next to user generated contend and have a mod cross check the incoming complaints.

edit:

I forgot something: IF you are going to implement some kind of filter, you will need a reliable one. If your solution would be 50% right, 2000 out of 4000 users with decent images will get blocked. Expect an outrage.


Solution:9

A graduate student from National Cheng Kung University in Taiwan did a research on this subject in 2004. He was able to achieve success rate of 89.79% in detecting nude pictures downloaded from the Internet. Here is the link to his thesis: The Study on Naked People Image Detection Based on Skin Color
It's in Chinese therefore you may need a translator in case you can't read it.


Solution:10

short answer: use a moderator ;)

Long answer: I dont think there's a project for this cause what is porn? Only legs, full nudity, midgets etc. Its subjective.


Solution:11

Add an offensive link and store the md5 (or other hash) of the offending image so that it can automatically tagged in the future.

How cool would it be if somebody had a large public database of image md5 along with descriptive tags running as a webservice? Alot of porn isn't original work (in that the person who has it now, didn't probably make it) and the popular images tend to float around different places, so this could really make a difference.


Solution:12

If you're really have time and money:

One way of doing it is by 1) Writing an image detection algorithm to find whether an object is human or not. This can be done by bitmasking an image to retrieve it's "contours" and see if the contours fits a human contour.

2) Data mine a lot of porn images and use data mining techniques such as the C4 algorithms or Particle Swarm Optimization to learn to detect pattern that matches porn images.

This will require that you identify how a naked man/woman contours of a human body must look like in digitized format (this can be achieved in the same way OCR image recognition algorithms works).

Hope you have fun! :-)


Solution:13

Seems to me like the main obstacle is defining a "porn image". If you can define it easily, you could probably write something that would work. But even humans can't agree on what is porn. How will the application know? User moderation is probably your best bet.


Solution:14

I've seen a web filtering application which does porn image filtering, sorry I can't remember the name. It was pretty prone to false positives however most of the time it was working.

I think main trick is detecting "too much skin on the picture :)


Solution:15

Detecting porn images is still a definite AI task which is very much theoretical yet.

Harvest collective power and human intelligence by adding a button/link "Report spam/abuse". Or employ several moderators to do this job.

P.S. Really surprised how many people ask questions assuming software and algorithms are all-mighty without even thinking whether what they want could be done. Are they representatives of that new breed of programmers who have no understanding of hardware, low-level programming and all that "magic behind"?

P.S. #2. I also remember that periodically it happens that some situation when people themselves cannot decide whether a picture is porn or art is taken to the court. Even after the court rules, chances are half of the people will consider the decision wrong. The last stupid situation of the kind was quite recently when a Wikipedia page got banned in UK because of a CD cover image that features some nakedness.


Solution:16

Two options I can think of (though neither of them is programatically detecting porn):

  1. Block all uploaded images until one of your administrators has looked at them. There's no reason why this should take a long time: you could write some software that shows 10 images a second, almost as a movie - even at this speed, it's easy for a human being to spot a potentially pornographic image. Then you rewind in this software and have a closer look.
  2. Add the usual "flag this image as inappropriate" option.


Solution:17

The BrightCloud web service API is perfect for this. It's a REST API for doing website lookups just like this. It contains a very large and very accurate web filtering DB and one of the categories, Adult, has over 10M porn sites identified!


Solution:18

I've heard about tools which were using very simple, but quite effective algorithm. The algorithm calculated relative amount of pixels with color value near to some predefined "skin" colours. If that amount is higher than some predefined value then image is considered to be of erotic/pornographic content. Of course that algorithm will give false positive results for close-up face photos and many other things.
Since you are writing about social networking there will be lots of "normal" photos with high amount of skin colour on it, so you shouldn't use this algorithm to deny all pictures with positive result. But you can use it provide some help for moderators, for example flag these pictures with higher priority, so if moderator want to check some new pictures for pornographic content he can start from these pictures.


Solution:19

This one looks promising. Basically they detect skin (with calibration by recognizing faces) and determine "skin paths" (i.e. measuring the proportion of skin pixels vs. face skin pixels / skin pixels). This has decent performance. http://www.prip.tuwien.ac.at/people/julian/skin-detection


Solution:20

Look at file name and any attributes. There's not nearly enough information to detect even 20% of naughty images, but a simple keyword blacklist would at least detect images with descriptive labels or metadata. 20 minutes of coding for a 20% success rate isn't a bad deal, especially as a prescreen that can at least catch some simple ones before you pass the rest to a moderator for judging.

The other useful trick is the opposite of course, maintain a whitelist of image sources to allow without moderation or checking. If most of your images come from known safe uploaders or sources, you can just accept them bindly.


Solution:21

I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["hard-core pornography"]; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.

â€" United States Supreme Court Justice Potter Stewart, 1964


Solution:22

You can find many whitepapers on the net dealing with this subject.


Solution:23

It is not rocket science. Not anymore. It is very similar to face recognition. I think that the easiest way to deal with it is to use machine learning. And since we are dealing with images, I can point towards neuronal networks, because these seem to be preferred for images. You will need training data. And you can find tons of training data on the internet but you have to crop the images to the specific part that you want the algorithm to detect. Of course you will have to break the problem into different body parts that you want to detect and create training data for each, and this is where things become amusing.

Like someone above said, it cannot be done 100% percent. There will be cases where such algorithms fail. The actual precision will be determined by your training data, the structure of your neuronal networks and how you will choose to cluster the training data (penises, vaginas, breasts, etc, and combinations of such). In any case I am very confident that this can be achieved with high accuracy for explicit porn imagery.


Solution:24

This is a nudity detector. I haven't tried it. It's the only OSS one I could find.

https://code.google.com/p/nudetech


Solution:25

There is no way you could do this 100% (i would say maybe 1-5% would be plausible) with nowdays knowledge. You would get much better result (than those 1-5%) just checking the image-names for sex-related-words :).

@SO Troll: So true.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »