Thursday, September 23, 2010

How does YouTube detect copies of copyrighted material?

Jeff Atwood's Coding Horror blog usually has very useful and well-written articles. In a recent article titled 'YouTube vs. Fair Use', he talks about his experience of uploading a 90-second clip from a movie as a reference for a blog article. The interesting part of the article is his observations about how YouTube is able to 'detect' that this 90-second video comes from a movie (copyrighted material).

While reading this article, I discovered a bunch of interesting links and information about detecting copies of audio and video files.

A TED Talk by Margaret Gould Stewart on "How YouTube thinks about copyright" 
The interesting parts of this video describe how YouTube detects possible 'copies' of the copyrighted material.
"The scale and speed of this system is truly breathtaking -- we're not just talking about a few videos, we're talking about over 100 years of video every day between new uploads and the legacy scans we regularly do across all of the content on the site. And when we compare those 100 years of video, we're comparing it against millions of reference files in our database. It'd be like 36,000 people staring at 36,000 monitors each and every day without as much as a coffee break."
While Google tools usually work at massive scale, this one is in a class of its own. As Jeff observes in his article, the scope and scale are amazing.
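The "36,000 people" figure in the quote can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not anything from the talk: it assumes a 365-day year and viewers who watch around the clock, and the function name `viewers_needed` is made up for illustration.

```python
# Back-of-the-envelope check of the "36,000 people" figure in the quote.
# Assumptions (mine, not from the talk): a 365-day year, and each viewer
# watching 24 hours a day with no breaks.

DAYS_PER_YEAR = 365
HOURS_PER_DAY = 24


def viewers_needed(years_of_video_per_day: int) -> int:
    """Number of nonstop viewers needed to watch one day's worth of scanned video.

    One nonstop viewer covers exactly one day of video per day, so the answer
    is simply the amount of video expressed in days.
    """
    return years_of_video_per_day * DAYS_PER_YEAR


def hours_of_video(years_of_video_per_day: int) -> int:
    """The same amount of video expressed in hours."""
    return viewers_needed(years_of_video_per_day) * HOURS_PER_DAY


if __name__ == "__main__":
    print(viewers_needed(100))   # 36500
    print(hours_of_video(100))   # 876000
```

Under these assumptions, 100 years of video per day works out to 36,500 nonstop viewers, which is consistent with the rounded "36,000" in the quote.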

I also discovered an interesting mobile phone application named "Shazam" while reading the related links. Shazam is an application which you can use to analyse and match music. When you install it on your phone and hold the microphone up to some music for about 20 to 30 seconds, it will tell you which song it is.
  • This is an article which explores "How Shazam works?"
  • There is another article, "Creating Shazam in Java", which describes an experimental implementation of Shazam in Java. The code is not available because of patent issues.

Duplication detection (in text, audio and video) is a very interesting problem. The implications of automatic duplication detection are useful as well as frightening.

1 comment:

Unknown said...

Yeah Nikhil, I totally agree with you. I was at once awed by YouTube/Google's capabilities. And yes, having worked on duplication detection before, I know how difficult it can be, and what YouTube does is truly amazing! In fact, I think their treatment of what happens when a transgression is observed is the most interesting part. DRM itself is now changing into something quite different.