ffmpeg Scene Change Detector Examination

Feb 27, 2014 - By Rusty Conover

ffmpeg is a very versatile piece of software for working with video. At its core are video filters, one of which is called select. The select filter writes only the frames that pass a user-supplied test to the output; in other words, you can use it to pull out the interesting pieces of a video. A common use of the select filter is to produce a thumbnail for each scene in a video.

What are Scene Changes?

There are many different kinds of scene changes, and some are more difficult to detect than others. They are usually characterized into three main groups: hard cuts, gradual transitions and fades. A hard cut is where the video flips from one scene to another instantly. A gradual transition is a dissolve from one scene to another over a certain period of time. A fade is where the video fades to a particular solid color. Dissolves between scenes are generally used to denote the passage of time. Cuts are frequently used in dialog, and fades are useful for beginning or ending clips of video or for marking logical divisions in a plot.

ffmpeg's scene detector

ffmpeg's scene detector produces a value between 0 and 1 for each frame that denotes how likely it is that a scene change occurred at that frame. It is well documented on the ffmpeg site.

At a high level, the current scene detector compares each frame of the video to the one preceding it to compute the scene change score. The current implementation uses a metric called the sum of absolute differences, or SAD.

Annotated version of ffmpeg's scene detector


static double get_scene_score(AVFilterContext *ctx, AVFrame *frame)
{
  double ret = 0;

  // This is a pointer to things dealing with the select filter, importantly it points
  // to the DSP context that we will use.

  SelectContext *select = ctx->priv;

  // A reference to the previously processed frame, will be useful to compare against.

  AVFrame *prev_picref = select->prev_picref;

  // Only compare if there is a previous picture and the frame width,
  // height and linesize all match it.  I don't know of a stream where
  // the frame size is variable, but I wouldn't be surprised.

  if (prev_picref &&
      frame->height    == prev_picref->height &&
      frame->width    == prev_picref->width &&
      frame->linesize[0] == prev_picref->linesize[0]) {

    // x and y are counters for offsets in the picture, and nb_sad is
    // the number of samples (bytes) compared between the frames.

    int x, y, nb_sad = 0;
    //  sad is sum of absolute differences
    int64_t sad = 0;

    double mafd, diff;

    // Both of these variables are pointers to the line data of the
    // frames.  The scene filter requires that the frame format be
    // RGB24 or BGR24, which is 24 bits per pixel ordered either red,
    // green, blue or blue, green, red.  The ordering doesn't matter
    // much because SAD is insensitive to the ordering.
    
    uint8_t *p1 =      frame->data[0];
    uint8_t *p2 = prev_picref->data[0];

    // The linesize is the length in bytes of one line of picture
    // data, in other words one row of pixels as stored in memory.

    const int linesize = frame->linesize[0];
      
    // We loop in increments of 8 because the SAD function used here
    // works on 8x8 blocks, which suits its DSP/MMX implementation.
    // Loop over the height of the image, incrementing by 8 on every
    // iteration: 0, 8, 16, 24, 32, ...

    for (y = 0; y < frame->height - 8; y += 8) {

      // Loop over the width of the frame times 3 (three bytes per
      // pixel), incrementing by 8 on every iteration: 0, 8, 16, 24,
      // 32, ...  Because the frame is RGB or BGR we want to run SAD
      // against the same color in the previous frame, so the offsets
      // must be identical in both frames.

      for (x = 0; x < frame->width*3 - 8; x += 8) {
        
        // Calculate the sum of absolute differences between the two
        // frames.  The SAD function looks at a block 8 bytes wide by
        // 8 rows high and returns the summed difference.

        sad += select->c.sad[1](select, p1 + x, p2 + x,
                                linesize, 8);

        // Add the 64 samples just compared to the count so we can
        // calculate the average later.

        nb_sad += 8 * 8;
      }

      // To move forward through the image while y is being incremented,
      // we just advance both frame pointers by 8 rows.
      p1 += 8 * linesize;
      p2 += 8 * linesize;
    }

    // clear out any MMX state that was used by the DSP functions.
    emms_c();

    // Calculate the mean absolute frame difference (MAFD): the SAD
    // divided by the number of samples compared, i.e. the mean
    // difference per sample against the previous frame.  It is stored
    // in a double, but sad and nb_sad are both integers, so the
    // division itself truncates.  Since frame dimensions may not be
    // an even multiple of 8, it's best to divide by the number of
    // samples actually compared.
    mafd = nb_sad ? sad / nb_sad : 0;


    // calculate the difference between the previous mafd and the
    // current mafd.
    diff = fabs(mafd - select->prev_mafd);

    // Take the smaller of the MAFD and its change from the previous
    // MAFD, divide by 100, and clip the result between 0 and 1.
    ret  = av_clipf(FFMIN(mafd, diff) / 100., 0, 1);

    // save the mafd for the next comparison.
    select->prev_mafd = mafd;

    // free the reference to the previous frame.
    av_frame_free(&prev_picref);
  }

  // save the current frame as the previous frame.
  select->prev_picref = av_frame_clone(frame);

  return ret;
}

Commentary

ffmpeg's scene change detector is fast because it can use optimized CPU instructions, where they are available, to calculate the SAD value between frames. This is important because it compares the pixels of two frames, which can be quite large when dealing with HD sized video (1920x1080). You can easily find many examples of how ffmpeg's detector performs on the web. It does a remarkable job and is sensitive to any change that can be detected on a frame-by-frame basis.

The weakness of ffmpeg's scene change detector is that it only compares each frame to the one preceding it. Some scene changes, such as cross dissolves or fades, occur over many frames, and ffmpeg may not detect them because the change from one frame to the next is very gradual.

There is room for improvement by incorporating other scene change detection algorithms into ffmpeg, particularly ones that can detect gradual transitions.