Design Article
Next-generation surveillance system design
Mark Oliver, Stretch
2/11/2008 3:00 AM EST
While this "cost reduction" approach has been successful in driving rapid growth in the industry, digital surveillance systems have been little more than replacements for their analog counterparts. The true potential of the leading-edge surveillance technology has not yet been realized. This article discusses the technical requirements of future video surveillance systems and the hardware and software changes that are taking place to meet these needs.
Video Surveillance – Not your Father's Video System
Video codecs in a surveillance system have requirements that are very different from those seen in other video applications. For example, latency is relatively unimportant in a broadcast video environment, but it is extremely important in surveillance systems. Surveillance systems often include monitoring personnel who must respond to events in real time. For example, monitoring personnel might track individuals with a Pan Tilt Zoom (PTZ) camera, or they might use an audio link to direct the actions of someone at the scene.
As another example, broadcasters have very little ability to change encoding schemes because any changes they make must be compatible with the receivers already in the field. In contrast, surveillance systems tend to be closed systems. As a result, the operator is free to select the standards that best suit his or her needs, including frame rate, resolution, and codec. Operators may even adjust resolution and frame rate dynamically. This may be done in response to changes in an observed scene or to the capabilities of the consuming device. For example, the system might encode several versions of the same video stream. One stream might go to a high-definition monitor in an observation room, while another goes to a PDA carried by on-site security personnel.
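The idea of matching an encoded version to the consuming device can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the three stream versions, and all resolutions and bit rates below are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StreamVersion:
    """One encoded version of the same camera feed (hypothetical values)."""
    name: str
    width: int
    height: int
    fps: int
    kbps: int


@dataclass(frozen=True)
class Device:
    """What a consuming device can display and receive (hypothetical values)."""
    name: str
    max_width: int
    max_height: int
    max_kbps: int


# Assumed set of versions the encoder produces for one camera.
VERSIONS = [
    StreamVersion("hd", 1280, 720, 30, 4000),
    StreamVersion("sd", 704, 480, 30, 1500),
    StreamVersion("mobile", 352, 240, 15, 256),
]


def pick_version(device: Device, versions: List[StreamVersion]) -> Optional[StreamVersion]:
    """Return the highest bit rate version the device can both show and receive."""
    fitting = [
        v for v in versions
        if v.width <= device.max_width
        and v.height <= device.max_height
        and v.kbps <= device.max_kbps
    ]
    if not fitting:
        return None  # nothing fits; the caller decides how to handle this
    return max(fitting, key=lambda v: v.kbps)


monitor = Device("observation-room monitor", 1920, 1080, 10000)
pda = Device("PDA", 480, 320, 384)

print(pick_version(monitor, VERSIONS).name)
print(pick_version(pda, VERSIONS).name)
```

In this sketch the selection happens on the server side. A real system would also have to react when the link to the device degrades, which is one of the situations a scalable codec is meant to simplify.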
Of the codecs available to surveillance operators, most are borrowed from other industries and do not satisfy the requirements of surveillance applications. Table 1 summarizes the high-level features of various common codecs.

Table 1. High Level codec Comparison.
None of the codecs in Table 1 satisfy all the desired characteristics of surveillance systems: low latency, high compression efficiency, resolution and frame rate flexibility, low complexity, and low cost. H.264 Baseline Profile probably offers the best compromise, but lacks the inherent scalability needed in surveillance applications.
Scalable Video Coding
Scalable Video Coding (SVC) is an extension of the current H.264 standard. SVC was developed with a view to using a single encoded stream to satisfy diverse requirements in terms of bit rate, quality and resolution. SVC supports a high degree of scalability. It scales spatially, allowing for varying display resolutions. It scales temporally, allowing for varying frame rates. And it scales in quality, allowing for varying image quality. For example, an H.264 SVC video stream can be decoded by two different devices with different frame rates and resolutions. In conventional video encoding, if the video stream were to be viewed at a reduced resolution on a portable device, the entire stream would have to be decoded and resized. With SVC, only the portion of the stream yielding the desired resolution and frame rate is decoded.
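A toy model may help make the "decode only the portion you need" idea concrete. The sketch below is not an SVC parser: the `Packet` class, the layer numbering, and the byte sizes are hypothetical simplifications. It only shows that a lower resolution, lower frame rate sub-stream can be obtained by discarding packets rather than by decoding and resizing the full stream.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """Hypothetical stand-in for one coded unit of a scalable stream."""
    frame: int
    spatial: int   # 0 = base resolution, higher = larger picture
    temporal: int  # 0 = base frame rate, higher = additional frames
    quality: int   # 0 = base quality, higher = finer detail
    size: int      # payload size in bytes


def build_toy_stream(num_frames: int = 8) -> List[Packet]:
    """Build a stream with two spatial layers and two temporal layers."""
    packets = []
    for f in range(num_frames):
        t = 0 if f % 2 == 0 else 1  # odd frames exist only at the full frame rate
        packets.append(Packet(f, 0, t, 0, 1000))  # base-resolution data
        packets.append(Packet(f, 1, t, 0, 3000))  # resolution enhancement data
    return packets


def extract_substream(packets: List[Packet], max_spatial: int,
                      max_temporal: int, max_quality: int) -> List[Packet]:
    """Keep only the packets at or below the requested layer in each dimension."""
    return [
        p for p in packets
        if p.spatial <= max_spatial
        and p.temporal <= max_temporal
        and p.quality <= max_quality
    ]


stream = build_toy_stream()
handheld = extract_substream(stream, max_spatial=0, max_temporal=0, max_quality=0)

print(len(stream), sum(p.size for p in stream))
print(len(handheld), sum(p.size for p in handheld))
```

In this toy stream the handheld device receives a quarter of the packets and an eighth of the bytes, and it never touches the enhancement data. The actual ratios in a real SVC stream depend on the encoder configuration.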
An SVC decoder's flexibility in how it deals with an SVC bitstream results in many benefits to the user. These benefits include ease of adaptation for different displays; resource-conserving video transmission, storage and display; higher transmission robustness; and ease of heterogeneous network support (for example, simultaneously supporting a number of different transmission networks). An additional benefit of SVC is that the compressed stream can be parsed while stored on a disk. The portions of the files that are used to reconstruct high frame rate or high quality images can be progressively removed over time. This is not possible with conventional codecs where the video data is either there or it isn't, and one has to select a date upon which the original resolution file will be totally deleted. With an SVC system, the video can be kept for longer periods as storage requirements are gradually reduced.
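The storage benefit can be illustrated with simple arithmetic. The sketch below assumes a hypothetical split of one camera's daily footage into three layers and a hypothetical retention schedule; none of the figures come from a real deployment.

```python
# Hypothetical daily storage per camera, split by layer, in gigabytes.
LAYER_GB_PER_DAY = {"base": 2.0, "temporal": 3.0, "quality": 5.0}

# Hypothetical retention schedule: (minimum age in days, layers still kept).
RETENTION = [
    (0, ("base", "temporal", "quality")),   # recent footage: keep everything
    (30, ("base", "temporal")),             # after a month: drop quality layer
    (90, ("base",)),                        # after three months: base layer only
]


def layers_kept(age_days):
    """Return the layers retained for footage that is age_days old."""
    kept = RETENTION[0][1]
    for min_age, layers in RETENTION:
        if age_days >= min_age:
            kept = layers
    return kept


def archive_size_gb(total_days):
    """Total storage for total_days of footage under the thinning schedule."""
    return sum(
        LAYER_GB_PER_DAY[layer]
        for age in range(total_days)
        for layer in layers_kept(age)
    )


def full_size_gb(total_days):
    """Total storage if every layer were kept for the whole period."""
    return total_days * sum(LAYER_GB_PER_DAY.values())


print(archive_size_gb(180), full_size_gb(180))
```

Under these assumed numbers, 180 days of footage occupies 780 GB with progressive thinning, compared with 1800 GB if every layer is kept. The same 1800 GB budget would hold the base layer for far longer.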

jimhoerricks
4/1/2008 12:23 PM EDT
A very well stated overview of your product with one very important omission - video surveillance systems as generators of evidence that can end up in court.
If the purpose of your system is to monitor an area, without recording - then the system as described would work great. It's when the record button is pressed and a crime is "witnessed" by one of these systems that confusion begins.
It all starts with the designer of the system. What is the system's purpose? Is it observation or monitoring of an area? Is it to help in the recognition that some activity is occurring within an area? Is it there to aid in the identification of individuals and objects for internal purposes? With these questions in mind, will these recordings ever be turned over to the police?
Each of the above questions will yield a slightly different system design. You simply will not be able to identify someone/something at a range of 250m with a 4.5mm lens. You will have difficulty identifying someone at 100' at 1CIF. But you can observe activity at these resolutions and distances.
The state where the installation takes place will have its own evidence code, governing statutes, and case law. These all need to be taken into consideration when designing a system and selecting a codec where the data will be used to prosecute offenders. There is an interesting trend in the courts where MPEG4 based video is becoming more of a problem in a prosecutor's case than a help.
Here's a question: Can a "B" or "P" frame be considered a "true and accurate representation of a scene?" Many states, and the Federal Rules of Evidence, have specific guidance as to how this "true and accurate" clause is to be interpreted. If the B and P frames are only representations of the change that the computer predicts between I frames, then how could they be considered "true and accurate" under the rules of evidence? That, in summary, is the defense's objection to such evidence. With that in mind, how frequently are I frames generated? 1 second? 5 seconds? It varies by manufacturer and by installation. How reliable are the rates in practice vs. what is published by the manufacturer? In one famous case in Florida, there was such a variance that the prosecutor could not use the video - prompting the dropping of charges and a counter suit for false imprisonment.
Hopefully, some consideration will be given for the fundamental change that occurs when the "record button" is pressed - for the potential problems that exist when multimedia data becomes multimedia evidence.
Jim Hoerricks
Forensic Image Analyst
http://forensicphotosho.blogspot.com