Cameras for Desktop Videoconferencing
Andrew Davis
3/13/1997 12:00 AM EST
Videoconferencing is definitely coming into the mainstream, particularly as a function for desktop computers. The growth in the industry is being fueled by several factors, including price and the emergence of ITU standards that cover audio and video compression as well as call control over a variety of networks. During 1997 we can expect to see H.324 software, which implements the standard for videoconferencing over V.34 modems and ordinary phone lines, bundled with many of the PC systems shipped. And by 1998, if not earlier, a similar strategy can be expected for H.323 conferencing over IP LANs and the Internet.
Pricing has been driven by semiconductor and software advances that have led to tremendous price declines. The cost of an ISDN-based add-on kit for a desktop computer has fallen from about $5000 to approximately $1600 in less than two years. Today an add-on kit for POTS videoconferencing can be had for about $500, including the V.34 modem. Price declines are likely to continue. The next set of price advances, however, is likely to be led by dramatic changes in camera technology. In this overview, we look at camera technology from the perspective of a videoconferencing user.
By definition, you can't videoconference without a camera, a fact not lost on the many vendors eyeing the rosy forecast for desktop videoconferencing system sales. As will be obvious from the details below, DSP technology, as well as a philosophical break from the traditions of the past, are opening the doors to new camera designs. This article catches the industry at an interesting inflection point, as camera technology is undergoing a "paradigm shift" in order to capitalize on new silicon, new markets, and new cost points. The basic issue is that the new products are designed to be computer peripherals and feel no allegiance to broadcast television, analog VCRs, or other product constraints.
Cameras for videoconferencing come in a very wide range of price-performance-feature sets to accommodate nearly all user needs. The days of a $30 desktop camera are not far away, while high-end camera systems today (designed for group videoconferencing systems) sell for $30,000, 1000 times more expensive than the soon-to-be bottom of the market. Camera designs, like those of most video products, have their origins in the standards set by the broadcast television industry. While these have made perfect sense for decades, interfacing standards designed to meet the needs of the 1930s to the computer technology of the 1990s often leads to some "irregularities."
Desktop cameras can be segmented by the sensor type used to form the image and by the output signal type which carries the video information.
| | CCD Sensor | CMOS Sensor |
| NTSC Analog Output | ✓ | ✓ |
| Digital Output Signal | ✓ | ✓ |
Table 1: Sensor types in desktop cameras
Most products today use a lens to focus light onto a CCD array (or three arrays in the case of very high-end cameras). The CCD photosensor is composed of discrete photosites or pixels that accumulate (analog) electrical charges based on the quantity of light hitting each one. Each photosite is scanned out of the CCD device in a pixel-by-pixel, line-by-line sequence, thereby creating an analog video signal. Typically the analog voltage levels are sequenced in accordance with the RS-170 or CCIR video standard (see below) and the appropriate synchronization signals are added by other pieces of the camera's electronics. The result is a standard video signal that is compatible with other standardized video devices, including TVs and VCRs. "The key to getting costs down," says Mike Miller of Silicon Vision, "is getting the die smaller so that more fit on a single wafer." Over time, CCD technology has moved from 1/2-inch to 1/3-inch to 1/4-inch, and some vendors are now testing 1/5-inch designs. Even so, the CCD sensor still costs $30-$60, which makes it a major cost element for a desktop camera.
Newer cameras designed for desktop applications are replacing CCDs with CMOS technology, which promises lower costs. CMOS devices can be built on the same production lines as other high-volume silicon devices. It is only within the past year that researchers have solved many of the noise and quality problems that plagued CMOS in the past and made CMOS-based image quality vastly inferior to that possible with CCDs. A major advantage of CMOS is that it is possible to integrate an entire imaging system, including A/D converter, signal processing, color encoding, and compression, on a single sensor chip. Videoconferencing industry pundit Will Strauss of Forward Concepts says, "CMOS is the way to the $30 camera and true desktop ubiquity. By 1998 we expect CMOS quality to be equal to that of CCDs."
Video signals have long adhered to standards, a necessity in the world of broadcast TV, where content providers and broadcasting companies needed the support of multiple, independent video equipment vendors and the availability of consumer-level plug-and-play capabilities. Standards specify the specific scan-rate timing, the number and order of lines in an image frame, the image aspect ratio, synchronization signals that indicate the beginning of each line and each frame, color signal encoding, if any, and image brightness and color signal voltage levels. Videoconferencing cameras adhere to one of the following:
- EIA RS-170
In the United States, RS-170, produced by the Electronic Industries Association, embodies the technical specifications that were originally defined in the late 1930s in order to standardize the black and white TV industry. RS-170 defines an aspect ratio of 4:3, a 2:1 interlaced scan technique, and horizontal and vertical sync pulses. An entire RS-170 frame, consisting of one even and one odd field, is made up of 525 lines; each frame is sequenced out every 33.33 milliseconds (ms). The vertical sync interval and settling period chew up 20 line times per field, however, leaving 242.5 lines per field or 485 lines per frame for the image. RS-170 also specifies electrical voltages.
- NTSC
In the 1950s, the National Television System Committee adopted the RS-170A color standard, widely known as NTSC. NTSC describes a composite color signal created by combining color and brightness information on a single signal. Hue and saturation are combined using phase and amplitude modulation techniques into a single chrominance signal. This is added to the RS-170 brightness signal (luminance), together with a reference called color burst at the start of each line. The NTSC system allows the coexistence of monochrome and color television (known in the computer industry as backward compatibility), an important constraint at the time it was introduced.
- CCIR
The CCIR video standard is the European equivalent of RS-170. CCIR specifies a 625-line image with a frame period of 40 ms (25 frames per second), a 2:1 interlaced scan, and a 4:3 aspect ratio. The CCIR standard was also adopted for color; this is known as PAL (phase alternating line). However, France, Russia and a few other countries use a third standard called SECAM.
- Y/C-Video
While component systems carry the R-G-B color information on separate signals, and composite signals carry all the information on one signal, an intermediate standard has evolved. The Y/C component color standard conveys the color video signal as a luminance (Y) signal identical to the standard RS-170 monochrome video signal and a chrominance (C) signal identical to the chrominance subcarrier defined in the NTSC standard. However, by using separate signals, a higher quality level is achieved. Y/C video is also known as S-Video, super-video, and S-VHS. Many videoconferencing systems handle NTSC and S-video signals.
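The line and timing figures quoted for RS-170 and CCIR can be checked with a little arithmetic. The following is a minimal sketch in Python; the function names are ours and purely illustrative, and the only inputs are the numbers given in the text above (525 lines in 33.33 ms with 20 blanked line times per field for RS-170, 625 lines in 40 ms for CCIR).

```python
def frame_rate_hz(frame_period_ms):
    """Frames per second for a given frame period in milliseconds."""
    return 1000.0 / frame_period_ms


def line_period_us(total_lines, frame_period_ms):
    """Time spent on one scan line, in microseconds."""
    return frame_period_ms * 1000.0 / total_lines


def active_lines(total_lines, blanked_per_field, fields_per_frame=2):
    """Return (active lines per field, active lines per frame)
    for an interlaced format that loses some line times to vertical sync."""
    per_field = total_lines / fields_per_frame - blanked_per_field
    return per_field, per_field * fields_per_frame


# RS-170: 525 lines every 33.33 ms, 20 line times lost per field
print(active_lines(525, 20))                  # (242.5, 485.0)
print(round(line_period_us(525, 33.33), 2))   # about 63.49 microseconds

# CCIR: 625 lines every 40 ms
print(frame_rate_hz(40))                      # 25.0
print(line_period_us(625, 40))                # 64.0
```

The half line in the per-field count is a consequence of 2:1 interlace: 525 lines split across two fields gives 262.5 lines per field before blanking is subtracted.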
RS-170 was optimized for the human perceptual system and the technology available to the broadcast TV industry decades ago. Interlaced video reduces flicker for the human eye and 30 frames per second eliminates many noise problems associated with 60 Hz power supplies. The 4:3 aspect ratio makes for a pleasing TV image. But these technologies are not well suited for the computer industry. For example, with interlaced lines coming off a camera, the computer (which uses progressive scan) has to reorder the data to make a sensible image while in the human eye this is done automatically (through persistence). And for applications like videoconferencing, where other constraints may limit a system to 20 frames per second or slower, being locked in to the 30 frames or 60 fields per second specified in RS-170 has no logical basis and is a distinct disadvantage.
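The reordering a computer has to perform on interlaced video can be illustrated with a short sketch. This is a simple "weave" of two fields into one progressive frame, written in Python for illustration only; it assumes the first field carries frame rows 0, 2, 4, ... and the second carries rows 1, 3, 5, ..., which is our assumption rather than something the standards text above spells out, and it represents each scan line as an arbitrary Python object.

```python
def weave_fields(first_field, second_field):
    """Interleave two interlaced fields into one progressive frame.

    first_field  : scan lines for frame rows 0, 2, 4, ... (assumed order)
    second_field : scan lines for frame rows 1, 3, 5, ...
    """
    frame = []
    for i in range(max(len(first_field), len(second_field))):
        if i < len(first_field):
            frame.append(first_field[i])
        if i < len(second_field):
            frame.append(second_field[i])
    return frame


# Tiny example with labeled rows standing in for real pixel data
field_a = ["row0", "row2", "row4"]
field_b = ["row1", "row3"]
print(weave_fields(field_a, field_b))
# ['row0', 'row1', 'row2', 'row3', 'row4']
```

Because the two fields are captured at different instants, a real frame grabber or driver may do more than this simple interleave; the sketch shows only the reordering step itself.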
Consider: if you are using a desktop videoconferencing system with an NTSC camera, here are the steps that need to take place:
- The camera's sensor produces an analog signal of your image.
- The camera then converts the analog signal coming off the sensor to a digital signal that is compatible with the camera's internal processor.
- The camera processor then processes the digital signal to produce the required data stream.
- This data stream is then "encoded" back to an industry-compliant analog signal (NTSC or PAL) and fed into your computer (or TV monitor or video recorder).
- If the destination is a TV or video recorder no further processing is needed. But, if the destination is a PC, the signal must be converted back to digital using a chip in the PC. Typically this is done on a frame grabber board, also known as a camera interface.
- Now in the correct digital format for the computer, the (videoconferencing) application can process your image.
The process of converting an analog signal to a digital signal, then to an analog signal, and then back to a digital signal seems both wasteful and expensive. This is the basis for the interest in digital cameras for the PC industry, and for videoconferencing in particular. There really isn't any reason to stay with the NTSC constraint, other than the fact that most cameras are NTSC devices, and according to Don Lake, general manager of VLSI Vision, "NTSC's days in the computer industry are numbered."
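To make the waste concrete, the signal path can be written down as a list of stages and the analog/digital crossings counted. The sketch below is illustrative Python; the stage labels paraphrase the steps listed above, and the digital-camera path is our simplified assumption of a camera that digitizes once and stays digital all the way to the application.

```python
# Each stage is (description, signal domain at the output of that stage).
NTSC_CAMERA_PATH = [
    ("sensor output", "analog"),
    ("camera A/D conversion", "digital"),
    ("camera signal processing", "digital"),
    ("NTSC/PAL encoding", "analog"),
    ("frame grabber in the PC", "digital"),
    ("videoconferencing application", "digital"),
]

# Assumed path for a digital camera: one A/D step, then digital throughout.
DIGITAL_CAMERA_PATH = [
    ("sensor output", "analog"),
    ("camera A/D conversion", "digital"),
    ("digital interface to the PC", "digital"),
    ("videoconferencing application", "digital"),
]


def count_conversions(stages):
    """Count how many times the signal changes between analog and digital."""
    domains = [domain for _, domain in stages]
    return sum(1 for a, b in zip(domains, domains[1:]) if a != b)


print(count_conversions(NTSC_CAMERA_PATH))     # 3
print(count_conversions(DIGITAL_CAMERA_PATH))  # 1
```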
Digital cameras do away with all the convolutions described above and send a digital signal back, sometimes already compressed, to the videoconferencing host through one of several interfaces (described below). The first really successful device was the Connectix QuickCam, a black and white, tennis-ball-shaped device that connects to a PC via the parallel port or to a Macintosh via the serial port. Bundled with several third-party videoconferencing software applications as well as with a package from Connectix itself, the camera set a low-cost bar for many desktop users. Other vendors have since jumped into the market, all with an eye towards several market desires:
- Deliver full imager resolution at a full 30 frames per second to the PC bus
- Connect to the PC through a small, inexpensive, readily available connector
- Be powered directly from the PC, eliminating the need for an external power pack
- Support software controlled image processing
- Minimize cost
Silicon Vision's iVision architecture is based on CCD technology. The PC interface and image processing are performed at the PC end of the camera cable, either on an add-in card or on the motherboard itself. Only 8 bit "pixel samples" need be transported to the PC instead of 16 bit YUV processed pixels. With a convenient camera head control capability in place, host software can interrogate the camera's EEPROM chip to learn specific and useful data about the connected camera, such as resolution, bias setting, color calibration, and even serial number. Silicon Vision's iCamHost chip on the host side replaces the traditional analog video decoder chip. Software controls include variable frame rate and continuously variable exposure times.
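The saving from sending 8-bit samples instead of 16-bit processed pixels is easy to quantify. The following Python sketch is illustrative only: the 320 x 240 image size is a hypothetical figure chosen for the example, not a number from Silicon Vision, and the calculation ignores any cable protocol overhead.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Hypothetical image size and frame rate, chosen only for illustration
WIDTH, HEIGHT, FPS = 320, 240, 30

raw_samples = raw_bit_rate(WIDTH, HEIGHT, 8, FPS)    # 8-bit pixel samples
yuv_pixels = raw_bit_rate(WIDTH, HEIGHT, 16, FPS)    # 16-bit YUV pixels

print(raw_samples / 1e6)   # 18.432 (Mbps)
print(yuv_pixels / 1e6)    # 36.864 (Mbps)
```

Whatever the image size, halving the bits per pixel halves the data the camera cable has to carry.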
Another company in the digital camera field is VLSI Vision (VVL), which uses CMOS technology for the image sensor. With a VVL chip, you apply power and get video out. VVL sells CMOS sensors, camera modules, and complete cameras with plastic housings, depending on the needs of the customer. OmniVision Technologies is also producing cameras based on CMOS technology. The company announced a $29 black and white camera (OEM price) and a $59 color camera (not yet in production) at the 1997 Consumer Electronics Show.
No matter what sensor technology is used to capture an image, the output signal can take one of several forms. NTSC cameras present an analog signal to the personal computer. This must be converted to digital by a frame grabber, a product available in a wide variety of prices and configurations. Note that even if video compression is accomplished with host-based software, a camera interface is needed if NTSC cameras are used. An advantage of NTSC is that a wide variety of cameras and frame grabbers are available, and interoperable.
For digital devices, there are several ports on a PC that customers can use, including the keyboard port, mouse port, joystick port, serial port, parallel port, audio in port, video in and video out ports, and proprietary ports used by specific manufacturers for specific devices. This is confusing (and wasteful) to the end user. The key aspect behind new technologies is the standardization of a single connector for a wide variety of devices. New digital cameras use one of four approaches to connecting to the PC: proprietary interfaces, parallel ports, FireWire, or USB. The last three do not require the user to open the PC, a major advantage.
With proprietary interfaces, the digital camera comes with its own interface board.
The most common approach today is to use the parallel port. The major advantage is that the parallel port is available on all PCs. The disadvantage is that the parallel port imposes a limit to videoconferencing performance.
P1394 (FireWire) is a complex, IEEE standard serial bus. FireWire promises several advantages, including speeds up to 200 Mbps (400 and 1200 Mbps under study), isochronous support, linking of up to 63 devices, and ease-of-use through automatic configuration. P1394 interface boards are now coming to market, and motherboard implementations are likely to follow. FireWire has been endorsed by many companies. Sony, however, was the first to commercialize it with the DCR-VX1000 and VX700 camcorders, introduced at the end of 1995.
Universal Serial Bus (USB) is a peripheral bus designed to provide a plug-and-play environment (without rebooting) outside the box, eliminating the need to install cards into dedicated computer slots and to reconfigure the system. USB will allow up to 127 devices to be connected simultaneously to a computer. The bus (via Windows software) automatically determines what host resources, including driver software and bus bandwidth, each peripheral needs and makes those resources available without user intervention. USB peripherals, including keyboards, monitors, mice, and joysticks, have already hit the market, and videoconferencing cameras have been announced by Xirlink. USB provides data transfer rates of 12 megabits per second, compared with a maximum of 115 kilobits per second for today's standard PC serial port. USB also provides isochronous and asynchronous data transfer as well as a star-hub architecture that makes it possible for a single PC port controller to link up to 127 digital peripherals.
USB and FireWire/IEEE 1394 are likely to coexist for quite some time and together will dominate the future of the PC peripherals market. USB is low cost, available now, and better suited for slower speed PC connections (keyboard, mouse, etc.). USB's data rate is adequate for many multimedia applications if the bus is not shared by lots of peripherals. FireWire, which uses more expensive chips and cables, will be used for higher-speed multimedia connections and is probably another year away from widescale availability.
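A rough calculation shows why the choice of bus matters for a camera. The Python sketch below uses the link rates quoted in this article (115 kbps serial, 12 Mbps USB, 200 Mbps FireWire) and asks how many uncompressed frames per second each could carry. The 320 x 240 image with 8 bits per pixel is a hypothetical example, and the calculation assumes the camera has the bus to itself and ignores protocol overhead, so real figures would be lower.

```python
# Link rates in bits per second, as quoted in the article
LINK_RATES_BPS = {
    "serial port": 115_000,
    "USB": 12_000_000,
    "FireWire": 200_000_000,
}


def max_frame_rate(link_bps, width, height, bits_per_pixel):
    """Upper bound on uncompressed frames per second over a link,
    ignoring protocol overhead and other devices sharing the bus."""
    bits_per_frame = width * height * bits_per_pixel
    return link_bps / bits_per_frame


# Hypothetical image format, chosen only for illustration
WIDTH, HEIGHT, BITS = 320, 240, 8

for name, rate in LINK_RATES_BPS.items():
    fps = max_frame_rate(rate, WIDTH, HEIGHT, BITS)
    print(f"{name}: {fps:.2f} frames per second")
```

Under these assumptions USB falls short of 30 uncompressed frames per second while FireWire has room to spare, which is consistent with the article's point that USB suits many multimedia uses only when the bus is lightly shared, or when the camera compresses the video before sending it.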
Between the image sensor type and output format choices, many combinations are possible, and nearly all are represented by products in the market today. Examples include, but are not limited to:
| Vendor Example | Image Sensor | Output Format |
| Marshall Electronics | CMOS | NTSC |
| OmniVision | CMOS | NTSC |
| Vivitar | CMOS | Digital-Parallel |
| VLSI Vision | CMOS | Digital-Parallel |
| Philips | CCD | NTSC |
| Samsung | CCD | NTSC |
| Connectix | CCD | Digital-Parallel |
| ACS | CCD | Digital-Parallel |
| Sanyo | CCD | Digital-USB |
| Silicon Vision | CCD | Digital-Proprietary |
Table 2: Examples of camera designs
After all the technology issues are said and done, color videoconferencing cameras today appear to fall into five price/performance categories.
| <$200 | This is the mainstream of the desktop market and includes both analog and digital cameras |
| $200-$500 | This is the "high end" of the desktop cameras and generally provides higher quality video. Enhancements include glass optics instead of plastic, automatic gain control, iris control, white balance etc. Some of these cameras may also provide more than 330 lines of resolution. |
| $500-$1200 | This is the bulk of the small group systems camera market. Added features include pan/tilt/zoom control, better microphones. For the desktop, these cameras often suit dual purposes, such as document imaging and still photography. |
| $1200-$5000 | Higher end group system cameras include autotracking systems, remote camera control and other functions to make group meetings as natural as possible. |
| >$5000 | This range today extends all the way to $30,000, including cameras used in distance education and broadcast TV. These use the highest quality optics, and a full range of pan/tilt/zoom/focus automation. |
Table 3: Camera market price segments
When you put all this together, one thing is clear. The future will bring cameras specifically designed for desktop videoconferencing. Shedding their NTSC technology roots, these designs will result in both higher performance and lower costs. And lower camera costs will remove one of the last obstacles to widespread videoconferencing adoption.



