A World Wide Web Telerobotic Remote Environment Browser


Eric Paulos and John Canny

Department of Electrical Engineering and Computer Science
University of California
Berkeley, CA 94720-1776


Robots provide us with a means to move around in, visualize, and interact with a remote physical world. We have exploited these physical properties coupled with the growing diversity of users on the World Wide Web (WWW) [1] to create a WWW based active telerobotic remote environment browser. This browser, called Mechanical Gaze, allows multiple remote WWW users to actively control up to six degrees of freedom of a robot arm with an attached camera to explore a real remote environment. The initial environment is a collection of physical museum exhibits which WWW users can view at various positions, orientations, and levels of resolution.

Keywords: telerobotics, teleoperation, telepresence, robotics, museum

1 Introduction

We have designed this teleoperated WWW server in order to allow users throughout the world to visit actual remote spaces and exhibits. It also serves as a useful scientific tool by promoting discussion about the physical specimens in the browser such as insects, live reptiles, rare museum collections, and recently discovered artifacts.

The use of an on-line controlled camera eliminates many of the resolution and depth perception problems of libraries of digitized images. The user has complete control over the viewpoint, and can experience the exhibit in its state at a particular moment in time, under the same conditions and lighting as a viewer who is in the actual space.

In addition, each exhibit has a hypertext page with links to texts describing the object, other WWW pages relevant to it, and to comments left by other users. These pages can be accessed by navigating the camera in a physical space, and centering on a particular object. The pages can be thought of as mark-ups of 3D objects in the spirit of VRML [2] [3] [4], but where the objects are actual physical entities in a remote space rather than simply models.

Exhibits can be added or removed in a matter of a few minutes, allowing for an extremely dynamic array of objects to be viewed over the course of only a few months. Users are encouraged not only to check back for upcoming exhibits, but to participate themselves. Users can leave commentary about an item on exhibit, creating dialogue about the piece, as well as feedback to the owner, artist, or curator of the object. Institutions, museums, curators, scientists, artists, and individual users are all invited to exhibit objects in the browser.

2 Goals and Motivation

Early in the Summer of 1994, we realized that we had the equipment and resources to design an inexpensive, publicly accessible tool for remote environment browsing. We were also inspired by the diversity and growth of the WWW as a medium for this tool. In addition we were driven to develop a useful application for interactive robots on the WWW.

The restrictions imposed by the HyperText Markup Language (HTML) made it difficult to design an intuitive user interface to a complex six-axis robotic system. Certainly, we could have chosen to construct custom navigation software for users to download. While this would have allowed us more freedom in the design of the overall system, it would severely restrict the accessibility of the browser. Since we consider the quantity and diversity of users on the WWW one of its most powerful aspects, we chose to constrain the development of our system to what is accessible to all WWW users.

2.1 Background

One of the early goals of the project was to incorporate methods in which users could remotely examine and comment on actual museum exhibits. At first we were interested in how well such a tool would operate on insect exhibits. We developed a prototype telerobotic browser and presented it at the Biological Collections Information Providers Workshop in January of 1995. At this workshop we received feedback about the uses and implications of such an application to natural science research. Later, in April of 1995 we presented the browser at Wavelength, an art installation in San Francisco exploring the science and nature of movement. At these two arenas we were able to learn what elements of the browser were important, not only to scientists performing research, but also to novice users attempting to explore various remote spaces.

2.2 Goals

Before designing the system we set forth our goals for the project. Our primary goal is to provide a universal remote environment browsing tool that is useful for the arts, sciences, and in the development of education and distant learning. To meet this goal we agreed upon several elements that we felt were essential to any remote environment browser.

First, we wanted to ensure universal, unrestricted access to the browser. This would allow access to artifacts and objects by a wider audience than previously possible. Current access restrictions are usually the result of geographic, political, or monetary constraints preventing the individual from traveling to the object. Likewise, owners and curators of exhibits do not always have the resources or the desire to tour the object throughout the world. We wanted to develop a tool that would attempt to solve many of these problems by bringing people together with the objects at a minimum cost.

Rather than a fixed, static display, the browser must allow users true three-dimensional navigation around objects at varying positions, orientations, and levels of resolution. As David Gelernter suggests in his book Mirror Worlds [5], such systems that gaze into remote spaces should show each visitor exactly what they want to see. This requires the system to provide millions of different views from millions of different focuses on the same object. Certainly visitors will desire to zoom in, pan around, and roam through the remote environment as they choose. More importantly, they should be permitted to explore this space at whatever pace and level of detail they desire. Users should also be free to swivel and rotate the image, to get a better look at regions that might be obscured in the initial perspective.

The browser should provide to the exhibit owners, curators, and caretakers a forum to receive feedback and commentary about their exhibit. This same forum should also allow scientists to discuss details concerning classification of specimens such as insects or the origins of a recently discovered artifact. Essentially, some method for leaving comments and creating dialogue should be provided.

Finally, the system should allow exhibits to be added and removed with a minimum of effort, providing the possibility of exhibiting a wide variety of objects over the course of a few months. In addition, it should be possible to add a recently discovered or developed scientific object for universal browsing within a matter of minutes.

2.3 Why Use Live Images?

A common objection to our approach asks why we do not simply use pre-stored digitized images for browsing objects and spaces. While we agree on the importance of such pre-stored images, the remote environment browser offers several distinct advantages over conventional image database solutions.

For example, the standard approach to providing remote access to museum collections' visual data is to digitize and pre-store images of all artifacts or specimens. This solution requires considerable expense and time commitment to complete the capture, storage, and serving of digitized images. Our telerobotic approach allows remote users to interactively view museum artifacts and specimens on demand. This allows them to achieve much higher image resolution without the expensive digital storage requirements typically associated with large image databases. Our interactive viewing solution also relieves museums of the need to store digital images of entire collections over a variety of resolutions.

Our approach allows immediate visual access to all collection materials from the beginning of a project. Traditional image capturing can take several years for large research collections with millions of specimens that require special handling. The remote environment browser eliminates the waiting period that usually accompanies serial indexing and image capture. Museums that adopt a remote browsing model can provide remote access to any of their collection materials at a moment's notice, as opposed to access to a serially increasing number of objects over time. The ability to view specimens is more valuable if all specimens are available; the fewer specimens in a collection that are digitized, the less research value accrues to the resource as a whole.

With a three-dimensional object there will always be arguments surrounding what view to capture. By allowing researchers to choose their own view and magnification of the specimen or artifact, arguments over which specific view or number of views a museum should provide to remote users are eliminated. Unless users can choose their own view of museum collections' materials, they will not be satisfied with using digital images for research. Even more importantly, some visually oriented research uses, such as taxonomy and morphology, cannot be supported in the digital environment without the provision of multiple views and magnifications. Useful statistics can be gathered by the browser as to which views are most popular among scientists, allowing conclusions to be drawn about the relative importance of particular views and resolutions. This statistical information also provides useful data when later choosing a single static view to best represent the object.

Certainly, dynamic exhibits such as live creatures, moving liquids, and mechanical systems must be viewed using live images. These live views are necessary to study the behavior of such systems.

Further discussions about the use of digital images in art and science, as well as the implications of their use, can be found in several sources [6] [7] [8] [9].

3 Previous and Related Work

The sensation of embodiment in a distant real-life location has provided more than enough impetus for people to develop remote telepresence systems.

3.1 Historical Telepresence Systems

Methods of achieving telepresence are not new. Early devices such as the camera obscura tricked viewers into believing they were in another space. In the 1940s Joseph Cornell produced various boxes that, when peeped into, created the illusion of a miniature three-dimensional space. Later, in the 1960s, the picturephone, although never widely adopted, provided an early sensation of remote interaction.

One of the earliest mechanical teleoperational systems was developed by Goertz [10] in 1954. Many subsequent systems were aimed at safely exploring hostile remote environments such as battlefields, nuclear reactors [11], deep oceans [12], mining [13], and outer space [14]. Additional applications for teleoperated surgery [15] and manufacturing [16] have been explored by many researchers [17] [18] [19].

Most of these systems are quite complex, requiring special-purpose dedicated hardware to control and interact with the mechanism in the remote environment. As one of our goals states, we wanted to constrain development to a system that would be accessible to a wide audience without additional expensive or extraordinary hardware.

3.2 Telepresence on the WWW

The spontaneous growth of the WWW over the past several years has resulted in a plethora of remotely controlled mechanical devices which can be accessed via the WWW [20]. Some of these early systems employed fixed cameras in remote spaces where users could observe dynamic behavior such as the consumption and brewing of coffee in a coffee pot [21] or the activity of a favorite pet in its native habitat.

Systems evolved to allow users various levels of control via the WWW such as the LabCam [22] developed by Richard Wallace. His system allows remote users to aim a pan/tilt camera using an intuitive imagemap [23] interface.

Progression to intricate control of more degrees of freedom was realized by introducing robots to the WWW. Ken Goldberg et al. [24] developed a three-axis telerobotic system in which users were able to explore a remote world with buried objects and, more interestingly, alter it by blowing bursts of compressed air into its sand-filled workspace. Mark Cox [25] developed a system for allowing users to request images from a remotely controlled telescope. Another remote robotic system, developed by Ken Taylor [26], allowed WWW users to remotely manipulate blocks using a robot with an attached gripper. More recently, Ken Goldberg et al. have developed a telerobotic system called the TeleGarden [27] in which WWW users are able to observe, plant, and nurture life within a living remote garden. As of this writing, well over a hundred interesting mechanical devices are connected to the WWW, with more spawning daily.

Currently, manipulation of three-dimensional virtual objects requires separate browsers such as WebSpace [28] for documents written in the Virtual Reality Modeling Language (VRML) [4] or browser extensions such as those for the Object Oriented Graphics Language [29]. Standardized systems for browsing real remote spaces have yet to come to maturity.

4 Overview

Our design choice for the user interface to the remote environment browser was to mimic much of the look and feel of a museum. We chose this approach hoping that users would find it familiar to navigate, and thus more intuitive and inviting to use.

As a user enters Mechanical Gaze, they are presented with a chance to view some general information about the project, receive a brief introduction, obtain help in using the system, statically view previous and upcoming exhibits, or enter the actual exhibition gallery.

Users who enter the exhibition gallery are presented with an up to date listing of the exhibits currently available for browsing. These are the exhibits that are physically within the workspace of the robot and can be explored. The idea behind the exhibition gallery is to give only a brief introduction to each of the available exhibits. This typically consists of providing the name of each exhibit, the dates it will be available, the presenter(s), and perhaps a very brief description.

Users who wish to more closely examine an exhibit can simply select it from the listing. The user will then be presented with a more detailed description of the exhibit as well as a chance to either browse the exhibit using the robot or request to view the comments corresponding to that exhibit.

5 Hardware

The browser is built around an Intelledex 605T robot, a 1970s-era industrial robotic arm with six degrees of freedom. This robot's use as a research tool has diminished over the years, and it is now primarily used for laboratory instruction in an introductory robotics course. As a result, it is inactive for all but a few weeks a year. Through this project, we were able to place most of this equipment back into useful service.

Image capturing is performed using a camera and frame grabber hardware. Images are received from a modified RCA Pro843 8mm video camera mounted on the last link of the robot. The auto-focus feature of the video camera allows users to view a variety of objects clearly, regardless of the object's own height or the distance from which it is viewed. Typical exhibition spaces allow users to capture clear images anywhere from 1-30 cm from the surface of the object. Since we desired an easily reconfigurable exhibition space, a fixed-focus camera would not be able to accommodate the wide variety of differently sized objects. Likewise, a custom-built mechanism that allowed users to adjust the focus manually would unnecessarily complicate the hardware in the system and almost certainly the user interface. A manual or remote-controlled focusing feature is more applicable in a teleoperated system with real-time image feedback such as that available through the multicast backbone (MBONE) [30] [31] of the Internet, as discussed in Section 9.

Image digitization occurs on an SBus-based VideoPix frame grabber card attached to a Sun IPC workstation. Eight-bit 360x240 color images are captured in less than 50 ms. Further computation to convert the image into a compressed GIF or JPEG format for incorporation into HTML documents and save it to disk takes an additional 2-3 seconds, so the overall time required to capture, convert, and save an image is on the order of 2-3 seconds.

The actual Hypertext Transfer Protocol (HTTP) server containing the custom Common Gateway Interface (CGI) scripts and state information for individual users operates from an HP 715/60 workstation. This machine provides the front end interface to the system by receiving requests from users, employing the services of the other hardware in the system, and delivering the results back to the user in an HTML format.

Figure 1: System Architecture

The browser has also been run with a four-axis RobotWorld robot and an SGI-based frame grabber. Our browser is designed to operate correctly on a variety of different physical robotic systems. In fact, plans to operate the browser simultaneously across several robots, transparent to the user, are in progress.

6 Robot Interface and Control

Control of the robot is through interpreted commands sent over a 9600 baud serial line connected to a Sun IPC workstation. In order to interface this to the WWW, two separate software tools were constructed. First, a daemon was set up to handle requests involving the robot and camera hardware. The interface from the WWW to this daemon is provided by the second group of software.

6.1 Radius: The Robot Control Daemon

The Intelledex robot is connected via a serial line to a Sun workstation where a robot daemon called Radius runs continuously. When first executed, this daemon initializes all of the hardware: the serial line connection to the Intelledex, the VideoPix video digitizing board on the Sun workstation, and the Intelledex itself. It then listens for robot requests via a dedicated socket connection. All requests for any service that involves control of the robot or camera hardware are handled by Radius.
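A minimal sketch of such a daemon loop follows. The port number, hardware-initialization routines, and reply format are all assumptions for illustration; the paper does not specify them.

```python
# Sketch of a Radius-style robot daemon (hypothetical names and port).
# It initializes hardware once, then serves robot/camera requests over
# a dedicated socket, one connection at a time.
import socket

HOST, PORT = "", 7000          # assumed port; the paper does not give one

def init_hardware():
    """Placeholder for serial-line, frame-grabber, and robot setup."""
    pass

def handle_request(conn):
    data = conn.recv(4)        # 4-byte request descriptor
    if not data:
        return
    # ... dispatch to position query, motion, or image capture ...
    conn.sendall(b"OK")        # assumed acknowledgement format

def serve():
    init_hardware()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen()
        while True:
            conn, _addr = srv.accept()
            with conn:
                handle_request(conn)
```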

When a socket connection is made, Radius first checks for authentication using a known encoding. This prevents unauthorized control of the robot hardware, which is particularly important as we move towards devices capable of physically manifesting energy in a remote environment. Unauthorized access to such a system can cause not only irreparable damage to the robotic equipment and exhibits, but human injury as well. Therefore, measures to prevent at least the most naive attacks should be included in such systems.

Authorized connections to Radius include a 4-byte request descriptor, which encodes the type of request and a mask. The request type can be a position query, motion command, or image capture command. The robot can be queried or commanded either with direct commands to the robot's joint motors or by providing a Cartesian pose from which the inverse kinematics are calculated for the corresponding robot joint values. It is important to note that Radius keeps track of the robot's state, so no extra communication with the Intelledex is necessary to satisfy robot position queries. In addition, the request mask determines which axis or joint values should be moved and which results are sent back to the client. This allows for movement of a single degree of freedom if desired.
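The paper does not give the byte layout of the descriptor, so the following sketch assumes one plausible packing: a 16-bit request type followed by a 16-bit axis mask with one bit per joint.

```python
# Hypothetical layout for the 4-byte Radius request descriptor. The paper
# says it encodes a request type and an axis mask but not the exact bytes,
# so this packing and these type values are assumptions.
import struct

QUERY, MOVE, CAPTURE = 0, 1, 2   # request types (assumed values)

def pack_request(req_type, axis_mask):
    """Pack type and per-axis mask (bits 0-5 = the six joints) into 4 bytes."""
    return struct.pack(">HH", req_type, axis_mask)

def unpack_request(data):
    """Recover the request type and a per-joint boolean list."""
    req_type, axis_mask = struct.unpack(">HH", data)
    axes = [bool(axis_mask & (1 << i)) for i in range(6)]
    return req_type, axes

# Example: move only joint 0 and joint 3.
msg = pack_request(MOVE, 0b001001)
```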

Motion requests are converted by Radius into Intelledex native control commands and sent over the serial port to the robot hardware. Radius can also query the robot to determine when all motions have stopped, hence allowing an image to be captured.

Image capturing is also handled by Radius. When an image capture request is received, the VideoPix hardware digitizes an image, converts it to a Portable Pixmap (PPM) format internally and finally to a compressed GIF or JPEG file. The file is output into a temporary space and assigned a unique identification number. This number is passed back to the requesting process so that the correct image will be displayed in the HTML document passed back to the corresponding user.

This socket connection also provides the mutual exclusion necessary to ensure the correct functionality of Mechanical Gaze even when handling multiple requests. Since our interface design is WWW based, requests are event driven. After a user has loaded an image, the robot is left idle until the user makes another request. Instead of granting exclusive access to the robot, leaving it idle while a user contemplates his or her next action, we service additional requests from other users. By multitasking, we provide increased access to the robot as well as a more efficient use of system resources. However, we must provide a method to guarantee that certain atomic operations are exclusive. For example, a request to move and grab an image must be exclusive. This ensures that no other motion occurs between the time we move the robot and capture the image. If we had failed to implement this, we would have no guarantee that the image delivered back to the user was actually taken from the location they requested.
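The atomic move-and-capture operation can be sketched with an ordinary mutex; the function and variable names here are illustrative, not taken from the actual implementation.

```python
# Sketch of the mutual exclusion described above: individual requests are
# interleaved across users, but a move-then-capture pair is atomic so no
# other motion can slip in between. All names are illustrative.
import threading

robot_lock = threading.Lock()

def move_robot(pose):
    pass                 # placeholder for sending motion commands to the arm

def capture_image():
    return "image-id"    # placeholder for a frame-grabber capture

def move_and_capture(pose):
    # Holding the lock for the whole pair guarantees the returned image
    # really comes from the requested pose.
    with robot_lock:
        move_robot(pose)
        return capture_image()
```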

6.2 Remote Browser Page Construction

Requests to browse an exhibit are handled by a custom CGI script. Initially, the script is passed a unique identifying internal number corresponding to the exhibit to be browsed. The script reads in the current list of exhibits and extracts the relevant information for the exhibit of interest (see Section 7.2). One of these items is the physical location of the exhibit in the remote environment. Using this information, a socket connection is opened to Radius, the robot control daemon (see Section 6.1). Once the socket connection is established, a request is made to move the robot to the desired location and capture an image.

When the result of that request is received, the CGI script dynamically lays out the HTML page. First, it extracts information from the internal list of exhibits. This provides the name of the HTML file to place at the head of the browser page. Next, it inlines the captured and converted GIF or JPEG image, placing it within an imagemap with a unique user identification number. Then the location of the robot relative to the boundaries of the exhibit provides a measure for composing the various status indicators. One of these indicators, shown in Figure 3, is a graphical representation of the location of the displayed image with respect to the boundaries of the exhibit volume. This indicator is image-mapped and can be used for navigation within the browser. The other indicators are also image-mapped and used for navigation; they reflect the current state of the zoom, roll, and pitch. All of the status indicators are generated using GD, a graphics library for fast GIF creation developed by Thomas Boutell of the Quest Protein Database Center at Cold Spring Harbor Labs [32].

Additional navigation icons are attached to the page. These icons allow users to leave comments about the exhibit, move to the next or previous exhibit, return to the list of exhibits, obtain help, or return home. Finally, the comments left by users viewing this exhibit are appended to the page, completing the delivery of the HTML file. The CGI script also writes out a unique internal user file. This file contains state information about the page and accompanying image just delivered, such as the position of the robot when the image was captured and the exhibit being viewed. This allows subsequent requests by this user to result in correct robot motions relative to the user's current image. Remember that between requests from a particular user, any number of additional requests may have been handled by Radius, and there is no guarantee on the current location of the robot when the next request from that user is received. The final result of a remote environment navigation request is a page similar to the one depicted in Figure 2.

Figure 2: An active browser navigation page
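The page-assembly step described above might be sketched as follows; the file names, dictionary fields, and URL paths are assumptions rather than the actual script's layout.

```python
# Minimal sketch of how a browser page could be assembled: exhibit header,
# image-mapped live image, status indicators, then the running comments.
# All field names and paths are hypothetical.
def build_page(exhibit, image_id, user_id):
    with open(exhibit["header_file"]) as f:
        header = f.read()                     # exhibit-specific header HTML
    with open(exhibit["comments_file"]) as f:
        comments = f.read()                   # running dialogue for the page
    parts = [
        header,
        f'<a href="/cgi-bin/nav?user={user_id}">'
        f'<img src="/tmp/{image_id}.gif" ismap></a>',
        # ... location, zoom, roll, and pitch indicator imagemaps go here ...
        comments,
    ]
    return "\n".join(parts)
```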

7 System Utilities

Our system is fairly distributed, employing several different pieces of hardware. To manage these components and keep the system in a functional state, several utilities were developed.

7.1 Dynamic HTML

Dynamic HTML (dHTML) is a superset of the HTML language. dHTML documents appear and behave like HTML except for a special escape sequence consisting of a pair of less-than signs (<<), followed by a keyword, and terminated by a corresponding pair of greater-than signs (>>). Their main function is to describe relationships to other documents that may change dynamically. These dHTML documents are pre-processed by the dynamic HTML parser, which converts the escaped keywords into correct HTML.

For example, in Mechanical Gaze we may desire that an anchor link from the present exhibition document to the next exhibit. However, we do not know a priori the name of the HTML file for the next exhibit, or whether it even exists. Even if we did know, the removal of a single exhibit would require time-consuming hand updating of many of the other active exhibit pages to maintain correct functionality of the system. Therefore, we exploit dHTML documents to perform these tasks. We solve the above example and others like it by placing the escaped keyword sequence for "next exhibit" in the dHTML document in place of an HTML link to the next exhibit. When requested, this page passes through the dHTML pre-processor, which knows information about all of the exhibits and about the specific exhibit corresponding to the page it is processing. When it encounters the escaped keyword sequence, it substitutes the correct corresponding link dynamically.
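A pre-processor of this kind can be sketched in a few lines; the keyword vocabulary and link format below are assumptions based on the "next exhibit" example above.

```python
# Sketch of a dHTML pre-processor: escaped keywords of the form
# <<keyword>> are replaced with correct HTML computed at request time.
import re

def expand_dhtml(text, exhibit_index, exhibits):
    """Substitute <<keyword>> escapes using the current exhibit list."""
    def substitute(match):
        keyword = match.group(1).strip()
        if keyword == "next exhibit":
            nxt = exhibits[(exhibit_index + 1) % len(exhibits)]
            return f'<a href="{nxt["url"]}">{nxt["name"]}</a>'
        return match.group(0)          # unknown keywords pass through
    return re.sub(r"<<(.*?)>>", substitute, text)
```

Because the substitution happens per request, adding or removing an exhibit never requires hand-editing the other exhibit pages.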

7.2 Adding and Removing Exhibits

Since our system is dynamic by the very nature that it moves and delivers current images from a remote environment, we also wanted to allow the individual exhibits to be dynamic and change rapidly. The only limit on the number of exhibits available is the physical dimensions of the robot's workspace, which is approximately 8000 square cm.

Each exhibit has an entry in the current exhibits file from which the CGI script extracts various information. This file holds the number of exhibits along with additional information about each exhibit, such as the robot's location for entry into the exhibit and the physical bounding volume available for browsing it. The bounding volume is described by limits set on the length, width, and zoom for each object, along with limitations on the amount of roll and pitch permitted. If, while browsing an exhibit, a user makes a navigation request that would move the robot out of the legal boundary for that exhibit, an alert page is presented with a description of the illegal motion and help on how to continue browsing.
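The legality check for a navigation request might look like the following sketch, with the field names invented for illustration.

```python
# Sketch of the per-exhibit bounding-volume check: a navigation request
# outside the legal limits triggers the alert page instead of motion.
# The pose tuple and bounds field names are assumptions.
def within_bounds(pose, exhibit):
    """Return True if (x, y, zoom, roll, pitch) lies in the exhibit volume."""
    x, y, zoom, roll, pitch = pose
    b = exhibit["bounds"]
    return (b["x"][0] <= x <= b["x"][1]
            and b["y"][0] <= y <= b["y"][1]
            and b["zoom"][0] <= zoom <= b["zoom"][1]
            and abs(roll) <= b["max_roll"]
            and abs(pitch) <= b["max_pitch"])
```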

A unique directory name for each exhibit is also contained in the current exhibits file. This directory contains an introduction HTML file used to describe the exhibit when users request the list of current exhibits, a description HTML file containing additional information about the exhibit, a header HTML file to place at the beginning of each browser page, and a file containing the running dialogue and comments for the exhibit which is attached to the end of the browser page. Usage statistics corresponding to each exhibit are also located in this directory.

The result of this approach is that adding and removing exhibits is quick and easy. To add an exhibit, one places it into the robot workspace and provides the appropriate introduction, description, and header files. The addition is immediately active by simply inserting the physical location of the exhibit and its boundaries into the list of current exhibits. Removing an exhibit is accomplished by the even easier task of taking its entry out of the current exhibits list. All modifications of the current exhibits list are effective immediately.
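The paper does not specify the format of the current exhibits file, but an entry might carry fields like those sketched below; removal then reduces to deleting the entry from the list.

```python
# Hypothetical shape of one entry in the current-exhibits file. The real
# format is not given in the paper, so every field here is illustrative.
exhibit_entry = {
    "name": "Leopard Gecko",
    "directory": "gecko",             # holds intro, description, header,
                                      # and comments files for the exhibit
    "entry_pose": (10.0, 5.0, 20.0),  # robot location for entering the exhibit
    "bounds": {"x": (0, 20), "y": (0, 10), "zoom": (1, 30),
               "max_roll": 45, "max_pitch": 30},
}

def remove_exhibit(exhibits, name):
    """Removing an exhibit is just deleting its entry from the list."""
    return [e for e in exhibits if e["name"] != name]
```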

7.3 User Registration

One of our goals is to provide all WWW users unrestricted access to the browser. However, certain features of the system are more effective when reasonably accurate information is known about the user. For example, when leaving comments, it is helpful to append to the message the name of the user for identification, an e-mail address other users can use to correspond privately, and perhaps a pointer to a home page so viewers can familiarize themselves with that individual. Allowing users to enter all of this information manually for each comment is not only tedious but problematic; there is little preventing a user from assuming the identity of another user or anonymously dumping pages of garbage text into the commentary. Therefore, we developed a method for users to register themselves by providing a name, an e-mail address, and an optional home page pointer. A password is mailed back to them to be used for identifying themselves on subsequent visits. This request for information is not intended as a violation of a user's privacy, nor is the information sold or given out. More importantly, this does not violate our goal of unrestricted access, since anyone can become a member.
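The registration flow can be sketched as below; the password generation and in-memory user table stand in for the actual mail-back mechanism, which the paper does not detail.

```python
# Sketch of the registration flow: a password is generated (and, in the
# real system, mailed back), then checked on later visits. The user table
# and token length are assumptions.
import secrets

users = {}   # e-mail -> record, in place of the real user database

def register(name, email, home_page=None):
    """Create a user record and return the password to be mailed back."""
    password = secrets.token_hex(4)
    users[email] = {"name": name, "password": password, "home": home_page}
    return password

def authenticate(email, password):
    """Check a returning user's e-mail/password pair."""
    return email in users and users[email]["password"] == password
```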

Registered users gain a few additional privileges. When navigating the robot, they are provided the roll and pitch control tools shown in Figure 6. These two tools permit full control of all robot axes. For non-registered users these tools are replaced with the simplified zoom in and zoom out buttons to guide the robot as shown in Figure 5. Also, only registered users are permitted to leave comments about the various exhibits.

8 Navigational Tools

After receiving a remote environment browser page, a user may wish to modify the vantage point of the exhibit and obtain a new image. This modification takes place by employing one or more of the navigational tools presented to the user from the active browser navigation page shown in Figure 2. These tools are used often since they allow for motions which provide the user with the sensation of browsing or exploring a remote space.

8.1 Scrolling

The captured image is image-mapped and provides the interface to scroll the camera. Scrolling moves the camera, in the plane normal to the viewing direction, toward the point selected relative to the center of the image. The distance of the selection from the center of the image determines the magnitude of motion in that direction. For example, selections on the outer border of an image move the camera a predefined maximum step in that direction, while points closer to the center of the image result in smaller fractions of that motion. This allows users to perform both coarse and fine motions. The maximum step is a function of the height of the camera above the exhibition floor, providing the proper scaling of scroll motions when the camera is closely zoomed into an object. For most exhibits the result is that selections in the image are brought to the center of the image. However, this is difficult to guarantee since we have exhibits with a wide variety of different heights and sizes.
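The scroll computation above can be sketched as follows; the linear scaling of the maximum step with camera height is an assumption, since the paper does not give the exact function.

```python
# Sketch of the scroll computation: the click's offset from the image
# center gives a direction and a fraction of the maximum step, and the
# maximum step shrinks as the camera zooms in. The scaling law is assumed.
IMG_W, IMG_H = 360, 240            # captured image size from the paper

def scroll_delta(click_x, click_y, camera_height, step_per_cm=0.5):
    """Return (dx, dy) camera motion for a click at (click_x, click_y)."""
    fx = (click_x - IMG_W / 2) / (IMG_W / 2)   # -1 .. 1 across the image
    fy = (click_y - IMG_H / 2) / (IMG_H / 2)
    max_step = step_per_cm * camera_height     # step scales with height
    return fx * max_step, fy * max_step
```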

Figure 3: The location status indicator tool before and after scrolling

Large macro-motions can be performed by selecting a new location within the exhibit space using the location status indicator tool shown in Figure 3. This indicator gives the location relative to the boundaries of the particular exhibit from which the image was taken. Selecting a new location within the indicator causes the next image to be delivered from that new vantage.

8.2 Zooming

Every exhibit allows a user to zoom closer to an object for more detailed inspection, as well as to zoom out to achieve a wide-angle view. Zooming is accomplished through the zoom status indicator tool located on the right side of the image and shown in Figure 4. The camera mimics the motions of the thermometer indicator, raising and lowering itself above the exhibit. An imagemap on this tool allows transformation of user selections directly to zoom levels. Like the location status indicator tool, this tool is re-scaled for each exhibit based on the zoom boundaries set for that exhibit.

Figure 4: The zoom status indicator tool

Non-registered users are provided with two additional simplified zoom icons in place of the roll/pitch tools. These icons allow for easy control of image zoom and are shown in Figure 5. Selections anywhere within the two icons zoom the image inward or outward, respectively.

Figure 5: Simplified zoom control tool for non-registered users

8.3 Rolling and Pitching

Rolling and pitching the camera are more complex actions and certainly a challenge to implement from within a two-dimensional HTML document. Therefore, these more advanced features are provided only to registered users (See Section 7.3). Although this may sound contrary to our goal of providing unrestricted global access to all users, we remind the reader that anyone may become a registered user.

The roll and pitch tools are composed of semi-arcs with attached pointers. Choosing a point on an arc causes the camera to roll or pitch accordingly and delivers the resulting image. The current roll and pitch values are displayed within the tool itself by relocating the corresponding pointers, as shown in Figure 6.

Figure 6: Two different views of the roll and pitch tool

When rolling or pitching, focus is maintained on the surface point at the center of the image. One can imagine the camera walking along the surface of a hemisphere whose center remains fixed at that point.
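The hemisphere analogy corresponds to placing the camera on a sphere of fixed radius about the focus point, always aimed back at the sphere's center. A geometric sketch of that pose computation, with an angle convention of our own choosing (the paper does not give one):

```python
import math

def orbit_pose(focus, radius, pitch, roll):
    """Place the camera on a hemisphere of the given radius centered on
    the focus point, returning its position and a unit view vector that
    always points back at the focus (the fixed sphere center).

    pitch and roll are in radians; (0, 0) puts the camera directly
    above the focus, looking straight down.
    """
    fx, fy, fz = focus
    # Spherical offset of the camera from the focus point.
    ox = radius * math.sin(pitch) * math.cos(roll)
    oy = radius * math.sin(pitch) * math.sin(roll)
    oz = radius * math.cos(pitch)
    position = (fx + ox, fy + oy, fz + oz)
    # View direction: from the camera back toward the sphere center.
    view = (-ox / radius, -oy / radius, -oz / radius)
    return position, view
```

Rolling or pitching then changes only the two angles; the focus point, and hence the object at the image center, stays fixed.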

8.4 Reloading

Often a user may be observing a dynamic event. To receive an updated image taken from the same position, the user selects the reload button provided within the document. The result is an updated image taken from the exact same camera pose. For example, a remote user observing a Leopard Gecko lizard consuming a meal may choose to reload during the sequence. A more suitable model for transmitting dynamic events is discussed in the next section.

9 Real-time Audio and Video

Future remote browsing systems will allow navigation of remote spaces with real-time video and audio feedback. We wanted to begin preliminary research in this arena by allowing users to receive real-time video and audio from the remote browser without compromising our goal of providing remote environment browsing without special equipment requirements. Multicasting, a form of intelligently routed real-time broadcasting, is currently provided using protocols developed for the multicast backbone (MBONE) of the Internet [30] [31] [33] [34].

Essentially, the MBONE allows broadcasting to multiple sites on the Internet, providing a mechanism for real-time communication over wide areas, such as the entire world. This is possible because IP multicast delivers a single packet stream to arbitrarily many receivers, with multicast-aware routers linked by tunnels across the existing unicast Internet. The MBONE has been developed only over the last few years, but it is poised to become standard practice for delivering real-time multimedia on the Internet.

Even more promising for the universal adoption of real-time audio and video tools is the CU-SeeMe [35] software developed for Macintosh and PC systems. Using this software, anyone with a personal computer and a Serial Line Internet Protocol (SLIP) or Point-to-Point Protocol (PPP) connection can connect to MBONE reflector sites and receive live video and audio feeds. Even home users, connected at 14.4 kb/s, can receive adequate 16-shade greyscale images at roughly one frame per second.
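A back-of-the-envelope calculation shows why compression is essential at this link rate. The 160x120 frame size is our assumption for illustration; the paper states only the shade count and the link speed:

```python
# Rough bandwidth estimate for a 16-shade greyscale video frame over
# a 14.4 kb/s modem link (frame resolution of 160x120 is assumed).
bits_per_pixel = 4                        # 16 grey shades = 2^4 levels
frame_bits = 160 * 120 * bits_per_pixel   # 76,800 bits per raw frame
link_bps = 14400                          # modem line rate

raw_seconds = frame_bits / link_bps       # time to send one raw frame
# At ~5.3 s per uncompressed frame, roughly 5:1 compression (or
# interframe differencing) is needed to approach one frame per second.
```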

In our remote browser we have set up two Connectix QuickCam [36] cameras, transmitting 16-shade greyscale images to a reflector site at Berkeley and out onto the MBONE. Due to bandwidth limitations, these transmissions are made only at specific announced times. One camera is mounted at the end of the last link of the robot, giving a view of the exhibits as they are browsed by remote WWW users. The second is fixed in the room and gives a view of the entire industrial robot and the exhibition table, including all of the exhibits. Plans are also in place to intermittently feed real-time audio so that users can listen to the sounds of the industrial robot or any creatures in the exhibit. All control is still carried out via the WWW interface.

10 Future Ideas and Discussion

There are many modifications, improvements, and additions that could be made to the present browser. We discuss a few of the more relevant ones here. Actual robot interaction with the exhibit should be possible. Currently, the risk of damaging the exhibits outweighs the benefits of implementing such a tool.

Mounting one or more additional cameras onto the robot opens up several possibilities. 3D stereo image pairs could be delivered to the user, while scientists could benefit from additional high-resolution cameras for research applications. A camera pair would also be able to provide depth information to the user and aid in focusing. Besides extra cameras, a selection of colored lights that the user could choose from when viewing would be a useful tool. Encoding unique views of objects as links would allow experts and individuals to design tours of collections of exhibits, pointing out features and details of interest.
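The depth information a stereo camera pair could provide follows the standard triangulation relation Z = fB/d: depth is the focal length times the baseline between the cameras, divided by the disparity of a feature between the two images. A minimal sketch (the function and units are illustrative, not part of the system described):

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Standard stereo relation Z = f * B / d.

    focal_px     -- focal length expressed in pixels
    baseline_mm  -- separation between the two cameras, in millimeters
    disparity_px -- pixel shift of a feature between the two images
    Returns depth in millimeters.
    """
    if disparity_px <= 0:
        raise ValueError("feature must have positive disparity")
    return focal_px * baseline_mm / disparity_px
```

Note the reciprocal relationship: nearby objects produce large disparities and are measured accurately, while distant ones yield disparities near zero and correspondingly coarse depth estimates.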

The adoption of a new dynamic, extensible WWW browser called HotJava [37] will allow even richer interaction both among remote users and with mechanical devices. HotJava can run executable content in the form of applets -- Java programs that can be included in an HTML page, much as images are. When a user views a page that contains an applet, the applet's code is transferred to the user's system and executed by the browser. This means that applets could be written to open a connection back to the machine controlling the actual mechanical device. Control signals to the devices could be generated and sent directly by the applet, while a separate thread handles receiving and updating a continuously changing inline live image fed back from the remote site.
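The shape of such a control connection can be sketched as below. We use Python rather than Java for consistency with the other sketches in this paper's examples, and the wire protocol, host names, and message format are entirely hypothetical; the paper defines no such protocol.

```python
import socket

def encode_command(axis, value):
    """Frame one control command as a simple ASCII line, e.g. 'PITCH 0.10'.
    This framing is purely illustrative."""
    return f"{axis.upper()} {value:.2f}\n".encode("ascii")

def send_command(host, port, axis, value):
    """Open a connection back to the robot-control host, as an applet
    could, send one framed command, and return the acknowledgement.
    (Illustrative sketch; a real applet would keep the socket open and
    run a second thread to receive the live image stream.)"""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(encode_command(axis, value))
        return sock.recv(1024)
```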

Using a similar interface, users could control other robotic systems, allowing browsing of much larger objects as well as microscopic ones. We are currently designing a helium-blimp-based mobile robotic environment browser for exploring larger spaces. Browsing and co-habitation of a smaller-scale space filled with various reptiles is also in progress.

11 Acknowledgments

Many people were inspirational in helping with ideas, suggestions, comments, and feedback about Mechanical Gaze during its development: Ken Goldberg, Natalie K. Munn, Jim Beach, Jeff Wendlandt, Shankar Sastry, Ferenc Kovac, Robert Guralnick, Zane Vella, Mark Pauline, Christian Ristow, Mark Cox, David Pescovitz, and Tho Nguyen.


Tim Berners-Lee, Robert Cailliau, Jean-Francois Groff, and Bernd Pollerman. World-wide web: The information universe. Electronic Networking: Research, Applications and Policy, 1(2), Westport CT, Spring 1992.
Tamara Munzner, Paul Burchard, and Ed Chi. Visualization through the world wide web with geomview, cyberview, w3kit, and weboogl. In WWW 2 Conference Proceedings, 1994.
Sandy Ressler. Approaches using virtual environments with mosaic. In WWW 2 Conference Proceedings, 1994.
David Gelernter. Mirror Worlds. Oxford University Press, 1992.
H. John Durrett. Color and the computer. Academic Press, 1987.
C.A. Lynch. The technologies of electronic imaging. Journal of the American Society for Information Science, pages 578-585, September 1991.
M. Ester. Image quality and viewer perception. In SIGGRAPH 1990 art show, pages 51-63, August 1990.
J.L. Kirsch and R.A. Kirsch. Storing art images in intelligent computers. In Leonardo, volume 23, pages 99-106, 1990.
Raymond Goertz and R. Thompson. Electronically controlled manipulator. Nucleonics, 1954.
A.E.R. Greaves. State of the art in nuclear telerobotics: focus on the man/machine connection. In Transactions of the American Nuclear Society, 1994.
R. D. Ballard. A long last look at Titanic. National Geographic, December 1986.
C. Ntuen, E. Park, and S. Kimm. A blackboard architecture for human-machine interface in mining teleoperation. In Human Computer Interaction, 1993.
C.R. Weisbin and D. Lavery. Nasa rover and telerobotics technology program. In IEEE Conference on Robotics and Automation Magazine, 1994.
P.S. Green, J.W. Hill, J.F. Jensen, and A. Shah. Telepresence surgery. In IEEE Engineering in Medicine and Biology Magazine, 1995.
J.V. Draper. Teleoperators for advanced manufacturing: applications and human factors challenges. In International Journal of Human Factors in Manufacturing, 1995.
R. S. Mosher. Industrial manipulators. Scientific American, 211(4), 1964.
R. Tomovic. On man-machine control. Automatica, 5, 1969.
Hans Moravec. Mind Children: The Future of Robot and Human Intelligence. Harvard University Press, 1988.
K. Goldberg, M. Mascha, S. Gentner, N. Rothenberg, C. Sutter, and Jeff Wiegley. Robot teleoperation via www. In International Conference on Robotics and Automation. IEEE, May 1995.
Steve Deering. Mbone: The multicast backbone. In CERFnet Seminar, March 1993.
H. Schulzrinne and S. Casner. RTP: A transport protocol for real-time applications. In Internet Engineering Task Force, October 1993.
M. R. Macedonia and D. P. Brutzman. Mbone provides audio and video across the internet. In IEEE Computer, volume 27, pages 30-36, April 1994.
S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C. Liu, and L. Wei. Protocol independent multicasting (PIM): Protocol specification. In IETF Network Working Draft, 1995.

About the Authors

Eric Paulos

John Canny

Department of Electrical Engineering and Computer Science
University of California
Berkeley, CA 94720-1776