Wirelessly Controlling a Robot with Speech Commands – UCLA Capstone – Part I

Woohoo! Only 3 months left before I finish my CS&E degree at UCLA!

I’ve been thinking a lot recently about what I learned over the last two-and-a-half years, and I think my degree capstone sums it up quite nicely. In this post, I describe how I used two FPGA boards, peripherals such as Bluetooth and microphone modules, TensorFlow, and the “iRobot” system to build a speech-controlled robot that communicates wirelessly.

Note: This is the first of three articles. Part I introduces the project and lists its hardware/software requirements and constraints. Part II is all about the architecture design (including how to configure MicroBlaze, the AXI peripheral bus, and more). Finally, Part III discusses how we used the DeepSpeech model on a laptop acting as a “co-processor,” followed by a wrap-up and reflection on lessons learned.


To start things off, I should mention that my capstone project was actually the final project of a 10-week class (CS152B). The first three projects covered (relatively) basic concepts such as FPGA design/development, finite state machines, FPGA peripherals, and the MicroBlaze soft-processor. MicroBlaze was the only topic I had not been exposed to before this class, which is why I will spend a bit of time explaining how it works, what difficulties I had with it, and how things ultimately turned out. I will also briefly describe the iRobot Create, a hobbyist version of iRobot’s Roomba cleaning robot that comes with a serial command API for actuating the robot and scanning/mapping a room using various sensor modules.

The basic idea is illustrated below. One Basys3 FPGA board listens for speech commands. When a command is detected, it transmits that command via Bluetooth to the second Basys3 board, which then translates it into byte-level iRobot commands and sends them to the robot over UART.

Figure 1: System Overview
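The last hop in this chain, turning a command into bytes the robot understands, is easy to illustrate. The sketch below packs a Drive command, assuming the iRobot Create Open Interface opcodes (Start = 128, Safe = 131, Drive = 137) and its convention of a 16-bit signed velocity in mm/s and a 16-bit signed turning radius in mm, each sent high byte first. The helper names are hypothetical; this is not the code from our project.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcodes assumed from the iRobot Create Open Interface specification. */
#define OI_START 128u
#define OI_SAFE  131u
#define OI_DRIVE 137u

/* Special radius values for the Drive command (per the OI spec). */
#define OI_RADIUS_STRAIGHT INT16_MIN      /* 0x8000: drive straight */
#define OI_RADIUS_SPIN_CW  ((int16_t)-1)  /* turn in place clockwise */
#define OI_RADIUS_SPIN_CCW ((int16_t)1)   /* turn in place counter-clockwise */

/* Hypothetical helper: Start puts the robot in Passive mode; Safe mode is
 * needed before it will accept motion commands. Writes 2 bytes. */
static size_t oi_pack_start_safe(uint8_t out[2])
{
    out[0] = OI_START;
    out[1] = OI_SAFE;
    return 2;
}

/* Hypothetical helper: pack a Drive command into 5 bytes.
 * velocity: mm/s (the OI accepts -500..500); radius: mm.
 * Both 16-bit fields are sent high byte first. */
static size_t oi_pack_drive(uint8_t out[5], int16_t velocity, int16_t radius)
{
    uint16_t v = (uint16_t)velocity; /* two's-complement bit pattern */
    uint16_t r = (uint16_t)radius;
    out[0] = OI_DRIVE;
    out[1] = (uint8_t)(v >> 8);      /* velocity high byte */
    out[2] = (uint8_t)(v & 0xFFu);   /* velocity low byte */
    out[3] = (uint8_t)(r >> 8);      /* radius high byte */
    out[4] = (uint8_t)(r & 0xFFu);   /* radius low byte */
    return 5;
}
```

For example, `oi_pack_drive(buf, 200, OI_RADIUS_STRAIGHT)` yields the bytes 137, 0x00, 0xC8, 0x80, 0x00 (drive straight at 200 mm/s). On the robot-side board, these bytes would then go out through the PMOD RS232 using whatever UART send routine the design provides.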

One thing I learned early during my time at JPL is that good engineering starts with good requirements. So before diving into the details of our system, let’s put on our engineering hats and enumerate everything needed to make this project possible. It should be noted that my CS152B team and I built the project around the iRobot, since it was one of the few interesting devices available to us at the time. We also wanted to make the whole thing wireless, since there’s nothing fun about controlling a robot that can’t move more than 3 feet away from a computer. This meant we would need two FPGA boards: one to handle speech commands and the other to physically control the robot.

Hardware Requirements

Since we were using two Basys3 boards, it made sense to use off-the-shelf modules with existing libraries and drivers. The most practical option we found was Digilent’s PMOD modules. In particular, we used 2x PMOD BT2 modules for Bluetooth, a PMOD Mic3 for speech, and a PMOD RS232 for issuing iRobot commands. The BT2 module exposes a UART interface, so for all intents and purposes we could treat Bluetooth communication between the two FPGAs as a simple UART connection. Setting the JP2 jumper on both BT2s causes them to auto-pair if they are powered on within a few seconds of each other, which made the wireless aspect extremely simple.
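Because the BT2 link behaves like a UART, the two boards only need to agree on what the bytes mean. A minimal sketch, assuming a hypothetical protocol in which each recognized word is sent as a single ASCII byte; the command set and the function names are illustrative, not the ones from our project.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical one-byte command codes sent over the BT2 UART link. */
typedef enum {
    CMD_NONE    = 0,
    CMD_FORWARD = 'F',
    CMD_BACK    = 'B',
    CMD_LEFT    = 'L',
    CMD_RIGHT   = 'R',
    CMD_STOP    = 'S'
} bt_cmd_t;

/* Speech-board side: map a recognized word to a command byte. */
static bt_cmd_t bt_encode_word(const char *word)
{
    static const struct { const char *word; bt_cmd_t cmd; } table[] = {
        { "forward", CMD_FORWARD }, { "back", CMD_BACK },
        { "left", CMD_LEFT }, { "right", CMD_RIGHT }, { "stop", CMD_STOP },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(word, table[i].word) == 0)
            return table[i].cmd;
    }
    return CMD_NONE; /* unknown word: nothing to send */
}

/* Robot-board side: validate a byte received from the link. */
static bt_cmd_t bt_decode_byte(uint8_t b)
{
    switch (b) {
    case CMD_FORWARD: case CMD_BACK: case CMD_LEFT:
    case CMD_RIGHT:   case CMD_STOP:
        return (bt_cmd_t)b;
    default:
        return CMD_NONE; /* ignore noise or unknown bytes */
    }
}
```

Keeping every command to one byte means the receiver never has to reassemble a multi-byte frame, and any byte outside the known set can simply be dropped.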

The Mic3, on the other hand, required a custom SPI controller. For that, we had to integrate a Digilent reference module into our MicroBlaze design and provide it with a 50MHz reference clock plus start and reset signals, all of which were implemented in C and will be explained later. Suffice it to say, the microphone was actually harder to set up than the Bluetooth link, which I found a bit surprising.

The iRobot interface cable required either an RS232 or a USB connection. Since we did not have a spare USB module, we opted for RS232. Finally, we needed a laptop to run our TensorFlow model on the audio captured by the mic. We used a spare laptop running Ubuntu 20.04 with 8GB of RAM, a slightly outdated Intel i5 CPU, and no GPU. Here is a list of the hardware we ended up using:

  • 2x Digilent Basys3 FPGA boards
  • 2x PMOD BT2 (Bluetooth)
  • 1x PMOD Mic3 (microphone)
  • 1x PMOD RS232 (iRobot interface)
  • iRobot Create
  • Laptop running Ubuntu 20.04 (8GB RAM, Intel i5, no GPU)
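Returning to the Mic3: whatever the SPI controller looks like, the driver eventually has to turn raw ADC words into the signed 16-bit samples a speech model consumes. A minimal sketch, assuming the Mic3’s ADC delivers a 12-bit unsigned sample in the low 12 bits of each 16-bit SPI word and that silence sits at mid-scale (2048). The function names are hypothetical and this is not our actual driver.

```c
#include <stddef.h>
#include <stdint.h>

#define MIC3_ADC_BITS 12
#define MIC3_MIDSCALE (1 << (MIC3_ADC_BITS - 1)) /* 2048: assumed silence level */

/* Hypothetical helper: convert one raw 16-bit SPI word from the Mic3's ADC
 * (assumed: 12-bit unsigned sample in the low bits) to signed 16-bit PCM. */
static int16_t mic3_to_pcm16(uint16_t raw)
{
    int32_t sample = (int32_t)(raw & 0x0FFFu) - MIC3_MIDSCALE; /* -2048..2047 */
    return (int16_t)(sample * 16); /* scale the 12-bit range up to 16 bits */
}

/* Convert a buffer of n raw SPI words into PCM samples. */
static void mic3_convert(const uint16_t *raw, int16_t *pcm, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pcm[i] = mic3_to_pcm16(raw[i]);
}
```

Multiplying by 16 rather than left-shifting avoids shifting a negative value, which is undefined behavior in C; the result spans -32768 to 32752.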

The diagram in Figure 2 shows the hardware components and how they are connected to each other. Note that we did not have enough time to implement bi-directional communication between the two FPGAs, even though we would have liked to (since the iRobot has many different sensors and signals).

Figure 2: Hardware/Communication Diagram

Software Requirements

In order to facilitate speech recognition, we opted to use the publicly available DeepSpeech model, which is built on TensorFlow and can be driven from Python. DeepSpeech was our only external dependency. The majority of our software was either developed from scratch or built on top of the libraries and drivers included with the PMODs and MicroBlaze. We created our own PMOD Mic3 driver to make sampling easier, and wrote custom wrappers around the low-level UartLite APIs provided by Xilinx. The software portion will be explained in depth in Parts II and III.
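One reason to wrap the UartLite calls is partial writes. Xilinx’s `XUartLite_Send` is non-blocking: in polled mode it only queues as many bytes as the transmit FIFO can hold and returns the count it accepted, so callers must retry. A minimal sketch of a “send everything” wrapper, assuming a send routine with that behavior; to keep it runnable on a desktop, the hardware call sits behind a hypothetical function pointer, and `fake_send` is a stand-in that mimics a 16-byte FIFO. On MicroBlaze you would pass a thin adapter around `XUartLite_Send` instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature mirroring XUartLite_Send: returns bytes accepted. */
typedef unsigned int (*uart_send_fn)(void *ctx, uint8_t *buf, unsigned int n);

/* Keep calling the low-level send until every byte is accepted.
 * Gives up after max_idle consecutive calls that accept nothing,
 * and returns the number of bytes actually sent. */
static size_t uart_send_all(uart_send_fn send, void *ctx,
                            uint8_t *buf, size_t len, unsigned int max_idle)
{
    size_t sent = 0;
    unsigned int idle = 0;
    while (sent < len) {
        unsigned int n = send(ctx, buf + sent, (unsigned int)(len - sent));
        if (n == 0) {
            if (++idle >= max_idle)
                break; /* transmitter appears stuck */
            continue;
        }
        idle = 0;
        sent += n;
    }
    return sent;
}

/* Desktop stand-in for the hardware: accepts at most 16 bytes per call,
 * like a UART Lite transmit FIFO, and records them in a capture buffer. */
typedef struct { uint8_t data[256]; size_t used; } fake_uart_t;

static unsigned int fake_send(void *ctx, uint8_t *buf, unsigned int n)
{
    fake_uart_t *u = ctx;
    unsigned int take = n < 16 ? n : 16;
    if (u->used + take > sizeof u->data)
        take = (unsigned int)(sizeof u->data - u->used);
    for (unsigned int i = 0; i < take; i++)
        u->data[u->used + i] = buf[i];
    u->used += take;
    return take;
}
```

The idle limit keeps a wedged transmitter from hanging the main loop forever; a real driver might instead poll `XUartLite_IsSending` or use the interrupt-driven mode.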

MicroBlaze Soft-Processor

Working directly in Verilog and programming at the logic level can be a very gratifying experience. However, the sheer complexity of this project made using only Verilog rather impractical. As mentioned earlier, we opted to use the Xilinx MicroBlaze soft-processor, a RISC CPU core from Xilinx that can be synthesized and implemented directly in the fabric of our FPGAs. Using MicroBlaze meant we could also use high-level programming languages like C/C++ to control the robot, sample the microphone, and communicate back and forth between the FPGA boards. This made things simpler at the software level, but it also introduced new complications that ended up consuming most of our development time. The following are some of the issues we had to resolve before the whole system would work properly:

  • How to generate correct control signals for the microphone in C
  • How to sample the mic without exhausting memory or introducing significant lag between speech command and robot execution
    • 16 bits per audio sample at 16,000Hz = 32KB/s
    • Basys3 has 128KB RAM, minus the memory reserved for instructions, memory-mapped I/O, etc.
  • How to debug a system at both the hardware and software level
  • How to connect peripherals to the MicroBlaze AXI-bus system
  • How to facilitate/utilize memory-mapped I/O with our peripherals
  • How to run DeepSpeech on a laptop connected through a low-bandwidth UART link
    • UART at 115200 baud with 8 data bits, 1 stop bit, and no parity (10 bits per byte, counting the start bit) gives a theoretical maximum transfer rate of 11.52KB/s
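The memory and bandwidth figures in the list above can be worked through explicitly. Assuming the 128KB figure means 128 × 1024 bytes, ignoring the memory taken by code and I/O (so the buffer time is an upper bound), and assuming 8 data bits per UART frame:

```latex
\begin{aligned}
R_{\text{audio}} &= 16\ \tfrac{\text{bits}}{\text{sample}} \times 16{,}000\ \tfrac{\text{samples}}{\text{s}}
  = 256{,}000\ \tfrac{\text{bits}}{\text{s}} = 32{,}000\ \tfrac{\text{B}}{\text{s}} \\
T_{\text{buffer}} &\le \frac{128 \times 1024\ \text{B}}{32{,}000\ \text{B/s}} \approx 4.1\ \text{s} \\
R_{\text{UART}} &= \frac{115{,}200\ \text{bits/s}}{(1 + 8 + 1)\ \text{bits/byte}} = 11{,}520\ \text{B/s} \\
\frac{R_{\text{audio}}}{R_{\text{UART}}} &= \frac{32{,}000}{11{,}520} \approx 2.8
\end{aligned}
```

In other words, even with all of that memory devoted to audio, the board could hold only about four seconds of speech, and streaming raw audio over the UART would take roughly 2.8 seconds for every second of speech.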

As we shall see, some of these problems required elaborate solutions, whereas others demanded a more brute-force, trial-and-error approach. These solutions will be detailed in Part II of this series, so stay tuned!
