
Hacking a skeleton to detect kids costumes and greet them with a custom spooky message

How I rebuilt an animatronic skeleton's brain with an LLM and how you can too
This Halloween, I went as a Mad Scientist, and being an AI evangelist, I thought it would be fitting to do something with AI to "complete my costume".
The idea of building an AI "something" for Halloween has been on my mind for a while. I'm very happy with the results, but most importantly, happy that the kids who yelled "trick or treat" finally got a... trick!
Today, I'll be walking you through how I took an animatronic skeleton's brain apart and replaced it with an LLM brain that could detect trick-or-treaters' costumes and give them a custom, creepy spoken greeting at our doorstep.


Before we jump in, please note the code can be run on a Mac without needing a Raspberry Pi (or a Halloween toy), making it easy to experiment with the setup before committing to the full build. Give it a try: clone this repo on GitHub, install the requirements, run main.py, and then just say "trick or treat!"
💡
Here's how I built it (and how you can too for next year, or maybe an AI Santa for Christmas?)

Part 1: Gathering your ingredients (the hardware and software shopping lists)

This project has both hardware and software components. Let's start with what you'll need to buy or find lying around your maker space:

Hardware

  • Raspberry Pi: Any model should suffice, but I used a Pi 4 for its extra processing power. (If you have experience with an Arduino, that should work fine too.)
  • USB webcam: For capturing the approaching trick-or-treaters.
  • Animatronic Halloween toy: Crucially, this needs a movable mouth and light-up eyes. I used this specific talking skeleton from Home Depot, but they have since sold out. This talking skeleton skull (and its "brain") is exactly the same, so it should work as a replacement. It's important to have a spring-loaded jaw to make the greeting a bit spookier.
  • Bluetooth speaker: Until I figure out the amplifier situation, a Bluetooth speaker is the easiest way to handle audio output.
  • MOSFETs or transistors: For fine-grained motor control.
  • Electronics Kit: Jumper wires (male-to-male, female-to-female), a screwdriver, wire strippers (very important), solder and soldering iron (optional, but cleaner), resistors (absolutely essential – learned this the hard way after burning out a few LEDs. Learn from my mistakes.)
If you're like me and have been doing software most of your life, hardware can be surprisingly fun, but it comes with its own set of challenges. Things like driving the LEDs, getting the Raspberry Pi to boot onto a specific WiFi network, or finding the right GPIO pins took some trial and error, but they were really fun to figure out.
💡
Here's the skeleton I used:
Before the surgery
And here's what my workstation looked like deep into the third day of the project:
Bless this mess

Software

  • Python 3.8+: The project's code is written in Python.
  • Required libraries: Install these using pip install -r requirements.txt once you've cloned the repo. The requirements file includes: fastapi, google-generativeai, python-dotenv, weave, Pillow, pvporcupine, pvrecorder, pygame, rpi-lgpio (only if you're using a Raspberry Pi), elevenlabs, and a few more supporting packages.
  • APIs and services:
    • Google Gemini Vision API: For image recognition and generating prompts for the text-to-speech engine. Gemini Flash's speed and cost-effectiveness are ideal for this real-time application. You'll need a Google Cloud account and API key (the easiest way to get one is via Google AI Studio).
    • Text-to-Speech (TTS): While the code supports Cartesia and ElevenLabs, I now recommend ElevenLabs, particularly for its voice creation feature. This allows you to craft highly specific, spooky voices. You’ll need an ElevenLabs API key.
    • Picovoice Porcupine: This one's for wake word detection. It’s free for personal use and enables custom wake word training. I opted for "trick or treat." You can get an API key here.
    • W&B Weave: Weave is used throughout this application for observability, so grab your API key here.
  • ChatGPT (optional, but highly recommended): ChatGPT (especially with the advanced voice mode) proved an invaluable tool during development. I used it extensively for generating code, troubleshooting, and testing different approaches.
Now that you know what we need, let's do some brain surgery.

Part 2: Wiring up our skeleton

Dissection

Carefully open your animatronic toy. Identify the wires connected to the LEDs (eyes) and the DC motor that controls the mouth movement. Most of these toys will use a standard two-wire setup for each component. Note the wires that come from the toy’s internal battery pack—we'll use it as auxiliary power to avoid overloading the Raspberry Pi.
We can improve your brain, my friend

Raspberry Pi connections

Consult a pinout diagram for your Raspberry Pi model to correctly identify GPIO pins for power, ground, and signal control. Here's a rough diagram of the connections required.


Lighting Up the Eyes

We want our eyes to light up and our skeleton's mouth to move while it talks. Let's start with what you'll need to consider for the eyes:
  • Resistors: Connect a resistor in series with the longer, positive leg of each LED. This is crucial to prevent LED burnout. Choose an appropriate resistor value based on your LEDs and power supply (experiment if necessary, and test on a breadboard if you want to still have working lights after burning out the onboard ones. Not that I did that. Not at all).
  • MOSFETs: The positive leg of each LED (after the resistor) connects to the output pin of a MOSFET. The MOSFET acts as a switch, controlling power to the LED.
  • Ground: Connect the LED's shorter leg directly to the ground rail.
  • Signal: The MOSFET's input pin connects to a GPIO pin on the Raspberry Pi. By setting the GPIO pin HIGH or LOW in your code, you'll switch the MOSFET and thus the LED on or off.
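To make that concrete, here's a minimal sketch of what switching the eyes on and off could look like in Python with RPi.GPIO. The pin number is a placeholder for whichever GPIO pin you wired the MOSFET gate to, and this is an illustration rather than the exact skeleton_control.py code:
import time
import RPi.GPIO as GPIO  # provided by rpi-lgpio on the Pi

EYES_PIN = 17  # hypothetical BCM pin driving the MOSFET gate

GPIO.setmode(GPIO.BCM)
GPIO.setup(EYES_PIN, GPIO.OUT)

def eyes_on():
    GPIO.output(EYES_PIN, GPIO.HIGH)  # MOSFET conducts, eyes light up

def eyes_off():
    GPIO.output(EYES_PIN, GPIO.LOW)   # MOSFET off, eyes go dark

if __name__ == "__main__":
    try:
        eyes_on()
        time.sleep(2)
        eyes_off()
    finally:
        GPIO.cleanup()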

Making the mouth move

This is trickier and involves more than just switching a motor on and off. Here's the strategy:
  • MOSFET control: Like the LEDs, connect the DC motor to a MOSFET controlled by a GPIO pin on the RPi.
  • Real-time audio analysis: The core idea is to open and close the mouth in sync with the generated speech. Since I used streaming TTS, I perform real-time spectrogram analysis on the audio chunks as they arrive.
  • Sound-driven movement: The code analyzes the audio stream’s amplitude. When it detects a vowel sound (higher amplitude), the mouth opens for a set duration (e.g., 0.15 seconds). For consonants (lower amplitude), the mouth opens briefly. Silence closes the mouth. It's a hacked-together approach, but it's surprisingly effective at creating a talking effect.
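Here's a rough sketch of that amplitude thresholding, assuming 16-bit PCM audio chunks; the threshold values and the helper name are illustrative, not the exact numbers from the repo:
import math
import struct

VOWEL_THRESHOLD = 4000   # rough RMS level for loud, vowel-like audio (assumed value)
SPEECH_THRESHOLD = 1500  # rough RMS level for quieter consonants (assumed value)

def mouth_open_duration(chunk: bytes) -> float:
    """Return how long (in seconds) to hold the mouth open for this audio chunk."""
    samples = struct.unpack(f"<{len(chunk) // 2}h", chunk)  # 16-bit little-endian PCM
    rms = math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))
    if rms > VOWEL_THRESHOLD:
        return 0.15  # open wide for vowel-like sounds
    if rms > SPEECH_THRESHOLD:
        return 0.05  # open briefly for consonants
    return 0.0       # silence: keep the mouth closed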
Some field testing with my focus group (child Pikachu)

Part 3: Coding the intelligence, or, our software set up

Here’s how to bring all those hardware components to life with AI:

Cloning the repository

First, run this:
git clone https://github.com/altryne/halloweave.git
cd halloweave

Virtual environment and dependencies

Setting up a virtual environment is best practice. I like conda, but you can use virtualenv or another venv tool. The whole idea is that these modules can conflict with one another, so don't install them globally on your machine.
Think of the environment as an empty sandbox: everything we install, along with whatever those modules need to run (their dependencies), lives inside it.
The following code creates an environment named whatever you'd like (usually .venv, though you could name it something like halloween_2024). The repo on GitHub includes these instructions as well:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Environment variables

You'll need a .env file in your project root to store API keys and other secrets. Note: never commit this file to version control. Here’s what it should contain:
GEMINI_API_KEY=your_gemini_api_key
ELEVEN_API_KEY=your_elevenlabs_api_key
PICOVOICE_ACCESS_KEY=your_picovoice_access_key
# Optional, only if using Cartesia:
# CARTESIA_API_KEY=your_cartesia_api_key
# Only needed if using Weave:
WANDB_API_KEY=your_wandb_api_key

A high-level overview of our code

Here's what everything does:

main.py (the orchestrator)

This script sets up the FastAPI server, handles requests, manages the wake word detection, image capture, and the communication with both the LLM (Gemini) and the TTS engine (ElevenLabs). It also contains the core logic for controlling the skeleton's movements and integrates all the modules.
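Here's a hedged sketch of the wake word loop at the heart of this flow, using pvporcupine and pvrecorder; the keyword file path and the commented-out handler are placeholders, not the repo's exact code:
import os

import pvporcupine
from pvrecorder import PvRecorder

porcupine = pvporcupine.create(
    access_key=os.environ["PICOVOICE_ACCESS_KEY"],
    keyword_paths=["trick_or_treat.ppn"],  # custom-trained "trick or treat" wake word
)
recorder = PvRecorder(frame_length=porcupine.frame_length, device_index=-1)
recorder.start()

print("waiting for wake word")
try:
    while True:
        pcm = recorder.read()
        if porcupine.process(pcm) >= 0:
            print("Wake word detected -- capture an image and start the greeting flow")
            # capture_image(); greet_trick_or_treater()  # hypothetical handlers
finally:
    recorder.delete()
    porcupine.delete()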

gemini.py (vision and prompting):

This module handles communication with the Gemini Vision API. It uploads the captured image and generates the text prompt that gets sent to the TTS engine and includes safety settings for the response.
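As a reference, a minimal version of that call with the google-generativeai SDK might look like this (the prompt wording and model name are my assumptions, and the real gemini.py also configures safety settings):
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

def describe_costume(image_path: str) -> str:
    image = Image.open(image_path)
    prompt = (
        "You are a spooky animatronic skeleton. Look at the costume in this "
        "photo and write a short, creepy-but-kid-friendly Halloween greeting."
    )
    response = model.generate_content([prompt, image])
    return response.text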

elevenlabs_client.py / cartesia_client.py (speech synthesis):

These modules handle streaming audio from either ElevenLabs or Cartesia and contain the logic for converting text responses into spooky speech output. You can also use word timings if the API provides them (ElevenLabs does).
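For reference, here's a hedged sketch of streaming speech from ElevenLabs' REST streaming endpoint using requests (the repo uses the elevenlabs SDK instead, and the voice ID and model are placeholders):
import os

import requests

VOICE_ID = "your_spooky_voice_id"  # the custom voice you create in the ElevenLabs UI
URL = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"

def stream_spooky_speech(text: str):
    """Yield audio chunks as they arrive, so the mouth can move in real time."""
    response = requests.post(
        URL,
        headers={"xi-api-key": os.environ["ELEVEN_API_KEY"]},
        json={"text": text, "model_id": "eleven_turbo_v2"},
        stream=True,
    )
    response.raise_for_status()
    for chunk in response.iter_content(chunk_size=4096):
        if chunk:
            yield chunk  # feed each chunk to playback and the mouth analysis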

skeleton_control.py (hardware control):

This module contains the functions for controlling the skeleton’s eyes, mouth, and body movements via the Raspberry Pi’s GPIO pins and MOSFETs. The core functions are eyes_on(), eyes_off(), start_body_movement(), stop_body_movement(), and move_mouth().
Make sure to check whether you're running on a device like a Raspberry Pi before touching the GPIO code; RPi.GPIO is needed only there. On Macs, use "pip install -r requirements_macosx.txt" to run in simulation mode.
💡
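A minimal version of that guard might look like this (the pin number and simulation messages are illustrative):
try:
    import RPi.GPIO as GPIO  # only importable on the Raspberry Pi (via rpi-lgpio)
    SIMULATION = False
except (ImportError, RuntimeError):
    GPIO = None
    SIMULATION = True

EYES_PIN = 17  # hypothetical BCM pin

if not SIMULATION:
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(EYES_PIN, GPIO.OUT)

def eyes_on():
    if SIMULATION:
        print("[sim] eyes on")  # on a Mac, just log what would have happened
    else:
        GPIO.output(EYES_PIN, GPIO.HIGH)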

camera_module.py (image capture):

Handles image capture with OpenCV.
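A minimal capture with OpenCV looks roughly like this (device index 0 and the output path are assumptions):
import cv2

def capture_image(path: str = "capture.jpg") -> str:
    cap = cv2.VideoCapture(0)  # first USB webcam
    ok, frame = cap.read()     # grab a single frame
    cap.release()
    if not ok:
        raise RuntimeError("Could not read a frame from the webcam")
    cv2.imwrite(path, frame)
    return path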

sse_manager.py (real-time updates):

Manages Server-Sent Events (SSEs) to keep the web interface updated with the latest status and captured images (this is fun to try yourself locally).
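Here's a hedged sketch of what an SSE endpoint like this can look like in FastAPI; the event queue and payload format are illustrative, not the exact sse_manager.py implementation:
import asyncio
import json
from collections import deque

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
events = deque()  # other modules append status dicts here, e.g. {"status": "wake word detected"}

async def event_stream():
    while True:
        while events:
            payload = events.popleft()
            yield f"data: {json.dumps(payload)}\n\n"
        await asyncio.sleep(1)  # the page polls roughly every second anyway

@app.get("/sse")
async def sse():
    return StreamingResponse(event_stream(), media_type="text/event-stream")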

templates/index.html (web interface):

A basic HTML file using Tailwind CSS and htmx for a control interface (on your Mac or wherever you run this). It updates every second via the /sse endpoint, and you can trigger the skeleton from it locally. There are also endpoints and a webhook for sending actions from a keyboard connected to the Raspberry Pi; those are handled via websockets. Everything else is available from a browser on a Mac, PC, etc.

Part 4: Running the show

If you're running this on a Raspberry Pi, run the halloween_bootstrap.sh script and it should "just work" (sadly, it doesn't always!). It will try to connect to Bluetooth, light up the eyes, test everything, and start the main loop. The best fix I came up with was to run it as a Linux service, so I don't have to SSH in, run commands manually, and risk getting disconnected. Here's a guide for doing just that.
Some skeletons have hair

1. Raspberry Pi

Copy the halloween.service file to the appropriate systemd directory and enable the service to start on boot:
sudo cp halloween.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable halloween.service
sudo systemctl start halloween.service
The halloween_bootstrap.sh script handles activating the virtual environment, connecting to the Bluetooth speaker, and running main.py automatically upon startup. Make it executable using:
chmod +x halloween_bootstrap.sh
Use sudo journalctl -u halloween.service to check logs. Use sudo systemctl restart halloween.service to restart.

2. Anywhere else

Run python main.py. Wait until it prints "waiting for wake word," then say "trick or treat" into your webcam's microphone.

Part 5: Weaving a web of observability (with W&B Weave)

A screenshot of my Weave dashboard for this project
Weave from Weights & Biases played a crucial role in this project's development, debugging and sanity checks. I specifically found these two things to be life-savers:
  • Multimodal logging: Because Weave now supports audio and images alongside other logging modalities, including them really makes your life easier. It let me inspect not just code executions (like regular loggers or debug logs) but the entire model flow with its inputs and outputs. Since the audio plays back via the Bluetooth speaker, my debugging question was always "is it working?" rather than "is something throwing random errors?"
  • Remote debugging: Being able to remotely access logs and debug the skeleton (located outside, away from keyboard) is a necessity. Having an observability framework becomes crucial when deploying these real-life projects in the wild. Weave’s logging system makes real-time monitoring straightforward.
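To give a sense of how little code the tracing side takes, here's a minimal sketch (the project name and greeting function are just illustrations, not the repo's code):
import weave

weave.init("halloweave")  # every traced call gets logged to this W&B project

@weave.op()
def make_greeting(costume: str) -> str:
    """Inputs and outputs of this call show up in the Weave dashboard."""
    return f"Beware, little {costume}... the skeleton sees you!"

make_greeting("Pikachu")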
Here's the public Halloween Weave dashboard where you can see what the skeleton sees and hear what he said (a.k.a. what was generated by ElevenLabs), including the spooky background music that was playing while the kiddos waited on the LLM. That wait wasn't just latency; it was, of course, crucial for the suspense!
I'm not just being a hype man, by the way. Here's a screenshot of my Slack message to the team at 4:47 (that's like 10 minutes before trick-or-treaters show up!), thanking them because Weave helped me catch a pesky last-minute camera bug.

This is the same video from above, but now that you know what went into this, it's here if you want to watch one more time.


Part 6: Some issues and thoughts for what's next

Like most projects of this nature, I ran into a few issues and a bunch of improvements I'd make next time around.
Focus group testing went quite well, thank you kindly

Mouth animation improvements

The current mouth animation is basic. Exploring more sophisticated lip-sync techniques, or even using a servo motor for finer control, could make the movements more realistic. I spent more time on this than almost any other part: analyzing the audio as it streamed in, chunk by chunk, and figuring out how wide to open the mouth. Since the motor is a simple DC motor, precise control is impossible, so I went with the best approximation and iterated a couple of times.

Internal Speaker Integration

Eliminating the Bluetooth speaker is a major priority. This is because Bluetooth sucks.
Really, though: the Bluetooth speakers I had kept shutting off at the most uncomfortable times. I ended up plugging in an Alexa device on the actual day and connecting to that over Bluetooth. Connecting to the onboard speaker could work, but it involves reverse-engineering the toy’s audio circuit, finding the amplifier chip and its wiring, and then either soldering to it or doing surgery on its internals and soldering all those wires into the correct positions. The onboard speaker will probably be too quiet anyway.

Wake word and control issues

At first (as you can see in the video), I found out that the wake word mechanism doesn't work on kids' voices! I don't know if it's the pitch or just the lack of kids' voices in the training dataset, but it was a real drag for my project.
When I took the skeleton to my daughter's school, I had some fun with this by saying "OMG, he broke, let me fix him," pulling out a little screwdriver, and then saying "trick or treat" myself. But during the main show on Halloween night, I had wired up a remote control button and also connected the skeleton to the motion detection/doorbell ring of my Eufy doorbell (via a webhook in the FastAPI service), since I already had those things from previous Halloween installments.
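That doorbell hook is just another FastAPI route; a hedged sketch (the route name and the commented-out handler are assumptions, not the repo's exact endpoint) could look like:
from fastapi import FastAPI

app = FastAPI()

@app.post("/webhook/doorbell")
async def doorbell_triggered():
    # Kick off the same capture -> Gemini -> TTS flow the wake word would trigger.
    # start_greeting_flow()  # hypothetical handler living in main.py
    return {"status": "greeting triggered"}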

Wifi issues

This was the bane of my whole project. The Raspberry Pi connects to the WiFi network you configure when you first burn the image onto the SD card; if you then (god forbid) try to take it somewhere else, you're going to have a really bad time, because there won't be any SSH for you to get into the system. I had purchased a portable router and taught the Raspberry Pi to connect to that, and I kept connecting the router itself to different networks.
At school this worked, but it came back to bite me on Halloween night: the router wasn't behaving correctly and kept hitting weird WiFi reconnection issues. That caused a significant slowdown in responses and some very disappointed kiddos, who saw the skeleton's lights come on and heard music blasting, but got a silent skeleton.

Battery issues

Check your batteries. Please. I'd been working on this skeleton for a while, not understanding why it barely moved its arms, assuming those cheap DC motors were to blame. And then it hit me: I hadn't replaced the AA batteries it came with, like, two years ago. Once I replaced those, he got to working correctly.
A young mad scientist approves of an older mad scientist's science

Conclusion

Most trick or treaters just expect the treats. This year, they got a trick too. And as an AI evangelist (and a dad), this brought me a ton of joy.
If you build something like this for your family, I'd love to see a link. I hope this in-depth breakdown inspires you to build your own AI-powered Halloween creations! Let me know what crazy ideas you come up with by dropping in a comment or finding me here on X.
Happy Halloween. And happy hacking. And happy hackoween.