So, you want to use the Kinect in Processing. Great. This page will serve to document the current state of my Processing Kinect library, with some tips and info.
Before you proceed, however, you may want to instead consider using the wonderful SimpleOpenNI library and Greg Borenstein’s Making Things See book! OpenNI has lots of features (skeleton tracking, gesture recognition, etc.) that are not available in this library.
I’m ready to get started right now
- Download library and examples: openkinect.zip
- Source code: https://github.com/shiffman/libfreenect/tree/master/wrappers/java/processing
- Open Kinect: openkinect.org
What hardware do I need?
First you need a “stand-alone” kinect. You do not need to buy an Xbox. If you get the stand-alone kinect listed below, it will come with a USB adapter and power supply.
If you have a kinect that came with an XBox, it will not include the USB adapter. You’ll need to purchase this separately:
Um, what is Processing?
I’m going to assume you are familiar with Processing, but just in case you are not, I suggest checking out: processing.org (Processing is an open source programming language and environment for people who want to create images, animations, and interactions. Initially developed to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing also has evolved into a tool for generating finished professional work. Today, there are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning, prototyping, and production.)
What if I don’t want to use Processing?
If you are comfortable with C++ I suggest you consider using openFrameworks or Cinder with the Kinect. At the time of this writing, these environments have some features I haven’t implemented yet and you also get a C++ speed advantage when processing the depth data, etc.:
More resources from: The OpenKinect Project
I’ve got Processing, how do I download and install the library?
By default Processing will create a “sketchbook” folder in your Documents directory, i.e. on my machine it’s:
/Users/daniel/Documents/Processing/
If there isn’t one already, create a folder called “libraries” there, i.e.
/Users/daniel/Documents/Processing/libraries/
Then go and download openkinect.zip and extract it in the libraries folder, i.e. you should now see
/Users/daniel/Documents/Processing/libraries/openkinect/
/Users/daniel/Documents/Processing/libraries/openkinect/library/
/Users/daniel/Documents/Processing/libraries/openkinect/examples/
etc.
Restart Processing, open up one of the examples in the examples folder and you are good to go!
More about installing libraries:
http://wiki.processing.org/w/How_to_Install_a_Contributed_Library
http://www.learningprocessing.com/tutorials/libraries/
At the moment, my library only works with Mac OS X (Intel, 10.5 or 10.6 should both be OK). Hopefully this will be remedied soon enough. However, if you are interested in working with the Kinect on Windows, I recommend SimpleOpenNI.
What code do I write?
To get started using the library, you need to include the proper import statements at the top of your code:
import org.openkinect.*;
import org.openkinect.processing.*;
As well as a reference to a “Kinect” object, i.e.
// Kinect Library object
Kinect kinect;
Then in setup() you can initialize that kinect object:
kinect = new Kinect(this);
kinect.start();
Once you’ve done this you can begin to access data from the kinect sensor.
Currently, the library makes data available to you in four ways:
- RGB image from the kinect camera as a PImage.
- Grayscale image from the IR camera as a PImage
- Grayscale image with each pixel’s brightness mapped to depth (brighter = closer).
- Raw depth data (11-bit numbers between 0 and 2047) as an int[] array
Let’s look at these one at a time. If you want to use the Kinect just like a regular old webcam, you can request that the RGB image is captured:
kinect.enableRGB(true);
Then you simply ask for the image as a PImage!
PImage img = kinect.getVideoImage();
image(img,0,0);
Alternatively, you can enable the IR image:
kinect.enableIR(true);
Currently, you cannot have both the RGB image and the IR image. They are both passed back via getVideoImage() so whichever one was most recently enabled is the one you will get.
Now, if you want the depth image, you can:
kinect.enableDepth(true);
and request the grayscale image:
PImage img = kinect.getDepthImage();
image(img,0,0);
As well as the raw depth data:
int[] depth = kinect.getRawDepth();
If you are looking at the raw depth data only, you can turn off the library’s behind the scenes depth image processing to make it slightly more efficient:
kinect.processDepthImage(false);
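Once the library's own depth image processing is off, you can build whatever mapping you like from the raw array. As a minimal sketch in plain Java, here is one way to turn raw values into "brighter = closer" gray levels. The class and method names are made up for illustration, and the linear mapping is an assumption, not how the library computes its own depth image.

```java
// Hypothetical helper, not part of the library: maps raw depth readings to
// grayscale brightness so that closer surfaces come out brighter.
class DepthToGray {
    // Largest 11-bit value; treated here as "no reading" (an assumption
    // that matches the 2047 check used in the lookup-table code).
    static final int MAX_RAW = 2047;

    // Returns a brightness in the range 0..255 for one raw depth value.
    static int brightness(int rawDepth) {
        if (rawDepth < 0 || rawDepth >= MAX_RAW) {
            return 0; // invalid or out-of-range reading is drawn as black
        }
        // Simple linear inversion: raw 0 gives 255, larger raw values get darker.
        return 255 - (rawDepth * 255) / MAX_RAW;
    }

    // Converts a whole frame of raw depth values to gray levels.
    // The pixel at (x, y) sits at index x + y * width in both arrays.
    static int[] toGray(int[] rawDepth) {
        int[] gray = new int[rawDepth.length];
        for (int i = 0; i < rawDepth.length; i++) {
            gray[i] = brightness(rawDepth[i]);
        }
        return gray;
    }

    public static void main(String[] args) {
        int[] frame = {0, 500, 1000, 2047}; // made-up sample readings
        int[] gray = toGray(frame);
        for (int i = 0; i < frame.length; i++) {
            System.out.println(frame[i] + " -> " + gray[i]);
        }
    }
}
```

In a sketch, the array passed to toGray() would be whatever kinect.getRawDepth() returns.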
Finally, you can also adjust the camera angle with the tilt() function, i.e.:
float deg = 15;
kinect.tilt(deg);
So, there you have it. Here are all the useful functions you might need to use the Processing kinect library:
- enableRGB(boolean) — turn on or off the RGB camera image
- enableIR(boolean) — turn on or off the IR camera image
- enableDepth(boolean) — turn on or off the depth tracking
- processDepthImage(boolean) — turn on or off the depth image processing
- PImage getVideoImage() — grab the RGB or IR video image
- PImage getDepthImage() — grab the grayscale depth map image
- int[] getRawDepth() — grab the raw depth data
- tilt(float) — adjust the camera angle (between 0 and 30 degrees)
Where’s the javadoc?
Stay tuned, I’ll get something up soon. All the source is here:
https://github.com/shiffman/libfreenect/tree/master/wrappers/java/processing
So now what?
So far, I only have three basic examples:
Display RGB, IR, and Depth Images
Code: RGBDepthTest
(NOTE: KNOWN BUG IN IR IMAGE RIGHT NOW)
This example does nothing but use all of the above listed functions to display the data from the kinect sensor.
Point Cloud
Code: PointCloud
Here, we’re doing something a bit fancier. Number one, we’re using the 3D capabilities of Processing to draw points in space. You’ll want to familiarize yourself with translate(), rotate(), pushMatrix(), popMatrix(). This tutorial is also a good place to start. In addition, the example uses a PVector to describe a point in 3D space. More here: PVector tutorial.
The real work of this example, however, doesn’t come from me at all. The raw depth values from the kinect are not directly proportional to physical depth. Rather, they scale with the inverse of the depth according to this formula:
depthInMeters = 1.0 / (rawDepth * -0.0030711016 + 3.3309495161);
Rather than do this calculation all the time, we can precompute all of these values in a lookup table since there are only 2048 depth values.
float[] depthLookUp = new float[2048];
for (int i = 0; i < depthLookUp.length; i++) {
  depthLookUp[i] = rawDepthToMeters(i);
}

float rawDepthToMeters(int depthValue) {
  if (depthValue < 2047) {
    return (float)(1.0 / ((double)(depthValue) * -0.0030711016 + 3.3309495161));
  }
  return 0.0f;
}
Thanks to Matthew Fisher for the above formula. (Note: for the results to be more accurate, you would need to calibrate your specific kinect device, but the formula is close enough for me so I’m sticking with it for now. More about calibration in a moment.)
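Because the formula is a simple reciprocal, it can also be inverted to turn a distance in meters back into a raw depth value, which is handy when you want to pick a raw threshold from a physical distance. The following is a rough sketch in plain Java under that assumption. The class name and metersToRaw() are hypothetical helpers, not part of the library or its examples.

```java
// Hypothetical helper class: converts between raw depth and meters using
// the same constants as the formula above.
class DepthConversion {
    static final double A = -0.0030711016;
    static final double B = 3.3309495161;

    // Same calculation as rawDepthToMeters(): 2047 is treated as "no reading".
    static double rawToMeters(int raw) {
        if (raw < 2047) {
            return 1.0 / (raw * A + B);
        }
        return 0.0;
    }

    // Algebraic inverse of the formula: solve meters = 1 / (raw * A + B)
    // for raw, then round to the nearest integer raw value.
    static int metersToRaw(double meters) {
        return (int) Math.round((1.0 / meters - B) / A);
    }

    public static void main(String[] args) {
        // 5 feet is 1.524 meters
        int threshold = metersToRaw(1.524);
        System.out.println("raw threshold for 5 feet: " + threshold);
        System.out.println("which maps back to meters: " + rawToMeters(threshold));
    }
}
```

Note that the formula only yields positive distances while the denominator is positive, which holds for raw values below roughly 1084. The round trip is also not exact, since raw values are integers and each step covers more distance the farther away you get.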
Finally, we can draw some points based on the depth values in meters:
for(int x = 0; x < w; x += skip) {
  for(int y = 0; y < h; y += skip) {
    int offset = x+y*w;

    // Convert kinect data to world xyz coordinate
    int rawDepth = depth[offset];
    PVector v = depthToWorld(x,y,rawDepth);

    stroke(255);
    pushMatrix();
    // Scale up by 200
    float factor = 200;
    translate(v.x*factor,v.y*factor,factor-v.z*factor);
    // Draw a point
    point(0,0);
    popMatrix();
  }
}
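The loop above calls depthToWorld(), which is not shown on this page. One common way to write such a function is a pinhole camera back-projection: convert the raw depth to meters, then scale the pixel's offset from the image center by that depth divided by the focal length. The sketch below is plain Java under that assumption. The focal length and image center values are illustrative, assumed numbers for a 640x480 depth image, not calibrated values and not necessarily what the PointCloud example uses, and it returns a double[] where the example would use a PVector.

```java
// Hypothetical back-projection helper based on a pinhole camera model.
class DepthToWorld {
    // ASSUMED depth camera intrinsics, in pixels, for a 640x480 image.
    // Real values come from calibrating your own device.
    static final double FX = 594.21; // focal length, x
    static final double FY = 591.04; // focal length, y
    static final double CX = 339.31; // image center, x
    static final double CY = 242.74; // image center, y

    // Raw depth to meters, using the formula given earlier.
    static double rawDepthToMeters(int raw) {
        if (raw < 2047) {
            return 1.0 / (raw * -0.0030711016 + 3.3309495161);
        }
        return 0.0;
    }

    // Returns {x, y, z} in meters for the pixel (x, y) with the given raw depth.
    static double[] depthToWorld(int x, int y, int rawDepth) {
        double z = rawDepthToMeters(rawDepth);
        double worldX = (x - CX) * z / FX;
        double worldY = (y - CY) * z / FY;
        return new double[] { worldX, worldY, z };
    }

    public static void main(String[] args) {
        // A pixel near the right edge of the image, at raw depth 600
        double[] p = depthToWorld(639, 243, 600);
        System.out.println(p[0] + ", " + p[1] + ", " + p[2]);
    }
}
```

With this model, a pixel near the image center lands close to x = 0, y = 0, and pixels farther from the center spread out more the deeper they are.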
Average Point Tracking
The real magic of the kinect lies in its computer vision capabilities. With depth information, you can do all sorts of fun things like say: "the background is anything beyond 5 feet. Ignore it!" Without depth, background removal involves all sorts of painstaking pixel comparisons. As a quick demonstration of this idea, here is a very basic example that computes the average xy location of any pixels in front of a given depth threshold.
Source: AveragePointTracking
In this example, we declare two variables to add up all the appropriate x's and y's and one variable to keep track of how many there are.
float sumX = 0;
float sumY = 0;
float count = 0;
Then, whenever we find a given point that complies with our threshold, we add the x and y to the sum:
if (rawDepth < threshold) {
  sumX += x;
  sumY += y;
  count++;
}
When we're done, we calculate the average and draw a point!
if (count != 0) {
  float avgX = sumX/count;
  float avgY = sumY/count;
  fill(255,0,0);
  ellipse(avgX,avgY,16,16);
}
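Put together, the three fragments above amount to one pass over the depth array. Here is a minimal plain-Java sketch of that pass, written as a standalone function so it can be tried without a Kinect attached. The class name, the method signature, and the choice to return null when nothing passes the threshold are assumptions for illustration, not taken from the AveragePointTracking source.

```java
// Hypothetical standalone version of the averaging logic described above.
class AveragePoint {
    // Returns {avgX, avgY} for all pixels whose raw depth is below the
    // threshold, or null when no pixel qualifies.
    static float[] average(int[] depth, int w, int h, int threshold) {
        float sumX = 0;
        float sumY = 0;
        float count = 0;
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                // The pixel at (x, y) sits at index x + y * w
                int rawDepth = depth[x + y * w];
                if (rawDepth < threshold) {
                    sumX += x;
                    sumY += y;
                    count++;
                }
            }
        }
        if (count == 0) {
            return null; // nothing in front of the threshold
        }
        return new float[] { sumX / count, sumY / count };
    }

    public static void main(String[] args) {
        // A made-up 4x2 frame: two close pixels in the top row, rest invalid
        int[] frame = {
            2047,  500, 2047,  500,
            2047, 2047, 2047, 2047
        };
        float[] avg = average(frame, 4, 2, 800);
        System.out.println(avg[0] + ", " + avg[1]);
    }
}
```

In a sketch, the returned pair would feed straight into the ellipse() call shown above.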
Why don't the RGB images and depth values correspond properly?
Unfortunately, because the RGB camera and the IR camera are not physically located in the same spot, we have a stereo vision problem. Pixel XY in one image is not the same XY in an image from a camera an inch to the right. I'm hoping to stretch my brain to try to understand this better and work out some examples that calibrate the data in Processing. Stay tuned!
If you are interested in more (and software that will do this very job!) check out Nicolas Burrus' amazing work:
Theory on depth/color calibration and registration
version 0.3 of RGBDemo
What's missing?
Lots! Open a github issue if you want to add an item to my to do list!
https://github.com/shiffman/libfreenect/issues
FAQ
1. Why are there shadows in the depth image?
2. What is the range of depth that the kinect can see?
~0.7–6 meters or 2.3–20 feet. Note that you will get black pixels (or a raw depth value of 2047) for elements that are either too far away or too close.