Word Wrap in Processing

I’ve been meaning to add something to Processing Hacks for quite some time now. This morning, I needed a basic function to wrap text in Processing, so I came up with this snippet.

// Function to return an ArrayList of Strings
// (maybe redo to just make simple array?)
// Arguments: String to be wrapped, maximum width in pixels of line
ArrayList<String> wordWrap(String s, int maxWidth) {
  // Make an empty ArrayList of Strings
  ArrayList<String> a = new ArrayList<String>();
  float w = 0;    // Accumulate width of chars
  int i = 0;      // Count through chars
  int rememberSpace = 0; // Remember where the last space was
  // As long as we are not at the end of the String
  while (i < s.length()) {
    // Current char
    char c = s.charAt(i);
    w += textWidth(c); // accumulate width
    if (c == ' ') rememberSpace = i; // Are we a blank space?
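    if (w > maxWidth) {  // Have we reached the end of a line?
      // Break at the last space. If this line has no usable space (a
      // single word is wider than maxWidth), break mid-word instead,
      // always taking at least one char so the loop cannot get stuck.
      int cut = (rememberSpace > 0) ? rememberSpace : max(i, 1);
      String sub = s.substring(0, cut); // Make a substring
      // Chop off space at beginning
      if (sub.length() > 0 && sub.charAt(0) == ' ') {
        sub = sub.substring(1);
      }
      // Add substring to the list
      a.add(sub);
      // Reset everything, including the remembered space
      s = s.substring(cut);
      i = 0;
      w = 0;
      rememberSpace = 0;
    }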
    else {
      i++;  // Keep going!
    }
  }
 
  // Take care of the last remaining line
  if (s.length() > 0 && s.charAt(0) == ' ') {
    s = s.substring(1,s.length());
  }
  a.add(s);
 
  return a;
}

Alien vs. Predator

This is a quick visualization of data from the Netflix Prize. A vertical bar is drawn for every customer rating of a movie. Ratings go from 1 to 5 stars (represented top to bottom). Note how “Alien” (on the left) received many ratings of 4 and 5 stars, while “Predator” (on the right) mostly received ratings of 4 stars. The image depicts approximately 50,000 customer ratings.

Netflix Challenge

Netflix recently released 100 million movie rating records as part of a contest to improve its movie recommendation system.

The problem:

I know how I rated a whole bunch of movies. I know how everyone else has rated a whole bunch of movies. For any given movie that I have not yet rated (but others have), predict how I would rate it based on my and everyone else’s rating history.

Netflix uses the root mean squared error (RMSE) to evaluate results. For example, say we guess that I would give the movie Purple Rain a rating of 5 when in reality I would only rate it a 4, and that I would rate Singin’ in the Rain a 3.5 when my true rating is a 5. Here’s how we would calculate the RMSE:

Purple Rain Prediction Error:  5 - 4 = 1
Singin' in the Rain Prediction Error: 3.5 - 5 = -1.5
 
Squaring each error:  1*1 = 1, -1.5*-1.5 = 2.25
Add the squares of all errors together = 3.25
 
MSE = Sum of Squares divided by Total Guesses = 3.25 / 2 = 1.625
 
RMSE = square root of MSE = sqrt(1.625) ≈ 1.275
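
The same arithmetic can be written as a small function. This is only a minimal sketch in plain Java (which pastes into Processing with minor changes); the class and method names are my own invention for illustration and have nothing to do with Netflix's actual scoring code.

```java
// Illustrative sketch only: RmseExample and rmse() are hypothetical names.
class RmseExample {

  // Root mean squared error between predicted and actual ratings.
  static double rmse(double[] predicted, double[] actual) {
    if (predicted.length != actual.length || predicted.length == 0) {
      throw new IllegalArgumentException("arrays must be non-empty and the same length");
    }
    double sumOfSquares = 0;
    for (int i = 0; i < predicted.length; i++) {
      double error = predicted[i] - actual[i]; // prediction error for one guess
      sumOfSquares += error * error;           // square it and accumulate
    }
    double mse = sumOfSquares / predicted.length; // mean of the squared errors
    return Math.sqrt(mse);
  }

  public static void main(String[] args) {
    // Purple Rain and Singin' in the Rain, as in the hand calculation
    double[] predicted = {5.0, 3.5};
    double[] actual    = {4.0, 5.0};
    System.out.println("RMSE = " + rmse(predicted, actual)); // roughly 1.275
  }
}
```

Because the errors are squared before averaging, a guess that is off by 1.5 stars costs more than twice as much as a guess that is off by 1.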

Let’s take a simple algorithm to solve the problem: for any user rating any movie, predict the future rating as the global average rating for that movie. This algorithm produces an RMSE of 1.05, not too shabby. The RMSE for Netflix’s Cinematch system (which presumably employs collaborative filtering techniques) is around 0.95, a mere 10% improvement. The problem is indeed a difficult one. Netflix will award a one million dollar prize to anyone who can improve the system by an additional 10%.

I submitted my first prediction file today, mostly as a test, nowhere near the leaderboard, with the following algorithm:

A customer C will rate a movie M based on the following function:

rating(C,M) = 0.5 * (the global netflix average rating for movie M) + 0.5 * (the customer’s average rating)
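
In code, that function might look something like the sketch below. It is plain Java, the class and method names are hypothetical, and the sample ratings are made up for illustration (they are not taken from the Netflix data). It also assumes both averages are simple arithmetic means.

```java
// Illustrative sketch only: names and sample ratings are invented.
class BlendExample {

  // Plain arithmetic mean of a list of star ratings.
  static double average(int[] ratings) {
    if (ratings.length == 0) {
      throw new IllegalArgumentException("no ratings to average");
    }
    double sum = 0;
    for (int r : ratings) {
      sum += r;
    }
    return sum / ratings.length;
  }

  // rating(C,M) = 0.5 * (average rating for movie M) + 0.5 * (customer C's average rating)
  static double predict(int[] ratingsOfMovie, int[] ratingsByCustomer) {
    return 0.5 * average(ratingsOfMovie) + 0.5 * average(ratingsByCustomer);
  }

  public static void main(String[] args) {
    int[] ratingsOfMovie    = {4, 5, 4, 3}; // everyone's ratings of M, average 4.0
    int[] ratingsByCustomer = {2, 3, 4};    // C's ratings of other movies, average 3.0
    System.out.println(predict(ratingsOfMovie, ratingsByCustomer)); // 0.5*4.0 + 0.5*3.0 = 3.5
  }
}
```

The equal weighting pulls a well-liked movie's prediction down for a customer who tends to rate low, and up for one who tends to rate high.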

My RMSE?

Your prediction file submitted 2006-10-10 21:31:56 has been decompressed and processed.
The computed RMSE for the quiz subset was 1.0147.

More to come. . .

Asynchronous HTTP Requests in Processing, now with callbacks!

I’m working on a new library that makes it possible to make asynchronous HTTP requests (web pages, XML feeds, etc.) in Processing without blocking. It runs its own thread and uses a callback (just like the serial, video, etc. libraries). This developed out of a need that I noticed in student projects in my Introduction to Computational Media course at ITP.
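
To give a rough idea of the thread-plus-callback pattern, here is a minimal sketch in plain Java. It is not the library's actual API: the class name, the method names, and the two-callback signature are all assumptions made for illustration, and it only handles simple GET requests for text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.function.Consumer;

// Hypothetical sketch of the pattern, NOT the library's real interface.
class AsyncRequestSketch {

  // Runs a task on its own thread, then hands the result to a callback.
  // Any exception (from the task or from onDone itself) goes to onError.
  static Thread runAsync(Callable<String> task,
                         Consumer<String> onDone,
                         Consumer<Exception> onError) {
    Thread t = new Thread(() -> {
      try {
        onDone.accept(task.call());
      } catch (Exception e) {
        onError.accept(e);
      }
    });
    t.start();
    return t; // returned so the caller can join() if it really wants to wait
  }

  // Ordinary blocking download of a URL as text.
  static String download(String address) throws IOException {
    StringBuilder body = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new InputStreamReader(
        URI.create(address).toURL().openStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line).append('\n');
      }
    }
    return body.toString();
  }

  // Non-blocking request: returns immediately, calls back when done.
  static Thread request(String address,
                        Consumer<String> onDone,
                        Consumer<Exception> onError) {
    return runAsync(() -> download(address), onDone, onError);
  }
}
```

A call would look like AsyncRequestSketch.request(someUrl, page -> latest = page, e -> e.printStackTrace()), where someUrl and latest are placeholders of mine. Note that the callback runs on the worker thread, not the animation thread, so in a sketch it is probably safest to just store the result in a variable and draw it on the next frame.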

It’s all very rough and could use some better documentation, but I thought I might let folks take a look, test it out, and provide feedback. Download and read about it. Source code is in the zip for the curious.

It needs a better name, clearly. . .