Week 9 -- String Parsing and Network Connectivity


Throughout the semester, we've studied how to generate visuals using programming, specifically within the processing environment. This week, we'll look at the techniques available in Java to deal with text, and how we might use these techniques within processing to load data from a file or network connection.

The String Class

  • int indexOf(String str): Returns the index within this string of the first occurrence of the specified substring. (A single-character version, int indexOf(int ch), also exists.)
  • String substring(int beginIndex, int endIndex): Returns a new string that is a substring of this string starting at character number beginIndex and ending at endIndex - 1.
  • int length(): Returns the length of this string.

    In processing, we have all the functionality of the Java String class available to us. The reference page for Strings is available here:
    http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html

    (Also note the link to the main Javadocs: http://java.sun.com/j2se/1.4.2/docs/api).

    A String at its core is really just a fancy array of characters that we can manipulate like an array with some nice additions. Let's take a closer look at three String class functions: indexOf, substring, and length.

    indexOf locates a sequence of characters within a string. For example, run this code and examine the result:
    String sentence = "The quick brown fox jumps over the lazy dog.";
    println(sentence.indexOf("quick"));
    println(sentence.indexOf("fo"));
    println(sentence.indexOf("The"));
    println(sentence.indexOf("blah blah"));
    
    Note that indexOf returns the position where the match begins, counting from 0 (so a match at the very first character returns 0), and returns -1 if the search phrase is not part of the String.
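
    The Java String class also offers a two-argument form, indexOf(String str, int fromIndex), which starts searching at a given position. The following is a minimal sketch in plain Java (not a processing sketch) that uses it to count how often a phrase occurs. The class name IndexOfDemo and the function countOccurrences are made up for this illustration.

```java
// Plain-Java sketch; IndexOfDemo and countOccurrences are hypothetical names.
class IndexOfDemo {

  // Count how many times 'phrase' occurs in 'text' by calling indexOf
  // repeatedly and restarting the search just past each match.
  static int countOccurrences(String text, String phrase) {
    if (phrase.length() == 0) {
      return 0; // an empty phrase would match everywhere; treat it as no match
    }
    int count = 0;
    int position = text.indexOf(phrase);
    while (position != -1) {
      count++;
      position = text.indexOf(phrase, position + phrase.length());
    }
    return count;
  }

  public static void main(String[] args) {
    String sentence = "The quick brown fox jumps over the lazy dog.";
    // indexOf is case-sensitive, so "The" at the start is not counted here
    System.out.println(countOccurrences(sentence, "the"));
    System.out.println(countOccurrences(sentence, "blah blah"));
  }
}
```

    The loop stops as soon as indexOf returns -1, which is the same "not found" signal described above.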

    After we find a certain search phrase within a String, we might want to pull out part of the string and save it in a different variable. This is what we call a "substring" and we can use Java's substring function to take care of this task. Examine and run the following code:

    String sentence = "The quick brown fox jumps over the lazy dog.";
    String phrase = sentence.substring(4,9);
    println(phrase);
    
    Note that the substring begins at the specified beginIndex and extends to the character at index endIndex - 1. Thus the length of the substring is endIndex-beginIndex.
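
    In practice the two indices passed to substring are rarely typed in by hand; they usually come from indexOf. Here is a small plain-Java sketch of that combination, pulling out whatever sits between two marker phrases. The names SubstringDemo and between are invented for this example, and returning null when a marker is missing is just one possible choice.

```java
// Plain-Java sketch; SubstringDemo and between are hypothetical names.
class SubstringDemo {

  // Return the text found between 'start' and 'end',
  // or null if either marker cannot be found.
  static String between(String text, String start, String end) {
    int startIndex = text.indexOf(start);
    if (startIndex == -1) {
      return null;
    }
    int from = startIndex + start.length(); // first character after 'start'
    int to = text.indexOf(end, from);       // look for 'end' only after 'start'
    if (to == -1) {
      return null;
    }
    return text.substring(from, to);        // 'to' itself is not included
  }

  public static void main(String[] args) {
    String sentence = "The quick brown fox jumps over the lazy dog.";
    System.out.println(between(sentence, "quick ", " jumps")); // prints "brown fox"
  }
}
```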

    At any given point, we might also want to access the length of the String. We can accomplish this by calling the length function.
    String sentence = "The quick brown fox jumps over the lazy dog.";
    println(sentence.length());
    
    Note this is different from accessing the length of an array in processing, which is written without parentheses (for example, list.length). Here we are calling the length function available to us within the String class, and therefore must also have the open and close parentheses -- length() -- associated with calling a function that has no arguments.
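
    To see the two kinds of length side by side, here is a short plain-Java sketch. It also uses length() together with charAt (a standard String function that returns the character at a given index) to walk through a String one character at a time. The names LengthDemo and countChar are made up for illustration.

```java
// Plain-Java sketch; LengthDemo and countChar are hypothetical names.
class LengthDemo {

  // Walk through the String character by character and
  // count how many characters equal 'target'.
  static int countChar(String text, char target) {
    int count = 0;
    for (int i = 0; i < text.length(); i++) {
      if (text.charAt(i) == target) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    String sentence = "The quick brown fox jumps over the lazy dog.";
    String[] words = { "The", "quick", "brown" };
    System.out.println(sentence.length());        // String: length() is a function call
    System.out.println(words.length);             // array: length has no parentheses
    System.out.println(countChar(sentence, ' ')); // number of spaces in the sentence
  }
}
```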

    splitStrings, splitInts, etc.

    Links to processing reference pages
  • loadStrings
  • loadBytes
  • splitInts
  • splitFloats
  • splitStrings

    Another function we have available to us in processing is splitStrings. splitStrings breaks a longer string into an array of strings wherever a split character (i.e. delimiter) occurs. If no split character is specified, a blank space will be used as the delimiter. splitInts does the same for a group of ints embedded in a string, returning an array of ints.

    Examine and run the following code:
    String spaceswords = "The quick brown fox jumps over the lazy dog.";
    String list1[] = splitStrings(spaceswords);
    println(list1[0]);
    println(list1[1]);
    
    String commaswords = "The,quick,brown,fox,jumps,over,the,lazy,dog.";
    String list2[] = splitStrings(commaswords, ',');
    for (int i = 0; i < list2.length; i++) {
      println(list2[i] + " " + i);
    }
    
    //calculate sum of a list of numbers in a string
    String numbers = "8,67,5,309";
    int list[] = splitInts(numbers, ',');
    int sum = 0;
    for (int i = 0; i < list.length; i++) {
      sum = sum + list[i];
    }
    println(sum);
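
    splitStrings and splitInts belong to processing. Outside of processing, a roughly similar result can be had with the Java String class's split function combined with Integer.parseInt. The following plain-Java sketch repeats the sum example that way. The names SplitDemo and sumOfList are invented here, and the code assumes every piece between commas is a whole number.

```java
// Plain-Java sketch; SplitDemo and sumOfList are hypothetical names.
class SplitDemo {

  // Split a comma-separated list of whole numbers and add them up.
  // Assumes every piece is a valid int; otherwise Integer.parseInt
  // throws a NumberFormatException.
  static int sumOfList(String numbers) {
    String[] pieces = numbers.split(","); // split takes a regular expression
    int sum = 0;
    for (int i = 0; i < pieces.length; i++) {
      sum = sum + Integer.parseInt(pieces[i].trim()); // trim drops stray spaces
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(sumOfList("8,67,5,309")); // prints 389
  }
}
```

    Because split interprets its argument as a regular expression, delimiters such as '.' or '|' would need to be escaped (for example "\\.").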
    

    Loading text from a file or URL

    Converting from a String to an integer:
    A common problem in retrieving information from the web is getting a value in as a String, but wanting to use it as a number within processing. Following is an easy solution:

    String s = "135";
    int num = Integer.parseInt(s);
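
    Text pulled from a web page is often messier than "135": it may carry extra spaces or not be a number at all, in which case Integer.parseInt throws a NumberFormatException. A minimal plain-Java sketch of one way to guard against that follows. The names ParseDemo and parseOrDefault are made up, and falling back to a default value is only one possible policy.

```java
// Plain-Java sketch; ParseDemo and parseOrDefault are hypothetical names.
class ParseDemo {

  // Convert 's' to an int, or return 'fallback' if it is not a valid number.
  static int parseOrDefault(String s, int fallback) {
    if (s == null) {
      return fallback;
    }
    try {
      return Integer.parseInt(s.trim()); // trim removes surrounding whitespace
    } catch (NumberFormatException e) {
      return fallback;                   // e.g. "72F" or an empty string
    }
  }

  public static void main(String[] args) {
    System.out.println(parseOrDefault("135", 0));   // prints 135
    System.out.println(parseOrDefault(" 72 ", 0));  // prints 72
    System.out.println(parseOrDefault("72F", -1));  // prints -1
  }
}
```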

    In order to get some text that we can do something interesting with, we use the loadStrings function. loadStrings reads the contents of a file or url and creates a String array of its individual lines. If a file is specified, it must be located in the sketch's "data" directory/folder.

    In other words, say we had the following text file: file.txt.
    Download this file, place it in your sketch's data folder and run the following code:
    String lines[] = loadStrings("file.txt");
    println("there are " + lines.length + " lines");
    for (int i=0; i < lines.length; i++) {
      println(lines[i]);
    }
    
    (Also, try running the above code with String lines[] = loadStrings("http://www.yahoo.com");)

    Once we have the text, we sometimes don't want to deal with it as an array of Strings (each element representing one line from the source). Although an array is extremely convenient for certain types of operations, if we want to search for one keyword or phrase, it's useful to convert the array into one long string. This can be accomplished as follows:
    String onelongstring = "";
    for (int i = 0; i < lines.length; i++) {
      onelongstring = onelongstring + lines[i];
    }
    
    After this, we can start using our string parsing functions to extract specific pieces of information from the long string.
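
    One thing to watch for in the loop above: the lines are glued together with nothing in between, so the last word of one line runs straight into the first word of the next. If that matters for your search, a separator can be added between lines. Here is a plain-Java sketch of that variation. The names JoinDemo and joinLines are invented, and StringBuilder is used only because it is the usual Java tool for building a string piece by piece.

```java
// Plain-Java sketch; JoinDemo and joinLines are hypothetical names.
class JoinDemo {

  // Combine an array of lines into one long String,
  // placing 'separator' between consecutive lines.
  static String joinLines(String[] lines, String separator) {
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < lines.length; i++) {
      if (i > 0) {
        result.append(separator); // no separator before the first line
      }
      result.append(lines[i]);
    }
    return result.toString();
  }

  public static void main(String[] args) {
    String[] lines = { "The quick brown fox", "jumps over the lazy dog." };
    System.out.println(joinLines(lines, " "));
  }
}
```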

    Mining for Data

    We know that to look at a web page, we type the URL path into the location box of our browser window, e.g. http://www.google.com. However, we often want to pass information to a URL in order to generate custom results, such as typing a search phrase into Google's search box. This information is passed at the end of the URL (after a '?') -- note this is similar to the idea of passing parameters to a function.

    These "parameters" are usually organized into "name value pairs" like this:
    http://www.someurl.com?nameofVar1=valueOfVar1&nameofVar2=valueOfVar2&nameofVar3=valueOfVar3

    If we search on yahoo for "keyword", for example, you'll notice the path looks like this:
    http://search.yahoo.com/search?p=keyword&ei=UTF-8&fr=FP-tab-web-t-173&fl=0&x=wrt
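
    To make the name/value structure concrete, here is a plain-Java sketch that takes a URL like the one above apart using only indexOf, substring, and split. The names QueryDemo and parseQuery are made up for this example, and the sketch deliberately skips details such as decoding escaped characters (%20 and the like).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch; QueryDemo and parseQuery are hypothetical names.
class QueryDemo {

  // Break the part of a URL after '?' into name/value pairs.
  // No decoding of escaped characters is attempted.
  static Map<String, String> parseQuery(String url) {
    Map<String, String> params = new LinkedHashMap<String, String>();
    int question = url.indexOf('?');
    if (question == -1) {
      return params; // no parameters at all
    }
    String query = url.substring(question + 1);
    String[] pairs = query.split("&");
    for (int i = 0; i < pairs.length; i++) {
      int equals = pairs[i].indexOf('=');
      if (equals == -1) {
        continue; // skip anything that is not name=value
      }
      String name = pairs[i].substring(0, equals);
      String value = pairs[i].substring(equals + 1);
      params.put(name, value);
    }
    return params;
  }

  public static void main(String[] args) {
    String url = "http://search.yahoo.com/search?p=keyword&ei=UTF-8&fr=FP-tab-web-t-173&fl=0&x=wrt";
    Map<String, String> params = parseQuery(url);
    System.out.println(params.get("p"));  // prints "keyword"
    System.out.println(params.size());    // prints 5
  }
}
```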

    HTML forms are the usual device for generating and formatting these variables at the end of the URL, but we will be using processing (loadStrings) instead, so we need to be clever about reverse engineering this information. For example, let's say we want to get the temperature in a given zip code. We can go to:

    http://www.accuweather.com

    We then type a zip code into their form, and find ourselves at:

    http://wwwa.accuweather.com/adcbin/public/local_index.asp?zipcode=10013&partner=accuweather

    After analyzing the format of accuweather's URL parameters, we can then generate the following processing code:
    String zip = "10013";
    String url = "http://wwwa.accuweather.com/adcbin/public/local_index.asp?zipcode=" + zip + "&partner=accuweather";
    String[] lines = loadStrings(url);
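
    Once the page has been loaded, the number we care about still has to be dug out of the surrounding HTML. The plain-Java sketch below shows one way to do that: find a label, then read the run of digits right after it. The sample page text and the label "Currently: " are invented for this illustration and are not accuweather's actual markup; in a real sketch you would look at the page source to find a reliable marker. The names TemperatureDemo and numberAfter are also made up.

```java
// Plain-Java sketch; TemperatureDemo and numberAfter are hypothetical names.
class TemperatureDemo {

  // Find 'label' in 'page' and parse the run of digits that follows it.
  // Returns 'fallback' if the label is missing or no digits follow it.
  // Negative numbers and very long digit runs are not handled here.
  static int numberAfter(String page, String label, int fallback) {
    int start = page.indexOf(label);
    if (start == -1) {
      return fallback;
    }
    start = start + label.length();
    int end = start;
    while (end < page.length() && Character.isDigit(page.charAt(end))) {
      end++; // advance past each digit
    }
    if (end == start) {
      return fallback; // the label was there, but no number followed
    }
    return Integer.parseInt(page.substring(start, end));
  }

  public static void main(String[] args) {
    // Invented sample text standing in for one long string built from loadStrings
    String page = "<td>Currently: 72&deg;F</td>";
    System.out.println(numberAfter(page, "Currently: ", -999)); // prints 72
  }
}
```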
    

    Sandbox

    When running your program within the processing development environment, you're free to reach out across ports and networks. We've seen this with serial I/O, video, etc. However, when running your program as an applet within a browser, there are certain security requirements (again, as we've seen with serial, video, etc.). If you want to connect to a URL on the same server, you're ok. For example, if your applet is on stage, the following code is ok:

    String[] lines = loadStrings("http://stage.itp.nyu.edu/ICM");

    The following is not:

    String[] lines = loadStrings("http://www.yahoo.com");

    A solution for this problem is to create a proxy script that lives on the server with your applet, connects to a URL and passes that information back to your applet. The easiest way to do this (for applets on stage) is to use this path, where I've permanently stored the PHP proxy. Here's what your code would look like:
    //enter whatever URL you want to load
    String url = "http://www.yahoo.com";
    //this is the permanent address for my php proxy
    String proxy = "http://stage.itp.nyu.edu/ICM/shiffman/proxy/loadstrings.php?url=";
    //put the two together and load
    String[] lines = loadStrings(proxy + url);
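
    A caution when the URL you want to load has parameters of its own: its '?' and '&' characters can be mistaken for parameters of the proxy URL. One common remedy is to escape the target URL with Java's URLEncoder before appending it. The plain-Java sketch below assumes the proxy script reads its url parameter in the usual way and unescapes it, which is an assumption about the script rather than something stated here. The names ProxyDemo and throughProxy are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Plain-Java sketch; ProxyDemo and throughProxy are hypothetical names.
class ProxyDemo {

  // Append 'target' to the proxy address, escaping characters such as
  // ':', '/', '?', '&' and '=' so they travel as a single parameter value.
  static String throughProxy(String proxy, String target) {
    try {
      return proxy + URLEncoder.encode(target, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is required on every Java platform, so this should not happen
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    String proxy = "http://stage.itp.nyu.edu/ICM/shiffman/proxy/loadstrings.php?url=";
    String target = "http://search.yahoo.com/search?p=keyword&ei=UTF-8";
    System.out.println(throughProxy(proxy, target));
  }
}
```

    The resulting string could then be handed to loadStrings in place of proxy + url.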
    
    If you prefer to create your own proxy script, follow these instructions:

  • Create a text file called 'loadstrings.php'
  • Copy this linked text into your text file
  • Before exporting to an applet, change your loadStrings code to include the proxy, with the following syntax:
    loadStrings("loadstrings.php?url=http://www.yahoo.com");
  • Copy loadstrings.php into your applet folder and make sure you upload it along with the JAR, pde, and HTML files!

    Another solution for this security issue is to sign your applet, which requires the user to authorize the applet, acknowledging that they "trust" you. For instructions on how to do this, visit Shawn Van Every's page here:

    http://stage.itp.nyu.edu/~sve204/wiki/wiki.pl?SigningAnApplet

    Examples

    Bringing it all together, here is a set of examples that load data from a web page or file, parse it using String manipulation techniques, and create a simple visualization of the data.

    Simple loadStrings from URL
    Simple loadStrings from text file
    Parsing Accuweather
    Parsing CNN.com
    Advanced loadStrings from text file

    As you look at these examples, consider how you might make them more efficient and general enough to work with many different inputs.

    Assignment

    Write a processing program that uses input from a text file or URL to generate a visual output. Feel free to use code from any of the above examples.
