Week 9 -- String Parsing and Network Connectivity
Throughout the semester, we've studied how to generate visuals using programming, specifically within the processing environment. This week, we'll look at the techniques available in Java to deal with text, and how we might use these techniques within processing to load data from a file or network connection.
The String Class
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html
(Also note the link to the main Javadocs: http://java.sun.com/j2se/1.4.2/docs/api).
A String at its core is really just a fancy array of characters that we can manipulate like an array with some nice additions. Let's take a closer look at three String class functions: indexOf, substring, and length.
indexOf locates a sequence of characters within a string. For example, run this code and examine the result:
String sentence = "The quick brown fox jumps over the lazy dog.";
println(sentence.indexOf("quick"));
println(sentence.indexOf("fo"));
println(sentence.indexOf("The"));
println(sentence.indexOf("blah blah"));
Note that indexOf returns the position where the match begins, counting from 0 for the first character, and returns -1 if the search phrase is not part of the String.
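indexOf also has a two-argument form, indexOf(phrase, fromIndex), that starts searching at a given position. As a minimal sketch in plain Java (outside of processing, so it uses System.out.println instead of println), the following counts every occurrence of a phrase by searching again from just past each match. The class and method names are hypothetical, chosen only for illustration.

```java
// Sketch: finding every occurrence of a phrase with indexOf(String, int).
// IndexOfDemo and countOccurrences are hypothetical names for this example.
class IndexOfDemo {

    // Returns how many times phrase appears in text (non-overlapping matches).
    static int countOccurrences(String text, String phrase) {
        if (phrase.isEmpty()) {
            return 0; // an empty phrase would otherwise match forever
        }
        int count = 0;
        int from = 0;
        while (true) {
            int found = text.indexOf(phrase, from);
            if (found == -1) {
                break; // -1 means no further match
            }
            count++;
            from = found + phrase.length(); // continue just past this match
        }
        return count;
    }

    public static void main(String[] args) {
        String sentence = "The quick brown fox jumps over the lazy dog.";
        System.out.println(sentence.indexOf("quick"));          // 4
        System.out.println(sentence.indexOf("blah blah"));      // -1
        System.out.println(countOccurrences(sentence, "he"));   // 2 ("The" and "the")
    }
}
```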
After we find a certain search phrase within a String, we might want to pull out part of the string and save it in a different variable. This is what we call a "substring" and we can use Java's substring function to take care of this task. Examine and run the following code:
String sentence = "The quick brown fox jumps over the lazy dog.";
String phrase = sentence.substring(4,9);
println(phrase);
Note that the substring begins at the specified beginIndex and extends to the character at index endIndex - 1. Thus the length of the substring is endIndex-beginIndex.
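In practice the two indices usually come from indexOf rather than being typed in by hand. Here is a small sketch in plain Java that pulls out whatever sits between a start marker and an end marker. The helper name between is an assumption made for this example, not part of Java or processing.

```java
// Sketch: combining indexOf and substring to extract text between two markers.
// SubstringDemo and between are hypothetical names for this example.
class SubstringDemo {

    // Returns the text between start and end, or null if either marker is missing.
    static String between(String text, String start, String end) {
        int begin = text.indexOf(start);
        if (begin == -1) {
            return null;
        }
        begin += start.length();              // skip past the start marker itself
        int finish = text.indexOf(end, begin); // look for the end marker after it
        if (finish == -1) {
            return null;
        }
        return text.substring(begin, finish);  // finish is exclusive
    }

    public static void main(String[] args) {
        String sentence = "The quick brown fox jumps over the lazy dog.";
        System.out.println(sentence.substring(4, 9));             // quick
        System.out.println(between(sentence, "quick ", " fox"));  // brown
    }
}
```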
At any given point, we might also want to access the length of the String. We can accomplish this by calling the length function.
String sentence = "The quick brown fox jumps over the lazy dog.";
println(sentence.length());
Note this is different than accessing the length of an array in processing. Here we are calling the length function available to us within the String class, and therefore must also have the open and close parentheses -- length() -- associated with calling a function that has no arguments.
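To see length() next to an array's length, here is a short plain-Java sketch that walks a String one character at a time with charAt. The method countChar is a made-up helper for illustration.

```java
// Sketch: length() on a String versus length on an array.
// LengthDemo and countChar are hypothetical names for this example.
class LengthDemo {

    // Counts how many times target appears in text, one character at a time.
    static int countChar(String text, char target) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) { // String: length() with parentheses
            if (text.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sentence = "The quick brown fox jumps over the lazy dog.";
        int[] numbers = {8, 67, 5, 309};
        System.out.println(sentence.length());        // 44
        System.out.println(numbers.length);           // 4 (array: no parentheses)
        System.out.println(countChar(sentence, ' ')); // 8
    }
}
```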
splitStrings, splitInts, etc.
Links to processing reference pages
Examine and run the following code:
String spaceswords = "The quick brown fox jumps over the lazy dog.";
String list1[] = splitStrings(spaceswords);
println(list1[0]);
println(list1[1]);
String commaswords = "The,quick,brown,fox,jumps,over,the,lazy,dog.";
String list2[] = splitStrings(commaswords, ',');
for (int i = 0; i < list2.length; i++) {
  println(list2[i] + " " + i);
}
//calculate sum of a list of numbers in a string
String numbers = "8,67,5,309";
int list[] = splitInts(numbers, ',');
int sum = 0;
for (int i = 0; i < list.length; i++) {
  sum = sum + list[i];
}
println(sum);
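For comparison, the same sum can be written with nothing but the standard Java String class. This is a sketch under that assumption: it uses String.split and Integer.parseInt rather than processing's splitInts, and the name sumOfList is hypothetical.

```java
import java.util.regex.Pattern;

// Sketch: splitting a delimited String and summing the numbers in plain Java.
// SplitSumDemo and sumOfList are hypothetical names for this example.
class SplitSumDemo {

    // Splits numbers on delimiter and adds up the integer pieces.
    static int sumOfList(String numbers, String delimiter) {
        // split() expects a regular expression, so quote the delimiter literally
        String[] pieces = numbers.split(Pattern.quote(delimiter));
        int sum = 0;
        for (String piece : pieces) {
            String trimmed = piece.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore empty pieces such as in "1,,2"
            }
            sum += Integer.parseInt(trimmed);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfList("8,67,5,309", ",")); // 389
    }
}
```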
Loading text from a file or URL
Converting from a String to an integer:
A common problem when retrieving information from the web is getting a value in as a String, but wanting to use it as a number within processing. Following is an easy solution:
String s = "135";
int num = Integer.parseInt(s);
In order to get some text that we can do something interesting with, we use the loadStrings function. loadStrings reads the contents of a file or URL and creates a String array of its individual lines. If a file is specified, it must be located in the sketch's "data" directory/folder.
In other words, say we had the following text file: file.txt.
Download this file, place it in your sketch's data folder and run the following code:
String lines[] = loadStrings("file.txt");
println("there are " + lines.length + " lines");
for (int i=0; i < lines.length; i++) {
  println(lines[i]);
}
(Also, try running the above code with String lines[] = loadStrings("http://www.yahoo.com");)
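To get a feel for what a function like loadStrings has to do, here is a rough plain-Java sketch that reads any character source line by line into a String array. It is not processing's actual implementation; readLines is a hypothetical helper, and the example feeds it from a StringReader so that it runs without a file or network connection. A FileReader, or an InputStreamReader wrapped around a URL's stream, could be passed in the same way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: reading a text source into an array with one String per line.
// LoadLinesDemo and readLines are hypothetical names, not processing's code.
class LoadLinesDemo {

    // Reads source to the end and returns its lines without line terminators.
    static String[] readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // surface read errors to the caller
        }
        return lines.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] lines = readLines(new StringReader("first line\nsecond line"));
        System.out.println("there are " + lines.length + " lines");
        for (int i = 0; i < lines.length; i++) {
            System.out.println(lines[i]);
        }
    }
}
```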
Once we have the text, we sometimes don't want to deal with it as an array of Strings (each element representing one line from the source). Although an array is extremely convenient for certain types of operations, if we want to search for one keyword or phrase, it's useful to convert the array into one long string. This can be accomplished as follows:
String onelongstring = "";
for (int i = 0; i < lines.length; i++) {
  onelongstring = onelongstring + lines[i];
}
After this, we can start using our string parsing functions to extract specific
pieces of information from the long string.
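As one possible sketch of that last step in plain Java, the code below joins the lines and then pulls out the number that follows a label. It joins with a space so that the last word of one line does not run into the first word of the next. The sample lines, the "Temperature: " label, and the helper names are all invented for illustration; a real page will need its own markers.

```java
// Sketch: joining lines into one String, then extracting a number after a label.
// JoinLinesDemo, joinLines, intAfter and the sample data are all hypothetical.
class JoinLinesDemo {

    // Joins the lines into one long String, separated by single spaces.
    static String joinLines(String[] lines) {
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                joined.append(' ');
            }
            joined.append(lines[i]);
        }
        return joined.toString();
    }

    // Returns the integer that directly follows label, or fallback if none is found.
    static int intAfter(String text, String label, int fallback) {
        int start = text.indexOf(label);
        if (start == -1) {
            return fallback;
        }
        start += label.length();
        int end = start;
        while (end < text.length() && Character.isDigit(text.charAt(end))) {
            end++; // extend over the run of digits
        }
        if (end == start) {
            return fallback; // label found, but no digits after it
        }
        try {
            return Integer.parseInt(text.substring(start, end));
        } catch (NumberFormatException e) {
            return fallback; // e.g. too many digits to fit in an int
        }
    }

    public static void main(String[] args) {
        String[] lines = {"Current conditions", "Temperature: 72", "degrees"};
        String onelongstring = joinLines(lines);
        System.out.println(onelongstring);
        System.out.println(intAfter(onelongstring, "Temperature: ", -1)); // 72
    }
}
```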
Mining for Data
We know that to look at a web page, we type the URL path into the location box of our browser window, e.g. http://www.google.com. However, we often want to pass information to a URL in order to generate custom results, such as typing a search phrase into Google's search box. This information is passed at the end of the URL (after a '?') -- note this is similar to the idea of passing parameters to a function. These "parameters" are usually organized into "name-value pairs" like this:
http://www.someurl.com?nameofVar1=valueOfVar1&nameofVar2=valueOfVar2&nameofVar3=valueOfVar3.
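A URL in this format can be assembled from a list of names and values. The sketch below does so in plain Java and runs each piece through java.net.URLEncoder, so that spaces and other special characters survive the trip. The class name, the buildUrl helper, and the example path are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: building "?name=value&name=value" from a map of parameters.
// QueryBuilderDemo, buildUrl and encode are hypothetical names for this example.
class QueryBuilderDemo {

    // Appends the parameters to base, using '?' before the first and '&' after.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder url = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> pair : params.entrySet()) {
            url.append(separator)
               .append(encode(pair.getKey()))
               .append('=')
               .append(encode(pair.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // URL-encodes one name or value (a space becomes '+').
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("nameofVar1", "valueOfVar1");
        params.put("nameofVar2", "two words");
        System.out.println(buildUrl("http://www.someurl.com/page", params));
        // http://www.someurl.com/page?nameofVar1=valueOfVar1&nameofVar2=two+words
    }
}
```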
If we search on yahoo for "keyword", for example, you'll notice the path looks like this:
http://search.yahoo.com/search?p=keyword&ei=UTF-8&fr=FP-tab-web-t-173&fl=0&x=wrt
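Going the other direction, the string parsing functions from earlier are enough to take such a URL apart. The following plain-Java sketch splits everything after the '?' into names and values. It is deliberately simple (it does not decode %-escapes or '+'), and parseQuery is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: splitting the part of a URL after '?' into name-value pairs.
// QueryParserDemo and parseQuery are hypothetical names for this example.
class QueryParserDemo {

    // Returns the pairs in the order they appear; values are left undecoded.
    static Map<String, String> parseQuery(String url) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int question = url.indexOf('?');
        if (question == -1) {
            return pairs; // no parameters at all
        }
        String query = url.substring(question + 1);
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int equals = pair.indexOf('=');
            if (equals == -1) {
                pairs.put(pair, ""); // a name with no value
            } else {
                pairs.put(pair.substring(0, equals), pair.substring(equals + 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String url = "http://search.yahoo.com/search?p=keyword&ei=UTF-8&fr=FP-tab-web-t-173&fl=0&x=wrt";
        for (Map.Entry<String, String> pair : parseQuery(url).entrySet()) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```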
HTML forms are the usual device for generating and formatting these variables at the end of the URL, but we will be using processing (loadStrings) instead, so we need to be clever about reverse engineering this information. For example, let's say we want to get the temperature in a given zip code. We can go to:
http://www.accuweather.com
We then type a zip code into their form, and find ourselves at:
http://wwwa.accuweather.com/adcbin/public/local_index.asp?zipcode=10013&partner=accuweather
After analyzing the format of accuweather's URL parameters, we can then generate the following processing code:
String zip = "10013";
String url = "http://wwwa.accuweather.com/adcbin/public/local_index.asp?zipcode=" + zip + "&partner=accuweather";
String[] lines = loadStrings(url);
Sandbox
When running your program within the processing development environment, you're free to reach out across ports and networks. We've seen this with serial I/O, video, etc. However, when running your program as an applet within a browser, there are certain security requirements (again, as we've seen with serial, video, etc.). If you want to connect to a URL on the same server, you're ok. For example, if your applet is on stage, the following code is ok:
String[] lines = loadStrings("http://stage.itp.nyu.edu/ICM");
The following is not:
String[] lines = loadStrings("http://www.yahoo.com");
A solution for this problem is to create a proxy script that lives on the server with your applet, connects to a URL and passes that information back to your applet. The easiest way to do this (for applets on stage) is to use this path, where I've permanently stored the PHP proxy. Here's what your code would look like:
//enter whatever URL you want to load
String url = "http://www.yahoo.com";
//this is the permanent address for my php proxy
String proxy = "http://stage.itp.nyu.edu/ICM/shiffman/proxy/loadstrings.php?url=";
//put the two together and load
String[] lines = loadStrings(proxy + url);
If you prefer to create your own proxy script, follow these instructions:
loadStrings("loadstrings.php?url=http://www.yahoo.com");
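The same-server rule and the proxy workaround can be sketched together in plain Java: use the address directly when it lives on the applet's own host, and otherwise route it through the proxy. This is a simplification that compares host names only, and the applet address in the example is hypothetical. URL-encoding the target is also an assumption about how the proxy reads its url parameter; the code above passes it unencoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Sketch: deciding whether a request needs to go through the proxy script.
// ProxyDemo, sameHost and viaProxy are hypothetical names for this example.
class ProxyDemo {

    // True when both URLs name the same host (ignoring upper/lower case).
    static boolean sameHost(String a, String b) {
        String hostA = URI.create(a).getHost();
        String hostB = URI.create(b).getHost();
        return hostA != null && hostA.equalsIgnoreCase(hostB);
    }

    // Returns target unchanged if it is on the applet's host, else proxy + target.
    static String viaProxy(String appletUrl, String proxy, String target) {
        if (sameHost(appletUrl, target)) {
            return target;
        }
        try {
            // Assumption: the proxy decodes its url parameter before fetching it
            return proxy + URLEncoder.encode(target, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String applet = "http://stage.itp.nyu.edu/ICM/mysketch/"; // hypothetical location
        String proxy = "http://stage.itp.nyu.edu/ICM/shiffman/proxy/loadstrings.php?url=";
        System.out.println(viaProxy(applet, proxy, "http://stage.itp.nyu.edu/ICM"));
        System.out.println(viaProxy(applet, proxy, "http://www.yahoo.com"));
    }
}
```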
Another solution for this security issue is to sign your applet, which requires the user to authorize the applet, acknowledging that they "trust" you. For instructions on how to do this, visit Shawn Van Every's page here:
http://stage.itp.nyu.edu/~sve204/wiki/wiki.pl?SigningAnApplet.
Examples
Bringing it all together, here is a set of examples that load data from a web page or file, parse it using String manipulation techniques, and create a simple visualization of the data.
Simple loadStrings from URL
Simple loadStrings from text file
Parsing Accuweather
Parsing CNN.com
Advanced loadstrings from text file
As you look at these examples, consider how you might make them more efficient and more general, so that they work with many different inputs.
Assignment
Write a processing program that uses input from a text file or URL to generate a visual output. Feel free to use code from any of the above examples.