Department for Electronic Systems and Information Processing
WeSeRV
Writing your own transforms
Specifics on writing a Netglub transform in general are described here.
These instructions focus on writing transforms that are useful for academic research, such as those used by WeSeRV.
To write a transform that extracts structured information from a specific website, a basic knowledge of HTTP and HTML is usually enough to understand how the site presents that information. Once that is understood, Python with the Requests and BeautifulSoup libraries can be used to extract the needed information.
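As an illustration of the Requests and BeautifulSoup approach, here is a minimal sketch. The function names, the CSS selector `li.result a.title`, and the output dictionary keys are assumptions made for this example; a real transform would use whatever markup the target site actually serves.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, params=None, timeout=10):
    """Download a page and return its HTML text (hypothetical helper)."""
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP 4xx/5xx responses
    return response.text


def extract_titles(html, selector="li.result a.title"):
    """Return one dict per link matched by the (assumed) CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for link in soup.select(selector):
        results.append({
            "title": link.get_text(strip=True),  # link text without padding
            "url": link.get("href", ""),         # empty string if no href
        })
    return results
```

Keeping the download and the parsing in separate functions makes it possible to try the parsing step on a saved copy of a page without contacting the site.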
WeSeRV transforms usually have the following structure:
When there are no more entities, go back to step one and retrieve the next web page of results.
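The loop described above can be sketched as follows. This is only an outline under stated assumptions: it assumes the steps are to retrieve a results page, extract entities from it, and emit them until a requested number of results is reached. The names `collect_entities`, `fetch_page`, `extract_entities`, and `max_pages` are hypothetical and not part of Netglub or WeSeRV.

```python
from typing import Callable, Dict, Iterator, List

Entity = Dict[str, str]


def collect_entities(
    fetch_page: Callable[[int], str],
    extract_entities: Callable[[str], List[Entity]],
    num_results: int,
    max_pages: int = 50,
) -> Iterator[Entity]:
    """Yield up to num_results entities, fetching result pages as needed."""
    emitted = 0
    for page_number in range(1, max_pages + 1):
        if emitted >= num_results:
            return
        html = fetch_page(page_number)      # step one: retrieve a results page
        entities = extract_entities(html)   # pull the entities out of the page
        if not entities:                    # the site has no further results
            return
        for entity in entities:
            if emitted >= num_results:
                return
            yield entity
            emitted += 1
        # Page exhausted: the outer loop returns to step one for the next page.
```

Passing the fetching and extracting steps in as callables keeps the paging logic independent of any particular website, and the `max_pages` cap guards against looping forever on a site that never returns an empty page.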
You can test a transform locally like this:
./transform phrase "computer forensics" "" numResults 5 minYear 2000 maxYear 2010
Note the empty double quotes "", which act as the delimiter between the input entity and the parameters.
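A transform written in Python could split its arguments at that delimiter roughly as shown below. This sketch is inferred only from the example command line and is not Netglub's actual argument handling; `parse_transform_args` and `_pairs` are hypothetical names, and treating both sides of the delimiter as alternating name/value pairs is an assumption.

```python
import sys


def _pairs(tokens):
    """Turn [name1, value1, name2, value2, ...] into a dict."""
    if len(tokens) % 2 != 0:
        raise ValueError("expected name/value pairs, got: %r" % (tokens,))
    return dict(zip(tokens[0::2], tokens[1::2]))


def parse_transform_args(argv):
    """Split arguments (without the program name) at the first empty string.

    Returns (entity, params): the input entity fields before the delimiter
    and the transform parameters after it. All values stay strings.
    """
    if "" in argv:
        split = argv.index("")
        entity_tokens, param_tokens = argv[:split], argv[split + 1:]
    else:
        entity_tokens, param_tokens = argv, []  # no parameters were given
    return _pairs(entity_tokens), _pairs(param_tokens)


if __name__ == "__main__":
    entity, params = parse_transform_args(sys.argv[1:])
    print(entity, params)
```

For the command shown above, this would yield the entity `{"phrase": "computer forensics"}` and the parameters `numResults`, `minYear`, and `maxYear` as strings, which the transform would still need to convert to integers itself.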
PhraseToBookLOC is an example of a typical transform and includes additional explanations. It uses an input phrase to extract information about books from the Library of Congress website (loc.gov).
More transform examples can be found here.