ByssheBot

ByssheBot

face

ha'penny will buy a bunch of stairs." "Just like a cottage in a goat. I was an hour. THE CLOCK There's a tree that Jack built. This is none so rosy and the cow's in the ice Upon my lord A little girls of France went into each hand Five, six, Pick up with a farthing a don key!" THE WATER Over the sea; Swim, swan, swim! Swan, swan, over my riddle, I loved by the bells of a little one sea, With pretty cock, I've laid up my thumb, And over a thing in a letter; Sneeze on a clean away. Six, he can compare With a shoe To buy him die." Who wooed a gay lady. Build it by myself As the day." Jenny Wren Was a robin's son Around the garden gate, And straightway went to dance; But when she said. "May I bade her cake and good old sow is the farmer's wife, Who visited in Surrey There was an old soul was an amble! The groat it up, All the ground, Ten O'Clock Scholar Cock-a-Doodle-Do An owl came it up in the baker's To make your children sliding on Thursday, something better. Sneeze on a gold pin; In other men's ditches. THE OLD WOMAN OF TOBAGO There was a slice of Harrow. YOUNG LAMBS TO BABYLON How many were so rich in his nose in the siege of Whitechapel. "Halfpence and cheese, What did go? Because it up two a ha'penny, God bless you. WILLY Willy,

What's a ByssheBot?

ByssheBot is a blog to keep track of my attempts to create a poetry-writing robot. Markov Chains, Back Propagating Neural Nets, Genetic Algorithms - Oh! We have such fun!

Quickie Thread Demo

Simple python threading demo using a Bounded Semaphore. Threading is funner.

Mon Jun 30 07:22:55 2008 - Python Threads: More than One Way to Skin an English Major

blog image

Many of my pipe-dream features for bysshebot's poetry generator (I'm still working on it, I swear) involve feeding it lots of query-specific data for the chain generator. I.e. the human building the poem wants it to be about a specific subject, bysshebot will then query a few different places for like subject matter and feed them into the dictionary and generate the inital poem from there. The human then trims off and replaces the embarrasing portions, thereby critiqueing a poem into existence.

This could concievably take a long time. :-)

One way to speed the process up would be to perform each outside query in it's own thread, simultaneously (*chortle*) grabbing the feed data from all the different sources, then continuing with the (*snort*) creative process.

I've actually had a lot of fun coding this part, so I thought I'd share what I learned with anyone who cares.

Bysshebot is written in Python, which has supported various types of threading for a good long time. Python is my language of choice, mainly because I'm a bit too dense to handle anything lower-level, but also because nearly everything I have ever wanted to do generally has a pretty well-thought-out module that covers most of the situations I might run into. This keeps me from unveiling yet another square wheel. Threading is no exception.

Defining the Problem

I want to gather data from several websites. Depending on network traffic, the speed of the servers and the sites themselves, the data will be returned at various rates of speed. I don't want to wait around while www.lugubrious_sloth_dump.com coughs to to life and spins the dust off of it's disks while several other, faster sites sit idle. MOVE BOY!

The answer is to query all the sites as quickly as possible in no particular order, gather the various data as it returns, then, when all have finished, process it. That way all our suppliers deliver at their own pace, and only the final processing is waiting for the slowest site.

So our goals are:

For purposes of demonstration, I'll leave out the actual querying of various websites and replace it with the generation of a random wait time to simulate the query. That way, one can see how the threads are actually doing their work and returning data as it is finished. We'll use the random number as our "reply data".

This particular threading implementation will use Bounded Semaphores to both keep track of when the threads have run out of jobs to do, and to avoid more than one named instance of a thread from running at a time.

If that is confusing, it may help to think of the semaphore as the speaking feather and your threads a little Rainbow Gathering and the jobs as slices of clock-tainted reality which must be divided among the Kind People Here Today. This way, every thread may speak their peace and return to the circle to grok the fullness until the natural unfolding of the universe places the feather in their hand again, at which point more conciousness raising happens. Except threads smell pretty good and you don't lose your job.

The Code

#!/usr/bin/env python

from threading import Thread
from threading import BoundedSemaphore
import random
import time

# We are subclassing Thread in DataGrabber. DataGrabber can now do whatever
# Thread can do, plus whatever we add.
# If this is totally obvious, and beneath your abilities, why are you reading 
# this sily thing? Hmmmm?
class DataGrabber( Thread ):
	# Each thread will know it's own name, and have a tuple full of args to 
	# execute when run
	def __init__( self, name, args ):
		Thread.__init__( self )
		self.name = name
		self.args = args

	# run() is the default name for the 'working' function. When .start() is 
	# called, run() is run. :-)
	def run( self ):
		try:
			print '%s executing arg: %s'%( self.name, self.args )
			# here is our faked-up action. We just wait a random number of 
			# seconds, then return the number as our 'results' 
			result = random.randint( 0, 15 )
			time.sleep( result )
			print self.name, 'has finished, returning to namePool!'
			# stick our results in the results container.
			results[ "%s"%( self.name) ].append( ( self.args, result ) )
		finally:
			# Now that we are done, put our name back in the namePool and 
			# release the semaphore.
			namePool.append( self.name )
			semaphorePool.release()

if __name__ == "__main__":
	# We'll name our threads, so we can see what is going on. If you want to 
	# run more threads, add them here.
	namePool = [ 'Kukla', 'Fran', 'Ollie' ]
	# limit the number of threads that may be executed at any one time
	limit = len( namePool )
	# A thread must have a "semaphore" before it is allowed to run, and returns
	# the semaphore after it is finished. If you don't do this, all jobs are run
	# as quickly as they can be supplied, and you'll not have an accurate way to
	#tell when they have all finished.
	semaphorePool = BoundedSemaphore( limit )
	# Make a container to hold the returned results. In this case, a list for 
	# each name in namePool, inside a dictionary called 'results'
	results = {}
	for name in namePool:
		results[ name ] = []
	
	# The "jobs" to process
	for i in range( 10 ):
		# thread grabs a semaphore
		semaphorePool.acquire()
		# we .pop() the name off the front of the namePool list, since it is now
		# going to be busy.
		name = namePool.pop()
		print '%s got semaphore at %s'%( name, time.ctime() )
		grabData = DataGrabber( name, i )
		# Start our work! Note that the actual work is in a function called 
		# run()	
		grabData.start()

	# Now we keep processing jobs until they are done. We will know all jobs are
	# done when the namePool has filled back up.
	while len( namePool ) < limit:
		pass
	# if you are here, namePool is full, now we can display our results
	for result in results:
		print result, results[ result ]
		

Download it here.
Each job is submitted sequentially, each job is performed as it is supplied, each thread returns data as it is finished, we cannot run more threads than we have names for and most importantly, we know when the threads have run out of jobs.

This is just about the minimum you can do to have a fairly safe bunch of thread calls. There is, of course, no error reporting whatsoever and there is no scheme for resubmitting failed jobs ... but hey, it's a demo! :-)

Let's have a look at the output:

Ollie got semaphore at Mon Jun 30 11:22:40 2008
Ollie executing arg: 0
Fran got semaphore at Mon Jun 30 11:22:40 2008
Fran executing arg: 1
Kukla got semaphore at Mon Jun 30 11:22:40 2008
Kukla executing arg: 2
Kukla has finished, returning to namePool!
Kukla got semaphore at Mon Jun 30 11:22:40 2008
Kukla executing arg: 3
Kukla has finished, returning to namePool!
Kukla got semaphore at Mon Jun 30 11:22:49 2008
Kukla executing arg: 4
Fran has finished, returning to namePool!
Fran got semaphore at Mon Jun 30 11:22:50 2008
Fran executing arg: 5
Ollie has finished, returning to namePool!
Ollie got semaphore at Mon Jun 30 11:22:51 2008
Ollie executing arg: 6
Ollie has finished, returning to namePool!
Ollie got semaphore at Mon Jun 30 11:22:58 2008
Ollie executing arg: 7
Fran has finished, returning to namePool!
Fran got semaphore at Mon Jun 30 11:23:00 2008
Fran executing arg: 8
Kukla has finished, returning to namePool!
Kukla got semaphore at Mon Jun 30 11:23:00 2008
Kukla executing arg: 9
Kukla has finished, returning to namePool!
Ollie has finished, returning to namePool!
Fran has finished, returning to namePool!
Fran [(1, 10), (5, 10), (8, 15)]
Ollie [(0, 11), (6, 7), (7, 9)]
Kukla [(2, 0), (3, 9), (4, 11), (9, 4)]

In case the results lines look like a confusing mass of parentheses: The first number in each tuple is the job, the second number is the number of seconds the job took to complete.

Fran [(1, 10), (5, 10), (8, 15)]
 ^     ^   ^
 |     |   |
 |     |   +------seconds to execute job
 |     +----------job number
 +----------------thread name

Note that even though thread "Ollie" started first, and thread "Kukla" started third ...

Ollie got semaphore at Mon Jun 30 11:22:40 2008
Ollie executing arg: 0 <-- He's so swift and wiley!
Fran got semaphore at Mon Jun 30 11:22:40 2008
Fran executing arg: 1
Kukla got semaphore at Mon Jun 30 11:22:40 2008
Kukla executing arg: 2 <-- suckin' the hind ta-ta ...
... "Kukla" ended up performing one more task than either of the threads initially preceeding it.
Fran [(1, 10), (5, 10), (8, 15)] <-- Meh.
Ollie [(0, 11), (6, 7), (7, 9)] <-- Whaddeva.
Kukla [(2, 0), (3, 9), (4, 11), (9, 4)] <-- Oho! Only 24 secs total for Kukla's jobs
Which makes sense: "Kukla's" tasks took the least time.

Your mileage should always vary! :-). Play with the numbers in the jobs range to make the effect more pronounced, but this demonstrates it pretty well. It is instructive to set the number of 'jobs' to around 100, then increase the number of thread names (thereby increasing the number of 'simultaneous' jobs being run). If you'd like to add a mess of threads without coming up with names, just replace the namePool list with

namePool = [ str( i ) for i in range( 10 ) ]
and set your range accordingly.

One still gets variations due to different job durations, but a larger number of threads will cut your overall time very nicely. Of course, if you your jobs happened to be doing some hairy calculations locally, this approach is not as useful as all of these threads are run within one 'system' thread. This is a very specific example of when more threads will equal better performance. Unfortunately, this is not always the case!

Other Approaches
If you are interested in maxing every core in your processor, you might want to give the excellent Parallel Python a try. If you are pondering truly massive amounts of threads ( > 1000 ) I hereby suggest Stackless Python.

Hope this helps someone else!

Certfied by TimeCert.org

Mon May 5 07:38:00 2008 - Markoff Chaney Remix of Little Brother

blog image

The least intelligent, most self-aggrandizing and totally asinine remix of Little Brother yet today.

LICENSE: Shaky ... It's not exactly a translation, so I'm not sure if it falls under the "Translate this puppy and I'll be happy" clause. Only the most masochistic will read past the first paragraph anyway.

Click the link above, or the little image to the left here for a nice one generated for you, or download MarkovChaney from the Code section and make a nice one locally.

Local (unixish): ./MarkoffChaney `wc -c Cory_Doctorow_-_LittleBrother.txt` < Cory_Doctorow_-_LittleBrother.txt > LittleBro-Markoff.txt

I can't wait to read the unadultrated text myself! I'm probably going to Amazon one to my niece this afternoon. (Last time I saw her she had me cracking up on the fake Face Book/My Space/Whatever profiles she and her friends are having fun generating from other YASNSesses. They just thought it was funny to make up ridiculous backstories about people with offensive hair, but I found it important to explain the concept of noise and the significance of eating up adbot cycles ... I'm sure she'll care in a few years.) Certfied by TimeCert.org

Tue Feb 19 21:09:06 2008 - CAAAAAAAAMMMIIIIGGGZZZ!

blog image

From the bronzed and oiled torso of Stanley Lieber comes more two-fistulaed, hard-broiled tales of phonemenal upfuckage, technological [ana|kata]chronism, disturbingly stretched prophylactics and bitchin' visors.

Cowboy Actor Archives vol 1 and Massive Fictions are Mr. Lieber's most recently finished funnybooks that deal with the nature of god, language and the lies we tell each other by continuing insistence on the utility of the former and the solidity of the latter.

Text for CU/Farley and others have appeared on Lieber's Livejournal page in one form or another. Cowboy Actor has been a favorite of mine for a while. I have seen two of the stories in earlier, less well-rendered publications, but the new (to me) work showing President Hillary Clinton and Vice President Kerry handling the aftermath of 9-11 is sublime. Not to be missed!

Certfied by TimeCert.org

Mon Feb 18 21:42:06 2008 - One from the Grooveyard ...

blog image

One of my co-workers mentioned wanting to write a pop lyrics generator. I allowed as that was a good idear ... then remembered this silly thing I'd written a couple of years ago for Ian Christie (of Sound of the Beast fame). Ian never used it, I'm guessing because it looks like an entire ass, and his web person probably has those "standards" I keep hearing about.

It is pretty funny sometimes, though.

Amusing or not, the code is absolutely horrible! I had just banged it out in a few minutes and didn't really pay much attention to making it readable or useable for anything else. In fact, I think I put more time into coming up with funny function names than I did to actual code construction.

It works by interspersing lines from Dante's Inferno, with "Death Metal Words" and phrases. Cheap hack, fairly amusing, but just too danged ugly to foist off on others.

So I'm not making the source available. I may incorporate the idea into another aspect of the Markov Chaney game, when I actually get time to write it.

So anyway, have fun and rock for the Fly Lord and stuff. (Just keep hitting reload for the full random effect.)

Certfied by TimeCert.org

Fri Feb 1 21:59:01 2008 - Initial Markoff Chaney Release (From README)

blog image

Prints Markov Chains from stdin. If no length is specified, prints a 21-word chain.

LICENSE: Oh, come on ...

USE: MarkoffChaney int <file

EXAMPLE: "MarkoffChaney 100 < file" prints a 101 word Markov Chain.

EXAMPLE: "MarkoffChaney < file" prints a default 21 word Markov Chain.

REQUIRES: python2.2 or better. Has not been tested with 3k.

BUGS: Probably won't be very fun with non-text input. If stdin is not terminated with a linefeed, it'll most likely just hang around until doomsday.

INSTALL: Somewhere in your path.

WINDOWS USERS: You'll probably need to add the .py extention to the file name to make it work. Sorry, I haven't tested it on any windows version yet.

SUGGESTION: get Ayn Rand's "Anthem" from Project Gutenberg for a 'seed' text. It makes some pretty neat cut-up poetry. Then go and read Zamyatin's "We". Grind your teeth.

What is a Markov Chain, and why should you care?

A really nice explaination of Markov Chains and all their weird and interesting properties is available at http://en.wikipedia.org/wiki/Markov_chain If you are still interested in what this particular Markov Chain script does, keep reading!

Markoff Chaney
-----------------

This particular program differs a little from usual Markov Chaining programs. The next 'link' in our chain is chosen, not from the single most common word that immediately follows the preceeding word in the sampled text, but from a random choice from a set (actually a python list, not to quibble ...) of all words that immediately follow.

For example, if we were starting a chain with a random word chosen from a sample text (say 'A'). We'll call this the "present" word in the chain. The words that come immediately after the present word in our sample text we'll call 'future' words.

Our program looks through the text and finds that each occurence of the word 'A' is immediately followed by these words:

['cat', 'dog', 'dog', 'dog', 'bird', 'cat']

The usual method is to take the word that appears most often in the list (in this case 'dog'), use it as our next "link" then perform the same action on it.

This program instead chooses a random word from the list as our next "link". Since there are more occurences of 'dog' in the set, 'dog' is the most likely word to be chosen, but not the only choice. This still fits my understanding of a Markov Chain, since future states are dependent only on the present state. Mathematicians are invited to send clarifications and critiques.

I actually chose this method because the output seemed to be funny more often.

Other (read: better) programmers will look at this method and point at me while hoo-hooing ape noises. Agreed, its a total Flintstone hack on my part that probably wastes memory and causes goiters. Mea Culpa. Mea Maxima Culpa.

The idea for writing this script came from Magnus Wurzer's Jens Ohlig's very nice "Candyland" explanation called Markov Chains Monkeys, typewriters, and the Markov chain reaction presented at RoboExotica 2007. The script is named for the Markoff Chaney character in Robert Anton Wilson's "Schrodinger's Cat" books.

"Teddy Snow Crop! Childhood! Innocence! ..."

Certfied by TimeCert.org