Building a Word-Based Generator II: Using the "Page of Generators" standard PHP Code.
by Steven Savage
This document will walk you through the actual creation of the "Space Phenomena" generator. The idea is to create a generator that makes interesting ideas for unusual space phenomena - maybe not scientific, more for Space Opera type stories à la Star Trek.
This is what I call a "Mid-Range" generator. It's neither simple (since we're going to deal with some complex names) nor hideously complex (there won't be any truly complicated dependencies and rules). It's a good example of how to create something useful that doesn't involve microscopic rule management. (I'm not against microscopic rule management, but there are times it gets old and times you don't need it.)
We'll be using my standard basic word generator code. This code is suited for combining words based on classification and in certain relevant orders. It does not involve self-aware data - though if we wanted more than a name, it would have to.
Sadly, I can't guarantee anything about this document or code or what it will do - if you use any of it, you're responsible for the results or lack of the same. With so many different browsers, servers, versions of PHP, etc., I can't make any guarantees.
Now, let's get going.
PART 1: Example Data
So, let's take some example phenomena from real life and science fiction:
Temporal Rift - We've seen enough of these in "Voyager" to last a lifetime.
Black Hole - The Old Classic.
Unstable Wormhole - Delta Quadrant, anyone?
Subspace Disturbance - Always interfering with communications.
Stellar Nursery - Those places stars come from.
Neutron Star - It's a real phenomenon.
NOTE: I find that to make a good generator, you need a minimum of five to ten examples to analyze, depending on complexity, and you need a representative sample. This will usually let you get a basic starting point - but prepare to alter and expand what you learn.
Analyzing this, we find a very common structure - what I usually call Descriptor/Object and Actor - something to describe or associate with the actor, the "centerpiece" of the term or combination.
But, if we look, we see some interesting things:
1) The word "Unstable" could describe a descriptor, such as an "Unstable Black Wormhole." So some descriptors can describe descriptors. Let's call them Metadescriptors.
2) The word "Nursery" could quantify a regular descriptor-actor pairing, such as "Black Hole Nursery." So you have an "Actor" that modifies an existing Actor. Let's call this a "Metaactor."
NOTE: It's important when analyzing your data to be aware that some classifications of words can actually be split in two. This kind of deep analysis allows you to make richer generators.
NOTE: Don't be afraid to revise your ideas while designing. You rarely get it the best the first or even second time.
So, let's look at our new classifications
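As a rough illustration (in Python rather than the PHP of the standard generator code, and not taken from it), the example words from PART 1 could be sorted into the four classifications like this. The name VOCABULARY and the placement of each word are assumptions made for the sketch.

```python
# Hypothetical layout: each classification maps to words from the PART 1 examples.
# Which word goes where is an assumption based on how the examples read.
VOCABULARY = {
    "metadescriptor": ["Unstable"],
    "descriptor": ["Temporal", "Black", "Subspace", "Stellar", "Neutron"],
    "actor": ["Rift", "Hole", "Wormhole", "Disturbance", "Star"],
    "metaactor": ["Nursery"],
}

# Print each classification with its words for a quick visual check.
for classification, words in VOCABULARY.items():
    print(f"{classification}: {', '.join(words)}")
```

Keeping the words grouped by classification makes it easy to move a word later if the first guess turns out to be wrong.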
PART 2: Patterns of Data and testing them
It's not too hard to look at our examples and find patterns the data can appear in:
So let's pick words at random and see what we get:
Unstable Stellar Wormhole
Unstable Disturbance Nursery
Stellar Star Nursery
Unstable Neutron Rift Nursery
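The same pick-at-random exercise can be sketched in code. This is a minimal Python sketch, not the standard PHP generator code: the PATTERNS list is inferred from the four hand-made examples above, and the names VOCABULARY, PATTERNS, and generate are hypothetical.

```python
import random

# Words from the PART 1 examples; the grouping is an assumption.
VOCABULARY = {
    "metadescriptor": ["Unstable"],
    "descriptor": ["Temporal", "Black", "Subspace", "Stellar", "Neutron"],
    "actor": ["Rift", "Hole", "Wormhole", "Disturbance", "Star"],
    "metaactor": ["Nursery"],
}

# Patterns inferred from the four examples above (an assumption;
# a fuller pattern list may look different).
PATTERNS = [
    ["metadescriptor", "descriptor", "actor"],
    ["metadescriptor", "actor", "metaactor"],
    ["descriptor", "actor", "metaactor"],
    ["metadescriptor", "descriptor", "actor", "metaactor"],
]

def generate(rng):
    """Pick a pattern, then pick one random word for each slot in it."""
    pattern = rng.choice(PATTERNS)
    return " ".join(rng.choice(VOCABULARY[slot]) for slot in pattern)

if __name__ == "__main__":
    rng = random.Random()
    for _ in range(4):
        print(generate(rng))
```

Passing the random number generator in as a parameter is a design choice for the sketch: it lets you seed it when you want repeatable test runs.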
Now that we've created some examples via "mental randomization," let's look them over.
Most of these are decent, but the Metaactor "Nursery" makes them sound kind of lame. However, the basic idea of a Metaactor works - so let's come up with another Metaactor or two and see if they sound better.
Well, a Stellar Nursery is a place stars are born. So Metaactors basically represent groupings, relations, etc. So let's add two more Metaactors - Nexus and Confluence. Sounds science-fictiony.
And let's pop these names in place of Nursery in our above examples:
Unstable Disturbance Nexus
Unstable Disturbance Confluence
Unstable Neutron Rift Nexus
Unstable Neutron Rift Confluence
OK, these sound better. So the idea of the Metaactor classification works; it's just that the word that inspired it, "Nursery," sounds kind of lame. But let's keep it in the vocabulary. It makes the data complete, and it may yet work.
NOTE: If you come up with a concept that sounds bad with the words that led you to develop it, see if it gets any better by adding similar words. It could be your limited data set.
NOTE: Some words are questionable in their usefulness. My basic rule is that unless it appears that a word will never be useful, keep it in, especially if it was in your original sources. You can always remove it later.
PART 3: Re-evaluate data
Now, we've seen our ideas in action. We've got a plan. We've got a structure that works reasonably well. Is there anything we may be forgetting?
Well, one hears about things like "Protostars." Maybe we need a "prefix" we can put in front of Actors like Proto, Quasi, etc.
Let's add "Proto" to the front of the Actor in the above examples that use Actors:
Unstable Stellar Proto-Wormhole
Unstable Proto-Disturbance Nursery
Stellar Proto-Star Nursery
Unstable Neutron Proto-Rift Nursery
Looking it over, this should be another option - prefixes, just to make things "extra cooler." So now we've got a whole new section of data. However, as this is technically a descriptor, it should probably replace descriptors, not work with them.
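The "replace, not work with" rule can be sketched as follows. This is a hedged Python illustration, not the standard PHP code; the function name build_name and its parameters are hypothetical, and the hyphen join simply follows the "Proto-Wormhole" style of the examples.

```python
def build_name(actor, descriptor=None, prefix=None,
               metadescriptor=None, metaactor=None):
    """Assemble a name; a prefix takes the descriptor's place, never joins it."""
    if descriptor and prefix:
        raise ValueError("use a descriptor or a prefix, not both")
    # A prefix is attached to the Actor with a hyphen, as in "Proto-Star".
    core = f"{prefix}-{actor}" if prefix else actor
    parts = [metadescriptor, descriptor, core, metaactor]
    # Skip any slot that was left empty.
    return " ".join(part for part in parts if part)

print(build_name("Star", prefix="Proto", metaactor="Nursery"))
# Proto-Star Nursery
print(build_name("Wormhole", descriptor="Stellar", metadescriptor="Unstable"))
# Unstable Stellar Wormhole
```

Raising an error when both are supplied is one way to enforce the rule; a generator could just as well never produce a pattern containing both.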
Our kinds of data include:
Our combinations of data now include:
That's also twelve possible combinations of words. If we get a good set of vocabulary, this number of combinations will mean producing a lot of mixes.
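One reading that yields exactly twelve combinations is: an optional Metadescriptor, then a Descriptor, a Prefix, or nothing, then the Actor, then an optional Metaactor (2 x 3 x 2 = 12). The Python sketch below enumerates that reading. It is an assumption about how the count could arise, not the actual combination list, and it includes a bare Actor on its own, which the real list may not.

```python
from itertools import product

# Each slot lists its options; None means "leave this slot out".
METADESCRIPTOR_SLOT = [None, "Metadescriptor"]
MIDDLE_SLOT = [None, "Descriptor", "Prefix"]  # Prefix replaces Descriptor
METAACTOR_SLOT = [None, "Metaactor"]

def all_patterns():
    """Enumerate every combination of the optional slots around the Actor."""
    patterns = []
    for meta, middle, metaactor in product(
            METADESCRIPTOR_SLOT, MIDDLE_SLOT, METAACTOR_SLOT):
        slots = [meta, middle, "Actor", metaactor]
        patterns.append([slot for slot in slots if slot])
    return patterns

for pattern in all_patterns():
    print(" + ".join(pattern))
print(len(all_patterns()))  # 12
```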
So, now what? Well, we develop it.
PART 4: Test Run
Using the Basic Word generator code (well, actually, a copy of it from another project), I changed the files and entered some basic data - both what we had above and a few more words. Here's an example of what was generated. I put an asterisk (*) by ones I particularly liked and felt inspired by.
*Galactic Vortex Node
Induced Hole Nexus
Rotating White Planet
Stable Un-Cloud Vortex
*Unstable Antimatter Wormhole
Looking these over, the worst one is "Stellar Star" (sort of redundant, isn't it?). There are five good ones out of ten, and the rest aren't egregious (though "Rotating White Planet" sounds like some kind of Generic World). That's actually not bad - a fifty percent "cool" rate is pretty good first thing out with a limited vocabulary.
But let's do one more run:
*Galactic Stellar Cloud Vortex
*Rotating Galactic Cluster System
*Rotating Planet Node
*Stable Black String
*Unstable Cluster System
This one has eight decent ones out of ten. There's nothing as bad as our friend the Stellar Star, though we do see a pattern here - our worst problem isn't wordiness (which is often a problem in generators like these), but generic-sounding terms popping up. The complex things actually manage to sound interesting ("Captain, it's the first Rotating Galactic Cluster System ever encountered! Before now they were only theoretical!"). Our potential flaw seems to be generic and poorly-defined phenomena.
I generated several more examples, and found that this pattern held - on average half or even more were good, and the problems were rather generic-sounding terms and the occasional odd match.
So the question arises: can we do anything about this?
In this case, the generic phenomena seem to happen when we have a simple description of an actor. Rather ironically, this was the exact pattern of many of the phenomena we analyzed to create our base data and pattern.
Largely, this can't be avoided, I've found. This is a very basic pattern of data, so removing it could actually skew the results towards being more complex, and thus potentially too complex, and eliminate a simple and effective combination of data. So we should hold onto it.
NOTE: Test your generator a few times to make sure that it's producing useful results. See if there are any patterns that give you ideas to improve things.
NOTE: In combinations of data, in general, removing even one possibility can have wide repercussions. Removing one word may have little effect, but removing a combination of data is removing ways words can be joined.
So, what's next?
PART 5: Finishing Up
At this point, I just began fleshing out vocabulary and adding it into the program.
Usually I do one of two things:
1) List all the words then break them into categories.
2) Generate the categories separately.
As you add data, run the generator - I usually give it a few runs for every category I add to. This is a good final test, just in case.
My major finding as I added data was that some things could be Metadescriptors OR Descriptors, and that some could be either Actors OR Metaactors. Thus I made sure they were tagged to be both. However, I had to think over some very carefully to make sure they fit.
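Tagging a word with more than one classification might look like the Python sketch below. It is an illustration only, not the data format of the standard PHP code: the names WORD_TAGS and words_for are hypothetical, and the tags given to each word are guesses based on the sample output in PART 4.

```python
# Hypothetical tagging: each word maps to the set of classifications it can fill.
# The specific tags are illustrative guesses, not the generator's real data.
WORD_TAGS = {
    "Unstable": {"metadescriptor"},
    "Galactic": {"metadescriptor", "descriptor"},
    "Wormhole": {"actor"},
    "Vortex": {"actor", "metaactor"},
    "Nexus": {"metaactor"},
}

def words_for(category):
    """Return every word tagged with the given classification, sorted."""
    return sorted(word for word, tags in WORD_TAGS.items() if category in tags)

print(words_for("descriptor"))  # ['Galactic']
print(words_for("metaactor"))   # ['Nexus', 'Vortex']
```

Storing tags per word, rather than copying the word into two separate lists, means a word that fits two classifications only has to be entered and corrected in one place.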
When you're happy, it's done. Release it to the world!