Archive for the 'Linguistics' Category

New Studies

Tuesday, October 13th, 2009

I am finally registered for an MA in Applied Linguistics.  I started this process back in February or March and it was getting to the stage where I thought I’d never get through the administration process.  I did miss the course start date but passed all the hurdles on the last possible day for registration.  I was worried that my enthusiasm had been sucked dry but now that I’ve seen the required book list I’m starting to get interested again.

Perl Collocates: Google Suggest

Tuesday, February 5th, 2008

I am also interested in collocates of “Perl” that come from sources outside the Perl community. Google Suggest aims to make a best guess as to what should come next in a search and it doesn’t tailor these results based on my previous searches.

Google Suggest Results

I wasn’t surprised that people are searching the web for help with various functions and data structures, but I was really surprised that the highest-ranked search is “perl for windows”.

Perl Collocates: Preliminary Results

Monday, February 4th, 2008

Marty has started to analyse the blog data he retrieved from

At the minute we are just looking at the collocates for “Perl”. Unsurprisingly “use perl” came out at the top but given the data source we are going to ignore that.

  • Perl 6 – 9,455 collocated occurrences
  • Perl code – 6,392 collocated occurrences
  • Perl source – 6,301 collocated occurrences
  • Wall Perl – 5,650 collocated occurrences
  • Larry Perl – 4,109 collocated occurrences
  • Perl 5 – 3,852 collocated occurrences
  • Perl unfortunately – 2,936 collocated occurrences
  • Perl Mongers – 2,769 collocated occurrences
  • Perl bug – 2,736 collocated occurrences
  • Perl Foundation – 2,732 collocated occurrences
  • Perl TODO – 2,722 collocated occurrences
  • Perl journal – 2,469 collocated occurrences
  • Perl course – 2,355 collocated occurrences
  • Perl programmers – 2,123 collocated occurrences
  • best Perl – 1,859 collocated occurrences
  • Perl6 Synopsis – 1,451 collocated occurrences
  • Perl6 doc – 1,450 collocated occurrences
  • Perl Horrors – 1,332 collocated occurrences
  • Perl community – 1,005 collocated occurrences

The results for “Perl community” are being skewed because nearly 60% of the occurrences are from acme’s blog. I have no idea what “Perl Horrors” refers to. Marty is postulating that the count for “Perl 5” is low because that’s what people usually mean when they refer to “Perl” on its own.

There is still lots to do before I have a sensible way to display the results and also before I can graph their development over time.

Perl Collocates: Finding Data

Monday, February 4th, 2008

I haven’t forgotten my earlier post where I stated that I wanted to find out what collocates of “Perl” were being used by the Perl community. Stray posted a comment asking me how I planned to define “community” in this context.

“How do you define the perl *community*? Those with the loudest voices? The self-aggrandising, self-publicising, self-appointed spokespeople?”

At this stage I don’t plan to try to define the community and I am not just looking for those with the loudest voices. But they do need to have a voice as I want to analyse what they have written.

I have decided to start with the blog posts on Once I’ve done that I may take a look at the archives of some of the Perl mailing lists.

Defining Collocates

Sunday, January 13th, 2008

Now that Marty has decided to write a simple Perl script to pull collocates out of data for me, I need to give him a more precise specification of a collocate. Carmen Dayrell wrote a paper, “A quantitative approach to compare collocation patterns in translated and non-translated texts”, which contains a detailed section on how to decide what a collocate is.

The first step is to work out which words should be taken as nodes – but as I am interested in specific nodes, like the word “Perl”, I will not be doing this. Then we need to decide how we will define a collocate. Dayrell suggests that a collocation should occur at least 4 times to count as significant, and that collocates should be looked for within a span of up to 4 words on either side of the node. Structural boundaries in the text, such as sentence breaks, should also be ignored.
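To make that specification concrete, here is a minimal sketch of the counting step. It is written in Python for illustration and is not Marty's Perl script. The function name `find_collocates`, the crude tokenizer, and the choice to skip repeats of the node word itself are all my own assumptions; only the span of 4 and the minimum of 4 occurrences come from the description above.

```python
import re
from collections import Counter


def find_collocates(text, node, span=4, min_count=4):
    """Count words found within `span` tokens either side of each `node`.

    Hypothetical helper for illustration only.
    """
    # Assumed tokenizer: lower-cased runs of letters, digits and apostrophes.
    # Punctuation is discarded, so sentence and paragraph breaks do not act
    # as barriers and the window crosses structural boundaries.
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    node = node.lower()

    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - span)
        # Words before the node plus words after it, excluding the node.
        window = tokens[start:i] + tokens[i + 1:i + 1 + span]
        # Assumption: another occurrence of the node is not a collocate.
        counts.update(word for word in window if word != node)

    # Keep only collocates that meet the minimum frequency.
    return {word: n for word, n in counts.items() if n >= min_count}
```

Calling `find_collocates(blog_text, "Perl")` on a string of blog text returns a dictionary mapping each collocate to its count. A word that appears near the node fewer than four times is dropped.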

While Marty does this I am going to read the work that Church and Hanks did on word association norms and mutual information to see if any of that will help me get better results.
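As I understand it, the measure Church and Hanks propose compares how often two words appear together with how often they would be expected to co-occur by chance, on a log scale. The sketch below is my own illustration of that idea in Python. The function name and parameters are hypothetical, and probabilities are estimated simply as counts divided by the corpus size.

```python
import math


def association_ratio(pair_count, x_count, y_count, corpus_size):
    """Return log2( P(x, y) / (P(x) * P(y)) ) estimated from raw counts.

    Hypothetical helper: pair_count is how often x and y occur together
    within the chosen span; x_count and y_count are their individual
    frequencies; corpus_size is the total number of words.
    """
    if min(pair_count, x_count, y_count, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    p_xy = pair_count / corpus_size
    p_x = x_count / corpus_size
    p_y = y_count / corpus_size
    return math.log2(p_xy / (p_x * p_y))
```

A score well above zero suggests the pair occurs together more often than chance would predict, while a score near zero suggests no particular association. Raw counts alone cannot show this, because very common words will collocate with almost anything.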

Perl Collocates

Saturday, January 12th, 2008

My linguistics course contains lots of really interesting material but unfortunately has really boring assignments. The last assignment was so awful that I considered giving up the course as I didn’t want to spend my spare time on something I wasn’t enjoying. To help with the tedium I decided to find something to do with the new knowledge that actually interests me.

I have been reading about collocates – words that are typically grouped together such as “law and order” and “fish and chips”. What interests me is the introduction of new collocates. I read a study by Fairclough who had analysed 53 speeches given by Tony Blair. The word “new” occurred 609 times and the most frequent collocates were “new labour” and “new deal”.

I am also interested in the Perl community, how it is perceived and how it perceives itself. If I analyse the blogs of various members of the community what are the collocates of “Perl” going to be? Some are going to be obvious – “Perl community”, “Perl 6” – but what unexpected ones will I find? And what has changed in the last few years? What did we talk about in the past that is no longer important to us and what is the latest thing to be linked with Perl?