Revisiting My 2010 NLP Experiments: What I Got Right and What I Completely Missed

date: June 30, 2026 | reading_time: 3 min read

Recently, I revisited a collection of NLP and sentiment analysis experiments that I built around 2010–2011 as a computer science student. The code itself reflects the era: procedural PHP, mysql_* functions, and plenty of duplicated logic. However, looking beyond the implementation, I found the underlying ideas surprisingly interesting.

At the time, I wanted to answer a simple question:

How can a computer determine what a person is feeling from written text?

Without access to modern machine learning frameworks, I built a rule-based system that attempted to classify sentiment using dictionaries, scoring algorithms, and simple learning techniques.

Text Normalization

One of my earliest observations was that language needed to be normalized before analysis:

$post = strtolower($post);
$post = str_replace('.', '', $post);
$post = str_replace(',', '', $post);

I also implemented custom stemming rules:

$post = str_replace('ness','',$post);
$post = str_replace('ing','',$post);
$post = str_replace('ies','y',$post);

The objective was to reduce multiple forms of a word to a common representation. In hindsight, this was a crude form of suffix stripping, the same idea behind classical stemmers such as the Porter stemmer.
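
One weakness of these rules is that str_replace removes a substring wherever it occurs, not only at the end of a word, so "king" collapses to "k". Below is a minimal sketch of a suffix-anchored variant in modern PHP (it assumes PHP 8 for str_ends_with). The function names normalize, stripSuffix, and stemPost, and the minimum stem length of 3, are my own choices for illustration and were not part of the 2010 code.

```php
<?php
// Lowercase the text and drop periods and commas, as in the original snippet.
function normalize(string $post): string
{
    return str_replace(['.', ','], '', strtolower($post));
}

// Strip a suffix only when it ends the word and leaves a stem of at least
// $minStem characters (an assumed threshold, not from the 2010 code).
function stripSuffix(string $word, int $minStem = 3): string
{
    $rules = ['ness' => '', 'ing' => '', 'ies' => 'y'];
    foreach ($rules as $suffix => $replacement) {
        $stem = substr($word, 0, strlen($word) - strlen($suffix));
        if (str_ends_with($word, $suffix) && strlen($stem) >= $minStem) {
            return $stem . $replacement;
        }
    }
    return $word;
}

// Normalize a post, then stem it word by word.
function stemPost(string $post): string
{
    $words = preg_split('/\s+/', normalize($post), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_map('stripSuffix', $words));
}

echo stemPost('The king was walking, full of happiness.'), PHP_EOL;
// prints: the king was walk full of happi
```

The length guard is what keeps "king" intact while "walking" still becomes "walk". It is a blunt heuristic, and a real stemmer applies many more conditions.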

Context Matters

I quickly discovered that sentiment could not be determined from isolated words alone.

if($previousword != 'not')
{
    $score++;
}

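This was my attempt to handle negation, recognizing that “good” and “not good” express different meanings. As written, the check only withholds the point when the preceding word is “not”. It does not push the score in the opposite direction, and it looks back exactly one word. While limited, this introduced the idea that language interpretation depends on context.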
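
A slightly stronger version would flip the polarity of a sentiment word that follows a negator, rather than just skipping the increment. The sketch below shows one way this might look. The tiny lexicon, the list of negators, and the name scoreWithNegation are illustrative assumptions, not the original dictionaries.

```php
<?php
// Hypothetical mini-lexicon; the original dictionaries are not shown here.
const POLARITY = ['good' => 1, 'great' => 1, 'happy' => 1, 'bad' => -1, 'sad' => -1];
const NEGATORS = ['not', 'never', 'no'];

function scoreWithNegation(string $post): int
{
    // Split on anything that is not a lowercase letter or an apostrophe.
    $words = preg_split('/[^a-z\']+/', strtolower($post), -1, PREG_SPLIT_NO_EMPTY);
    $score = 0;
    $previousWord = '';
    foreach ($words as $word) {
        $polarity = POLARITY[$word] ?? 0;
        // Invert the word's polarity when it directly follows a negator.
        if (in_array($previousWord, NEGATORS, true)) {
            $polarity = -$polarity;
        }
        $score += $polarity;
        $previousWord = $word;
    }
    return $score;
}

echo scoreWithNegation('The food was good'), PHP_EOL;     // 1
echo scoreWithNegation('The food was not good'), PHP_EOL; // -1
echo scoreWithNegation('Not bad, never sad'), PHP_EOL;    // 2
```

This still looks back only one word, so "not very good" would slip through. Widening the window is the obvious next step.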

Feature Weighting

Another realization was that not all words contribute equally to meaning.

if($k == 1)
    $score += 50;
else
    $score += 1;

Some words appeared to carry stronger emotional significance than others, so I introduced weighted scoring to influence classification outcomes.
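
The same idea can be expressed as a lookup table of per-word weights instead of a branch on a flag. The sketch below is my reading of the snippet, scoring a single emotion ("joy"). The words and the name weightedScore are made up for illustration, and only the values 50 and 1 come from the original code.

```php
<?php
// Hypothetical weights for one emotion ("joy"): strong words count 50,
// ordinary matches count 1.
const JOY_WEIGHTS = ['ecstatic' => 50, 'overjoyed' => 50, 'glad' => 1, 'pleased' => 1];

function weightedScore(array $words): int
{
    $score = 0;
    foreach ($words as $word) {
        // Words missing from the table contribute nothing.
        $score += JOY_WEIGHTS[$word] ?? 0;
    }
    return $score;
}

echo weightedScore(['i', 'am', 'glad']), PHP_EOL;     // 1
echo weightedScore(['i', 'am', 'ecstatic']), PHP_EOL; // 50
```

A table like this makes the weights data rather than control flow, so they can be tuned without touching the scoring loop.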

Learning Through Observation

I also experimented with storing observations and increasing confidence based on repeated occurrences:

UPDATE emotionsub
SET count = count + 1

The underlying assumption was straightforward:

Repeated experience should improve future predictions.

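Although this was not machine learning in the modern sense, it represented an early attempt to model learning and confidence accumulation. Frequency counts of this kind are also the raw input to simple statistical classifiers such as Naive Bayes.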
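
To make the counting idea concrete, here is a rough sketch that uses an in-memory array in place of the emotionsub table. The word-to-emotion-to-count structure, the function names observe and predict, and the confidence formula are assumptions for illustration. The actual schema and scoring are not reproduced here.

```php
<?php
// Record one observation: $word was seen in a post labeled $emotion.
function observe(array &$counts, string $word, string $emotion): void
{
    $counts[$word][$emotion] = ($counts[$word][$emotion] ?? 0) + 1;
}

// Predict the most frequently observed emotion for $word, with a confidence
// equal to its share of all observations of that word.
function predict(array $counts, string $word): ?array
{
    if (empty($counts[$word])) {
        return null; // never observed
    }
    $seen = $counts[$word];
    arsort($seen); // highest count first
    $emotion = array_key_first($seen);
    return [$emotion, $seen[$emotion] / array_sum($seen)];
}

$counts = [];
observe($counts, 'rain', 'sad');
observe($counts, 'rain', 'sad');
observe($counts, 'rain', 'calm');
observe($counts, 'rain', 'sad');

[$emotion, $confidence] = predict($counts, 'rain');
printf("%s (%.2f)\n", $emotion, $confidence); // sad (0.75)
```

Each repeated observation shifts the share, which is roughly what "increasing confidence" meant in practice.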

Evaluation

Perhaps the most valuable lesson was realizing that a system needed to be measured rather than merely appear intelligent. I built evaluation routines to calculate classification accuracy and compare different approaches.
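
Accuracy here simply means the fraction of posts whose predicted label matches a hand-assigned one. A minimal sketch of such a routine follows. The function name accuracy and the sample labels are invented for illustration and are not results from the original experiments.

```php
<?php
// Fraction of predictions that match the hand-assigned labels.
function accuracy(array $predicted, array $actual): float
{
    if (count($predicted) !== count($actual) || count($actual) === 0) {
        throw new InvalidArgumentException('Need two non-empty lists of equal length.');
    }
    $correct = 0;
    foreach ($predicted as $i => $label) {
        if ($label === $actual[$i]) {
            $correct++;
        }
    }
    return $correct / count($actual);
}

// Made-up labels for illustration only.
$actual    = ['happy', 'sad', 'happy', 'angry'];
$approachA = ['happy', 'sad', 'sad', 'angry'];
$approachB = ['happy', 'happy', 'sad', 'sad'];

printf("A: %.2f, B: %.2f\n", accuracy($approachA, $actual), accuracy($approachB, $actual));
// A: 0.75, B: 0.25
```

Running two approaches against the same labeled set is what makes the comparison meaningful, since both are judged on identical posts.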

Looking Back from 2026

What surprised me most when revisiting this project was not the implementation, but the thought process behind it.

My approach in 2010 was effectively:

Input
  ↓
Normalize
  ↓
Extract Features
  ↓
Apply Context
  ↓
Assign Weights
  ↓
Learn
  ↓
Classify

Modern AI systems use fundamentally different techniques, including embeddings, neural networks, and transformer architectures. However, many of the underlying problems remain the same:

How do we represent meaning?
How do we incorporate context?
How do we determine importance?
How do we learn from experience?
How do we evaluate performance?

The code itself belongs firmly in 2010. The questions it was trying to answer, however, remain highly relevant today.