Wonderful World of Words...and an Overworked Mac
- Ramya Namuduri
- Nov 9, 2020

I stared. Everything suddenly disappeared, leaving an Apple Restart screen. The thin, white line was stuck loading, the half-eaten apple staring viciously at me, as though blaming me for its struggle. It’s not like I had done anything to make it simply give up, crash and restart...except perhaps I had tried loading a dataset of 50,000 IMDb reviews and processing it, creating a word dictionary to represent each word with a number.
I suppose my Mac did not appreciate me overworking its threads, but that unfortunately cannot be helped, especially not while experimenting. Besides, my beloved 6-year-old Mac, hobbling on its last legs, may give up on me, but I will not - give up, I mean. It has beautiful memory (mostly unused), and can be fast (at times). It just has a little trouble with multi-tasking, and with threads that eat away at its processing cores, such as TensorFlow-related activities running alongside the hundred other windows and applications.
Image Recognition is fun, but Natural Language Processing is another story - completely. I like to imagine a day when I can build a deep learning model that can read and analyze poems for AP Literature for me, and analyze books for me. Actually, I would scratch the second part of the previous sentence because I enjoy reading, and would not want probability to quantify a work of art woven with words. Still, Natural Language Processing, in my opinion, is extremely ‘cool’.

Languages have always fascinated me, especially the way we learn them, dividing the work between the hemispheres of our brain. As our left hemisphere decodes word-to-word meanings based on a complex built-in dictionary, our right hemisphere simultaneously deciphers the overall meaning, taking into account tone, sentiment, purpose, syntax, and context. Together, they bring meaning and expression to the world we see, automatically processing information into phrases. This amazing tangle of language is colored by culture, changing the way we view the world and make sense of it. Language, I think, is so human, but how do we teach a computer a language when all it knows are zeroes and ones? How does a cold hunk of metal understand emotions? How can a frustratingly patient Siri or Alexa understand if we are exasperated, in a hurry, or sarcastic? As the saying goes, it is not what we say, but how we say it that matters more. If a computer is simply given a dictionary and expected to make sense of it, it only imitates half of our brain, which is not very productive. Therefore, understanding and analyzing the sentiment behind the words and phrases is clearly crucial, but just as complex a challenge.
This week, I was able to scratch the surface of Natural Language Processing by converting text to numbers. Translating a concept as abstract as language into numbers is no easy task, and there is no perfect algorithm for it either - it is a work in progress. However, I was interested to discover that repetition itself creates meaning. The words we say more often are perhaps more important, more significant, more crucial to understanding context and sentiment. Using this theory, sentences can be quantified, one word at a time.
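
To make the idea concrete, here is a minimal sketch of a word dictionary built by frequency, in plain Python. It is not the code from the course; the helper names `build_word_index` and `encode`, the three toy reviews, and the choice of 0 for unknown words are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

def build_word_index(texts):
    # Count how often each word appears across all texts.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # Most frequent words get the smallest numbers; ties break alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: i for i, (word, _) in enumerate(ordered, start=1)}

def encode(text, word_index, oov=0):
    # Replace each word with its number; unseen words map to `oov` (assumed 0).
    return [word_index.get(word, oov) for word in tokenize(text)]

# Toy stand-ins for reviews (invented for this example, not real IMDb data).
reviews = ["I loved this movie", "I hated this movie", "loved it"]
word_index = build_word_index(reviews)

print(word_index)
print(encode("I loved it", word_index))
print(encode("I adored it", word_index))  # "adored" was never seen
```

A sentence becomes a list of numbers, one word at a time, and a word the dictionary has never seen falls back to the placeholder value.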

What is incredible is the step that comes next - word embedding. In statistics, we recently covered analyzing scatter plots and how to describe their shape, including whether they have clustered data. The idea, from my understanding, is similar: the matrix of quantified words is converted into points in a space where position carries meaning. Words with similar meaning sit closer together, creating clusters of meaning or sentiment. So...what was I doing with IMDb reviews?
In the Natural Language Processing course I started, our challenge is to take IMDb reviews and classify their sentiment as positive or negative. This has direct applications: a positive review might increase how many people watch the movie, and a negative one might do the opposite, for instance. The idea is to cluster words that convey positive or negative sentiments, and to grasp the meaning of the phrase as a whole. So, if my Mac can muster the strength to not restart again, I am extremely excited to create word clusters, and introduce a computer to our wonderful world of words.
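
As a small footnote on the clustering idea, here is a rough sketch of how "closeness" between words can be measured with cosine similarity. The two-dimensional vectors below are made up by hand for illustration; a real embedding layer would learn vectors with many more dimensions from the reviews themselves, and the helper names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: near 1 means same direction,
    # near -1 means opposite direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-picked toy vectors (an assumption, not learned embeddings).
toy_embeddings = {
    "wonderful": [0.9, 0.8],
    "great": [0.8, 0.9],
    "awful": [-0.9, -0.7],
    "terrible": [-0.8, -0.9],
}

def nearest(word, embeddings):
    # Return the other word whose vector points in the most similar direction.
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))

print(nearest("wonderful", toy_embeddings))
print(nearest("awful", toy_embeddings))
```

With these toy vectors, the positive words end up as each other's nearest neighbors, and so do the negative ones, which is the kind of clustering described above.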