AppliedMachineLearningPublic

Using NLP to build a sarcasm classifier

Question One: Pick two or three news sources and select a few news titles from their feed (about 5 is likely enough). Run your sarcasm model to predict whether the titles are interpreted as sarcastic or not. Analyze the results and comment on the different news sources you have selected.

The Onion Headlines: “Toddler Feels Somewhat Torn About Pretending To Be A Policeman in Current Climate”, “More Cities Offering Drive-Thru Covid Injection Sites To Put Citizens Out of Misery”, “Closed Ballpark Forces Thousands of Phillies Fans To Be Content Verbally Threatening Friends and Family”, “Queen Elizabeth II Worried She’s Next On The Chopping Block If Beefeaters Laid Off”, “Defensive Chicago Police Officer Perfectly Capable Of Disappearing Protestors Without Help From Homeland Security”

Scores: 0.58188504, 0.9894364, 0.9989072, 0.96631986, 0.9884687 Sum: 4.5250172 Average: 0.90500344

LeMonde Headlines: “It’s the healthcare system, stupid” , “The meritocrats shall inherit the earth” , “Where are the Women Peacemakers” , “Saudi Arabia’s Holy Business” , “The US’s new evil empire”

Scores: 0.10120084, 0.00018812, 0.00015933, 0.02182123, 0.17829928 Sum: 0.3016688 Average: 0.06033376

AP News Headlines: “4 Big Tech CEOs getting heat from Congress on competition” , “Trump dismisses virus aid for cities, lashes out at GOP” , “Early in pandemic frantic doctors traded tips across oceans” , “As crime surges on his watch, Trump warns of Biden’s America” , “Garth Brooks doesn’t want to win CMA entertainer award again”

Scores: 0.00668844, 2.9093371e-06, 0.74666834, 2.7510084e-06, 0.25262085 Sum: 1.0059832903455002 Average: 0.20119665806910003

News Source | Average Sarcasm Score
The Onion   | 0.905
LeMonde     | 0.060
AP News     | 0.201

For my news sources I decided to test The Onion (which I know to be highly satirical), LeMonde (a widely read French news site aimed at a general audience), and AP News (which is touted as one of the least biased and most reliable news outlets). I thought this list of news sources would give me a good range to test my model against, and the model's predictions largely lined up with what I was expecting. The Onion had an average score of 0.90, which is very close to 1 (a score closer to 1 in this case means more sarcastic, whereas a score closer to 0 means less sarcastic), and I think it is clear from the titles that it is, in fact, a very sarcastic source. The AP News headlines had an average score of 0.20, which is in line with it being a very serious, well-regarded news outlet. LeMonde was a bit of a surprise for me. It had an average score of 0.06, which was lower than AP News. LeMonde had (at least in my opinion) a few sarcastic sentences among the headlines I chose (for example: “It’s the healthcare system, stupid”), and yet the model judged LeMonde’s headlines to be less sarcastic than the AP News headlines, which were literally just statements of fact.
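For reference, the per-headline scores above came from running the trained sarcasm model over the tokenized titles. Below is a minimal sketch of that scoring step, not the exact notebook code: it assumes the `tokenizer`, `model`, `max_length`, `padding_type`, and `trunc_type` objects already exist from the training portion of the exercise, and the headline list shown is just an illustrative subset.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Illustrative subset of the headlines scored above.
headlines = [
    "Toddler Feels Somewhat Torn About Pretending To Be A Policeman in Current Climate",
    "4 Big Tech CEOs getting heat from Congress on competition",
]

# Convert each title to the same integer-sequence representation used in training,
# then pad/truncate to the fixed length the model expects.
sequences = tokenizer.texts_to_sequences(headlines)
padded = pad_sequences(sequences, maxlen=max_length,
                       padding=padding_type, truncating=trunc_type)

# The model outputs one sigmoid score per headline: near 1.0 = sarcastic, near 0.0 = not.
scores = model.predict(padded).flatten()
print(scores)
print("Average sarcasm score:", np.mean(scores))
```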

Text generation with an RNN

Question One: Use the generate_text() command at the end of the exercise to produce synthetic output from your RNN model. Run it a second time and review the output. How has your RNN model been able to “learn” and “remember” the Shakespeare text in order to reproduce a similar output?

The RNN model is character based, so it continuously predicts the next character in order to build up longer sentences. The model does not understand words or correct grammatical structure, but it is able to effectively learn and mimic the structure of the input text. It works as follows: the training text is vectorized so that each character is represented by an integer, and when the RNN runs it looks up the embedding of a character to make its prediction for the next character. When making that prediction, the RNN considers both the previous step (the state carried over from everything it has seen so far) and the current input character, which is how it is able to learn a corpus of text and reproduce similar output. It “remembers” by passing the RNN state from timestep to timestep, so the state is continuously fed back into the model and used in its next prediction.
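As a rough illustration of that mechanism (not the exact notebook code), the sketch below builds a small character-level model and a generation loop in which the GRU state returned at each step is fed back in as the initial state of the next step. The toy vocabulary, layer sizes, and seed string are placeholder assumptions; with untrained weights the output is random characters.

```python
import numpy as np
import tensorflow as tf

# Toy character vocabulary; the real exercise builds this from the Shakespeare corpus.
vocab = sorted(set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ :,.!?'\n"))
char2id = {c: i for i, c in enumerate(vocab)}
id2char = np.array(vocab)

class CharRNN(tf.keras.Model):
    """Embedding lookup -> GRU -> logits over the character vocabulary."""
    def __init__(self, vocab_size, embedding_dim=64, rnn_units=256):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(rnn_units, return_sequences=True, return_state=True)
        self.dense = tf.keras.layers.Dense(vocab_size)

    def call(self, inputs, states=None):
        x = self.embedding(inputs)                      # ids -> learned character embeddings
        x, states = self.gru(x, initial_state=states)   # the state carries the "memory"
        return self.dense(x), states

model = CharRNN(len(vocab))

def generate_text(model, start_string, num_generate=200):
    # Vectorize the seed string into integer ids with shape (batch=1, seq_len).
    input_ids = tf.constant([[char2id[c] for c in start_string if c in char2id]])
    states = None
    out = []
    for _ in range(num_generate):
        logits, states = model(input_ids, states=states)
        # Sample the next character from the distribution at the last timestep.
        next_id = tf.random.categorical(logits[:, -1, :], num_samples=1)
        out.append(id2char[int(next_id[0, 0])])
        input_ids = next_id          # feed the prediction back in as the next input
    return start_string + "".join(out)

print(generate_text(model, "ROMEO: "))
```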

First Run:

*ROMEO: I warrant you! Then shall what thou know’st us blows on me. If therefore, nevel had straight for ho? Can two halp proce; have come much other that would have And more than who live reduge honour shapewh: The wallors shall please you here remorse it; It shall the garded blood descend me to my soul; And, brother to the Tower, and heaven’sest heart You shall be made, but hearing percitent, Some set of you, my departor’s death?

*(MENENIUS: There’s a gate To just and tidul deeds am clears most sweet To saving Kate to best: it i’ the law, Standing him woman in thine emil.

*YORK: Speak still my daughter soul, I will not Upon my vouches gazesh ere’s steel dreammetten yet: put on their native Hath slain as chief her good, to made made beaties All: some of your mistress’d day not them Against the sea. Shake hen, Because my mouth i’ the northern gentleman That guessitus Lucentio? where e seen shall bound; this ill my sons. Sirrah, fear thee strife? what shall’st tempt and stegs, Seeming at, lordsh

Second Run:

*ROMEO: the love of York.

*KING RICHARD III: And given her looking shall one pitied point, Her run pays some near spoke, saving with that now upon your chamber; He did speak of you with tredion the was ark both a crudge, And who he did repose you thankful.

*ISABELLA: His meatine king, he’s for the day, And misushames, where he is doth have made Your high’d pmoce’s body, or any man?

*LUCENTIO: Ay, my good worship doubs; for it is worn with holy Clarence mose that, that wo mun barried ancestage To make my marshan hath turn by see ‘no’er then. For Caribit a subject, as they offence, The proceake hath Duke of York That he doth naked, Kate of his death.

*PETRUCHIO: Romeo camb these father to the queen: But not instanticome, give her do more Than do your hearts: be with cried On Rome, the less like a spirit That cannot sound thus fashion. Gry it even sweary that whom I but a noble death.

*First Murderer: A lork of our souther. Gress I, spired mothers! Charge me well: good Christian father, now Do mo

Question Two (Stretch goal): replace the Shakespeare text with your own selected text and run the model again. Modify the source text as needed in order to generate_text() from the newly trained model.

I replaced the Shakespeare text with the French text of Émile Zola’s “Le rêve.” I have studied French throughout my life and have had to study many of Zola’s texts (with this particular text being a personal favorite), and Zola is an incredibly well-known writer even outside of France, so I thought this would be a fun text to explore with this model. Additionally, I was curious about how the model would handle another language, but the difference in language did not impact the output, which underscores the fact that the model is character based and does not actually understand the words or grammar. A sketch of how the corpus was swapped in appears below. I tried two different runs. For the first run I used the word “cependant” to start the sequence, which translates as “however” and is commonly used to begin sentences. The format was a little odd; many of the sentences were cut off or stopped on articles that were meant to be connected to nouns (but were not) or meant to connect two ideas, so each line was sort of left hanging. Overall it read more like a poem, largely nonsensical, though with some snippets of coherent word strings.
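Swapping the corpus mostly amounts to pointing the data-loading step at the new file and rebuilding the character vocabulary from it; everything downstream is unchanged because the pipeline only ever sees integer character ids. A minimal sketch under the assumption of a local file named le_reve.txt containing the Zola text (the file name, sequence length, and batch size are illustrative):

```python
import numpy as np
import tensorflow as tf

# Hypothetical local copy of the Zola text (e.g. downloaded from Project Gutenberg).
path_to_file = "le_reve.txt"
text = open(path_to_file, "rb").read().decode(encoding="utf-8")

# Rebuild the vocabulary from the new corpus; accented French characters
# (é, è, ê, ç, ...) simply become additional entries in the character set.
vocab = sorted(set(text))
char2id = {c: i for i, c in enumerate(vocab)}
id2char = np.array(vocab)

# Vectorize the whole text and window it into training sequences, exactly as
# was done for the Shakespeare corpus.
text_as_int = np.array([char2id[c] for c in text])
seq_length = 100
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

def split_input_target(chunk):
    # The input is the sequence shifted one character to the left of the target.
    return chunk[:-1], chunk[1:]

dataset = sequences.map(split_input_target).shuffle(10000).batch(64, drop_remainder=True)
```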

First Run:

*Cependant, d’une par encore, toujours énermes blancsoires voisines, continuelles et les brureaux, seules, révait pris ils n’ayant la mitre, à l’aimer des orgues le plaisabre des avoir les flers, ilil d’e forçon voilà, ja dutôeil dans un facelle d’être ingle appurer de ses indélicieux….

*Ils voulaient comme à réposait. Sans des employaient les obits: withine ne s’en sacjestait pleureur de Voi, il y a reparaître. Dans des tatinces firances, parma les maînieuse, les ornements.

*–N’expliquation, avec ses segnées continuait redue la récina suit la mante, aux coeurent sur les choses plein, si besse suancemontale, Hubertinemirment seré de consinter…. Mais l’or vivait, et au dernet droite, la jeune fixure, sècheuple, Agnès, penancer disporançale que le beau le derrière en plis. Et c’elle n’était-comment dans les mains allaient respéra ratinue droptement de l’an traveillait-elle, commençait teur hamme, comprendre le telle.

*–Jardinez-moi! n’expart, que, de frès gmanche.

*Mais i

I think the model performed much better on the second run. For this run I used the word “Reve” (“dream” in English) to start the sequence. This time the format of the output was much more in line with the formatting of the original text, and there were many more complete sentences (in the sense that a line did not randomly stop on an article, not in the sense of the sentences actually making sense or being complete ideas). However, I found this version rather hilarious, because while the model was remarkably good at outputting real French words, or getting very close to real words (sometimes just missing one last letter), it also made up a lot of absolute nonsense words like “nti”, “rous”, “l’e”, or my personal favorite “uy” (I’m not even quite sure how one would pronounce that), which it did not do in the first run. I can’t deny that some of those words seem like they should be French words, but in reality they are completely made up!

Second Run:

*Reve devune? Mais, je l’aderie restencement à s’étendait pas dans léché.

*–Jetie, parlivée, ceraissa la ville d’une fois l’aient-il forculion, j’ai res vous finir.

*Hubert, il semblait quelle.

*Mais je peut-être si elle la tête la bruje luite, aux heures, une rendecte roide les chassèrent leâceaux coffilisté d’orgueil uy Ancéle que, pleurant en hut, te tombéreraite en semblait allée sousile, agreiteau d’une blanche supplée journée accupé coeur, esprie dans le joie.

*–Tout se l’adoilissement faignéfieuxible que je nti?

*–Rous non minute le bride, trop moins.

*Pour la herre, lui l’épop entait d’emperter, par l’eschâtèrent, les yeux sommes de moment le toute d’où, l’appe sentire s’apalordait par touteurs en elle, s’envoyait Annéleai ne pente, si bénus jetait eu les devoire. À l’e faucumin, la voit être lèvrait.

*Ansoire dernière en de grande, jamais elle ne peint délivre, et en courrait à côté la cloir apporté par ce qu’esc par les tendres, Angélique sa venait à

Neural machine translation with attention

Question One: Use the translate() command at the end of the exercise to translate three sentences from Spanish to English. How did your translations turn out?

I had my family give me three random sentences in Spanish for the model to translate, without knowing myself what the sentences meant beforehand. The three sentences were: “hola, mi amiga”, “Donde está la biblioteca?”, and “los huevos son muy buenos”. I should mention that my family members only have a high school level understanding of Spanish, so the input sentences themselves did not necessarily make complete sense. That being the case, the model did remarkably well. It correctly translated the first sentence as “hello, my friend”, and it got fairly close with the third sentence, translating it as “the birds are very well” when the correct translation is “the eggs are very good”. The model struggled with the second sentence, “Where is the library?”. At first I entered the sentence without the question mark and it returned “where s the last”, and the second time I ran it with a question mark it produced “where s the last the phone?”. It was clearly struggling with the “library” part of the sentence, but it was right about the question word.
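For completeness, these results came from calling the exercise’s translate() helper on each sentence. A hedged usage sketch is below; the exact signature and printed output depend on how the notebook defines translate() around the trained encoder/decoder.

```python
# translate() is the helper defined at the end of the exercise; it is assumed
# here to print the model's English output for each Spanish input.
for sentence in ["hola, mi amiga",
                 "Donde está la biblioteca?",
                 "los huevos son muy buenos"]:
    translate(sentence)
```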

Question Two (Stretch goal): pick a scene from a movie that is acted out in a language other than English. Download the appropriate language dataset and translate the scene. How did your translation turn out this time?