Saturday, November 16, 2013

Help the NSA Read Your Mail

Search Engine Overload
Helping the NSA read your mail


By John P. Smith
Freelance Journalist


We know the NSA and other government agencies are reading our mail and our social media posts. They do this in the name of the "War on Terror." The more the government clamps down on everyone as a potential terrorist, the more it becomes part of the problem; in the end, it will be exactly what it’s supposed to be fighting against. Our own government has become more of a threat to us than any terrorist.

They are reading our email. Searching our social media posts. Hijacking our private messages. Our government appears to think that's okay, even though it's a complete violation of our Constitutional right to privacy.

What can we do about it? We can do three things to limit their ability to read our correspondence. First, we can stop using email, social media, text messaging and private blogs. As far as I’m concerned, that’s not even a real option. Second, we can deny them the information by using encryption technology such as PGP. (All governments hate PGP.) Third, we can give them more than they can handle.

I think search engine inundation is the way to go. Let's spam their search algorithms with keywords until they can't possibly read it all. It will take some effort from everyone who’s willing to fight back.

FBI CIA ATF NSA DIA DHS TSA OBAMA BIDEN KERRY SECRET SERVICE SENATE CONGRESS TERRORIST BOMB ASSAULT KILL DESTROY BURN GUN PRESIDENT GAS ANTHRAX EXPLODE

I created the above line to put in the text of every one of my messages, emails and social media posts. The idea is to overwhelm them with their own crap: make EVERYTHING in the world flag in the search. There's no way a limited staff can sort through all that. But it won’t work that simply. For instance, the line is all CAPS, which makes it easy to flag as a planted line and simply discard.
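
To show how little effort that takes, here is a minimal Python sketch of such a filter. It assumes a hypothetical rule that any line whose letters are all uppercase is planted; `drop_all_caps_lines` is an illustrative name, not anything known about real NSA software.

```python
def drop_all_caps_lines(text: str) -> str:
    """Remove every line whose letters are all uppercase (a hypothetical 'planted line' rule)."""
    kept = []
    for line in text.splitlines():
        # str.isupper() is True only when the line has at least one cased
        # character and all of its cased characters are uppercase.
        if line.isupper():
            continue  # looks like a planted keyword line, so skip it
        kept.append(line)
    return "\n".join(kept)


message = "See you at the lake on Sunday.\nFBI CIA ATF NSA TERRORIST BOMB GUN ANTHRAX"
print(drop_all_caps_lines(message))  # prints: See you at the lake on Sunday.
```

One line of code is enough to throw the whole keyword line away, while a normal sentence that merely mentions "NSA" survives.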

Spamming the government’s search routines is not a new idea.

Troll the NSA had a similar idea back in June, but they wanted everybody to send out the same text. That’s fairly easy to program around.

Motherboard.TV has a more random idea: nsa.motherboard.tv. But I clicked their random phrase generator about 20 times and got the exact same phrase twice. Not nearly random enough.

And there’s Scaremail, which is a great idea. But like the Motherboard.TV generator, if it’s a computer program, it’s going to leave a signature of some kind, and that will make it easy to filter out.

Very similar to Scaremail is Flagger, an add-on for Google Chrome that adds keywords to the URLs visited by your browser. But, again, a computer program that places keywords is going to leave a signature. It’s easy to filter out.

Here’s one list of words, from Rense.com.

Here’s another, from Reddit via Business Insider.

And here’s a third, from a site called 42X.

These lists do not exactly match. Also interesting: “Homeland Security” is not in either of the first two lists and “DHS” doesn’t show up in any of them, nor do the words “tactical” or “strategic.” The folks at 42X said the list was obtained through a Freedom of Information Act request. I’m not certain any of these are even authentic lists. Maybe they’re plants from the NSA to give us all something to focus on while they keep searching our mail. 

Make of that what you will. Also of note: none of the British alternative spellings are on the list. Nope. These lists contain no French, Russian, Chinese, Arabic, Hebrew, etc., so I infer that they are only for American English speakers. We can also assume the NSA has keyword lists for other languages. I see that PGP makes the list five times, once on its own and then as four specific versions of it. We know how the government hates PGP, but why would they target specific versions of it? That kind of thing makes me suspicious.

Plenty of information is available on how the government is getting our private correspondence and who’s letting them have it, so I won’t go into that. The following is mostly speculation on my part, based on how I, as a programmer, would look for specific words in a gazillion intercepted messages, posts and emails.

How I think a typical search would work:

Information is collected. The first thing we need to do is strip out all the superfluous crap. A single line of text sent via email or posted on social media carries a long trail of data about servers, fonts, routes and addresses in the background. Most of this data is useless to humans and so would be stripped (or simply skipped) so that only the pertinent content is sent on to the parsing routines.
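
As a sketch of that stripping step, Python’s standard `email` package can parse a raw message and hand back only the plain-text body, leaving the routing and address headers behind. The sample message and the `content_only` helper are made up for illustration, and the sketch ignores HTML parts and attachments that real mail would carry.

```python
from email import message_from_string
from email.policy import default

# A made-up raw message: headers first, then a blank line, then the body.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Weekend
Received: from mail.example.com by mx.example.net
Content-Type: text/plain; charset="utf-8"

Had a blast at the lake.
"""


def content_only(raw: str) -> str:
    """Parse a raw email and return only its plain-text body, dropping the headers."""
    msg = message_from_string(raw, policy=default)
    # get_body() picks the best candidate body part; limit it to text/plain.
    body = msg.get_body(preferencelist=("plain",))
    return body.get_content() if body is not None else ""


print(content_only(RAW))  # prints: Had a blast at the lake.
```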

Information is parsed, searching for keywords. This can be done one character or one word at a time, or by phrase or even by whole sentences, as required, to flag significant combinations of letters, words, etc.
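
Here is a minimal sketch of word- and phrase-level matching, assuming a small hypothetical keyword set. It tries the longest phrase first, so a two-word entry like “secret service” is caught as one hit instead of as two separate words.

```python
import re

# Hypothetical keyword set; the real lists are longer and their contents are unconfirmed.
KEYWORDS = {"bomb", "kill", "president", "congress", "secret service"}


def find_hits(text: str, keywords: set) -> list:
    """Return keyword hits in order of appearance, matching words and multi-word phrases."""
    words = re.findall(r"[a-z']+", text.lower())
    longest = max(len(k.split()) for k in keywords)
    hits = []
    for i in range(len(words)):
        # Try the longest phrase first so "secret service" counts as one hit.
        for n in range(longest, 0, -1):
            chunk = words[i:i + n]
            if len(chunk) == n and " ".join(chunk) in keywords:
                hits.append(" ".join(chunk))
                break
    return hits


print(find_hits("The Secret Service said the movie was a bomb.", KEYWORDS))
# prints: ['secret service', 'bomb']
```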

Information is then scored based on the number and strength of hits when the information is compared to the database of keywords. The scoring algorithm assigns each keyword a value. The higher the score, the more likely the information is flagged as a possible threat, and the more likely the information is to be read by a real human. The value placed on a word will be based on the importance of the word itself, its placement in the text and the emphasis placed on it by the writer.
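
A toy version of that scoring might look like this. The weights and the review threshold are made up for illustration; none of these numbers come from any real list. This version only sums word weights and leaves placement and emphasis out.

```python
import re

# Made-up weights and threshold, for illustration only.
WEIGHTS = {"president": 5, "kill": 4, "bomb": 4, "congress": 3, "meeting": 1, "sex": 1}
REVIEW_THRESHOLD = 10


def score(text: str) -> int:
    """Sum the weight of every keyword occurrence in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(WEIGHTS.get(word, 0) for word in words)


def needs_human(text: str) -> bool:
    """Flag the text for a human reader once its score reaches the threshold."""
    return score(text) >= REVIEW_THRESHOLD


print(score("The movie was a bomb and the president loved it."))        # prints: 9
print(needs_human("The movie was a bomb and the president loved it."))  # prints: False
print(needs_human("Kill the bomb plot, Mr. President."))                # prints: True
```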

Let’s take a sample phrase: “So, I thought I’d kill some time watching the president of the Rotary Club speaking on the sex orgy tape found in the state Congress meeting room.” The word “president” will have a high value, while “sex” will have a much lower one. Once we see some of these keywords, we parse the text again for placement and emphasis. The words “kill”, “president”, “Congress” and “meeting” all being in the same block of text adds strength to the score. The word order is also important, so I would have a routine to adjust the score based on placement relative to other keywords: before or after, etc. Those words being in the same sentence, in that specific order, will probably raise a red flag somewhere in a DHS cubicle.
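
One way to sketch that placement adjustment, again with made-up weights: score each sentence separately, add a small bonus for every pair of keywords sharing a sentence, and a larger one when a specific pair appears in a specific order. `PAIR_BONUS` and `ORDERED_BONUS` are hypothetical values chosen only to show the idea.

```python
import re
from itertools import combinations

# Made-up values, for illustration only.
WEIGHTS = {"president": 5, "kill": 4, "congress": 3, "meeting": 1, "sex": 1}
PAIR_BONUS = 1                              # any two keywords in the same sentence
ORDERED_BONUS = {("kill", "president"): 5}  # "kill" appearing before "president"


def score_with_placement(text: str) -> int:
    """Score each sentence: word weights plus bonuses for co-occurrence and order."""
    total = 0
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        found = [w for w in words if w in WEIGHTS]  # keywords, in reading order
        total += sum(WEIGHTS[w] for w in found)
        # combinations() keeps the original order, so (first, second) means
        # "first appears before second" within this sentence.
        for first, second in combinations(found, 2):
            total += PAIR_BONUS + ORDERED_BONUS.get((first, second), 0)
    return total


sample = ("So, I thought I'd kill some time watching the president of the Rotary Club "
          "speaking on the sex orgy tape found in the state Congress meeting room.")
print(score_with_placement(sample))  # prints: 29
```

Split “kill” and “president” into two sentences and the pair and order bonuses disappear, which is the kind of adjustment described above.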

At this point, someone will actually read the text only to discover it’s someone speculating on how they might be looking at our correspondence and postings.  (Hello there, cubicle agent!  Is this going to get me another visit from the FBI?)

I would also think it possible, but not prudent, to score for grammatical emphasis: for instance, “President” versus “president,” to distinguish exactly who is being talked about. That kinda assumes a potential terrorist would also be an English grammar jerk, which is unlikely. I believe the same would be true of other emphasis, such as quotes, bold, italic and underline; they could be used, but would probably add more confusion than they’d clear up. If I were writing these search routines, I’d strip the formatting to make the raw text easier to search.
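
A rough sketch of that formatting strip, assuming HTML-formatted input: remove the tags, decode entities, drop double quotes and fold case, so “President” and “president” end up identical. The tag-stripping regex is crude and is not a real HTML parser.

```python
import html
import re


def normalize(fragment: str) -> str:
    """Reduce formatted text to plain, case-folded words for searching."""
    text = re.sub(r"<[^>]+>", " ", fragment)     # drop tags such as <b>, <i>, <u>
    text = html.unescape(text)                    # &quot; -> ", &amp; -> &, etc.
    text = re.sub('["\u201c\u201d]', "", text)    # drop straight and curly double quotes
    text = text.casefold()                        # "President" == "president" afterwards
    return " ".join(text.split())                 # collapse runs of whitespace


print(normalize('<p>The <b>President</b> said the movie was &quot;a bomb&quot;.</p>'))
# prints: the president said the movie was a bomb.
```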

But there are going to be some problems, and what programmers do is make computers solve problems. They can't possibly look at everything that flags due to keywords. Their next logical step is to stop flagging those SPECIFIC strings known to be planted and simply ignore them. They can be skipped just like formatting. This is where most of the plans and programs noted above ran into problems. We have to break the strings up into non-uniform snippets. There's nothing more random than several million people putting a string of data in whatever order they prefer. Still, some programming techniques employ fuzzy logic routines that look at something that's almost the same and say, yeah, that's close enough, we'll call it a non-hit. It’s really a matter of playing the percentages.
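
A sketch of that “close enough” test, using Python’s `difflib.SequenceMatcher` as a stand-in for whatever fuzzy matching a real system might use. The list of known plants and the 0.8 cutoff are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical list of strings already identified as planted.
KNOWN_PLANTS = ["FBI CIA ATF NSA DIA DHS TSA OBAMA BIDEN KERRY SECRET SERVICE"]
CLOSE_ENOUGH = 0.8  # hypothetical similarity cutoff


def looks_planted(line: str) -> bool:
    """True if the line is close enough to a known plant to be ignored as a non-hit."""
    candidate = line.casefold()
    return any(
        SequenceMatcher(None, candidate, plant.casefold()).ratio() >= CLOSE_ENOUGH
        for plant in KNOWN_PLANTS
    )


print(looks_planted("fbi cia atf nsa dia dhs tsa obama biden kerry secret service."))  # True
print(looks_planted("Had a blast at the lake with the kids."))  # False
```

Lowercasing the line and adding a period barely moves the ratio, so the edited copy is still treated as the same plant.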

Here's where anybody can get involved and help.

We humans have to be more random. And isn’t that what we do best? At the end of your email, post or message, add a line or two of keyword text, anywhere from 2 to 2,000 characters long. Copy and paste from the lists in the links or write your own. Next time you post or mail, use the same text but add a period or comma, add another acronym, remove one, change some capitalization. Next week, use another random keyword string. Do anything to make the strings of text dissimilar enough that a fuzzy logic routine won’t see them as part of a specific class of planted data.
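
If you want to check your own hand-made variations, here is one possible yardstick, assuming a filter that compares the sets of words in two strings. Word-level Jaccard similarity is a swapped-in measure chosen for illustration; nobody outside knows what a real filter uses. A score near 1.0 means the two lines look the same; a score near 0 means they have little in common.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity: shared distinct words / all distinct words."""
    words_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    words_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not (words_a or words_b):
        return 1.0  # two empty strings are trivially identical
    return len(words_a & words_b) / len(words_a | words_b)


last_week = "FBI CIA ATF NSA DIA DHS TSA"
print(jaccard(last_week, "fbi, cia, atf, nsa, dia, dhs, tsa."))  # prints: 1.0
print(jaccard(last_week, "tsa, FBI... anthrax; Senate, KERRY"))  # prints: 0.2
```

Under this particular measure, punctuation and capitalization changes alone do not register at all; trading words in and out is what moves the number.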

We can also get a little more creative and use some of these words in our main body text. Tell your friends that the “movie was a bomb,” or that you had a “blast at the lake.” We all need to do whatever we can to make it harder for them to read our mail.

For some more fun with NSA keywords, there’s the NSA Haiku Generator.

And if you need a laugh: This NSA list is kinda funny.

