Big Data Ambitions



“...a wealth of information creates a poverty of attention...”
Herbert Simon, American Political Scientist, 1971
On June 5, 2013, The Guardian, a British newspaper, began publishing accounts of the American government’s surveillance of domestic communications. According to leaked documents, the National Security Agency was collecting metadata on telephone and Internet communications as part of its counterterrorism efforts. The individual behind the leaks, Edward Snowden, disclosed the documents to thwart what he perceived as an assault on privacy and liberty. Harmonizing security and liberty has always been extraordinarily difficult, and few leaders or governments have struck the appropriate balance; President Abraham Lincoln’s suspension of habeas corpus during the Civil War remains controversial to this day. Twelve years after the tragedy of September 11, security rightly remains a priority, and the scale and scope of what has been undertaken now prompts a (very) overdue debate. Broad network surveillance is justified; however, the government’s ability to understand what it surveils is questionable, and this administration’s trustworthiness is, at best, dubious.
Inherently Government Work
As reported by Government Executive, in the wake of September 11 the Senate Intelligence Committee concluded that the National Security Agency’s (NSA) technological shortcomings had contributed to the nation’s vulnerability to terrorism. To remedy the agency’s deficiencies, the executive branch requested, and Congress supported, greater resources as well as additional authorities to enable broader surveillance.
NSA’s early attempts to capitalize on its mandate included Trailblazer, a $1 billion-plus program intended to perform pattern analysis and data mining, but the effort became a “boondoggle” by attempting to do too much at once.
Later, NSA succeeded by relying on a software tool employed in Afghanistan as well as on Hadoop, an innovative piece of free software recently developed in Silicon Valley that allows users to distribute data-intensive projects across hundreds or thousands of computers.
That capability will be essential, because the volume of data to be collected is staggering.
The NSA is currently constructing a one-million-square-foot data center in Utah to facilitate the compilation and analysis of this data. The center is estimated to cost $2 billion and will feature four 25,000-square-foot facilities housing rows and rows of servers. As noted by Wired magazine, the data being collected will be measured in yottabytes -- a septillion bytes, an amount “so large that no one has yet coined a term for the next higher magnitude” -- and the computations undertaken will be measured in exaflops -- one quintillion operations a second.
The center is scheduled to become operational within three months.
Such collection and computation are not solely the domain of the government, as “big data” has become the next major focus of the information age. The combination of rising computing power and faster Internet connectivity has permitted calculations, and by extension insights, previously unavailable to researchers. As noted in Foreign Affairs, researchers are no longer limited to small samples; analysts can tolerate a greater degree of inaccuracy in the data; and investigators can forgo causation in favor of correlation. Quantity, possessing a quality all its own, will purportedly compensate for diminished quality in other respects.
Well, quantity will only compensate somewhat.
(Barely) Good Enough For Government Work
The ambition to collect vast amounts of data will invariably be negated by the inescapable fact that the U.S. government simply lacks the capacity to analyze and interpret all this data in a timely fashion.
Headlines since the beginning of this year confirm that the government is ill-equipped to handle the enormous amount of information related to its mundane functions, much less the challenge of finding a “needle in a stack of needles.”
The Department of Defense is appropriated half a trillion dollars a year but cannot adequately account for the money. The department relies on thousands of data systems and is incapable of integrating their information to provide an accurate picture of how the money has been spent. The department has not been successfully audited in two decades and does not expect to be able to pass an audit before 2017.
The Department of Defense’s newly established Cyber Command is currently 3,700 personnel short of its 6,000-person requirement.
News accounts immediately following the Boston Marathon bombings revealed that one of the perpetrators, Tamerlan Tsarnaev, had visited an online jihadist magazine, had maintained a “Terrorist” playlist on YouTube, and had communicated with a Dagestan-based extremist. Tragically, Tsarnaev’s activities somehow eluded government digital surveillance, and he went on to carry out the attack that subjected an American city to lockdown.
The Department of Homeland Security publishes and relies on only one number to measure the prevention of illegal immigration, but cannot attest to its accuracy.
The Department of Veterans Affairs has been struggling to manage the caseload of an estimated 851,000 veterans with outstanding compensation claims for wounds, illnesses, or injuries incurred during their service. Approximately two-thirds have been waiting more than 125 days for follow-up.
Far Superior to Government Work (When Motivated)
Perhaps successfully analyzing and interpreting all this data is just a matter of will.
Perhaps the American people can wait until an Administration decides to get organized and purposeful.
Like during a re-election campaign.
In the immediate post-mortems following Obama’s stunning 2012 re-election, many observers focused attention on his campaign’s innovative data analytics team.  While the 2008 election drive had been run relatively smoothly and established itself as the first “Facebook” campaign, the 2012 team knew better than to be complacent.  The first area to be addressed was the multiplicity of voter databases.  

Over a period of eighteen months, the Obama campaign created a massive integrated data set as the foundation for voter outreach. As recounted by MIT Technology Review, the database permitted repeated, near-endless analysis: “We ran the election 66,000 times every night. And every morning we got the spit-out -- here are your chances of winning these states. And that is how we allocated resources.”
The Obama campaign was able to pursue an objective beyond registration and mobilization – “the most vexing problem in politics:  changing voters’ minds.”  And the breadth and depth of data collected made this possible.  The advantage left the Mitt Romney campaign in reaction mode and unable to counter Obama’s strategy.
On Election Night, the returns closely matched the campaign’s internal models; once victory was clinched, the only remaining question was how accurate the models would prove to be.
As always, The Daily Show brutally dissected the issue, commenting that problems like the Veterans Affairs backlog would be rapidly solved if the Administration committed as much time and energy to governing as it did to campaigning.
Unfortunately, the Obama Administration is uninterested in doing so and, at times, seems prepared to repudiate the actual information presented to it.
Government Work -- With Blinders On
On November 5, 2009, a single gunman killed 13 people and wounded more than 30 others at Fort Hood, Texas. The gunman was Nidal Malik Hasan, a 39-year-old U.S. Army major serving as a psychiatrist. He was shot during the attack and taken into custody.

Days after the shooting, reports in the media revealed that a Joint Terrorism Task Force had been aware of e-mail communications between Hasan and the Yemen-based cleric Anwar al-Awlaki, who had been monitored by the NSA as a security threat, and that Hasan's colleagues had been aware of his increasing radicalization for several years.  

The subsequent Department of Defense investigation published an 86-page report identifying missteps on the part of the Army but did not once mention Major Nidal Hasan by name or even discuss whether the killings may have had anything to do with the suspect's view of his Muslim faith, even though Hasan reportedly shouted "Allahu akbar" (God is great) as he gunned down his fellow soldiers.

The report instead discussed the attack as an episode of workplace violence.

On September 11, 2012, a series of terrorist attacks in Benghazi, Libya, resulted in the deaths of Ambassador Chris Stevens, Sean Smith, Tyrone Woods, and Glen Doherty, as well as the destruction and abandonment of the U.S. Special Mission compound.

The Administration immediately claimed an anti-Muslim video had prompted the attack; President Obama repeated the claim to relatives of the slain Americans days later, and to the world in a speech to the United Nations fourteen days after the attack.

As noted previously, and as subsequent investigative reporting, whistleblower testimony before Congress, and released internal emails have shown, all of the principal decision-makers knew within the first 72 hours that the assault on Benghazi was a terrorist attack.

Yet the interagency system in place has been more “tenacious” in explaining away the erroneous talking points than in addressing the initial requests for security at the compound.

For a President who once denounced such comprehensive surveillance, and an administration that now asserts such monitoring has been critical to preventing “dozens” of attacks, it is this persistent disregard for objective and incontestable fact that makes these revelations so unsettling and complicates the debate on the matter.

Potentially Malevolent Government Work

Some thoughtful observers have asserted that the revelations are not that disconcerting: either privacy has already been eroded in the era of Google and Facebook, or oversight has been adequate.

If transparency is the problem, then at least the government is more accountable than private corporations, even though every avenue for redressing government wrongdoing is acknowledged to be hypothetical -- “there needs to be a mechanism to remedy the damage … there should be a prompt, transparent, and fair means.” [Emphasis added]

As for those comfortable with NSA surveillance, one hopes they are never victims of identity theft.

In dealing with the Internal Revenue Service alone (much less law enforcement, credit card companies, banks, and health care providers), identity theft victims routinely wait more than six months to have their issues resolved.

If the threat to liberty is the problem, then such concerns are the result of “very confused thinking.” After all, the executive branch informed and received the consent of the legislative branch, including the opposition leadership.

As for those comfortable with NSA surveillance, one hopes they are never targeted by the Internal Revenue Service for their political activities.

The opposition leadership may have endorsed NSA surveillance, but it is ordinary citizens who have been harassed by the government on the basis of keyword searches across databases.

A Poverty of Attention and Eventually A Poverty of Trust

To close, the merits of such extensive network surveillance remain questionable. The U.S. government has barely demonstrated its ability to handle data already collected, much less the sensitive data of 300 million citizens' daily communications. Moreover, the current Administration has shown a disturbing ideological disregard for reality and cannot be trusted with such data. Lastly, to echo Christian Caryl's rebuttal to Daniel Ellsberg, the NSA is not the Stasi of the former East Germany, but their methodologies are nearly the same: everyone is suspect, so everyone must be surveilled. Although the communist dictatorships undertook such surveillance for evil ends, the effect of the method is the same regardless of intent -- the decimation of trust that comes from knowing one part of the population is spying on the other.

As it has always been -- it’s a matter of trust.

And the president knows it.  Well, almost.


Epilogue

The collection of ones and zeros has already been overtaken by events and other technologies.

On June 3, 2013, the U.S. Supreme Court ruled that police may take DNA samples from people arrested in connection with serious crimes.

Reprinted and paraphrased -- replacing DNA with digital metadata -- Justice Antonin Scalia’s dissent cuts to the core of the matter:

The issue before us is not whether DNA [digital metadata] can some day be used for identification; nor even whether it can today be used for identification; but whether it was used for identification here.

Solving unsolved crimes [Preventing terrorism] is a noble objective, but it occupies a lower place in the American pantheon of noble objectives than the protection of our people from suspicionless law-enforcement searches. The Fourth Amendment must prevail.

The most regrettable aspect of the suspicionless [digital metadata] search that occurred here is that it proved to be quite unnecessary. All parties concede that it would have been entirely permissible, as far as the Fourth Amendment is concerned, for Maryland to take a sample of King’s DNA [digital metadata] as a consequence of his conviction for second-degree assault. [Emphasis added]

So the ironic result of the Court’s error is this: The only arrestees to whom the outcome here will ever make a difference are those who have been acquitted of the crime of arrest (so that their DNA [digital metadata] could not have been taken upon conviction). [Emphasis in the original]

In other words, this Act manages to burden uniquely the sole group for whom the Fourth Amendment’s protections ought to be most jealously guarded: people who are innocent of the State’s accusations. [Emphasis added]

May the debate be respectful, enlightened, and conclusive.