
2.10.2014

Large-scale NSA Internet surveillance and bulk traffic collection as an IT challenge

Saturday's article in the New York Times about the low-cost, readily available tools Snowden used to index and collect documents inside the NSA ("Snowden Used Low-Cost Tool to Best N.S.A.") also points to how large-scale surveillance and traffic monitoring can be done today using low-cost, freely available commercial or open-source software and commercial Internet or cloud infrastructure services.

In many ways the NSA faces the same IT challenges as most large corporations - probably not the ideal place to be CTO or CIO (or in charge of internal IT security...) - but if we look at NSA surveillance and Internet traffic collection as an IT challenge, what are some of the available options?

Most of the following involves some speculation - a fair bit, even - a kind of what-if-maybe "Large-scale traffic collection for Dummies", but it should be fairly evident to anyone who has followed the Snowden revelations over time. So how could one go about large-scale Internet traffic collection and bulk usage logging?

Firstly, if one remembers the old Roman saying "who watches the watch-guards" (or "Who will guard the guards themselves?"), it looks like the NSA should have implemented, across all locations and stations, for contractors as well as internal staff, a form of change management and audit system to flag excessive document indexing and downloading, even when it is done under different system admin accounts.  Apparently PRISM could have had some use on the NSA LAN...
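To make that a bit more concrete, here is a minimal sketch (in Python) of the kind of audit check meant here, flagging accounts whose daily download counts stand out from the rest. The CSV log format and column names are assumptions for illustration, not anything from a real NSA system.

```python
# Toy sketch: flag accounts whose daily document-download counts are
# unusual compared to the rest of the population. The log format and
# field names are assumptions for illustration only.
import csv
import statistics
from collections import Counter

def flag_heavy_downloaders(log_path, sigma=3.0):
    """Count downloads per (account, day) and flag statistical outliers."""
    per_account_day = Counter()
    with open(log_path, newline="") as f:
        # Assumed CSV columns: timestamp (ISO 8601), account, document_id
        for row in csv.DictReader(f):
            day = row["timestamp"][:10]          # YYYY-MM-DD prefix
            per_account_day[(row["account"], day)] += 1

    counts = list(per_account_day.values())
    if len(counts) < 2:
        return []
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts) or 1.0
    # Anything more than `sigma` standard deviations above the mean is suspect.
    return [(acct, day, n) for (acct, day), n in per_account_day.items()
            if n > mean + sigma * stdev]

if __name__ == "__main__":
    for account, day, n in flag_heavy_downloaders("wiki_access_log.csv"):
        print(f"{day}: {account} downloaded {n} documents - review?")
```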

Secondly, while this isn't a low-cost option: looking at a map of worldwide submarine fiber cables (for instance http://www.submarinecablemap.com/), there aren't that many key submarine cable landing areas in the US, Europe, Asia and Africa one needs to tap into, splice and copy to gain access to nearly all Internet traffic carried by the major network operators, telcos and ISPs, and traffic encryption has so far seemed more of a nuisance than a real obstacle to reading the traffic and content over these cables.  Fiber splicing by submarine seems like one approach here - i.e. with something like the USS Jimmy Carter.

Thirdly, one odd thing about the XKeyscore revelations last year was the low number of servers the NSA employed worldwide, even back in 2008, to index and/or collect Internet traffic, content and meta-data: only some 700 servers in 150 or so locations, able to hold full traffic collections for only a few days and meta-data for about 4 weeks.
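A quick back-of-envelope calculation shows why the full-take window would be measured in days - every number below is an assumed, illustrative figure, not anything taken from the leaked slides:

```python
# Back-of-envelope: how many days of full-take traffic fits on a site's
# local disks? Every number here is an assumed, illustrative figure.
servers_per_site = 5                    # assumed
disk_per_server_tb = 40                 # assumed usable storage, TB
ingest_gbit_per_s = 10                  # assumed tapped link rate, Gbit/s

site_storage_tb = servers_per_site * disk_per_server_tb
ingest_tb_per_day = ingest_gbit_per_s / 8 * 86_400 / 1_000   # Gbit/s -> TB/day

print(f"Site storage:   {site_storage_tb} TB")
print(f"Daily ingest:   {ingest_tb_per_day:.0f} TB/day")
print(f"Retention:      {site_storage_tb / ingest_tb_per_day:.1f} days of full take")
```

With those assumed figures a site fills up in under two days of full take, while the much smaller meta-data records can be kept for weeks on the same disks.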

Already in 2008, of course, there were many large-scale CDN deployments on the Internet by commercial operators, with web cache servers placed in data centers near the major Internet exchanges or with local ISPs worldwide, directly in the path of end users' Internet access and service traffic. Tapping into the mirrored, cached content of 3rd-party commercial CDN services and operators - which already then carried the majority of users' Internet service and content consumption - seems like a far easier and more scalable approach to traffic collection than deploying and maintaining log servers of one's own.

Incidentally, that would also give the NSA, as well as the content originators and service providers, the opportunity to deny that they are accessing operator so-and-so's servers directly, as has been done - they are working on a mirrored, off-site copy of the servers in question.

Since then cloud computing has come to life in a significant way, and it is easy to rent IaaS compute, storage and networking resources in almost any country, making it even easier to install and manage XKeyscore-related software and indexing on VMs in the country of choice (not that I'm an expert on IaaS availability in Africa!).  But the main point is this: instead of installing and running a battery of servers worldwide for traffic capture and content indexing, it is easier to acquire cache copies of network traffic and content from CDN operators and/or rent IaaS compute resources worldwide that in most cases already sit in end users' pathways. Or, put differently (slide 20 in the "An NSA Big Graph experiment" presentation): "Cloud architectures can cope with graphs at Big Data scales".
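To make the graph angle a bit more concrete, here is a toy sketch of the kind of contact-chaining graph such meta-data feeds into - the records and identifiers are invented for illustration:

```python
# Toy contact-chaining sketch: build a who-talked-to-whom graph from
# call/metadata records and walk out two hops from a seed identifier.
# The record format (caller, callee) is an assumption for illustration.
from collections import defaultdict

records = [
    ("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
    ("alice", "erin"), ("erin", "frank"), ("mallory", "alice"),
]

graph = defaultdict(set)
for a, b in records:
    graph[a].add(b)
    graph[b].add(a)          # treat contact as undirected

def contacts_within(seed, hops):
    """Return every identifier reachable from `seed` in <= `hops` hops."""
    frontier, seen = {seed}, {seed}
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph[node]} - seen
        seen |= frontier
    return seen - {seed}

print(contacts_within("alice", 2))   # the two-hop neighbourhood of the seed
```

At Big Data scales the same traversal is what the "Big Graph" slide is about: the data structure stays trivial, only the volume does not.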

This move to the cloud was also highlighted in the 2011 Defense News pre-Snowden article "Securing the cloud". Highlights: "The National Security Agency has been working on a secure version of the cloud since late 2007. By the end of the year, NSA plans to move all its databases into a cloud architecture while retaining its old-fashioned servers for some time. "Eventually, we'll terminate the other data base structures," said NSA Director Army Gen. Keith Alexander...  The intelligence community proposal would expand the cloud approach across the community to take advantage of the cloud storage and access methods pioneered in the private sector... "I went in [and] talked to our folks who are on the offensive side" and asked them what would make a network "most difficult" to crack, he said. "And the answer was, going virtual and the cloud technology"." See also this one.

The fourth measure to collect user traffic and meta-data is more on the speculative end of things, but since 2008 we have seen the birth of mobile app stores, with mobile users downloading apps for their smartphones by the millions - almost daily.  Instead of creating highly secret zero-day malware exploits, why not create an app, or at least an app development library, that everyone wants to use and that people download and interact with daily - and have the app collect location data, address book access and export, message store access and export, etc.?  Much easier than obscure malware development and then trying to get the software package onto people's devices.

An extension of this is to listen in on leaky apps and the ad networks that most developers of free apps use to place ads in their apps and get some sort of kickback for app-generated ad streams.  See "NSA using 'leaky apps' like Angry Birds, Google Maps to siphon user data" for an example.
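As a rough sketch of why leaky ad traffic is so attractive: pulling identifiers out of captured ad-request URLs can be this simple. The parameter names below (device_id, lat, lon, age, gender) are invented examples; real ad networks use their own, often unencrypted, query-string formats.

```python
# Toy sketch: extract tracking parameters from captured ad-network request
# URLs. Parameter names and hosts are invented examples for illustration.
from urllib.parse import urlparse, parse_qs

captured_requests = [
    "http://ads.example.net/serve?device_id=abc-123&lat=59.91&lon=10.75&age=34&gender=m",
    "http://ads.example.net/serve?device_id=def-456&lat=51.50&lon=-0.12&age=28&gender=f",
]

for url in captured_requests:
    params = parse_qs(urlparse(url).query)
    profile = {k: v[0] for k, v in params.items()}
    print(f"{profile.get('device_id')}: location=({profile.get('lat')}, "
          f"{profile.get('lon')}), age={profile.get('age')}, gender={profile.get('gender')}")
```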

Moving on to a possible 5th measure or IT approach to large-scale user tracking, surveillance and Internet traffic collection: another area that has "exploded" since 2008 is the low-threshold availability of big data log collection systems, distributed storage and processing systems like Hadoop, and data analytics tools for laymen (i.e. business analytics).

Once a fairly obscure area of IT administration - who reads and understands router and server OS syslogs anyhow? - analysis and visualization of system logs turned out to provide valuable business information: usage patterns, performance issues, faulty applications, break-in attempts, what not; system- or business-wide logging and analysis even more so.  Commercial software coming out of these IT admin areas has been identified in the NSA leaks by Snowden, and is now firmly established in the Big Data business.
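A minimal example of the kind of low-threshold log analysis meant here - counting failed SSH logins per source address from a syslog-style auth log (the path and log format follow common Linux defaults, but both vary per system):

```python
# Toy sketch: summarise failed SSH logins per source address from a
# syslog-style auth log. Path and log format follow common Linux defaults
# but vary per system - treat them as assumptions.
import re
from collections import Counter

FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+)")

def failed_logins(path="/var/log/auth.log"):
    per_source = Counter()
    with open(path, errors="replace") as f:
        for line in f:
            m = FAILED.search(line)
            if m:
                per_source[m.group(1)] += 1
    return per_source

if __name__ == "__main__":
    for source, count in failed_logins().most_common(10):
        print(f"{source}: {count} failed logins")
```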

Large-scale storage and processing means Hadoop by most accounts nowadays.  The grapevine says 9 out of 10 shops looking at Hadoop don't actually need Hadoop - but it's Big Data!, and looks good on the CV - with NSA data collection and processing being the one shop that actually could put it to good use (with a Splunk connector or two to get data into the Hadoop cluster, though getting data sources for NSA Hadoop clusters doesn't seem to be the main challenge...).
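For a sense of how basic a Hadoop job can be: a minimal Hadoop Streaming mapper/reducer pair in Python that counts records per source IP, assuming tab-separated input lines whose first field is the source address (an invented format for illustration).

```python
#!/usr/bin/env python
# mapper.py - emit "<source_ip>\t1" for each input record.
# Assumes tab-separated lines whose first field is a source IP (illustrative).
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if fields and fields[0]:
        print(f"{fields[0]}\t1")
```

```python
#!/usr/bin/env python
# reducer.py - sum the counts per key (Hadoop Streaming sorts by key for us).
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, _, count = line.rstrip("\n").partition("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{total}")
        current_key, total = key, 0
    total += int(count or 0)
if current_key is not None:
    print(f"{current_key}\t{total}")
```

Run with something like hadoop jar .../hadoop-streaming*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /logs -output /counts; the exact streaming jar path depends on the distribution.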

NSA activities with Hadoop and related technologies are well documented, see for instance

OK, several more mainstream IT services and options still to go, but I will come back to those in a later posting.



Erik Jensen, 09.02.2014
