
 

CIA Using 'Data Mining' Technology to Find Nuggets

  By Tabassum Zakaria

  LANGLEY, Va. (Reuters) - The CIA, faced with a daily avalanche of information, is using new ``data mining'' technology to find useful nuggets within thousands of documents and broadcasts in different languages.


  One computer tool called ``Oasis'' can convert audio signals from television and radio broadcasts into text.

  It can distinguish accented English for greater accuracy in the transcription, whether the speaker is male or female, and whether one male or female voice is different from another of the same gender.

  At the left of the screen of a transcribed broadcast are labels such as ``Male 1,'' ``Female 1,'' and ``Male 2,'' next to sentences.

  If one voice is labeled with a name, the computer from then on will put that name on anything else with that same voice.
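  As a rough illustration of the labeling behavior described above, the sketch below renames a generic speaker label everywhere it occurs in a transcript. The transcript layout, the `rename_speaker` function, and the sample lines are assumptions made for illustration; the article does not describe how Oasis stores transcripts or matches voices.

```python
def rename_speaker(transcript, label, name):
    """Return a new transcript with every occurrence of `label` replaced by `name`.

    `transcript` is assumed to be a list of (speaker_label, sentence) pairs.
    """
    return [
        (name if speaker == label else speaker, sentence)
        for speaker, sentence in transcript
    ]


# Invented sample transcript, not taken from any real broadcast.
transcript = [
    ("Male 1", "Good evening."),
    ("Female 1", "Here is the latest news."),
    ("Male 1", "Back to you."),
]

# Naming the voice once relabels every sentence attributed to it.
named = rename_speaker(transcript, "Male 1", "Anchor")
```

  A real system would match new audio against a stored voice profile rather than a text label; this sketch only shows the relabeling step.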


  Machine Translator

  If the machine transcription appears off, the user can with a mouse click hear the actual broadcast. For example, the demonstration showed a transcription that read ``latest danger from hell'' but the audio said ``latest danger from El Niño.''

  The computer cuts down on the time it would take a person to transcribe a half-hour broadcast to 10 minutes from up to 90 minutes, a CIA employee conducting the demonstration said.

  The CIA is planning to have Oasis developed for different languages such as Arabic and Chinese.

  Oasis also finds words with meanings similar to those being searched. For example, a broadcast might not mention ``terrorism'' but might say ``car bombing,'' which the computer would tag as ``terrorism'' so that anyone searching for that category would find it.
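  The category tagging described above can be sketched as a lookup from phrases to categories. This is a minimal illustration, not a description of the actual system: the `CATEGORY_PHRASES` table, the `tag_categories` function, and the sample sentence are hypothetical, and simple substring matching stands in for whatever matching the real tool performs.

```python
# Hypothetical phrase-to-category table; the real vocabulary is not described
# in the article.
CATEGORY_PHRASES = {
    "terrorism": ["terrorism", "car bombing"],
}


def tag_categories(text):
    """Return the sorted categories whose phrases appear in `text`."""
    lowered = text.lower()
    return sorted(
        category
        for category, phrases in CATEGORY_PHRASES.items()
        if any(phrase in lowered for phrase in phrases)
    )


# The sentence never uses the word "terrorism" but is still tagged with it.
tags = tag_categories("Police reported a car bombing downtown.")
```

  A search for the category ``terrorism'' would then return this passage even though the word itself never appears in it.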

  Another computer tool, ``FLUENT,'' enables a user to conduct computer searches of documents that are in a language the user does not understand. The user can put English words into the search field, such as ``nuclear weapons,'' and documents in languages such as Russian, Chinese and Arabic pop up. The system will then translate the document and, if it is seen as useful, the analyst can send it to a human translator for more precision.


  Languages that FLUENT can translate into English include Chinese, Korean, Portuguese, Russian, Serbo-Croatian and Ukrainian.

  ``Data mining'' tools are used to extract key pieces of information from a variety of intelligence traffic, such as reports on the flow of illegal drugs, and to keep track of illicit financial transactions.

  The Text Data Mining tool extracted and indexed all words in the data. For example, if an analyst were asked whether Iraq ever used anthrax as a weapon, the analyst could open the tool and find ``anthrax'' in the automatically generated index.
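  Indexing every word so that a term can be looked up directly is commonly done with an inverted index, which maps each word to the documents containing it. The sketch below shows that general technique under stated assumptions: the `build_index` function, the document identifiers, and the sample texts are invented for illustration, and nothing here is taken from the tool in the demonstration.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


# Invented sample documents, used only to exercise the index.
documents = {
    "report-1": "Anthrax stockpiles were discussed in the briefing.",
    "report-2": "The shipment contained agricultural equipment.",
}

index = build_index(documents)

# Looking up a term returns the documents that mention it.
hits = sorted(index.get("anthrax", set()))
```

  Because the index is built once ahead of time, a lookup does not need to rescan every document.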

  There is also ``gisting technology,'' which gives the flavor of the key information of a document in a short paragraph, Fairchild said.

  Another intelligence official, on condition of anonymity, said: ``If they have this kind of technology to plumb the depths of open sources, you can imagine what kind of technologies they have to track down spies.''

 

 


Last updated: 01/26/07.