CIA Using 'Data Mining' Technology to Find Nuggets
By Tabassum Zakaria
LANGLEY, Va. (Reuters) - The CIA, faced with a daily
avalanche of information, is using new ``data mining'' technology to find useful
nuggets within thousands of documents and broadcasts in different languages.
One computer tool called ``Oasis'' can convert audio signals from
television and radio broadcasts into text.
It can recognize accented English for greater accuracy in the
transcription, tell whether the speaker is male or female, and distinguish one
male or female voice from another of the same gender.
On the left of the screen, a transcribed broadcast shows labels such as
``Male 1,'' ``Female 1'' and ``Male 2'' next to the sentences.
If one voice is labeled with a name, the computer from then on will put
that name on anything else with that same voice.
Machine Translator
If the machine translation appears off, the user can click the mouse to
hear the actual broadcast. For example, the demonstration showed a transcription
that read ``latest danger from hell'' but the audio said ``latest danger from El
Niño.''
The computer cuts the time needed to transcribe a half-hour broadcast to
10 minutes, from up to 90 minutes for a person, a CIA employee conducting the
demonstration said.
The CIA is planning to have Oasis developed for different languages such
as Arabic and Chinese.
Oasis also finds words with meanings similar to the search term. For
example, a broadcast might not mention ``terrorism'' but might say ``car
bombing,'' which the computer would tag as ``terrorism'' so that anyone
searching for that category would find it.
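The tagging described above can be pictured as a lookup from phrases to categories. The following is only a minimal sketch and not a description of how Oasis works internally: the table `CATEGORY_PHRASES` and the function `tag_categories` are hypothetical names, and the only mapping taken from the article is ``car bombing'' to ``terrorism.''

```python
# Hypothetical phrase table. Only "car bombing" -> "terrorism" comes from
# the article; a real system would need a far larger vocabulary.
CATEGORY_PHRASES = {
    "terrorism": ["terrorism", "car bombing"],
}


def tag_categories(transcript, category_phrases=CATEGORY_PHRASES):
    """Return the sorted categories whose phrases appear in the transcript."""
    # Lowercase and collapse whitespace so line breaks do not split a phrase.
    text = " ".join(transcript.lower().split())
    return sorted(
        category
        for category, phrases in category_phrases.items()
        if any(phrase in text for phrase in phrases)
    )
```

A search for the category ``terrorism'' would then match any transcript whose tag list contains it, whether or not the word itself was spoken.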
Another computer tool, ``FLUENT,'' enables a user to conduct computer
searches of documents that are in a language the user does not understand.
The user can put English words into the search field, such as ``nuclear
weapons,'' and documents in languages such as Russian, Chinese and Arabic pop
up. The system will then translate the document and, if it is seen as useful,
the analyst can send it to a human translator for more precision.
Languages that FLUENT can translate into English include Chinese, Korean,
Portuguese, Russian, Serbo-Croatian and Ukrainian.
``Data mining'' tools are used to extract key pieces of information from a
variety of intelligence traffic, such as on the flow of illegal drugs, and also
to keep track of illicit financial transactions.
The Text Data Mining tool extracts and indexes all words in the data so
that, for example, an analyst asked whether Iraq ever used anthrax as a weapon
can open the tool and find ``anthrax'' in the automatically generated index.
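An automatically generated word index of this kind is commonly built as an inverted index, which maps each word to the documents that contain it. The sketch below illustrates that general technique under stated assumptions. It is not the CIA tool's actual design, and `build_index` and `lookup` are hypothetical names.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the sorted ids of documents containing it.

    `documents` is a dict of document id -> text (an assumed input shape).
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Split on anything that is not a letter or digit.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def lookup(index, word):
    """Return the ids of documents containing the word, or an empty list."""
    return index.get(word.lower(), [])
```

With such an index, finding every document that mentions ``anthrax'' is a single dictionary lookup rather than a scan of the full text.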
There is also ``gisting technology,'' which gives the flavor of the key
information of a document in a short paragraph, Fairchild said.
Another intelligence official, on condition of anonymity, said: ``If they
have this kind of technology to plumb the depths of open sources, you can
imagine what kind of technologies they have to track down spies.''