reply to post by PrisonerOfSociety
I will refer to your last paragraph first: "running database scripts to interpret info, just seems nigh on impossible". You are quite
correct, since by implication the data that you run your scripts against is defined by a finite dataset and a rigid interpretation.
Classically, Relational Database Management Systems (RDBMS) are the product of a strict method of categorising data with relational links between
entities to indicate association. There are, essentially, only 3 types of relationship, "1 to Many", "Many to 1" and "Many to Many". The rules
of normalisation generally group "1 to 1" relationships into single entities so this can be ignored.
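To make the "Many to Many" case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (person, address, person_address) and the sample rows are purely my own illustration, not taken from any real system. The thing to notice is that the link table has to be designed up front, which is exactly the rigidity I am describing.

```python
import sqlite3

def build_demo():
    """Create an in-memory database with a hypothetical many-to-many link."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person  (id INTEGER PRIMARY KEY, name   TEXT NOT NULL);
        CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
        -- Junction table: one row per (person, address) association.
        CREATE TABLE person_address (
            person_id  INTEGER NOT NULL REFERENCES person(id),
            address_id INTEGER NOT NULL REFERENCES address(id),
            PRIMARY KEY (person_id, address_id)
        );
    """)
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bob")])
    conn.executemany("INSERT INTO address VALUES (?, ?)",
                     [(1, "1 High St"), (2, "2 Low Rd")])
    # Alice is linked to both addresses; Bob shares one of them.
    conn.executemany("INSERT INTO person_address VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1)])
    return conn

def addresses_for(conn, name):
    """Follow the junction table from a person to their addresses."""
    rows = conn.execute(
        """
        SELECT a.street
        FROM person p
        JOIN person_address pa ON pa.person_id = p.id
        JOIN address a         ON a.id = pa.address_id
        WHERE p.name = ?
        ORDER BY a.street
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Any association that was not anticipated when person_address was designed simply cannot be asked about without changing the schema.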
The point is, the structure of the database is defined early on, and to incorporate "whimsical" associations between data means using "Many to
Many" relationships in abundance. However, this too is restrictive since it is still encapsulated by the context of the original relational model.
Generally, most police systems use this method since they support the relatively straightforward methods of police investigation (i.e. Here is a
"criminal", here are attributes that we are interested in).
When you have a much more amorphous dataset (i.e. Here are "people" and we're not sure if an attribute is useful or not) then it gets much more
difficult. The key is to reduce data entities to "types"; this is where a canonical data architecture comes in. An attempt is made to convert all
data into types, starting at a simple level with non-restrictive labelling: you can identify a location as such but you do not restrict its
interpretation to an actual place name; it is simply an amorphous data entity that may be interpreted as a location. In terms of speech, this
requires grammatical analysis of how words are used in order to identify their contextual meaning.
There are a number of complex processes that are undertaken to categorise data entities but the key is that they may occur under more than one
category depending on the analysis weight for grammatical context. Secondarily comes the interpretive phase whereby data is contextually analysed
according to a specific scenario - this essentially means that the rigid categorisation of data present in an RDBMS is applied at retrieval time
rather than at data storage time. This maintains the fluidity of data and does not compromise the ability to select pertinent data during database
trawls.
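As a rough sketch of that idea in Python: each stored entity carries several candidate types with a weight, and the rigid categorisation is only applied when a scenario asks for it. The Entity class, the interpret function, the weights and the threshold are all assumptions made up for illustration; a real engine would derive the weights from grammatical analysis rather than hard-code them.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    text: str
    # Candidate type -> weight in [0, 1]; an entity may sit under several types.
    type_weights: dict = field(default_factory=dict)

def interpret(entities, scenario_types, threshold=0.5):
    """Apply a scenario at retrieval time.

    Returns (text, type, weight) for every stored entity whose weight
    for one of the scenario's types reaches the threshold.
    """
    hits = []
    for entity in entities:
        for type_name, weight in entity.type_weights.items():
            if type_name in scenario_types and weight >= threshold:
                hits.append((entity.text, type_name, weight))
    return hits

# Stored once, with no single interpretation forced on either word.
store = [
    Entity("Reading", {"location": 0.6, "activity": 0.4}),
    Entity("Victoria", {"person": 0.5, "location": 0.5}),
]
```

Calling interpret(store, {"location"}) picks up both words as possible locations, while a different scenario such as interpret(store, {"activity"}, threshold=0.3) sees "Reading" as an activity instead. The stored data never changes, only the question asked of it.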
If you are familiar with databases, it is the equivalent of creating a table of every column and only applying relationships at analysis time.
Pretty neat in terms of conceptual delivery but quite an art to implement in reality.
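For the database-minded, a toy version of the "table per column" idea might look like the following in Python. The attribute names, the record ids and the relate function are hypothetical, and plain dictionaries stand in for real tables. Nothing links the attributes together until the analyst chooses which ones to relate.

```python
# One "table" per attribute: each maps a record id to a value.
# No relationships between attributes are declared at storage time.
attribute_tables = {
    "name": {1: "A. Smith", 2: "B. Jones"},
    "location": {1: "Leeds", 2: "Leeds"},
    "purchase": {1: "fertilizer"},
}

def relate(tables, attributes):
    """Join the chosen attribute tables on record id at analysis time.

    Only records that have a value for every requested attribute are kept.
    """
    if not attributes:
        return {}
    shared_ids = set.intersection(*(set(tables[a]) for a in attributes))
    return {
        record_id: tuple(tables[a][record_id] for a in attributes)
        for record_id in sorted(shared_ids)
    }
```

Here relate(attribute_tables, ["name", "location"]) and relate(attribute_tables, ["name", "purchase"]) are two different "schemas" drawn from the same stored data, decided at the moment of the query.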
Also, remember that most often, data is trawled to determine patterns rather than just keyword occurrence. We've all heard the stories of
"mentioning" a keyword and having our call/text/email intercepted "just in case" but this is only effective when used in the context of other
available attributes - that is to say, cross-referencing.
This is where the importance of data collection comes in. In order for data analysis to be truly effective you have to maximise the hit rate for
positive identifications via the use of appropriate cross-referencing. By incorporating data sets from all walks of life, from credit card sales to
movement tracks, etc., it is easier to discern whether so-called "keyword" occurrence is actually relevant.
If you have a clear history of extreme politics, your purchases include the ingredients of a bomb and you start talking about bombs then analysis
could be termed positive. If you are a farmer with a need to buy fertilizer and you are telling your farm manager that the barn looks like a bomb has
hit it then this would be termed negative.
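A deliberately crude sketch of that cross-referencing logic in Python is below. The field names (message, extremist_history, suspicious_purchases, legitimate_need) and the "two corroborating attributes" rule are invented for illustration only; a real analysis engine would weigh far more attributes and would not use a hard-coded rule like this.

```python
def assess(record, keyword="bomb"):
    """Classify a keyword hit by cross-referencing other attributes.

    A hit is only 'positive' when at least two independent attributes
    corroborate it; a keyword on its own is treated as 'negative'.
    """
    if keyword not in record.get("message", "").lower():
        return "no hit"

    corroborating = 0
    if record.get("extremist_history"):
        corroborating += 1
    # A purchase only counts if there is no known legitimate need for it.
    if record.get("suspicious_purchases") and not record.get("legitimate_need"):
        corroborating += 1

    return "positive" if corroborating >= 2 else "negative"

farmer = {
    "message": "The barn looks like a bomb has hit it",
    "extremist_history": False,
    "suspicious_purchases": True,   # fertilizer
    "legitimate_need": True,        # runs a farm
}
suspect = {
    "message": "We talked about the bomb again",
    "extremist_history": True,
    "suspicious_purchases": True,
    "legitimate_need": False,
}
```

With these made-up records, assess(farmer) comes out negative and assess(suspect) comes out positive, even though both messages contain the same keyword.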
The key factor is context, and a canonical architecture allows you to store data in a "context free" environment without prejudicing downstream
analysis.
A combination of canonical feeds into relational models that dictate scenarios is a powerful tool indeed. On the upside (depending on your point of
view) the UK government does not have a good history of implementing complex data analysis engines and the phrase "arse from elbow" springs to mind.
Some of the data architecture would make you weep, surely, but that is largely because we have systems that have evolved from different purposes and
so are not fit for the complex requirements we have now - hence the need to build a new system from scratch.
My day rates can be provided upon request, haha!
[edit on 7-6-2009 by SugarCube]