
After reading "Statistical Significance Tests for Machine Translation Evaluation" I have some questions.


posted on Jun, 14 2013 @ 02:10 AM
Hello, I was wondering if anyone could explain how automatic machine translation could be used to forecast political events at a regional level? Are there any good books/papers that anyone could suggest?




posted on Jun, 14 2013 @ 03:20 AM
You can forecast anything using anything as a source. The problem is how well the uncertainty of the forecast is understood. Any statistical forecast should be provided along with confidence bounds on the uncertainty. If that is not done, the question should be raised as to whether the forecast is based on valid statistical methods or merely invokes the word "statistical" because it uses data of some type.

Books about this? I don't think there are any technical texts in the statistics world. There may be a few papers in very applied journals where this has been attempted, but I'm not personally aware of any in the major statistical journals a degreed statistician would use as a reference. You might try political "science" and comp-sci journals, which tend to publish attempts at prediction and progress on "machine learning" concepts.



posted on Jun, 14 2013 @ 03:41 AM
Why (and how) would you use automatic machine translation to forecast anything, except perhaps as a tool in forecasting possible.. translations?



posted on Jun, 14 2013 @ 03:47 AM
reply to post by Nevertheless
 


You treat it as a component of a neural network, derp.



posted on Jun, 14 2013 @ 03:48 AM
reply to post by teachtaire
 


The most likely method that would be attempted is to set up a translation dictionary of at least nouns and verbs, with perhaps qualifiers as to degree. Examples might be words like: protest, anger, violent, peaceful, crowd, group, party, meeting, victim, authority.

At the first level, one could count the occurrence of the various nouns and verbs, or noun-verb pairs, in the news, web pages, tweets, or any other common source of interest. One could use the modifiers as a weight where it makes sense (e.g., few, thousands, many). A deeper level, which would probably be more productive, would be to establish "vectors" of factors which indicate mood, concern, lack of interest, and similar abstract concepts. The associated data would be the political actions (or lack thereof) which occurred in a time window after the data window, say a week later. The vectors could be developed using this type of data plus the events.
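A minimal sketch of that first level in Python — the keyword list and the modifier weights here are invented for illustration, not taken from any real system:

```python
import re
from collections import Counter

# Hypothetical keyword dictionary and quantity-modifier weights.
KEYWORDS = {"protest", "anger", "violent", "peaceful", "crowd", "victim"}
WEIGHTS = {"few": 1.0, "many": 3.0, "thousands": 10.0}

def weighted_counts(text):
    """Count keyword occurrences, scaling by a preceding quantity modifier."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in KEYWORDS:
            # Use the previous token as a weight if it is a known modifier.
            weight = WEIGHTS.get(tokens[i - 1], 1.0) if i > 0 else 1.0
            counts[tok] += weight
    return counts

counts = weighted_counts("Thousands protest downtown; a violent crowd formed.")
# 'protest' is preceded by 'thousands', so it is weighted 10.
```

A real system would of course need a far richer dictionary and some context sensitivity, which is exactly the limitation discussed later in the thread.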

As long as the underlying relationships and quantities hold, it would be possible to make predictions and probably include confidence bounds via simulations. I'd probably try a weighted probabilistic decision tree or something like a random forest. Confidence bounds could be added through stochastic simulation.
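One standard way to attach simulation-based confidence bounds, as suggested above, is a percentile bootstrap over the historical outcomes; the outcome record below is invented for illustration:

```python
import random

random.seed(0)

# Hypothetical history: 1 = a protest occurred in the following week,
# 0 = it did not, for past weeks with a similar keyword profile.
outcomes = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

def bootstrap_bounds(data, n_boot=10_000, alpha=0.05):
    """Percentile-bootstrap confidence bounds for the event probability."""
    estimates = []
    for _ in range(n_boot):
        sample = [random.choice(data) for _ in data]  # resample with replacement
        estimates.append(sum(sample) / len(sample))
    estimates.sort()
    lo = estimates[int(n_boot * alpha / 2)]
    hi = estimates[int(n_boot * (1 - alpha / 2))]
    return lo, hi

point = sum(outcomes) / len(outcomes)  # point estimate of P(protest)
lo, hi = bootstrap_bounds(outcomes)
```

The interval width makes the forecast's uncertainty explicit, which is the poster's test for whether a forecast is statistically honest.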

There are limitations which are pretty serious: the underlying relationships which determine the probability that individuals act out may be driven by economic settings, current social issues, transportation availability or cost, and so forth, causing a once-predictive model to become non-predictive as the situation changes. Then again, once predictions become possible (even if tenuous), some will try to influence the results by altering the content on the web or in the news if they can afford to do so. This too would end up altering the predictability.

Hope this helps.



posted on Jun, 14 2013 @ 03:54 AM
reply to post by BayesLike
 


Exactly, but that theoretical idea was put to use by DARPA with programs during the Egyptian riots. And prior to that, it was developed to that state in the first place.

What you are explaining is the primitive, small-scale version? Like what YouTube uses, or Google Translate?



posted on Jun, 14 2013 @ 03:58 AM
I guess, how big of a "brain" could you build?



posted on Jun, 14 2013 @ 04:12 AM

Originally posted by teachtaire
reply to post by Nevertheless
 


You treat it as a component of a neural network, derp.


How?
How does translating from one syntax to another help predicting anything?



posted on Jun, 14 2013 @ 04:14 AM

Originally posted by teachtaire
What you are explaining is the primitive, small-scale version? Like what YouTube uses, or Google Translate?

YouTube and Google Translate, translate.

What the poster explained was an idea, and he also took up the problems with it.



posted on Jun, 14 2013 @ 04:20 AM
reply to post by teachtaire
 


What was done in Egypt was more than predicting -- there was very good evidence on ATS and in other places that agents were on the ground shaping results. Not predicting them. Predictions BTW are a lot easier if you are actively shaping the outcome.

Where Darpa would go is similar to what I outlined -- but they have a lot of resources which cuts the time down. Neural nets, BTW, are not likely to do as well in this setting as random forests. Should be obvious if you know what each really does.

The "dictionary" is not at all like Google Translate. It has to map text, speech, or action into data which can be used in the analysis. The predictions will be no better than this mapping -- most of the effort will be to discover the right mappings. You can pare it down some if you have too much detail, but if you have much too much, the mathematical methods to pare it down would not work well -- there would be too much noise.

I used English words like "anger" as a label, but the data element would be more abstract than that in reality, and would include some forms of anger but not others. You could label it A or AAba; mathematically it is all the same thing, as long as it is a unique label for the concept.
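The point about abstract labels can be made concrete with a toy mapping — the word lists and the label names "A" and "B" below are invented:

```python
# Surface forms (in any language, post-translation) collapse onto abstract
# concept labels; the label is arbitrary as long as it is unique.
CONCEPTS = {
    "A": {"furious", "outraged", "seething"},  # one flavor of anger
    "B": {"annoyed", "irritated"},             # a milder flavor, kept separate
}

def to_labels(tokens):
    """Map raw tokens onto abstract concept labels, dropping everything else."""
    index = {word: label for label, words in CONCEPTS.items() for word in words}
    return [index[t] for t in tokens if t in index]

labels = to_labels(["the", "outraged", "crowd", "was", "annoyed"])
```

Discovering which surface forms belong under which label is the hard, data-driven part the poster describes as "most of the effort."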



posted on Jun, 14 2013 @ 04:23 AM
You remove syntax and create a "raw" language that doesn't lose fidelity / doesn't # up all the time. Sorry, I'm not going to be elegant with this. That data can now be narrowed by knocking out specific values, and then further focused by creating certain optimal beam-search-style relationships?

The artificial language that you translate to could translate data with no errors.

That is to say there would be no wrong words.

Does that make sense? Or is that incorrect? These papers are pretty tough reading, man; they don't go easy on the reader. They could at least make the pages high-contrast to lower eye strain. I mean, they're genius rich guys who are too stupid to use a high-contrast dark-background format. DERP!



posted on Jun, 14 2013 @ 04:25 AM
reply to post by BayesLike
 


Well, yeah. Of course we used it to shape results. Hell, even regular people use the same tech to shape results. It isn't "us" or "them"; it takes two to tango. It isn't like our press is totally controlled, or we wouldn't know what SIGINT and stuff is. People are just too lazy to go read it all and recognize the value.

You're really just talking about series of information cascades, right?



posted on Jun, 14 2013 @ 04:31 AM
reply to post by BayesLike
 


I thought breaking it up as part of a larger network, to represent the sum total of information a country produces through every interaction, would be better than a forest; what is the difference between the two? Like I said, I'm ignorant; I'll go google it as well.

homepages.inf.ed.ac.uk...

^One of the numerous scholarly articles I'm referring to. Another mentioned SPHINX, which used Hidden Markov Models and seems to be what you're talking about.

acl.ldc.upenn.edu...

^That seems to be an update; read through the references it uses. I hate it when people don't actually check the references and suck at reading.



posted on Jun, 14 2013 @ 04:31 AM

Originally posted by Nevertheless
Why (and how) would you use automatic machine translation to forecast anything, except perhaps as a tool in forecasting possible.. translations?


The translation is just a step which gleans data from the sources on the web or elsewhere. It has to have some context sensitivity -- way more than Google Translate.

The idea behind the simple decision tree (which would not be good enough most of the time) is that the tree defines a state-space condition that is paired to a classification of a type of outcome. To build the tree, you need sufficient data both for input and for the outcomes. Random forests are almost exactly what they sound like -- lots of trees.

Here's a nice little article: decision tree
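A hand-built tree makes the "state-space condition paired to an outcome class" idea concrete. The feature names and thresholds below are invented; a real tree would be learned from the paired input/outcome data the poster describes:

```python
# Toy decision tree: each path through the conditions is one state-space
# region, and each leaf is the outcome class paired with that region.
def classify(features):
    """Classify a week's keyword profile into a predicted outcome."""
    if features["anger_score"] > 5.0:
        if features["crowd_mentions"] > 100:
            return "large_protest"
        return "small_protest"
    return "no_event"

pred = classify({"anger_score": 7.2, "crowd_mentions": 250})
```

A random forest, as the name suggests, would train many such trees on resampled data and let them vote, which tends to smooth out the brittleness of any single tree.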



posted on Jun, 14 2013 @ 04:41 AM
reply to post by BayesLike
 


www.youtube.com...

^that.

I'm thinking of Hidden Markov models; how are they different? Could you clarify that for me in simple terms?



posted on Jun, 14 2013 @ 04:45 AM

Originally posted by teachtaire
You remove syntax and create a "raw" language that doesn't lose fidelity / doesn't # up all the time. Sorry, I'm not going to be elegant with this. That data can now be narrowed by knocking out specific values, and then further focused by creating certain optimal beam-search-style relationships?

The artificial language that you translate to could translate data with no errors.

That is to say there would be no wrong words.

Does that make sense? Or is that incorrect? These papers are pretty tough reading, man; they don't go easy on the reader. They could at least make the pages high-contrast to lower eye strain. I mean, they're genius rich guys who are too stupid to use a high-contrast dark-background format. DERP!


Something like that. The principal method of forming vectors of content is essentially factor analysis, which rotates the natural vectors and allows you to toss out what isn't useful content. If you read comp-sci papers on this general method, the information is probably obscured in half-baked, poorly defined descriptions. It's simpler to go through the mathematical side. There's really nothing new in comp sci with respect to these methods: neural nets are essentially logistic regression, and SVMs are fancy discriminant analysis.
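The "neural nets = logistic regression" remark holds for the single-layer case: one sigmoid unit is exactly the logistic regression model. A few lines show the equivalence (the weights and inputs are arbitrary numbers):

```python
import math

def sigmoid(z):
    """Logistic function, the 'activation' of a single neuron."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    """One sigmoid unit: P(event) = sigmoid(w . x + b) -- i.e., logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# With these weights, w . x + b = 0.5*1.0 - 0.25*2.0 + 0.0 = 0, so p = 0.5.
p = neuron([1.0, 2.0], [0.5, -0.25], 0.0)
```

Multi-layer nets go beyond this, of course; the poster's point is that the single-layer building block is a classical statistical model under a newer name.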

This is all pretty easy to do -- and there is a lot of cheap software to make it happen rapidly. The main issue is the dictionary. You can't include what you don't capture in the data. If you did capture it, and if it is useful, you will be able to get it into the models.

People are a lot more predictable than anyone imagines.



posted on Jun, 14 2013 @ 04:50 AM
reply to post by BayesLike
 


Predictability is a good thing; it seems to work for most people. Anyway, back to the Markov models, as there's nothing good on TV and summer is boring.



posted on Jun, 14 2013 @ 05:26 AM

Originally posted by teachtaire
reply to post by BayesLike
 


www.youtube.com...

^that.

I'm thinking of Hidden Markov models; how are they different? Could you clarify that for me in simple terms?


That video procedure is based on what we call "idiot Bayes." It assumes independence, at least in the portion I watched. It isn't real-world and won't work nearly as well as claimed.

I would assume that if you wanted to do this model with stochastic processes, it might be possible, but I think it would be hard and very slow to implement. Most of this field is concerned with convergence properties of chains of probabilities, which can greatly shorten simulation time. This area isn't one of my core strengths, although I do have some background in it. For example, a simple case would be a model which says an individual of Type G moves from state (a) to state (b) with probability 0.1, to state (c) with probability 0.2, or stays in state (a). An individual of Type G' moves from state (a) to state (b) with probability 0.11, to state (c) with probability 0.18, or stays in state (a). There would be similar rules for states (b) and (c). What might be studied is the convergence, over infinite time, of the distributions of G and G' through the different allowable states. Of course, you can simulate that, but simulation does not prove convergence -- which is where stochastic process analysis comes into play. Hidden Markov models would just have hidden states in addition to the observable states.

To make these models, you have to go discover all the possible states, their possible transitions, and assign the probabilities. Once you have that, you can analyze for convergence.
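The Type G example can be iterated numerically. The (a) row of the matrix below uses the probabilities from the post (a→b 0.1, a→c 0.2, stay 0.7); the (b) and (c) rows are invented, since the post only says "similar rules" apply:

```python
# Row i, column j = probability of moving from state i to state j.
P = [
    [0.70, 0.10, 0.20],  # from (a) -- probabilities given in the post
    [0.30, 0.50, 0.20],  # from (b) -- assumed for illustration
    [0.25, 0.25, 0.50],  # from (c) -- assumed for illustration
]

def step(dist, P):
    """One transition: multiply the distribution row-vector by P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0, 0.0]   # everyone starts in state (a)
for _ in range(200):     # iterate; dist settles toward a stationary distribution
    dist = step(dist, P)
```

As the post says, watching the iterates settle down is evidence of convergence, not proof; proving it analytically (here, via the chain's eigenstructure) is the stochastic-process side of the job.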

The decision tree / random forest tackles this at the other extreme. It says: if the people are in this configuration, x% end up in state X, y% end up in state Y, and so on. It's simpler, but requires data. The stochastic model allows you to introduce theoretical states and transition probabilities and see what happens, so it can deal with unobserved settings -- if the introduced states are set up properly.

Lacking the full set of analytical tools (it is a deep specialization), I'd just go ahead and simulate the state space and assume it really did converge when it appears to converge. It might not really converge, because simulations are digital and include roundoff. Probably a stochastic analyst would normally do this too, as a deeper proof of convergence is generally difficult.

I've analyzed a few systems this way, including (once) how to take down banks of servers at Google or Yahoo -- technically the reverse: there was an increasing frequency of outages of banks of servers and they wanted to know the cause. It took less than an hour to figure it out. I won't go into it, but let's say a very stupid assumption had been built into the code and they never bothered to monitor it.



posted on Jun, 14 2013 @ 05:31 AM
So... how would that relate to the Monte Carlo method? Couldn't you use the Monte Carlo method to test for values? Or is this another faux pas on my part? -_- Thanks for the patience.



posted on Jun, 14 2013 @ 05:40 AM
I'll check back tomorrow if you are still interested in discussing. This type of work would be almost as interesting as what I normally do. One of my acquaintances would like to monitor stocks (predictively) using similar concepts and he has been trying to get me involved. I told him it would be much easier just to detect changes in program trades. We don't need to predict the market, we just have to detect when it has changed.

Detecting program trading changes is MUCH simpler and I do have a method for that which will usually pick up the change in 3 to 5 data points for any given high volume stock. What I don't have is a fast enough data feed. I'd prefer almost trade by trade but could probably work with something as coarse as every 100ms.
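The poster doesn't reveal his detection method, but a standard change detector in this spirit is CUSUM, which accumulates deviations from a target level and flags a mean shift within a few observations. All constants and data below are invented for illustration:

```python
# Generic one-sided CUSUM detector -- not the poster's (undisclosed) method.
def cusum_detect(series, target, drift=0.5, threshold=3.0):
    """Return the index where the upper CUSUM first crosses the threshold."""
    s = 0.0
    for i, x in enumerate(series):
        # Accumulate deviations above target, ignoring small 'drift' noise.
        s = max(0.0, s + (x - target - drift))
        if s > threshold:
            return i
    return None

# Synthetic price-change series: the mean shifts from ~0 to ~3 at index 5.
data = [0.1, -0.2, 0.0, 0.3, -0.1, 3.2, 2.9, 3.1, 3.0, 2.8]
idx = cusum_detect(data, target=0.0)  # flags one point after the shift
```

Detecting that the regime has changed, rather than predicting where the market goes next, is exactly the easier problem the poster describes.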

Well, I do have to take a nap before going to work.... it's been enjoyable!





