Curiosity/MSL: Status Updates on Technical Condition and Functional Capabilities

posted on Feb, 28 2013 @ 11:55 PM
According to NASA/JPL, Curiosity is currently (01 March 2013) operating in "safe-mode", after engineers activated the back-up systems following a memory issue on the main computer. It was just a tweet from Curiosity's Twitter account, but it got me thinking:



Curiosity Rover @MarsCuriosity

Don't flip out: I just flipped over to my B-side computer while the team looks into an A-side memory issue

Link to article on the NASA/JPL website


In the article referenced above, JPL further states:


The intentional swap at about 2:30 a.m. PST today (Thursday, Feb. 28) put the rover, as anticipated, into a minimal-activity precautionary status called "safe mode." The team is shifting the rover from safe mode to operational status over the next few days and is troubleshooting the condition that affected operations yesterday. The condition is related to a glitch in flash memory linked to the other, now-inactive, computer.



This does sound a bit disturbing, although I remember that MER-A Spirit had a similar-sounding problem back in 2004 (a flash memory management anomaly on Sol 17), which was resolved within a few days. So I'm not too worried about Curiosity's condition, and I hope the original system can be fully restored. Still, it's a pity that something like this occurs after 200 sols, but I guess that's exactly why these back-up systems are in place.

By the way: Does anybody happen to know (technical) details about the above mentioned swapping? And/or the potential implications? Any thoughts on this?

P.S.: Since this is an out-of-plan event, I thought it's worth a new 'general' thread on potential issues related to Curiosity's status and functional capabilities.
(without any intention to paint things black)




posted on Mar, 1 2013 @ 02:14 AM
The "Curiosity" Kill........ "The Rover"....



posted on Mar, 1 2013 @ 03:05 AM
From astrogeology.usgs.gov...


28 February 2013
During Sol 200, MSL was unable to save data to part of its memory, so the rover stopped what it was doing and waited for more instructions. The engineering team at JPL is analyzing the available telemetry to determine how to recover from this anomaly, and the Sol 201 plan was cancelled. The problem does not sound very serious, but I'm not an engineer and don't know much of the details. As more telemetry is received, the experts will probably figure out what caused the problem and how to avoid it in the future. But for now we have to be patient.



posted on Mar, 1 2013 @ 03:31 AM

Originally posted by Arken
The "Curiosity" Kill........ "The Rover"....


Yeah, wouldn't that really be a shame! She's been such a 'good rover' up to now!!

(They talk of rovers in feminine over at JPL, don't they?!)

And besides: there's so much more imaging to do over at Gale Crater (and not to forget the drive up to the top of Mount Aeolis, of course).

So that thing's still gotta keep going for a while!!
I keep my fingers crossed!



posted on Mar, 1 2013 @ 03:34 AM

Originally posted by jeep3r

By the way: Does anybody happen to know (technical) details about the above mentioned swapping? And/or the potential implications? Any thoughts on this?

P.S.: Since this is an out-of-plan event, I thought it's worth a new 'general' thread on potential issues related to Curiosity's status and functional capabilities.
(without any intention to paint things black)


They test the B-side computer to make sure it's working, then command the A-side computer offline into a diagnostic mode; the system then swaps over automatically.

The article doesn't tell you enough about the flash problem to know whether it's an overwrite, a wear-leveling issue, a bad block that's not being flagged, or something else. There are a lot of things that can go wrong with a flash file system (FFS) in general.

edit to add: I guess the crappiest results would include: the B side says it's ok but isn't, and they end up with both sides shutting down; the B side has the same issue as the A side and the FFS caves in after a few days before they fix the A side; or the B side has a random fault before they fix the A side. You're operating with no safety net either way.
edit on 1-3-2013 by Bedlam because: (no reason given)



posted on Mar, 1 2013 @ 03:49 AM

Originally posted by Bedlam
I guess the crappiest results would include: the B side says it's ok but isn't, and they end up with both sides shutting down; the B side has the same issue as the A side and the FFS caves in after a few days before they fix the A side; or the B side has a random fault before they fix the A side. You're operating with no safety net either way.


Thanks for your take on this, Bedlam. The above scenario would indeed be a catastrophe, to put it bluntly. It would also be a pity to be equipped with an RTG that could provide 60+ years of energy for Curiosity ... and then, all your plans get messed up by your FFS!
edit on 1-3-2013 by jeep3r because: formatting



posted on Mar, 1 2013 @ 05:22 AM
Interesting...

There was the hoopla at NASA when scientists reported a find on Mars that would be one for the history books. That was followed up by NASA downplaying the information while walking it back at the same time.

Then we get the NASA announcement, not too long after the above, of a change in course for Curiosity.

New destination - The area Hoagland referred to as the Mars apartments (if I remember right). Pictures were taken by the rover and released to the public showing what looks like vertical support beams.

Maybe they found something that could not be photo manipulated.. Maybe the area is so full of civilization evidence that it's just impossible to hide it.

Also it reminds me of the other probes sent to Mars that suddenly lost contact or suffered some malfunction.

India announced they will be sending their own probe to Mars. Maybe, with all the interest in the Moon and Mars, it's becoming more and more difficult to hide the evidence?

Just thought I would throw out some conspiracy theory angle..



posted on Mar, 1 2013 @ 06:04 AM

Originally posted by Xcathdra
Interesting...

New destination - The area Hoagland referred to as the Mars apartments (if I remember right). Pictures were taken by the rover and released to the public showing what looks like vertical support beams.

Maybe they found something that could not be photo manipulated.. Maybe the area is so full of civilization evidence that it's just impossible to hide it.


Hi, just wondering if you have links for these pics... would be very pleased around here.



posted on Mar, 1 2013 @ 11:52 AM
reply to post by Xcathdra
 


Curiosity was already at Hoagland's "Apartments", and got some images taken from only 30 meters or so away.

Personally, I think it looks simply like layers of shale that have been exposed to wind erosion. There may be some vertical pieces in there too, but those could just be broken pieces of the shale that have fallen.


Here is one image of Hoagland's "apartments", taken from a distance relatively close to the formation, and taken with Curiosity's MastCam on Sol 113:




Here are all of the MastCam images taken of that area on sol 113:

Sol 113 MastCam Raw Images



edit on 3/1/2013 by Soylent Green Is People because: (no reason given)



posted on Mar, 1 2013 @ 01:16 PM

Originally posted by jeep3r
By the way: Does anybody happen to know (technical) details about the above mentioned swapping? And/or the potential implications? Any thoughts on this?
Believe it or not, cosmic rays can even affect computer memory on Earth's surface, though it's rare due to protection from Earth's atmosphere.

Since the atmosphere of Mars is only about 1% that of Earth, it offers little protection against cosmic rays, so engineers try to build in protection in hardware and software, but it's not 100% effective as explained here:

news.cnet.com...

Cook said the memory in question is "hardened" to resist upsets caused by cosmic rays or high-energy particles from the sun. But it is possible an energetic particle hit in a particularly sensitive area -- the directory that tells the computer where data is stored.

"In general, there are lots of layers of protection, the memory is self correcting and the software is supposed to be tolerant to it," Cook said. "But what we are theorizing happened is that we got what's called a double bit error, where you get an uncorrectable memory error in a particularly sensitive place, which is where the directory for the whole memory was sitting.

"So you essentially lost knowledge of where everything was. Again, software is supposed to be tolerant of that. ... But it looks like there was potentially a problem where software kind of got into a confused state where parts of the software were working fine but other parts of software were kind of waiting on the memory to do something...and the hardware was confused as to where things were."

Cook said the odds of a cosmic ray or solar particle causing a problem like that were remote, but similar events have happened before.
If the problem was caused by a cosmic ray event, chances are pretty good that a power cycle will fix the problem, kind of like rebooting your PC at home fixes a memory problem caused by a cosmic ray.



posted on Mar, 1 2013 @ 01:27 PM
reply to post by Arbitrageur
 


That sort of thing can be minimized by using longer syndromes, so that you can correct more bits. If they can only handle single bit errors, they're doing it on the cheap.

edit to add:

Back in the old days, when dynamic RAM was particularly susceptible to cosmic rays, you could often tell when we were getting hit by the parity check error rate at the office.
edit on 1-3-2013 by Bedlam because: (no reason given)



posted on Mar, 1 2013 @ 01:52 PM

Originally posted by Bedlam
reply to post by Arbitrageur
 


That sort of thing can be minimized by using longer syndromes, so that you can correct more bits. If they can only handle single bit errors, they're doing it on the cheap.
I got the impression they can only handle single bit errors, from this:


we are theorizing happened is that we got what's called a double bit error

I don't know much about Curiosity's memory, but I do know a little about memory in terrestrial applications, and typical ECC (error-correcting code) RAM can correct a single-bit error but not a double-bit error:

DRAM

An ECC-capable memory controller as used in many modern PCs can typically detect and correct errors of a single bit per 64-bit "word" (the unit of bus transfer), and detect (but not correct) errors of two bits per 64-bit word.
I'm not surprised there are ways to correct double-bit errors, but I suspect they aren't too commonly used. But if engineers were going to use such an error correction system anywhere, I would have thought a Mars rover would be a good candidate.
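
To make the "correct one bit, detect two" behaviour concrete, here is a minimal Python sketch of an extended Hamming (8,4) code. It only illustrates the SECDED idea on a 4-bit value. The names `encode` and `decode` are made up for this example, real ECC memory typically protects 64-bit words, and nothing here is claimed to match Curiosity's actual hardware.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds an overall parity bit; indices 1..7 form a Hamming(7,4)
    codeword with check bits at positions 1, 2 and 4.
    """
    code = [0] * 8
    # Data bits go in the non-power-of-two positions.
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    # Each check bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Overall parity makes the total number of 1 bits even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits in 1..7.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall_odd = sum(code) % 2 == 1

    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Odd number of flips: treat as a single error at index `syndrome`
        # (syndrome 0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1                 # one upset: silently repaired
    print(decode(word))
    word[2] ^= 1                 # second upset in the same word: only detected
    print(decode(word))
```

A double-bit error leaves the decoder knowing that the word is bad but not which bits to flip back, which is the situation Cook describes in the quote above.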



posted on Mar, 1 2013 @ 02:54 PM
Can someone help me here? It sounds as if Curiosity has a back-up computer, but does it also have a back-up for this flash memory?

If there is a back-up for the flash memory, is the information regularly (or even constantly) being written to the back-up, or is there a way they can shuttle information to the back-up only in times of need?



posted on Mar, 1 2013 @ 04:22 PM

Originally posted by Xcathdra
Maybe they found something that could not be photo manipulated..

Anything can be photo manipulated.



posted on Mar, 1 2013 @ 04:23 PM
reply to post by Soylent Green Is People
 


No idea, but I think Unmanned Spaceflight forum is a good place for such questions. www.unmannedspaceflight.com...



posted on Mar, 1 2013 @ 04:26 PM

Originally posted by Xcathdra

There was the hoopla at NASA when scientists reported a find on Mars that would be one for the history books. That was followed up by NASA downplaying the information while walking it back at the same time. Then we get the NASA announcement, not too long after the above, of a change in course for Curiosity (...)

Maybe they found something that could not be photo manipulated.. Maybe the area is so full of civilization evidence that it's just impossible to hide it.


For a millisecond or so, I pretty much had the same idea when I read that announcement yesterday. On the other hand, it did happen before and Spirit, for example, went on to do some amazing imaging over a period of some 2000 sols or so. And at least in that case, there didn't seem to be a cover-up but who can tell for sure ...

However, should the engineers not manage to fix that bug, I think we urgently need to reconsider your theory!



posted on Mar, 2 2013 @ 12:13 AM
reply to post by Soylent Green Is People
 


Hey Soylent, many thanks for that info. BTW, these formations seem very common in certain areas. I'm not sure what the 'apartments' thing is about, since they'd be pretty tiny by human standards. Nonetheless, I do accept that these small structures look odd. Whether they are natural or not is interesting.

Thanks again



posted on Mar, 2 2013 @ 01:23 AM

Originally posted by Soylent Green Is People
Can someone help me here? It sounds as if Curiosity has a back-up computer, but does it also have a back-up for this flash memory?

If there is a back-up for the flash memory, is the information regularly (or even constantly) being written to the back-up, or is there a way they can shuttle information to the back-up only in times of need?
I'm not sure why you're asking or what you're getting at. There's an A and a B computer, and each has its own memory. I don't know for sure, but the impression I got from the sources I read, including the link in my previous post, is that no, the information is not regularly written to the B computer's memory while the A computer is in use.

My understanding is that the way they write information to the B memory is by switching to the B computer. So the entire B computer, together with its B memory, is the backup.

They plan to switch back to the A computer if it was just a temporary glitch, and are still analyzing the failure to determine that, while they run on the B side computer in "safe mode".



posted on Mar, 2 2013 @ 04:24 AM

Originally posted by ArMaP

Originally posted by Xcathdra
Maybe they found something that could not be photo manipulated..

Anything can be photo manipulated.


Within reason sure...

What I am saying is maybe the area contains so many artifacts / what have you that any photo manipulation would be detected.

Either way, it was just a conspiracy theory I posted.



posted on Mar, 2 2013 @ 06:25 AM
reply to post by Arbitrageur
 


The number of bits you can correct depends on the syndrome size. Bigger syndromes let you distinguish (and so correct) more error patterns, but you need more bits to store the syndrome. There's a crossover point where you're flogging a dead horse, because the syndrome storage can also take a hit and make it look like there's an error when there isn't one.
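
As a rough illustration of how the check-bit count grows with the number of correctable bits, here is a small Python sketch based on the Hamming (sphere-packing) bound. The function name `min_check_bits` is made up for this example, and the result is only a lower bound: practical multi-bit-correcting codes usually need somewhat more check bits than this.

```python
from math import comb


def min_check_bits(data_bits, correctable):
    """Lower bound on check bits needed to correct `correctable` bit errors.

    Uses the Hamming bound: with r check bits there are 2**r syndromes, and
    each distinct error pattern of up to `correctable` flipped bits in the
    (data_bits + r)-bit codeword needs a syndrome of its own.
    """
    r = 1
    while True:
        n = data_bits + r
        patterns = sum(comb(n, i) for i in range(correctable + 1))
        if 2 ** r >= patterns:
            return r
        r += 1


if __name__ == "__main__":
    for t in (1, 2, 3):
        print(f"correct {t} bit(s) in a 64-bit word: at least "
              f"{min_check_bits(64, t)} check bits")
```

For a 64-bit word the bound gives 7 check bits for single-bit correction; adding one overall parity bit for double-bit detection gives the familiar 8 check bits per 64-bit word. Correcting two bits needs at least 12 by this bound, so the overhead climbs quickly.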

However, for high-rad environments (i.e. space) we always designed in two-bit correction and three-bit detection. If they're using off-the-shelf flash and FFS for this (which typically only provides one-bit correction and two-bit detection), they really ought to have gone fully redundant on the storage, à la RAID. That's not off the shelf for FFS, but it can be implemented pretty easily. We've done it for NASA high-altitude balloon sonde systems, where the FFS was going to be aloft in the squeaky edges of the upper atmosphere for months. You pay a bit of latency, but it's worth it.

Telemetry showed we had a lot of corrections, both internal to the FFS (single bit) and the occasional storage-system correction from our module, where an uncorrectable block failure in one store was immediately overwritten with good data from the two redundant stores. We never had a system-wide error, and got an attaboy on the wall.
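
The repair-on-read idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual module: the class name `RedundantStore`, the use of three in-memory dicts as "stores", and the CRC-32 check per block are all choices made for this illustration.

```python
import zlib


class RedundantStore:
    """Toy block store that keeps every block in several independent copies."""

    def __init__(self, copies=3):
        # Each "store" stands in for one physical flash device (assumption).
        self.stores = [dict() for _ in range(copies)]

    def write(self, block_id, data):
        """Write the block and its CRC-32 to every store."""
        record = (data, zlib.crc32(data))
        for store in self.stores:
            store[block_id] = record

    def read(self, block_id):
        """Return (data, repaired_count), rewriting any copy that fails its CRC."""
        good = None
        bad = []
        for index, store in enumerate(self.stores):
            data, crc = store[block_id]
            if zlib.crc32(data) == crc:
                if good is None:
                    good = (data, crc)
            else:
                bad.append(index)
        if good is None:
            raise IOError(f"block {block_id}: every copy failed its check")
        # Overwrite the failed copies with known-good data right away.
        for index in bad:
            self.stores[index][block_id] = good
        return good[0], len(bad)


if __name__ == "__main__":
    fs = RedundantStore()
    fs.write(7, b"sol 200 telemetry")
    # Simulate an uncorrectable block failure in one store.
    fs.stores[1][7] = (b"sol 2O0 telemetry", fs.stores[1][7][1])
    print(fs.read(7))   # data comes back intact, one copy repaired
    print(fs.read(7))   # nothing left to repair
```

The extra reads and checks on every access are where the latency cost comes from; in exchange, a single bad block never reaches the caller as long as one copy still verifies.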

Here's a somewhat interesting pdf from Military Embedded Systems.




