Studying Temperature Data Using the Language of Science

Discussion in 'Environment & Conservation' started by PeakProphet, Dec 24, 2014.

  1. PeakProphet

    What follows is my preamble to meeting the challenge I made to Poor Debater. Poor Debater claims that climate “science” properly quantifies the uncertainty within its estimates and results, and I deny that ANY of it does. Oh sure, they make claims of this, as evidenced by Poor Debater providing only papers that have the word “uncertainty” in the title. However, it is not the title that matters, but the actual WORK that was done, and in particular the assumptions made within that work, or referenced by it. Shoddy reading and no in-depth understanding of either the language of science or how science functions appear to be the norm in these back-and-forth tit-for-tats in internet forums, so it seemed reasonable to lay out a challenge and see what happened next. I can certainly be wrong, but when I’ve given the other side the ability to choose ANY article in the entire climate science world to prove me wrong, well, let me just say that I don’t mind playing against that kind of handicap, if only based on Poor Debater’s posting history. All I can really do is what I’ve been trained to do: read the provided material, do my best to understand it, check the underlying assumptions and data, and then let the actual analysis show that those who accept point estimates without asking about the underlying distributions are fooling themselves with their “knowledge” of the topic. This is a probabilistic world, and scientists who do not properly quantify their estimates and conclusions are not doing anyone a service. This sets me diametrically opposed to the statements made by Schneider back in 1988.

    Decide what is right between being effective and being honest? In science, you tell the truth as best you know it. Your job isn’t to sell, and it isn’t a matter of DECIDING to be honest; there is no alternative. You don’t prostitute your ideas for media coverage, and frankly you shouldn’t even care about media coverage, unless you picked the wrong vocation and, instead of a scientist, want to be a bobble-head media mouthpiece.

    You lay out your references and arguments, and then you hand your idea to your worst enemy armed with a bazooka to poke holes in it, because if he or she can, YOU NEED TO MAKE YOUR SCIENCE BETTER. You don’t avoid 3rd party reviewers, you relish them; you trumpet their rebuttal from the rooftops (and your counter-response), and you don’t hand a paper based entirely on statistical aggregations to a meteorologist. You want the most knowledgeable and informed ENEMY to review your work; you don’t hand it to another person who already shares your preconceived notion of how HONEST you need to be.

    I recently challenged Poor Debater to find any, ANY climate article or piece of peer-reviewed science of his choice that properly quantifies the underlying uncertainty within it. This isn’t my first rodeo: after 15 years of experience as a staff scientist with a scientific organization that most people on this forum would recognize if I named it, and some time spent reviewing prior “peer reviewed” work, I had more than a few hints of what might be hiding in the “climate science” woodpile. If you had asked me what I did primarily during those 15 years, such as when my mom asked, I would often answer, “I quantify uncertainty in complex systems.” “Forensic statistics” is a better description for those with more understanding of the science world than dear old mom.

    As a bit of background, I have had 3 careers in my lifetime. Career #1 was specializing in oil and gas field work, culminating in Supervising Engineer for Special Projects. Career #2 was as a staff scientist, initially brought in as a subject matter expert in oil and gas operations and later expanding into stochastic modeling and quantifying resources and reserves, sub-surface and reservoir dynamics, uncertainty of rock properties across fields, and statistical methods and models to quantify just about everything related to oil and gas exploration and development, including the economics of finding and producing unknown and as yet undiscovered amounts of oil and gas. I am author or co-author on 5 Natural Resources Research articles since 1998 and more than 100 publications related to various other resource and geoengineering issues, published internationally in English and Spanish, in journals such as the Geological Survey of Spain, SPE Reservoir Evaluation and Engineering, the International Journal of Coal Geology, and GeoArabia.

    I have only perused a single article of climate science to date, and that was to thoroughly read how it was possible for Hansen to “correct” historical data to create a warming profile in the US where one had not existed before. I objected to this “correction” on the grounds that when I was publishing science, the substantial quality controls on the work done and the conclusions reached (internal and external to the organization I worked for) would have precluded modification of data, or worse yet, drawing a conclusion or estimate without quantifying the uncertainty in that estimate, or worse still, having the entire “correction” to the data itself be the conclusion. In the science world I am familiar with, you can easily be fired for ignoring basic scientific quality control measures.

    This then led me to search for the 3rd party independent review of the statistical aggregations, correlations and assumptions within the work, and lo and behold, the only thing anyone could provide was that this article was “peer reviewed”. Sorry, but my science work was quality controlled by outsiders to make sure that EXACTLY what Hansen had done could not happen without a far more thorough vetting. Every scientist knows that when your “correction” to data creates a trend or effect out of whole cloth, it must be viewed askance right from the get-go. The single time I ever stumbled into this problem, those 3rd party reviewers blew my work right out of the water, requiring it to be deep-sixed and completely restarted. 18 months of work, gone in 5 minutes, in that environment. When I looked for the American Statistical Association (ASA) stamp of approval on Hansen’s work, to make sure it had been vetted and approved as statistically valid, it was not to be found.

    I will apply analytic methods to some basic data sets in this post, having already established that the underlying paper provided by Poor Debater relies on another paper, which readily admits that it can’t quantify its uncertainty, because, gee, it is unknown! That no more than some light reading was needed to win the original challenge was hardly a surprise; these little assumptions dodging the hard questions are actually quite common. Someone makes an assumption that says, as this one did, “gee, we can’t do it, but we’ll do this other stuff,” and then the next paper references the first, and you can’t find the assumption or sensitivity it is all based on without digging back through the sequence. Not an uncommon thing among those who wish to tout “science” as some cure-all for their inability to think and read for themselves in internet debates.

    It is not an opinion that we live in a probabilistic world. There are huge uncertainties in everything we do as humans; point estimates of temperature are the example we will poke around in here. Issues of uncertainty appear to be handled by assumptions, with occasional cuts at instrument or weather bias. I will demonstrate that there is data, and measurable uncertainty, that is being avoided, assumed, expressed as ranges without comment on the proper shape of the distribution, and aggregated in ways that create one answer as opposed to another.

    The next post will contain the original challenge and the provided reference, and the location of the text within further referenced texts that reveals, in “science-speak”, why they aren’t really doing what one might think based on the title.

    I apologize if this runs a little long, but I plan on explaining some particulars along the way, should a statistics whiz wish to duplicate or follow along (to validate or refute; in my science experience, both are acceptable) and tell me why I am wrong.

    It is hoped that the next time some science-ignoramus claims “the science is settled!” in the forums, this type of uncertainty analysis can be referenced, so that those who don’t have a CLUE how the language of science works will at least know how easy it is to ask these questions, why 3rd party independents help keep science clean, why the language matters, and why this type of destruction testing isn’t being done on every published climate “science” report.
     
  2. PeakProphet

    The original challenge.

    And my response.

    I have a much more substantial bet for you. Find any ONE that properly encompasses the uncertainty contained within the data and instruments, discusses the correct methods of assembling correlations, and lays out the aggregations in a way that we can evaluate. I haven't seen any such article yet, but then I certainly don't spend my time within the climate science field, and based on the quality demonstrated to date I'm not sure I would want to. If they actually do quantify their uncertainty properly, maybe they have done it. Once. Somewhere. Pick your favorite, on any climate topic you'd like, and we can discuss the uncertainty. Together, hopefully; I would hate to discuss science in the appropriate and correct language all by myself, that would just be showing off.

    And after a hiccup over providing articles from Nature that required a subscription, and my recent education on the quality of Nature in a field I am very familiar with, Poor Debater came up with this:

    Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: the HadCRUT4 data set
    Colin P. Morice, John J. Kennedy, Nick A. Rayner and Phil D. Jones

    http://precis.metoffice.gov.uk/hadcrut4/HadCRUT4_accepted.pdf

    To some extent this was unfortunate right off the bat. You see, in science, you can count on folks referencing prior work when they write things down, including where they got the data they then build some cool idea on. This referencing of other works is very important in the internet forums because it appears to attach credibility to a work. Unfortunately, and this is also common, there is ZERO requirement that it be done correctly. And it raises the possibility that you might build a body of work that all rests on a single, really shaky assumption, but by referencing someone ELSE who made it, you can escape having to tackle this potentially tricky issue yourself. An example would be someone writing that 2+2=5 and building some ridiculous scenario on it, and then another author coming along and referencing the original work without mentioning that all of his work depends on 2+2 equaling 5.

    In science, you tend to find these hiccups in the assumptions. Here is an example.

    From the title of the original referenced work (which appears to be as far as Poor Debater researched it), you would think these guys are doing whiz-bang work with uncertainty!! Great!

    This is my original response upon finding exactly that sort of dodge in the article provided by Poor Debater. No point in writing it again, I will just link to it.

    http://www.politicalforum.com/showthread.php?t=387665&page=3&p=1064563178#post1064563178

    The fundamental point being that by just looking back a single reference on which this one is built, you discover that these poor guys are already up a creek without a paddle. They just say, "gee...we can't quantify things definitively...so we won't!!" and any further conversation on uncertainty becomes irrelevant. The work referenced previously punted, and if they couldn't do it, the point of the original work WASN'T to fix that issue. They simply took the work done prior and used it in some other way, playing around with some uncertainties here and there, but certainly not actually quantifying uncertainty in the underlying assumptions or data UNLESS THEY WANTED TO. The example being changes in buckets for sea water temperature, piddling little things compared to the TRUE uncertainty contained within their work. Accurately quantifying differences in temperature due to some sampling method is completely reasonable, but what it ISN'T is QUANTIFYING THE UNCERTAINTY IN THE DATA ITSELF.

    Let me explain. Say I take a water sample with a normal bucket, and with an insulated one. I do this at the same instant, from the same location, with both buckets. The reading from the insulated bucket is 31.2, and from the uninsulated bucket, which may have cooled or warmed slightly as it was withdrawn from the water, 31.3. They both sampled the SAME water temperature in our example; one is just different because of the sampling method itself. Any correction arising from adjusting the uninsulated bucket temperature back to a supposedly "true" temperature addresses a small piece of error. Our young gentlemen in Poor Debater Reference #1 are worried about this error. They know it is there, they can see it, they can run measurements, and they write this HUGE paper explaining how they correct for it. Let us assume that the correction is to subtract 0.1C from all uninsulated bucket readings. We then proclaim that we have only a slight error in our temperature estimates, because really, some buckets might be a little warmer, some cooler, so we do horrendous academic gyrations and come up with an error of +/- 0.1C on all our readings. We write a paper using the word "uncertainty" in the title, and move on to something else.

    But here is the REAL problem. If I measure the same water tomorrow, the temperature has warmed or cooled slightly. And the day after. And the week after. Storms come through. Something freezes. The sun is behind a cloud. Let us say that we sampled every minute of every day for an entire month. Is there ANY expectation that the temperature was 31.2C on each and every occasion? Of course not. Sometimes it might have been 25.0C. Sometimes 40.0C. The range of uncertainty, without even trying hard, across some small period of time might span +/- 15C. Tell me, which is more important in quantifying the uncertainty about any single temperature measurement: the difference between insulated and uninsulated buckets, or the uncertainty of the distribution of answers possible within some body of water, and where and when that measurement was taken?

    Until you understand the fundamentals of your uncertainty, you don't even know which uncertainties matter. Once upon a time they taught significant digits in high school mathematics. They matter for a reason: if your underlying variability is +/- 15C, your fantastic paper quantifying your +/- 0.1C correction is meaningless. I wonder if it requires a PhD nowadays before folks are taught such basic concepts.
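    To make the scale difference concrete, here is a minimal sketch in Python (numpy assumed; every number is one of the made-up figures from the bucket example above, not real data):

        import numpy as np

        rng = np.random.default_rng(42)

        # "True" water temperature sampled every minute for a month, idealized
        # here as uniform over +/- 15 C around the 31.2 C reading in the example.
        true_temp = rng.uniform(31.2 - 15, 31.2 + 15, size=60 * 24 * 30)

        # Uninsulated-bucket readings: the same water, plus the 0.1 C sampling bias.
        readings = true_temp + 0.1
        corrected = readings - 0.1          # the bucket "correction" from the paper

        print(abs((corrected - true_temp).mean()))   # ~0.0: the bias is removed
        print(true_temp.std())                       # ~8.7 C: the natural spread is untouched

    The correction is real, but it does nothing about the +/- 15C spread the measurement sits inside, which is the significant-digits point above.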

    Next post I'll post up some charts and figures; so far I've got a 100 year old time series from a single Idaho temperature station, and a TMin series for the entire country of Germany.
     
  3. PeakProphet

    Germany as a country, Part 1.

    This data comes from a dataset I obtained while rooting around in the original provided reference. Imagine this: take all the data stations in Germany, average them all together, and you have a sequence of average temperature data points per month. The reason I chose this dataset is because it comes with a confidence interval. This is good, and we can actually use even this small amount of data to test whether that interval makes any sense, or implies a different level of “confidence” than a 3rd party reviewer might be comfortable with.

    The data set averaged by month from many locations across Germany.

    http://berkeleyearth.lbl.gov/auto/Regional/TMIN/Text/germany-TMIN-Trend.txt

    Now remember the water temp issue mentioned earlier? What the folks who build this database are banking on is the law of large numbers. If I can just add up enough data, then the mean of the distribution should be revealed! The more numbers I have, the more likely it is I will be right! It sort of does work, but as with all things, the devil is in the details.

    http://en.wikipedia.org/wiki/Law_of_large_numbers
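    A quick sketch of that idea, and of its catch (Python with numpy; the "population" here is invented):

        import numpy as np

        rng = np.random.default_rng(0)

        # Sample means converge to the mean of whatever you are actually sampling...
        for n in (10, 1_000, 100_000):
            print(n, rng.normal(loc=2.0, scale=8.0, size=n).mean())

        # ...including a biased version of it. More data makes the estimate of the
        # *biased* mean more precise; it never removes the bias itself.
        for n in (10, 1_000, 100_000):
            print(n, (rng.normal(loc=2.0, scale=8.0, size=n) + 0.5).mean())

    More samples shrink the scatter of the mean; they do not tell you whether the thing being averaged was measured or aggregated correctly, which is where the details bite.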

    However, there are issues here that are not discussed, and they would be among the first raised by a 3rd party reviewer. But first let us start with the basic data. This dataset uses the word “anomaly”, as do other climate papers: they baseline the data for some time period (if I recall, this one uses the 1951-1980 mean) and then measure all temps as degrees C above and below that baseline. If you even eyeball this preliminary graph you can see that things tend to be above and below the 1951-1980 line by fractions of a degree C, which makes sense if you are just discussing a mean temperature difference relative to some given number. The absolute numbers really don’t matter in this case; we are interested in whether or not these guys are as certain as they want us to believe. After all, they have a “confidence interval”. Just gives you shivers, doesn’t it, the scientific-ness of it? Certainly those who don’t know the language seem to think so.
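    For those following along, the "anomaly" mechanics are simple enough to sketch (the 1951-1980 window is the one mentioned above; the temperatures themselves are invented):

        import numpy as np

        rng = np.random.default_rng(1)
        years = np.arange(1850, 2014)
        temps = 8.0 + 0.005 * (years - 1850) + rng.normal(0, 0.6, years.size)  # fake annual means

        baseline = temps[(years >= 1951) & (years <= 1980)].mean()
        anomaly = temps - baseline        # degrees C above/below the 1951-1980 mean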

    Here is a basic graph of that data.

    [​IMG]

    Looks a bit on the “warming” side, doesn’t it? But notice WHY it looks that way: a pretty steady temperature trend through the 1970’s, perhaps, and then a gradual increase. Just looking at this graph I can see the furious nodding among the faithful. Science at work. Yep. Those poor Germans, gonna burn up any second. Now let us add how CONFIDENT the folks are about these numbers. Realize, this is a completely reasonable exercise as well, picking some interval and proclaiming, through one mechanism or another, your “confidence” in it. Confidence intervals can be tricky, because they are not required to be an empirical measure; there can be a measure of author judgment built into them. Notice there is an EXPECTATION involved, and in the language of science you need to be very, VERY careful with expectations. People looking for warming expect to find warming, people looking for cooling expect to find cooling, and any good scientist is always testing himself against those previously mentioned enemies to make sure he isn’t falling into this trap, intentionally or unintentionally.

    http://en.wikipedia.org/wiki/Confidence_interval

    This is the quote from those who provided the data:

    Interesting. So even here we have a “confidence interval”, but they haven’t taken the time to study other forms of systemic bias. Again, these assumptions matter, because if you have a 5C uncertainty range somewhere inside the data, who CARES about a 0.5C "anomaly"? You are well inside the "noise" of your ability to measure anything. Well, how about we do a little of that ourselves, shall we? Here is that same graph with the confidence interval (expressed as an over/under number) shown above and below the provided data.

    [​IMG]

    Notice the wider spread of temperature anomaly in the past. This makes perfect sense if you think about temperature instrumentation back to the early 1800’s. I mean really, does anyone think that 1830 thermometers were as good as the ones we make today? Me neither. Therefore, it makes perfect sense that folks are a bit more leery of the accuracy of those early readings (don’t forget the significant digit issues). At first glance, what REALLY stands out to a forensic statistician is the CONFIDENCE they have in the near-time data. I mean, 0.3C across the entire country? REALLY? We’ll test that in a minute.

    But first, would anyone like to see what story I can tell about THIS graph, as opposed to the original one that seems to represent only warming? Here are statements I can make, completely “confidently”, but only AFTER you can SEE the uncertainty. And let us not forget, this is THEIR uncertainty; I haven’t even teed up yet. Consider this demonstrative of how a 3rd party reviewer would start in reviewing the information provided.

    I can make both of those statements within the 95% confidence interval provided by the Berkeley folks. Nice of them to allow the numbers to speak for themselves; unfortunate that the climate enthusiasts don’t demand and discuss this additional information on every damn thing they are ever told. Because with it ALONE, suddenly Germany has two stories to tell, and your friendly objective climate scientist can be “confident” in both of them. To a scientist trying to sell you something, "you" being the ignorant amateurs who do not speak the language, this is a disaster. How can I sell global warming when CO2 hasn't even caused Germany to warm up as much as it was in the early 1800's? I mean, just read the press, all this "warmest month in 5000 years!!!" stuff. What happens if the people suddenly figure out what is REALLY happening? How can I ever pitch solar panels! AAHHHH!!!

    To an objective scientist examining data, we stare at that second chart and say: damn, we need to find a way to narrow those old uncertainties, because it COMPLETELY matters. Advocates don't want you asking questions. Just nod at what they present, and, more importantly, DO WHAT THEY TELL YOU TO!

    Next post, I will do some independent analysis on this German data, and we’ll examine some real uncertainty. In other words, we won’t be buying the story from forum “experts” that just because the word “uncertainty” is in the title, it means anything at all.
     


  4. Poor Debater

    Note that you failed the original challenge. In fact, you failed so miserably you didn't even try to respond to it.

    Score: Science 1, Denierstan 0.

    And do you have actual evidence that the references in Morice et al. contain flawed data? No you do not. All you have is baseless, groundless speculation which happens to be politically convenient for you to believe. So on the basis of your hoping and wishing for unicorns, you simply assume that the science is wrong.

    If you were a real scientist, you would know that if you actually found such flaws, you would instantly have a paper that would not only be publishable, but highly cited as well. But of course you don't have real science. All you have is wishes.

    Score: Science 2, Denierstan 0.

    It's instructive to look at that quote (from Brohan et al. 2005) and see what it was that set PP over the edge. Here's the quote:

    In other words, Brohan et al. are saying that it's always possible there may be sources of error that have not yet been discovered. And PP is jumping up and down and pointing while yelling "SEEEEE!"

    Well, okay, Mr. Science, I'll bite. Why don't you tell us all how you, in your job (a job which you assure us is all about error correction), go about correcting all those errors from sources that have not yet been discovered and may not actually exist. Go right ahead. We're all ears.

    What's that, Mr. Science? You can't tell us that vital bit of information? You can't, you can't you can't? Gee, that's too bad. Because that makes the score:

    Science 3, Denierstan 0.

    So you're saying that known systematic error is so unimportant it should NOT have been corrected? Gotcha. Or perhaps you're just totally ignorant of the difference between a change in a measured quantity and a change in measurement error. Either way ...

    Science 4, Denierstan 0.

    And since a thermocouple thermometer typically measures in 0.1 C increments, I guess you're the one who needs to go back to school.

    Score: Science 5, Denierstan 0.
     
  5. PeakProphet

    The exercise was not to find flawed data. It was based on quantifying uncertainty properly.

    The challenge wasn't to find flaws. I am no longer publishing peer reviewed science, and I couldn't care less what YOU think about doing peer reviewed publications until you can discuss your own experience doing it. Science wanna-be's might think being cited is some Holy Grail; the quality of my career rests not just on my science work, but on the practical application of that science. I am not some permanent student/professor/academic and do not judge quality based on such a ridiculous metric.

    In the original response, I told you that they were both right..and wrong. Do you even know HOW it was possible to make such a statement, or is your ignorance on uncertainty as absolute as you are making it appear?

    I will get to it, grasshopper. You have already demonstrated in this post alone that you don't even know the difference between measurements of uncertainty and "flaws" in data. What in the world makes you think that someone incapable of processing fundamental definitional differences will understand the more complex process of measuring what is not currently known?
     
  6. Poor Debater

    Now wait just one cotton pickin' minute there, pardner. Where's your comprehensive error analysis of the Berkeley Earth dataset, required before usage? Because I have it on good authority that any scientific dataset that hasn't undergone strict quality control is totally worthless and utterly unreliable.

    Have I been misinformed?

    That's not the way it looks to me at all. It looks to me like fairly steady warming throughout the 20th century, perhaps accelerating after 1970. And indeed, one has to wonder why Mr. Science is relying on his eyeballs rather than actually doing a regression, or a running regression such as a Loess smooth. I guess that's just too science-y for this audience, eh, Mr. Science?

    [​IMG]

    Nonsense. Nineteenth century thermometers might (in some cases) be less precise than modern ones, but they are no less accurate. The freezing and boiling points of water have not changed since then, and calibration techniques are not difficult. The reason uncertainties are greater in the 19th century has everything to do with the number of thermometers, and nothing to do with their accuracy.

    I'm sure you can make any statement you like, but that doesn't mean it's true. In fact, it is quite likely that neither of those statements is true, and neither one can be made with 95% confidence, nor anything like it. Mr. Science seems to not know the difference between the edge of a confidence interval and an actual measurement. Mr. Science is a very, very confused individual.

    For example, while Mr. Science claims it's possible to state with 95% confidence that it has warmed 6° in Germany since the early 19th century, that's not at all what that confidence interval implies. While it's true that the lower edge of the 95% CI reaches down to that level, that doesn't mean that the temperatures did. In fact, what that really means is that there is only a 2.5% chance that temperatures were that low (or lower) at that time. So anyone who makes the same statement Mr. Science did would have a 97.5% chance of being wrong. Which, in fact, he is. And the same argument also applies to the upper CI as well, and to that statement.
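    A numeric version of that tail argument, assuming the 95% interval is symmetric and normal (an assumption; the data file does not say how its interval was constructed):

        from scipy.stats import norm

        m, h = 0.0, 1.0                      # hypothetical anomaly and CI half-width
        sigma = h / norm.ppf(0.975)          # standard error implied by a normal 95% CI
        print(norm.cdf(m - h, loc=m, scale=sigma))   # 0.025: mass below the lower edge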

    What we can conclude from the data is:
    1. We can't tell much useful about the 19th century temperature trends in Germany from the thermometer data; and
    2. During the 20th century, Germany has been warming at an accelerating pace.
     
  7. PeakProphet

    A deterministic versus probabilistic world

    Let us discuss for a minute the world that people THINK they live in (a deterministic one) and the one they DO live in (a probabilistic one). It is important to understand this difference if you want to understand science, because while science can be amazingly precise, it can also be precise in a way that might drive some people batty.

    Here is an example drawn from my professional expertise. I wish to calculate the volume of pore space in rock under a county that might contain oil or gas. We all know that the volume of a rectangular solid is Height X Length X Width. The pore volume contained within this solid is just the amount of void space in the rock, called porosity, that can contain fluids. So my simple equation is just H X L X W X Porosity (as a %) = cubic units. When measuring these properties in the subsurface, I have estimates of H, L, W and porosity from different spots around the county. This is similar to temperature sampling, a key difference being that my rock doesn’t change with respect to time in that same spot, whereas temperature does (adding another level of underlying uncertainty to the mix). I average these H, L, W and porosity values (10 feet, 10,000 feet, 10,000 feet and 6%...it is a small county..) and I have 60 million cubic feet of volume.

    When combining distributions (which is what I ACTUALLY did), the mean is one of those statistics that I can carry easily through an operation (add, multiply, subtract, divide), but this does not apply to all statistical metrics. The mean of any one distribution can be multiplied, divided, added or subtracted and will carry through the calculation (as long as the distributions are considered independent).
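    A sketch of that statement using the county numbers above (Python/numpy; the uniform ranges are chosen so the means match the 10 ft / 10,000 ft / 10,000 ft / 6% figures):

        import numpy as np

        rng = np.random.default_rng(7)
        n = 200_000
        H   = rng.uniform(6, 14, n)            # thickness, ft (mean 10)
        Ln  = rng.uniform(6e3, 14e3, n)        # length, ft   (mean 10,000)
        W   = rng.uniform(6e3, 14e3, n)        # width, ft    (mean 10,000)
        phi = rng.uniform(0.02, 0.10, n)       # porosity     (mean 0.06)

        vol = H * Ln * W * phi
        print(vol.mean())                                    # ~6.0e7 cubic feet
        print(H.mean() * Ln.mean() * W.mean() * phi.mean())  # same: means carry through, given independence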

    What using the mean DOES do is create a single answer we can all discuss. Just like we can say we know what the mean temperature is, we can say we know what the mean pore space is. Everyone is happy, because most people are very deterministic, and this includes scientists. I've seen them get downright cranky when someone shows up and tells them that their point estimate mean is crap because it is misleading. They will shout and get angry and tell you to buzz off; I've seen it. This fancy-pants statistical stuff is ridiculous, HOW DARE YOU TELL ME THAT THIS ISN'T A PERFECT ANSWER!!! Think about it, folks: how would YOU like to live in a probabilistic world when it comes to, say, your paycheck? You know you will get paid next week; how would you like it if your boss could always change your hourly rate? Maybe you’ll get $200 after taxes next week, maybe $4000!! Nope, most people would not be happy about this one, me included.

    But this is exactly why statistics is the language of science: the physical world is so complex that it encompasses near-infinitely changing conditions, and our sampling of it is finite. So we describe these changing conditions and circumstances with distributions rather than point estimates, and now several other issues come into play.

    Most people will interpret a point estimate mean as a more “precise” answer than a range of answers, and nothing could be farther from the truth. The mean temperature or pore volume is just a number, and what it doesn't tell you is the shape of the uncertainty around that metric. It doesn't tell you the probability of achieving that point estimate, because a mean is not a measure of probability. I can easily give you a mean estimate, but for a given sample the odds of achieving that answer, or something higher, can be far above, or below, what you expect, if your expectation is the mean. Suddenly, the SHAPE of your answer becomes important.

    Let us assume uniform distributions for the 4 values we used to calculate pore volume. To save space I will only show 1, but all 4 would look the same except with different maximum and minimum values.

    [​IMG]

    What this graph says is that I have a symmetrical range of uncertainty around a mean AND median value of 10, and there is an equal chance when sampling the rock involved of that value being anywhere between 6 and 14’ thick. But when I multiply out all 4 distributions, this is the corresponding SHAPE of the answer that we have been previously referring to as just “60 million cubic feet”.

    [​IMG]

    This is actually a far more precise probabilistic answer than the deterministic one most of us would be happier with. Except, much like the German temperature example, suddenly we can see not only the point estimate but OTHER information as well. For starters, even though we used all uniform distributions, because we multiplied the distributions together our answer is actually skewed. A normal distribution (which appears to be assumed within the climate world) is NOT what we have (notice the skewness value ≠ 0). Notice also where I put the probability marker: right on the mean answer we had previously calculated. Yet, because the resulting answer is skewed, it turns out that you can only achieve the actual value of the mean (or greater) 44.6% of the time. So while the mean is a measure of central tendency, it is NOT a measure of equal probability. It COULD be under special circumstances, but it is the median that is defined as the 50/50 point of probability. The mean is not tied to a probability by definition, and in this case a value equaling or exceeding it comes out at less than a 50/50 coin toss, 5.4% less, to be exact. This is important: without understanding the shape of the answer, you don’t know if the mean is something that can be reached or exceeded 2 times in 4, or 2 times in 100. Imagine if someone tells you the mean world temperature is projected to go up 1C. The horror! But if I also tell you that there is only 1 chance in 100 of that actually happening, well, you might have a completely different response.
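    The skew, and the less-than-coin-toss chance of reaching the deterministic mean, drop straight out of a simulation (the setup from the earlier sketch is repeated so this runs on its own; the exact 44.6% is from the author's model, the simulation just shows the same effect):

        import numpy as np
        from scipy.stats import skew

        rng = np.random.default_rng(7)
        n = 500_000
        H, phi = rng.uniform(6, 14, n), rng.uniform(0.02, 0.10, n)
        Ln, W = rng.uniform(6e3, 14e3, n), rng.uniform(6e3, 14e3, n)
        vol = H * Ln * W * phi

        det_mean = 10 * 10_000 * 10_000 * 0.06   # the deterministic 60 million
        print(skew(vol))                  # > 0: a product of symmetric uniforms is right-skewed
        print((vol >= det_mean).mean())   # ~0.45, not 0.50
        print(np.median(vol) < det_mean)  # True: the median sits below the mean under right skew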

    How about another complication within the language of science? This answer makes an important assumption, one that may or may not exist within the climate science world. What, one might ask, is the relationship between the four variables? Does higher porosity correlate with higher thickness? Does overall width correspond with length? Does it matter?

    The above graph assumes independence among the variables. Certainly this is NOT true in the climate science world. Or maybe even any world within the physical sciences. So we’ll experiment with high levels of positive correlation among these 4 variables, and high levels of negative correlation to see how even such a simple model changes.

    [​IMG]

    This chart shows the uncorrelated probability density function (green), the positively correlated density function (red), and the negatively correlated density function (blue) for pore volume. Is there anyone who wants to maintain that something as small as correlations like this SHOULDN’T be explained, documented, and researched to the hilt before anyone thinks that the mean is itself much of a meaningful answer?

    I mean, just LOOK at what correlation does to the mean. In our original green example, we get the mean. When we negatively correlate the variables the mean moves downwards, yet when we positively correlate the variables the mean moves upwards by 15.6%!! But it is those shapes that are important, because each shape tells you about the probability of your answer as a whole, all the information hidden from you when you settle for a point estimate.
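    The post doesn't say which tool or correlation levels were used; one standard way to impose correlation on uniform inputs is a Gaussian copula, sketched here (note that a common pairwise correlation among 4 variables can't go below -1/3, so -0.3 stands in for "high negative"):

        import numpy as np
        from scipy.stats import norm

        def correlated_uniforms(rho, n, rng):
            # 4 standard normals with common pairwise correlation rho,
            # pushed through the normal CDF to give correlated uniforms on (0, 1).
            cov = np.full((4, 4), rho)
            np.fill_diagonal(cov, 1.0)
            z = rng.multivariate_normal(np.zeros(4), cov, size=n)
            return norm.cdf(z)

        rng = np.random.default_rng(11)
        for rho in (-0.3, 0.0, 0.9):
            u = correlated_uniforms(rho, 200_000, rng)
            H, Ln = 6 + 8 * u[:, 0], 6e3 + 8e3 * u[:, 1]
            W, phi = 6e3 + 8e3 * u[:, 2], 0.02 + 0.08 * u[:, 3]
            print(rho, (H * Ln * W * phi).mean())   # mean falls, holds, then rises with rho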

    For those who investigate whether these correlations, and the sensitivity of the results to them, are thoroughly documented in the climate science world, fail to find it, and then proclaim that obviously this can’t be done because it is hard or unknown (as those authors implied in post #2), I can only recommend that they read more widely, because the scientists of the U.S. Geological Survey are doing exactly this kind of work in their science specialty. Others don’t get to proclaim how hard it is, or why they aren't doing it, because some scientists know darn well how much this matters.

    http://www.sciencedirect.com/science/article/pii/S187661021300667X
     
  8. PeakProphet

    The testing of the assumptions and data of others starts with their data and assumptions, not mine. This is done so that when further analysis is performed, the two can be compared. Perhaps the next challenge can be you finding even any of THAT; but for now we are concentrating on the lack of incorporation of uncertainty in the answers, and what this signifies about the validity of the results.

    If there is anything incorrect based on the dataset used, I will be more than happy to correct it, and revise any conclusions based upon it. That is what is done in science, as opposed to seeing only what one wishes to see in the data.
     
  9. Hoosier8

    One of the major failures in tabloid science is communicating the uncertainty. It is an ethical failure.
     
  10. PeakProphet

    If you believe what Schneider wrote in 1988, then you can consider this act a feature, not a failure.
     
  11. Poor Debater

    In other words: one set of rules for data you like, and a completely different set of rules for data you don't like.

    Gotcha.

    The mental process of Mr. Science works then something like this:
    1. Determine if dataset supports your political beliefs.
    2 (a): if YES, then data is automatically assumed to be perfect and flawless, and no QC checks are required.
    2 (b): if NO, then data is automatically assumed to be useless and worthless, because of unseen but by-golly-we're-sure-they-just-gotta-exist QC errors of which Mr. Science is the world's greatest expert.

    And this is what passes for the "scientific method" in Denierstan.

    And where oh where is the "anything incorrect" that you assume (without evidence) exists in HADCRUT4, or GISS, or NOAA, or the ARGO float data, or Jason/TOPEX sea level data, or PIOMAS, or any one of dozens of other standard datasets upon which AGW theory is built?

    Nowhere.
     
  12. contrails

    Stefan Rahmstorf gives a good explanation of confidence intervals over at RealClimate.org. I think the following paragraph sums up where some people misread what a confidence interval means.
    He goes even further and, with the help of a mathematician, performs a change point analysis to identify which time series have different linear trends. While there are definite changes around 1910, 1940, and 1970, there is no change in the linear trend since 1998, as many claim there is.
    [​IMG]
    Before you claim that this is because he used GISTEMP data instead of one of the other temperature records: Grant Foster over at Open Mind did the same analysis on all of the major datasets, and while NOAA, BEST and HadCRUT4 come close, none of them show a trend change between 1990 and today. It also seems quite clear that the likelihood of a trend change is decreasing as well.
    [​IMG]
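    Rahmstorf and Foster use more formal machinery, but the core of a change-point scan on a trend is simple enough to sketch (a sketch only, assuming annual anomalies in arrays t and y):

        import numpy as np

        def best_breakpoint(t, y, min_seg=10):
            # Try each candidate split; fit separate OLS lines on each side and
            # keep the split that minimizes the total squared residual error.
            best_sse, best_year = np.inf, None
            for k in range(min_seg, len(t) - min_seg):
                sse = 0.0
                for tt, yy in ((t[:k], y[:k]), (t[k:], y[k:])):
                    coef = np.polyfit(tt, yy, 1)
                    sse += ((yy - np.polyval(coef, tt)) ** 2).sum()
                if sse < best_sse:
                    best_sse, best_year = sse, t[k]
            return best_year, best_sse

    Whether the improvement at the best split is statistically meaningful is the part that takes real care, which is the point of Foster's analysis.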
     
  13. PeakProphet

    For those hard of hearing, or just plain stupid: we are not discussing incorrect data; we are discussing uncertainty within data that may or may not allow such results or conclusions to be made. Of particular interest to me is whether the stated uncertainties themselves can be accurate, given the underlying information.
     
  14. Hoosier8

    Of course, using only a dataset that agrees with his thesis.
     
  15. Hoosier8

    AGW is not based on the datasets but on an unvalidated hypothesis.
     
  16. contrails

    Which dataset was that, Hoosier8?
     
  17. contrails

    AGW is based on the proven theories that atmospheric CO2 retains heat and that human fossil fuel use has increased atmospheric CO2.
     
  18. Hoosier8

    GISS land and ocean, instead of more accurate satellite data. You do know the temps were supposed to warm in the troposphere first, didn't you? Didn't happen. Another prediction down the drain. Since it was such a bad PR failure, IPCC AR5 left the hot spot graphs out. Why? Because the science isn't settled.

    Gosh, they have model bias but don't know why.

    - - - Updated - - -

    Still unvalidated in the chaotic climate system.
     
  19. contrails

    Apparently you missed where Grant Foster compared all datasets, and not only do they all support the thesis that the trend hasn't changed, but UAH TST (a satellite dataset) showed the least chance of change. And you still think temperature measurements taken from 200 miles up are more accurate than temperatures measured at the surface?

    Apparently you don't know that a tropospheric hot spot, or lack of one, does not prove or disprove the existence of anthropogenic warming.
    http://thingsbreak.wordpress.com/20...ic-temperature-instrumental-and-proxy-trends/

    You should familiarize yourself with Rule 15.

    The global warming potential of atmospheric CO2 has been validated for over 150 years.
     
  20. Hoosier8

  21. Reiver

    Please refer to one peer reviewed article that supports your position. Good luck!
     
  22. Hoosier8

    Ah yes, you can't understand anything unless it is some peer reviewed paper. That is what you hide behind every time.
     
  23. jc456

    What experiment proved it? Show us the one that shows adding CO2 will cause warming temperatures. Just one experiment. Herr Koch, 1901. Just remember, it's you saying proof. You ain't got any. The science is anything but settled.

    - - - Updated - - -

    Provide one experiment that proves your peer review.
     
  24. PeakProphet

    Let’s talk Germany Part 2
    As demonstrated previously using the German data set, there is a marked decrease in the confidence interval (an increase in certainty) as time progresses. By the time we hit the 1900’s or so, those folks at Berkeley are pretty confident about the information they provide! Except for all the uncertainties they are still working on, and haven't included yet, anyway. This increase in confidence can relate to data density, improved instrumentation, and assumptions of correlation among multiple stations (not provided along with the data sheet). In statistics all of these would be reasonable grounds for decreasing uncertainty: the more sampling, the better the instrumentation, and the greater the correlation confidence, the more certain you can be that you have that distribution figured out.

    So how about we assemble some of their data and examine it, not for confidence levels, but for the actual distributions involved, and test two ideas: are the distributions of means in temperature data themselves normal distributions, as statistics says they should be (and as climate folks then assume), and can we determine from that data the same high level of confidence that the Berkeley folks claim?

    For this exercise, we are going to bootstrap the distributions using the provided data, and then examine the percentiles within those distributions to determine what WE think a confidence level might be (confidence meaning the chance of a value falling within a given bootstrapped distribution). But I don’t want to generate confidence intervals by just stating them. In my area of science we would empirically express the percentiles as part of the answer, because we didn't want people thinking we had just DECIDED between ourselves that the error should be such and such. Nope. No misunderstanding in this regard was allowed: I show the frequency and the percentiles from the data, and it is what it is. Others can argue about their own levels of “confidence”; I will just provide the percentiles, and others may do with them as they will.
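    In outline, the percentile bootstrap being described works something like this (a sketch; the post doesn't spell out the exact resampling scheme, so here x is simply all anomalies for one calendar month):

        import numpy as np

        def percentile_bands(x, q=(5, 50, 95), n_boot=10_000, seed=0):
            # For each resample (drawn with replacement), record the requested
            # percentiles; their spread across resamples shows how stable the
            # empirical percentiles are.
            rng = np.random.default_rng(seed)
            per = np.array([np.percentile(rng.choice(x, x.size, replace=True), q)
                            for _ in range(n_boot)])
            return per.mean(axis=0), per.std(axis=0)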

    First we need to bootstrap into given distributions using some data. So how about we start with the variation of temperature within given months? Better yet, let us make another decision to up the precision a little: we are going to use only data since about 1910, the point at which the Berkeley folks have decided to be very “confident” in their data aggregations.

    This figure shows the point at which the Berkeley folks became pretty confident about their measured temperatures for Germany. I will only use their temperature data by month since about 1910. We will start with this graph.

    [​IMG]



    An obvious place to start is testing monthly variability, so let us keep all January temperatures in January, and June in June. As an example, we gather up all the January temperatures, and all the March, and all the rest. Here is how I will bootstrap into these distributions; Poor Debater can feel free to use all the other analytic methods his extensive science career undoubtedly has taught him.

    Here is a probability density function of all mean January temperature anomalies for a century. 102 values, I believe.

    [​IMG]

    The Y axis is relative frequency of occurrence, and the X axis is the temperature anomaly. The red line is a best fit equation chosen using the Akaike information criterion (AIC). Now, before our internet paper-quoting expert who doesn't read or even think about what he quotes mentions that there are certainly other ways to fit this data than AIC, I will say: sure! Climate scientists might pick something at random and move along, but I was never a climate scientist, so I didn't collect that bad habit. Someone else might point out that maybe, just MAYBE, if I were one of these loosey-goosey academics who wants these kinds of distributions to be normal, and I cherry-picked my fit criterion, I could force such a thing on this obviously non-normal distribution? We can test that too! So I tried the Bayesian information criterion. No go. Chi-Square? Skunked. Anderson-Darling (a personal favorite)? Shot down again. Kolmogorov-Smirnov? No. Drats!! Poor Debater can demonstrate to everyone all his favorites, but I’m done trying to pretend that this can be best fit with a normal distribution.
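    For anyone wanting to follow along, ranking candidate distributions by AIC takes only a few lines with scipy (a sketch; gumbel_l is scipy's minimum-extreme-value family, the shape the text settles on below, and the candidate list is mine):

        import numpy as np
        from scipy import stats

        def rank_fits_by_aic(x, candidates=("norm", "gumbel_l", "triang", "skewnorm")):
            results = []
            for name in candidates:
                dist = getattr(stats, name)
                params = dist.fit(x)                 # maximum likelihood fit
                loglik = dist.logpdf(x, *params).sum()
                aic = 2 * len(params) - 2 * loglik   # lower is better
                results.append((aic, name))
            return sorted(results)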

    The best fit for the distribution above can generally be thought of as a minimum extreme value distribution, a specific case of this general family:

    http://en.wikipedia.org/wiki/Generalized_extreme_value_distribution

    Poor Debater, in his internet expertness, can help anyone with the math involved.

    Some folks might be inclined to throw out data to force this to be something else. It looks like we’ve got 3% of the data that likes being cold, so being honest scientists we are going to just leave it in and see if it matters much later. However, I did test the distribution to see if it changed with the removal of those outliers: I excluded them, reran the fit, and it turns out that the best fit was then triangular!! Amusing to those who speak the language, certainly. Normal was no closer an answer afterwards than before. Again, scientists test stuff and see what happens, as opposed to assuming this or that. Just a point for those who haven’t done any of this but want to pretend in the future without looking foolish.

    Even more interesting is the spread on this distribution. You see, the confidence interval provided over large periods of time by Berkeley obviously includes the month of January, and our distribution ranges across something like 13C!! The difference is that I’m going to show you how to put the empirical distributions around the central tendency (THEIR central tendency, I might add), and then we can compare the two, as any well trained scientist might.

    You can’t just stop with ONE month, obviously, so here are the box whisker plots from the best fit distributions for all months. It turns out 3 of them are normal best fits: 25% of this data set can be best fit with a normal distribution. This will cause us issues within peer reviewed papers later. The following graph is the same system and same techniques applied to all monthly German data from about 1910 to present. Notice that winter has a wider range, summer a narrower range. The probabilities of any given occurrence in any given month are explained by the legend on the right.

    [​IMG]


    Now let us not forget: this isn't the uncertainty around a single data point, or data in a day, or data in a month at any single location, THIS IS THE UNCERTAINTY AROUND A SINGLE POINT FOR THE ENTIRE COUNTRY. Inside each and every one of these box whisker plots are all the OTHER uncertainties related to instrumentation, correlation, human error, methods of aggregation, you name it. But we the scientists can’t be confusing folks with this distribution nonsense. You might not understand any better than Poor Debater does, and then where would the world be, if our sales minions were forced to explain how small the movements are within distributions as wide as these? The world would confuse our scientific answer with...us not knowing!?

    So now let us do one more thing in our testing of the original data. Let us take the mean answer as provided by the Berkeley folks, and allow our bootstrapped natural variability to give us a confidence interval of 90% (given the box whisker metrics that follow, it is just easier than calculating a 95%). We’ll begin the series right where the Berkeley folks became very certain (except for the uncertainties they are still working on...like these, perhaps?).

    Holy cow, Batman!

    [​IMG]


    So the black line is the mean, and all I did was go +/- on the provided central tendency, out to the F5 high and the F95 low of each individual monthly distribution. But I PINNED it to the given mean, meaning that if their given mean was already drifting upwards, the distributions drift upwards as well.

    Notice how the extreme confidence supplied with the data apparently has nothing to do with incorporating actual ranges of natural variability. Confidence intervals can be rigged for a variety of metrics: around a percentile value, around a mean, or to signify "confidence" in some statistical test of the data. But that isn't explained on the data page either. Cross your fingers and move on, nothing to see here...except that the instant you bring monthly variability into the equation, it absolutely swamps the signal that people want to make sure you see, rather than the range of natural variability it sits within.

    For those with an interest: the “confidence interval” supplied covers only a small percentage of the total natural variability within this data. Their 0.5C “confidence interval” would be realized only 5-15% of the time based on natural variability....AND THIS FROM NOTHING MORE THAN A SINGLE FUNDAMENTAL UNCERTAINTY.
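    For completeness, the band construction described above amounts to this (a sketch; the array names are mine, with f5_hi and f95_lo being the per-calendar-month percentile offsets from the bootstrapped distributions):

        import numpy as np

        def pinned_band(mean_series, month_idx, f5_hi, f95_lo):
            # Ride the provided mean: add each calendar month's F95 (low) and
            # F5 (high) offsets to the central value, so any drift in the given
            # mean drifts the whole band with it.
            lo = mean_series + f95_lo[month_idx]   # f95_lo offsets are negative
            hi = mean_series + f5_hi[month_idx]
            return lo, hi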

    Next up, an individual station in the US.
     
  25. contrails

    Evans, 2006: Measurements of the Radiative Surface Forcing of Climate
     
