
Sunday, August 03, 2025

Why Is Truth So Elusive? The Braun-Blanquet Scale

[I found this draft post from July 2012.  It appears never to have been posted.  But it's interesting to see how what I wrote 13 years ago is still relevant today.  Probably more so.  And posting it today - August 3 - is fitting, as you will see if you read it.]

Getting the facts right - whether it's in an old sexual abuse case or an attempt to see how ground vegetation has changed over a period of time - is the first step.  Once the facts are established, then models - whether scientific theories, religious beliefs, or the unarticulated models of how the world works we carry in our heads - are applied.

For example, did your son lie to his teacher about his homework?  If the answer is yes, then you must go through various models you have about topics such as lying, education, changing the behavior of young boys and apply them to this situation to get the desired result.  It's not as easy as you might initially think.  You may have a clear value that lying is never good.  Or you may think there are times it is ok.  Do you think his teacher is wonderful and working hard to teach your son to be a great human being with all the necessary skills?  Or is he part of a corrupt educational system that expects all students in the class to be at exactly the same level at all times and finds fault with your son because he's brighter than most and bored in class, or slower than most and having trouble keeping up? Or do you think he is picking on your son because he's a different race from the teacher?  And finally, will you talk this over with your son?  Restrict his internet access for a week?  Or whup him with a belt to help him learn this lesson?  Or maybe you'll go to the school and defend your son and attack the teacher. 

Things get much more complicated when we deal with the collective problems of a community.  If king salmon aren't returning to their rivers in the numbers expected, how should state fish and game authorities deal with this?  First, is their method of counting salmon working right?  Perhaps the salmon are getting through without being counted?  Then, do you restrict subsistence fishers?  Which models do you use to explain the shortage?  Is it climate change which is affecting the water temperatures?  Is it overfishing by commercial ocean fishing vessels?  Is it that these salmon are being caught as by-catch by bottom trawlers?  And when you think you know, what model do you use to decide whether subsistence fishers are allowed to catch any?

All this is introduction to Josias Braun-Blanquet, who devised the Braun-Blanquet scale in 1927.  The Botany Dictionary describes it this way:
A method of describing an area of vegetation . . . It is used to survey large areas very rapidly. Two scales are used. One consists of a plus sign and a series of numbers from 1 to 5 denoting both the numbers of species and the proportion of the area covered by that species, ranging from + (sparse and covering a small area) to 5 (covering more than 75% of the area). The second scale indicates how the species are grouped and ranges from Soc. 1 (growing singly) to Soc. 5 (growing in pure populations). The information is obtained by laying down adjacent quadrats of increasing size. One of a number of variations of Braun-Blanquet's method is the Domin scale, which is more accurate as there are more subdivisions of the original scale. The Braun-Blanquet scale also included a five-point scale to express the degree of presence of a plant. For example, 5 = constantly present in 80-100% of the areas; 1 = rare in 1-20% of the areas.
So, essentially, this is a measuring device to calculate the percentage of an area that is covered by different plant species.  Measuring is just the first step.  Once you have the measures, then you can apply your models. (OK, I know some of you will point out that you can't measure anything unless you have models that tell you what to measure.  True enough.  But once you have the measures - in this case of percentage of species of vegetation in a certain location - you have to interpret what that means using a model or several.)
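To make the scale concrete, here's a minimal sketch in Python of how an estimated percent cover could be mapped to a Braun-Blanquet cover class.  The class thresholds follow one commonly cited version of the scale; the exact boundaries vary a bit between sources, so treat the numbers as illustrative rather than definitive.

```python
# A minimal sketch: converting an estimated percent cover into a
# Braun-Blanquet cover-abundance class.  The thresholds below follow one
# commonly cited version of the scale (class boundaries vary somewhat
# between sources), so they are illustrative, not definitive.

def braun_blanquet_class(percent_cover: float) -> str:
    """Return the Braun-Blanquet cover class for an estimated % cover."""
    if percent_cover <= 0:
        return "absent"
    if percent_cover < 1:
        return "+"   # sparse, covering a very small area
    if percent_cover < 5:
        return "1"
    if percent_cover < 25:
        return "2"
    if percent_cover < 50:
        return "3"
    if percent_cover < 75:
        return "4"
    return "5"       # covering more than 75% of the plot

# Example: a field estimate of 30% cover for a species falls in class 3.
print(braun_blanquet_class(30))   # -> "3"
```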

But one problem is that the measurements might not be accurate or might not be used right. 

A 1978 study in Environmental Management found the Braun-Blanquet scale to be adequate and more efficient than another method of measuring species in an area.  Here's the abstract:
To document environmental impact predictions for land development, as required by United States government regulatory agencies, vegetation studies are conducted using a variety of methods. Density measurement (stem counts) is one method that is frequently used. However, density measurement of shrub and herbaceous vegetation is time-consuming and costly. As an alternative, the Braun-Blanquet cover-abundance scale was used to analyze vegetation in several ecological studies. Results from one of these studies show that the Braun-Blanquet method requires only one third to one fifth the field time required for the density method. Furthermore, cover-abundance ratings are better suited than density values to elucidate graphically species-environment relationships. For extensive surveys this method provides sufficiently accurate baseline data to allow environmental impact assessment as required by regulatory agencies.
So, fifty years after Braun-Blanquet's scale went public, it was still being used.  And apparently it is still in use today, with people writing about some of the limitations of the model.

In Monitoring Nature Conservation in Cultural Habitats: A Practical Guide and Case Studies (2007), by Clive Hurford and Michael Schneider, the Braun-Blanquet scale is compared to the Domin scale, and both are found to have two sources of error.  First is observer bias, which can affect the initial estimate of the percentage of species coverage that is then used to identify the appropriate cover class.  The second problem arises when the vegetation is at or near a vegetation boundary.  This is, apparently, more of a problem with the Domin scale. (p. 82)

And a February 2009 (online) article in the Journal of Vegetation Science warns that the Braun-Blanquet abundance-dominance scale cannot be used directly with conventional multivariate analysis techniques, because the Braun-Blanquet scores are ordinal numbers.
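A small example shows why that matters.  The ordinal codes can't just be averaged; a common workaround is to convert each code to the midpoint of its cover range first.  The midpoint values below are one set sometimes used for this back-transformation; they're illustrative, not the only published ones.

```python
# A minimal sketch of why ordinal Braun-Blanquet codes can mislead if you
# feed them straight into arithmetic.  The class midpoints below are
# illustrative values for back-transforming codes to percent cover.

# Midpoint of the percent-cover range each class represents.
MIDPOINT = {"+": 0.5, "1": 2.5, "2": 15.0, "3": 37.5, "4": 62.5, "5": 87.5}
CODE_AS_NUMBER = {"+": 0.5, "1": 1, "2": 2, "3": 3, "4": 4, "5": 5}

plots = ["1", "1", "5"]   # three plots: two sparse, one nearly full cover

naive_mean = sum(CODE_AS_NUMBER[c] for c in plots) / len(plots)
cover_mean = sum(MIDPOINT[c] for c in plots) / len(plots)

print(naive_mean)   # 2.33... -- looks like "class 2" territory (5-25% cover)
print(cover_mean)   # 30.8%   -- the actual average cover falls in class 3
```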

I bring this up for a couple of reasons.  First, today, August 3, is Braun-Blanquet's birthday.  He was born in Switzerland in 1884 and died in France at 96 in 1980.  Second, and probably of more general importance, has to do with science and truth.

We are at a time when science is under severe attack by a combined force of right wing politicians and fundamentalist religious groups.   They pounce on what they call scientific errors and publicize them to 'prove' science isn't trustworthy.  The emails about global warming data are a good example. 

Now, there are scientists who for various reasons (fame, money, revenge, you know the usual human failings that lead to compromises) do cheat.  But the beauty of science is that one's work must be made public and when others try to duplicate your work and can't, then your work becomes suspect. 

But the pursuit of truth is and will always be imperfect.  Data collection and interpretation will always be dependent on the ability to observe, measure, and interpret.  And the Braun-Blanquet scale shows, in a small way, that even a technique that's been around for over 70 years is not perfect.  But in science no one holds all the cards; no one proclaims truth for everyone else to accept. 

Scientific truth is always being tested and challenged.  That's its strength, but absolutists see it as a weakness. 

A DePaul University professor of environmental science and co-director of DePaul University's Institute for Nature and Culture has an interesting story about a project to rid the oak woodlands of Rhododendron ponticum, an invasive shrub that was encroaching on the understory of that habitat in Killarney National Park in Ireland.  It talks about the use of the Braun-Blanquet scale.  It's posted at his blog, Ten Things Wrong with Environmental Thinking.

Monday, February 24, 2025

Civil Service - Who Are These People ET Is Firing? - Part II

INTRO:  Part I is here.  

If you find this topic dry and hard to get your head around, then you are halfway there.  Because some of the most important things to know about government are dry and hard to get one's head around.  And that makes it easy for politicians to bamboozle voters with falsehoods and misinformation.  

So if you want to understand why ET's firing of civil servants (most government employees) is a violation of law and various regulations, you'll have to buck up and read carefully.  Even take notes.  

This content is based on testimony I gave in a local discrimination case.  So I had to pare it down to as simple an explanation as possible so that I didn't lose the jury.  The attorney was nervous that his expert would talk over their heads, but when I was done he was relieved that I'd made it very easy to understand.  And the jury said the local government was guilty.

So good luck.   [I explained ET in the Intro to Part I, but it's not critical.]



From a February 19, 2018 post: 

Graham v MOA #9: Exams 2 - Can You Explain These Terms: Merit Principles, Validity, And Reliability?

The Municipality of Anchorage (MOA) Charter [the city's constitution] at Section 5.06(c) requires the Anchorage Assembly to adopt
“Personnel policy and rules preserving the merit principle of employment.”   AMC 3.30.041 and 3.30.044 explain examination types, content, and procedures consistent with these merit principles.  
As defined in the Anchorage Municipal Code Personnel Policies and Rules,
“Examination means objective evaluation of skills, experience, education and other characteristics demonstrating the ability of a person to perform the duties required of a class or position.” (AMC 3.30.005)
[OK, before I lose most of my readers, let me just say, this is important stuff to know in order to understand why the next posts will look so closely at the engineer test that Jeff Graham did not pass.  But it's also important for understanding one of the fundamental principles underlying government in the United States (and other nations).  And I'd add that the concepts behind merit principles are applied in most large private organizations to some extent, though they may have different names. 

Jeff Graham's attorney made me boil this down to the most basic points to increase the likelihood I wouldn't put the jury to sleep.  So bear with me and keep reading. 

And, you can see an annotated index of all the posts at the Graham v MOA tab above or just link here.]  


Basic Parts of Government In The United States

Governments can be broken down into several parts.
  • The elected legislators who pass the laws and set the broad policy directions (the legislature).
  • The elected executive who carries out the laws - the president, the governor at the state level, and the mayor at the city level.
  • The administration, which is led by the elected executive.
  • The civil service - the career government workers who actually carry out the policies.  There are also appointed officials at the highest levels who are exempt from some or all of the civil service rules.

Merit principles are the guidelines for how the career civil servants are governed.  

So What Are Merit Principles?

Probably the most basic, as related to this case, are:
  • Employees are chosen solely on the basis of the skills, knowledge, and abilities (SKAs) that are directly related to performance of the job. 
  • The purpose of this is to make government as effective and efficient as possible by hiring people based on their job-related qualities and nothing else.  
  • That also means other factors - political affiliation, race, color, nationality, marital status, age, and disability - should not be considered in hiring or promotion.  It also means that arbitrary actions and personal favoritism should not be involved.
  • Selection and promotion criteria should be as objective as possible.   


So Steve, what you're saying sounds obvious.  What else could there be?

Before the merit system was the Spoils System: before merit principles were imposed on government organizations, jobs (the spoils) were given to the victors - winning politicians and their supporters.  The intent of the Merit System is to hire the most qualified candidates instead.

In 1881, President Garfield was assassinated by a disgruntled job seeker, which spurred Congress to set up the first version of the federal civil service system - the Pendleton Act of 1883.

Only a small number of federal positions were covered by this new civil service act, but over the years more and more positions were covered, and the procedures improved as the technology of testing improved.  The merit system, like any system, can be abused, but it's far better than the spoils system.  Objective testing is a big part of applying merit principles.


What does 'objective criteria' mean? 

Objectivity has a couple of common and overlapping meanings:
  • Grounded on facts.  Grounding your understanding or belief on something concrete, tangible.  Something measurable that different people could 'see' and agree on.
  • Unbiased.  A second meaning, implied by the first, is that you make decisions neutrally, as free as you can be from bias and preconceived ideas.  That's not easy for most people to do, but there are ways to do it better. 


How Can We Make Tests More Objective And Free Of Bias?

I think of objectivity as being on one end of a continuum and subjectivity being on the other end.  No decision is completely objective or subjective, nor should it be.  But generally, the more towards the objective side, the harder it is to introduce personal biases.* 

objective ...............................................................................................subjective



First Let's Define "Test"

In selection and promotion, we have tests.  A test is anything used to weed out candidates or to rank candidates from poor to good.  So even an application form can be a test if it would lead to someone being cut from the candidate pool.  Say candidates are required to have a college degree and someone doesn't list one on an application.  They would already be eliminated.  
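Here's a minimal sketch of that point - any required qualification on an application form already acts as a pass/fail screen.  The names and the degree requirement are hypothetical.

```python
# A minimal sketch of how even an application form is a "test": any
# required qualification acts as a screen that removes candidates from
# the pool.  The candidates and the requirement are hypothetical.

candidates = [
    {"name": "A", "college_degree": True},
    {"name": "B", "college_degree": False},   # screened out at the application stage
    {"name": "C", "college_degree": True},
]

# Requiring a degree on the application already functions as a pass/fail test.
remaining_pool = [c for c in candidates if c["college_degree"]]
print([c["name"] for c in remaining_pool])   # -> ['A', 'C']
```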

Again,  how do you make tests more objective?

There are two key terms we need to know:  validity and reliability.

What’s Validity?

Validity means that if a person scores higher on a test, we can expect that person to perform better on the specific job.  
Put another way, the test has to truly test for what is necessary for the job.  So, if candidates without a college degree can do the job as well as candidates with a degree, then using a college degree to screen out candidates is NOT valid.  
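Here's a minimal sketch of what checking validity can look like: see whether higher exam scores actually go with better job performance later.  The scores and ratings are made-up numbers purely for illustration.

```python
# A minimal sketch of checking validity: do higher exam scores go with
# better later job performance?  The scores and supervisor ratings below
# are invented purely for illustration.

exam_scores = [62, 71, 75, 80, 88, 93]
job_ratings = [2.9, 3.1, 3.4, 3.3, 4.0, 4.4]   # performance ratings a year later

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A strong positive correlation is evidence the exam is a valid predictor;
# a correlation near zero would mean the exam is screening on something
# unrelated to the job.
print(round(pearson(exam_scores, job_ratings), 2))
```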

And what is reliability?

Reliability means that if  a person takes the same test at different times or different places, or with different graders, the person should get a very similar result.  Each test situation needs to have the same conditions, whether you take the test on Monday or on Wednesday, in LA or Anchorage, with Mr. X or Miss Y administering and/or grading the test.  
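And here's a minimal sketch of one kind of reliability check: do two graders, scoring the same answers, come out in roughly the same place?  The names, scores, and tolerance are hypothetical.

```python
# A minimal sketch of one reliability check: do two graders, scoring the
# same oral-exam answers, produce very similar results?  Candidate names,
# scores, and the tolerance are hypothetical.

grader_x = {"Jones": 84, "Lee": 71, "Smith": 90, "Rivera": 65}
grader_y = {"Jones": 82, "Lee": 73, "Smith": 88, "Rivera": 79}

# Flag candidates whose two scores differ by more than a tolerance --
# large gaps suggest the grading (and so the test) is not reliable.
TOLERANCE = 5
for name in grader_x:
    gap = abs(grader_x[name] - grader_y[name])
    if gap > TOLERANCE:
        print(f"{name}: graders disagree by {gap} points")
# -> Rivera: graders disagree by 14 points
```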

How Validity and Reliability Relate To Each Other

To be valid, the selection or promotion test must be a good predictor of success on the job.  People who score high on the exam should perform the job better than those who score low, and people who score low should perform worse on the job than people who score high. 

BUT, even if the test is intrinsically valid, the way it is administered could invalidate it.  If the test is not also reliable (testing and grading is consistent enough that different test takers will get a very similar score regardless of when or where they take the test and regardless of who scores the test) the test will no longer be valid.  This is because the scores will no longer be good predictors of who will do well on the job. 

How do you go about testing for validity and reliability?
This can get complicated, especially for  factors that are not easy to measure.  I didn't go into this during the trial.  I wanted to point out some pages in a national Fire Safety Instructor Training Manual used by the Municipality of Anchorage, but I was not allowed to mention it.  It talks about different levels of validity and how to test for them.  It also says that for 'high stakes' tests, like promotion tests, experts should be hired to validate the test.  The jury didn't get to hear about this. But it's relevant because as I wrote in an earlier post, the people in charge of testing, and specifically in charge of the engineer exam, only had Level I certification, which allows them to administer training and testing designed by someone with Level II certification.  It's at Level II that validity and reliability are covered.  

There really wasn't a need to go into that much detail at the trial, because the oral exam was so egregiously invalid and unreliable that you could just look at it and see the problems.  And we'll do that in the next posts. 

That should be enough, but for people who want to know more about this, I'll give a bit more below.

-----------------------------------------------------------------------
Extra Credit

*"the harder it is to introduce bias"  There are always ways that bias can be introduced, from unconscious bias to intentionally thwarting the system.   When civil service was introduced in the United States, there was 'common understanding' that women were not qualified for most jobs.  That was a form of bias.  Blacks were also assumed to be unqualified for most jobs.  Over the years many of these sorts of cultural barriers have been taken down.  But people have found other ways to surreptitiously obstruct barriers.  

Merit Principles

If you want to know more about merit principles I'd refer you to the Merit Systems Protection Board, which was set up by the Civil Service Reform Act of 1978.  

A little more about reliability problems (because these are important to understand about the engineer promotion exam)

In the main part of this post I wrote that all the important (could affect the score) conditions of the test need to be the same no matter where or when or with whom a candidate takes the test.  Here are some more details:
  • Location - If one location is less comfortable - temperature, noise, furniture, lighting, whatever - it could skew the scores of test takers there.
  • Time -  could be a problem in different ways.  
    • All candidates must have the same amount of time to take the test.  
  • Instructions - all instructions have to be identical
  • Security of the test questions - if some applicants know the questions in advance and others do not, the test is not reliable.

The scoring, too, has to be consistent from grader to grader for each applicant. 

And there are numerous ways that scoring a test can go wrong.
  • Grader bias  - conscious and unconscious.   Raters who know the candidates may rate them differently than people who don’t know them at all. 
    • The Halo effect means if you have a positive view of the candidate, you’re likely to give him or her more slack.  You think, 'I know they know this.' 
    • The Horn or Devil Effect is the opposite - if you already have a negative opinion about a candidate, you consciously or unconsciously give that candidate less credit.  These are well documented biases.
    • Testing order bias affects graders and candidates.  
      • After three poor candidates, a mediocre candidate may look good to graders.  
  • Grading Standards - Is the grading scale clear and of a kind that the graders are familiar with?
    • Are the expected answers and how to score them clear to the graders?
    • Do the graders have enough time to calculate the scores consistently?
  • Grader Training -
    • If graders aren't well trained, it can take them a while to figure out how to apply the scoring criteria, so they may score differently at the end than they did at the beginning. 

How Do You Overcome the Biases In More Subjective Tests Like Essays, Interviews, and Oral Exams?

Despite the popularity of job interviews, experts agree that they are among the most biased tests and result in the least accurate predictions of candidate job performance.  Or see this link.

You have to construct standardized, objective rubrics and grading scales - this is critical, particularly for essay and oral exams.

On November 9, 2016, when the election results were tallied, everyone saw the same facts, the same results.  But half the country thought the numbers were good and half thought they were bad.

When evaluating the facts about a job or promotion candidate, the organization has to agree, beforehand, on what 'good' facts look like and what 'bad' facts look like.  Good ones are valid ones - they are accurate predictors of who is more likely to be successful in the position.  Good and bad are determined by the test maker, not by the graders.  The graders merely check whether the performance matches the pre-determined standard of a good performance.



What’s a rubric?

It’s where you describe in as much detail as possible what a good answer looks like.  If you’re looking at content, you identify the key ideas in the answer, and possibly how many points a candidate should get if they mention each of those ideas.  It has to be as objective as possible. The Fire Safety Instructor Training Manual has some examples, but even those aren't as strong as they could be. 

Good rubrics take a lot of thought - but it's thought that helps you clarify and communicate what a good answer means so that different graders give the same answer the same score.
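Here's a minimal, hypothetical sketch of what a rubric for one oral-exam question might look like if you wrote it as a small program: the test maker fixes the key ideas and point values in advance, and graders only check off which ideas the candidate covered.  The question and the key ideas are invented for illustration, not taken from the actual engineer exam.

```python
# A minimal, hypothetical rubric for one oral-exam question: the test
# maker lists the key ideas a good answer contains and how many points
# each is worth, so different graders score the same answer the same way.
# The question and key ideas are invented for illustration.

RUBRIC = {
    "question": "Describe the steps for relieving the pump operator mid-shift.",
    "key_ideas": {
        "confirms current pump pressure and settings": 3,
        "checks water supply / intake status": 3,
        "reviews active hose lines and crews assigned": 2,
        "announces the transfer of responsibility": 2,
    },
}

def score_answer(ideas_mentioned: set[str]) -> int:
    """Total the points for each rubric idea the candidate actually covered."""
    return sum(points
               for idea, points in RUBRIC["key_ideas"].items()
               if idea in ideas_mentioned)

# A grader only checks off which pre-defined ideas were mentioned;
# the point values were fixed by the test maker in advance.
print(score_answer({"confirms current pump pressure and settings",
                    "announces the transfer of responsibility"}))   # -> 5
```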

Here are some real-world examples: 
UC Berkeley Graduate Student Instructors Training
Society For Human Resource Management - This example doesn't explicitly tell graders what the scores (1, 2, 3, 4, 5) look like, as the previous one does.
BARS - Behaviorally Anchored Rating Scales - This is an article on using BARS to grade Structured Interviews.  Look particularly at Appendices A & B. 
How Olympic Ice Skating is Scored - I couldn't find an actual scoring sheet, but this gives an overall explanation of the process.

My experience is that good rubrics force graders to ground their scores on something concrete, but they can also miss interesting and unexpected things.  It's useful for graders to score each candidate independently and then discuss why they gave the scores they did - particularly when a grader's scores diverge from most of the others.  Individual graders may know more about the topic, which gives their scores more weight, or they may not have paid close attention.  Ultimately, it comes down to an individual making a judgment.  Otherwise we could just let machines grade.  But the more precise the scoring rubric, the easier it is to detect bias in the graders. 
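Here's a minimal sketch of the kind of check a precise rubric makes possible: compare each grader's score to the panel average and flag big deviations for discussion.  All the numbers are made up.

```python
# A minimal sketch of the kind of check a precise rubric makes possible:
# compare each grader's score for the same candidate against the panel
# average and flag large deviations for discussion.  All numbers are made up.

panel_scores = {   # grader -> score for one candidate's oral exam
    "Grader A": 78,
    "Grader B": 81,
    "Grader C": 62,   # far below the others -- worth discussing why
}

mean = sum(panel_scores.values()) / len(panel_scores)
FLAG_THRESHOLD = 10   # points away from the panel mean

for grader, score in panel_scores.items():
    if abs(score - mean) > FLAG_THRESHOLD:
        print(f"{grader} is {abs(score - mean):.1f} points from the panel mean of {mean:.1f}")
```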


Accountability

Q:  What if a candidate thinks she got the answer right on a question, but it was scored wrong?

Everything in the test has to be documented.  Candidates should be able to see what questions they missed and how they were scored.  If the test key had an error, they should be able to challenge it. 

Q:  Are you saying everything needs to be documented?

If there is going to be any accountability each candidate’s test and each grader’s score sheets must be maintained so that if there are questions about whether a test was graded correctly and consistently from candidate to candidate, it can be checked. 

In the case of an oral exam or interview, at least an audio (if not video) record should be kept so that reviewers can see what was actually said at the time by the candidate and the graders. 
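Here's a minimal, hypothetical sketch of the kind of record that makes an exam auditable: each grader's score for each question is kept, along with a pointer to the recording, so a disputed score can be re-checked later.  The field names and values are my own invention, not the Municipality's.

```python
# A minimal, hypothetical sketch of an auditable exam record: each grader's
# score for each question is kept, along with a pointer to the recording,
# so a disputed score can be re-checked against the rubric later.

from dataclasses import dataclass, field

@dataclass
class QuestionScore:
    question_id: str
    grader: str
    points_awarded: int
    points_possible: int

@dataclass
class ExamRecord:
    candidate: str
    exam_date: str
    recording_file: str                          # audio/video of an oral exam
    scores: list[QuestionScore] = field(default_factory=list)

record = ExamRecord("Candidate 1", "2018-01-15", "oral_exam_candidate1.mp3")
record.scores.append(QuestionScore("Q1", "Grader A", 4, 10))

# If the candidate challenges Q1, reviewers can pull this record and the
# recording and check how the answer was scored.
print(record.scores[0])
```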

Q:  Have you strayed a bit from the Merit Principles?

Not at all.  This all goes back to the key merit principle - selecting and promoting the most qualified candidates for the job.  There won't be 100% accuracy.  But in general, if the test is valid, a high score will correlate with high job performance.  And unless the test is also reliable, it won't be valid.  The more reliable the test, the more consistent the scores will be under different conditions and graders.  The best way to make tests more reliable is to make them as objective as possible.