Understanding the merit system and how to measure people's skills for a job is key to understanding why this case matters. So I beg your indulgence here. I've tried to make this pretty easy to digest.
Briefly, before the Merit System (and principles) governments were run on the spoils system - you got a government job if you helped get a candidate elected. Loyalty, not skill and public spirit, was the key job qualification. Merit principles don't guarantee fair hiring and promotion, but they go a long way in that direction.
Below is excerpted from the expert witness report I wrote up in October 2016. The judge did not allow the plaintiff to use the MOA (Municipality of Anchorage) charter because he said the promotion process is part of the collective bargaining agreement. Collective bargaining agreements are approved (on the MOA side) by the Assembly. But the Charter can only be amended by a vote of the people of Anchorage. So I still don't understand how the contract would trump the Charter. But I'm not a lawyer.
Here's from my report:
Background and Purpose of Merit Principles and Systems
Race and age bias, in the context of promotion of a public employee, is an important issue. But the bigger issue is merit system principles. Modern human resources departments in reasonably sized organizations both public and private use what is known as a merit system. The system stems from the 19th Century when governmental structures were evolving from feudal systems based on loyalty to the ruler, to more modern ones, based on rationality. Scholar Max Weber noted that a new form of organization was emerging which he called a bureaucracy, that was based on rational rules rather than the arbitrary decisions of a ruler.
The point was that organizations that hired people based on their ability to do specific jobs and not on their relationship and loyalty to the ruler were more effective, more efficient, and more permanent.
These ideas of selecting the best person for the job were also promoted in the factory in the early 20th Century by Frederick Taylor, whose idea of Scientific Management became widely adopted. Over time, the ideas underlying Weber and Taylor - the idea that rational, scientific analysis can be applied to management - took hold in private companies. Applicants would be evaluated by their qualifications to hold their jobs, though personal connections and other biases still were a factor.
In government the change took place both on the federal level and state and local levels. In 1883, the Pendleton Act established the US Civil Service after President Garfield was shot by a disgruntled job seeker. It only applied to a small percentage of jobs at first, but over the years, it has come to cover most federal positions in the career civil service.
On the local level, reformists pushed for merit systems as a way to combat the big city political machines like Tammany Hall that recruited immigrants into their political party with promises of government jobs, as they arrived in the US from Europe.
So, both in government and in the private sector these ideas of rational rules to develop a competent workforce took hold. But the biases of the times often got written into the rules. Job tests were written so that new immigrants wouldn’t pass. Women were assumed ineligible for most jobs and fired from those they could take - like teacher - if they got married. The societal structure which kept people of color in segregated housing, deficient schools, in poverty, prevented most people of color from getting the needed qualifications, or even from knowing about job openings. And overt racism prevented those who could qualify from being hired in most cases.
The civil rights movement changed that. Brown v. Board of Education struck down segregated schools. This was supposed to lead to African-Americans (particularly) getting better high school educations, then into universities, and then into good jobs. But many communities opposed busing and set up all-white private schools, leaving the public schools for African-Americans and the poor.
The Voting Rights Act was intended to prevent laws that kept Blacks from voting. Griggs v. Duke Power was a groundbreaking case in terms of job discrimination. Black workers traditionally got the lowest level jobs and got paid less than white workers. When Duke Power was required to put in tests for employees seeking supervisory positions, it created tests that were unrelated to the position and intended to keep Blacks from passing. The Supreme Court struck this down saying that the tests for the jobs had to be related to the work that would be done. The Court also said that the plaintiffs didn’t have to prove intentional discrimination, only that the test had a disparate impact on the minority candidates.
The merit system was an outgrowth of science being applied to management to ensure more qualified employees got hired. Businesses developed measures that focused on someone’s ability to successfully do the job. The civil rights movement fit perfectly into this theoretical ideal. Job requirements should focus on qualifications, not race or gender. Griggs v. Duke Power drew back the curtain on the hidden biases that were blocking access to better employment for women and minorities.
Today we’ve come a long way, but we are still a society that sees minority actors in movie roles as criminals or maids or chauffeurs much more than as doctors or lawyers or accountants. Many people still cringe at the idea of their daughter marrying someone of a different race or religion. When those feelings spill over into the workplace, into hiring, it’s illegal discrimination.
Unconscious racial bias perpetuates discrimination through assumptions about people based on their race or other characteristics. Conscious bias attempts to set up barriers that seem legitimate, but are actually intended to keep out undesired applicants.
The merit system is one of the best ways to thwart discrimination so that the most qualified candidates, not the most ‘like us’ candidates, get hired. It’s the best antidote we have to cronyism, racism, and other forms of discrimination in hiring and promoting employees.
Merit Principles and Systems at Municipality of Anchorage
The MOA Charter at Section 5.06(c) mandates the Anchorage Assembly to adopt “Personnel policy and rules preserving the merit principle of employment.” AMC 3.30.041 and 3.30.044 explain examination types, content, and procedures consistent with these merit principles.
As defined in the Anchorage Municipal Code Personnel Policies and Rules, “Examination means objective evaluation of skills, experience, education and other characteristics demonstrating the ability of a person to perform the duties required of a class or position.” (AMC 3.30.005)
According to the Firefighters collective bargaining agreement, the conduct and administration of the Anchorage Fire Department, including selection and promotion of employees, are retained by the Municipality. (IAFF-MOA CBA Section 3.1)
Application of Merit Principles to Making And Evaluating Objective Examinations
In practice, the term merit principles means using procedures that ensure that decisions are made rationally to select and promote those people who are most suited for a job. They mean that organizations do their best to identify the factors that best predict which applicant is most likely to succeed in the position. Factors that are irrelevant to someone’s success on the job should not be part of the process.
A test (or examination as used by the MOA) is any process used to evaluate an applicant’s suitability for a position. An application form can be thought of as a test to the extent that information is used to distinguish between applicants who qualify and those who do not. A written exam, a practical exam, and an interview are all tests when it comes to activities like selection and promotion.
Two basic factors are important when evaluating tests used in personnel decisions. First, is the test valid? Second, is the test reliable?
Validity means that the test, in fact, tests what it is supposed to test. In employment that generally means it is useful in separating those applicants most likely to do well in the position from those less likely to do well. For example, if a college degree is required for a position, but those without college degrees do as well as those with a degree, then that is not a valid factor to consider, because it doesn’t predict success on the job. It is common to give applicants a written or practical test or an interview. These are scored and applicants with higher scores are selected over people with lower scores.
Such tests are valid only if it is true that people with higher scores are more likely to be successful in the position than those with lower scores. That is, people with higher scores are more likely to do well AND people with lower scores are more likely to do poorly. If that is not the case, the test is not valid.
Employment tests can be validated by checking scores against actual performance of employees, though this does require selecting employees with low scores as well as with high scores to determine if the lower scoring employees really do perform poorly compared to the higher scoring employees. This can be expensive and many organizations use ‘common sense.’ But common sense may not be accurate and if an employer is accused of discrimination, they will have to defend the validity of the test.
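As a rough illustration of checking scores against actual performance, here is a minimal sketch in Python. It is not taken from the report or from any MOA procedure. It assumes the employer has an exam score and a later job performance rating for each employee, and it uses the Pearson correlation as one common way to summarize how well scores track performance. The numbers and the names exam_scores and performance are hypothetical. Note that the sample includes low scorers as well as high scorers, which is the point made above: without them there is no way to see whether low scores really go with weaker performance.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: one entry per employee, in the same order in both lists.
exam_scores = [55, 62, 70, 78, 85, 91]        # score on the hiring exam
performance = [2.1, 2.8, 3.0, 3.9, 4.2, 4.6]  # later supervisor rating, 1-5

validity_coefficient = pearson_r(exam_scores, performance)
print(f"correlation between exam score and performance: {validity_coefficient:.2f}")
```

A value near 1 would suggest that higher scorers tend to perform better. A value near 0 would suggest the exam is not predicting success on the job. A real validation study would need far more than six employees and a defensible measure of performance.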
For rare, specialized positions, validation is difficult to do. For common positions that are similar across the nation, such as fire fighters, there are often companies that prepare, validate, and sell, and even administer employment tests.
Reliability means that the way a test is administered is consistent. The same applicant, taking the test at different times or locations or with different testers, would have basically the same result every time. When people take the college entrance exams, for instance, the conditions are standardized. No matter where someone takes the test, they get exactly the same instructions, the physical conditions of the test room are within certain parameters (desk size, temperature, noise level, etc.) and they all have exactly the same amount of time to complete the exam. The scoring of the exams is also the same for everyone.
To ensure reliability of the test taking, all conditions that could affect the outcome must be the same. To ensure reliability of scoring, the way points are calculated must be as objective and measurable as possible. Often tests are designed with scales that help a rater know how to give points or how to put applicants in the correct category.
At the most basic level you might just have a scale of 1 - 5 for instance, with ‘good’ at one end and ‘poor’ at the other end. But how does the rater determine what’s good or bad?
Better would be to have a more objective descriptor such as “successfully completed task with no errors” on one end and “failed to complete the task” at the other end. Even better would be to have descriptors for each point on the scale. The more that the descriptor describes an actual objectively testable level of achievement, the more likely it is that different raters would come up with the same score. For example, ‘meets expectations’ is not as objective as “accomplished the task within 2 minutes with no errors that compromised the outcome.”
Basically, the greater the objectivity of the scoring system, the greater the likelihood of reliability, because there is a clear standard attached to each number in the scale. And with a more objective system, discrepancies can be more easily spotted. A biased evaluator has a harder job to select favored applicants or disqualify disfavored candidates. Also, a candidate who was graded unfairly has a better chance of challenging the score.
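One way to see whether a scoring system is producing consistent results is to have two raters score the same candidates independently and compare. The sketch below is an illustration only, not something from the report or the AFD exams. It computes the share of candidates on which two raters gave the same score, plus Cohen's kappa, a standard statistic (my choice here, not one named above) that adjusts that share for the agreement expected by chance. All scores are hypothetical.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of candidates to whom both raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same score at random,
    # given how often each rater used each score.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical score")
    return (observed - expected) / (1 - expected)

# Hypothetical scores (1-5 scale) for the same eight candidates.
# First pair: raters working from vague labels such as 'good' and 'poor'.
vague_a = [3, 4, 2, 5, 3, 4, 2, 3]
vague_b = [4, 3, 3, 4, 2, 5, 2, 4]
# Second pair: raters working from a descriptor for each point on the scale.
anchored_a = [3, 4, 2, 5, 3, 4, 2, 3]
anchored_b = [3, 4, 2, 5, 3, 4, 2, 4]

print("vague scale agreement:   ", percent_agreement(vague_a, vague_b))
print("anchored scale agreement:", percent_agreement(anchored_a, anchored_b))
print("anchored scale kappa:    ", round(cohens_kappa(anchored_a, anchored_b), 2))
```

In this made-up data the raters agree on one candidate in eight with the vague scale and seven in eight with the anchored scale. Low agreement does not show which rater is wrong, only that the scoring system is not giving a consistent answer.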
Another way to increase reliability is to train evaluators on how to use the scoring system. It is also helpful to have raters who do not have personal relationships with the applicants.
Given the need for validity and reliability, interviews, while frequently used, have been found to be prone to many biases unrelated to the job. There are ways to improve the validity and reliability of interviews. The questions asked must be clearly tied to ability to be successful in the position, recognizing that being able to perform a task is not the same as being able to describe how one would perform a task. If personality and speaking ability are not being tested, then interviews can become treacherous employment tests for the applicant and for the employer. The more subjective a test and the rating system, the easier it is to bias the outcome, whether unintentionally or intentionally.
Since proving intent to discriminate requires overhearing private conversations or emails, this is an impossible hurdle for most applicants. The courts have recognized this and have allowed ‘impact’ to be used in lieu of intent. But employment tests can often give us evidence of intent if they are subjective and there is little or no validity or reliability.
Conclusion
I have seen no materials that offer any information on the validity or reliability of the tests used in the engineer promotional examinations which Jeff Graham has taken. The exam score sheets I have seen lack rigorous descriptors for raters (or proctors) to calculate scores for applicants and appear extremely subjective. The materials I’ve seen that were used to train the raters were lacking in detail and substance.
Without evidence to show the exams are valid and reliable, one must assume that the exams do not comply with the Municipality’s mandate to follow merit principles. [Such proof of validation had been requested from but not provided by the MOA.] The point of merit systems is to identify the most qualified candidates for each position and to prevent the introduction of personal biases into their scoring of candidates. The tests themselves may or may not be discriminatory. But when they are subjective as the oral board/peer reviews are, biases of the raters are easily introduced into the scoring of candidates. The type of bias could be racial, sexual, age based, or personal depending on the rater.
It is my understanding that MOA has not produced all requested materials and that depositions still remain to be done in this case. I therefore reserve the right, should additional materials and information become available, to modify or supplement this report.
Because merit principles were ruled out as the measure the jury would use to evaluate the case, this report was not introduced in court or given to the jury. I was allowed to testify on merit principles in general, but not to relate them to the facts of the case, or even to the MOA.
I was also not allowed to refer to the Fire Safety Instructor Training Manual that the MOA uses which talks about validity in some detail and also talks about 'high stakes' tests - like a promotion test - needing to be professionally prepared and validated.
I was allowed to talk about, again in general terms and not relating what I said to the AFD exams, subjectivity and objectivity. I acknowledged there is no such thing as 100% objective or subjective, but that there is a continuum from some theoretical total subjectivity to theoretical total objectivity. The goal of test makers is to have tests as far to the objective side of the continuum as possible. The more subjective a test, the easier it is to introduce bias, conscious or unconscious.