Exam Creation

After all of the items have been collected and reviewed, the next stage of development is to actually assemble the items into exams and deploy the exams globally.

Live Form Creation

First, we select items covering each objective and place these on complete test forms. There will be two forms (versions) of each test. Each form will consist of approximately 60-100 items, chosen from all of the test objectives. When a candidate takes the exam, they will receive one of the forms. If they fail that exam and go back to retake it, they will receive the second form. (If they fail again, they will get the first form on their third attempt.)
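The alternation between forms on retakes can be sketched in a few lines. This is only an illustration, not the test engine's actual logic: the function name form_for_attempt and its parameters are hypothetical, and the behavior beyond the third attempt (continuing to alternate) is an assumption.

```python
def form_for_attempt(first_form: int, attempt: int, num_forms: int = 2) -> int:
    """Return the 1-based form number a candidate sees on a given attempt.

    first_form: the form received on the first attempt (1 or 2).
    attempt:    1 for the first sitting, 2 for the first retake, and so on.
    Assumption: forms keep alternating after the third attempt.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if not 1 <= first_form <= num_forms:
        raise ValueError("first_form must be a valid form number")
    # Step through the forms in a cycle, starting from the first form received.
    return (first_form - 1 + attempt - 1) % num_forms + 1


# A candidate who starts on form 1 sees form 1, then form 2, then form 1 again.
sequence = [form_for_attempt(1, attempt) for attempt in (1, 2, 3)]
```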

Note that the VUE test engine randomly orders the questions of each form when someone takes the exam. If two candidates sat down next to each other at a VUE test center and both wound up with form 1, the order of the questions would still be random and so they would not view the questions in the same sequence.

Initial Exam Publishing

Once the composition of the forms has been determined by LPI psychometric staff, the exam must be converted from text-based items into the actual exam file format that can be deployed globally through LPI's network of testing centers.

The exam now enters a period of initial testing where the end goal is to determine if the questions are in fact performing correctly and measuring the skills and competencies they are intended to measure. Within the testing industry, this period is often referred to as the initial, pilot, or research stage of testing. However, within the IT certification industry, this period has become known as the beta testing period.

During this time, candidates are able to register for these tests and complete them at local testing centers, and they do receive credit for taking the exam. The major difference between the original beta exam and the final version is that candidates do not receive their scores immediately after the exam. For the second release of Level 1 (August 2002), a new form of beta exam was used: a seeded beta. In this type of beta, a candidate is scored only on a limited number of items that have already completed a prior beta evaluation. New beta items are seeded into the scored exam forms to collect data on their performance. (Note: in this latter instance, exams may have more than the 60-100 questions noted above. However, additional time is usually allotted for the candidate to complete these additional questions, and the results on these questions do not in any way affect the candidate's test score.)
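The idea behind a seeded beta can be shown with a small sketch: every response is recorded, but only the previously evaluated items count toward the score. The item identifiers, data layout, and function name below are hypothetical and chosen only for illustration.

```python
def score_seeded_exam(responses, answer_key, scored_items):
    """Count correct answers on scored items only.

    responses:    dict mapping item id -> answer the candidate chose
    answer_key:   dict mapping item id -> correct answer
    scored_items: set of item ids that passed a prior beta evaluation
    Seeded (unscored) beta items are ignored here; their responses would
    only be kept for later analysis.
    """
    correct = sum(
        1 for item in scored_items if responses.get(item) == answer_key[item]
    )
    return correct, len(scored_items)


# Hypothetical form: three scored items plus one seeded beta item.
answer_key = {"q1": "a", "q2": "c", "q3": "b", "beta1": "d"}
scored_items = {"q1", "q2", "q3"}
responses = {"q1": "a", "q2": "b", "q3": "b", "beta1": "a"}

result = score_seeded_exam(responses, answer_key, scored_items)
```

In this example the candidate misses q2 and the seeded item beta1, but only q2 lowers the score.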

But before any exam containing new items can be scored, the cut-score (the minimum passing score) needs to be set. This is a complex process in its own right, and several activities take place simultaneously during this period.

Obtaining enough exams

The first requirement for setting the cut-score is to obtain an adequate number of exam results. As Linux certification programs were very new, the target for the original Level 1 was to have at least 100 result sets for each form of an exam. So, given that we use 2 forms per exam, we needed at least 200 exams to be taken for each of 101 and 102. We publicized an incentive program, offered discounts, and also used the test center at Linux Business Expo (Spring 2000) to obtain the necessary exams. As our support has grown, our target data numbers have become considerably greater, which helps to generate the most accurate results.

As part of the beta exam process, we also collected demographic data about the people taking the exams. (How long had they worked with Linux? Did they do system administration on a daily basis? How much had they prepared?) Ideally, we are looking to have a significant number of the exams taken by people who are similar to the target job description. These demographics are taken into account by psychometric staff when they are reviewing the questions.

Reviewing the Questions

As the results are coming back, psychometric staff start to examine the data. Are there questions that everyone gets correct? Are there questions that everyone fails? (Both situations are indicators that something might be wrong with the question.) What are the comments being submitted by exam-takers?
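A first pass over the incoming data can be sketched as follows: compute the proportion of candidates answering each item correctly, then flag items that nearly everyone gets right or nearly everyone gets wrong. The thresholds (0.05 and 0.95), the data layout, and the function names are assumptions for illustration; the source does not state which criteria the psychometric staff actually use.

```python
def proportion_correct(results):
    """Return item id -> fraction of candidates who answered it correctly.

    results: list of dicts, one per candidate, mapping item id -> bool
             (True if the candidate answered that item correctly).
    """
    correct = {}
    seen = {}
    for result in results:
        for item, is_correct in result.items():
            correct[item] = correct.get(item, 0) + (1 if is_correct else 0)
            seen[item] = seen.get(item, 0) + 1
    return {item: correct[item] / seen[item] for item in seen}


def flag_extreme_items(p_values, low=0.05, high=0.95):
    """Return items that almost everyone fails or almost everyone passes."""
    return sorted(item for item, p in p_values.items() if p <= low or p >= high)


# Hypothetical results from two candidates on three items.
results = [
    {"q1": True, "q2": False, "q3": True},
    {"q1": True, "q2": False, "q3": False},
]
p_values = proportion_correct(results)
flagged = flag_extreme_items(p_values)
```

Here q1 (everyone correct) and q2 (everyone incorrect) are flagged for review, while q3 is not.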

We put in a mechanism that let people taking the exam provide comments, and, as you might expect from a community of people with strong opinions (and often the ability to type fast!), we got plenty of comments! Volumes of comments, in fact!

So part of the work at this stage was sifting through all the comments and addressing questions and concerns raised. Despite the lengthy and comprehensive review process, there were some technical errors that did escape the review and were out on the beta exam. There were a few questions that needed to be thrown out. Many of these errors were found through comments submitted by exam candidates.

Modified-Angoff Study

While the psychometric staff was reviewing the incoming data, a separate pool of subject-matter experts (SMEs) was simultaneously participating in what is called a Modified-Angoff study. The goal here is to provide the psychometric staff with additional performance data to validate the questions and also to assist in setting the cut-score.

The process is basically as follows:

  • The SMEs receive a copy of the exam questions on each form.
  • The SMEs look at each question (independently and in consultation with each other) and make a judgment about how likely it is that a minimally qualified person meeting the job requirements described in the spec sheet would answer the question correctly. That is, the SMEs are asked to consider the question from the perspective of someone who is at the bottom of the competence scale for job performance.
  • The SMEs rate each question with their estimate of what percentage of such people will get it correct, keeping in mind that on multiple-choice questions, some people will get it right by virtue of guessing.

The data from these exercises is then used as follows. Let's say that there is a question that the SMEs judge to be tough, and they estimate that candidates may get it right 30% of the time. If the exam data comes back showing that 90% of candidates are getting the question right, then the question needs to be examined to see if the answer is being given away (or perhaps the answer is being provided in another question on the exam). Conversely, if there is a question that the SMEs think all candidates should know, they might rate it at 95%. If the exam data comes back showing that only 10% of the qualified people are getting it correct, the item needs to be examined to see if there is some problem with the way it is phrased or if there is some other issue.
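The comparison described above can be sketched as a simple check of the gap between the SME estimate and the observed proportion correct for each item. The tolerance of 0.4, the item names, and the function name are hypothetical; the source does not say how large a gap triggers a review.

```python
def compare_to_angoff(angoff, observed, tolerance=0.4):
    """Flag items whose observed difficulty differs sharply from SME estimates.

    angoff:    dict item id -> SME estimate of proportion correct (0..1)
    observed:  dict item id -> proportion correct in the beta data (0..1)
    tolerance: assumed maximum acceptable gap before an item is reviewed
    """
    flags = {}
    for item, expected in angoff.items():
        gap = observed[item] - expected
        if gap > tolerance:
            # Possible cause: the answer is given away somewhere on the exam.
            flags[item] = "easier than expected"
        elif gap < -tolerance:
            # Possible cause: confusing phrasing or some other item flaw.
            flags[item] = "harder than expected"
    return flags


# The two cases from the text, plus an item that behaves as predicted.
angoff = {"tough": 0.30, "basic": 0.95, "typical": 0.70}
observed = {"tough": 0.90, "basic": 0.10, "typical": 0.65}
flags = compare_to_angoff(angoff, observed)
```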

Ideally, the results from the Angoff study should parallel, to a certain degree, the actual results from the exams taken during the beta period.

Beyond validating item performance, the results of the Angoff study are also used in helping to establish the cut-score for the exam. As an example, let's say the Angoff study came back saying that all the questions were very difficult and for a given form the average percentage rating was 30%. This information would suggest to the psychometricians that they need to set the cut-score lower, because the exam questions are that much tougher. The formal report of the Standard Setting Study for the Level 1 (LPIC-1) Exams is available at the end of this article.
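As a rough illustration of how Angoff ratings can feed into a cut-score, a common textbook approach averages the SME ratings for each item and sums those averages, giving the raw score a minimally qualified candidate would be expected to achieve. The sketch below follows that approach; it is not LPI's documented procedure, which also draws on the beta data, and the names and ratings are invented for the example.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return (expected raw score, expected proportion) from SME ratings.

    ratings: dict item id -> list of SME estimates (0..1) of the chance
             that a minimally qualified candidate answers correctly.
    """
    item_means = {item: mean(values) for item, values in ratings.items()}
    raw_cut = sum(item_means.values())
    return raw_cut, raw_cut / len(item_means)


# Hypothetical ratings from three SMEs on a three-item form.
ratings = {
    "q1": [0.3, 0.4, 0.2],
    "q2": [0.5, 0.5, 0.5],
    "q3": [0.9, 0.7, 0.8],
}
raw_cut, proportion = angoff_cut_score(ratings)
```

With these ratings the item averages are roughly 0.3, 0.5, and 0.8, so the expected raw score is about 1.6 out of 3. Lower ratings across the board (tougher questions) pull the suggested cut-score down, which matches the reasoning in the text.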

Distributing the Score Results

After all of the data collection, the analysis, and the Angoff study, the psychometric staff set a cut-score and start to manually score the results of the exams taken during the beta period. Scores are then distributed by postal mail to all of the exam candidates.

Exam Re-Publishing

After all of the work in the beta period, the cut-score has been set, bad items have been removed or changed, and the exam is ready to be re-published. This involves creating an exam file that can be distributed through exam testing centers. The process requires significant review and can take a month or more to complete.
