Chapter 10. Embedded system development

Table of Contents
10.1. Critical systems
10.1.1. System dependability
Availability and reliability
10.1.2. Safety
10.1.3. Security
10.2. Critical systems development
10.2.1. Fault tolerance
Fault detection and damage assessment
Fault recovery and repair
10.3. Real-time software design
10.3.1. System design
Real-time system modelling
10.3.2. Real-time operating systems
Process management
10.4. Exercises

An embedded system is a computer system with a dedicated function within a larger mechanical or electrical system that serves a more general purpose, often with real-time computing constraints. It is embedded as part of a complete device often including hardware and mechanical parts.

Embedded systems are designed to do some specific task, rather than be a general-purpose computer for multiple tasks. Some also have real-time performance constraints that must be met, for reasons such as safety and usability; others may have low or no performance requirements, allowing the system hardware to be simplified to reduce costs. Since the embedded system is dedicated to specific tasks, design engineers can optimize it to reduce the size and cost of the product and increase the reliability and performance.

The processors used in embedded systems range from fairly general-purpose types to types that are highly specialized for a certain class of computations, or even custom designed for the application at hand. A common class of dedicated processors is the digital signal processor (DSP).

The program instructions written for embedded systems are referred to as firmware and are stored mainly in read-only memory or Flash memory chips. They run with limited computer hardware resources: little memory and a small or non-existent keyboard or screen. As with other software, embedded system designers use compilers, assemblers, and debuggers to develop embedded system software.

The embedded system interacts directly with hardware devices and usually must respond, in real time, to events from the system’s environment. In real-time systems, the embedded real-time software must react to events generated by the hardware and issue control signals in response to these events.

Embedded systems control many devices in common use today. They are commonly found in consumer, cooking, industrial, automotive, medical, commercial and military applications. Physically, embedded systems range from portable devices such as digital watches and MP3 players, to large stationary installations such as traffic lights and factory controllers, and to highly complex systems such as hybrid vehicles and MRI scanners. Telecommunications systems employ numerous embedded systems, from telephones to cell phones. Many household appliances, such as microwave ovens, washing machines and dishwashers, include embedded systems to provide flexibility, efficiency and features. Transportation systems, from aircraft to automobiles, increasingly use embedded systems.

Computers are used to control a wide range of systems, from simple domestic machines to entire manufacturing plants. These computers interact directly with hardware devices. The software in these systems is embedded real-time software that must react to events generated by the hardware in the system's environment and issue control signals in response to these events.

Software failures are relatively common. In most cases, these failures cause inconvenience but no serious, long-term damage. However, in some systems failure can result in significant economic losses, physical damage or threats to human life. These systems are called critical systems. Critical systems are technical or socio-technical systems that people or businesses depend on. If these systems fail to deliver their services as expected, then serious problems and significant losses may result. Modern electronic systems increasingly make use of embedded computer systems to add functionality and to increase flexibility, controllability and performance. However, the increased use of embedded software to control systems brings with it certain risks. This is especially significant in safety-critical systems, where human safety is dependent upon the correct operation of the system.

The objective of this chapter is to introduce the main characteristics of critical systems and the implementation techniques that are used in the development of critical and real-time systems.

10.1.  Critical systems

There are three main types of critical systems [ 1 ]:

  1. Safety-critical systems. A system whose failure may result in injury, loss of life or serious environmental damage.

  2. Mission-critical systems. A system whose failure may result in the failure of some goal-directed activity.

  3. Business-critical systems. A system whose failure may result in very high costs for the business using that system.

The most important property of a critical system is its dependability. The term dependability covers related system attributes such as availability, reliability, safety and security.

There are three system components where critical systems failures may occur:

  1. System hardware components may fail because of mistakes in their design or manufacturing errors.

  2. System software may fail due to mistakes in its specification, design or implementation.

  3. Human operators of the system may fail to operate the system correctly.

Because of the high cost of critical systems failure, trusted methods and well-known techniques must be used for the development of these systems. Most critical systems are socio-technical systems where people monitor and control the operation of computer-based systems. Operators in these systems must successfully handle unexpected situations and cope with additional workload. However, this may cause more stress and, consequently, more mistakes.

10.1.1.  System dependability

Dependability is a property of systems. A dependable computer system provides trustworthy operation to its users. This means that the system is expected not to fail in normal use. There are four principal attributes of dependability:

  1. Availability. The availability of a system is the probability that the system can provide the services requested by users at any time.

  2. Reliability. The reliability of a system is the probability, over a given period of time, that the system will correctly deliver services.

  3. Safety. The safety of a system shows the extent of the damage that may be caused by the system to people or its environment.

  4. Security. The security of a system shows how well the system can resist accidental or deliberate unauthorized intrusions.

Besides these four attributes, other system properties can also be related to dependability:

  1. Reparability. Disruption caused by any failure can be minimized if the system can be repaired as soon as possible. Therefore, it is important to be able to diagnose the problem, access the component that has failed and make changes to fix that component.

  2. Maintainability. Once new requirements have emerged, it is important to maintain the system by integrating the new functionality required.

  3. Survivability. Survivability is the ability of a system to continue operation of the service during a possible attack, even at the loss of certain parts of the system.

  4. Error tolerance. This property is considered as part of usability and shows how the system is designed to avoid and tolerate user input errors.

System developers usually have to prioritize between system performance and system dependability. Generally, high levels of dependability can only be achieved at the expense of system performance. Because of the additional design, implementation and validation costs, increasing the dependability of a system can also significantly increase development costs.

Availability and reliability

The reliability of a system is the probability that the system correctly provides services as defined in its specification. In other words, the reliability of software can be related to the probability that the system input will be a member of the set of inputs that cause an erroneous output to occur. If an input causing an erroneous output is associated with a frequently used part of the program, then failures will be frequent. However, if it is associated with rarely used code, then users will rarely encounter failures.

The availability of a system is the probability that the system will provide its services to users when they request them. If users need continuous service, then the availability requirements are high.
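As a rough illustration of how availability can be quantified, the sketch below uses the common convention of estimating steady-state availability from the mean time to failure (MTTF) and the mean time to repair (MTTR). These two measures, the formula and the function names are assumptions introduced for this example; they are not defined in this chapter.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Estimate availability as MTTF / (MTTF + MTTR).

    Assumption: failures and repairs alternate, so the system is "up"
    for MTTF hours and "down" for MTTR hours on average.
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def expected_downtime_hours_per_year(availability: float) -> float:
    """Convert an availability figure into expected hours of downtime per year."""
    return (1.0 - availability) * 365 * 24
```

Under these assumptions, a system that runs for 999 hours between failures and takes 1 hour to repair has an availability of 0.999, which still corresponds to several hours of downtime per year.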

Reliability and availability are primarily compromised by system failures. These may be a failure to provide a service, a failure to deliver a service as specified, or the delivery of a service in an unsafe or insecure way. However, many failures are a consequence of erroneous system behaviour that derives from faults in the system. To increase the reliability of a system, the following approaches can be used:

  1. Fault avoidance. Program development techniques are used that can minimize the possibility of mistakes and/or eliminate mistakes before they cause system faults.

  2. Fault detection and removal. Verification and validation techniques are used that effectively help to detect and remove faults before the system is used.

  3. Fault tolerance. Techniques that ensure that faults in a system do not result in system errors or that ensure that system errors do not result in system failures.

10.1.2.  Safety

The essential feature of safety-critical systems is that system operation is always safe. These systems should never compromise or damage people or the environment of the system, even if the system fails. Safety-critical software falls into two groups:

  1. Primary safety-critical software. This software is usually embedded as a controller in a system. Malfunctioning of such software can cause a hardware malfunction, which results in human injury and/or environmental damage.

  2. Secondary safety-critical software. This is software that can indirectly result in injury. For example, a fault in software used for design can cause the designed system to malfunction, and this may result in injury to people.

Safe operation, i.e. ensuring either that accidents do not occur or that the consequences of an accident are minimal, can be achieved in the following ways:

  1. Hazard avoidance. This type of system is designed so that hazards are avoided. For example, a safe cutting system is equipped with two control buttons that must be operated simultaneously using separate hands.

  2. Hazard detection and removal. The system is designed so that hazards are detected and removed before they result in an accident. For example, pressure control in a chemical reactor system can reduce the detected excessive pressure before an explosion occurs.

  3. Damage limitation. These systems have a functionality that can minimize the effects of an accident. For example, automatic fire extinguisher systems.
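The first two strategies in the list above can be sketched as simple control predicates. This is a minimal illustration only: the function names, the pressure units and the safety margin are hypothetical and do not come from any real controller.

```python
def blade_may_run(left_button_pressed: bool, right_button_pressed: bool) -> bool:
    """Hazard avoidance: the blade runs only while both buttons are held,
    so both of the operator's hands are away from the blade."""
    return left_button_pressed and right_button_pressed


def relief_valve_should_open(pressure_kpa: float,
                             limit_kpa: float,
                             margin_kpa: float = 50.0) -> bool:
    """Hazard detection and removal: open the relief valve once the pressure
    comes within a safety margin of the limit, before the limit is reached.

    Assumption: margin_kpa is an illustrative value, not an engineering figure.
    """
    return pressure_kpa >= limit_kpa - margin_kpa
```

In both cases the check is deliberately conservative: the blade stops as soon as either button is released, and the valve opens before the limit itself is reached.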

10.1.3.  Security

Security has become an increasingly important attribute of systems connected to the Internet. Internet connections provide additional system functionality, but they also allow systems to be attacked by people with hostile intentions. Security is a system attribute that shows the ability of the system to protect itself against accidental or deliberate external attacks. In some critical systems, such as systems for electronic commerce and military systems, security is the most important attribute of system dependability.

Examples of attacks might be viruses, unauthorized use of system services and data, unauthorized modification of the system, etc. Security is an important attribute for all critical systems. Without a reasonable level of security, the availability, reliability and safety of the system may be compromised if external attacks cause some damage to the system. There are three types of damage that may be caused by external attack:

  1. Denial of service. In this type of attack the system is forced into a state where its normal services become unavailable.

  2. Corruption of programs or data. The software components of the system are damaged, affecting the reliability and safety of the system.

  3. Disclosure of confidential information. Confidential information managed by the system is exposed to unauthorized people as a consequence of the external attack.

The security of a system may be assured using the following methods:

  1. Vulnerability avoidance. The system is designed not to be vulnerable. For example, if a system is not connected to the Internet, external attacks over the network are not possible.

  2. Attack detection and neutralization. The system is designed so that it detects and neutralizes attacks before any damage occurs. An example of attack detection and neutralization is the use of a virus checker to remove infected files.

  3. Exposure limitation. In these methods the consequences of attack are minimized. An example of exposure limitation is the application of regular system backups.

10.2.  Critical systems development

Due to the quick progress in computer technology, improvement of software development methods, better programming languages and effective quality management, the dependability of software has significantly improved in the last two decades. In system development, special development techniques may be used to ensure that the system is safe, secure and reliable. Three complementary approaches can be used to develop dependable software:

  1. Fault avoidance. The design and implementation processes are used to minimize programming errors and, consequently, the number of faults in a program.

  2. Fault detection. The verification and validation processes are designed to discover and remove faults in a program before it is deployed for operational use.

  3. Fault tolerance. The system is designed so that faults or unexpected system behaviour during execution are detected and managed in such a way that system failure does not occur.

Redundancy and diversity are fundamental to the achievement of dependability in any system. Examples of redundancy are components of critical systems that replicate the functionality of other components, or an additional checking mechanism that is added to the system but is not strictly necessary for its basic operation. Faults can therefore be detected before they cause failures, and the system may be able to continue operating if individual components fail. If the redundant components are not identical to the components they back up, which is the case of diversity, a common fault in a replicated component will not result in a complete system failure.
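One common way to exploit redundant components is to run several channels in parallel and accept the value that a majority of them agree on. The sketch below assumes a hypothetical set of replicated channels whose outputs can be compared for equality; the function name and the error handling are illustrative choices, not a prescribed design.

```python
from collections import Counter


def majority_vote(channel_outputs):
    """Return the value reported by a strict majority of redundant channels.

    Raises RuntimeError when no value has a strict majority, which signals
    that the fault cannot be masked by voting alone.
    """
    if not channel_outputs:
        raise ValueError("at least one channel output is required")
    value, count = Counter(channel_outputs).most_common(1)[0]
    if count * 2 <= len(channel_outputs):
        raise RuntimeError("no majority among redundant channels")
    return value
```

With three channels, a single faulty channel is outvoted by the other two. If the channels are diverse implementations rather than identical copies, a design fault in one of them is less likely to appear in the others as well.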

Software engineering research is intended to develop tools, techniques and methodologies that lead to the production of fault-free software. Fault-free software is software that exactly meets its specification. Of course, this does not mean that the software will never fail. There may be errors in the specification that are reflected in the software, or the users may misunderstand or misuse the software system. In order to develop fault-free software, the following software engineering techniques must be used:

  1. Dependable software processes. The use of a dependable software process with appropriate verification and validation activities can minimize the number of faults in a program and detect those that do slip through.

  2. Quality management. The software development organization must have a development culture in which quality drives the software process. Design and development standards should be established that support the development of fault-free programs.

  3. Formal specification. There must be a precise system specification that defines the system to be implemented.

  4. Static verification. Static verification techniques, such as the use of static analysers, can find anomalous program features that could be faults.

  5. Strong typing. A strongly typed programming language such as Java must be used for development. If the programming language has strong typing, the language compiler can detect many programming errors.

  6. Safe programming. Some programming language constructs are more complex and error-prone than others. Safe programming means avoiding or at least minimizing the use of these constructs.

  7. Protected information. Design and implementation processes based on information hiding and encapsulation should be followed. Object-oriented languages such as Java support this approach.

Although the development of fault-free software by applying these techniques is possible, it is economically disadvantageous. The cost of finding and removing the remaining faults rises exponentially as faults in the program are discovered and removed. As the software becomes more dependable, more and more testing is needed to find fewer and fewer faults.

10.2.1.  Fault tolerance

A fault-tolerant system can continue its operation even when some of its parts are faulty or unreliable. The fault-tolerance mechanisms in the system ensure that these system faults do not cause system failure. Where system failure could cause a catastrophic accident, or where a loss of system operation would cause large economic losses, it is necessary to develop a fault-tolerant system. There are four complementary approaches to ensuring the fault tolerance of a system:

  1. Fault detection. The system must detect a fault that could cause a system failure. Generally, this is based on checking the consistency of the system state.

  2. Damage assessment. The parts of the system state that have been affected by the fault must be detected.

  3. Fault recovery. The system restores its state to a known safe state. This may be achieved by correcting the damaged state (forward recovery) or by restoring a previously saved safe state (backward recovery).

  4. Fault repair. This involves modifying the system so that the fault does not recur.

Fault detection and damage assessment

The first stage in ensuring fault tolerance is to detect that a fault either has occurred or will occur unless some action is taken immediately. To achieve this, the illegal values of state variables must be recognized. Therefore, it is necessary to define state constraints that define the conditions that must always hold for all legal states. If these predicates are false, then a fault has occurred.
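State constraints of this kind can be written as named predicates over the system state. The following sketch assumes a hypothetical tank controller; the state fields, the capacity value and the two constraints are invented for illustration.

```python
from dataclasses import dataclass

CAPACITY_LITRES = 500.0  # hypothetical tank capacity


@dataclass
class TankState:
    level_litres: float
    inlet_open: bool


# Each constraint must hold in every legal state.
STATE_CONSTRAINTS = {
    "level within capacity":
        lambda s: 0.0 <= s.level_litres <= CAPACITY_LITRES,
    "inlet closed when full":
        lambda s: not (s.inlet_open and s.level_litres >= CAPACITY_LITRES),
}


def violated_constraints(state: TankState):
    """Return the names of all constraints that are false for this state.

    An empty list means no fault was detected by these checks.
    """
    return [name for name, holds in STATE_CONSTRAINTS.items() if not holds(state)]
```

Returning the names of the violated constraints, rather than a single yes/no answer, also gives a first indication of which parts of the state are affected.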

Damage assessment involves analyzing the system state to estimate the extent of the state corruption. The role of the damage assessment procedures is not to recover from the fault but to assess what parts of the state space have been affected by the fault. Damage can only be assessed if it is possible to apply some validity function that checks whether the state is consistent.

Fault recovery and repair

The purpose of the fault recovery process is to modify the state of the system so that the effects of the fault are eliminated or reduced. The system can continue to operate, perhaps in some degraded form. Forward recovery tries to correct the damaged system state and to create the intended state. Forward recovery is only possible in cases where the state information includes built-in redundancy. Backward recovery restores the system state to a known correct state.

For example, most database systems include backward error recovery. When a user starts a database operation, a transaction is initiated. The changes made during that transaction are not immediately incorporated in the database. The database is only updated after the transaction is finished and no problems are detected. If the transaction fails, the database is not updated.
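The transaction behaviour described above can be sketched with an in-memory store that applies changes to a working copy and only commits them if the operation finishes without an error. The class and method names are hypothetical, and a real database uses far more elaborate mechanisms such as logging.

```python
import copy


class TransactionalStore:
    """Toy key-value store with backward recovery."""

    def __init__(self, initial=None):
        self._committed = dict(initial or {})

    def run(self, operation) -> bool:
        """Apply operation(working_copy); commit only if it does not raise."""
        working = copy.deepcopy(self._committed)
        try:
            operation(working)
        except Exception:
            # Backward recovery: discard the working copy, so the store
            # keeps the last known correct state.
            return False
        self._committed = working
        return True

    def snapshot(self):
        """Return a copy of the committed state."""
        return copy.deepcopy(self._committed)
```

Because the committed state is never modified in place, a failure in the middle of an operation cannot leave the store half updated.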

10.3.  Real-time software design

Real-time embedded systems are significantly different from other types of software systems. Their correct operation depends on the system responding to events within a short time interval. A real-time system can be briefly defined as follows:

A real-time system is a software system where the correct operation of the system depends on the results produced by the system and the time at which these results are produced.

Timely response is an important factor in all embedded systems but, in some cases, very fast response is not necessary. The real-time system is a stimulus/response system. It must produce a corresponding response for a particular input stimulus. The behaviour of a real-time system can therefore be defined by listing the stimuli received by the system, the associated responses and the time at which each response must be produced. Stimuli fall into two classes:

  1. Periodic stimuli. These stimuli are generated at predictable time intervals.

  2. Aperiodic stimuli. These stimuli occur irregularly.

Periodic stimuli in a real-time system are usually generated by sensors associated with the system and provide information about the state of the system’s environment. The responses of the system are transmitted to actuators that may control some equipment. Aperiodic stimuli may be generated either by the actuators or by sensors. This sensor-system-actuator model of an embedded real-time system is illustrated in Figure 10.1.

Figure 10.1. General model for a real-time system.

A real-time system must be able to respond to stimuli that occur at different times. Therefore, its architecture should be designed so that, as soon as a stimulus is received, control is transferred to the correct handler. This cannot be achieved using sequential programs. Consequently, real-time systems are normally designed as a set of concurrent and cooperating processes. In order to manage these concurrent processes, most real-time systems include a real-time operating system.

The stimulus-response model of a real-time system consists of three types of processes: a sensor management process for each type of sensor, computational processes that compute the required response to the stimuli received by the system, and control processes that manage the operation of the actuators. This stimulus-response model enables rapid collection of data from the sensor and allows the computational processes and actuator responses to be carried out later.
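The separation between the three kinds of processes can be sketched with a buffer that decouples data collection from computation. The example below is deliberately sequential so that it stays deterministic; in a real system the three functions would run as concurrent processes under a real-time operating system. The heater scenario, the function names and the command strings are hypothetical.

```python
from collections import deque


def sensor_process(raw_samples, buffer: deque) -> None:
    """Collect readings quickly and store them; no computation happens here."""
    for sample in raw_samples:
        buffer.append(sample)


def compute_process(buffer: deque, setpoint: float):
    """Drain the buffer later and compute one command per reading."""
    commands = []
    while buffer:
        reading = buffer.popleft()
        commands.append("heat_on" if reading < setpoint else "heat_off")
    return commands


def actuator_process(commands, initial_state: str = "heat_off") -> str:
    """Apply the commands in order and return the final actuator state."""
    state = initial_state
    for command in commands:
        state = command
    return state
```

The buffer is what allows the sensor process to return immediately after storing a reading, while the slower computation and actuation steps catch up afterwards.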

10.3.1.  System design

When designing a real-time system, it is first necessary to decide which system capabilities are to be implemented in software and which in hardware. The design process of real-time software then focuses on the stimuli rather than on the objects and functions. The design process has a number of overlapping stages:

  1. Identification of the stimuli that the system must process and the associated responses.

  2. Specifying the timing constraints for each stimulus and associated response.

  3. Selection of hardware components and the real-time operating system to be used.

  4. Aggregation of the stimulus and response processing into a number of concurrent processes. It is usual in real-time systems design to associate a concurrent process with each class of stimulus and response, as shown in Figure 10.2.

  5. Design of algorithms for the required computations for each stimulus and response.

  6. Design of a scheduling system ensuring that processes are started and completed in time.

Figure 10.2. Sensor – actuator control process.

Processes must be coordinated in a real-time system. Process coordination mechanisms ensure mutually exclusive access to shared resources. Once the process architecture has been designed and a scheduling policy has been decided, it should be checked that the system will meet its timing requirements.
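One classical way to perform such a timing check, not described in this chapter, is the utilization bound test for rate-monotonic scheduling. The sketch below assumes independent periodic processes whose deadlines equal their periods and a fixed-priority preemptive scheduler that gives shorter periods higher priority. The test is sufficient but not necessary: a process set that fails it may still be schedulable and needs a more detailed analysis.

```python
def total_utilization(tasks) -> float:
    """tasks: iterable of (execution_time, period) pairs in the same time unit."""
    return sum(execution_time / period for execution_time, period in tasks)


def rate_monotonic_bound(n: int) -> float:
    """Utilization bound n * (2^(1/n) - 1) for n periodic processes."""
    return n * (2 ** (1.0 / n) - 1)


def passes_utilization_test(tasks) -> bool:
    """True if the process set is guaranteed schedulable under the stated
    assumptions; False means the test is inconclusive."""
    tasks = list(tasks)
    if not tasks:
        return True
    return total_utilization(tasks) <= rate_monotonic_bound(len(tasks))
```

For two processes the bound is roughly 0.83, so a set that uses half of the processor passes, while a set that needs the whole processor does not.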

Timing constraints or other requirements often mean that some system functions, such as signal processing, should be implemented in hardware rather than in software. Hardware components can provide better performance than the equivalent software.

Real-time system modelling

Real-time systems have to respond to events occurring at irregular intervals. These stimuli often cause the system to move to a new state. For this reason, state machine models are often used to model real-time systems. Application of state machine models is an effective way to represent the design of a real-time system. The UML supports the development of state models based on state-charts. A state model of a system assumes that the system, at any time, is in one of a number of possible states. When a stimulus is received it may cause a transition to a different state.
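A state model of this kind can be written down directly as a transition table. The sketch below assumes a hypothetical, much simplified oven controller; the state names and stimuli are invented for illustration and do not correspond to a figure in this chapter.

```python
# (current state, stimulus) -> next state
TRANSITIONS = {
    ("idle", "start"): "cooking",
    ("cooking", "door_opened"): "paused",
    ("paused", "door_closed"): "cooking",
    ("cooking", "timer_expired"): "idle",
}


def next_state(state: str, stimulus: str) -> str:
    """Return the state after the stimulus.

    Assumption: a stimulus with no transition defined for the current
    state is ignored and the state is left unchanged.
    """
    return TRANSITIONS.get((state, stimulus), state)


def run(initial_state: str, stimuli) -> str:
    """Feed a sequence of stimuli through the state machine."""
    state = initial_state
    for stimulus in stimuli:
        state = next_state(state, stimulus)
    return state
```

Keeping the transitions in a table makes it easy to review the model against the design, because every legal combination of state and stimulus is listed in one place.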

10.3.2.  Real-time operating systems

Many embedded systems have real-time performance constraints, which means that they have to work in conjunction with a real-time operating system (RTOS). A real-time operating system guarantees a certain capability within a specified time constraint. It manages processes and resource allocation in a real-time system. It can start and stop processes and allocate memory and processor resources, so that stimuli can be handled as concurrent processes.

Real-time operating systems usually include the following components:

  1. Real-time clock. This provides information to schedule processes periodically.

  2. Interrupt handler. This manages aperiodic requests for service.

  3. Process manager. It is responsible for scheduling processes to be executed.

  4. Resource manager. Resource manager allocates resources (memory, processor, etc.) to processes.

  5. Despatcher. It is responsible for starting the execution of a process.

Process management

Real-time systems have to respond to events from the hardware in real time. The processes handling events must be scheduled for execution and must be allocated processor resources so that they meet their deadlines. In real-time operating systems the process manager is responsible for selecting the next process to be executed, allocating resources such as processor and memory resources, and starting and stopping the process.

The process manager has to manage processes with different priorities. Real-time operating systems define different priority levels for system processes:

  1. Interrupt level. This is the highest priority level. It is allocated to processes that need a very fast response.

  2. Clock level. This level of priority is assigned to periodic processes.

  3. Background processes level. This is the lowest priority level. It is allocated to background processes that have no timing constraints. These processes are scheduled for execution when processor capacity is available.

In most real-time systems, there are several types of periodic processes. They usually control the data acquisition and the actions of actuators. Periodic processes have different execution times and deadlines. The timing requirements of all processes are specified by the application program. The real-time operating system manages the execution of periodic processes and ensures that every process is completed by its deadline.

Figure 10.3. RTOS actions required to start a process.

Figure 10.3 shows the sequence of activities that are performed by the operating system for periodic process management. The scheduler examines all the periodic processes and chooses a process to be executed. The choice depends on the process priority, the process periods, the expected execution times and the deadlines of the ready processes.
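The selection step of the scheduler can be sketched as follows. The example assumes the three priority levels described in this section and breaks ties within a level by choosing the earliest deadline; that tie-breaking rule, the process names and the data structure are assumptions made for illustration, and a real RTOS may apply a different policy.

```python
from dataclasses import dataclass

# Lower number means higher priority.
INTERRUPT, CLOCK, BACKGROUND = 0, 1, 2


@dataclass
class Process:
    name: str
    level: int
    deadline_ms: float = float("inf")  # background processes have no deadline


def choose_next(ready_processes):
    """Pick the ready process with the highest priority level,
    preferring the earliest deadline within the same level.

    Returns None when no process is ready.
    """
    if not ready_processes:
        return None
    return min(ready_processes, key=lambda p: (p.level, p.deadline_ms))
```

With this rule a background process is chosen only when no interrupt-level or clock-level process is ready, which matches the description of background processes being run when processor capacity is available.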

10.4.  Exercises

  1. What is an embedded system?

  2. Give examples of embedded systems.

  3. What are the three types of critical systems? What is the difference between them?

  4. What are the main dimensions of system dependability?

  5. What approaches can be used for dependable software development?

  6. List some programming techniques that are not recommended in safe programming.

  7. What is a fault-tolerant system?

  8. What systems are called real-time systems?

  9. What types of stimuli are real-time systems designed for?

  10. Explain process management in real-time systems.