Next Previous Contents

5. Protocols

5.1 Border Gateway Protocol

The BGP protocol is implemented in three parts: bgp.c which takes care of the connection and most of the interface with BIRD core, packets.c handling both incoming and outgoing BGP packets and attrs.c containing functions for manipulation with BGP attribute lists.

As opposed to the other existing routing daemons, BIRD has a sophisticated core architecture which is able to keep all the information needed by BGP in the primary routing table, therefore no complex data structures like a central BGP table are needed. This increases memory footprint of a BGP router with many connections, but not too much and, which is more important, it makes BGP much easier to implement.

Each instance of BGP (corresponding to a single BGP peer) is described by a bgp_proto structure to which are attached individual connections represented by bgp_connection (usually, there exists only one connection, but during BGP session setup, there can be more of them). The connections are handled according to the BGP state machine defined in the RFC with all the timers and all the parameters configurable.

In incoming direction, we listen on the connection's socket and each time we receive some input, we pass it to bgp_rx(). It decodes packet headers and the markers and passes complete packets to bgp_rx_packet() which distributes the packet according to its type.

In outgoing direction, we gather all the routing updates and sort them to buckets (bgp_bucket) according to their attributes (we keep a hash table for fast comparison of rta's and a fib which helps us to find if we already have another route for the same destination queued for sending, so that we can replace it with the new one immediately instead of sending both updates). There also exists a special bucket holding all the route withdrawals which cannot be queued anywhere else as they don't have any attributes. If we have any packet to send (due to either new routes or the connection tracking code wanting to send a Open, Keepalive or Notification message), we call bgp_schedule_packet() which sets the corresponding bit in a packet_to_send bit field in bgp_conn and as soon as the transmit socket buffer becomes empty, we call bgp_fire_tx(). It inspects state of all the packet type bits and calls the corresponding bgp_create_xx() functions, eventually rescheduling the same packet type if we have more data of the same type to send.

The processing of attributes consists of two functions: bgp_decode_attrs() for checking of the attribute blocks and translating them to the language of BIRD's extended attributes and bgp_encode_attrs() which does the converse. Both functions are built around a bgp_attr_table array describing all important characteristics of all known attributes. Unknown transitive attributes are attached to the route as EAF_TYPE_OPAQUE byte streams.


Function

int bgp_open (struct bgp_proto * p) -- open a BGP instance

Arguments

struct bgp_proto * p

BGP instance

Description

This function allocates and configures shared BGP resources. Should be called as the last step during initialization (when lock is acquired and neighbor is ready). When error, state changed to PS_DOWN, -1 is returned and caller should return immediately.


Function

void bgp_close (struct bgp_proto * p, int apply_md5) -- close a BGP instance

Arguments

struct bgp_proto * p

BGP instance

int apply_md5

0 to disable unsetting MD5 auth

Description

This function frees and deconfigures shared BGP resources. apply_md5 is set to 0 when bgp_close is called as a cleanup from failed bgp_open().


Function

void bgp_start_timer (timer * t, int value) -- start a BGP timer

Arguments

timer * t

timer

int value

time to fire (0 to disable the timer)

Description

This functions calls tm_start() on t with time value and the amount of randomization suggested by the BGP standard. Please use it for all BGP timers.


Function

void bgp_close_conn (struct bgp_conn * conn) -- close a BGP connection

Arguments

struct bgp_conn * conn

connection to close

Description

This function takes a connection described by the bgp_conn structure, closes its socket and frees all resources associated with it.


Function

void bgp_update_startup_delay (struct bgp_proto * p) -- update a startup delay

Arguments

struct bgp_proto * p

BGP instance

Description

This function updates a startup delay that is used to postpone next BGP connect. It also handles disable_after_error and might stop BGP instance when error happened and disable_after_error is on.

It should be called when BGP protocol error happened.


Function

void bgp_connect (struct bgp_proto * p) -- initiate an outgoing connection

Arguments

struct bgp_proto * p

BGP instance

Description

The bgp_connect() function creates a new bgp_conn and initiates a TCP connection to the peer. The rest of connection setup is governed by the BGP state machine as described in the standard.


Function

int bgp_incoming_connection (sock * sk, int dummy UNUSED) -- handle an incoming connection

Arguments

sock * sk

TCP socket

int dummy UNUSED

-- undescribed --

Description

This function serves as a socket hook for accepting of new BGP connections. It searches a BGP instance corresponding to the peer which has connected and if such an instance exists, it creates a bgp_conn structure, attaches it to the instance and either sends an Open message or (if there already is an active connection) it closes the new connection by sending a Notification message.


Function

void bgp_error (struct bgp_conn * c, unsigned code, unsigned subcode, byte * data, int len) -- report a protocol error

Arguments

struct bgp_conn * c

connection

unsigned code

error code (according to the RFC)

unsigned subcode

error sub-code

byte * data

data to be passed in the Notification message

int len

length of the data

Description

bgp_error() sends a notification packet to tell the other side that a protocol error has occurred (including the data considered erroneous if possible) and closes the connection.


Function

void bgp_store_error (struct bgp_proto * p, struct bgp_conn * c, u8 class, u32 code) -- store last error for status report

Arguments

struct bgp_proto * p

BGP instance

struct bgp_conn * c

connection

u8 class

error class (BE_xxx constants)

u32 code

error code (class specific)

Description

bgp_store_error() decides whether given error is interesting enough and store that error to last_error variables of p


Function

int bgp_fire_tx (struct bgp_conn * conn) -- transmit packets

Arguments

struct bgp_conn * conn

connection

Description

Whenever the transmit buffers of the underlying TCP connection are free and we have any packets queued for sending, the socket functions call bgp_fire_tx() which takes care of selecting the highest priority packet queued (Notification > Keepalive > Open > Update), assembling its header and body and sending it to the connection.


Function

void bgp_schedule_packet (struct bgp_conn * conn, int type) -- schedule a packet for transmission

Arguments

struct bgp_conn * conn

connection

int type

packet type

Description

Schedule a packet of type type to be sent as soon as possible.


Function

const char * bgp_error_dsc (unsigned code, unsigned subcode) -- return BGP error description

Arguments

unsigned code

BGP error code

unsigned subcode

BGP error subcode

Description

bgp_error_dsc() returns error description for BGP errors which might be static string or given temporary buffer.


Function

void bgp_rx_packet (struct bgp_conn * conn, byte * pkt, unsigned len) -- handle a received packet

Arguments

struct bgp_conn * conn

BGP connection

byte * pkt

start of the packet

unsigned len

packet size

Description

bgp_rx_packet() takes a newly received packet and calls the corresponding packet handler according to the packet type.


Function

int bgp_rx (sock * sk, int size) -- handle received data

Arguments

sock * sk

socket

int size

amount of data received

Description

bgp_rx() is called by the socket layer whenever new data arrive from the underlying TCP connection. It assembles the data fragments to packets, checks their headers and framing and passes complete packets to bgp_rx_packet().


Function

unsigned int bgp_encode_attrs (struct bgp_proto * p, byte * w, ea_list * attrs, int remains) -- encode BGP attributes

Arguments

struct bgp_proto * p

BGP instance

byte * w

buffer

ea_list * attrs

a list of extended attributes

int remains

remaining space in the buffer

Description

The bgp_encode_attrs() function takes a list of extended attributes and converts it to its BGP representation (a part of an Update message).

Result

Length of the attribute block generated or -1 if not enough space.


Function

struct rta * bgp_decode_attrs (struct bgp_conn * conn, byte * attr, unsigned int len, struct linpool * pool, int mandatory) -- check and decode BGP attributes

Arguments

struct bgp_conn * conn

connection

byte * attr

start of attribute block

unsigned int len

length of attribute block

struct linpool * pool

linear pool to make all the allocations in

int mandatory

1 iff presence of mandatory attributes has to be checked

Description

This function takes a BGP attribute block (a part of an Update message), checks its consistency and converts it to a list of BIRD route attributes represented by a rta.

5.2 Open Shortest Path First (OSPF)

The OSPF protocol is quite complicated and its complex implemenation is split to many files. In ospf.c, you will find mainly the interface for communication with the core (e.g., reconfiguration hooks, shutdown and initialisation and so on). In packet.c, you will find various functions for sending and receiving generic OSPF packets. There are also routines for authentication and checksumming. File iface.c contains the interface state machine and functions for allocation and deallocation of OSPF's interface data structures. Source neighbor.c includes the neighbor state machine and functions for election of Designated Router and Backup Designated router. In hello.c, there are routines for sending and receiving of hello packets as well as functions for maintaining wait times and the inactivity timer. Files lsreq.c, lsack.c, dbdes.c contain functions for sending and receiving of link-state requests, link-state acknowledgements and database descriptions respectively. In lsupd.c, there are functions for sending and receiving of link-state updates and also the flooding algorithm. Source topology.c is a place where routines for searching LSAs in the link-state database, adding and deleting them reside, there also are functions for originating of various types of LSAs (router LSA, net LSA, external LSA). File rt.c contains routines for calculating the routing table. lsalib.c is a set of various functions for working with the LSAs (endianity conversions, calculation of checksum etc.).

One instance of the protocol is able to hold LSA databases for multiple OSPF areas, to exchange routing information between multiple neighbors and to calculate the routing tables. The core structure is proto_ospf to which multiple ospf_area and ospf_iface structures are connected. ospf_area is also connected to top_hash_graph which is a dynamic hashing structure that describes the link-state database. It allows fast search, addition and deletion. Each LSA is kept in two pieces: header and body. Both of them are kept in the endianity of the CPU.

In OSPFv2 specification, it is implied that there is one IP prefix for each physical network/interface (unless it is an ptp link). But in modern systems, there might be more independent IP prefixes associated with an interface. To handle this situation, we have one ospf_iface for each active IP prefix (instead for each active iface); This behaves like virtual interface for the purpose of OSPF. If we receive packet, we associate it with a proper virtual interface mainly according to its source address.

OSPF keeps one socket per ospf_iface. This allows us (compared to one socket approach) to evade problems with a limit of multicast groups per socket and with sending multicast packets to appropriate interface in a portable way. The socket is associated with underlying physical iface and should not receive packets received on other ifaces (unfortunately, this is not true on BSD). Generally, one packet can be received by more sockets (for example, if there are more ospf_iface on one physical iface), therefore we explicitly filter received packets according to src/dst IP address and received iface.

Vlinks are implemented using particularly degenerate form of ospf_iface, which has several exceptions: it does not have its iface or socket (it copies these from 'parent' ospf_iface) and it is present in iface list even when down (it is not freed in ospf_iface_down()).

The heart beat of ospf is ospf_disp(). It is called at regular intervals (proto_ospf->tick). It is responsible for aging and flushing of LSAs in the database, for routing table calculaction and it call area_disp() of every ospf_area.

The function area_disp() is responsible for late originating of router LSA and network LSA and for cleanup before routing table calculation process in the area. To every ospf_iface, we connect one or more ospf_neighbor's -- a structure containing many timers and queues for building adjacency and for exchange of routing messages.

BIRD's OSPF implementation respects RFC2328 in every detail, but some of internal algorithms do differ. The RFC recommends making a snapshot of the link-state database when a new adjacency is forming and sending the database description packets based on the information in this snapshot. The database can be quite large in some networks, so rather we walk through a slist structure which allows us to continue even if the actual LSA we were working with is deleted. New LSAs are added at the tail of this slist.

We also don't keep a separate OSPF routing table, because the core helps us by being able to recognize when a route is updated to an identical one and it suppresses the update automatically. Due to this, we can flush all the routes we've recalculated and also those we've deleted to the core's routing table and the core will take care of the rest. This simplifies the process and conserves memory.


Function

void area_disp (struct ospf_area * oa) -- invokes origination of

Arguments

struct ospf_area * oa

ospf area

Open Shortest Path First (OSPF)

router LSA and routing table cleanup

Description

It invokes aging and when ospf_area->origrt is set to 1, start function for origination of router, network LSAs.


Function

void ospf_disp (timer * timer) -- invokes routing table calculation, aging and also area_disp()

Arguments

timer * timer

timer usually called every proto_ospf->tick second, timer->data point to proto_ospf


Function

int ospf_import_control (struct proto * p, rte ** new, ea_list ** attrs, struct linpool * pool) -- accept or reject new route from nest's routing table

Arguments

struct proto * p

current instance of protocol

rte ** new

the new route

ea_list ** attrs

list of attributes

struct linpool * pool

pool for allocation of attributes

Description

Its quite simple. It does not accept our own routes and leaves the decision on import to the filters.


Function

int ospf_shutdown (struct proto * p) -- Finish of OSPF instance

Arguments

struct proto * p

current instance of protocol

Description

RFC does not define any action that should be taken before router shutdown. To make my neighbors react as fast as possible, I send them hello packet with empty neighbor list. They should start their neighbor state machine with event NEIGHBOR_1WAY.


Function

int ospf_reconfigure (struct proto * p, struct proto_config * c) -- reconfiguration hook

Arguments

struct proto * p

current instance of protocol (with old configuration)

struct proto_config * c

new configuration requested by user

Description

This hook tries to be a little bit intelligent. Instance of OSPF will survive change of many constants like hello interval, password change, addition or deletion of some neighbor on nonbroadcast network, cost of interface, etc.


Function

void originate_rt_lsa (struct ospf_area * oa) -- build new instance of router LSA

Arguments

struct ospf_area * oa

ospf_area which is LSA built to

Description

It builds router LSA walking through all OSPF interfaces in specified OSPF area. This function is mostly called from area_disp(). Builds new LSA, increases sequence number (if old instance exists) and sets age of LSA to zero.


Function

void originate_net_lsa (struct ospf_iface * ifa) -- originates of deletes network LSA

Arguments

struct ospf_iface * ifa

interface which is LSA originated for

Description

Interface counts number of adjacent neighbors. If this number is lower than one or interface is not in state OSPF_IS_DR it deletes and premature ages instance of network LSA for specified interface. In other case, new instance of network LSA is originated.


Function

void originate_ext_lsa (struct ospf_area * oa, struct fib_node * fn, int src, u32 metric, ip_addr fwaddr, u32 tag, int pbit) -- new route received from nest and filters

Arguments

struct ospf_area * oa

ospf_area for which LSA is originated

struct fib_node * fn

network prefix and mask

int src

the source of origination of the LSA (EXT_EXPORT/EXT_NSSA)

u32 metric

the metric of a route

ip_addr fwaddr

the forwarding address

u32 tag

the route tag

int pbit

P-bit for NSSA LSAs, ignored for external LSAs

Description

If I receive a message that new route is installed, I try to originate an external LSA. If oa is an NSSA area, NSSA-LSA is originated instead. oa should not be a stub area. src does not specify whether the LSA is external or NSSA, but it specifies the source of origination - the export from ospf_rt_notify(), or the NSSA-EXT translation.

The function also sets flag ebit. If it's the first time, the new router lsa origination is necessary.


Function

struct top_graph * ospf_top_new (pool * pool) -- allocated new topology database

Arguments

pool * pool

-- undescribed --

Description

this dynamically hashed structure is often used for keeping lsas. mainly its used in ospf_area structure.


Function

void neigh_chstate (struct ospf_neighbor * n, u8 state) -- handles changes related to new or lod state of neighbor

Arguments

struct ospf_neighbor * n

OSPF neighbor

u8 state

new state

Description

Many actions have to be taken acording to a change of state of a neighbor. It starts rxmt timers, call interface state machine etc.


Function

void ospf_neigh_sm (struct ospf_neighbor * n, int event) -- ospf neighbor state machine

Arguments

struct ospf_neighbor * n

neighor

int event

actual event

Description

This part implements the neighbor state machine as described in 10.3 of RFC 2328. The only difference is that state NEIGHBOR_ATTEMPT is not used. We discover neighbors on nonbroadcast networks in the same way as on broadcast networks. The only difference is in sending hello packets. These are sent to IPs listed in ospf_iface->nbma_list .


Function

void bdr_election (struct ospf_iface * ifa) -- (Backup) Designed Router election

Arguments

struct ospf_iface * ifa

actual interface

Description

When the wait timer fires, it is time to elect (Backup) Designated Router. Structure describing me is added to this list so every electing router has the same list. Backup Designated Router is elected before Designated Router. This process is described in 9.4 of RFC 2328.


Function

void ospf_iface_chstate (struct ospf_iface * ifa, u8 state) -- handle changes of interface state

Arguments

struct ospf_iface * ifa

OSPF interface

u8 state

new state

Description

Many actions must be taken according to interface state changes. New network LSAs must be originated, flushed, new multicast sockets to listen for messages for ALLDROUTERS have to be opened, etc.


Function

void ospf_iface_sm (struct ospf_iface * ifa, int event) -- OSPF interface state machine

Arguments

struct ospf_iface * ifa

OSPF interface

int event

event comming to state machine

Description

This fully respects 9.3 of RFC 2328 except we have slightly different handling of DOWN and LOOP state. We remove intefaces that are DOWN. DOWN state is used when an interface is waiting for a lock. LOOP state is used when an interface does not have a link.


Function

int ospf_rx_hook (sock * sk, int size)

Arguments

sock * sk

socket we received the packet.

int size

size of the packet

Description

This is the entry point for messages from neighbors. Many checks (like authentication, checksums, size) are done before the packet is passed to non generic functions.


Function

void ospf_age (struct proto_ospf * po)

Arguments

struct proto_ospf * po

ospf protocol

Description

This function is periodicaly invoked from ospf_disp(). It computes the new age of all LSAs and old (age is higher than LSA_MAXAGE) LSAs are flushed whenever possible. If an LSA originated by the router itself is older than LSREFRESHTIME a new instance is originated.

The RFC says that a router should check the checksum of every LSA to detect hardware problems. BIRD does not do this to minimalize CPU utilization.

If routing table calculation is scheduled, it also invalidates the old routing table calculation results.


Function

int lsa_validate (struct ospf_lsa_header * lsa, void * body) -- check whether given LSA is valid

Arguments

struct ospf_lsa_header * lsa

LSA header

void * body

pointer to LSA body

Description

Checks internal structure of given LSA body (minimal length, consistency). Returns true if valid.


Function

struct top_hash_entry * lsa_install_new (struct proto_ospf * po, struct ospf_lsa_header * lsa, u32 domain, void * body) -- install new LSA into database

Arguments

struct proto_ospf * po

OSPF protocol

struct ospf_lsa_header * lsa

LSA header

u32 domain

domain of LSA

void * body

pointer to LSA body

Description

This function ensures installing new LSA into LSA database. Old instance is replaced. Several actions are taken to detect if new routing table calculation is necessary. This is described in 13.2 of RFC 2328.


Function

void ospf_dbdes_send (struct ospf_neighbor * n, int next) -- transmit database description packet

Arguments

struct ospf_neighbor * n

neighbor

int next

whether to send a next packet in a sequence (1) or to retransmit the old one (0)

Description

Sending of a database description packet is described in 10.8 of RFC 2328. Reception of each packet is acknowledged in the sequence number of another. When I send a packet to a neighbor I keep a copy in a buffer. If the neighbor does not reply, I don't create a new packet but just send the content of the buffer.


Function

void ospf_rt_spf (struct proto_ospf * po) -- calculate internal routes

Arguments

struct proto_ospf * po

OSPF protocol

Description

Calculation of internal paths in an area is described in 16.1 of RFC 2328. It's based on Dijkstra's shortest path tree algorithms. This function is invoked from ospf_disp().

5.3 Pipe

The Pipe protocol is very simple. It just connects to two routing tables using proto_add_announce_hook() and whenever it receives a rt_notify() about a change in one of the tables, it converts it to a rte_update() in the other one.

To avoid pipe loops, Pipe keeps a `being updated' flag in each routing table.

A pipe has two announce hooks, the first connected to the main table, the second connected to the peer table. When a new route is announced on the main table, it gets checked by an export filter in ahook 1, and, after that, it is announced to the peer table via rte_update(), an import filter in ahook 2 is called. When a new route is announced in the peer table, an export filter in ahook2 and an import filter in ahook 1 are used. Oviously, there is no need in filtering the same route twice, so both import filters are set to accept, while user configured 'import' and 'export' filters are used as export filters in ahooks 2 and 1. Route limits are handled similarly, but on the import side of ahooks.

5.4 Routing Information Protocol

RIP is a pretty simple protocol, so about a half of its code is interface with the core.

We maintain our own linked list of rip_entry structures -- it serves as our small routing table. RIP never adds to this linked list upon packet reception; instead, it lets the core know about data from the packet and waits for the core to call rip_rt_notify().

Within rip_tx(), the list is walked and a packet is generated using rip_tx_prepare(). This gets tricky because we may need to send more than one packet to one destination. Struct rip_connection is used to hold context information such as how many of rip_entry's we have already sent and it's also used to protect against two concurrent sends to one destination. Each rip_interface has at most one rip_connection.

We are not going to honor requests for sending part of routing table. That would need to turn split horizon off etc.

About triggered updates, RFC says: when a triggered update was sent, don't send a new one for something between 1 and 5 seconds (and send one after that). We do something else: each 5 seconds, we look for any changed routes and broadcast them.


Function

void rip_timer (timer * t)

Arguments

timer * t

timer

Description

Broadcast routing tables periodically (using rip_tx) and kill routes that are too old. RIP keeps a list of its own entries present in the core table by a linked list (functions rip_rte_insert() and rip_rte_delete() are responsible for that), it walks this list in the timer and in case an entry is too old, it is discarded.


Function

struct rip_interface * new_iface (struct proto * p, struct iface * new, unsigned long flags, struct iface_patt * patt)

Arguments

struct proto * p

myself

struct iface * new

interface to be created or NULL if we are creating a magic socket. The magic socket is used for listening and also for sending requested responses.

unsigned long flags

interface flags

struct iface_patt * patt

pattern this interface matched, used for access to config options

Description

Create an interface structure and start listening on the interface.

5.5 Router Advertisements

The RAdv protocol is implemented in two files: radv.c containing the interface with BIRD core and the protocol logic and packets.c handling low level protocol stuff (RX, TX and packet formats). The protocol does not import or export any routes.

The RAdv is structured in the usual way - for each handled interface there is a structure radv_iface that contains a state related to that interface together with its resources (a socket, a timer). There is also a prepared RA stored in a TX buffer of the socket associated with an iface. These iface structures are created and removed according to iface events from BIRD core handled by radv_if_notify() callback.

The main logic of RAdv consists of two functions: radv_iface_notify(), which processes asynchronous events (specified by RA_EV_* codes), and radv_timer(), which triggers sending RAs and computes the next timeout.

Supported standards: - RFC 4861 - main RA standard - RFC 6106 - DNS extensions (RDDNS, DNSSL)

5.6 Static

The Static protocol is implemented in a straightforward way. It keeps two lists of static routes: one containing interface routes and one holding the remaining ones. Interface routes are inserted and removed according to interface events received from the core via the if_notify() hook. Routes pointing to a neighboring router use a sticky node in the neighbor cache to be notified about gaining or losing the neighbor. Special routes like black holes or rejects are inserted all the time.

Multipath routes are tricky. Because these routes depends on several neighbors we need to integrate that to the neighbor notification handling, we use dummy static_route nodes, one for each nexthop. Therefore, a multipath route consists of a master static_route node (of dest RTD_MULTIPATH), which specifies prefix and is used in most circumstances, and a list of dummy static_route nodes (of dest RTD_NONE), which stores info about nexthops and are connected to neighbor entries and neighbor notifications. Dummy nodes are chained using mp_next, they aren't in other_routes list, and abuse some fields (masklen, if_name) for other purposes.

The only other thing worth mentioning is that when asked for reconfiguration, Static not only compares the two configurations, but it also calculates difference between the lists of static routes and it just inserts the newly added routes and removes the obsolete ones.

5.7 Direct

The Direct protocol works by converting all ifa_notify() events it receives to rte_update() calls for the corresponding network.


Next Previous Contents