CS 181AG Lecture 16

# Switching

Arthi Padmanabhan Oct 31, 2022

# Upcoming Schedule

• Assignment 7 goes out Wednesday as usual, due next Wednesday at 10pm (but please do the reading before Wednesday's class)

# **Big Picture: Router Functionality**

Switching: move each packet from its input port to the correct output port

> Longest Matching Prefix to decide which output port, Packet Classification to decide matching rule for packet

# Big Picture: Router Functionality





• Once router knows where a packet needs to go, it must physically move the packet to the correct output link

# Simple Solution: Shared Memory Switch

• Each packet is read into memory and then read out of memory. Then the same is done for the next packet, etc.



Problem:

A single (general purpose) CPU is too slow

### Shared Memory Switch with Multiple CPUs

• Use multiple CPUs to alleviate load on single CPU



Problem:

- Packet still traverses bus twice
  - once to get to CPU, once to get back
- Bus: higher load -> lower speed

### Crossbar Switch

• Each input is connected to each output





### Crossbar Switch: Constraints

- Output should be connected to no more than one input
- Inputs can be connected to more than one output though
- Packets going to the same output should arrive at the output in the correct order

# Crossbar Switch

- Best case: maximum parallelization -> N-fold speedup (packets divided into cells)
- Requires finding N disjoint input-output pairs
- Why is this hard?
  - Several inputs may want to send to the same output at the same time
  - There may be outputs with no inputs wanting to send to them
- Could be reduced to bipartite matching problem (yay more algs class!)
  - All known bipartite matching algorithms are too slow in the context of switching packets
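To make the bipartite matching view concrete, here is a minimal sketch of a standard augmenting-path matching algorithm. It only illustrates the formulation; as noted above, this kind of algorithm is too slow to run in a real switch. The function name `max_matching` and the shape of `requests` are assumptions made for this example.

```python
def max_matching(requests, num_outputs):
    """requests[i] lists the outputs that input i has a packet for.
    Returns a dict mapping each matched input to its output."""
    match_of_output = [None] * num_outputs  # output -> input currently matched

    def try_assign(i, seen):
        # Try to give input i an output, re-routing earlier inputs if needed.
        for o in requests[i]:
            if o in seen:
                continue
            seen.add(o)
            if match_of_output[o] is None or try_assign(match_of_output[o], seen):
                match_of_output[o] = i
                return True
        return False

    for i in range(len(requests)):
        try_assign(i, set())
    return {i: o for o, i in enumerate(match_of_output) if i is not None}


# Hypothetical requests: inputs 0 and 2 have a choice, input 1 only wants output 0.
pairs = max_matching([[0, 1], [0], [0, 2]], num_outputs=3)
```

In this example all three inputs can be paired, but only because input 0 is moved off output 0 to make room for input 1. That re-routing step is what a simple greedy choice would miss.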



# Example: Grocery Store





#### Crossbar Switch: Output queues





#### Crossbar Switch: Input Queues



#### Crossbar Switch: Putting it together





#### Take-a-Ticket

- How does input port know when to send the front element of its queue?
- Works like deli counter: each input R "takes a ticket" for the output S at the front of its queue. S then calls out the ticket number it's serving. When R hears its number, it sends the packet to S
- Requests and calling out numbers happen on a separate control bus (very light load)
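The deli-counter scheme above can be sketched as a small round-by-round simulation. This is a simplified model under stated assumptions: each output serves one packet per round, inputs that request the same output in the same round get tickets in input-index order, and the queue contents are made up for illustration. The name `take_a_ticket` is hypothetical.

```python
from collections import deque


def take_a_ticket(queues):
    """queues[i] lists the output ports of the packets at input i, front first.
    Returns one dict per round mapping input -> output served."""
    queues = [deque(q) for q in queues]
    next_ticket = {}                 # output -> next ticket number to hand out
    ticket = [None] * len(queues)    # ticket held by each input for its head packet
    rounds = []
    while any(queues):
        # Inputs without a ticket take one for the output at the front of their queue.
        for i, q in enumerate(queues):
            if q and ticket[i] is None:
                out = q[0]
                ticket[i] = next_ticket.get(out, 0)
                next_ticket[out] = ticket[i] + 1
        # Each output serves the waiting input that holds its lowest ticket.
        winner = {}                  # output -> input
        for i, q in enumerate(queues):
            if q:
                out = q[0]
                if out not in winner or ticket[i] < ticket[winner[out]]:
                    winner[out] = i
        sent = {}
        for out, i in winner.items():
            queues[i].popleft()
            ticket[i] = None
            sent[i] = out
        rounds.append(sent)
    return rounds


# Hypothetical workload: three inputs, outputs numbered 1 to 4.
rounds = take_a_ticket([[1, 2, 3], [1, 2, 3], [1, 3, 4]])
```

With this workload the simulation needs six rounds, and several outputs sit idle in most of them because the packets that want them are stuck behind the front of a queue.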

#### Example

#### Round 1



# Example (cont.)

#### Round 3



- How many more rounds are needed?
  - Draw out any remaining rounds

# Example (cont.)

# Head of Line Blocking

• What would have been the optimal number of rounds?
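One way to reason about the question above is a simple lower bound: each input can send at most one packet per round and each output can receive at most one, so no schedule can finish in fewer rounds than the longest input queue or the most heavily requested output. The sketch below computes that bound; `min_rounds_lower_bound` and the sample queues are assumptions for illustration.

```python
from collections import Counter


def min_rounds_lower_bound(queues):
    """queues[i] lists the output ports of the packets waiting at input i."""
    # An input sends at most one packet per round.
    longest_input = max((len(q) for q in queues), default=0)
    # An output receives at most one packet per round.
    per_output = Counter(out for q in queues for out in q)
    busiest_output = max(per_output.values(), default=0)
    return max(longest_input, busiest_output)


# Hypothetical workload: three inputs, outputs numbered 1 to 4.
bound = min_rounds_lower_bound([[1, 2, 3], [1, 2, 3], [1, 3, 4]])
```

For this workload the bound is 3 rounds. Any extra rounds a FIFO-based scheme needs beyond that are the cost of head-of-line blocking.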





# Avoiding HOL Blocking

- One proposal: queue only at the output
- Requires the fabric to run N times faster than the input links (where N is the number of input links)
  - If the required speedup, k, is small, it can be realized with k parallel buses
  - Can be very expensive

# Avoiding HOL Blocking: Virtual Output Queues

- Each input keeps a separate queue for each output
  - Can make progress on each output queue separately
  - Can express an input's requests to all output ports in one bitmap
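As a small illustration of the bitmap idea, the sketch below builds one request bitmap for a single input from its virtual output queues. The function name `request_bitmap` and the packet labels are made up for this example.

```python
def request_bitmap(voqs):
    """voqs[j] is the queue of packets waiting at this input for output j.
    Bit j of the result is set when that queue is non-empty."""
    bitmap = 0
    for j, q in enumerate(voqs):
        if q:
            bitmap |= 1 << j
    return bitmap


# Hypothetical input with packets queued for outputs 0 and 2 only.
bits = request_bitmap([["p1"], [], ["p2", "p3"], []])
```

Here `bits` is `0b0101`: one bit per output, so a single small word tells the scheduler every output this input could send to.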



# Avoiding HOL Blocking: Virtual Output Queues + Parallel Iterative Matching




#### Parallel Iterative Matching

#### Round 1







#### Round 2







#### Parallel Iterative Matching





# Avoiding Randomness

- Why avoid randomness?
  - Hard to generate random numbers fast enough
  - Multiple iterations to attain maximal matches
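The randomness in question comes from the grant and accept phases of Parallel Iterative Matching. A minimal sketch of one PIM round, assuming the usual request/grant/accept structure, is shown below. The name `pim_round`, its parameters, and the sample requests are assumptions for illustration.

```python
import random


def pim_round(requests, iterations=3, rng=None):
    """requests[i] is the set of outputs input i has packets for.
    Returns a dict input -> output with the matches found."""
    rng = rng or random.Random()
    match = {}    # input -> output
    taken = set() # outputs already matched
    for _ in range(iterations):
        # Request: each unmatched input asks every unmatched output it has traffic for.
        asked = {}  # output -> list of requesting inputs
        for i, outs in enumerate(requests):
            if i in match:
                continue
            for o in outs:
                if o not in taken:
                    asked.setdefault(o, []).append(i)
        if not asked:
            break
        # Grant: each output picks one of its requesters at random.
        granted = {}  # input -> list of granting outputs
        for o, inputs in asked.items():
            granted.setdefault(rng.choice(inputs), []).append(o)
        # Accept: each input picks one of its grants at random.
        for i, outs in granted.items():
            o = rng.choice(outs)
            match[i] = o
            taken.add(o)
    return match


# Hypothetical requests for a 3x3 switch.
requests = [{0, 1}, {0}, {0, 2}]
match = pim_round(requests, iterations=3, rng=random.Random(0))
```

Every iteration that still has requests adds at least one match, which is why a few iterations are needed before the match is maximal. Each iteration also draws two random numbers per port, which is the cost the bullets above point to.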

#### iSLIP

- In each step that involves choosing (Grant and Accept), choose the winner in round-robin manner using a rotating pointer
- Each output keeps a grant pointer, g, initialized to the first input. When it has to choose which input to grant to, it chooses the requesting input with the lowest port number that is greater than or equal to g (wrapping around to the lowest-numbered requesting input if there is none)
- After the accept phase, if the output port was matched with input X, its grant pointer is set to (X+1) mod (number of input ports)
- Input ports each keep an accept pointer that works in the same way
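The pointer rules above can be sketched as follows. This is a minimal model, not a hardware design: ports are numbered from 0, pointers are updated only for matches made in the first iteration of a round, and `islip_round` and the sample requests are assumptions for illustration.

```python
def islip_round(requests, grant_ptr, accept_ptr, iterations=1):
    """requests[i] is the set of outputs input i has packets for.
    grant_ptr[o] and accept_ptr[i] are the rotating pointers, updated in place.
    Returns a dict input -> output with the matches found."""
    n_in, n_out = len(accept_ptr), len(grant_ptr)
    match = {}     # input -> output
    taken = set()  # outputs already matched

    def first_at_or_after(ptr, candidates, n):
        # Lowest port number >= ptr, wrapping around if there is none.
        return min(candidates, key=lambda p: (p - ptr) % n)

    for it in range(iterations):
        # Request: unmatched inputs ask unmatched outputs they have traffic for.
        asked = {}
        for i, outs in enumerate(requests):
            if i not in match:
                for o in outs:
                    if o not in taken:
                        asked.setdefault(o, []).append(i)
        # Grant: each output picks the first requester at or after its pointer.
        granted = {}
        for o, inputs in asked.items():
            i = first_at_or_after(grant_ptr[o], inputs, n_in)
            granted.setdefault(i, []).append(o)
        # Accept: each input picks the first grant at or after its pointer.
        for i, outs in granted.items():
            o = first_at_or_after(accept_ptr[i], outs, n_out)
            match[i] = o
            taken.add(o)
            if it == 0:  # pointers move only for first-iteration matches
                grant_ptr[o] = (i + 1) % n_in
                accept_ptr[i] = (o + 1) % n_out
    return match


# Hypothetical 3-input, 4-output switch with all pointers starting at 0.
requests = [{0, 1, 2}, {0, 1, 2}, {0, 2, 3}]
grant_ptr, accept_ptr = [0, 0, 0, 0], [0, 0, 0]
match = islip_round(requests, grant_ptr, accept_ptr, iterations=2)
```

In this run the first iteration matches input 0 to output 0 and input 2 to output 3, and the second iteration adds input 1 to output 1. Only output 0's grant pointer and input 0's accept pointer end up moved, since the other first-iteration pointers wrap back to 0.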

# iSLIP Example

• Which input's request does output 2 grant?



### iSLIP Example

• Which output's grant does C accept?



# 2<sup>nd</sup> Iteration?

• Which inputs would send where in a 2<sup>nd</sup> iteration?





# 2<sup>nd</sup> Iteration



# 2<sup>nd</sup> Iteration

- Note: grant/accept pointers are only updated for matches made in the 1<sup>st</sup> iteration of a round
- Where would grant/accept pointers be after this round?



# 2<sup>nd</sup> Iteration



#### iSLIP Round 1

#### **Round 1, Iteration 1**



#### **Round 1, Iteration 2**



### iSLIP Remaining Rounds

# iSLIP Advantages

- Avoids HOL blocking
- Rotating priority provides long-term fairness (pointers start out synchronized, but they drift out of sync over time, and that desynchronization improves performance)



### Recap

- Switching is the process of physically moving packets from input to output ports
- Using a crossbar switch, N-fold speedup is possible, but finding N disjoint input-output pairs is difficult
- Take-a-Ticket system provides communication protocol between inputs/outputs, but is subject to Head-of-Line Blocking
- Parallel Iterative Matching (PIM) avoids HOL blocking by using virtual output queues (VOQs) and randomization
- iSLIP removes randomization from PIM by introducing concept of rotating priority