2 Switch Allocator & Arbitration
In a traditional multi-core chip, all cores share a single bus. Every core that wants to talk must wait its turn, so the bus becomes a bottleneck once you have more than two or three cores. A network-on-chip (NoC) replaces that shared wire with a packet-switched fabric: each core gets its own router, and data travels as packets that hop from router to router until they reach the destination. Latency becomes more predictable, aggregate bandwidth scales with core count, and no single core can monopolize the interconnect.

The mesh is a grid of routers connected to their cardinal neighbors. The indexing formula used throughout the RTL is:
Node ID = (Y * MESH_X) + X
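So for a 2×2 grid (MESH_X = 2), with coordinates written as (X,Y): Node 0 = (0,0), Node 1 = (1,0), Node 2 = (0,1), Node 3 = (1,1). This is why cmd_dest_node[0] is X and cmd_dest_node[1] is Y in the UART top-level module. With MESH_X = 2, multiplying Y by MESH_X is just a 1-bit left shift, so the two bits of the node ID encode the 2D coordinate directly.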
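A small Python model of the indexing formula makes the bit mapping easy to check. This is a sketch, not the RTL: the names node_id and node_coords are hypothetical helpers, and MESH_Y = 2 is assumed to match the 2×2 example. The bit-slicing shortcut (bit 0 = X, bit 1 = Y) only works because MESH_X is a power of two.

```python
MESH_X = 2  # grid width, from the 2x2 example
MESH_Y = 2  # grid height (assumed 2 for the same example)


def node_id(x: int, y: int) -> int:
    """Node ID = (Y * MESH_X) + X, the indexing formula used in the RTL."""
    assert 0 <= x < MESH_X and 0 <= y < MESH_Y
    return y * MESH_X + x


def node_coords(nid: int) -> tuple:
    """Inverse mapping: return (x, y) for a node ID."""
    return nid % MESH_X, nid // MESH_X


# With MESH_X = 2, bit 0 of the ID is X and bit 1 is Y,
# which is why cmd_dest_node[0] / cmd_dest_node[1] map straight to X / Y.
for nid in range(MESH_X * MESH_Y):
    x, y = node_coords(nid)
    assert x == (nid >> 0) & 1 and y == (nid >> 1) & 1
    print(f"Node {nid} = ({x},{y})")
```

Running the loop prints the same four mappings listed above, from Node 0 = (0,0) through Node 3 = (1,1).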

Every transfer on a physical link is exactly one flit (flow control unit), and the link is always 34 bits wide. The bits are partitioned as follows:
[33:33] dest_x -> 1 bit (COORD_WIDTH=1, so X is 0 or 1)
[32:32] dest_y -> 1 bit (Y is 0 or 1)
[31:30] flit_type -> 2 bits (01=Head, 10=Body, 11=Tail)
[29:0 ] payload -> 30 bits (PAYLOAD_WIDTH = 34 - 2 - 2 = 30)
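The layout above can be modeled as plain integer packing and unpacking. This is a minimal sketch assuming the field order and widths listed (dest_x at bit 33, dest_y at bit 32, flit_type at [31:30], payload at [29:0]). The functions pack_flit and unpack_flit are hypothetical names, not the RTL's.

```python
FLIT_WIDTH = 34
COORD_WIDTH = 1
TYPE_WIDTH = 2
PAYLOAD_WIDTH = FLIT_WIDTH - 2 * COORD_WIDTH - TYPE_WIDTH  # 34 - 2 - 2 = 30

# Bit offsets derived from the layout: [33] dest_x, [32] dest_y, [31:30] type
TYPE_SHIFT = PAYLOAD_WIDTH                 # 30
Y_SHIFT = PAYLOAD_WIDTH + TYPE_WIDTH       # 32
X_SHIFT = Y_SHIFT + COORD_WIDTH            # 33

HEAD, BODY, TAIL = 0b01, 0b10, 0b11


def pack_flit(dest_x: int, dest_y: int, flit_type: int, payload: int) -> int:
    """Assemble one 34-bit flit from its fields."""
    assert 0 <= dest_x < (1 << COORD_WIDTH)
    assert 0 <= dest_y < (1 << COORD_WIDTH)
    assert flit_type in (HEAD, BODY, TAIL)
    assert 0 <= payload < (1 << PAYLOAD_WIDTH)
    return (dest_x << X_SHIFT) | (dest_y << Y_SHIFT) | (flit_type << TYPE_SHIFT) | payload


def unpack_flit(flit: int) -> tuple:
    """Split a 34-bit flit back into (dest_x, dest_y, flit_type, payload)."""
    payload = flit & ((1 << PAYLOAD_WIDTH) - 1)
    flit_type = (flit >> TYPE_SHIFT) & ((1 << TYPE_WIDTH) - 1)
    dest_y = (flit >> Y_SHIFT) & ((1 << COORD_WIDTH) - 1)
    dest_x = (flit >> X_SHIFT) & ((1 << COORD_WIDTH) - 1)
    return dest_x, dest_y, flit_type, payload


print(hex(pack_flit(1, 0, HEAD, 0)))  # 0x240000000
```

Deriving the shifts from the widths, rather than hard-coding 33/32/30, keeps the model in step with PAYLOAD_WIDTH if COORD_WIDTH ever grows for a larger mesh.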

Note that the Head flit carries the timestamp in its 30-bit payload (zero-padded), not user data. The 60-bit core data travels across the Body and Tail flits, which is why CORE_DATA_WIDTH = PAYLOAD_WIDTH * 2 = 60: two flits × 30 bits each.
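To make the Head/Body/Tail split concrete, here is a sketch of packetizing one 60-bit core word into three flits and reassembling it. Several details are assumptions, not taken from the RTL: the upper 30 bits are placed in the Body flit and the lower 30 bits in the Tail flit, the timestamp is assumed to fit in 30 bits, and packetize, depacketize, and make_flit are hypothetical helper names.

```python
PAYLOAD_WIDTH = 30
CORE_DATA_WIDTH = PAYLOAD_WIDTH * 2  # 60: two flits x 30 bits
PAYLOAD_MASK = (1 << PAYLOAD_WIDTH) - 1
HEAD, BODY, TAIL = 0b01, 0b10, 0b11


def make_flit(dest_x: int, dest_y: int, flit_type: int, payload: int) -> int:
    # Layout: [33] dest_x, [32] dest_y, [31:30] flit_type, [29:0] payload
    return ((dest_x & 1) << 33) | ((dest_y & 1) << 32) | (flit_type << 30) | (payload & PAYLOAD_MASK)


def packetize(dest_x: int, dest_y: int, timestamp: int, core_data: int) -> list:
    """Hypothetical helper: one 60-bit word -> [Head, Body, Tail] flits."""
    assert 0 <= timestamp <= PAYLOAD_MASK          # zero-padded into the 30-bit payload
    assert 0 <= core_data < (1 << CORE_DATA_WIDTH)
    upper = (core_data >> PAYLOAD_WIDTH) & PAYLOAD_MASK  # assumed to ride in Body
    lower = core_data & PAYLOAD_MASK                     # assumed to ride in Tail
    return [
        make_flit(dest_x, dest_y, HEAD, timestamp),  # Head: timestamp, no user data
        make_flit(dest_x, dest_y, BODY, upper),
        make_flit(dest_x, dest_y, TAIL, lower),
    ]


def depacketize(flits: list) -> tuple:
    """Hypothetical helper: [Head, Body, Tail] -> (timestamp, core_data)."""
    head, body, tail = flits
    assert (head >> 30) & 0b11 == HEAD
    assert (body >> 30) & 0b11 == BODY
    assert (tail >> 30) & 0b11 == TAIL
    timestamp = head & PAYLOAD_MASK
    core_data = ((body & PAYLOAD_MASK) << PAYLOAD_WIDTH) | (tail & PAYLOAD_MASK)
    return timestamp, core_data


packet = packetize(1, 1, 42, 0x123456789ABCDEF)
assert depacketize(packet) == (42, 0x123456789ABCDEF)
```

Every flit repeats dest_x and dest_y here because the layout reserves those bits in all flit types. Whether the routers actually read them on Body and Tail flits, or only on the Head, is not stated above.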