The unicache is an attempt to do full flow lookup based on the trash/trie
data structure. Trash builds very flat trees even with a very large number
of entries, into the millions. The key holds src/dst/sprt/dprt/proto.
Data nodes are called leaves, as in the LC-trie, and each leaf holds a
list of struct rtable, just like a hash chain. This means we can reuse
all available IPv4 matching code.

Garbage collection
------------------
Garbage collection is one of the most crucial processes. It is divided
into passive, timer-based and active GC. The main focus is on the
trash/trie size, i.e. the number of leaves. At insertion the size is
checked against gc_thresh; if gc_thresh is reached, rt-entries are removed
until gc_goal is reached. Also at insert, a leaf's chain length is pruned
when gc_elasticity is reached. This is as before with hash chains.

As the trash/trie size is the most important parameter, the "legacy" route
parameters are calculated from the trie/trash size:

	static int ip_rt_gc_elasticity = 3;

	void ip_rt_new_size(struct trie *t)
	{
		ipv4_dst_ops.gc_thresh = t->gc_thresh * ip_rt_gc_elasticity;
		ip_rt_max_size = t->gc_thresh * (ip_rt_gc_elasticity + 1);
	}

TGC
---
Timer-based GC is slightly different from before. See rt_may_expire.

Tuning
------
The GC process is now very simple and in most cases controlled with only
one variable -- the trash/trie size:

	# routing w/o route cache
	unicache --set_gc_thresh 2000

	# default
	unicache --set_gc_thresh 100000

The GC process described above we call Passive GC (PGC), as we do this GC
afterwards and take great care not to throw away active and valuable
entries. This is a rather expensive process. It's a good idea to keep
gc_goal relatively small to keep the resizing of the trie/trash to a
minimum. The default should be a good start.

AGC
---
The full flow lookup opens up new ways of doing GC. We can monitor state
changes and termination of active flows; we call this active garbage
collection, or AGC. For TCP we can monitor session termination by snooping
for FIN or RST and directly remove stale entries from the trash. This is
very effective; a sketch of the idea follows the flowlogging section
below. During development, pktgen (UDP) was instrumented to signal
end-of-flow to stress the implementation.

Monitoring
----------
/proc/net/unicache_stat

	Basic info: size of leaf: 36 bytes, size of tnode: 44 bytes.

	trie:
		Aver depth:     1.31
		Max depth:      4
		Leaves:         198375
		Internal nodes: 29440
		  1: 26745  2: 2675  3: 19  19: 1
		Pointers:       588630
		Null ptrs:      359801
		Total size:     10539 kB

/proc/net/unicache_flows holds "active flows":

	pkts/trash/src/dst/sprt/dprt/proto/ifidx
	00000000 00000004 08d44b2f 0101a8c0 99a9b105 09000900 00110003
	00000000 00000004 08d46e23 0101a8c0 893c8505 09000900 00110003
	00000000 00000004 08d486eb 010a0a0a 248a4a0b 09000900 00110001
	00000000 00000004 08d4ce67 010a0a0a bbc46b0b 09000900 00110001

And of course rtstat can be used to see hits on warm cache entries, fib
lookups and other related parameters.

equilibrium
-----------
Equilibrium resizing is not yet added. If needed, ip_rt_new_size can be
called on a periodic basis to adjust the goal. Recall that the dynamic
trie/trash data structure has no preallocated hash table; it grows with
new entries up to gc_thresh, and timer-based GC will remove stale entries.

locking
-------
The current implementation takes a "better safe than sorry" approach.
Readers are protected by RCU-BH, and trie writers are serialized via the
trie_write_lock. It should be possible to do more fine-grained locking to
support many concurrent writers.

flowlogging
-----------
The current code can log finished flows via netlink. The logged info holds
flow information (src/dst/sprt/dprt/proto/if) and a packet count; the
record layout is sketched below.
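Based on the fields listed above, a logged flow record could look roughly
like the sketch below. The struct name and field names are illustrative
only, not taken from the unicache source:

	#include <linux/types.h>

	/* Hypothetical record layout for a finished-flow log event. */
	struct unicache_flow_log {
		__be32	src;		/* source address */
		__be32	dst;		/* destination address */
		__be16	sprt;		/* source port */
		__be16	dprt;		/* destination port */
		__u8	proto;		/* IP protocol */
		int	ifindex;	/* input interface */
		__u64	packets;	/* packets seen for this flow */
	};

Such a record would be packed into a netlink message and sent to any
userspace listeners when the flow is removed from the trash.

And here is the AGC idea from above as a minimal sketch. This is an
illustration under assumptions, not the actual unicache code;
unicache_gc_delete() is a hypothetical name for the routine that removes
a matching leaf from the trash:

	#include <linux/ip.h>
	#include <linux/tcp.h>
	#include <linux/skbuff.h>
	#include <linux/in.h>

	/* Hypothetical helper: remove the trash entry for this 5-tuple. */
	extern void unicache_gc_delete(__be32 src, __be32 dst,
				       __be16 sprt, __be16 dprt, u8 proto);

	/* Called on the forwarding path after the IP header is validated. */
	static void unicache_tcp_snoop(struct sk_buff *skb)
	{
		const struct iphdr *iph = ip_hdr(skb);
		const struct tcphdr *th;

		if (iph->protocol != IPPROTO_TCP)
			return;

		th = (const struct tcphdr *)((const u8 *)iph + iph->ihl * 4);

		/* FIN or RST means the session is terminating, so the
		 * flow entry can be removed at once instead of waiting
		 * for passive or timer-based GC to find it. */
		if (th->fin || th->rst)
			unicache_gc_delete(iph->saddr, iph->daddr,
					   th->source, th->dest,
					   iph->protocol);
	}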
unsupported
-----------
CONFIG_IP_ROUTE_MULTIPATH_CACHED is not (yet) supported.

future work
-----------
It's possible to extend the key to IPv6 at very little cost. It's also
possible to store e.g. struct socket in the leaf to get a unified lookup.
The full key should give opportunities for e.g. connection tracking etc.
A sketch of a possible key layout, and its IPv6 extension, follows.
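To illustrate the "very little cost" claim, the sketch below shows one
possible shape of the flow key and its IPv6 counterpart. These struct
definitions are hypothetical; the actual unicache key layout may differ:

	#include <linux/types.h>
	#include <linux/in6.h>

	/* Hypothetical IPv4 flow key: the src/dst/sprt/dprt/proto tuple. */
	struct flow_key_v4 {
		__be32	src;
		__be32	dst;
		__be16	sprt;
		__be16	dprt;
		__u8	proto;
	};

	/* Extending to IPv6 is mostly a matter of widening the address
	 * fields; the trie itself only sees a longer key. */
	struct flow_key_v6 {
		struct in6_addr	src;
		struct in6_addr	dst;
		__be16		sprt;
		__be16		dprt;
		__u8		proto;
	};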