[rust-dev] Overhead of Rc<RefCell<T>>

David Henningsson

2014-10-21 05:41:59 UTC

Hi,

So, I was trying to make a digraph where every node and edge has a
property list (which can change during the lifetime of the node/edge),
and I ended up with this structure:

struct Edge {
from_node: Weak<RefCell<Node>>,
to_node: Weak<RefCell<Node>>,
prop_list: HashMap<String, String>,
}

struct Node {
in_edges: Vec<Weak<RefCell<Edge>>>,
out_edges: Vec<Weak<RefCell<Edge>>>,
prop_list: HashMap<String, String>,
}

struct Graph {
nodes: Vec<Rc<RefCell<Node>>>,
edges: Vec<Rc<RefCell<Edge>>>,
}

So the idea is that the Graph is the real owner of all nodes and edges,
and then there are weak cross-references between the nodes and the edges.

Now, this seems to provide what I need, but it has two drawbacks:
1) A lot of code to access elements, e g,
my_edge.upgrade().unwrap().deref().borrow_mut() just to get inside the
Rc and RefCell, and potentially have error code instead of just
potentially failing on unwrap() and borrow_mut(). The runtime overhead
of the code is probably not that crucial here, but it would be nice to
avoid it if possible.
2) Memory overhead: the Rc has two ints (strong and weak) and RefCell
has one int (BorrowFlag).

Is there a way to write Rust code that has less overhead, without
resorting to unsafe blocks? If you hold a mutable reference to the
Graph, then the entire graph, including all nodes, edges, and their
property lists should be mutable as well.

I think I'd ideally want something like:

struct Graph {
nodes: Vec<&Node>,
edges: Vec<&Edge>,
}

because if I had just Vec<Node> and Vec<Edge> that would make insertion
and deletion of nodes/edges cause the cross-ref pointers to become
invalid. But then I don't know how to properly model the nodes and edges
without losing mutability of the entire graph.

As an option, I'm considering giving every node and edge an id, and then
end up with a map like:

struct Graph {
nodes: HashMap<int, Node>,
edges: HashMap<int, Edge>,
}

Then a reference from an edge to its from_node would be like:

mygraph.nodes.get_mut(myedge.from_node)

...which looks cleaner, but then we have the memory overhead of the
integer id (times two - one in the map and one within the object
itself), and the runtime overhead of the map lookup instead of just
pointing directly to the node.

And then there's the overhead of Vec and HashMap (overallocation, empty
buckets etc), but let's leave that out for now.

// David