# dynamic_network_discovery_via_infection_tracing__cf3ed95b.pdf

Dynamic Network Discovery via Infection Tracing

Ben Bals1,2 , Michelle D oring3 , Nicolas Klodt3 and George Skretas3

1Centrum Wiskunde & Informatica, Amsterdam 2Vrije Universiteit Amsterdam 3Hasso Plattner Institute, Potsdam bjjb@cwi.nl, {michelle.doering, nicolas.klodt, georgios.skretas}@hpi.de,

Researchers, policy makers, and engineers need to make sense of data from spreading processes as diverse as rumor spreading in social networks, viral infections, and water contamination. Classical questions include predicting infection behavior in a given network or deducing the network structure from infection data. Most of the research on network infections studies static graphs, that is, the connections in the network are assumed to not change. More recently, temporal graphs, in which connections change over time, have been used to more accurately represent real-world infections, which rarely occur in unchanging networks. We propose a model for temporal graph discovery that is consistent with previous work on static graphs and embraces the greater expressiveness of temporal graphs. For this model, we give algorithms and lower bounds which are often tight. We analyze different variations of the problem, which make our results widely applicable and it also clarifies which aspects of temporal infections make graph discovery easier or harder. We round off our analysis with an experimental evaluation of our algorithm on real-world interaction data from the Stanford Network Analysis Project and on temporal Erd os-Renyi graphs. On Erd os-Renyi graphs, we uncover a threshold behavior, which can be explained by a novel connectivity parameter that we introduce during our theoretical analysis.

1 Introduction Predicting the spread of infections requires precise knowledge about the network in which they take place. These spreading processes can be vastly different; they involve everything from diseases, misinformation, marketing material to contaminants in sewage networks. All of these can be modeled in a similar fashion, and we can thus utilize a common algorithmic toolkit for their analysis. Most famously, the influence maximization problem, introduced by [Kempe et al., 2003], wants to find which node in a network should be infected to maximize the number of nodes infected by the spreading process. Other problems include finding a

sensor placement to detect outbreaks as quickly as possible [Leskovec et al., 2007], or to estimate the source (of an infection or rumor) from data about the spreading process [Berenbrink et al., 2023; Luo et al., 2013; Zhu and Ying, 2016]. One common assumption is that the underlying network is known. However, this is not true in many real-world scenarios, and thus, we require algorithms for network discovery. While discovering networks is a fundamental problem in data mining, which can be approached from different angles [Rong et al., 2016; Park and Honorio, 2016], one natural approach is to discover the underlying network from the infection data itself. This idea has been extensively studied [Pouget-Abadie and Horel, 2015; Lokhov, 2016; Daneshmand et al., 2014; Netrapalli and Sanghavi, 2012]. In particular, [Chistikov et al., 2024] consider a model where the party wishing to discover the network may even intervene in the spreading process, e.g., by publishing a social media post and watching its spread. Beyond this application, network discovery is an interesting and relevant problem in its own right. After having discovered the network from infection data, we are free to abstract away from the spreading process and use the resulting network in a host of different ways. This is especially relevant for the study of social networks, both real-world and online, where infection data can reveal underlying structures that are otherwise difficult to observe. To the best of our knowledge, every paper that studies network discovery makes the same simplifying assumption: the underlying network is static. That is, the connections of the network do not change over time. In most applications, that is not a realistic assumption. For example, if two people are linked in an in-person social network, that does not imply that a disease can spread from one to the other at every point in time, but only when they physically meet. Motivated by this fact, researchers have begun to study the classical infection analysis tasks on temporal graphs. Temporal graphs are a model of dynamic networks where the edges only exist at some time steps. This model has received considerable attention from theoretical computer scientists for both foundational problems [Michail, 2016; Danda et al., 2021; Casteigts et al., 2021a; Akrida et al., 2019] and a growing number of applications, including social networks [Casteigts et al., 2021b]. For our purposes, a temporal graph G = (V, E, λ) with lifetime Tmax N is a static graph G = (V, E) with a function λ: E 2[Tmax] indicat-

Proceedings of the Thirty-Fourth International Joint Conference on Artiﬁcial Intelligence (IJCAI-25)

ing that edge e E exists precisely at the time steps λ(e). [Gayraud et al., 2015], are the first to study the influence maximization problem on temporal graphs under the independent cascade model (introduced by [Kempe et al., 2003]). [Deligkas et al., 2024] build on this work and analyze the influence maximization problem on temporal graphs under the SIR model (a standard biological spreading model closely related to the susceptible-infected-resistant model [Hethcote, 1989]). However, no work on network discovery on temporal graphs has been conducted yet.

Our Contribution is twofold: (i) we define the temporal network discovery problem as a round-based, interactive twoplayer game, (ii) we provide algorithms and lower bounds for different parameters of the network discovery game, and validate our results both with theoretical proofs, as well as experiments conducted on real-world and synthetic networks. Statements where proofs or details are omitted to the full version due to space constraints are marked with . In Section 3, we define the two-player game. In each round, the Discoverer (abbreviated D) initiates infections and observes the resulting infection chains, aiming to identify the time labels of all edges in as few rounds as possible. The Adversary (abbreviated A) gets to pick the shape and temporal properties of the graph, with the aim of forcing D to take as long as possible to accomplish their task. In Section 4, we provide the Discovery Follow algorithm, which solves the graph discovery problem in 6 |E(G)| + |ECδ(G)| Tmax/δ rounds, where |ECδ(G)| are the δ-edge connected components. Intuitively, this is a grouping of the edges such that only edges from the same component may influence each other during infection chains. In Section 5, we prove that the running time of the Discovery Follow algorithm is asymptotically tight in the number of edges. Formally, we prove there is an infinite family of graphs such that any algorithm winning the graph discovery game requires at least Ω(|E(G)|) rounds. Crucially, this cannot be improved even if D is allowed to start multiple infection chains per round. We also prove that there is an infinite family of graphs such that the minimum number of rounds required to win the graph discovery game grows in Ω(n Tmax/(δk)), where k is the number of infection chains D may start per round. We finish our theoretical analysis in Section 6, where we explore variations of the graph discovery problem. We analyze the case where the feedback D receives about the infection chains is reduced to infection times. Surprisingly, we are able to show that our Discovery Follow algorithm directly translates to this scenario. We also discuss what happens if D has no information about the static graph in which the infections are taking place. Third, we allow the temporal graph to now contain multiedges or more than one label per edge. In Section 7, we empirically validate our theoretical results. Using both synthetic and real-world data, we execute the Discovery Follow algorithm and observe its performance. We utilize the natural temporal extension of Erd os-Renyi graphs [Casteigts et al., 2022] as well as the comm-f2f-Resistance data set from the Stanford Large Network Dataset Collection [Kumar et al., 2021], a social network of face-to-face interactions. Beyond the running

time, we closely analyze which factors affect the performance of the algorithm. We see that the density of the graph affects the performance since, in dense graphs, it needs to spend less time finding new δ-edge connected components. On Erd os Renyi graphs, we provide evidence that this effect is mediated by the number of δ-edge connected components, which exhibits a threshold behavior in Tmax/(|V (G)| p), where p is the Erd os-Renyi density parameter. This prompts us to give a conjecture on this threshold behavior, which mirrors the famous threshold behavior in the connected components of nodes in static Erd os-Renyi graphs [Erdos et al., 1960].

2 Preliminaries

For n N+, x N, let [x, n] := {x, . . . , n} and [n] := [1, n]. A temporal graph G = (V, E, λ) with lifetime Tmax is composed of an undirected (underlying) static graph G = (V, E) together with a labeling function λ: E 2[Tmax], denoting e E being present precisely at time steps λ(e). We also write V (G) for the nodes of G, and E(G) for its edges. A temporal path a sequence of timed edges (e1, t1), . . . , (eℓ, tℓ) that forms a path in G with strictly increasing labels, i. e., for all i [ℓ], ti < ti+1 and ti λ(ei). Apart from Section 6, we consider simple temporal graphs where each edge has exactly one label. Abusing notation, we therefore also use λ as if it were defined as λ: E [Tmax], and regard temporal paths as the corresponding sequence of nodes. We use the susceptible-infected-resistant (SIR) model, in which a node is either in a susceptible, infected, or resistant state. This model of temporal infection behavior is based on [Deligkas et al., 2024] (and more historically flows from [Kermack et al., 1927] and [Pastor-Satorras et al., 2015]). An infection chain in the SIR model unfolds as follows. At most k N nodes may be infected by D at arbitrary points in time, which we call seed infections denoted as S V [0, Tmax]. Otherwise, a node u becomes infected at time step t if and only if it is susceptible and there is a node v infectious at time step t with an edge uv with label t. Then u is infectious from time t + 1 until t + δ, after which u becomes resistant. Note that if a susceptible node has two or more infected neighbors at the same time, it can be infected by any one of them, but only one. Thus, a given set of seed infections may result in multiple possible infection chains. The infection log of an infection chain records which node was infected by which neighbor at what time. Formally, the infection log is a set L V 2 [0, Tmax], where (u, v, t) L indicates that u infected v at time step t. A seed infection at u at time t is denoted by (u, u, t) L. The infection timetable T V [0, Tmax] records only when a node became infected, omitting which neighbor caused the infection. We call an infection log L consistent with a given set of seed infections S if there is an infection chain seeded with S that produces L. Consistency for infection timetables is defined analogously. While multiple consistent infection logs may exist for a set of seeds, there is exactly one consistent infection timetable.

Lemma 1. Let S V [0, Tmax] be a set of seed infections, and L1, L2 be two infection logs consistent with S. Then the induced infection timetables Ti := {(v, t) | (u, v, t) Li} (for i = 1, 2) are the same, that is, T1 = T2.

Proceedings of the Thirty-Fourth International Joint Conference on Artiﬁcial Intelligence (IJCAI-25)

3 Modeling Temporal Graph Discovery

We model temporal graph discovery (TGD) as a game, where the Discoverer D seeks to uncover information about the graph (e.g., edges and their time labels), while the Adversary A influences the infection behavior (e.g., by assigning edge labels) with the goal of delaying D s progress. See Figure 1 for a description of the structure of the game.

Input: Tmax, δ, k, n N 1. D learns the nodes and possibly additional information. 2. In each round, D submits up to k seed infections S and A responds with an infection log consistent with S. 3. To end the game, D submits a temporal graph G to A. A responds with a temporal graph G consistent with all infection logs. If G = G, A wins. Otherwise, D wins.

Figure 1: The temporal graph discovery game

As default, we assume D learns the static edges in Step 1. This is a strong, but convenient assumption, and we later show that many of our results translate to other variations. We measure the quality of a D strategy by the number of rounds it needs to win the game.

Definition 1. For parameters Tmax, δ, k and a static graph G, define the graph discovery complexity as the minimum number of rounds required for D to win any TGD game.

We start off with the simplest algorithm: brute-forcing all combinations of nodes v V (G) and time steps t [Tmax] as seed infections. This will serve as a baseline for comparing more sophisticated algorithms and possible lower bounds.

Theorem 1. There is an algorithm that wins the TGD game in |V (G)| Tmax rounds.

4 An Algorithm for Graph Discovery

Before proposing a better algorithm for TGD, we consider a simpler problem: finding an ideal patient zero.

Definition 2. A node v is an ideal patient zero (IPZ) with time t if seed-infecting {(v, t)} causes every node to become infected. We call (v, t) an IPZ pair.

We adapt the TGD game. D submits either a pair (u, t), representing its guess for an IPZ, or if it believes none exists. A responds with a graph G = (V, E, λ) consistent with all infection logs. A wins if D s guess is incorrect or if is submitted when an IPZ exists. Observe that testing for each node if at every time it could spread an infection (u, λ(u, v) 1) for every neighbor v, does not always find an existing patient zero. Instead, bruteforcing every combination as in Theorem 1, discovers every IPZ pair in Tmax |V (G)| rounds.

Corollary 1. There is a strategy for D that can win the IPZ game in Tmax |V (G)| rounds.

The number of rounds required by the algorithm is trivially bad in its efficiency, as it is brute-forcing the entire graph instead of using the temporal graph s structure at all.

Note, two edges can only participate in the same infection chain if they are connected by a series of edges with time labels differing by at most δ. This leads to a new connectivity parameter for temporal graphs, reflecting the constraint that nodes remain infectious for only a limited time, and thus, infection chains must also respect these timing constraints.

Definition 3. Let G be a temporal graph and δ N+. Consider the relation linking two edges if their time difference at a shared endpoint is at most δ. Let ECδ be the partitioning of edges obtained by taking the transitive closure of this relation. We call the resulting equivalence classes the δ-edge connected components (or δ-ecc s for short) of G.

Any infection chain caused by a single seed node (e.g., an IPZ) must be contained in a single δ-ecc. This motivates Algorithm 1. See Figure 2 for an illustration of an execution.

Algorithm 1: Follow discovers the neighbors of v0. Explore discovers their respective δ-ecc s.

fun Follow(G, δ):

1. Pick node v0 V (G) arbitrarily. 2. For i [0, Tmax/δ ], perform a round with seed infection (v0, iδ). 3. For each edge e = v0u that successfully infects: Explore(e, λ(e)). fun Explore(u, t):

1. For each t {t δ 1, t 1, t}: If there has been no round with seed infection (u, t ), seed an infection (u, t ). Store that this has been done. 2. For each newly infected edge uv with infection time t: Explore(v,t).

The next result is the crucial property allowing us to argue that Algorithm 1 does not miss relevant edges.

Corollary 2. If Follow discovers an edge from a δ-ecc, it discovers the whole component.

This tool in hand, we prove the correctness of Algorithm 1, which requires at most 6 |E(G)| + Tmax/δ rounds.

Theorem 2. The Follow algorithm (Algorithm 1) correctly solves the IPZ problem.

Proof sketch. Observe that Step 2 always finds all edges adjacent to v0 and by Corollary 2, the Explore algorithm discovers the whole δ-ecc s of these edges. Finally, we show that if an IPZ exists, their infection chain must be subset of one of these components.

We now extend this idea to obtain a better TGD algorithm, see Algorithm 2. Note that its running time does not only depend on the static, but also the temporal structure of the graph. Recall that Follow explores precisely the δ-ecc s adjacent to the start node v0 and note that a graph is discovered if and only if all its δ-ecc s are discovered.

Theorem 3. Algorithm 2 wins any instance of the TGD game on G in at most 6 |E(G)|+|ECδ(G)| Tmax/δ rounds.

Proceedings of the Thirty-Fourth International Joint Conference on Artiﬁcial Intelligence (IJCAI-25)

1. Finding components at v0

Seed at 0, 2, 4, . . .

v0 Seed infected nodes at t, t 1, t δ 1

2. Explore components

Discovered edge labels and components

Combine information from all rounds

Figure 2: An execution of the Follow algorithm. In this example, δ = 2. We perform the initial search for an edge at v0. Ringed nodes indicate seed infections. Next, we explore the two nodes we found in the initial search by performing seed infections at t, t 1, and t δ 1 where t is the time step we observed the node being infected. We repeat that until we have no such seed infections to perform without repetitions. Since no δ-ecc spans all nodes, we conclude that no IPZ pair exists.

Algorithm 2: The TGD extension of Follow.

fun Discovery Follow(G, δ):

while there is a node v0 with an adjacent edge for which the label is still unknown do

In steps of size δ perform rounds such that v0 is seed-infected at 0, δ, 2δ, . . . , Tmax. For each edge e = v0u which successfully infects from v0 to a neighbor: Explore(u, λ(e)).

5 Lower Bounds for Graph Discovery

We build a toolkit for proving lower bounds in the TGD game. Initially, every edge could have any label. As the game progresses, information is revealed to D. A successful infection attempt determines the edge s label, while an unsuccessful attempt where one endpoint remains susceptible reduces the possible labels by at least one. With this, we can use a potential argument to establish lower bounds on TGD complexity.

Theorem 4. Let G be a temporal graph and Tmax, δ, k parameters for the TGD game. For a sequence of seed infection sets S1, . . . , Sa, define Φ(i) as the sum of the sizes of the sets of consistent labels over all edges after rounds 1 to i. Then, S1, . . . , Sa is a witnessing schedule if Φ(a) = |E(G)|.

Recall that the brute-force algorithm for TGD takes O(n Tmax) rounds (Theorem 1). In the worst case, this can be improved by at most a factor of δk.

Theorem 5. For all δ, k N+ and Tmax 4, there is an infinite family of temporal graphs {Gn}n N such that each Gn has Θ(n) nodes and the minimum number of rounds required to win the TGD game on it grows in Ω(n(Tmax 3)/(δk)).

Proof. Let n, δ N+ and Tmax 4, with n even for simplicity. We construct a temporal graph Gn on vertices v1, . . . , vn. The edges are as follows (with some edge labels already given which A will always assign). First, v1, . . . , vn 2 form a path where every even-indexed edge has label Tmax, Second, vn 1 has an edge to each v1, . . . vn 2 with fixed label Tmax 2, vn 1 and vn share an edge with fixed label Tmax 1, and vn has an edge to each v1, . . . vn 2 with fixed label Tmax. Assume D acts arbitrarily. For edges with fixed labels, A responds that label. For all other edges, A replies infection failed as long as at least one consistent label would remain; otherwise, it replies with any consistent label.

We apply the potential argument from Theorem 4 to all edges that do not have a fixed label. We call those (n 3)/2 edges relevant. Initially, each relevant edge has Tmax possible labels, so Φ(0) (n 3)/2 Tmax. In each round, for every node that gets infected, at most δ labels are removed per adjacent relevant edge, either due to failed infection attempts, or a successful infection with δ or fewer labels remaining. Thus, the potential decreases by at most δk per round. Dividing the initial potential by this maximum decrease shows that any D needs at least (n (Tmax 3))/(2δk) rounds.

5.1 Witness Complexity

The witness complexity of a temporal graph is the minimum number of rounds required for a D, knowing all labels, to convince an observer of the labeling s correctness.

Definition 4. A length a witnessing schedule for a temporal graph G is a sequence of seed infection sets S1, . . . , Sa such that after performing a rounds with the respective seed infection sets, all labels in the graph are uniquely determined by the logs of these rounds. The witness complexity of a temporal graph G is the length of its shortest witnessing schedule.

Observe that the graphs constructed in the proof of Theorem 5 have witness complexity O(n), which is significantly lower than their graph discovery complexity. However, witness complexity is a powerful tool for establishing lower bounds on graph discovery complexity, particularly when seeking bounds independent of Tmax. The following lemma states the formal relationship between the two complexities.

Lemma 2. Let G be a temporal graph and Tmax, δ, k parameters as defined above. Then the witness complexity of G in this instance is at most as large as the graph discovery complexity for the same parameters.

Note however, that the witness complexity technique can only ever be at most the number of edges in a graph.

Theorem 6. For any instance (G, Tmax, δ, k), the witness complexity is at most |E(G)|.

We now show that this worst case is actually tight, and there are graphs that asymptotically require about one round per edge to witness correctly. Observe that this bound is irrespective of k, thus does not have one of the major shortfalls of our previous lower bounds for the TGD game.

Theorem 7. There is an infinite family of temporal graphs {Gn}n N whose witness complexity grows in Ωk(|E(Gn)|).

Proceedings of the Thirty-Fourth International Joint Conference on Artiﬁcial Intelligence (IJCAI-25)

We will now define {Gn}n N. Then we formalize when an infection attempt is relevant, in the sense that it makes meaningful progress towards winning the witness complexity game. Lastly, we show that there can be at most one such relevant infection attempt per round in {Gn}n N. Intuitively, each graph in {Gn}n N contains four node sets: L, R, B, and C. The nodes in L and R form a complete bipartite graph, where the edges between a node in L and all nodes in R are assigned to distinct phases. The nodes in R are connected such that once a node in R is infected, it spreads the infection through R without additional input from L. The sets B and C serve as gadgets to ensure that at the end of each phase, all nodes in L and R become infected via edges outside L and R. This prevents further information being gathered about the labels between L and R, ensuring the lower bound on the complexity of the discovery process. For x N, we construct a temporal graph Gn of size n := 5x. The vertices are given by V (Gn) := L R B C with L := {ℓ1, . . . , ℓx}, R := {r1, . . . , r2x}, B := {b1, . . . , bx}, C := {c1, . . . , cx}. Denote by R2 the nodes in R with an even index. Then the edges are given by E(Gn) := L R2 R2 L B B R B2 C R {bici | i [x]}. The labels on the edges are defined as follows: Let P = {p1, . . . , px} be a set of x edge-disjoint Hamiltonian paths on R. Such paths exists by [Axiotis and Fotakis, 2016]. Note that by their construction, for every path, there are at most two nodes with an odd index in a row. Assume without loss of generality that each pj begins at r2j, and write rp(i,j) for the j-th node from R2 on the path pi. Denote by p 1(i, j) its index on pi. Let k be arbitrary, Tmax := x(4x + 1), and δ := 4x + 1. Now,

for ℓi L, j [x], set λ(ℓi, rp(i,j)) :=

iδ + 4(p 1(i, j)) + 1, for pq P, j [x], set λ((pq)j, (pq)j+1) := qδ + 2j + 1, for bi B, j [x], set λ(bi, r(pi)j)) := iδ + 2j + 2,

for i < j [x], set λ(bi, bj) := (i + 1) δ 2, for i [x], set λ(bi, ci) := (i + 1) δ 1, for i > j [x], set λ(bi, cj) := (i + 1) δ 2, for ci C, rj R, set λ(ci, rj) := (i + 1) δ, for ℓi L, bj B, set λ(ℓi, bj) := (j + 1) δ.

Observe that the temporal edges of Gn can be partitioned into x sets, each having labels in a fixed interval of size δ and for i [x], let Ei = {e E(Gn) | λ(e) [δ i, δ (i + 1) 1]}. We refer to the edges Ei and corresponding intervals as phases. Now, an infection attempt between some li L and r2j R2 is called relevant if (i) it happens at λ(lir2j) and is successful, or (ii) it happens at λ(lir2j) 1, is unsuccessful, and exactly one endpoint was infected before the attempt. The result then follows from the following three properties: (1) for each edge in L R2, there has to be at least one relevant infection attempt to win the witness complexity game, (2) there is at most one relevant infection per phase, and (3) there is at most one phase with a relevant infection per round.

Lower Bound Upper Bound

Basic model Ωk(m) Ok(m + |ECδ(G)| Tmax/δ)

Infection times only Ωk(m) Ok(m + |ECδ(G)| Tmax/δ)

Unknown static graph Ωm,|ECδ(G)|(n Tmax/(δk)) O(n Tmax)

Multilabels Ω|ECδ(G)|(n Tmax/(δk)) O(n Tmax)

Multiedges Ω(n Tmax/(δk)) Ok(m + |ECδ(G)| Tmax/δ)

Table 1: Overview of upper and lower bounds on the number of rounds for different variations of the graph discovery game. A subscript to a Landau symbol indicates variables that the asymptotic growth is independent of.

6 Extending the Model So far, we examined a simple version of the TGD game with restricted assumptions about infection behavior and the knowledge of D. These might seem restrictive and less close to the real-world processes. In this section, we lift these restrictions and show that either the theoretical behavior remains unchanged, or the problem becomes trivial, offering no significant improvement over brute-force. See Table 1 for an overview of the resulting lower bounds and algorithms. In summary, our Discovery Follow algorithm works if D only gets infection-time feedback or if we allow multiedges, lifting the two most restrictive assumptions previously made. We also prove that variations where D needs to ensure the non-existence of a high number of edges or labels (such as the unknown static graph or multilabel variations) do not allow for significant improvements over the trivial algorithm and that our Discovery Follow algorithm is not applicable. Results and proofs of this section have been moved to the appendix due to space constraints.

7 Experimental Evaluation The gap between the lower and upper bounds for the TGD problem is small, but only tells us about the worst-case performance, motivating us to investigate the performance of our TGD algorithm on common synthetic graphs and real-world data. We formulate three hypotheses we aim to test. Hypothesis 1. The number of rounds required to discover a temporal graph is linear in the number of edge labels. This first hypothesis is motivated by the worst-case analysis from Theorem 3. We aim to test how closely the performance of our algorithm in practice matches this theoretical bound. Note, this also takes into account the effect of the additional optimization described in the setup. Hypothesis 2. Graphs with higher density spend fewer rounds on component discovery. The Discovery Follow algorithm works in two distinct phases: (1) the main loop in Algorithm 2 discovers new δecc s (the component discovery phase) and (2) the Explore

Proceedings of the Thirty-Fourth International Joint Conference on Artiﬁcial Intelligence (IJCAI-25)

routine explores the found components (the component exploration phase). Both of these stages require D to spend rounds, and their respective cost is dependent on the structure of the graph to be explored. As we prove in Theorem 3, the component discovery phase is triggered at most |ECδ(G)| times. We hypothesize that |ECδ(G)| tends to be lower in denser graphs as the components tend to merge as more edges are inserted, which would lead to a relatively lower cost for component discovery as compared to component exploration.

Hypothesis 3. In Erd os-Renyi graphs, the percentage of rounds spend on component discovery shows a threshold behavior in Tmax/(|V (G)| p). This is mediated by |ECδ(G)|.

Hypothesis 3 is a specification of Hypothesis 2 for Erd os Renyi graphs. We can view this hypothesis as the temporal extension of the typical threshold behavior that static Erd os Renyi graphs exhibit in np [Erdos et al., 1960]. We also conjecture this behavior holds provably in Conjecture 1.

Setup. We perform our evaluation on two data sets. One is synthetically generated, while the other is real-world data. First, we evaluate our algorithm on Erd os-Renyi graphs. While the model has been developed for static graphs, it is commonly extended to temporal graphs [Angel et al., 2020]. To generate a simple temporal graph from an Erd os-Renyi graph, we simply choose each edge label uniformly at random from the set [Tmax]. This means that Tmax is now the third parameter to generate these graphs, in addition to the classical ones n (the number of nodes) and p (the density parameter). Denote such a graph as ERT(n, p, Tmax). Secondly, to evaluate our algorithm on real-world data, we employ a data set from the Stanford Large Network Dataset Collection [Kumar et al., 2021]. The comm-f2f-Resistance collection is described by the project as a set of 62 dynamic face-to-face interaction network[s] between a group of participants . In our implementation, we employ a small optimization upon Algorithm 2. We skip what we call redundant seed infections. A seed infection (v, t) V (G) [0, Tmax] is redundant if we already know the labels of all edges adjacent to v at the time in the algorithm we would perform this seed infection. This does not depend on t. For an illustration, consider Figure 2. There, the Discovery Follow algorithm would, after discovering the only edge adjacent to the leftmost node by a seed from its other adjacent node, perform seed infections at the leftmost node even though we already know the label of the only edge we could possibly discover. We run Discovery Follow on all graphs from the comm-f2f-Resistance dataset and for a wide range of parameters for the temporal Erd os-Renyi graphs. We test with 1 to 100 nodes in steps of size 5, for probabilities p {.01, .05, .1, .15, .2, .25, .3, .35, .4, .5, .7, .9}. We pick Tmax as a factor of n, testing with Tmax/n {.05, .1, .2, .3, .4, .5, .7, .9, 1, 2, 3, 5, 7, 10}. Similarly, δ is 1 or a multiple of Tmax, namely δ/Tmax {.01, .05, .1, .3, .5} We record the number of rounds needed to complete the discovery and how many of these rounds are spent in the component discovery versus the component exploration phases of the algorithm. Both our implementation and analysis code

are available under a permissive open-source license and can be used to replicate our findings.1

7.1 Results We now critically evaluate our hypotheses against the data thus obtained and compare the effects between the different data sets and parameters. Particularly, we pay attention to evidence on how the different hypotheses interact. Finally, we give Conjecture 1 as a result of our analysis of Hypothesis 3.

Hypothesis 1. To investigate our first hypothesis, we compare how the number of rounds relates to the number of edges. See Fig. 3 for the results on our datasets. We can see that in Erd os-Renyi graphs, the relationship closely follows the 6 |E(G)| line predicted by our theoretical results. Clearly, this effect is influenced by other parameters such as p, Tmax, and n, whose roles we will examine in the discussion of the other hypotheses. However, if we consider graphs with the same p and ratio Tmax/n (i.e., one facet and one color in Fig. 3), we see that the relationship is strictly linear the points form a tightly distributed straight line. We see that the trend for p 0.25 is slightly above 6 |E(G)| and slightly under it for larger values of p. This occurs since, when p is small, more time is spent on component discovery than on component exploration. We will explore this more thoroughly in the discussion of the results regarding Hypothesis 2. In the SNAP data set, the trend is linear in the number of edges, but significantly less than 6 |E(G)| rounds are required to discover the graph. In fact, the gradient of the regression line is only 1.78. This can be explained by the optimization skipping redundant infections as outlined in our setup, as this enables the algorithm to require less than the 6 infections per edge, which would otherwise be strictly required. In summary, while there is a strong linear relationship following the 6 |E(G)| line, the number of rounds also significantly depends on other properties of the graph. We can see these effects in both synthetic and real-world data.

Hypothesis 2. As this hypothesis is about the relationship between graph density and time spent on component discovery, we only analyze it on the Erd os-Renyi graphs. The SNAP dataset does not have significant differences in graph density. In Figure 4a, we plot the percentage of time spent on component discovery dependent on the parameter p (which specifies the density of the graph). We observe a clear and strong, inversely proportional relationship. This leads us to accept Hypothesis 2. This is explained from the theoretical analysis of Discovery Follow, as we expect denser graphs to have fewer δ-ecc s, thus less need to discover new components.

Hypothesis 3. Examining Fig. 4b, we see a threshold between np/Tmax = 0.01 and np/Tmax = 1.00 where we move from spending only a small amount of rounds on component discovery to spending close to all of our rounds on component discovery. Our hypothesis, that this is mediated by the size of the δ-ecc s, is supported by the results in Fig. 4c. The additional differentiation by color for larger values of np/Tmax can be explained since the size of a component is constrained by the size of the graph.

1See https://github.com/Ben Bals/dynamic-network-discovery.

Proceedings of the Thirty-Fourth International Joint Conference on Artiﬁcial Intelligence (IJCAI-25)

0.05 0.1 0.2 0.35 0.9

Number of edges

Number of rounds

10.0 Tmax n

(a) Erd os-Renyi graphs for different densities p

Number of edges

Number of rounds

(b) SNAP dataset

Figure 3: Number of rounds by number of edges for the Discovery Follow algorithm on the two data sets. The black line indicates the 6 |E(G)| line (our theoretical bound for the component exploration phase), and the red line is the trend line (i.e., the linear regression).

Percentage of rounds for component discovery

(a) The percentage of rounds the Discovery Follow algorithm spends in the component discovery phase on temporal Erd os-Renyi graphs by density p.

Percentage of rounds for component discovery

(b) The percentage of rounds spent in the component discovery phase by np/Tmax.

Mean component size

(c) Mean size of the δ-ecc s by np/Tmax.

Figure 4: Cost of component discovery in the Discovery Follow algorithm. Subfigures (b) and (c) use logarithmic axes, and the color denotes the number of nodes.

Note that big δ-ecc s imply that we can discover many edges in the component exploration phase for a single node explored in the component discovery phase. That means the plotted percentage shrinks as the average size of δ-ecc s increases. This behavior is similar to that seen in static Erd os Renyi graphs, where if np = 1, there almost surely is a largest connected component (in the classical sense) of order n2/3. And if np approaches a constant larger than 1, there asymptotically almost surely is a so-called giant component containing linearly many nodes [Erdos et al., 1960]. This motivates us to conjecture the following similar behavior. Conjecture 1. Let δ = 1, p [0, 1], Tmax, n N and G := ERT(n, p, Tmax). Then

if Tmax/(np) n c where c < 1, then |ECδ(G)| / |E(G)| n 0, and if Tmax/(np) > 1, then almost surely, all δ-ecc s have size at most O(log n2) = O(log n).

Investigating the connectivity behavior of random temporal graphs in a way that respects their inherent temporal aspects is an interesting and little understood problem [Casteigts et al., 2022; Casteigts et al., 2024]. The authors theoretically study sharp thresholds for connectivity in temporal Erd os-Renyi graphs, and conjecture about a threshold for the emergence of a node-based giant (i.e., linear size) connected component. This conjecture can be seen as capturing the equivalent behavior under waiting time constraints (i.e., where edges are only considered if their edge labels do not differ too much).

8 Conclusion We give a comprehensive theoretical and empirical study of TGD in temporal graphs. Discovery Follow provides an efficient TGD strategy requiring only 6 |E(G)| + |ECδ(G)| Tmax/δ rounds. Our lower bound proves that any algorithm must spend at least Ω(|E(G)|) rounds, showing Discovery Follow is close to optimal. Our empirical analysis highlights the relevance of our theoretical results for practical applications and gives rise to interesting insights of its own. We see that on Erd os-Renyi graphs, the observed performance of Discovery Follow matched our theoretical analysis. On real-world data from the SNAP collection, the algorithm even slightly outperforms our predictions. Finally, we observe a close link between the parameters of the temporal Erd os-Renyi model, the temporal connectivity structure of the resulting graphs, and our algorithmic performance, creating a bridge between our theoretical insights on δ-connected components and their empirical behavior. Future work can tighten the lower bound from Theorem 5 and give a lower bound that is tight in k and δ. Another avenue is to investigate |ECδ(G)| further. In particular, to prove or disprove the observed threshold in Erd os-Renyi graphs (Conjecture 1) and investigate an analogue to the static ln n/n threshold. Finally, future research can explore more variations of TGD, such as finding specific nodes or checking for structural properties instead of discovering the whole graph.

Proceedings of the Thirty-Fourth International Joint Conference on Artiﬁcial Intelligence (IJCAI-25)

References [Akrida et al., 2019] Eleni C Akrida, Jurek Czyzowicz, Leszek Gasieniec, Łukasz Kuszner, and Paul G Spirakis. Temporal flows in temporal networks. Journal of Computer and System Sciences, 103:46 60, 2019. [Angel et al., 2020] Omer Angel, Asaf Ferber, Benny Sudakov, and Vincent Tassion. Long monotone trails in random edge-labellings of random graphs. Combinatorics, Probability and Computing, 29(1):22 30, 2020. [Axiotis and Fotakis, 2016] Kyriakos Axiotis and Dimitris Fotakis. On the Size and the Approximability of Minimum Temporally Connected Subgraphs. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), volume 55, pages 149:1 149:14, 2016. [Berenbrink et al., 2023] Petra Berenbrink, Max Hahn Klimroth, Dominik Kaaser, Lena Krieg, and Malin Rau. Inference of a rumor s source in the independent cascade model. In Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, volume 216 of Proceedings of Machine Learning Research, pages 152 162. PMLR, 31 Jul 04 Aug 2023. [Casteigts et al., 2021a] Arnaud Casteigts, Anne-Sophie Himmel, Hendrik Molter, and Philipp Zschoche. Finding temporal paths under waiting time constraints. Algorithmica, 83:2754 2802, 2021. [Casteigts et al., 2021b] Arnaud Casteigts, Kitty Meeks, George B. Mertzios, and Rolf Niedermeier. Temporal Graphs: Structure, Algorithms, Applications (Dagstuhl Seminar 21171). Dagstuhl Reports, 11(3):16 46, 2021. [Casteigts et al., 2022] Arnaud Casteigts, Michael Raskin, Malte Renken, and Viktor Zamaraev. Sharp thresholds in random simple temporal graphs. In Symposium on Foundations of Computer Science (FOCS 2022), pages 319 326, 2022. [Casteigts et al., 2024] Arnaud Casteigts, Timoth ee Corsini, and Writika Sarkar. Simple, strict, proper, happy: A study of reachability in temporal graphs. Theoretical Computer Science, 991:114434, 2024. [Chistikov et al., 2024] Dmitry Chistikov, Luisa Estrada, Paolo Turrini, and Mike Paterson. Learning a social network by influencing opinions. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024). AAMAS, 2024. [Danda et al., 2021] Umesh Sandeep Danda, G. Ramakrishna, Jens M. Schmidt, and M. Srikanth. On short fastest paths in temporal graphs. In Ryuhei Uehara, Seok Hee Hong, and Subhas C. Nandy, editors, WALCOM: Algorithms and Computation, pages 40 51, Cham, 2021. Springer International Publishing. [Daneshmand et al., 2014] Hadi Daneshmand, Manuel Gomez-Rodriguez, Le Song, and Bernhard Schoelkopf. Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm. In International conference on machine learning, pages 793 801. PMLR, 2014.

[Deligkas et al., 2024] Argyrios Deligkas, Michelle D oring, Eduard Eiben, Tiger-Lily Goldsmith, and George Skretas. Being an influencer is hard: The complexity of influence maximization in temporal graphs with a fixed source. Information and Computation, page 105171, 2024.

[Erdos et al., 1960] Paul Erdos, Alfr ed R enyi, et al. On the evolution of random graphs. Publ. math. inst. hung. acad. sci, 5(1):17 60, 1960.

[Gayraud et al., 2015] Nathalie T.H. Gayraud, Evaggelia Pitoura, and Panayiotis Tsaparas. Diffusion maximization in evolving social networks. In Proceedings of the 2015 ACM on Conference on Online Social Networks, COSN 15, page 125 135, New York, NY, USA, 2015. Association for Computing Machinery.

[Hethcote, 1989] Herbert W. Hethcote. Three Basic Epidemiological Models, pages 119 144. Springer Berlin Heidelberg, Berlin, Heidelberg, 1989.

[Kempe et al., 2003] David Kempe, Jon Kleinberg, and Eva Tardos. Maximizing the spread of influence through a social network. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, page 137 146, 2003.

[Kermack et al., 1927] William Ogilvy Kermack, A. G. Mc Kendrick, and Gilbert Thomas Walker. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character, 115(772):700 721, 1927.

[Kumar et al., 2021] Srijan Kumar, Chongyang Bai, VS Subrahmanian, and Jure Leskovec. Deception detection in group video conversations using dynamic interaction networks. In ICWSM 2021. International AAAI Conference on Web and Social Media, 2021.

[Leskovec et al., 2007] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne Van Briesen, and Natalie Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 07, page 420 429, New York, NY, USA, 2007. Association for Computing Machinery.

[Lokhov, 2016] Andrey Lokhov. Reconstructing parameters of spreading models from partial observations. Advances in Neural Information Processing Systems, 29, 2016.

[Luo et al., 2013] Wuqiong Luo, Wee Peng Tay, and Mei Leng. Identifying infection sources and regions in large networks. IEEE Transactions on Signal Processing, 61(11):2850 2865, 2013.

[Michail, 2016] Othon Michail. An introduction to temporal graphs: An algorithmic perspective. Internet Mathematics, 12(4):239 280, 2016.

[Netrapalli and Sanghavi, 2012] Praneeth Netrapalli and Sujay Sanghavi. Learning the graph of epidemic cascades. ACM SIGMETRICS Performance Evaluation Review, 40(1):211 222, 2012.

Proceedings of the Thirty-Fourth International Joint Conference on Artiﬁcial Intelligence (IJCAI-25)

[Park and Honorio, 2016] Keehwan Park and Jean Honorio. Information-theoretic lower bounds for recovery of diffusion network structures. In 2016 IEEE International Symposium on Information Theory (ISIT), pages 1346 1350. IEEE, 2016. [Pastor-Satorras et al., 2015] Romualdo Pastor-Satorras, Claudio Castellano, Piet Van Mieghem, and Alessandro Vespignani. Epidemic processes in complex networks. Reviews of modern physics, 87(3):925 979, 2015. [Pouget-Abadie and Horel, 2015] Jean Pouget-Abadie and Thibaut Horel. Inferring graphs from cascades: A sparse recovery framework. In Proceedings of the 24th International Conference on World Wide Web, WWW 15 Companion, page 625 626, New York, NY, USA, 2015. Association for Computing Machinery. [Rong et al., 2016] Yu Rong, Qiankun Zhu, and Hong Cheng. A model-free approach to infer the diffusion network from event cascade. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, CIKM 16, page 1653 1662, New York, NY, USA, 2016. Association for Computing Machinery. [Zhu and Ying, 2016] Kai Zhu and Lei Ying. Information source detection in the sir model: A sample-pathbased approach. IEEE/ACM Transactions on Networking, 24(1):408 421, 2016.

Proceedings of the Thirty-Fourth International Joint Conference on Artiﬁcial Intelligence (IJCAI-25)