Analysis of pick-up and delivery and dial-a-ride problems dynamization methods and benchmark instances

The benchmark instances available for the dynamic versions of the pickup and delivery problem with time windows (PDPTW) and of the dial-a-ride problem (DARP) do not share the same characteristics and do not necessarily cover all the characteristics of real situations. We analyze the sets of dynamic PDPTW and DARP instances (DPDPTW and DDARP) currently available for use and the methods used to generate them from static instances. Each dynamization method is applied to each static instance originally used by these methods. The resulting dynamic instances are analyzed with the measures of degree of dynamism and urgency, as well as by the number of static requests and the correlation between the lower limits of the pickup time windows and the request arrival times. The results show that the studied sets present low variability of degree of dynamism and urgency, regardless of the method or static instance used for the dynamization.


INTRODUCTION
Dynamic vehicle routing problems have been the subject of research for nearly three decades (Psaraftis et al., 2015). Derived from classic vehicle routing problems (VRP), such as the dial-a-ride problem (DARP) and the pickup and delivery problem with time windows (PDPTW), the dynamic problems seek to model cases in which one or more parameters of the problem are not fully known a priori and may vary during the period of operation.
Among the dynamic vehicle routing problems, the dynamic dial-a-ride problem (DDARP) (Psaraftis, 1988) and the dynamic pickup and delivery problem with time windows (DPDPTW) (Dumas et al., 1991) are of great interest for the development of new urban transport technologies. These are the problems that need to be solved when a dynamic ride-sharing service is needed (Agatz et al., 2012; Alonso-González et al., 2018), or when timely parcel delivery is required (Pankratz, 2005). Currently, some companies provide such services (UberPool, Via, UBus, UberEats, Rappi, etc.). However, with the expected technological advances in the area of connected vehicles, automated driving and the diversification of public transport introduced mainly by mobility as a service (MaaS) systems, algorithms that solve DDARPs and DPDPTWs in less time and with better results are increasingly necessary (Fulton et al., 2017).
Computational experiments are used in this context to generate results on the efficiency and computational time required by the different algorithms and methods available for solving these problems, so that their performance can be compared. To obtain results that can be compared between articles without the need to rerun the computational experiments already performed by other authors, the same scenarios must be used. Moreover, a series of characteristics should be present in these scenarios to reflect real applications (Uchoa et al., 2017). In the area of static VRP it is common to have extensively used sets of canonical scenarios that facilitate the comparison between algorithms (Mendoza et al., 2014; Uchoa et al., 2017). These are called sets of benchmark instances. However, the benchmark instances for dynamic vehicle routing problems (Pillac et al., 2013; Maciejewski et al., 2017) do not share the same characteristics and may not cover the full range of characteristics from real situations.
The purpose of this article is to evaluate the benchmark instances of DDARPs and DPDPTWs that are accessible and available for use, together with the dynamization methods used for obtaining these dynamic instances from static instances. The focus is on how requests are distributed throughout the system's period of operation. Thus, two measures proposed by Van Lon et al. (2016), urgency and degree of dynamism, which help in identifying the temporal characteristics of the instances, are used. The dynamization methods are briefly described. Then, each method is applied to different sets of static instances and the degree of dynamism and urgency of the resulting dynamic instances are evaluated. Given that a good set of dynamic instances must cover most of the degree of dynamism and urgency spectrum, our aim is to evaluate the dispersion of these metrics for some datasets. The number of static requests and the correlation between the lower limits of the pickup time windows and the request arrival times are also analyzed. It is shown that the dynamic instances analyzed present low variability in their measures of degree of dynamism and urgency.
It is expected that the provided results will ease the process of searching and selecting sets of dynamic instances for use in computational experiments and in testing new algorithms. Alternatively, the results may assist in the selection of a method for generating dynamic instances that are more appropriate to the research goal.
The definitions of the problems of interest are presented in Section 2. In Section 3, the sets of dynamic benchmark instances and dynamization methods are described. Section 4 introduces the measures of degree of dynamism and urgency, followed by an assessment of the dynamization methods and the dynamic instances obtained from static instances. Finally, Section 5 provides the concluding remarks.

FORMAL DEFINITION OF DYNAMIC PROBLEMS
This section presents the formal definitions of the DARP, the PDPTW, the DDARP and the DPDPTW based on Cordeau and Laporte (2003). To this end, first the definition of the static DARP is presented, based on which the other three problems are then defined.

DARP
The DARP consists of a set of passenger requests for transport between different pickup and delivery locations that must be met by a fleet of vehicles capable of carrying more than one passenger at a time. The goal is to find a set of routes for the vehicles in the fleet that minimizes the time and/or cost to fulfill all requests. Each request has a pickup location and an associated time window that identifies the lower and upper time limits within which the user wishes to be picked up for the trip. Similarly, the transport request also has a passenger destination and a time window for delivery. In addition to defining time windows for the desired start and end of the trip, the passenger also expects the journey to take no longer than what he or she considers acceptable.
The DARP can be defined on a complete directed graph $G = (N, A)$, where $N$ is the set of nodes and $A$ is the set of arcs of the graph, with $N = P \cup D \cup \{0, 2n+1\}$, $P = \{1, \dots, n\}$, $D = \{n+1, \dots, 2n\}$, and $n$ the number of requests. The subsets $P$ and $D$ contain, respectively, the pickup and delivery nodes of the requests, while the nodes $0$ and $2n+1$ represent the origin and destination of the vehicles. All vehicles in the fleet must start their routes at node $0$ and end them at node $2n+1$. To each request $i \in \{1, \dots, n\}$ a pickup node $i \in P$ and a delivery node $n+i \in D$ are associated. To each arc $(i, j) \in A$ a cost $c(i, j)$ and a travel time $t(i, j)$ are associated.
Each vehicle $k \in K$, with $K$ the set of available vehicles, has a capacity $Q_k$ and a maximum total route time $T_k$. To each node $i \in N$ there is an associated load $q_i$ and a non-negative service time $d_i$, with $d_0 = d_{2n+1} = 0$ and $q_i = -q_{n+i}$. The pickup time window is defined by $[e_i, l_i]$, with $e_i$ and $l_i$ denoting the lower and upper limit for the start of pickup at node $i$, respectively. Analogously, the delivery time window is given by $[e_{n+i}, l_{n+i}]$ for delivery at node $n+i$. The maximum travel time of a request, $L_i$, is determined by the amount of time the passenger considers acceptable for the journey.
Finally, a time interval, called planning horizon, is defined as $[0, T]$. Instant zero represents the beginning of the operation, when all vehicles are at the initial node ($0$) and all users are waiting to be picked up at their respective origins. Instant $T$ denotes the end of the operation, when all the vehicles have completed their routes and are at the final node ($2n+1$), having taken all users from their respective origins to their destinations. The time windows of every node must be contained in the time interval $[0, T]$.

PDPTW
As in the DARP, the PDPTW has a set of transport requests with different origins and destinations and associated time windows. It also has a fleet of vehicles with the capacity to handle more than one request at a time. However, the requests in a PDPTW refer to the transport of goods instead of passengers. This gives rise to the only difference between the definitions of the DARP and the PDPTW (Parragh et al., 2008). In the definition of the DARP presented in Subsection 2.1, the parameter $L_i$ represents the maximum travel time of a request, which limits the total time that a passenger is willing to remain in the vehicle. For the PDPTW this restriction is not necessary, since the cargo does not suffer any discomfort from a longer travel time. Therefore, the definition presented previously for the DARP can also be used for the PDPTW, with the parameter $L_i = \infty, \forall i \in P$.
TRANSPORTES | ISSN: 2237-1346

DDARP and DPDPTW
In the definitions of the DARP and the PDPTW, presented in Subsections 2.1 and 2.2, respectively, requests are fully known before the problem is solved and remain unchanged while the solution is executed. Therefore, they are static problems (Psaraftis, 1988). Dynamic problems, in contrast, receive data in real time during operation. In this case, requests are sent by users at any time between the start and end of the planning horizon, which requires solutions to be recomputed.
In dynamic problems each request has an arrival time $a_i$, $\forall i \in P$, which represents the exact moment the transportation system receives the request's data. This new data can then be used to recalculate vehicle routes. Because of this, every request arrival time must be less than or equal to the lower limit of the pickup time window ($a_i \le e_i$).
The DARP and the PDPTW definitions previously described are therefore modified with the addition of this set of parameters, i.e., the request arrival times, and the corresponding constraint. This results in a succinct definition of their respective dynamic problems: the DDARP and the DPDPTW.
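The arrival-time constraint can be illustrated with a minimal Python sketch. The class and function names are hypothetical and only the temporal data of the pickup node are modelled; it is assumed, as in the definitions above, that the arrival time may not exceed the lower limit of the pickup time window and that all values lie within the planning horizon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicRequest:
    """Temporal data of one request (field names are illustrative)."""
    arrival: float       # instant at which the system learns about the request
    pickup_lower: float  # lower limit of the pickup time window
    pickup_upper: float  # upper limit of the pickup time window


def is_consistent(request: DynamicRequest, horizon: float) -> bool:
    """Check 0 <= arrival <= pickup_lower <= pickup_upper <= horizon."""
    return (0.0 <= request.arrival <= request.pickup_lower
            <= request.pickup_upper <= horizon)


def is_static(request: DynamicRequest) -> bool:
    """A request already known at instant zero is treated as static."""
    return request.arrival == 0.0
```

A request that arrives after the lower limit of its pickup window is rejected by `is_consistent`, which is the check a dynamization method has to satisfy for every generated request.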

SETS OF BENCHMARK INSTANCES AND DYNAMIZATION METHODS
This section presents the sets of dynamic benchmark instances of the DDARP and the DPDPTW and the dynamization methods used to derive them from static benchmark instances. Berbeglia et al. (2012), Pureza and Laporte (2008), Pankratz (2005) and Fabri and Recht (2006) each apply a different method to convert static instances into dynamic ones. These sets were chosen because the related data is freely available on the internet (Pankratz and Krupczyk, 2009). The static characteristics of the instances are not presented in this article; for detailed information, the reader is referred to the corresponding articles cited in each of the following subsections. Tables summarizing the static instances are provided by Eccel (2019).
Two other sets of dynamic instances are freely available. One set was created artificially in order to replicate the behavior of an urban environment, with peak hours and a concentrated demand in the city center (Gendreau et al., 2006). The other set is based on real data, collected from one medium-sized and one large courier company operating in Vancouver, Canada (Mitrovic-Minic and Laporte, 2004). Since these two sets do not involve dynamization, they are not analyzed in this paper. The reader is referred to Eccel and Carlson (2019) for an analysis of urgency and degree of dynamism of these two sets.

DDARP instance sets and method proposed by Berbeglia et al. (2012)
Berbeglia et al. (2012) used two different sets of instances for their computational experiments, each derived from a different set of static instances. In the first set they used the static instances proposed by Ropke et al. (2007) as a basis. In the second, the static instances presented by Cordeau and Laporte (2003) were used. In both cases, they chose to use instances with forty or more requests.
Both sets were dynamized using the following technique. A pair of parameters $(\alpha, \beta)$ was defined. Parameter $\alpha \in [0, 1]$ defines the fraction of requests known at the beginning of the time horizon. If $\alpha = 0$ the problem is completely dynamic. If $\alpha = 1$ the problem is completely static and all requests are known in advance. The parameter $\beta$ represents a time interval for the system to react to dynamic requests. That is, the interval between the arrival of a request and its pickup is always greater than $\beta$.
Given a request $i$, the value $a_i^{latest}$ is an upper limit on the arrival time which ensures that the request can still be served in time (Equation 1). The request arrival time is then defined from this upper limit (Equation 2). Berbeglia et al. (2012) use the parameters $\alpha = 0.25$ and $\beta = 60$ min for the dynamization of the set of instances presented by Ropke et al. (2007). For the set of instances from Cordeau and Laporte (2003), the conversion into a set of dynamic instances used the parameters $\alpha = 0.25$ and $\beta = U(60; 240)$, with $U(a; b)$ representing a random variable with uniform distribution in the interval $[a, b]$. In addition, Berbeglia et al. (2012) use the Euclidean distance as the travel time between two points, thus $t(i, j)$ can be computed from the distance between $i$ and $j$. In this paper, when the dynamization method by Berbeglia et al. (2012) is applied, we set $\alpha = 0$, since our interest is to obtain completely dynamic cases. Note, however, that choosing $\alpha = 0$ does not mean that static requests will not exist, but just that this type of request is not enforced. The same values of $\beta$ are used.

DPDPTW instance sets and method proposed by Pureza and Laporte (2008)
The DPDPTW instances proposed by Pureza and Laporte (2008) were generated by dynamizing the static instances with 100, 200 and 400 nodes (50, 100 and 200 requests) proposed by Li and Lim (2003). Pureza and Laporte (2008) define the request arrival time by Equation (3).

DPDPTW instance sets and method proposed by Pankratz (2005)
Pankratz (2005) created two sets with a total of 5600 DPDPTW instances. These instances are based on the 100-node PDPTW instances proposed by Li and Lim (2003). Pankratz (2005) dynamized the instances by first calculating the latest arrival time for each request $i \in P$ (Equation 4). Subsequently, each request is assigned an arrival time (Equation 5) that depends on a parameter taking values in $[0.1, 1.0]$, in steps of 0.1. Like $\beta$, this parameter also represents a reaction time to dynamic requests; however, instead of being a time value it is a percentage. For each of the fifty-six PDPTW instances, ten DPDPTW instances were generated, one for each of the possible values of the parameter, resulting in a total of 560 dynamic instances.

DPDPTW instance sets and method proposed by Fabri and Recht (2006)
The instances proposed by Fabri and Recht (2006) were based on all PDPTW instances of Li and Lim (2003). To make the instances dynamic, they draw the request arrival time from: $a_i = U(0; \min\{e_i, l_i - t(0, i)\})$.
Compared to the previous three dynamization methods, this method does not have an explicit parameter for the reaction time to dynamic requests. The reaction time is implicitly modelled by the uniform distribution used in the definition of the request arrival time.
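A uniform draw of this kind can be sketched in Python as follows. The sketch assumes that the upper bound of the draw is the smaller of the lower limit of the pickup time window and the upper limit of that window minus the travel time from the depot to the pickup node, and that travel times are Euclidean distances. Both are assumptions of this illustration and should be checked against Fabri and Recht (2006); all function and parameter names are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two (x, y) points, used as travel time."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def draw_arrival_time(depot, pickup, pickup_lower, pickup_upper, rng):
    """Draw an arrival time uniformly between 0 and the assumed upper bound."""
    upper = min(pickup_lower, pickup_upper - euclidean(depot, pickup))
    # Guard added for this sketch only: never draw from a negative interval.
    upper = max(0.0, upper)
    return rng.uniform(0.0, upper)


rng = random.Random(42)  # fixed seed so the draw is reproducible
arrival = draw_arrival_time((0.0, 0.0), (3.0, 4.0), 50.0, 60.0, rng)
```

Whenever the upper bound collapses to zero, the request receives an arrival time of zero and therefore becomes static, which is one way a method without an explicit reaction time parameter can still produce static requests.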

ANALYSES OF BENCHMARK INSTANCES AND DYNAMIZATION METHODS
In this section, we introduce the metrics of urgency and degree of dynamism proposed by Van Lon et al. (2016). These metrics are then used for analyzing the dynamization methods described in Section 3. We applied the four dynamization methods by Berbeglia et al. (2012), Fabri and Recht (2006), Pankratz (2005), and Pureza and Laporte (2008) to all the static instances in Ropke et al. (2007), Cordeau and Laporte (2003), and Li and Lim (2003). This resulted in twelve generated dynamic datasets (sets of dynamic instances) to be evaluated with respect to urgency and degree of dynamism. Because some methods nevertheless create static requests, the number of such requests is also computed. As seen in the previous section, the lower limit of the pickup time window plays an important role in some methods; therefore, its correlation with the request arrival times is also analyzed.

Degree of dynamism
For Van Lon et al. (2016), the degree of dynamism measures the continuity with which transport requests are received by the system. In other words, it relates to the distribution of request arrival times within the planning horizon. Therefore, the more evenly distributed the request arrival times are, the higher the value of the degree of dynamism is. The degree of dynamism ranges from zero to one, with zero being a scenario in which all requests take place at the same time and one a scenario in which requests are equally spaced within the planning horizon. Figure 1 shows six different hypothetical instances, each one with a different degree of dynamism. In the figure, each graph depicts a scenario with a total of ten dynamic requests. In Figure 1(a), all arrival times are equally spaced and evenly allocated over the planning horizon. This scenario is considered to have a high degree of dynamism (equal or close to 1). From Figure 1(b) to Figure 1(f) the intervals between request arrival times are gradually decreased, creating clusters of requests that eventually become a single cluster. In Figure 1(f) all the request arrival times are roughly equal, thus the corresponding scenario has a low degree of dynamism (equal or close to zero) (Van Lon et al., 2016).
The degree of dynamism is computed from the list of interarrival times: $\Delta = \{\delta_1, \delta_2, \delta_3, \dots, \delta_{n-1}\} = \{a_j - a_i \mid j = i + 1, \forall i \in \{1, \dots, n-1\}\}$, (7) with $\delta_i$ the interarrival time between the requests $i$ and $i + 1$ and $|\Delta| = n - 1$ the list size. Note that the interarrival times in $\Delta$ correspond to intervals for which the transportation system is unchanged in terms of solution. Furthermore, the list is organized in chronological order.
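Van Lon et al. (2016) also define a perfect interarrival time, $\theta$, that represents the scenario of 100 percent degree of dynamism: $\theta = T / n$. (8) This enables the computation of the deviation of each interarrival time, $\sigma_i$, with respect to the perfect interarrival time:
$$\sigma_i = \begin{cases} \theta - \delta_i, & \text{if } i = 1 \text{ and } \delta_i < \theta \\ \theta - \delta_i + \dfrac{\theta - \delta_i}{\theta} \cdot \sigma_{i-1}, & \text{if } i > 1 \text{ and } \delta_i < \theta \\ 0, & \text{otherwise.} \end{cases}$$
Then, the total scenario deviation is defined as $\sum_{i=1}^{n-1} \sigma_i$. However, the total scenario deviation should be normalized by its maximum possible value, $\sum_{i=1}^{n-1} \bar{\sigma}_i$, in which $\bar{\sigma}_i$ is defined by:
$$\bar{\sigma}_i = \theta + \begin{cases} \dfrac{\theta - \delta_i}{\theta} \cdot \sigma_{i-1}, & \text{if } i > 1 \text{ and } \delta_i < \theta \\ 0, & \text{otherwise.} \end{cases}$$
The degree of dynamism is then given by $1 - \sum_{i=1}^{n-1} \sigma_i \big/ \sum_{i=1}^{n-1} \bar{\sigma}_i$.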
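The computation of the degree of dynamism can be sketched in Python as follows. The sketch follows our reading of Van Lon et al. (2016), and its formulas should be checked against that article. The perfect interarrival time is taken as the planning horizon divided by the number of requests. Each interarrival time shorter than the perfect one contributes its shortfall plus a proportional share of the previous deviation, and the sum of deviations is normalized by its maximum possible value. The function name and the zero-based indexing are choices of this illustration.

```python
def degree_of_dynamism(arrival_times, horizon):
    """Degree of dynamism of one scenario, between 0 and 1."""
    arrivals = sorted(arrival_times)
    n = len(arrivals)
    if n < 2:
        raise ValueError("at least two requests are needed")
    theta = horizon / n  # perfect interarrival time
    deltas = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]

    total_deviation = 0.0
    max_deviation = 0.0
    previous_sigma = 0.0
    for index, delta in enumerate(deltas):
        if delta < theta:
            # Share of the previous deviation carried over to this interval.
            carry = (theta - delta) / theta * previous_sigma if index > 0 else 0.0
            sigma = theta - delta + carry
        else:
            carry = 0.0
            sigma = 0.0
        total_deviation += sigma
        max_deviation += theta + carry
        previous_sigma = sigma
    return 1.0 - total_deviation / max_deviation
```

With five requests equally spaced by the perfect interarrival time the function returns 1.0, and with five requests arriving at the same instant it returns 0.0, matching the two extremes described in the text.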

Urgency
The urgency ($u_i$) represents the reaction time available to the transport system so that it can fulfill request $i$ and is given by (Van Lon et al., 2016): $u_i = l_i - a_i$. (14) Figure 2 shows two requests with distinct values of urgency. The case in Figure 2(a) represents a scenario with high urgency whereas the case in Figure 2(b) represents a scenario with low urgency. Note the longer $u_i$ for the latter.
Since the urgency value represents the available reaction time of one request, low values of $u_i$ are related to high urgency requests. The mean and standard deviation of the urgency of all requests give a measure of urgency for a given instance.
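As an illustration, the instance-level urgency measure can be sketched in Python. It is assumed here that the urgency of a request is the upper limit of its pickup time window minus its arrival time, and that the population standard deviation is used; the text does not state which estimator was applied. The function name is hypothetical.

```python
from statistics import mean, pstdev


def urgency_summary(arrival_times, pickup_upper_limits):
    """Return (mean, standard deviation) of the request urgencies."""
    urgencies = [upper - arrival
                 for arrival, upper in zip(arrival_times, pickup_upper_limits)]
    # pstdev is the population standard deviation (an assumption of this sketch).
    return mean(urgencies), pstdev(urgencies)


# Three requests with 10, 20 and 30 time units of reaction time.
average, deviation = urgency_summary([0, 10, 20], [10, 30, 50])
```

A small mean indicates an instance in which the system has little time to react, i.e., an instance with high urgency.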

Distribution of the degree of dynamism and urgency
Each graph in Figure 3 represents the result of a different generated dynamic dataset. Each point in the graph corresponds to the normalized average urgency (vertical axis) and dynamism (horizontal axis) values of one dynamic instance of a set. The average urgency was normalized in such a way that the value zero represents an average urgency equal to zero and the value one represents the highest average urgency found within all the generated dynamic datasets. The accumulation of points in Figure 3 shows the lack of diversity in the generated dynamic datasets. None of the generated dynamic datasets covers the dynamism and urgency spectrum evenly. Most of them have a low degree of dynamism, while the slightly larger degrees of dynamism obtained from the static datasets of Li and Lim (2003) may be related to the characteristics of the instances.
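The normalization of the average urgency described above can be sketched in Python. The dataset names and values below are hypothetical and the function name is an assumption of this illustration; the only rule taken from the text is that each instance average is divided by the largest average found across all generated datasets.

```python
def normalize_average_urgency(average_urgency_by_dataset):
    """Divide every instance average by the global maximum over all datasets."""
    global_max = max(value
                     for values in average_urgency_by_dataset.values()
                     for value in values)
    return {name: [value / global_max for value in values]
            for name, values in average_urgency_by_dataset.items()}


# Hypothetical average urgencies (one value per dynamic instance).
normalized = normalize_average_urgency({"dataset_a": [10.0, 20.0],
                                        "dataset_b": [40.0]})
```

Because a single global maximum is used, the normalized values remain comparable between datasets, which is what allows the graphs of Figure 3 to share the same vertical axis.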
Clearly, the method by Pankratz (2005) obtained a better distribution of values over the whole range of urgency. The values of urgency seem to be more related to the method used, although again the static dataset from Li and Lim (2003) seems to result in dynamic instances covering almost the whole range of urgency values, except when the method by Berbeglia et al. (2012) is used. As a matter of fact, in all three dynamic datasets created using the method by Berbeglia et al. (2012) the urgency values accumulated in one or two clusters. This reflects the use of only a small set of values for $\beta$. In the case of the static datasets by Ropke et al. (2007) and Cordeau and Laporte (2003) it is easy to distinguish between the instances that are generated using a fixed value of $\beta$ (the lower cluster) and the ones that are generated using a uniform distribution (the upper cluster).
Figure 4 shows the same values of degree of dynamism (left) and urgency (right) shown in Figure 3, but in the form of box plots. The left side of each box represents the first quartile $Q_1$, i.e., to its left lie the 25% of instances with the lowest degree of dynamism (or urgency). The right side of the box represents the third quartile $Q_3$, i.e., to its left lie the 75% of instances with the lowest degree of dynamism (or urgency). Therefore, the boxes cover, for each case, the degree of dynamism (or urgency) of 50% of the instances. The median of the degree of dynamism (or urgency) of each dynamic dataset is marked by a vertical line within the box and the lower ($LL$) and upper ($UL$) limits are delimited by the vertical line segments outside the boxes, whose values can be calculated by $LL = Q_1 - 1.5 \cdot IQR$ and $UL = Q_3 + 1.5 \cdot IQR$, with the interquartile range given by $IQR = Q_3 - Q_1$. The diamonds indicate outliers, i.e., values not contained in the interval $[LL, UL]$. All generated dynamic datasets have medians of the degree of dynamism of around 0.2 or less and a high concentration of instances with a degree of dynamism of less than 0.3.
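The box plot limits can be reproduced with a short Python sketch. It assumes the usual convention in which the limits lie 1.5 interquartile ranges beyond the first and third quartiles, and it uses the inclusive quantile method of the standard library; other quantile definitions give slightly different limits. The function name and the sample values are hypothetical.

```python
from statistics import quantiles


def box_plot_limits(values):
    """Return (lower limit, upper limit, outliers) for a list of values."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [value for value in values if value < lower or value > upper]
    return lower, upper, outliers


# Hypothetical degrees of dynamism of five instances of one dataset.
lower, upper, outliers = box_plot_limits([0.10, 0.15, 0.20, 0.25, 0.90])
```

In this hypothetical dataset the single instance with a degree of dynamism of 0.90 falls outside the limits and would be drawn as a diamond.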
It is worth noting that the greater the dynamism, the higher the number of optimization calls. Therefore, dynamic datasets with a concentration of low degrees of dynamism can benefit algorithms that return good results at the cost of a long computation time.
Another point to be highlighted is the scarcity of instances with dynamism between 0.45 and 0.6. Van Lon et al. (2016) claim that this range of dynamism values occurs in scenarios generated by homogeneous Poisson distributions. Bearing in mind that the arrival of travel requests in real world DARP systems happens in a way that resembles a homogeneous Poisson distribution (Schilde et al., 2011), the lack of instances with these dynamism values hinders the analysis of realistic scenarios.
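For reference, arrival times of the kind mentioned above can be sampled with the Python sketch below, which accumulates exponentially distributed interarrival times to simulate a homogeneous Poisson process. The rate, the horizon and the function name are hypothetical values chosen for this illustration only.

```python
import random


def poisson_arrival_times(rate, horizon, rng):
    """Sample arrival times of a homogeneous Poisson process on [0, horizon]."""
    arrivals = []
    instant = rng.expovariate(rate)  # interarrival times have mean 1 / rate
    while instant <= horizon:
        arrivals.append(instant)
        instant += rng.expovariate(rate)
    return arrivals


rng = random.Random(7)  # fixed seed so the sample is reproducible
sample = poisson_arrival_times(rate=0.1, horizon=1000.0, rng=rng)
```

Feeding arrival times generated in this way to a degree of dynamism computation gives a simple way to check which part of the spectrum such scenarios occupy.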
The normalized average urgency box plot in Figure 4 shows a diversity of distributions. The datasets derived from Li and Lim (2003), despite having a good coverage of the urgency spectrum, show an accumulation of low urgency values, which may overrate algorithms that prioritize short-term results.

Correlation between lower limits of pickup time windows and request arrival times
From the description of dynamism briefly presented in Section 4.1, one can note that the interarrival times between requests are the main factors determining the degree of dynamism of an instance. Therefore, a dynamic dataset will have instances whose degrees of dynamism differ from each other if the distribution of arrival times differs between instances (Van Lon et al., 2016).
However, most of the dynamization methods in Section 3 do not allow the diversification of time intervals between instances. The only exception is Pankratz (2005) (Section 3.3), who varies the value of the reaction time parameter, ensuring that one static instance generates several dynamic instances with different reaction times. Nevertheless, Pankratz (2005) still failed to achieve a very wide range of dynamism (Figures 3 and 4).
Among the dynamization methods presented in Section 3, it is common to use pickup time window limits to obtain arrival times, which makes the time windows distribution affect directly the arrival times distribution. Therefore, if the distribution of time window limits has an accumulation of values, there is a possibility that this accumulation will be passed on to the arrival times distribution. Figures 5 and 6 show the histogram and cumulative frequency of lower and upper limits of pickup time windows for the sets of static instances used in this work, all normalized by their respective planning horizon. All the histograms in Figure 5 show an aggregation at the beginning of the planning horizon. The accumulated frequency shows that all static instances have more than 75% of their lower limit of pickup time windows before the middle of the planning horizon.
Figure 6 shows that 50% of the requests by Ropke et al. (2007) and Cordeau and Laporte (2003) have the upper limits of their delivery time windows at the end of the planning horizon, and that there is a range (between 0.4 and 0.9) with no occurrences. Since some dynamization methods use the upper limit of the delivery time window to establish the request arrival time, this "dead zone" can hinder the creation of requests with arrival times around this zone. Table 1 shows correlation values between arrival times and the lower limits of the pickup time windows. The highest correlations are observed for the dynamization methods proposed by Berbeglia et al. (2012) and Pureza and Laporte (2008). The two other methods have a slightly lower correlation, which can be explained by the use of random variables drawn from a uniform distribution in the dynamization method by Fabri and Recht (2006) and by the variation of the reaction time parameter applied by Pankratz (2005).
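A correlation of this kind can be computed with the Python sketch below. It assumes the Pearson correlation coefficient, since the text does not state which coefficient was used, and the function name and sample values are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical arrival times and lower limits of the pickup time windows.
arrivals = [0.0, 10.0, 20.0, 30.0]
pickup_lower_limits = [5.0, 25.0, 45.0, 65.0]
correlation = pearson_correlation(arrivals, pickup_lower_limits)
```

A value close to one, as in this hypothetical example in which the lower limits are a linear function of the arrival times, indicates that the arrival times simply inherit the distribution of the time windows.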

Presence of static requests
The analysis of arrival times shows that the Fabri and Recht (2006) and the Pureza and Laporte (2008) dynamization methods generate many requests with arrival time equal to zero, which, by definition, are considered static requests. Table 2 shows the percentage of requests with arrival time equal to zero for all the generated dynamic instances. The percentage of static requests is an important feature of instances, since static requests represent an initial condition of the system. Therefore, it is good practice to have a range of static request percentages in benchmark datasets for testing different initial conditions. Table 2 shows that most of the generated dynamic datasets fail to cover a good range of static request percentages. The only two exceptions are the datasets generated from Li and Lim (2003) by the methods of Fabri and Recht (2006) and of Pureza and Laporte (2008). The method proposed by Pankratz (2005) does not generate any static request. However, in his work some additional parameters are used to address this condition, if needed. It is believed that this side effect is also caused by the use of pickup and delivery time window limits combined with the use of static instances that have an accumulated distribution of these values, especially at the beginning of the planning horizon. Therefore, when using dynamization methods, care must be taken that they do not generate too many static requests, which can hinder the analysis of algorithms designed to deal with dynamic requests.
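The percentage of static requests of an instance can be obtained with a few lines of Python. The sketch below treats a request as static when its arrival time is equal to zero, as defined above; the function name and the sample values are hypothetical.

```python
def static_request_percentage(arrival_times):
    """Percentage of requests whose arrival time is equal to zero."""
    static_count = sum(1 for arrival in arrival_times if arrival == 0)
    return 100.0 * static_count / len(arrival_times)


# Hypothetical instance with four requests, two of them known at instant zero.
percentage = static_request_percentage([0.0, 0.0, 5.0, 12.0])
```

Computing this value for every instance of a generated dataset shows at a glance whether the dataset covers different initial conditions or concentrates on a single one.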

CONCLUDING REMARKS
This article succinctly presented sets of benchmark instances for the DDARP and DPDPTW and the methods used for generating them from static instances. We performed the analysis using the degree of dynamism and urgency metrics proposed by Van Lon et al. (2016) for a series of combinations of the static benchmark datasets and dynamization methods presented. The number of static requests and the correlation between the lower limits of the pickup time windows and the request arrival times were also analyzed. It was observed that all the dynamization methods generate dynamic benchmarks with little variability in the degree of dynamism and urgency values for the static sets used. This is mainly caused by the fact that the dynamization methods work with a low variability of parameters and depend too strongly on the pickup time window limits. This is an unwanted feature for dynamization methods.
Benchmark instances and dynamization methods with a wide variety of characteristics help to test different aspects of algorithms and can favor the development of more flexible methods, which can be used in real situations with less risk of failure (Uchoa et al., 2017).
It is hoped that this paper serves as a basis for other researchers in the field of dynamic vehicle routing who are interested in studying the behavior of solution algorithms for the DDARP and DPDPTW through computational experiments. It should be noted that all the data from the sets of benchmark instances that are characterized and analyzed in this paper are freely available for consultation and use, as well as all the source code used for their analysis (Eccel, 2020). For future work, we suggest an analysis of the spatial factors of the instances, especially the distribution of the pickup and delivery locations. The investigation of new methods for converting static instances into dynamic instances is recommended if more variability of the degree of dynamism and urgency is required.