MADpre - a MADCAP code for map-making preconditioner pre-computation

Purpose

MADpre computes a preconditioner matrix useful in iterative map-making solvers based on the Preconditioned Conjugate Gradient (PCG) algorithm, such as the one implemented in the MADmap code. For polarized data sets it also permits a determination of the subset of all observed pixels for which the separation of all three Stokes parameters (I, Q, U) is possible.

MADpre can be applied to any data produced by a single dish experiment for which the linear model of the time ordered data, d = A s + n, is applicable. Here d is a measurement, s the sky signal, n the instrumental noise, and A a "pointing" matrix. The preconditioner matrix, P, is then defined as

  P = (A^t diag(N^{-1}) A)^{-1},

where N characterizes the time domain noise and diag(N^{-1}) denotes the diagonal part of its inverse. As a by-product the code computes "naive" (simple binned) maps, i.e., estimates of the sky signal, s. Those are given by

  m = P A^t diag(N^{-1}) d.

Running MADpre

Compilation

Calling syntax

[MADpre executable] [m3 config file (string of characters)] [# of samples per proc (int)] [# of pixels per proc (int)] [processing gang size (int)] [data gang size (int)] [in/out of core (1/0)]

[# data gang size] is useful if the processed data set is made of many single detector data streams which are to provide a single estimate of the sky signal. In such a case the parameter defines how many processors will process each single detector data set, implying that the data of [# of all procs]/[# data gang size] detectors are processed simultaneously. If the full data set contains only one detector's data then [# data gang size] has to be set equal to [# of all procs]. If the number of detectors is larger than [# of all procs]/[# data gang size], each data gang will process multiple single-detector data sets one after another.

[# processing gang size] defines the size of the processing gangs.
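As a toy illustration of these relations (this is not MADpre code: the pointing model, angles, and noise weights below are invented for the example), consider a single pixel observed many times by one polarization-sensitive detector with diagonal noise weights:

```python
import numpy as np

# Toy sketch of P = (A^t diag(N^-1) A)^-1 and m = P A^t diag(N^-1) d
# for one pixel observed by a polarization-sensitive detector.
rng = np.random.default_rng(0)

nsamp = 1000
psi = rng.uniform(0.0, np.pi, nsamp)           # polarizer angles (invented)
A = np.column_stack([np.ones(nsamp),           # pointing matrix rows:
                     np.cos(2 * psi),          # d_t = I + Q cos2psi + U sin2psi
                     np.sin(2 * psi)])
s_true = np.array([10.0, 1.0, -0.5])           # (I, Q, U)
w = np.full(nsamp, 4.0)                        # diagonal noise weights diag(N^-1)
d = A @ s_true + rng.normal(0.0, 0.5, nsamp)   # simulated time-ordered data

AtNA = A.T @ (w[:, None] * A)                  # 3x3 weight block for the pixel
P = np.linalg.inv(AtNA)                        # the preconditioner block
m = P @ (A.T @ (w * d))                        # naive (binned) map estimate

print(m)                                       # close to s_true
```

The 3x3 block AtNA is the per-pixel "weight" matrix whose invertibility decides whether all three Stokes parameters can be separated for that pixel.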
MPI communication in the code is performed only between processors of the same processing gang. Effectively, the time domain data are reduced by the code in two steps. First, each processing gang reduces the data from each of its member procs, producing single output objects (be it a submap or a part of the preconditioner). In the second step the objects produced by different processing gangs are combined together. This second reduction step is done via the disk. At any given time there is at most one I/O process per processing gang. The processing gang size should therefore not be too small, to avoid I/O contention; however, if it is too large, the communication overhead may become important.

Run modes

MADpre uses the disk to store some temporary, intermediate products of the processing in order to save memory. The usage of the disk can be minimized by requesting the in-the-core mode. This can be done by setting the last command line parameter to 1. The in-the-core mode is possible only if the pixel range defined on the command line exceeds the total number of pixels allowed in the pixelization scheme as defined in the input XML config file. If that is not the case, MADpre will continue in the out-of-the-core mode. The latter can be forced by setting the last command line parameter to 0.

Input/Output

The input files are all described in the M3 XML config file, the name of which is provided to MADpre on the command line. Check there for more details.

On output MADpre produces two kinds of 'legitimate' output files, but also leaves some temporary outputs which may (or may not) be useful for a user. All the files are binary and therefore non-portable. (This is because of the endian conventions on one hand, but also the different padding schemes adopted on different platforms.) To alleviate this problem to some extent we provide simple and easy-to-use IDL routines to read the output files.
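The interplay between [# of all procs], [# data gang size], and the number of detectors described above can be summarized in a few lines (a sketch with invented names, not MADpre code):

```python
import math

# Hypothetical helper illustrating the gang layout implied by the
# command line parameters described above (names are ours, not MADpre's).
def gang_layout(n_procs, data_gang_size, n_detectors):
    assert n_procs % data_gang_size == 0, "data gangs must tile the procs"
    n_gangs = n_procs // data_gang_size        # detectors processed at once
    # if there are more detectors than gangs, each gang handles several
    # single-detector data sets one after another
    rounds = math.ceil(n_detectors / n_gangs)
    return n_gangs, rounds

print(gang_layout(64, 8, 20))  # (8, 3): 8 detectors at a time, 3 passes
```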
The official MADpre outputs are:

The temporary (not erased) outputs:

The temporary files contain the complete information about the weights (i.e., the preconditioner matrix prior to the inversion); however, it may be distributed over multiple files, each corresponding to a different pixel range as defined on the command line. Note that the weight matrix includes all pixels ever observed, i.e., also those for which the inversion may fail and which therefore will not be included in the preconditioner matrix stored in the prec.m3 file. The temporary files are called tmpStokesFile_p0_p1.wght, where p0 = i * [specified pixel range] (for i = 0, 1, ...) and p1 = p0 + [specified pixel range]; p0 and p1 thus define the pixel range, p0 <= p < p1, whose weights are stored in the given file. Only files corresponding to the observed pixel ranges are stored. In addition the files also contain the "noise weighted, simple binned map", a.k.a. the rhs of the map-making equation, i.e., A^t diag N^{-1} d.

These files are saved as simple binary dumps with the following content:

# of low resolution pixels (4 byte integer), denoted below np;
# of all pixels (low and high resolution) per low resolution pixel (int4), denoted nb;
# of elements in the upper triangle of each of the diagonal blocks assigned to each low resolution pixel (int4), denoted nm;
# of high resolution pixels per low resolution pixel (int4), denoted nh.

If the last number, nh, equals 1, i.e., we consider a single resolution case, then nb is the number of observed Stokes params, nm = (nb+1)nb/2, and the remaining file content is as follows:

a vector of length np of 4 byte ints, storing the observed pixel numbers;
a vector of length nb np of 8 byte reals, storing the submaps for the observed pixels in the given pixel range;
a vector of length nm np of 8 byte reals, storing all the upper triangles of the weight matrix blocks for each observed pixel from the given range.
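Before turning to the multiresolution case, the single resolution layout just described can be read with a short sketch like the one below (an illustration based on the description above, not official MADpre code; native endianness is assumed, as the files are non-portable, and the per-pixel ordering of the submap and weight vectors is an assumption):

```python
import struct
import numpy as np

def read_wght_single_res(path):
    """Read a single-resolution tmpStokesFile_p0_p1.wght dump.

    Assumed layout (native endianness; see the description above):
      int4: np, nb, nm, nh   (nh must be 1 here)
      int4[np]       observed pixel numbers
      float8[nb*np]  submaps (the rhs), nb Stokes params per pixel
      float8[nm*np]  upper triangles of the nb x nb weight blocks
    """
    with open(path, "rb") as f:
        npix, nb, nm, nh = struct.unpack("4i", f.read(16))
        assert nh == 1, "multiresolution file: see the nh > 1 layout"
        assert nm == nb * (nb + 1) // 2
        pixels = np.fromfile(f, dtype=np.int32, count=npix)
        submaps = np.fromfile(f, dtype=np.float64, count=nb * npix).reshape(npix, nb)
        weights = np.fromfile(f, dtype=np.float64, count=nm * npix).reshape(npix, nm)
    return pixels, submaps, weights
```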
If nh is larger than 1, i.e., we consider a multiresolution case, then nb is either nh+2, if the Stokes I map is of a higher resolution, or 2 nh + 1, if Stokes Q and U are high resolution; nm = (nb+1)nb/2; and the following objects are stored subsequently:

a vector of length np of 4 byte ints, storing the observed low resolution pixel numbers;
a vector of length nh np of 4 byte ints, storing the observed high resolution pixel numbers for each observed low resolution pixel (unobserved high resolution pixels are marked by -1);
a vector of length nb np of 8 byte reals, storing the submaps for the observed pixels in the given pixel range;
a vector of length nm np of 8 byte reals, storing all the upper triangles of the weight matrix blocks for each observed pixel from the given range.

A simple IDL reading routine can be found here. Note that these outputs will be erased by successive runs of MADpre initiated from the same directory.

Performance

MADpre performs the calculations in two steps:

(i) In the first step it reads consecutive segments of the input time domain (measurements and pointing) data and "projects" them onto the pixel domain, storing only the information about those pixels which belong to a given pixel interval. It then loops over the pixel intervals. The information collected for each pixel interval is written to disk as a temporary file (in the out-of-core mode).

(ii) In the second step, MADpre rereads the temporary files, distributes the pixel domain objects over all processors, and performs all the pixel domain operations (matrix block inversion, map estimation, etc.). The final output is saved at the end of this step.

It is the first step which is the most time and memory consuming. The total memory used by MADpre is determined by two command line parameters (but see Memory considerations below):

(i) the number of pixels per interval stored by each proc ([# of pixels per proc]) (= npixPerProc);
(ii) a TOD segment length to be read and stored in memory ([# of samples per proc]) (= ntod).
The number of pixel intervals, and hence of passes over the data, is Ceil[npixTot/npixPerProc], while the number of extra flops added as a result of this is roughly Ceil[npixTot/npixPerProc] * ntotTod (ntotTod is the total length of the time ordered data processed). For this reason it is advantageous to keep this ratio as small as possible. On the other hand, the performance is roughly independent of ntod, at least as long as the disc access latency is shorter than the time required for a single read of ntod data. Consequently, it is often advantageous to limit the memory usage by decreasing ntod as much as is reasonable, and to set npixPerProc as large as allowed by the remaining memory. However, if the disc access is very slow, it may be better to set ntod as large as allowed by the available memory and keep npixPerProc rather small, even if that leads to extra, spurious flops. MADpre estimates the required memory for each run given the input parameters and reports it on stdout.

Memory considerations

The memory allocated by MADpre depends on the two input parameters [# of samples per proc] and [# of pixels per proc]. They define the sizes of the time and pixel domain objects stored in memory by the code. MADpre estimates the memory required for these objects. However, MADpre uses the M3 library to perform the input. M3 routines internally allocate memory as needed, and some of the allocated M3 objects are preserved throughout the entire run-time of the code. They may therefore contribute significantly to the memory budget of MADpre. The specific case relevant for MADpre is that of the computation of the pointing information. Often such information is read from the disk only in a compressed form and then unfolded on-the-fly by the M3 and GCP library routines whenever needed. The unfolding requires some basic information about the pointing to be kept in memory at all times, leading to an often substantial overhead in the memory requirements of the MADpre code.
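The trade-off above can be made concrete with a back-of-the-envelope estimate (illustrative only; the numbers and names below are invented, not MADpre defaults):

```python
import math

# Rough cost of the pixel interval loop described above.
def interval_cost(npix_tot, npix_per_proc, ntot_tod):
    n_intervals = math.ceil(npix_tot / npix_per_proc)  # passes over the TOD
    extra_flops = n_intervals * ntot_tod               # rough extra work
    return n_intervals, extra_flops

# e.g. 3 million observed pixels, 500k pixels per proc, 1e9 samples in total:
print(interval_cost(3_000_000, 500_000, 10**9))  # (6, 6000000000)
```

Doubling npixPerProc halves the number of passes (and the spurious flops) at the price of doubling the pixel domain memory, which is exactly the tuning knob discussed above.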
An important point is that the size of such resident information depends on neither [# of samples per proc] nor [# of pixels per proc], so the memory requirement of MADpre cannot always be tuned by decreasing either of these two parameters. However, the extra memory will in general depend on how many processors are used per single detector data set and therefore can be decreased by increasing the value of [# data gang size]. Note that the data layout of MADpre is such that it attempts to minimize the extra memory burden due to the M3/GCP libraries. If no on-the-fly capability is invoked (this is defined in the XML config file), then the MADpre-provided estimate of the memory required to perform the run is generally adequate.

Dos and don'ts

Though MADpre can (and will) formally run for any combination of single dish detectors, i.e., total power and polarization-sensitive, in multi-resolution cases some of the actually observed pixels may be lost. It is therefore advised to separate the two kinds of detectors and run them one at a time, simply combining the precomputed preconditioners afterwards.

Availability and distribution

The MADpre software is part of the MADCAP suite of codes and requires the M3 and GCP libraries. The software is made available on the NERSC supercomputers as part of the CMB module, which is maintained there by the members of the Computational Cosmology Center of Berkeley Lab.

Contact

radek(at)apc(dot)univ(dash)paris7(dot)fr