worldwidepopla.blogg.se - Median Latin Hypercube Sampling

Cite this article: Ng W, Minasny B, Malone B, Filippi P. In search of an optimum sampling algorithm for prediction of soil properties from infrared spectra. PeerJ 6:e5722. DOI 10.7717/peerj.5722. Faculty of Science: School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia. Academic Editor: Danlin Yu. Subject Areas: Soil Science, Statistics, Computational Science, Data Science. Keywords: Calibration sample size, Infrared spectroscopy, Sampling algorithms, Soil properties, Regression. Copyright © 2018 Ng et al. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

To deal with the non-positive definite correlation matrix, an improved median Latin hypercube sampling with an evolutionary algorithm (EA), called MLHS-EA, is proposed for use in Monte Carlo simulation and investigated using the IEEE 118-bus system with wind farms. This paper also discusses the misunderstandings about the non-positive definite correlation matrix.

A novel sampling method, Latin Hypercube Hammersley Sampling (LHHS), has been suggested that combines the one-dimensional uniformity of LHS with the multidimensional uniformity of HSS. LHHS showed consistently better performance for various cases than other sampling methods such as Latin Hypercube Sampling, Median LHS, and Monte Carlo Sampling (Wang et al., 2004).

Figure panels: (C) median Latin hypercube sampling and (D) Hammersley sequence sampling techniques.
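Median Latin hypercube sampling, named in the title and among the methods compared above, can be illustrated with a short sketch. It assumes uniform marginals on [0, 1]: each dimension is cut into `n_samples` equal-probability strata, the midpoint (median) of each stratum is taken instead of a random point, and the midpoints are shuffled independently per dimension. The function name `median_lhs` and its parameters are hypothetical and do not come from any of the cited papers.

```python
import random


def median_lhs(n_samples, n_dims, seed=None):
    """Median Latin hypercube sample on the unit hypercube (illustrative sketch)."""
    rng = random.Random(seed)
    # Midpoint of each of the n_samples equal-width strata on [0, 1].
    midpoints = [(i + 0.5) / n_samples for i in range(n_samples)]
    columns = []
    for _ in range(n_dims):
        column = midpoints[:]   # every stratum is used exactly once per dimension
        rng.shuffle(column)     # random pairing of strata across dimensions
        columns.append(column)
    # Transpose the columns into a list of sample points.
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


if __name__ == "__main__":
    for point in median_lhs(5, 2, seed=42):
        print(point)
```

Because every stratum contributes exactly one value per dimension, each one-dimensional projection of the sample is evenly spread, which is the one-dimensional uniformity attributed to LHS above. Non-uniform marginals would need the midpoints mapped through an inverse cumulative distribution function, which this sketch leaves out.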

Soil spectra absorbance from the visible-infrared range can be calibrated using regression models to predict a set of soil properties. The accuracy of these regression models relies heavily on the calibration set. The optimum sample size and the overall sample representativeness of the dataset could further improve the model performance. However, there is no guideline on which sampling method should be used for datasets of different sizes. Methods. Here, we show that different sampling algorithms performed differently under different data sizes and different regression models (Cubist regression tree and Partial Least Squares Regression (PLSR)). We analysed the effect of three sampling algorithms, Kennard-Stone (KS), conditioned Latin Hypercube Sampling (cLHS) and k-means clustering (KM), against random sampling on the prediction of up to five different soil properties (sand, clay, carbon content, cation exchange capacity and pH) on three datasets.
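As an illustration of how k-means clustering (KM), one of the algorithms named in this paragraph, could be used to pick calibration samples, the sketch below clusters the data and keeps the sample closest to each cluster centroid. Several choices here are assumptions rather than details from the source: the nearest-to-centroid selection rule, the deterministic farthest-point initialisation, the Euclidean distance, and the function name `kmeans_select`.

```python
import math


def kmeans_select(points, k, n_iter=50):
    """Return indices of one representative sample per k-means cluster (sketch)."""
    # Assumed initialisation: first point, then repeatedly the point farthest
    # from the centroids chosen so far (keeps the example deterministic).
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    def assign(current):
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, current[c])) for p in points]

    for _ in range(n_iter):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated

    labels = assign(centroids)
    selected = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            # Assumed rule: the sample nearest the centroid represents the cluster.
            selected.append(min(members, key=lambda i: math.dist(points[i], centroids[c])))
    return selected


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans_select(data, 2))  # -> [0, 3]
```

With spectra, the points would typically be a reduced representation of each spectrum rather than raw absorbance values; that preprocessing is outside this sketch.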

The use of a sampling algorithm is more beneficial for larger datasets than for smaller datasets, where only small improvements can be made. KM is suitable for large datasets, KS is efficient in small datasets but results can be variable, while cLHS is less affected by sample size.

In the last few decades, there has been growing interest in rapid soil characterisation. Infrared spectroscopy has gained interest for various soil analyses over the conventional ‘wet chemistry’ methods because the latter are laborious, costly and time-consuming. Furthermore, multiple soil properties can be predicted from a single soil spectrum (Bendor & Banin, 1995; Stenberg et al., 2010; Viscarra Rossel et al., 2008). Although spectroscopy utilizes wide ranges of the electromagnetic spectrum, the work presented in this study focuses on the visible near infrared (vis-NIR) region.

Although the absorbance in the vis-NIR region is often broad and less resolved, this region contains some useful information on the stretching and bending of the fundamental C-H, N-H, O-H, and C=O bonds. With the help of chemometric techniques, properties of a soil sample can be predicted from its spectral absorption based on a regression model. The regression model is calibrated from a spectral library, relating infrared absorbance to standard laboratory measurements. The most common calibration models for soil applications are based on linear regressions, such as principal component regression (Chang et al., 2001; Stenberg et al., 2010) and partial least squares regression (PLSR) (McCarty et al., 2002; Wold, Johansson & Cocchi, 1993). Nonetheless, because soil is a complex medium that might have non-linear reflectance behaviour, a linear modelling approach like PLSR might not be sufficient (Vohland et al., 2011).

The accuracy of these regression models, however, relies heavily on the calibration dataset used. To obtain a reliable prediction, representative data should be used in the model (Kuang & Mouazen, 2012; Viscarra Rossel et al., 2008). The number of calibration samples also affects the model predictions, although this has received limited attention (Kuang & Mouazen, 2012). A larger calibration sample size may create more reliable and representative models than a smaller one (Kuang & Mouazen, 2012). However, in a real-world situation, the number of samples (with complete standard measurements) is usually small due to budget and/or time constraints (Minasny & McBratney, 2006).

With the high cost of soil analysis and limited budgets, choosing representative samples for laboratory analysis, which are subsequently used for calibration, is a critical component in establishing the most appropriate regression models (Brown, Bricklemyer & Miller, 2005; Ramirez-Lopez et al., 2014).

There are various sampling algorithms available to select calibration samples in infrared spectroscopy, such as the Kennard-Stone (KS) algorithm, conditioned Latin Hypercube Sampling (cLHS) and k-means clustering (KM). One of the most common sampling algorithms used in the infrared spectroscopy literature is the KS algorithm (Ramirez-Lopez et al., 2014), which sequentially selects samples with the largest distance in the variable space in the calibration set (Kennard & Stone, 1969). The cLHS algorithm, developed initially for generating optimal sample configurations for digital soil mapping, has also been used in soil spectroscopy studies (Mulder, De Bruin & Schaepman, 2013). The cLHS algorithm selects samples that optimally represent the multivariate distribution of the input dataset. The KM algorithm, on the other hand, partitions data into groups (strata) that have similar properties.
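The sequential selection rule described for the KS algorithm can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes plain Euclidean distance on the raw variables, starts from the two most distant samples, and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away. The function name `kennard_stone` is hypothetical.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select samples chosen by a Kennard-Stone style rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")

    # Pairwise Euclidean distances (assumed metric for this sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]

    # Distance from every sample to its nearest selected sample.
    nearest = [min(dist[i][first], dist[i][second]) for i in range(n)]

    while len(selected) < n_select:
        candidates = [i for i in range(n) if i not in selected]
        # Add the candidate that is farthest from the current selection.
        chosen = max(candidates, key=lambda i: nearest[i])
        selected.append(chosen)
        for i in range(n):
            nearest[i] = min(nearest[i], dist[i][chosen])
    return selected


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (10, 0), (5, 0), (5.2, 0)]
    print(kennard_stone(data, 3))  # -> [0, 2, 3]
```

Because each new sample is the one farthest from everything already chosen, the selection tends to cover the edges of the variable space first, which is consistent with the remark elsewhere in the text that KS results can be variable.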

Aside from random sampling, these three algorithms are utilized to optimize the selection of representative samples from the sample population. An illustration of the three sampling algorithms as well as random sampling is given in Fig.

Ramirez-Lopez et al. (2014) compared the use of KS, cLHS and fuzzy k-means clustering sampling (FKM) to select the calibration samples, and found that although the KS algorithm was outperformed by the other algorithms in terms of sample representativeness, the predictive performance of regression models for clay content and exchangeable Ca (Ca²⁺) was comparable regardless of the sampling method. This study warrants further research as it only considers two properties at field (5 km²) and regional (<500 km²) scales, with a calibration sample size of up to 380 samples for each dataset.

In this study, we compared three sampling algorithms (KS, cLHS, and KM) against random sampling on three different datasets at continental, regional, and local scale with various calibration sample sizes using two different regression methods: PLSR and Cubist regression modelling. The performance of the models is evaluated based on the average prediction accuracies of up to five different soil properties (sand, clay, carbon content, cation exchange capacity and pH). Thus, the objective of this paper is to investigate the effect of calibration sample size, the efficiency of sampling algorithms, and regression methods to predict various soil properties on soil samples from three different spatial extents.