### Four Classifiers Used in Data Mining and Knowledge Discovery for Petroleum Exploration and Development

#### Abstract

The application of data mining and knowledge discovery in databases to petroleum exploration and development (PE&D) is promising but still at an early stage. To date, the data mining tools most commonly used in PE&D are four classifiers: multiple regression analysis (MRA), Bayesian discrimination (BAYD), back-propagation neural network (BPNN), and support vector machine (SVM). Each of the four has its advantages and disadvantages. A question that arises in practice is: which classifier is most applicable to a given application? This paper answers that question through two case studies: 1) trap quality evaluation in the Northern Kuqa Depression of the Tarim Basin in western China, and 2) oil identification in the Xiefengqiao anticlinal structure of the Jianghan Basin in central China. In Case 1, BAYD, BPNN and SVM give the same results and can achieve zero residuals, while MRA gives unacceptable residuals; in Case 2, only SVM achieves zero residuals, while BAYD, BPNN and MRA give unacceptable residuals. The reasons are: a) both cases are nonlinear problems, so the linear MRA is not applicable; b) the nonlinearity of Case 1 is weak, so the nonlinear BAYD, BPNN and SVM are all applicable; and c) the nonlinearity of Case 2 is strong, so only the nonlinear SVM is applicable. It is therefore proposed to adopt MRA when a problem is linear; BAYD, BPNN, or SVM when a problem is weakly nonlinear; and only SVM when a problem is strongly nonlinear. In addition, the predictions of the applicable classifiers coincide with real exploration results: a commercial gas trap was discovered after the forecast in Case 1, and SVM can correct some erroneous well-log interpretations in Case 2.

*Key words:* Multiple regression analysis; Bayesian discrimination; Back-propagation neural network; Support vector machine; Trap quality evaluation; Oil identification
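The selection rule proposed in the abstract, and the idea of checking a classifier for zero residuals, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the residual measure (largest absolute difference between predicted and actual class labels), and the label values are all hypothetical and are not taken from the paper's data or method.

```python
from typing import Dict, List, Sequence


def max_abs_residual(actual: Sequence[int], predicted: Sequence[int]) -> int:
    """Largest absolute difference between predicted and actual class labels.

    Assumption: residuals are measured on integer class labels; the paper
    may define its residuals differently.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return max((abs(p - a) for a, p in zip(actual, predicted)), default=0)


def applicable_classifiers(
    results: Dict[str, Sequence[int]], actual: Sequence[int]
) -> List[str]:
    """Names (sorted) of classifiers that reproduce every actual label."""
    return sorted(
        name
        for name, predicted in results.items()
        if max_abs_residual(actual, predicted) == 0
    )


def recommend(nonlinearity: str) -> List[str]:
    """Selection rule from the abstract, keyed by 'linear', 'weak', 'strong'."""
    rule = {
        "linear": ["MRA"],
        "weak": ["BAYD", "BPNN", "SVM"],
        "strong": ["SVM"],
    }
    return rule[nonlinearity]


# Made-up labels for illustration only (not the Kuqa or Xiefengqiao data).
actual = [1, 2, 1, 2, 2, 1]
results = {
    "MRA": [1, 2, 2, 2, 1, 1],   # two misclassified samples
    "BAYD": [1, 2, 1, 2, 2, 1],
    "BPNN": [1, 2, 1, 2, 2, 1],
    "SVM": [1, 2, 1, 2, 2, 1],
}

print(applicable_classifiers(results, actual))
print(recommend("strong"))
```

With these made-up labels, the three nonlinear classifiers pass the zero-residual check and MRA does not, which mirrors the pattern the abstract reports for a weakly nonlinear problem.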

DOI: http://dx.doi.org/10.3968/j.aped.1925543820110202.107

DOI (PDF): http://dx.doi.org/10.3968/g2248


Articles published in **Advances in Petroleum Exploration and Development** are licensed under Creative Commons Attribution 4.0 (CC-BY).


Copyright © 2010 **Canadian Research & Development Centre of Sciences and Cultures**