Wednesday, June 13, 2012

Analytics in the Oracle Database

         The addition of analytics to databases is a natural direction. As the volume of data increases, data movement dominates the cost of computation. It starts to make more sense to move the computation and algorithms to the database than to move, to an external server, the data to be analyzed. Furthermore, in most cases, the results of the analysis need to be persisted back to the database and combined with other data in order to make them actionable.

      Picking up on this, over the past releases Oracle has continuously added analytic features to its database. Taken as a whole these features make the Oracle Database a powerful platform for developing applications leveraging analytics. However, most users are not aware of the complete set of analytic features available in the database. The following list covers the features present in the Oracle Database 10g Release 2:
Complex data transformations
Data mining
Image feature extraction
Linear algebra
OLAP
Predictive analytics
Spatial analytics
Statistical functions
Text mining

As these features are part of a common server it is possible to combine them efficiently and with ease. The overall benefit is greater than just the sum of the parts that could be achieved through the integration of different servers and tools. For example, it is possible to create efficient arbitrarily complex SQL statements that combine data mining and text processing:
SELECT A.cust_name, A.contact_info
FROM customers A
WHERE PREDICTION_PROBABILITY(tree_model,
‘attrite’ USING A.*) > 0.8
AND A.cust_value > 90
AND A.cust_id IN
(SELECT B.cust_id
FROM call_center B
WHERE B.call_date BETWEEN ’01-Jan-2005’
AND ’30-Jun-2005’
AND CONTAINS(B.notes, ‘Checking Plus’, 1) > 0);
The above query selects all customers who have a high propensity to attrite (> 80% chance), are valuable customers (customer value rating > 90), and have had a recent conversation with customer services regarding a Checking Plus account. The propensity to attrite information is computed using a data mining model (tree_model). The query uses Oracle Text's CONTAINS operator to search call center notes for references to Checking Plus accounts.

Finally, it is also easy to integrate the results from queries like the one above with Business Intelligence tools such as Oracle Discover, Oracle Portal, and Crystal Reports (more on that in future posts).

In future posts I will cover:
How to get the most out of many of these features
How to solve real problems using analytics
The role of analytics in Business Intelligence and databases
In the meantime, the following provides a brief description of each one of the above features with links for further information.

Complex Data Transformations
Data transformation is a key aspect of analytical applications and ETL (extract, transform, and load). Besides support for transformations through SQL expressions, the Oracle Database, since the Oracle Database 10g Release 1, ships with a flexible data transformation package that includes a variety of missing value and outlier treatments, as well as binning and normalization capabilities.

Data Mining
Oracle Data Mining (ODM), an option to the Enterprise Edition of the Oracle Database, provides a rich set of data mining functionality. ODM 10g Release 2 has eleven algorithms that can be used for classification, regression, clustering, anomaly detection, feature extraction, association analysis, and attribute ranking.

The database also includes in both Standard Edition and Enterprise Edition the frequent itemset package (DBMS_FREQUENT_ITEMSET). This package enables frequent itemset counting and it is used as a building block for ODM's Association algorithm. Frequent itemsets provide a mechanism for counting how often multiple events occur together. This blog post has a nice discussion of this feature.

Image Feature Extraction
Oracle Intermedia is a feature of the Oracle Database that is included in both Standard Edition and Enterprise Edition. interMedia supports the extraction of image features (e.g., color histogram, texture, and positional color) that can then be used to characterize and analyze images.

Linear Algebra
Oracle Database 10g Release 2 ships with a new package UTL_NLA. The UTL_NLA package exposes a subset of the popular BLAS and LAPACK (Version 3.0) libraries for operations on vectors and matrices represented as VARRAYs. This package includes procedures to solve systems of linear equations, invert matrices, and compute eigenvalues and eigenvectors.

Predictive Analytics
Data mining can uncover useful information buried in vast amounts of data. However, it is often the case that many users that could benefit from these results do not have any data mining expertise. The DBMS_PREDICTIVE_ANALYTICS package addresses this by automating the entire data mining process from data preprocessing through model building to scoring new data. This package provides an important tool that makes data mining possible for a wider audience of users, in particular, business analysts. The capabilities of this package are also exposed through the Oracle Spreadsheet Add-In for Predictive Analytics. The Oracle Spreadsheet Add-In for Predictive Analytics enables Microsoft Excel users to mine their Oracle Database or Excel data using simple, "one click" Predict and Explain predictive analytics features.

Statistical Functions
The Oracle Database provides a long list of SQL statistical functions with support for: hypothesis testing (e.g., t-test, F-test), correlation computation (e.g., pearson correlation), cross-tab statistics, and descriptive statistics (e.g., median and mode). The package DBMS_STAT_FUNCS adds distribution fitting procedures and a summary procedure that returns descriptive statistics for a column.

OLAP
Oracle OLAP, an option to Oracle Database 10g Enterprise Edition, has features previously found only in specialized OLAP databases. Moving beyond drill-downs and roll-ups, Oracle OLAP also supports time-series modeling and forecast.

Text Mining
Oracle Text uses standard SQL to index, search, and analyze text and documents stored in the Oracle database, in files, and on the web. It also supports automatic classification and clustering of document collections. Many of these analytical features are layered on top of ODM functionality.

Spatial Analytics
Oracle Spatial is an option for Oracle Enterprise Edition that provides advanced spatial features to support high-end GIS and LBS solutions. Oracle Spatial's analysis and mining capabilities include functions for binning, detection of regional patterns, spatial correlation, colocation mining, and spatial clustering. Oracle Spatial also includes support for topology and network data models and analytics. The topology data model of Oracle Spatial allows one to work with data about nodes, edges, and faces in a topology. It includes network analysis functions for computing shortest path, minimum cost spanning tree, nearest-neighbors analysis, traveling salesman problem, among others.

No comments:

Post a Comment