R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results
R remains the leading tool, with 49% share, but Python grows faster and almost catches up to R. RapidMiner remains the most popular general Data Science platform. Big Data tools used by almost 40%, and Deep Learning usage doubles.
The poll got tremendous participation from the analytics and data science community and vendors, attracting 2,895 voters, who chose from a record 102 different tools.
R remains the leading tool, with 49% share (up from 46.9% in 2015), but
Python usage grew faster and it almost caught up to R with 45.8% share
(up from 30.3%). RapidMiner remains the most popular general platform
for data mining/data science, with about 33% share. Notable tools with
the most growth in popularity include Dato, Dataiku, MLlib, H2O, Amazon
Machine Learning, scikit-learn, and IBM Watson.
The increased choice of tools is reflected in wider usage. The average number of tools used was 6.0, vs 4.8 in 2015.
The usage of Hadoop/Big Data tools grew to 39%, up from 29% in 2015 (and
17% in 2014), driven by Apache Spark, MLlib (Spark Machine Learning
Library) and H2O.
The participation by region was:
US/Canada (40%),
Europe (39%),
Asia (9.4%),
Latin America (5.8%),
Africa/MidEast (2.9%),
Australia/NZ (2.2%).
Top Analytics/Data Science Tools
The following table shows the top 10 most popular tools in the 2016 poll.
Tool | 2016 % share | % change | % alone |
---|---|---|---|
R | 49% | +4.5% | 1.4% |
Python | 45.8% | +51% | 0.1% |
SQL | 35.5% | +15% | 0% |
Excel | 33.6% | +47% | 0.2% |
RapidMiner | 32.6% | +3.5% | 11.7% |
Hadoop | 22.1% | +20% | 0% |
Spark | 21.6% | +91% | 0.2% |
Tableau | 18.5% | +49% | 0.2% |
KNIME | 18.0% | -10% | 4.4% |
scikit-learn | 17.2% | +107% | 0% |
In this table 2016 % share is % of voters who used this tool, % change is the change in share vs 2015 poll, and % alone is the percent of voters who used only the reported tool among all voters who used that tool. E.g. 4.4% of KNIME voters reported using only KNIME and nothing else. We note a decrease in such lone voting, with only 9 tools having 5% or more lone votes.
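The two derived columns can be reproduced with simple arithmetic. The sketch below is illustrative only: the function names are our own, "% change" is assumed to be the relative change in share between the two polls, and the counts passed to `percent_alone` are hypothetical, since the article does not publish raw vote counts.

```python
def relative_change(share_now: float, share_before: float) -> float:
    """Relative change in share between two polls, in percent."""
    return (share_now - share_before) / share_before * 100.0


def percent_alone(lone_voters: int, tool_voters: int) -> float:
    """Percent of a tool's voters who reported using only that tool."""
    return lone_voters / tool_voters * 100.0


# Shares quoted in the article: R went from 46.9% to 49%, Python from 30.3% to 45.8%.
r_change = relative_change(49.0, 46.9)
python_change = relative_change(45.8, 30.3)
print(f"R: {r_change:+.1f}%")            # R: +4.5%
print(f"Python: {python_change:+.0f}%")  # Python: +51%

# Hypothetical counts (not from the poll): 22 lone voters among 500 users of a tool.
example_alone = percent_alone(22, 500)
print(f"alone: {example_alone:.1f}%")
```

Note that "% change" here is relative, not a difference in percentage points: R gained 2.1 points, which is a 4.5% increase over its 2015 share.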
Fig 1: KDnuggets Analytics/Data Science 2016 Software Poll: top 10 most popular tools in 2016
Compared to the 2015 KDnuggets Analytics/Data Science Poll results, the only newcomer in the top 10 was scikit-learn, which displaced SAS.
Tools with the highest growth (among tools with at least 15 users in 2015) were:
Tool | % change | 2016 % share | 2015 % share |
---|---|---|---|
Dato | 377% | 2.4% | 0.5% |
Dataiku | 292% | 7.8% | 2.0% |
MLlib | 253% | 11.6% | 3.3% |
H2O | 233% | 6.7% | 2.0% |
Amazon Machine Learning | 171% | 1.9% | 0.7% |
scikit-learn | 107% | 17.2% | 8.3% |
IBM Watson | 99% | 4.2% | 2.1% |
Splunk/ Hunk | 98% | 2.2% | 1.1% |
Spark | 91% | 21.6% | 11.3% |
Scala | 79% | 6.2% | 3.5% |
This year, 86% of voters used commercial software and 75% used free software. About 25% used only commercial software, and 13% used only open source/free software. A majority of 61% used both free and commercial software, similar to 64% in 2015.
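These figures are consistent with each other under inclusion-exclusion. The following is a minimal sketch, assuming every voter used at least one commercial or free tool (the article does not state this explicitly) and working from the rounded percentages quoted above.

```python
# Rounded percentages quoted in the article.
used_commercial = 86
used_free = 75

# Assumption: every voter used at least one tool of either kind,
# so the two groups together cover 100% of voters.
both = used_commercial + used_free - 100
only_commercial = used_commercial - both
only_free = used_free - both

print(both, only_commercial, only_free)
```

The result matches the reported 61% and 25%. For free-only usage the rounded inputs give 14% rather than the reported 13%, a gap that is plausibly due to rounding in the published figures.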
Tools new to this poll that received at least 1% share in 2016 were:
- Anaconda, 16%
- Microsoft other ML/Data Science tools, 1.6%
- SAP HANA, 1.2%
- XLMiner, 1.2%
Among tools with at least 15 votes in 2015, the largest declines in 2016
were for the tools below. The declines probably reflect a combination of
falling popularity for free tools like F# and the lack of a voter drive
for some commercial tools this year.
- Ayasdi, down 85%, to 0.3% share from 2.0%
- Actian, down 83%, to 0.3% share from 2.0%
- Datameer, down 52%, to 0.4% share from 0.9%
- SAP Analytics, down 51%, to 1.5% share from 3.0%
- SAS Enterprise Miner, down 49%, to 5.6% share from 10.9%
- Alteryx, down 46%, to 3.0% share from 5.6%
- F#, down 42%, to 0.4% share from 0.7%
- TIBCO Spotfire, down 36%, to 2.8% share from 4.3%
- JMP, down 36%, to 2.0% share from 3.1%
Hadoop/Big Data Tools
The usage of Hadoop/Big Data tools grew to 39%, up from 29% in 2015 (and
17% in 2014), driven mainly by strong growth in Apache Spark, MLlib (Spark
Machine Learning Library), and H2O, which we included among Big Data
tools.
Here are the Big Data tools and their share in 2016, 2015, and %change.
Tool | 2016 % share | 2015 % share | % change |
---|---|---|---|
Hadoop | 22.1% | 18.4% | +20.5% |
Spark | 21.6% | 11.3% | +91% |
Hive | 12.4% | 10.2% | +21.3% |
MLlib | 11.6% | 3.3% | +253% |
SQL on Hadoop tools | 7.3% | 7.2% | +1.6% |
H2O | 6.7% | 2.0% | +234% |
HBase | 5.5% | 4.6% | +18.6% |
Apache Pig | 4.6% | 5.4% | -16.1% |
Apache Mahout | 2.6% | 2.8% | -7.2% |
Dato | 2.4% | 0.5% | +338% |
Datameer | 0.4% | 0.9% | -52.3% |
Other Hadoop/HDFS-based tools | 4.9% | 4.5% | +7.5% |
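When reading the "% change" column, keep in mind that relative change is very sensitive to rounding when the starting share is small. The sketch below is our own illustration, not the poll's method: it assumes the published shares are rounded to one decimal place and computes the range of relative change consistent with that rounding. The function name `change_bounds` is hypothetical.

```python
def change_bounds(now: float, before: float, half_step: float = 0.05):
    """Range of relative change (%) consistent with shares rounded to 0.1."""
    # Smallest change: the true 2016 share was rounded up, the 2015 share rounded down.
    low = ((now - half_step) / (before + half_step) - 1.0) * 100.0
    # Largest change: the true 2016 share was rounded down, the 2015 share rounded up.
    high = ((now + half_step) / (before - half_step) - 1.0) * 100.0
    return low, high


# Shares from the table above.
dato_low, dato_high = change_bounds(2.4, 0.5)
hadoop_low, hadoop_high = change_bounds(22.1, 18.4)

print(f"Dato:   {dato_low:.0f}% to {dato_high:.0f}%")
print(f"Hadoop: {hadoop_low:.1f}% to {hadoop_high:.1f}%")
```

For a widely used tool like Hadoop the range is about one percentage point wide, while for a tool with a 0.5% starting share like Dato it spans more than a hundred points. Growth figures for small-share tools should therefore be read as rough indications.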
Deep Learning Tools
For the second year, the KDnuggets poll included Deep Learning tools.
This year, 18% of voters used Deep Learning tools, doubling the 9% in 2015.
Google TensorFlow jumped to first place, displacing last year's leader, the Theano/Pylearn2 ecosystem.
Top tools:
- TensorFlow, 6.8%
- Theano ecosystem (including Pylearn2), 5.1%
- Caffe, 2.3%
- MATLAB Deep Learning Toolbox, 2.0%
- Deeplearning4j, 1.7%
- Torch, 1.0%
- Microsoft CNTK, 0.9%
- Cuda-convnet, 0.8%
- mxnet, 0.6%
- Convnet.js, 0.3%
- darch, 0.1%
- Nervana, 0.1%
- Veles, 0.1%
- Other Deep Learning Tools, 3.7%
The Deep Learning field is still at the beginning of its journey, as the large number of options shows.
Programming Languages
Python, Java, Unix tools, and Scala grew in popularity,
while C/C++, Perl, Julia, F#, Clojure, and Lisp declined.
Here are the programming languages sorted by popularity.
- Python, 45.8% share (was 30.3%), 51% increase
- Java, 16.8% share (was 14.1%), 19% increase
- Unix shell/awk/gawk, 10.4% share (was 8.0%), 30% increase
- C/C++, 7.3% share (was 9.4%), 23% decrease
- Other programming/data languages, 6.8% share (was 5.1%), 34.1% increase
- Scala, 6.2% share (was 3.5%), 79% increase
- Perl, 2.3% share (was 2.9%), 19% decrease
- Julia, 1.1% share (was 1.1%), 1.6% decrease
- F#, 0.4% share (was 0.7%), 41.8% decrease
- Clojure, 0.4% share (was 0.5%), 19.4% decrease
- Lisp, 0.2% share (was 0.4%), 33.3% decrease