Monday, December 12, 2016

Top Analytics, Data Science software

R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results

R remains the leading tool, with 49% share, but Python grows faster and almost catches up to R. RapidMiner remains the most popular general Data Science platform. Big Data tools are used by almost 40% of voters, and Deep Learning usage doubles.

The poll got tremendous participation from the analytics and data science community and from vendors, attracting 2,895 voters, who chose from a record number of 102 different tools.

R remains the leading tool, with 49% share (up from 46.9% in 2015), but Python usage grew faster and it almost caught up to R with 45.8% share (up from 30.3%). RapidMiner remains the most popular general platform for data mining/data science, with about 33% share. Notable tools with the most growth in popularity include Dato, Dataiku, MLlib, H2O, Amazon Machine Learning, scikit-learn, and IBM Watson.

The increased choice of tools is reflected in wider usage. The average number of tools used was 6.0, vs 4.8 in 2015.
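The average is presumably the total number of tool selections divided by the number of voters. A minimal Python sketch of that calculation, using a small hypothetical set of ballots (each ballot is the set of tools one voter selected; the data is made up for illustration):

```python
# Hypothetical ballots: each set holds the tools one voter selected (made-up data).
ballots = [
    {"R", "Python", "SQL"},
    {"RapidMiner"},
    {"Python", "Spark", "Hadoop", "scikit-learn"},
    {"R", "Excel"},
]


def average_tools_per_voter(ballots):
    """Mean number of tools selected per ballot (0.0 for an empty poll)."""
    if not ballots:
        return 0.0
    return sum(len(b) for b in ballots) / len(ballots)


print(average_tools_per_voter(ballots))  # 10 selections / 4 voters -> 2.5
```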

The usage of Hadoop/Big Data tools grew to 39%, up from 29% in 2015 (and 17% in 2014), driven by Apache Spark, MLlib (Spark Machine Learning Library) and H2O.

The participation by region was: US/Canada (40%), Europe (39%), Asia (9.4%), Latin America (5.8%), Africa/MidEast (2.9%), Australia/NZ (2.2%).

Top Analytics/Data Science Tools

The next table lists the top 10 most popular tools in the 2016 poll.
Tool           2016 % share   % change   % alone
R              49%            +4.5%      1.4%
Python         45.8%          +51%       0.1%
SQL            35.5%          +15%       0%
Excel          33.6%          +47%       0.2%
RapidMiner     32.6%          +3.5%      11.7%
Hadoop         22.1%          +20%       0%
Spark          21.6%          +91%       0.2%
Tableau        18.5%          +49%       0.2%
KNIME          18.0%          -10%       4.4%
scikit-learn   17.2%          +107%      0%

In this table 2016 % share is % of voters who used this tool, % change is the change in share vs 2015 poll, and % alone is the percent of voters who used only the reported tool among all voters who used that tool. E.g. 4.4% of KNIME voters reported using only KNIME and nothing else. We note a decrease in such lone voting, with only 9 tools having 5% or more lone votes.
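The three columns can be computed from raw ballots roughly as follows. This is a minimal sketch under assumptions: the ballots are hypothetical, and % change is taken to be the relative change in share (which matches R going from 46.9% to 49%, about +4.5%), not a difference in percentage points.

```python
# Hypothetical ballots for two poll years (made-up data, not the KDnuggets ballots).
ballots_2015 = [{"R"}, {"R", "Python"}, {"KNIME"}, {"SQL"}]
ballots_2016 = [{"R", "Python"}, {"Python"}, {"KNIME"}, {"KNIME", "R"}, {"SQL", "R"}]


def share(ballots, tool):
    """% share: percent of all voters who reported using `tool`."""
    return 100.0 * sum(tool in b for b in ballots) / len(ballots)


def pct_change(share_new, share_old):
    """% change: relative change in share, not a difference in percentage points."""
    return 100.0 * (share_new - share_old) / share_old


def pct_alone(ballots, tool):
    """% alone: percent of the tool's voters who selected that tool and nothing else."""
    users = [b for b in ballots if tool in b]
    if not users:
        return 0.0
    return 100.0 * sum(b == {tool} for b in users) / len(users)


r_2015, r_2016 = share(ballots_2015, "R"), share(ballots_2016, "R")
print(r_2015, r_2016, pct_change(r_2016, r_2015))  # 50.0 60.0 20.0
print(pct_alone(ballots_2016, "KNIME"))            # 50.0: one of two KNIME voters used only KNIME

# Applied to the published shares for R and Python:
print(round(pct_change(49.0, 46.9), 1))  # 4.5
print(round(pct_change(45.8, 30.3)))     # 51
```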

Fig 1: KDnuggets Analytics/Data Science 2016 Software Poll: top 10 most popular tools in 2016

Compared to the 2015 KDnuggets Analytics/Data Science Poll results, the only newcomer in the top 10 was scikit-learn, which displaced SAS.

Tools with the highest growth (among tools with at least 15 users in 2015) were:

Tool                      % change   2016 % share   2015 % share
Dato                      377%       2.4%           0.5%
Dataiku                   292%       7.8%           2.0%
MLlib                     253%       11.6%          3.3%
H2O                       233%       6.7%           2.0%
Amazon Machine Learning   171%       1.9%           0.7%
scikit-learn              107%       17.2%          8.3%
IBM Watson                99%        4.2%           2.1%
Splunk/Hunk               98%        2.2%           1.1%
Spark                     91%        21.6%          11.3%
Scala                     79%        6.2%           3.5%
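The 15-user cutoff keeps tools with a tiny 2015 base from topping the list on a handful of extra votes. A sketch of such a filtered growth ranking, with hypothetical vote counts and voter totals (ToolA to ToolD are placeholder names, not real tools):

```python
# Hypothetical data (placeholder tool names, made-up counts): tool -> (users_2015, users_2016)
N_2015, N_2016 = 1000, 1200  # hypothetical number of voters in each year

counts = {
    "ToolA": (10, 60),   # tiny 2015 base: +400% growth, but excluded by the cutoff
    "ToolB": (20, 72),
    "ToolC": (100, 150),
    "ToolD": (50, 48),
}


def top_growth(counts, n_old, n_new, min_old_users=15):
    """Rank tools by % change in share, skipping tools with too few users in the old poll."""
    rows = []
    for tool, (old, new) in counts.items():
        if old < min_old_users:
            continue  # too small a base for a meaningful growth rate
        share_old = 100.0 * old / n_old
        share_new = 100.0 * new / n_new
        change = 100.0 * (share_new - share_old) / share_old
        rows.append((tool, round(change), share_new, share_old))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for tool, change, new, old in top_growth(counts, N_2015, N_2016):
    print(f"{tool}: {change:+d}% ({old:.1f}% -> {new:.1f}%)")
```

Without the cutoff, ToolA would lead the ranking purely because of its small starting point.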


This year, 86% of voters used commercial software and 75% used free software. About 25% used only commercial software, and 13% used only open source/free software. A majority of 61% used both free and commercial software, similar to 64% in 2015.
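The overlapping figures fit together like a Venn diagram: the 86% who used commercial tools is about 25% commercial-only plus 61% both, and the 75% who used free tools is about 13% free-only plus 61% both (74%, the one-point gap presumably from rounding). A sketch of the three-way split on hypothetical ballots, with a deliberately simplified license map (R and Python as free, Excel and Tableau as commercial); the map is an assumption, since some tools ship both free and commercial editions:

```python
from collections import Counter

# Simplified, assumed license labels (illustrative only).
LICENSE = {"R": "free", "Python": "free", "Excel": "commercial", "Tableau": "commercial"}

# Hypothetical ballots (made-up data).
ballots = [
    {"R", "Python"},
    {"Excel"},
    {"R", "Excel", "Tableau"},
    {"Python", "Tableau"},
]


def license_mix(ballots, license_map):
    """Return % of voters using commercial-only, free-only, or both kinds of tools."""
    mix = Counter()
    for ballot in ballots:
        kinds = {license_map[tool] for tool in ballot}
        if kinds == {"commercial"}:
            mix["commercial only"] += 1
        elif kinds == {"free"}:
            mix["free only"] += 1
        elif kinds == {"commercial", "free"}:
            mix["both"] += 1
    return {kind: 100.0 * n / len(ballots) for kind, n in mix.items()}


print(license_mix(ballots, LICENSE))
```

"Used commercial software" is then commercial only plus both, and "used free software" is free only plus both.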

Tools new to this poll that received at least 1% share in 2016 were:
  • Anaconda, 16%
  • Microsoft other ML/Data Science tools, 1.6%
  • SAP HANA, 1.2%
  • XLMiner, 1.2%
Among tools with at least 15 votes in 2015, the largest declines in 2016 were for the tools below. This probably reflects a combination of declining popularity for free tools like F# and the lack of a voter drive for some commercial tools this year.
  • Ayasdi, down 85%, to 0.3% share from 2.0%
  • Actian, down 83%, to 0.3% share from 2.0%
  • Datameer, down 52%, to 0.4% share from 0.9%
  • SAP Analytics, down 51%, to 1.5% share from 3.0%
  • SAS Enterprise Miner, down 49%, to 5.6% share from 10.9%
  • Alteryx, down 46%, to 3.0% share from 5.6%
  • F#, down 42%, to 0.4% share from 0.7%
  • TIBCO Spotfire, down 36%, to 2.8% share from 4.3%
  • JMP, down 36%, to 2.0% share from 3.1%

Hadoop/Big Data Tools

The usage of Hadoop/Big Data tools grew to 39%, up from 29% in 2015 (and 17% in 2014), driven mainly by big growth in Apache Spark, MLlib (Spark Machine Learning Library) and H2O, which we included among Big Data tools.

Here are the Big Data tools with their shares in 2016 and 2015 and the % change.
Tool                            2016 % share   2015 % share   % change
Hadoop                          22.1%          18.4%          +20.5%
Spark                           21.6%          11.3%          +91%
Hive                            12.4%          10.2%          +21.3%
MLlib                           11.6%          3.3%           +253%
SQL on Hadoop tools             7.3%           7.2%           +1.6%
H2O                             6.7%           2.0%           +234%
HBase                           5.5%           4.6%           +18.6%
Apache Pig                      4.6%           5.4%           -16.1%
Apache Mahout                   2.6%           2.8%           -7.2%
Dato                            2.4%           0.5%           +338%
Datameer                        0.4%           0.9%           -52.3%
Other Hadoop/HDFS-based tools   4.9%           4.5%           +7.5%
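The 39% figure cannot be the sum of the rows above (Hadoop and Spark alone add up to 43.7%); it is presumably the share of voters who selected at least one Big Data tool. A sketch of that union-style count on hypothetical ballots, using only a subset of the category for illustration:

```python
# Illustrative subset of the Big Data category (not the full poll list).
BIG_DATA = {"Hadoop", "Spark", "Hive", "MLlib", "H2O"}

# Hypothetical ballots (made-up data).
ballots = [
    {"Hadoop", "Spark", "Hive"},
    {"Spark", "MLlib", "Python"},
    {"R", "SQL"},
    {"Python", "H2O"},
    {"Excel"},
]


def share(ballots, tool):
    """% of voters who used a single tool."""
    return 100.0 * sum(tool in b for b in ballots) / len(ballots)


def category_share(ballots, category):
    """% of voters who used at least one tool in `category` (a union, not a sum)."""
    return 100.0 * sum(bool(b & category) for b in ballots) / len(ballots)


print(category_share(ballots, BIG_DATA))           # 60.0
print(sum(share(ballots, t) for t in BIG_DATA))    # 120.0: overlapping voters counted repeatedly
```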

Deep Learning Tools

For the second year, the KDnuggets poll included Deep Learning tools. This year, 18% of voters used Deep Learning tools, double the 9% in 2015.

Google Tensorflow jumped to first place, displacing last year's leader, the Theano/Pylearn2 ecosystem.

Top tools:
  • Tensorflow, 6.8%
  • Theano ecosystem (including Pylearn2), 5.1%
  • Caffe, 2.3%
  • MATLAB Deep Learning Toolbox, 2.0%
  • Deeplearning4j, 1.7%
  • Torch, 1.0%
  • Microsoft CNTK, 0.9%
  • Cuda-convnet, 0.8%
  • mxnet, 0.6%
  • Convnet.js, 0.3%
  • darch, 0.1%
  • Nervana, 0.1%
  • Veles, 0.1%
  • Other Deep Learning Tools, 3.7%
The Deep Learning field is still at the beginning of its journey, as the large number of options shows.

Programming Languages

Python, Java, Unix tools, and Scala grew in popularity, while C/C++, Perl, Julia, F#, Clojure, and Lisp declined.

Here are the programming languages sorted by popularity.
  • Python, 45.8% share (was 30.3%), 51% increase
  • Java, 16.8% share (was 14.1%), 19% increase
  • Unix shell/awk/gawk, 10.4% share (was 8.0%), 30% increase
  • C/C++, 7.3% share (was 9.4%), 23% decrease
  • Other programming/data languages, 6.8% share (was 5.1%), 34.1% increase
  • Scala, 6.2% share (was 3.5%), 79% increase
  • Perl, 2.3% share (was 2.9%), 19% decrease
  • Julia, 1.1% share (was 1.1%), 1.6% decrease
  • F#, 0.4% share (was 0.7%), 41.8% decrease
  • Clojure, 0.4% share (was 0.5%), 19.4% decrease
  • Lisp, 0.2% share (was 0.4%), 33.3% decrease
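A small sketch that reproduces the ordering and the grew/declined split from the numbers in the list. It classifies by the reported % change rather than recomputing it, because the shares are rounded to 0.1%: Julia shows 1.1% in both years yet a 1.6% decrease, and for a tiny share like Lisp's, recomputing from the rounded values gives -50% instead of the reported -33.3%.

```python
# (2016 % share, 2015 % share, reported % change), copied from the list above
languages = {
    "Python": (45.8, 30.3, 51.0),
    "Java": (16.8, 14.1, 19.0),
    "Unix shell/awk/gawk": (10.4, 8.0, 30.0),
    "C/C++": (7.3, 9.4, -23.0),
    "Other programming/data languages": (6.8, 5.1, 34.1),
    "Scala": (6.2, 3.5, 79.0),
    "Perl": (2.3, 2.9, -19.0),
    "Julia": (1.1, 1.1, -1.6),
    "F#": (0.4, 0.7, -41.8),
    "Clojure": (0.4, 0.5, -19.4),
    "Lisp": (0.2, 0.4, -33.3),
}

# Most popular first; sorted() is stable, so ties (F# and Clojure) keep list order.
by_popularity = sorted(languages, key=lambda name: languages[name][0], reverse=True)

# Classify by the reported % change, which was computed from unrounded shares.
grew = [name for name in by_popularity if languages[name][2] > 0]
declined = [name for name in by_popularity if languages[name][2] < 0]


def recomputed_change(name):
    """% change recomputed from the rounded shares (imprecise for small tools)."""
    new, old, _ = languages[name]
    return 100.0 * (new - old) / old


print(grew)
print(declined)
print(recomputed_change("Julia"))  # 0.0, although the reported change is -1.6%
```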