Monday, December 12, 2016

Top Analytics, Data Science software

R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results

R remains the leading tool, with 49% share, but Python is growing faster and has almost caught up. RapidMiner remains the most popular general Data Science platform. Big Data tools are used by almost 40% of respondents, and Deep Learning usage has doubled.

The poll drew tremendous participation from the analytics and data science community and from vendors, attracting 2,895 voters, who chose from a record 102 different tools.

R remains the leading tool, with 49% share (up from 46.9% in 2015), but Python usage grew faster and it almost caught up to R with 45.8% share (up from 30.3%). RapidMiner remains the most popular general platform for data mining/data science, with about 33% share. Notable tools with the most growth in popularity include Dato, Dataiku, MLlib, H2O, Amazon Machine Learning, scikit-learn, and IBM Watson.

The increased choice of tools is reflected in wider usage. The average number of tools used was 6.0, vs 4.8 in 2015.

The usage of Hadoop/Big Data tools grew to 39%, up from 29% in 2015 (and 17% in 2014), driven by Apache Spark, MLlib (Spark Machine Learning Library) and H2O.
The participation by region was: US/Canada (40%), Europe (39%), Asia (9.4%), Latin America (5.8%), Africa/MidEast (2.9%), Australia/NZ (2.2%).

Top Analytics/Data Science Tools

The table below shows the top 10 most popular tools in the 2016 poll.
Tool            2016 % share   % change   % alone
R               49%            +4.5%      1.4%
Python          45.8%          +51%       0.1%
SQL             35.5%          +15%       0%
Excel           33.6%          +47%       0.2%
RapidMiner      32.6%          +3.5%      11.7%
Hadoop          22.1%          +20%       0%
Spark           21.6%          +91%       0.2%
Tableau         18.5%          +49%       0.2%
KNIME           18.0%          -10%       4.4%
scikit-learn    17.2%          +107%      0%
In this table, 2016 % share is the percentage of voters who used the tool, % change is the change in share vs. the 2015 poll, and % alone is the percentage of that tool's users who used only that tool and nothing else. E.g., 4.4% of KNIME voters reported using only KNIME. We note a decrease in such lone voting, with only 9 tools having 5% or more lone votes.
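To make these metrics concrete, here is a small Python sketch of how % share and % alone can be computed from raw ballots. The ballots below are made up for illustration, not actual poll data, and the published figures come from unrounded counts, so recomputing from the rounded shares above won't match exactly.

```python
from collections import Counter

# Hypothetical ballots: each voter lists every tool they used.
ballots = [
    {"R"}, {"R", "Python"}, {"Python", "scikit-learn"},
    {"KNIME"}, {"KNIME", "R"}, {"R", "SQL", "Excel"},
]

n_voters = len(ballots)
votes = Counter(tool for ballot in ballots for tool in ballot)

for tool, count in votes.most_common():
    share = 100.0 * count / n_voters  # % of all voters who used the tool
    alone = 100.0 * sum(b == {tool} for b in ballots) / count  # % of its users who used nothing else
    print(f"{tool:14s} share={share:5.1f}%  alone={alone:5.1f}%")
```

% change is then simply this year's share divided by last year's share, minus one.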

Fig. 1: KDnuggets Analytics/Data Science 2016 Software Poll: top 10 most popular tools in 2016.

Compared to the 2015 KDnuggets Analytics/Data Science Poll results, the only newcomer in the top 10 was scikit-learn, which displaced SAS.

The tools with the highest growth (among tools with at least 15 users in 2015) were:
Tool                      % change   2016 % share   2015 % share
Dato                      +377%      2.4%           0.5%
Dataiku                   +292%      7.8%           2.0%
MLlib                     +253%      11.6%          3.3%
H2O                       +233%      6.7%           2.0%
Amazon Machine Learning   +171%      1.9%           0.7%
scikit-learn              +107%      17.2%          8.3%
IBM Watson                +99%       4.2%           2.1%
Splunk/Hunk               +98%       2.2%           1.1%
Spark                     +91%       21.6%          11.3%
Scala                     +79%       6.2%           3.5%


This year, 86% of voters used commercial software and 75% used free software. About 25% used only commercial software, and 13% used only open source/free software. A majority of 61% used both free and commercial software, similar to 64% in 2015.
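These figures hang together under simple inclusion-exclusion, as the quick check below shows; it uses the rounded percentages from the paragraph above, which is why the only-free figure comes out to 14% rather than the quoted 13%.

```python
# Inclusion-exclusion check on the published (rounded) percentages.
commercial, free = 86, 75            # % of voters using each kind of software
union = 100                          # assume every voter used at least one kind
both = commercial + free - union     # 61% used both
only_commercial = commercial - both  # 25% used only commercial
only_free = free - both              # 14% (the quoted 13% reflects rounding)

print(both, only_commercial, only_free)
```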

Tools new to the poll that received at least 1% share in 2016 were:
  • Anaconda, 16%
  • Microsoft other ML/Data Science tools, 1.6%
  • SAP HANA, 1.2%
  • XLMiner, 1.2%
Among tools with at least 15 votes in 2015, the largest declines in 2016 were for the tools below. This probably reflects a combination of declining popularity for free tools like F# and the lack of a voter drive for some commercial tools this year.
  • Ayasdi, down 85%, to 0.3% share from 2.0%
  • Actian, down 83%, to 0.3% share from 2.0%
  • Datameer, down 52%, to 0.4% share from 0.9%
  • SAP Analytics, down 51%, to 1.5% share from 3.0%
  • SAS Enterprise Miner, down 49%, to 5.6% from 10.9%
  • Alteryx, down 46%, to 3.0% share from 5.6%
  • F#, down 42%, to 0.4% share from 0.7%
  • TIBCO Spotfire, down 36%, to 2.8% share from 4.3%
  • JMP, down 36%, to 2.0% share from 3.1%

Hadoop/Big Data Tools

The usage of Hadoop/Big Data tools grew to 39%, up from 29% in 2015 (and 17% in 2014), driven mainly by big growth in Apache Spark, MLlib (the Spark Machine Learning Library), and H2O, which we included among Big Data tools.

Here are the Big Data tools, with their 2016 share, 2015 share, and % change:
Tool                            2016 % share   2015 % share   % change
Hadoop                          22.1%          18.4%          +20.5%
Spark                           21.6%          11.3%          +91%
Hive                            12.4%          10.2%          +21.3%
MLlib                           11.6%          3.3%           +253%
SQL on Hadoop tools             7.3%           7.2%           +1.6%
H2O                             6.7%           2.0%           +234%
HBase                           5.5%           4.6%           +18.6%
Apache Pig                      4.6%           5.4%           -16.1%
Apache Mahout                   2.6%           2.8%           -7.2%
Dato                            2.4%           0.5%           +338%
Datameer                        0.4%           0.9%           -52.3%
Other Hadoop/HDFS-based tools   4.9%           4.5%           +7.5%

Deep Learning Tools

For the second year, the KDnuggets poll included Deep Learning tools. This year, 18% of voters used them, doubling the 9% in 2015.

Google Tensorflow jumped to first place, displacing last year's leader, the Theano/Pylearn2 ecosystem.

Top tools:
  • Tensorflow, 6.8%
  • Theano ecosystem (including Pylearn2), 5.1%
  • Caffe, 2.3%
  • MATLAB Deep Learning Toolbox, 2.0%
  • Deeplearning4j, 1.7%
  • Torch, 1.0%
  • Microsoft CNTK, 0.9%
  • Cuda-convnet, 0.8%
  • mxnet, 0.6%
  • Convnet.js, 0.3%
  • darch, 0.1%
  • Nervana, 0.1%
  • Veles, 0.1%
  • Other Deep Learning Tools, 3.7%
The Deep Learning field is still early in its journey, as the large number of competing options shows.
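For a taste of the current leader, here is a minimal sketch in the graph-and-session style of the TensorFlow 1.x-era API (the style current at the time of this poll); the shapes and numbers are arbitrary.

```python
import tensorflow as tf  # 1.x-era API; Session was removed in TensorFlow 2.x

# Build a tiny computation graph: one linear unit, y = xW + b.
x = tf.placeholder(tf.float32, shape=[None, 3])
W = tf.Variable(tf.random_normal([3, 1]))
b = tf.Variable(tf.zeros([1]))
y = tf.matmul(x, W) + b

# Nothing runs until the graph is executed inside a session.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))
```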

Programming Languages

Python, Java, Unix tools, Scala grew in popularity, while C/C++, Perl, Julia, F#, Clojure, and Lisp declined.

Here are the programming languages sorted by popularity.
  • Python, 45.8% share (was 30.3%), 51% increase
  • Java, 16.8% share (was 14.1%), 19% increase
  • Unix shell/awk/gawk, 10.4% share (was 8.0%), 30% increase
  • C/C++, 7.3% share (was 9.4%), 23% decrease
  • Other programming/data languages, 6.8% share (was 5.1%), 34.1% increase
  • Scala, 6.2% share (was 3.5%), 79% increase
  • Perl, 2.3% share (was 2.9%), 19% decrease
  • Julia, 1.1% share (was 1.1%), 1.6% decrease
  • F#, 0.4% share (was 0.7%), 41.8% decrease
  • Clojure, 0.4% share (was 0.5%), 19.4% decrease
  • Lisp, 0.2% share (was 0.4%), 33.3% decrease

Wednesday, November 16, 2016

6 steps to take your research from idea to action

Open Access Week exclusive cartoon

Open Access Week 2016 is here, a global event that promotes Open Access (OA) in scholarship and research. This year’s theme is ‘Open in Action’, so we’re taking a look at the potential OA research has to be ‘actioned’ by groups outside of academia (browse our OA Week article collection, showcasing how research addresses vital issues).

The benefits of OA: what you told us

Our 2014 Open Access Survey results showed that a significant number of researchers agreed with the following:
  • 50% thought OA could lead to a larger readership for their published research.
  • 65% believed publishing OA offered greater visibility for their work.
  • 81% felt their work could have a wider circulation if it was published OA.

Want to know more? Check out the key findings infographic and browse the full results.

Open Access: from research to action

So what do you need to consider if you would like your research to be picked up and actioned by practitioners, policy makers, NGOs, clinicians, the media or anyone else? In honor of OA Week we’ve put together this exclusive cartoon, which outlines some of the steps you should consider (and is summarised below).
1. Who do you want to read your research? Be clear from your first draft who the key audience for your published research is.
2. Think about that audience when you’re choosing a journal to publish your research in. Find out more about a journal’s readership by reading the aims and scope and browsing published research article metrics.
3. Make your research discoverable: spend time on your title, your abstract and your article keywords. Think about the words people will use to search for your work and include them.
4. Think about what else you can do to make your research accessible, such as whether to enhance your published work with supplemental material, including data.
6. Published? Tell others about it and get in touch so we can help you spread the word (more on the role of marketing in OA).

Saturday, November 12, 2016

IoT Landscape 2016

For now, with startup creation and funding in full swing, we can barely keep track of all new IoT startups appearing on the market.  Certain areas, particularly on the consumer IoT side (most blatantly, wearables, fitness and home automation) are now overcrowded, inevitably raising the specter of failure and forced consolidation.  The enterprise and industrial sides of the Internet of Things are more open, bearing in mind that some existing players in those spaces have been operating for decades.
Here’s our 2016 landscape:
[Figure: Internet of Things 2016 landscape]
As in previous versions, the chart is organized into building blocks, horizontals and verticals. Pretty much every segment is seeing a lot of activity, but it is worth noting that those parts are not particularly well integrated just yet, meaning in particular that vertical applications are not necessarily built on top of horizontals. To the contrary, we’re very much in the era of the “full stack” IoT startup – because there is no dominant horizontal platform, and not enough mature, cheap and fully reliable components just yet, startups tend to build a lot themselves: hardware, software, data/analytics, etc. Some enterprise IoT companies, such as our portfolio company Helium, also have a professional services organization on top, as enterprise customers are at the stage where they try to make sense of the IoT opportunity and are looking for something that “just works”, as opposed to mixing and matching best-of-breed components. This is a typical characteristic of startups operating in an early market, and I would expect many of those companies to evolve over time, and possibly ditch the hardware component of their business entirely.
Dancing with the giants
To make sense of the IoT ecosystem, it’s important to realize that large corporations are omnipresent in it. I mentioned this in an earlier post about home automation, but a glance through the 2016 IoT landscape will quickly establish that they are active in pretty much every single category.
In the Internet era (90s and 00s), the dynamic was brutal but pretty simple (at least in retrospect) – on one side, there were the disruptors (Internet-native startups with no legacy); on the other side were the disrupted (brick-and-mortar businesses and other large incumbents paralyzed by the innovator’s dilemma). In the IoT era, things are a little trickier – some of the startups of the Internet era have grown up to be large companies themselves, for example, and it is less clear who is best equipped to disrupt whom.
Large public tech and telecom companies have been all over the IoT, which they rightly regard as something that will truly move the needle for them over the next few years and possibly decades. It is entirely possible that, in some cases, announcements are ahead of reality, but nonetheless the trend is clear. Chipmakers (Intel, Qualcomm, ARM) are racing to dominate the IoT chip market. Cisco has been incredibly vocal about the “Internet of Everything” and walked the talk with the $1.4bn Jasper acquisition a few weeks ago. IBM announced a $3 billion investment in a new IoT business unit. AT&T has been aggressive in being the connectivity layer for cars, partnering with 8 out of 10 top US car manufacturers. Many telecom companies view their upcoming 5G networks as the backbone of the IoT. Apple, Microsoft and Samsung have been very active across the ecosystem, offering both hubs/platforms (HomeKit for Apple, SmartThings and an upcoming OS for Samsung, and Azure IoT for Microsoft) and end products (the Apple Watch for Apple, Gear VR and plenty of connected appliances for Samsung, and the upcoming HoloLens AR headset for Microsoft). Salesforce announced an IoT cloud a few months ago. The list goes on and on.
Alphabet/Google and Amazon are probably worth mentioning separately because of the magnitude of their potential impact. From Nest (home) to Sidewalk Labs (smart cities) to autonomous cars to the Google Cloud, Alphabet already covers huge portions of the ecosystem, and has invested billions in it. On Amazon’s end, AWS seems to be an ever-increasing force that keeps innovating and launching new products, including a new IoT platform this year, which it will inevitably push aggressively to become the backend for the IoT; in addition, the company’s eCommerce operations are increasingly important to IoT product distribution, and Echo/Alexa is turning out to be a major sleeper hit for the company in the home automation world. Both Alphabet and Amazon very much move at the speed of the startups they were not so long ago, sit on immense amounts of user data, and have limitless access to top talent.
Outside the technology world, many “traditional” corporate giants (industrial, manufacturing, insurance, energy, etc.) have both a lot to gain and a lot to fear from the Internet of Things. This is a perhaps unprecedented opportunity to rethink just about everything. The IoT will essentially enable (or perhaps force) large companies to evolve from a product-centric model to a service-centric model. In an IoT-enabled world, large companies will have direct knowledge about how their customers actually use their products; they will be able to market and customize their offerings to a variety of needs (through the software); they will be able to predict when a product will fail and may need support; and they’ll have an opportunity to charge customers by usage (as opposed to a one-time purchase cost), opening the door to subscription models and direct long-term relationships with customers. The impact of those changes on supply chains and retail is likely to be enormous. On the other hand, the threat is immense – what happens to the car industry, for example, as autonomous vehicles become a reality powered by software developed by Google, Apple, Baidu or Uber? Will carmakers be relegated to the status of parts makers?
The opportunity to thrive in an IoT world hinges largely on those companies’ ability to gradually evolve into software companies, an immensely difficult cultural and organizational transformation. Some traditional industry companies already have software arms – see Bosch Software Innovations for example or this piece about how General Electric recruited hundreds of software developers in its new Silicon Valley tech offices – so this is not an impossible task, but many companies will struggle immensely to do so.
What does this all mean for startups? Of course, the interest from large companies opens the door to all sorts of acquisition opportunities, both small and large, and sometimes for amounts that are largely disconnected from existing traction (see Nest, Oculus or Cruise) – large tech companies have already demonstrated their acquisition appetite, and large traditional companies will most likely need to acquire their way into becoming software companies.   On the other hand, for new startups intending to stay the course and become large independent companies, the path will occasionally be fairly narrow and will require astute maneuvering.   Larger companies (e.g. Alphabet/Nest) will certainly not build every single connected product (e.g., every home automation device), but at the same time they will likely preempt the larger opportunities in the space (e.g. being the home automation platform). Or occasionally they will be incredibly aggressive in pursuing the best talent in the market – let’s remember how a few months ago, Uber poached 40 robotics researchers from Carnegie Mellon to help fuel its self-driving technology ambitions. For young startups, the successful strategy will probably involve a combination of finding the right tip of the spear away from the more crowded areas of the market, and partnering with the right large corporate giants to have access to their manufacturing and distribution networks.
Conclusion
The Internet of Things is coming. Obstacles abound, but as our landscape shows, there is an immense amount of activity happening worldwide from both startups and large companies, which makes this conclusion all but inevitable. Progress may seem slow in some ways, but in fact it is happening remarkably quickly when one pauses to think about the magnitude of the change a fully connected world requires. What seemed like complete science fiction 10 years ago is in the process of becoming reality, and we are getting very close to being surrounded by connected objects, drones and autonomous cars. The bigger question might be whether we are ready as a society for this level of change.

2016 Big Data Landscape

Here is what the Big Data Landscape looks like in 2016:
[Figure: 2016 Big Data Landscape]
Obviously, that’s a lot of companies, and many others were not included in the chart, deliberately or not (scroll to the bottom of the post for a few notes on methodology).
In terms of fundamental trend, the action (meaning innovation, launch of new products and companies) has been gradually moving left to right, from the infrastructure layer (essentially the world of developers/engineers) to the analytics layer (the world of data scientists and analysts)  to the application layer (the world of business users and consumers) where “Big Data native applications” have been emerging rapidly – following more or less the pattern we expected.
Big Data infrastructure:  Still Plenty of Innovation
It’s now been a decade since Google’s papers on MapReduce and BigTable led Doug Cutting and Mike Cafarella to create Hadoop, so the infrastructure layer of Big Data has had the most time to mature and some key problems there have now been solved.
However, the infrastructure space continues to thrive with innovation, in large part through considerable open source activity.
2015 was without a doubt the year of Apache Spark, an open source framework leveraging in-memory processing, which was starting to get a lot of buzz when we published the previous version of our landscape. Since then, Spark has been embraced by a variety of players, from IBM to Cloudera, giving it considerable credibility.   Spark is meaningful because it effectively addresses some of the key issues that were slowing down the adoption of Hadoop: it is much faster (benchmarks have shown Spark is 10 to 100 times faster than Hadoop’s MapReduce), easier to program, and lends itself well to machine learning.  (For more on Spark, see our fireside chat at our Data Driven NYC monthly event with Ion Stoica, one of the key Spark pioneers and CEO of Spark in the cloud company Databricks, here).
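As a rough illustration of why Spark is considered easier to program than raw MapReduce, here is a minimal word-count sketch using PySpark’s RDD API; it assumes a local Spark installation and a text file named data.txt, both hypothetical.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "wordcount")  # run locally on all cores

counts = (sc.textFile("data.txt")                # hypothetical input file
            .flatMap(lambda line: line.split())  # one record per word
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))    # sum the counts per word

print(counts.take(5))
sc.stop()
```

The whole pipeline is a handful of lines and is processed in memory across the cluster, which is exactly the ease-of-use argument made above.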
Other exciting frameworks continue to emerge and gain momentum, such as Flink, Ignite, Samza, Kudu, etc.  Some thought leaders think the emergence of Mesos (a framework to “program against your datacenter like it’s a single pool of resources”) dispenses with the need for Hadoop altogether (watch a great talk on the topic by Stefan Groschupf, CEO of Datameer, and learn more about Mesos by watching Tobi Knaupf of Mesosphere).
Even in the world of databases, which seemed to have more emerging players than the market could possibly sustain, plenty of exciting things are happening, from the maturation of graph databases (watch Emil Eifrem, CEO of Neo4j) and the launch of specialized databases (watch Paul Dix, founder of time series database InfluxDB) to the emergence of CockroachDB, a database inspired by Google Spanner, billed as offering the best of both the SQL and NoSQL worlds (watch Spencer Kimball, CEO of Cockroach Labs).  Data warehouses are evolving as well (watch Bob Muglia, CEO of cloud data warehouse Snowflake).
Big Data Analytics: Now with AI
The big trend over the last few months in Big Data analytics has been the increasing focus on artificial intelligence (in its various forms and flavors) to help analyze massive amounts of data and derive predictive insights.
The recent resurrection of AI is very much a child of Big Data.  The algorithms behind deep learning (the area of AI that gets the most attention these days) were for the most part created decades ago, but it wasn’t until they could be applied to massive amounts of data cheaply and quickly enough that they lived up to their full potential (watch Yann LeCun, pioneer of deep learning and head of AI at Facebook).  The relationship between AI and Big Data is so close that some industry experts now  think that AI has regretfully “fallen in love with Big Data” (watch Gary Marcus, CEO of Geometric Intelligence).
In turn, AI is now helping Big Data deliver on its promise.  The increasing focus on AI/machine learning in analytics corresponds to the logical next step of the evolution of Big Data: now that I have all this data, what insights am I going to extract from it? Of course, that’s where data scientists come in – from the beginning their role has been to implement machine learning and otherwise come up with models to make sense of the data.  But increasingly, machine intelligence is assisting data scientists – just by crunching the data, emerging products can extract mathematical formulas (watch Stephen Purpura, founder of Context Relevant ) or automatically build and recommend the data science model that’s most likely to yield the best results (watch Jeremy Achin, CEO of DataRobot).  A crop of new AI companies provide products that automate the identification of complex entities such as images (watch Richard Socher, CEO of MetaMind, Matthew Zeiler, CEO of Clarifai, and David Luan, CEO of Dextro) or provide powerful predictive analytics (e.g., our portfolio company HyperScience, currently in stealth).
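As a toy illustration of the “build and recommend the best model” idea, here is a sketch using scikit-learn’s grid search, a deliberately simple stand-in for what products like DataRobot do at much larger scale; the dataset and parameter grid are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Try a small grid of candidate models, keeping the best by cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50, 100], "max_depth": [3, 5, None]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```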
As unsupervised learning based products spread and improve, it will be interesting to see how their relationship with data scientists evolves – friend or foe?  AI is certainly not going to replace data scientists any time soon, but expect to see increasing automation of the simpler tasks that data scientists perform routinely, and big productivity gains as a result.
Of course, AI/machine learning is not the only trend worth noting in Big Data analytics.  The general maturation of Big Data BI platforms and their increasingly strong real-time capabilities is an exciting trend (watch Amir Orad, CEO of SiSense, and Shant Hovespian, CTO of Arcadia Data).
Big Data Applications: A Real Acceleration
As some of the core infrastructure challenges have been solved, the application layer of Big Data is rapidly building up.
Within the enterprise, a variety of tools has appeared to help business users across many core functions.  For example, Big Data applications in sales and marketing help with figuring out which customers are likely to buy, renew or churn, by crunching large amounts of internal and external data, increasingly in real-time. Customer service applications help personalize service; HR applications help figure out how to attract and retain the best employees; etc.
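A minimal sketch of the churn-prediction idea; the customer features and the churn rule below are entirely synthetic and invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Synthetic customers: tenure (months), monthly spend ($), support tickets.
X = np.column_stack([
    rng.uniform(1, 60, 1000),    # tenure
    rng.uniform(10, 200, 1000),  # spend
    rng.poisson(2, 1000),        # tickets
])
# Invented ground truth: short tenure plus many tickets means likely churn.
y = ((X[:, 2] > 3) & (X[:, 0] < 24)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", round(model.score(X_test, y_test), 3))
```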
Specialized Big Data applications have been popping up in pretty much every vertical, from healthcare (notably in genomics and drug research) to finance to fashion to law enforcement (watch Scott Crouch, CEO of Mark43).
Two trends are worth highlighting.
First, many of those applications are “Big Data natives” in that they are themselves built on the latest Big Data technologies, and they represent an interesting way for customers to leverage Big Data without having to deploy the underlying technologies, since those already come “in a box”, at least for that specific function. For example, our portfolio company ActionIQ is built on Spark (or a variation thereof), so its customers can leverage the power of Spark in their marketing department without having to actually deploy Spark themselves – no “assembly line” in this case.
Second, AI has made a powerful appearance at the application level as well.   For example, in the cat and mouse game that is security, AI is being leveraged extensively to get a leg up on hackers and identify and combat cyberattacks in real time.  “Artificially intelligent” hedge funds are starting to appear.  A whole AI-driven digital assistant industry has appeared over the last year, automating tasks from scheduling meetings (watch Dennis Mortensen, CEO of x.ai here) to shopping to bringing you just about everything.  The degree to which those solutions rely on AI varies greatly, ranging from near 100% automation to “human in the loop” situations where human capabilities are augmented by AI – nonetheless, the trend is clear.
Conclusion
In many ways, we’re still in the early innings of the Big Data phenomenon.  While it’s taken a few years, building the infrastructure to store and process massive amounts of data was just the first phase.  AI/machine learning is now precipitating a trend towards the emergence of the application layer of Big Data.   The combination of Big Data and AI will drive incredible innovation across pretty much every industry.  From that perspective, the Big Data opportunity is probably even bigger than people thought.
As Big Data continues to mature, however, the term itself will probably disappear, or become so dated that nobody will use it anymore.  It is the ironic fate of successful enabling technologies that they become widespread, then ubiquitous, and eventually invisible.

Thursday, March 10, 2016

Most Popular Coding Languages of 2016

This data on the most popular coding languages is based on hundreds of thousands of data points collected by processing more than 1,200,000 challenge submissions in (now) 26 different programming languages. It gives us pretty valuable insight into hiring-demand trends among tech companies for the upcoming year. It's data we hope will be especially helpful for new computer science graduates or coders looking to stay ahead of the curve. (CodeEval is now being used as a classroom tool in a number of schools, from university programs to boot camps.)
 
Results

For the fifth year in a row, Python retains its #1 spot, followed by Java, C++, and JavaScript.
This year's most noticeable changes were a 27% increase in C# submissions, a 15% surge in Java, and a 21% increase in C submissions. While Python is still the reigning champ, we saw a 14% drop in its submissions, as well as a 17% decline in Ruby usage.

Programming language ranking change by year.


We've seen a triple-digit surge in R and Visual Basic submissions, but each still accounts for less than 1% of the total. This year we added 5 new languages: D, Fortran, Guile, OCaml and Scheme.


Programming language change percentage by year.

It's interesting to note the rise of Java after several years of steady decline. Could this be the year that Java overtakes Python? On the TIOBE index, another major index and a good indicator of market share, Java has surpassed both Python and Visual Basic for the top spot. This may indicate big popularity growth in the coming year. Note: some of the newer languages we've added (D, Guile, Fortran, OCaml, and Scheme) may have suffered somewhat since they haven't had a full year on the platform.