
Who’s Winning in Open Source Data Tech

Organizations understandably want the innovation that comes with using open source data management tech, but how do you know when it’s too green for adoption? A new survey from OpenLogic seeks the opinions of the masses in gauging what tech is being used in prime time and what’s not.

OpenLogic, which is a subsidiary of Perforce, provides professional support services for more than 400 open source technologies, ranging from MongoDB and Apache Spark to Kubernetes and Elastic. For its Open Source Trend Report, it conducted two separate surveys: one of its internal enterprise architect support staff, and another with external development professionals.

The enterprise architects survey listed some technologies that you likely already know about, as well as some names that you may not have heard of. For instance, it should come as no surprise that PostgreSQL was listed in the upper-right portion of its open source data technologies quadrant (Fig 1), which measures “importance to modern development” along the Y axis and relative maturity along the X axis.

Surrounding PostgreSQL in that quadrant were some familiar names: Apache Kafka, Apache ActiveMQ, MariaDB, Apache Camel, Apache Cassandra, Couchbase, and MongoDB. Even good old Apache Hadoop scores fairly well in terms of its maturity level and importance to modern developers. Apache Spark and Apache NiFi scored above the fold in terms of importance to modern developers, while splitting the field in terms of relative maturity.

Apache ActiveMQ Artemis, which is the codename for the HornetQ code that was donated to the Apache Software Foundation and will eventually be the successor to Apache ActiveMQ, was deemed relatively mature, if not terribly important to modern development. CockroachDB and Strimzi (which is Apache Kafka running on Kubernetes) occupied quadrant four, with lower maturity and lower importance to developers.

Fig. 1. Enterprise architects rank open source technologies’ relative maturity and importance to modern development (Source: OpenLogic Open Source Trend Report)

OpenLogic Enterprise Architect Connor Penhale says the relative maturity levels are changing quickly. “There are data technologies, like Strimzi, that don’t even have a major version number yet – but are in production at large, enterprise companies,” Penhale states in the report. “For those who have the in-house expertise and developer hours to support these cutting edge data technologies early on in their lifecycle, they can leverage the benefits of these packages before many other organizations.”

OpenLogic also queried 200 external developers to gauge their perceptions of open source databases along several lines, including awareness, usage, and future plans. MySQL topped the list, with the highest scores across those three metrics, followed by two other data stores, PostgreSQL and MongoDB, which were very close in ranking (see Fig 2).

Hadoop may have surprised some folks with a fourth-place showing. While it missed the podium, the big yellow elephant scored reasonably well in awareness and actual usage, and it scored about the same as MariaDB when it comes to adoption.

Rounding out the top 10 among databases were Elasticsearch, Amazon RDS, Redis, MariaDB, and Cassandra, showing that companies are mixing and matching NoSQL and relational database tech in relatively equal numbers.

Fig. 2. MySQL has the most awareness, current adoption, and future adoption plans among open source databases (Source: OpenLogic Open Source Trend Report)

“There are a few surprises here,” comments OpenLogic Enterprise Architect Vince Cox, “but one of the big ones is Cassandra being ranked in the middle of the pack. However, when you look at the number of respondents from enterprise organizations it makes a bit more sense. Larger enterprises often make large, ongoing investments into their data infrastructure technologies. Making a change at that scale needs to present a clear (often monetary) benefit. Otherwise it’s just adding risk without a reward.”

When OpenLogic asked its internal panel of architects to rate the relative rate of adoption and the importance to modern development, Kafka came out on top, far and away the leader on both measures. Considering that the company behind Kafka, Confluent, just went public, this is perhaps not surprising.

Clustering just behind Kafka is a gaggle of open source data technologies, including Prometheus, etcd, Elasticsearch, MongoDB, PostgreSQL, Cassandra, Redis, MariaDB, and Spark. At the far left of the survey results are Hadoop, MySQL, Jackrabbit, and CockroachDB, a position the company says represents slowing adoption.

“The first thing I noticed in looking at our internal survey results,” OpenLogic Chief OSS Evangelist Javier Perez says in the report, “is that the data technologies, like Cassandra, Kafka, Spark, Elasticsearch, and PostgreSQL are seeing increasing adoption. Data and AI technologies are driving the adoption of data-driven business decisions and automating processes with a level of sophistication not available just a few years ago.”

In terms of development priorities, security is far and away the number one thing that developers are focused on at the moment, according to OpenLogic’s survey. Improving the customer experience, CI/CD (continuous integration/continuous delivery), containerization, and leveraging “big data” were the next four most important focuses, in that order.

“A series of high profile hacks as well as the accelerated pace of digital transformation has made securing our digital assets every bit as important as our physical assets–in some cases, even more so,” says Justin Reock, Perforce’s chief evangelist for OSS and API management.

You can download a copy of the report here.

Related Items:

A Peek at the Future of the Open Data Architecture

Do Customers Want Open Data Platforms?

Open Source Still Rolling, But Roadblocks Loom
