
Snowflake taps Python to take on Teradata, Google BigQuery, and AWS Redshift

Snowflake's updates include the introduction of support for Python on Snowpark, data access capabilities, and external tables for on-premises storage.

Anirban Ghoshal (InfoWorld)
15 June, 2022 08:00

Cloud-based data warehouse vendor Snowflake has introduced a new set of tools and integrations to take on rival firms such as Teradata and services such as Google BigQuery and Amazon Redshift.

Unveiled at its annual Snowflake Summit, the new capabilities, which include data access tools and support for Python on the company's Snowpark application development system, are aimed at data scientists, data engineers and developers with the intent of accelerating their machine learning journey, in turn speeding up application development.

Snowpark, launched a year ago, is a dataframe-style development environment designed to allow developers to deploy their preferred tools in a serverless manner to Snowflake's virtual warehouse compute engine. Support for Python is in public preview.

"Python is probably the single most requested capability that we hear from our customers," said Christian Kleinerman, senior vice president of products at Snowflake.

The demand for Python makes sense, as it is a language of choice for data scientists, analysts say.

"Snowflake is actually catching up on this front, as rivals including Teradata, Google BigQuery and Vertica already have Python support," said Doug Henschen, principal analyst at Constellation Research.

In one of the updates announced at the summit, the company said that it was adding a Streamlit integration for application development and iteration. Streamlit, which is an open source app framework in Python targeted at machine learning and data science engineering teams to help visualise, change and share data, was acquired by Snowflake in March.

The integration will allow users to stay within the Snowflake environment, not only to access, secure, and govern data, but to develop data science apps to model and analyse data, said Tony Baer, principal analyst at dbInsights.

Snowflake launches Python-related integrations

Some of the other Python-related integrations include Snowflake Worksheets for Python, Large Memory Warehouses, and SQL Machine Learning.

Snowflake Worksheets for Python, which is in private preview, is designed to allow enterprises to develop pipelines, machine learning models and applications in the company's web-based interface, dubbed Snowsight, the company said, adding that it has abilities such as code autocomplete and custom-logic generation.

In order to help data scientists and development teams execute memory-intensive operations such as feature engineering and model training on large data sets, the company said it was working on a feature called Large Memory Warehouses.

Currently in the development phase, Large Memory Warehouses will provide support for Python libraries through integration with the Anaconda data science platform, it added.

"Multiple rivals are configurable to support large-memory warehouses as well as Python functions and language support, so this is Snowflake keeping up with market demands," Henschen said.

Snowflake is also offering SQL Machine Learning, starting with time-series data, in private preview. The service will help enterprises embed machine learning-powered predictions and analytics in business intelligence applications and dashboards, the company said.

Many analytical database vendors, according to Henschen, have been building machine learning models for in-database execution.

"The rationale behind Snowflake starting with time-series data analysis is [that it is] among the more popular machine learning analyses, as it's about predicting future values based on previously observed values," Henschen said, adding that time-series analysis has many use cases in the financial sector.
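To make concrete what "predicting future values based on previously observed values" can look like, here is a minimal sketch of simple exponential smoothing in plain Python. It is an illustration only, not Snowflake's SQL Machine Learning feature, whose interface the company has not detailed here. The function name, the `alpha` parameter and the sample figures are all hypothetical.

```python
def exponential_smoothing_forecast(values, alpha=0.5):
    """Return a one-step-ahead forecast from a list of past observations.

    Illustrative helper (hypothetical, not a Snowflake API).
    alpha controls how strongly recent observations outweigh older ones.
    """
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the range (0, 1]")

    # Start from the first observation, then blend in each later one.
    level = values[0]
    for observed in values[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level


# Made-up daily closing prices, oldest first.
closing_prices = [10, 12, 14]
print(exponential_smoothing_forecast(closing_prices, alpha=0.5))  # 12.5
```

A higher `alpha` makes the forecast track the latest values more closely, and at `alpha=1` it simply repeats the last observation. Production forecasting would typically also account for trend and seasonality.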

Snowflake updates enable more data access

With the logic that faster access to data could lead to faster application development, Snowflake also introduced new capabilities including Streaming Data Support, Apache Iceberg Tables in Snowflake, and External Tables for on-premises storage.

Streaming Data Support, which is in private preview, will help eliminate the boundaries between streaming and batch pipelines with Snowpipe Streaming. Snowpipe is the company's continuous data ingestion service.

The rationale behind launching the feature, according to Henschen, is the high interest in low-latency options, including near-real-time and true streaming. Most vendors in this market have already checked the streaming box, he said.

"The feature gives engineering teams a built-in way to analyse the stream alongside the historical data, so data engineers don't have to cobble together something themselves. It's a time saver," Henschen said.

In order to keep up with demand for more open source table formats, the company said that it was developing Apache Iceberg Tables to run in its environment.

"Apache Iceberg is a very hot open source table format and it's quickly gaining traction for analytical data platforms. Table formats like Iceberg provide metadata that helps with consistent and scalable performance. Iceberg was also recently adopted by Google for its BigLake offering," Henschen said.

Meanwhile, in an effort to keep its on-premises customers engaged while trying to get them to adopt its cloud data platform, Snowflake is introducing External Tables for On-Premises Storage. Currently in private preview, the tool allows users to access their data in on-premises storage systems from companies including Dell Technologies and Pure Storage, the company said.

"Snowflake had a 'cloud-only' policy for some time, so they clearly had big important customers who wanted some way to bring on-premises data into analysis without moving it all into Snowflake," Henschen said.

Further, Henschen said that rivals including Teradata, Vertica and Yellowbrick offer on-premises as well as hybrid and multi-cloud deployment.