Today, after three years in stealth, we’re very excited to announce the beta release of System for Enterprise, the solution to data discovery and knowledge management missing from data science, analytics, and ML. It’s already helping teams in our private beta discover, learn, and work better, and we humbly believe it will become the foundation of the truly data-first organization.
Alongside System for Enterprise, we built System, a free, open, collaborative, and ever-growing data and knowledge base of the world’s systems. We launched a private beta of System earlier this year with leading research, advocacy, and academic organizations, and will launch a public beta next year.
Both System for Enterprise and System are part of System Inc., a Public Benefit Corporation based in New York and backed by leading venture and angel investors. Our mission is to relate everything, to help the world see and solve anything as a system.
I wanted to share with you, our community, what motivated us to build System for Enterprise, how it’s helping data science teams today, and how we hope it might change companies and society in the future.
Why we built it
As COVID has so clearly revealed this year, the world is more complex and interconnected than ever before. And so, business is too. There is no longer one lever to pull to make the quarter, or a handful of independent metrics to track to steward a company, or one predictive model for all segments and markets. In this new world, the best companies will be those that see the whole system driving their business and market — that embrace complexity and have the tools and culture to harness it. The best companies will use this systemic understanding to prioritize what to focus on and where to invest, limit risk, and discover and pull the right levers at the right time.
Data science is the foundation for how we will get there. As we set out to conceive and design a platform to help move toward this North Star, we worked with a diverse community of fellow data scientists, analysts, and data and ML engineers to figure out how their foremost challenges today could be solved now, with this future in mind. (In fact, we can only reach this future if we start relating data and knowledge differently today. More on that below.)
We learned that the best teams try to employ a range of empirical methods to understand their business and the world around it. They strive to combine results from data science, user research, analytics, machine learning, and A/B testing to produce systemic, reliable, and actionable insights and strategies. This is extremely hard to do today because data and insights are not organized to be discovered or used as a system. (Think about it: how difficult is it for you to know and rank all the things that drive and are impacted by a single key metric? Wouldn’t it be amazing if you and your product and business colleagues could just look that up?) We believe the way we organize data and knowledge — inside companies or out in the world — should help, not hinder, that aim.
I founded System to solve this. System organizes data and knowledge in a radically new way to help you reveal the system that drives your business — solving the twin challenges of data and knowledge discovery and management.
We are data scientists who have both experienced the problems System solves and built the first generation of tools to tackle these problems. In our experience, data scientists produce the most value for their teams (and are themselves happiest!) when they actually get to do science. During my time leading data at Spotify, we found that our data scientists and analysts — and by extension, our product and business colleagues — were spending way too much time looking for data and insights, trying to figure out what to use and what we already knew, instead of producing new insights and impact. So we built Lexikon, and dramatically improved data discovery and knowledge management. System builds on the lessons we learned along the way.
How it helps
System is an always-on, always-learning metadata layer that runs through data science, analytics, and ML. More data science catalog than just data catalog, System links and organizes everything you use and produce, from features through to insights. System doesn’t store your data, code, notebooks, or documents. Instead, it integrates with the tools you already use — to store data, run analyses and A/B tests, write and deploy models, produce decks, etc. — and uniquely gathers, structures, and enriches metadata to help you answer these questions:
- What data do we have that measures x?
- What data do we have that can help predict y?
- What features did we use to train this model?
- Who is working with this feature?
- What is the distribution of this feature?
- How was this data generated?
- What data is often used with this data?
- Is this data recent? Frequently updated?
- Can I use this data?
- How did this feature perform in this model?
- How is the distribution of this feature changing?
- How important is this feature?
- How important is this dataset to our team?
- What do we know about this topic (e.g. Retention) or metric (e.g. WAU)?
- What is the definition of this metric?
- What are the drivers of this metric? What drives the drivers?
- What should we focus on to improve this metric?
- Does this hold across all our segments and markets?
- What models predict this metric?
- What experiments target this metric?
- What other metrics are correlated with this metric?
- Who works on this metric?
- What datasets measure this metric?
- What insights have we learned about this topic this month?
- What is the performance of this model?
- How is this model’s performance changing in production?
- Is the model drifting?
- Why are we seeing drift in this model?
How it works
System can answer these and many other valuable questions for data science teams because of the unique and proprietary way it relates metadata.
First, System relates things semantically. System learns and recommends relationships between features across datasets and between features and the metrics they measure — to help data/ML engineers, data scientists, and your product/business partners speak the same language.
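To give a feel for what recommending a relationship between features across datasets can look like, here is a deliberately simple Python sketch. It is not how System learns these relationships: plain name similarity from the standard library’s `difflib` stands in for whatever System actually does, and the `normalize` and `suggest_matches` helpers, the 0.8 threshold, and the column names are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a column name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def suggest_matches(left_columns, right_columns, threshold=0.8):
    """Return (left, right, score) triples whose normalized names look alike.

    The threshold is an arbitrary value chosen for this illustration.
    """
    suggestions = []
    for left in left_columns:
        for right in right_columns:
            score = SequenceMatcher(None, normalize(left), normalize(right)).ratio()
            if score >= threshold:
                suggestions.append((left, right, round(score, 2)))
    # Strongest candidates first.
    return sorted(suggestions, key=lambda s: -s[2])


# Hypothetical column names from two datasets.
warehouse_columns = ["user_id", "signup_date", "country"]
events_columns = ["userId", "SignupDt", "plan"]

for left, right, score in suggest_matches(warehouse_columns, events_columns):
    print(f"{left} <-> {right} (similarity {score})")
```

A heuristic like this only proposes candidates. A person, or a richer signal such as usage or value overlap, would still need to confirm that two features really mean the same thing.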
Second, System relates things technically. System’s data model links features, datasets, metrics, segments, models, notebooks, projects, and insights — and augments those links based on provenance, dependency, and usage.
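As a rough illustration of this kind of linking, the following Python sketch models metadata as a small graph of typed nodes and named relations. It is not System’s actual data model: the `Node` and `MetadataGraph` classes, the relation labels (`trained_on`, `contains`, `measures`), and the sample entities are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # e.g. "feature", "dataset", "metric", "model"
    name: str


class MetadataGraph:
    """Hypothetical in-memory graph of typed links between metadata entities."""

    def __init__(self):
        self._out = defaultdict(set)  # (source, relation) -> target nodes
        self._in = defaultdict(set)   # (target, relation) -> source nodes

    def link(self, source, relation, target):
        """Record a directed, labeled link and index it in both directions."""
        self._out[(source, relation)].add(target)
        self._in[(target, relation)].add(source)

    def targets(self, source, relation):
        """Names of everything `source` points to via `relation`, sorted."""
        return sorted(n.name for n in self._out.get((source, relation), set()))

    def sources(self, target, relation):
        """Names of everything that points to `target` via `relation`, sorted."""
        return sorted(n.name for n in self._in.get((target, relation), set()))


# Hypothetical entities for illustration only.
graph = MetadataGraph()
churn_model = Node("model", "churn_model")
sessions = Node("feature", "sessions_per_week")
tenure = Node("feature", "account_tenure_days")
events = Node("dataset", "user_events")
wau = Node("metric", "WAU")

graph.link(churn_model, "trained_on", sessions)
graph.link(churn_model, "trained_on", tenure)
graph.link(events, "contains", sessions)
graph.link(events, "measures", wau)

# "What features did we use to train this model?"
print(graph.targets(churn_model, "trained_on"))
# "What datasets measure this metric?"
print(graph.sources(wau, "measures"))
```

Indexing each link in both directions is what lets the same structure answer questions from either end, such as starting from a model to find its features or from a metric to find the datasets that measure it.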
Third, System relates things statistically. Adding a dataset, model, notebook, or experiment to System is trivial, and from each one System learns and links the statistical associations that drive your business. System recognizes over 100 model types, code in both Python and R, and dozens of ways of characterizing a statistical association: from a simple correlation in a point-in-time analysis, to a dynamic permutation score in a production ML model, to a causal relationship.
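To make the simplest end of that range concrete, here is a minimal Python sketch that computes a Pearson correlation between a feature and a metric and stores the result as a link. It only illustrates the idea of recording a statistical association as metadata. The `Association` record, the `pearson` helper, and the sample numbers are invented for this example and say nothing about System’s proprietary method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Hypothetical record of a statistical link between two named things."""
    source: str
    target: str
    method: str
    strength: float


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only.
sessions_per_week = [1, 2, 3, 4, 5]
retention_rate = [0.20, 0.25, 0.32, 0.35, 0.41]

link = Association(
    source="sessions_per_week",
    target="retention_rate",
    method="pearson",
    strength=pearson(sessions_per_week, retention_rate),
)
print(link)
```

Keeping the `method` alongside the `strength` matters because a correlation, a permutation score, and a causal estimate are not comparable numbers, so any ranking of drivers would need to know which kind of evidence each link represents.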
Regardless of your team’s size or stage of data maturity, we think System for Enterprise will help you learn and work better (and save time and money along the way). We’d love for you to give it a try and help us improve the platform ahead of general availability (GA). You can also subscribe to System Updates for product updates and release notes.