Launch HN: Narrator (YC S19) – a data modeling platform built on a single table https://ift.tt/34bcMD0
Hi HN,

We’re Ahmed, Cedric, Matt, and Mike from Narrator (https://www.narrator.ai). We’ve built a data platform that transforms all data in a data warehouse into a single 11-column data model and provides tools for analysts to quickly build any table for BI, reporting, and analysis on top of that model.

Narrator initially grew out of our experience building a data platform for a team of 40 analysts and data scientists. The data warehouse, modeled as a star schema, grew to over 700 data models from 3000+ raw production tables. Every time we wanted to make a change or build a new analysis, it took forever because we had to manage the complexity of these 700 different models. With all these layers of dependencies and stakeholders constantly demanding more data, we ended up making lots of mistakes (e.g. dashboard metrics not matching). These mistakes led to a loss of trust, and soon our stakeholders were off buying tools (Heap, Mixpanel, Amplitude, Wave Analytics, etc.) to do their own analysis.

With a star schema (also core to recently IPO-ed Snowflake), you build the tables you need for reporting and BI on top of fact tables (what you want to measure, e.g. leads, sales…) and dimension tables (how you want to slice your data, e.g. gender, company, contract size…). Using this approach, the number and complexity of fact and dimension tables grow in relation to the number of questions, datasets, and metrics the business needs answered. Over time the rate of new questions increases rapidly, and data teams spend more time updating models and debugging mismatched numbers than answering data questions.
What if, instead of using the hundreds of fact and dimension tables in a star schema, we could use one table with all your customer data modeled as a collection of core customer actions (each a single source of truth), and combine them to assemble any table at the moment the data analyst needs it? Numbers would always match (single source of truth), any new question could be answered immediately without waiting on data engineering to build new fact and dimension tables (assembled when the data analyst needs it), and investigating issues would be easy (no nested dependencies of fact and dimension tables that depend on other tables).

After several iterations, Narrator was born. Narrator uses a single 11-column table called the Activity Stream to represent all the data in your data warehouse. It’s built from SQL transformations that turn a set of raw production tables (for example, Zendesk data) into activities (ticket opened, ticket closed, etc.). Each row of the Activity Stream has a customer, a timestamp, an activity name, a unique identifier, and a bit of metadata describing it.

Creating any table from this single model, made up of activities that don’t obviously relate to each other, is hard to imagine. Unlike a star schema, we don’t use foreign keys (the direct relationships in relational databases that connect objects, like employee.company_id → company.id) because they don’t always exist when you’re dealing with data in multiple systems. Instead, each activity has a customer identifier which we use, along with time, to automatically join within the single table to generate datasets.

As an example, imagine you were investigating a single customer who called support. Did they visit the website before that call? You’d look at that customer’s first web visit, and see if that person called before their next web visit.
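To make the single-customer check above a bit more concrete, here is a minimal Python sketch. It is only an illustration, not Narrator's implementation: the `Activity` row is a simplified, hypothetical stand-in for the 11-column Activity Stream, and the activity names `visited_website` and `called_support` are made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    # Hypothetical, simplified row shape; the real Activity Stream has 11 columns.
    activity_id: str
    ts: datetime
    customer: str
    activity: str


def called_before_next_visit(stream, customer,
                             visit="visited_website", call="called_support"):
    """True if the customer called after their first web visit and before their next one."""
    # There are no foreign keys: rows are related only by customer identifier and time.
    rows = sorted((r for r in stream if r.customer == customer), key=lambda r: r.ts)
    visits = [r.ts for r in rows if r.activity == visit]
    if not visits:
        return False
    first = visits[0]
    next_visit = visits[1] if len(visits) > 1 else None
    return any(
        r.activity == call
        and r.ts > first
        and (next_visit is None or r.ts < next_visit)
        for r in rows
    )


stream = [
    Activity("a1", datetime(2020, 9, 1, 10), "alice", "visited_website"),
    Activity("a2", datetime(2020, 9, 1, 11), "alice", "called_support"),
    Activity("a3", datetime(2020, 9, 2, 9), "alice", "visited_website"),
    Activity("b1", datetime(2020, 9, 1, 10), "bob", "visited_website"),
    Activity("b2", datetime(2020, 9, 3, 10), "bob", "visited_website"),
    Activity("b3", datetime(2020, 9, 4, 10), "bob", "called_support"),
]

print(called_before_next_visit(stream, "alice"))  # True
print(called_before_next_visit(stream, "bob"))    # False
```

Alice calls between her first and second visits, so she matches; Bob only calls after his second visit, so he does not. Nothing in the check is specific to one customer, so the same function can be run over every customer identifier in the stream.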
Now imagine finding all customers who behaved this way per month -- you’d have to take a drastically different approach with your current data tools. Narrator, by contrast, always joins data in terms of behavior. The same approach you take to investigate a single customer applies to all of them. For the above example you’d ask Narrator’s Dataset tool to show all users who visited the website and called before the next visit, grouped by month.

We started as a consultancy to build out the approach and prove that this was possible. We supported eight companies per Narrator data analyst, and now we’re excited for more data folks to get their hands on it so y’all can experience the same benefits.

We’d love to hear any feedback or answer any questions about our approach. We’ve been using it ourselves in production for three years, but only launched it to the public last week. We’ll answer any comments on this thread and can also set up a video chat for anyone who wants to go more in-depth.

September 30, 2020 at 05:30PM