Data Product Manager: Data Asset Management Product Architecture Planning

2022-08-12

Data asset management and governance is one of the four directions open to a data product manager. In this article the author shares what the data asset module actually does, to help readers judge whether it is a direction of data products worth trying.

As mentioned in the related article in the Data Product Manager Practice Guide, data asset management and governance is one of the four directions of a data product manager. Q2 has begun, and I have recently been putting together the product work plan for the data asset direction, so I am sharing it here to explain what the data asset module really does. It should also make it easier, the next time you look for a job (this year's "copper March, iron April" market has left many people able only to wait for Jingzhe), to judge whether the data asset direction is one you could try.

First, who are the users and what problems need solving?

In the working methodology of a B-end product manager, the first step is to be clear about who your users are, what they need, and which pain points hurt their efficiency and can be solved through a product. The users of data asset products fall into two categories: producers of data assets and consumers of data assets.

1. Work content and demands of asset producers

Producers here are data developers. Although "we do not produce data, we are only porters of data", they process the original raw data and turn it into asset-grade data.

The picture above shows a "happy" day for many data developers. Some people joke that they are doing the "dirty work".
If things go well, it is called data empowerment; if something goes wrong, it is a "data quality problem". As a Chinese saying goes, "To do a good job, one must first sharpen one's tools." So the small comfort a data asset product manager can offer is to give developers the right tools to work quickly and efficiently, and to help them manage and govern their own assets well.

When developing data, there are ODS, DWD and APP layers, temporary tables and a pile of naming conventions to follow. Developers have to keep CPU consumption in mind, and when their modeling is criticized as non-standard they have to go back and fix it. So, can the development process be made a little simpler?

Nothing is more annoying than an alert call when you are sound asleep. So, could the alerting strategy and the failure-retry mechanism in task scheduling and operations be a bit more intelligent? Even when manual handling is unavoidable, could one-click notification and re-run be simple enough to finish with your eyes half closed before going back to sleep?

I am responsible for a lot of data models, and the business keeps asking where the data is and what a field means. Can you leave me alone? I want my models to be reused more, but preferably through self-service. I just want to code in peace.

Every time, the boss points at me and says the health of my models is poor: this model is not named properly, that metadata is missing, this task takes too long, that table has had no visitors for ages. So, can you provide a workbench, the way a farmer walks the field to see how the crops are growing and what needs weeding, so that the first thing I do every morning is clear the list of governance items in advance? Then next time the boss will simply say, "Everyone should learn from XX, whose development habits are very elegant."

Besides not being allowed to "drop the database and run", data developers are also responsible for data security, so they need streamlined, automated authorization and approval processes.

2.
Scenarios and demands of asset consumers

Asset consumers are those who use the data: business product, operations and analysis staff, as well as data developers doing secondary processing.

As a data consumer, just as when you shop in a physical store or on an e-commerce platform, you want to be able to find it, see it clearly, and use it with confidence. In other words, the asset warehouse should have comprehensive SKU coverage, with specifications and usage (metadata) fully visible to help you decide whether an item is what you need. Ideally there are also customer recommendations or an official certification, so that you can use it with peace of mind and not fall into a pit.

The main demands of asset consumers include the following.

When I need data but do not know where it is, I want a tool to guide me, narrowing the scope step by step from product line to data category until, after searching a thousand times, there it is. So a data map capability is needed.

When a new employee takes over, the predecessor says all the data needed is in this table. But I am curious and want to know the ins and outs of the data, so that I can apply it to other cases rather than just changing the date parameter to look things up. So I need a convenient data retrieval capability.

The data has been found.
Is there any certification to prove that today's data has no problems?

Although in my heart I do not want to pester the data developers, when the logic is unclear or the data looks uncertain I still want to be able to find the person in charge, or other users who strongly recommend this table, and talk to them.

Besides using tables for SQL queries or drag-and-drop analysis, and since everyone is talking about the data middle platform these days, I also hope for data services that output directly, such as indicator APIs and label services, where the interface can be generated through configuration: DaaS (Data as a Service).

Second, product system planning and design of the data asset module

Having clarified the users and their demands, the next step is to empower them through the corresponding data products.

The two types of users may overlap. For example, data developers are also data consumers when they use data developed by others, and business people may also ask to build tables themselves.

Therefore, the asset product architecture mainly covers data aggregation, processing, asset management, data governance, value output and other stages.

1. Data aggregation

Data aggregation mainly solves the problem of moving data between heterogeneous data sources. Data is collected from business databases, event tracking on the product side, third-party APIs and FTP file transfer.

In the product's functional design, the parameters required by different sources and targets are handled separately so that each can be connected one by one.

In addition, data needs to be synchronized daily or in real time for consumption, so aggregation has to be combined with the scheduling system to provide intelligent, automatic resource scheduling and task operations capabilities.

As a result, many data products treat data integration as one type of data development task inside the data development suite.

2.
Data processing

This stage mainly cleans the data and applies business logic for specific usage scenarios, and includes both offline and real-time data development. It is the equivalent of a data processing factory, which works on the synchronized data sources and produces highly available data models.

The development suite is large and can be split out into separate product modules.

At the same time, model building specifications can be built into the verification step of task development. The aim is more up-front verification, not just after-the-fact governance.

For example, process-based modeling or data processing tools such as Dataphin can be provided.

3. Asset-based management of data

Asset-based management: data processed by the data factory needs to be sorted and labeled against various specifications before downstream consumers can use it.

Asset-based management mainly provides table search and retrieval through the data map, along with maintenance and querying of metadata, giving users a convenient way to find their way around the data.

Data lineage: the link relationships that follow data through its whole journey, from entering the lake to reaching the business terminal.
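As a rough illustration of the lineage relationship described above, it can be pictured as a directed graph in which each table points to whatever reads from it. The sketch below is only an assumption about how such a graph might be walked; the table names, the `LINEAGE` mapping and the helper functions are hypothetical and not taken from any particular product.

```python
from collections import deque

# Hypothetical lineage edges: each node maps to the tables or
# applications that read from it directly. All names are illustrative.
LINEAGE = {
    "ods_orders": ["dwd_order_detail"],
    "dwd_order_detail": ["app_sales_daily", "app_user_profile"],
    "app_sales_daily": ["sales_dashboard"],
    "app_user_profile": [],
    "sales_dashboard": [],
}


def reachable(node, graph):
    """Return every node reachable from `node`, in breadth-first order."""
    seen = []
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(graph.get(current, []))
    return seen


def invert(graph):
    """Flip every edge so that each node maps to its direct sources."""
    inverted = {name: [] for name in graph}
    for parent, children in graph.items():
        for child in children:
            inverted.setdefault(child, []).append(parent)
    return inverted


def downstream(node):
    """Everything that depends, directly or indirectly, on `node`."""
    return reachable(node, LINEAGE)


def upstream(node):
    """Everything that `node` is, directly or indirectly, built from."""
    return reachable(node, invert(LINEAGE))
```

Walking the graph forward with `downstream("ods_orders")` lists every table and application that depends on the source table, while `upstream("sales_dashboard")` traces a report back to its origins. A real lineage service would typically derive these edges from task metadata or parsed SQL rather than from a hand-written mapping.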
First, lineage makes it easier to trace the ins and outs of how data is produced, and gives guidance when searching code and checking logic. In addition, based on lineage, downstream users can be notified when data is abnormal, and data with no downstream applications in use can be cleaned up with one click, releasing storage and compute resources.

Data quality monitoring: monitor the accuracy of task execution results and catch, ahead of time, inaccurate data caused by source database changes, development bugs and similar problems.

Data governance: establish an asset health evaluation system across dimensions such as task resource consumption, running time, business usage (hot and cold data), development specifications, and model coverage and reuse, together with a data governance workbench, so that you know each day which pits need filling and can fill them in advance.

4. Output of data value

The ultimate purpose of big data is to generate value from data: first, data-based decisions, and second, data-driven intelligent products and refined operations.

SQL ad hoc queries fetch data with SQL based on the data model, while self-service analysis is a foolproof drag-and-drop approach for business people with no SQL skills.

At this stage, indicator management and label asset management are closely tied to assets. Through data APIs, data is output to front-end visual analysis products or connected to applications in the main product and operations flows.

5. Data security management

Metadata for databases, tables, indicators and labels can be open for anyone to consult, but to actually fetch and use the data you must first obtain the corresponding authorization. The product therefore needs an automated flow that includes one-click permission application, approval notifications, and automatic authorization once the application is approved.

Data assets are the foundation of big data.
In the early stage of business development we pursue quick, lightweight, fast delivery, and later we still have to fill in, one by one, the holes left by non-standard and incomplete assets.

The first problems to solve in digital transformation are data aggregation and data assets.

Product managers working on the data asset module need not only solid general product skills but also a good understanding of the big data ecosystem, how data flows, data warehouse construction and related theory, so that they can build products with more confidence.

However much things change, the essentials stay the same: the data product modules involved in collecting and storing data are similar from company to company.

The author is a columnist at Everyone Is a Product Manager, focusing on data middle platform products, covering the development suite, data assets and data governance, BI and data visualization, precision marketing platforms and other data products, and is experienced in big data solution planning and product solution design.

This article was originally published on Everyone Is a Product Manager. Reprinting without the author's permission is prohibited.

The picture is from Unsplash, under the CC0 license.