Digital transformation, the data-driven organization, and the ‘data economy’ are popular topics in boardrooms today. Whatever these terms mean exactly, they all signal that organizations want to do more with data. Data has to be deployed more widely, more efficiently, and more effectively to improve business and decision-making processes and to increase competitive power. Technically, this implies that new forms of data usage must be supported, such as data science, self-service BI, embedded BI, edge analytics, and customer-driven BI.
Unfortunately, current IT systems, such as the data warehouse and the transactional systems, can no longer cope with these new, more intense, and resource-intensive forms of data usage. The current data architecture for data delivery is already overstretched. Some of these systems are over twenty years old, and many changes and extensions have been applied over time. They can’t process the ever-increasing workload. Additionally, because they have become static and inflexible, implementing new reports and executing new forms of analytics have become very time-consuming. In other words, the current data architecture can’t cope with today’s ‘speed of business change’.
The effect is that, understandably, countless organizations have decided to develop a new and future-proof data architecture. However, this is easier said than done. You don’t design data architectures every day. Which new technologies are available today? What is the influence of new technologies, such as Hadoop, NoSQL, big data, data warehouse automation, and data streaming, on the architecture? Which new architectural principles should be applied? How do we handle the new rules and regulations for data storage and analysis? And what is the influence of cloud platforms?
This two-day seminar answers most of the common questions architects have when designing a modern data architecture. This is done through guidelines, tips, and design rules. Concepts and technologies such as data lakes, big data, data vault, cloud, data virtualization, Hadoop, NoSQL, data warehouse automation, and anonymization of data are discussed. The seminar is based on practical experience in designing and implementing modern data architectures. The relationship between a modern data architecture and organizational aspects is also addressed, including data quality, data governance, data strategy, and the migration to a new architecture.
- Part 1: Introduction; what is a data architecture?
- Part 2: Overview of new technologies for data storage, data processing, and data analytics
- Part 3: Modernizing existing data architectures
- Part 4: Innovating new data architectures
- Part 5: The periphery of a data architecture
- Part 6: Requirements for a complete and correct data architecture
Part 1: Introduction; what is a data architecture?
- Why a new data architecture?
- Examples of real-life data architectures
- What are the key components of a data architecture?
- What are the differences between a data architecture and a solution architecture?
- From batch via Lambda to the Kappa architecture
- Benefits, drawbacks, and shortcomings of well-known reference architectures, such as the classic data warehouse architecture, the data lake, and transactional systems
- From vision to implementation plan
- The importance of reusable transformation specifications for, e.g., the integration, filtering, correction, and aggregation of data
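To make the idea of a reusable transformation specification concrete: the sketch below defines one transformation once and reuses it in two delivery paths. This is hypothetical illustration code, not material from the seminar; the function name, the mapping table, and the consuming rows are invented for the example.

```python
def standardize_country(value: str) -> str:
    """A reusable transformation specification: map free-form country
    names to a canonical code. Defined once, applied by every pipeline
    that delivers customer data, so all consumers see the same result."""
    mapping = {"netherlands": "NL", "the netherlands": "NL", "holland": "NL"}
    cleaned = value.strip().lower()
    # Fall back to the upper-cased input when no mapping is known.
    return mapping.get(cleaned, value.strip().upper())

# The same specification reused by two hypothetical consumers:
warehouse_row = {"country": standardize_country("Holland ")}   # ETL path
report_row = {"country": standardize_country("netherlands")}   # virtual view
```

Because the rule lives in one place, correcting it (for instance, adding a new spelling) immediately benefits every pipeline that uses it.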
Part 2: Overview of new technologies for data storage, data processing, and data analytics
- Benefits, drawbacks, features, and use cases of each technology
- Data storage: analytical SQL, NoSQL, Hadoop, cubes
- Data integration: ETL, data virtualization, data replication, data warehouse automation, database triggers
- Data cleansing: home-grown versus professional tools
- Data streaming: messaging, Kafka, streaming SQL
- Data documentation: data glossary, data catalog, metadata management
- Reporting tools: self-service BI, dashboards, embedded BI
- Data science tools: programming languages, such as R and Python, machine learning automation tools, data science workbenches
- Data security: anonymization, authorization
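As a small illustration of the anonymization topic listed above: a common technique (more precisely, pseudonymization) replaces direct identifiers with salted hashes, so records can still be joined while the original value stays hidden. This is a hypothetical sketch, not the seminar’s method; the salt and the record fields are invented.

```python
import hashlib

def pseudonymize(value: str, salt: str) -> str:
    """Replace a direct identifier with a salted SHA-256 hash.
    The same input always yields the same token, so joins across
    datasets keep working, but the original value cannot be read
    back without the salt."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

# Hypothetical customer record before delivery to a data lake:
record = {"customer_id": "C-1042", "city": "Utrecht", "revenue": 1800}
record["customer_id"] = pseudonymize(record["customer_id"], salt="s3cr3t")
```

Note that pseudonymized data may still fall under privacy regulations such as the GDPR, since re-identification remains possible for whoever holds the salt.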
Part 3: Modernizing existing data architectures
- First the technology or first the data architecture?
- Influence of specialized technology on data architectures
- Why migrate to the cloud: unburdening, high performance, scalability, available software?
- Are all software products suitable for the cloud?
- Modernization of a classic data warehouse architecture
- Generating a data warehouse architecture with data warehouse automation tools
- New requirements for transactional systems, such as storing historic data
- The influence of GDPR: deleting customer data
- Responsibility for data quality
Part 4: Innovating new data architectures
- The logical data warehouse architecture as an agile alternative
- Design rules, do’s and don’ts for a logical data warehouse architecture
- From a single-purpose to a multi-purpose data lake
- Requirements for implementing data science models, such as transparency, immutability, and version control
- The changing role of the data lake: from data delivery system for data scientists to a platform for storing all the enterprise and external data
- A data streaming architecture: when every microsecond counts
- Technical challenges: performance, inconsistent data streams, storing massive amounts of messages for analytics afterwards
- Operationalization of data science models
- Merging data architectures into one unified data delivery platform
- Differences between data hub and data warehouse
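The windowed aggregations at the heart of a data streaming architecture can be sketched in a few lines. The example below groups a stream of events into fixed, non-overlapping (tumbling) time windows and counts events per key; this is the core idea that streaming SQL engines execute continuously. It is an illustrative sketch in plain Python, assuming invented sensor events, not code from any specific streaming product.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size):
    """Group a stream of (timestamp, key) events into fixed,
    non-overlapping time windows and count events per key."""
    counts = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Align each event to the start of its window.
        window_start = (ts // window_size) * window_size
        counts[window_start][key] += 1
    return {w: dict(per_key) for w, per_key in counts.items()}

# Hypothetical sensor events: (seconds since start, sensor id)
events = [(1, "s1"), (3, "s2"), (4, "s1"), (11, "s1"), (12, "s2")]
print(tumbling_window_counts(events, window_size=10))
# → {0: {'s1': 2, 's2': 1}, 10: {'s1': 1, 's2': 1}}
```

A real streaming engine computes the same result incrementally over an unbounded stream and must additionally deal with late and out-of-order events, which is exactly where the technical challenges listed above come in.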
Part 5: The periphery of a data architecture
- Setting up data governance
- The importance of a data strategy and the relationship with the data architecture
- Data strategy and business strategy in harmony
- The culture of the organization: informal or formal, self-service potential
- Maturity levels of the organization
- Technical and organizational aspects of data quality
Part 6: Requirements for a complete and correct data architecture
- What is the business motivation for a new data architecture: ICT cost reduction, competitive improvement, a new business model, new laws and regulations, faster reaction to business demands, or more efficient exploitation of available data?
- Who are the stakeholders and what is the C-level support?
- Description of the current data architecture: data flow, data storage, quantities, and technologies in use
- Taking stock of current bottlenecks: business and ICT, performance, functionality, costs, the ICT organization, and the immediate environment
- Restricting aspects: laws and regulations, budget, software and systems that have to remain
- Requirements and needs of the new data architecture: financial, available expertise, software, quantities, uptime, speed of data delivery, and level of unburdening
- Architecture and design principles
- Current and future forms of data usage: standard reports, self-service BI, data science, customer-driven, mobile apps
- Forms of data usage: batch, manual internally, manual externally, and sensors
- Data types in use: structured, unstructured, audio, video, text, and geo/GIS
- Status of the data architecture project: which choices must be made, which steps to take, why the project falters, whether a PoC or pilot is required, what the key questions in an RFI are, and how to convince the organization