PCA (Google Cloud Professional Cloud Architect)
- 6 exam sections:
  - Designing and planning a cloud solution architecture
  - Managing and provisioning solution infrastructure
  - Designing for security and compliance
  - Analyzing and optimizing technical and business processes
  - Managing implementation
  - Ensuring solution and operations reliability
- Case studies:
  - Mountkirk Games
  - Dress4Win
  - TerramEarth
Mountkirk Games
- Online, session-based, multiplayer games for mobile platforms.
- Server-side integration; the backend runs on physical servers leased from a cloud provider.
- MySQL databases.
- Analytics tools.
- Game statistics are written to files and sent through an ETL tool that loads them into a centralized MySQL database for reporting.
- Purposes:
  - Build a new game.
  - Scale for their global audience.
  - Run their backend on Compute Engine.
  - Capture streaming metrics; run intensive analytics.
  - Adopt a managed NoSQL database.
  - Replace MySQL with a NoSQL database that scales better.
  - Remove the burden of managing physical servers.
Issues
- Scaling application servers, MySQL databases, and analytics tools for a global audience.
- Limited analytics capability.
Requirements
Business requirements: global + scale + latency
- Global footprint
- Improve uptime (downtime = loss of players)
- Increase efficiency of cloud resources
- Reduce latency to all customers
Technical requirements
- Dynamically scale up/down based on game activity.
- Transactional database for user profiles and game state.
- Timeseries database for game activities.
- No data loss due to processing backlogs.
- OS hardening: run a hardened Linux distro on Compute Engine.
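The "dynamically scale up/down" requirement is typically met with a managed instance group whose autoscaler tracks a per-instance target. The sketch below is a simplified model of that sizing decision. It assumes a hypothetical custom metric, "active players per instance". Real autoscalers also apply stabilization and cool-down periods, and this sketch omits them.

```python
import math


def recommended_size(metric_per_instance, target_per_instance, min_size, max_size):
    """Simplified MIG sizing: enough instances that the average metric meets the target.

    metric_per_instance: current value of a (hypothetical) custom metric on each
    instance, e.g. active players. target_per_instance: the autoscaling target.
    No stabilization or cool-down is modeled here.
    """
    total_load = sum(metric_per_instance)
    desired = math.ceil(total_load / target_per_instance)
    # Clamp to the group's configured minimum and maximum number of instances.
    return max(min_size, min(max_size, desired))


# Peak: 3,600 players over 3 instances with a target of 1,000 each.
print(recommended_size([900, 1200, 1500], 1000, min_size=2, max_size=10))  # 4
# Quiet period: 450 players needs 1 instance, but min_size keeps 2.
print(recommended_size([100, 200, 150], 1000, min_size=2, max_size=10))  # 2
```

Keeping a minimum size above 1 is a design choice for availability. Scaling to a single instance would save cost, but it removes redundancy during quiet periods.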
Analytics requirements
- Dynamically scale.
- Process incoming data on the fly (streaming) directly from the game servers, including data that arrives late over slow mobile networks.
- Queries over at least 10 TB of historical data.
- Process files uploaded by users' mobile devices.
Solutions
Pain points
- Compute Engine (hardened OS) + MIG (scale).
- Scaling with a MySQL replacement.
- Analytics: Pub/Sub (ingestion, no data loss, late-arriving data) -> ETL (Dataproc) -> Cloud Spanner -> Bigtable (time series) -> BigQuery.
Solutions
- Compute: Compute Engine, hardened OS, autoscaling, global load balancing, Pub/Sub.
  - Compute Engine in a Managed Instance Group (MIG) built from a custom hardened golden image.
  - Global HTTP(S) Load Balancing.
  - Logging/Monitoring with a custom metric to drive scaling up/down.
- Storage: NoSQL, transactional DB, time series (Bigtable), SQL querying (BigQuery).
  - Datastore for game state.
  - Cloud SQL for user profiles.
- Analytics: data ingestion and processing.
  - Cloud Pub/Sub: buffer for live and late-arriving data.
  - Cloud Storage for data uploads.
  - Cloud Dataflow for batch and stream processing.
  - BigQuery for storage and analytics.
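Pub/Sub buffers late-arriving events, but the Dataflow job still has to decide which late events count toward which time window. The sketch below is a minimal pure-Python model of fixed event-time windows with an allowed-lateness bound. This is roughly the idea behind Apache Beam windowing. The event schema, window size, and watermark value are hypothetical. A real pipeline lets Beam track the watermark instead of passing it in.

```python
from collections import defaultdict


def window_counts(events, window_size, watermark, allowed_lateness):
    """Count events per fixed event-time window, dropping data that is too late.

    events: iterable of (event_time_seconds, player_id) tuples (hypothetical schema).
    watermark: the pipeline's estimate that all events up to this time have arrived.
    A window accepts late data until watermark > window_end + allowed_lateness.
    """
    counts = defaultdict(int)
    dropped = []
    for event_time, player_id in events:
        window_start = event_time - (event_time % window_size)
        window_end = window_start + window_size
        if watermark > window_end + allowed_lateness:
            dropped.append((event_time, player_id))  # window already closed for good
        else:
            counts[window_start] += 1
    return dict(counts), dropped


events = [(10, "a"), (130, "b"), (150, "d"), (185, "c")]
print(window_counts(events, window_size=60, watermark=200, allowed_lateness=30))
# ({120: 2, 180: 1}, [(10, 'a')])
```

A larger allowed lateness keeps more late mobile data, but windows stay open longer and results are finalized later.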
Dress4Win Use case
- Web app and mobile app.
- Wardrobe management.
- Social network that connects users with designers/retailers.
- Monetize through advertising, e-commerce, referrals, freemium app.
Issues: cost and scale
- Colocation data center: infrastructure is insufficient for growth.
- Current infra:
  - MySQL: one server for user data, inventory, and static data.
  - Compute:
    - Servers: microservice-based APIs, static content.
    - Hadoop/Spark servers: data analysis, real-time trending calculations.
    - RabbitMQ servers: messaging, social notifications, events.
    - Others: Jenkins, bastion hosts, Redis, monitoring.
  - Storage:
    - iSCSI for VM hosts.
    - SAN for the database.
    - NAS for image storage, logs, and backups.
- Unsure of a migration strategy (what to migrate first).
Requirements
Business requirements
- Reliable and reproducible environment mimicking production.
- Improve security: access control and other cloud best practices e.g. IAM.
- Business agility through rapid provisioning of new resources
- Analyze and optimize architecture for performance in the cloud
- First phase: migrate DEV/TEST/DR
Technical requirements
- Ease in replicating environments: IaC with Terraform/Deployment Manager.
- Ease in provisioning new resources: GKE
- CI/CD:
- Failover to cloud if needed
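- Security: encrypt data at rest and in transit, e.g. with Cloud KMS keys; for gsutil, an encryption key can be set in the .boto config file.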
- Private connections between production data center and cloud: Cloud VPN, Cloud Interconnect.
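The encryption requirement can also be met with customer-supplied encryption keys (CSEK) for Cloud Storage. A CSEK is a base64-encoded 256-bit AES key. Cloud Storage uses the base64-encoded SHA-256 hash of the key to check that the same key is supplied later. The sketch below generates such a pair with the standard library only. The printed .boto snippet is illustrative. Cloud KMS (CMEK) keys are the managed alternative and are not generated locally like this.

```python
import base64
import hashlib
import os


def generate_csek():
    """Generate a customer-supplied encryption key (CSEK) for Cloud Storage.

    Returns (key_b64, key_sha256_b64): the base64-encoded 256-bit AES key and
    the base64-encoded SHA-256 digest of the raw key bytes.
    """
    key = os.urandom(32)  # 32 random bytes = 256-bit AES key
    key_b64 = base64.b64encode(key).decode("ascii")
    key_sha256_b64 = base64.b64encode(hashlib.sha256(key).digest()).decode("ascii")
    return key_b64, key_sha256_b64


key_b64, key_hash = generate_csek()
# Illustrative .boto entry for gsutil; store the real key securely, not in source control.
print("[GSUtil]")
print(f"encryption_key = {key_b64}")
```

If the key is lost, data encrypted with it cannot be decrypted. This is the main operational trade-off of CSEK compared with KMS-managed keys.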
Analytics requirements
Solutions
Pain points
Solutions
- Cloud SQL: MySQL
- Cloud Memorystore: Redis
- Cloud Datastore: mobile backend
- K8s/Container, Compute Engine for VMs; Cloud Marketplace for Nginx.
- Dataflow: Batch/real-time processing
- App Engine: Application
- Load Balancer: HA/routing
- Compute: local SSD
- Cloud Dataproc: HDFS -> Cloud Storage, HBase -> Bigtable
- BigQuery
- Cloud Pub/Sub + Cloud Functions
- Cloud VPN
- Cloud Marketplace for Jenkins apps
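Pub/Sub plus Cloud Functions can take over the RabbitMQ role for social notifications. The sketch below is a Pub/Sub-triggered Python background function using the 1st-gen `(event, context)` signature. In that signature, `event["data"]` carries the base64-encoded message body. The function name, message schema, and notification text are hypothetical. The return value is ignored by Cloud Functions and exists only so the function can be tested locally.

```python
import base64
import json


def notify_followers(event, context):
    """Pub/Sub-triggered background Cloud Function (Python, 1st gen signature).

    event["data"] holds the base64-encoded message body. The body schema
    {"user_id": ..., "followers": [...]} is hypothetical.
    """
    body = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    notifications = [
        f"to={follower}: {body['user_id']} has a new post"
        for follower in body.get("followers", [])
    ]
    # A real function would deliver these (e.g. through a push service).
    return notifications


# Local test with a fake Pub/Sub event; context is unused here.
fake_event = {"data": base64.b64encode(json.dumps(
    {"user_id": "alice", "followers": ["bob", "carol"]}).encode("utf-8")).decode("ascii")}
print(notify_followers(fake_event, None))
```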
TerramEarth
- Manufactures heavy equipment for the mining and agricultural industries.
- 500 dealers and service centers in 100 countries.
- Objective:
  - Build products that make their customers more productive.
Issues
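- 20 million vehicles collect 120 fields of data per second; the data is stored on the vehicle and downloaded via a maintenance port when the vehicle is serviced (batch).
- About 200,000 vehicles are connected to a cellular network, so their data is collected directly in real time: 120 fields per second, 22 hours of operation per day, about 9 TB/day.
- Linux- and Windows-based systems in a single data center.
- Data arrives as gzip-compressed CSV files uploaded via FTP, which is slow and delays reports by about 3 weeks.
- Replacement parts are stocked preemptively, which reduces unplanned vehicle downtime by 60%; but because the data is stale, downtime can still last up to 4 weeks while customers wait for parts.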
- ERP solution is not in scope (deferred).
- Off-the-shelf analytic application, limited to 2 licenses tied to physical CPUs.
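A back-of-envelope check of the connected-vehicle figures helps when sizing Pub/Sub and Dataflow throughput. The sketch below assumes decimal terabytes (10^12 bytes) and a uniform rate across the 22 operating hours. The whole-fleet figure is a hypothetical extrapolation in which all 20 million vehicles stream at the same rate.

```python
CONNECTED_VEHICLES = 200_000
TOTAL_VEHICLES = 20_000_000
FIELDS_PER_SECOND = 120
HOURS_PER_DAY = 22
DAILY_BYTES = 9 * 10**12  # ~9 TB/day, assuming decimal terabytes


def telemetry_estimates():
    """Back-of-envelope sizing derived from the case-study figures."""
    seconds_per_day = HOURS_PER_DAY * 3600  # 79,200 s of operation
    per_vehicle_day = DAILY_BYTES / CONNECTED_VEHICLES  # bytes per vehicle per day
    per_vehicle_second = per_vehicle_day / seconds_per_day
    per_field = per_vehicle_second / FIELDS_PER_SECOND
    # Hypothetical: every vehicle in the fleet streaming at the same rate.
    whole_fleet_day = per_vehicle_day * TOTAL_VEHICLES
    return {
        "MB_per_vehicle_day": per_vehicle_day / 1e6,
        "bytes_per_vehicle_second": per_vehicle_second,
        "bytes_per_field": per_field,
        "TB_per_day_whole_fleet": whole_fleet_day / 1e12,
    }


for name, value in telemetry_estimates().items():
    print(f"{name}: {value:,.1f}")
```

This works out to about 45 MB per vehicle per day, roughly 568 bytes per second, and under 5 bytes per field. If the whole fleet were connected, the volume would grow to about 900 TB/day.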
Requirements
Business requirements
- Decrease unplanned vehicle downtime to less than 1 week.
- Share reports with dealers for better preparation.
- Partner integration for supply.
Technical requirements
- Expand beyond a single data center to reduce latency.
- Backup strategy.
- Data transfer security.
- Data analytics to anticipate customer needs.
Analytics requirements
Solutions
- BigQuery as the data warehouse.
- Cloud IoT Core: helps securely connect and manage devices.
- IoT devices -> Cloud Pub/Sub
- Cloud Dataflow -> Cloud Storage
- API
- Replace the off-the-shelf licensed analytics software with Data Studio.
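While vehicles still produce gzip-compressed CSV files, a bridge step can turn each uploaded file into row-level messages for the Pub/Sub -> Dataflow path. The sketch below assumes the first CSV row is a header and uses hypothetical field names. Each payload could be passed as the `data` argument of `google.cloud.pubsub_v1.PublisherClient().publish(topic_path, data)`, but that call is not made here, so the example runs offline.

```python
import csv
import gzip
import io
import json


def csv_gz_to_messages(raw_bytes):
    """Turn one gzip-compressed CSV upload into per-row JSON message payloads (bytes)."""
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", encoding="utf-8", newline="") as fh:
        reader = csv.DictReader(fh)  # first row supplies the field names
        return [json.dumps(row, sort_keys=True).encode("utf-8") for row in reader]


# Hypothetical file with three of the ~120 telemetry fields.
sample = gzip.compress(
    b"vehicle_id,timestamp,engine_temp\n"
    b"TE-001,2019-01-01T00:00:00Z,88.5\n"
    b"TE-001,2019-01-01T00:00:01Z,88.7\n"
)
for payload in csv_gz_to_messages(sample):
    print(payload)
```

Values stay as strings here. Type conversion and validation fit better in the Dataflow step, where bad rows can be routed aside instead of failing the upload.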
Misc
- A Cloud SQL replica can be created to replicate data (asynchronously) from an on-prem MySQL master.
- Use Storage Transfer Service (instead of gsutil rsync) to sync between AWS S3 and GCS.
- Dataprep only works with data in GCS. Alternate solution is using DLP API.
- GKE
  - Tag images with a specific version number instead of using “latest”.
  - Use imagePullPolicy “IfNotPresent” to avoid unnecessary image pulls (vs. “Always”).
- Logs that are deleted after their retention period cannot be retrieved again.
- Take snapshots during downtime (e.g. a maintenance window).
- Penetration tests are recommended to be run from outside GCP; there is no requirement to notify Google.
- gcloud container clusters resize: change the number of nodes in a GKE cluster.
- 5 phases of a cloud migration:
  - Assess: classify applications as (a) easy to move, (b) hard to move, (c) can’t move.
  - Pilot: take baby steps.
  - Move data: data is fairly independent, so it is easier to move.
  - Move applications.
  - Cloudify and optimize:
    - Lift and shift.
    - Add availability and elasticity (i.e. scale and HA).
    - Refactor code: SQL to NoSQL, microservices.
- HTTP 429 / 5xx errors: retry with truncated exponential backoff.
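A minimal sketch of truncated exponential backoff for HTTP 429 and 5xx responses follows. Before retry n (starting at 0), it waits 2^n seconds plus up to 1 second of random jitter, capped at a maximum backoff. Other errors are raised immediately. `HttpError` is a hypothetical stand-in for whatever exception the real client library raises. `sleep` and `rng` are injectable so the retry schedule can be tested without waiting.

```python
import random
import time


class HttpError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status):
    return status == 429 or 500 <= status <= 599


def call_with_backoff(request, max_retries=5, max_backoff=32.0,
                      sleep=time.sleep, rng=random.random):
    """Call request(); on 429/5xx wait min(2**n + jitter, max_backoff) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except HttpError as err:
            # Give up on non-retryable errors or once retries are exhausted.
            if not is_retryable(err.status) or attempt == max_retries:
                raise
            sleep(min(2 ** attempt + rng(), max_backoff))


# Usage: a request that fails twice with 503, then succeeds.
attempts = {"n": 0}

def flaky_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise HttpError(503)
    return "ok"

delays = []
print(call_with_backoff(flaky_request, sleep=delays.append, rng=lambda: 0.5), delays)
```

The jitter keeps many clients from retrying in lockstep after a shared outage, and the cap bounds the worst-case wait.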