This post will help you avoid mistakes in cloud projects, from the beginning through to release. No matter whether you are an SA, PM, System Engineer, DevOps Engineer or Developer, avoiding these mistakes will reduce the probability of failure for your project.
No 1. Not including infrastructure design and construction effort in the estimate
Money is always the biggest problem. If the biggest problem is solved, everything else will be OK.
The Project Manager, or whoever creates the estimate, often ignores infrastructure activities or assigns them only a few man-days.
Cause: lack of experience with cloud projects, so they cannot picture what it takes to design and construct a system.
How to avoid?
- Involve someone with cloud project experience to help estimate this work package.
- Agree with the customer on the number of resources and environments to be deployed.
- In the assessment phase, make assumptions about the system architecture; this makes the effort easier to estimate.
No 2. Not involving an SA/SE in the cloud project
Instead of releasing a system to the customer, we deliver a mess of source code.
Many projects focus only on coding and testing, and think about how to deploy the system only several weeks before the release day. The project's technology, framework, and programming approach need to be considered in an early phase so that the system works correctly on the cloud. Code that runs on your PC will not necessarily run on the cloud without such consideration.
How to avoid?
- Assign an SA/SE to follow the project from the very beginning (before things have gone too far!).
- The SA/SE must follow the status of the project until release to production.
- Appoint one person with the highest responsibility who makes the final decision on the overall solution.
- NG: Do not involve too many people at the same level (for example, 3 SAs designing one architecture).
No 3. Not being sufficiently aware of security risks
If a security incident occurs, we do not get a chance to redo it!
Access privileges for the account need to be granted carefully. Sensitive information such as usernames/passwords and access keys needs to be stored in an appropriate way.
How to avoid?
- Give each member just enough permission (for example, a tester often needs only Read permission).
- Turn on MFA for all member accounts.
- Use a password management tool (e.g. Bitwarden).
NG:
- Everyone is an Administrator.
- Sharing one account between members.
- Storing sensitive information on a public cloud/drive.
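As a rough illustration of the access rules above, the following sketch audits a member list for excess permission, missing MFA, and shared accounts. The job titles, permission levels, and the `members` structure are hypothetical and not tied to any cloud provider's IAM model.

```python
# Hypothetical mapping of job to the permission it actually needs.
REQUIRED_PERMISSION = {"tester": "read", "developer": "write", "sa": "admin"}

# Permission levels ordered from least to most powerful (an assumption).
LEVEL = {"read": 0, "write": 1, "admin": 2}


def audit(members):
    """Return a list of findings for a list of member dicts."""
    findings = []
    for m in members:
        # Unknown jobs default to the least powerful permission.
        needed = REQUIRED_PERMISSION.get(m["job"], "read")
        if LEVEL[m["permission"]] > LEVEL[needed]:
            findings.append(
                f"{m['name']}: has {m['permission']}, needs only {needed}"
            )
        if not m["mfa"]:
            findings.append(f"{m['name']}: MFA is off")
        if m.get("shared", False):
            findings.append(f"{m['name']}: account is shared")
    return findings


members = [
    {"name": "alice", "job": "tester", "permission": "admin", "mfa": True},
    {"name": "bob", "job": "developer", "permission": "write", "mfa": False},
    {"name": "team-account", "job": "developer", "permission": "write",
     "mfa": True, "shared": True},
]

for finding in audit(members):
    print(finding)
```

In a real project the member list would come from the cloud provider's account export rather than a hard-coded list.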
No 4. Designing an IMPOSSIBLE solution
The architecture is like the foundation of a building. If it is wrong, the effort to correct it will be huge.
Many SAs/SEs choose a solution for the project after a few minutes of searching on Google. Lack of knowledge about a technology or service leads to the wrong architecture.
How to avoid?
- Study a technology/service carefully before you choose it. Pros and cons, limitations, and when to use it are some of the aspects to consider.
- Run a small PoC to verify the feasibility of the solution. Some customers are even willing to pay for a PoC phase.
No 5. Making the system architecture too complicated
A complicated system architecture is difficult to implement and maintain. To avoid one, you need to:
- Follow the best-practice guidelines from the cloud provider.
- Choose a solution that matches the technical stack of your company/team.
- Remember that newest does not mean best (avoid "following the trend").
No 6. Skipping Infrastructure as Code to save effort
For a small project or one with limited effort, the customer often does not agree to spend effort on creating infrastructure templates. However, during the project, cloud resources need to be deleted and re-created many times. Infra-as-code will help you:
- Reduce the effort of re-creating resources or deploying to a new environment
- Reduce mistakes caused by forgetting to configure a resource properly
- Re-use the created templates easily in the future
- Avoid/prohibit manual changes to resources (no hand-powered work, no sneaky fixes!)
So, even if the customer does not approve the effort to create the templates, create them for your own use, but do not hand them to the customer for free. The company also needs an internal repository where every project can share its templates for re-use.
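The idea of one template serving every environment can be sketched in a few lines. This is only an illustration: the resource names, the region name, and the dictionary layout are made up, and a real project would use its cloud provider's template format instead.

```python
import json


def render_template(env, instance_size):
    """Build a hypothetical infrastructure description for one environment."""
    return {
        "environment": env,
        "region": "example-region-1",  # hypothetical region name
        "resources": {
            "web_server": {"type": "vm", "size": instance_size, "count": 1},
            "database": {"type": "managed_db", "size": instance_size},
        },
        "network": {"allowed_cidr": ["10.0.0.0/16"]},
    }


# Only the resource size changes per environment (sizes are assumptions).
SIZES = {"dev": "small", "test": "small", "prd": "large"}

templates = {env: render_template(env, size) for env, size in SIZES.items()}

# Re-creating or adding an environment is one function call, not manual work.
print(json.dumps(templates["dev"], indent=2, sort_keys=True))
```

Because every environment comes from the same function, a setting that is forgotten once is forgotten nowhere else, and the file can go into the shared internal repository.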
No 7. No consistency between dev/test/prd environments
This one is for SAs, SEs, and DevOps engineers.
You cannot build something poorly designed on the dev environment and then hope that it will work well on the production environment!
To avoid inconsistency between environments, follow the principles below:
- Implement exactly what has been designed.
- Dev/test/prd environments must have the same architecture, region, services, and settings (security rules, network rules). However, they can differ in resource size.
- Use infra-as-code to ensure consistency between environments.
Bad examples:
- Putting more than one application on a server to save cost.
- Opening the network to everyone (0.0.0.0/0) for dev purposes.
- Using different regions for Dev and Production.
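One way to catch this kind of drift is to compare the environment definitions automatically while ignoring resource size. The sketch below assumes each environment is described by a nested dictionary; the keys `size` and `count` and the sample values are hypothetical.

```python
def strip_sizes(config):
    """Return a copy of a nested config without 'size' and 'count' keys."""
    if isinstance(config, dict):
        return {
            key: strip_sizes(value)
            for key, value in config.items()
            if key not in ("size", "count")
        }
    if isinstance(config, list):
        return [strip_sizes(value) for value in config]
    return config


def _diff(a, b, path):
    """Recursively collect the dotted paths where a and b differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        found = []
        for key in sorted(set(a) | set(b)):
            sub_path = f"{path}.{key}" if path else key
            if key not in a or key not in b:
                found.append(sub_path)
            else:
                found.extend(_diff(a[key], b[key], sub_path))
        return found
    return [] if a == b else [path]


def diff_environments(a, b):
    """List settings that differ between two environments, ignoring size."""
    return _diff(strip_sizes(a), strip_sizes(b), "")


dev = {
    "region": "region-a",
    "network": {"allowed_cidr": ["0.0.0.0/0"]},
    "web": {"service": "vm", "size": "small"},
}
prd = {
    "region": "region-b",
    "network": {"allowed_cidr": ["10.0.0.0/16"]},
    "web": {"service": "vm", "size": "large"},
}

print(diff_environments(dev, prd))  # ['network.allowed_cidr', 'region']
```

The different `size` of the web server is accepted, while the open network rule and the mismatched region are reported.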
No 8. Not thinking about failure (the plan B)
You should think about system failure before it happens.
Some principles for this point:
- Your system must be able to recover automatically.
- Have a backup and recovery plan before making changes. The recovery strategy must be defined and tested.
- Rehearse failure scenarios to measure the actual recovery time against your RTO (Recovery Time Objective: the maximum acceptable time to restore the system after a failure).
- A mechanism to detect system failure needs to be considered in the design phase (not after the system has been fully constructed).
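A failure drill can be evaluated with very little code. The sketch below assumes a health check that records a timestamp and a healthy/unhealthy flag; the log format, the sample drill, and the 300-second target are all hypothetical.

```python
def recovery_time(health_log):
    """Return seconds from the first failed check to the next healthy one.

    health_log is a list of (timestamp_seconds, is_healthy) tuples in time
    order. Returns None if no failure occurred or the system never recovered.
    """
    failed_at = None
    for timestamp, healthy in health_log:
        if not healthy and failed_at is None:
            failed_at = timestamp  # failure detected for the first time
        elif healthy and failed_at is not None:
            return timestamp - failed_at  # system is back
    return None


RTO_SECONDS = 300  # hypothetical target agreed with the customer

# Health checks every 60 seconds during a rehearsed failure.
drill = [(0, True), (60, False), (120, False), (180, False), (240, True)]

measured = recovery_time(drill)
print(measured, "within RTO" if measured <= RTO_SECONDS else "RTO missed")
```

Note that the measured value can only be as precise as the health-check interval, which is one reason detection has to be designed in from the start.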
No 9. Not having any process
If you do not have a process to follow, the customer will force you to follow theirs, which often leads to failure.
Processes for system construction, deployment, feature release, operation, and maintenance must be defined before or during the implementation of the project.
Having a process for each activity helps us work more efficiently and reduces failures and risk. By getting the customer to follow our process and a clear SLA, we can avoid excessive customer demands, which also saves cost.
No 10. Not performing proper performance tests before release to production
Everything must be tested with a realistic workload before it is opened to end users. Otherwise you will receive hundreds of thousands of 1-star ratings.
Any assumption about the production environment is meaningless until it is proven.
How to avoid?
- Estimate the production workload carefully, based on figures from the customer (number of users, amount of data, distribution of data, ...).
- For a migration project, obtain reliable numbers for the current workload.
- Reserve generous capacity before opening to end users.
- The system needs to be scalable to adapt to the end-user workload.
- Test with the assumed workload and apply performance tuning until the system meets the requirements (average response time, requests/minute, CPU/memory utilization, ...).
NG:
- Estimating the production workload from unreliable information.
- Assumptions not provided by the customer (the project team's imagination).
- Having no strategy for performance testing and tuning.
- Dev team, infra team, and test team not working closely together.
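The "test, compare with the requirement, tune, repeat" loop needs a clear pass/fail check. Here is a minimal sketch, assuming the load test tool exports response times in milliseconds; the sample numbers and the thresholds are invented for illustration.

```python
import math
from statistics import mean


def summarize(samples_ms, duration_seconds):
    """Summarize response times (milliseconds) collected during a load test."""
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile: the value at rank ceil(0.95 * n).
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {
        "avg_ms": mean(ordered),
        "p95_ms": ordered[p95_index],
        "requests_per_minute": len(ordered) * 60 / duration_seconds,
    }


def failed_requirements(summary, max_avg_ms, min_requests_per_minute):
    """Return the names of the metrics that miss the agreed requirement."""
    failures = []
    if summary["avg_ms"] > max_avg_ms:
        failures.append("avg_ms")
    if summary["requests_per_minute"] < min_requests_per_minute:
        failures.append("requests_per_minute")
    return failures


# Hypothetical results from a 30-second test run.
samples = [120, 150, 110, 300, 140, 130, 160, 900, 125, 135]
summary = summarize(samples, duration_seconds=30)

# Hypothetical requirements: average <= 200 ms, at least 15 requests/minute.
print(summary)
print(failed_requirements(summary, max_avg_ms=200, min_requests_per_minute=15))
```

Here a single slow request pushes the average over the limit, which is why it helps to look at a percentile alongside the average before deciding what to tune.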