This looks like a branch created out of another branch (`sfedoriv/APIAddSupportForAmazonEC2ComputeCloudService`) which is not yet merged into master.
So I'll review the diff: https://git.bubblev.org/bubblev/bubble/compare/sfedoriv/APIAddSupportForAmazonEC2ComputeCloudService...cobbzilla/introduce_packer and put my comments here. I guess most of the changed files here are actually either new or deleted files, so it should be easy for me to do so.
kris changed title from cobbzilla/introduce_packer to WIP: (blocked by https://git.bubblev.org/bubblev/bubble/pulls/4 ) cobbzilla/introduce_packer (4 years ago)
I did something similar while I didn't have root access to btest5. Only `root` has rights to run `supervisorctl`, so it would be good to add a check: if `user` is set and it is not `root`, do not allow execution of this script without the `norestart` option.
Also, do `rm -rf /tmp/classes` afterwards, so the same script may be called by another `user` later. The directory created here is owned by the current `user`, so if it is `root`, a later call with `bubble@hostname` will fail.
You're introducing usage of this script by some user other than the default `root` in the header comment at the top of this script, and for that to be complete, this change should be made. However, this PR is already a huge one, so I agree that this can go in a future PR.
Same as in `bpatch`:
- the restart case should be restricted to the `root` user only
- the `/tmp/bubble.jar` file should be removed at the end of this script
Also, any currently existing `/tmp/bubble.jar` and `/tmp/classes` should be deleted from all existing servers once this change is applied, so these scripts will actually work for users other than `root`. This can be done simply by calling these 2 patch scripts as the `root` user on all servers before using them with any other user.
I'm not yet sure where the file came from as I cannot find it in this PR, but `grep -rn /home/bubble/current .` returns:
`./bubble-server/src/main/resources/packer/roles/bubble/tasks/main.yml:71: dest: /home/bubble/current`
I guess the `current` -> `api` replacement should be done there also.
We can probably remove that. That line creates a symlink from `current` -> `api` so that if anyone still refers to the jar or env file under `current` instead of `api`, they will get the same file.
The `jar` command is not available on btest5, while it is used within the `bpatch` script. Should we add `openjdk-11-jdk-headless` here, or is it added somewhere else in this PR? (I'll try finding it, but still, I'll leave this comment as a reminder.)
`Arrays.stream(names).anyMatch(cfgName::equals)` might work a bit faster, but I guess this is not so important here as the `names` array should not be large.
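For illustration, a minimal sketch of the suggested check, using a hypothetical `names` array and config name (the real driver fields and variables may differ):
```
import java.util.Arrays;

public class RegionNameCheck {
    // Hypothetical helper: true if cfgName equals any entry of names.
    static boolean matchesAny(String[] names, String cfgName) {
        // Short-circuits on the first match, same as a manual for-loop.
        return Arrays.stream(names).anyMatch(cfgName::equals);
    }

    public static void main(String[] args) {
        String[] names = {"ams3", "nyc1", "sfo2"};
        System.out.println(matchesAny(names, "nyc1")); // true
        System.out.println(matchesAny(names, "fra1")); // false
    }
}
```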
No need to use null below. Instead:
```
...
.findFirst()
.map(cr -> configRegion.setId(cr.getId()))
.ifPresentOrElse(cloudRegions::add,
        () -> log.warn("initRegions: config region not found: " + configRegion.getInternalName()));
```
So actually no need for `cloudRegion` variable here.
Strange naming and a bit confusing. According to the code above, the returned array list actually contains (slightly altered) `configRegion`s (line 63), so the name `cloudRegions` is not really correct. Maybe just call it `regions`.
And here it is inverted as opposed to the `regions` method - here the result is really made of `cloudSize` entries. Is this intentional? Or are these cloud vs. config entries really the same thing except for the few fields updated in these methods?
The cloud regions are (for Vultr and DigitalOcean) coming from the vendor API. The config regions come from the JSON config for the CloudService. It's possible that the vendor has new regions that we don't have, or that we have obsolete regions that the vendor has decommissioned.
```
@Override public List<PackerImage> getPackerImages() {
    final List<PackerImage> images = getResources(PACKER_IMAGES_URI, new DigitalOceanPackerImageParser(configuration.getVersion(), packerService.getPackerPublicKeyHash(), configuration.getJarSha()));
```
According to the code, I would say that this cannot be null. One less check to do here. Just add lombok's `@NonNull` annotation to `getResources` and all the parser classes' `newResults` methods.
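A minimal sketch of the suggested shape, with hypothetical class and method names (the real parser API in the codebase differs). Note that Lombok's `@NonNull` on a method acts as documentation only (its null-check generation applies to parameters), so the actual guarantee comes from returning an empty list instead of null:
```
import lombok.NonNull;

import java.util.ArrayList;
import java.util.List;

// Hypothetical parser base class: newResults() documents a non-null contract
// and always returns a (possibly empty) list, so callers need no null checks.
abstract class ResourceParser<T> {

    @NonNull public List<T> newResults() {
        return new ArrayList<>(); // empty list instead of null
    }

    @NonNull public abstract List<T> parse(String json);
}
```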
According to https://www.vultr.com/api/#plans_plan_list the property's name is just `bandwidth`.
But then again https://www.vultr.com/api/#plans_plan_list_baremetal API call has `bandwidth_tb`.
I'm just guessing that we should use `bandwidth` here, and that it is given in TB again.
Why the change in the method's name here? The similar method in DigitalOcean's class was named just `getResources`.
BTW, I like this `loadCloudResources` name better, as it is not just a simple getter.
```
final HttpRequestBean destroyKeyRequest = auth(new HttpRequestBean(POST, DESTROY_SSH_KEY_URL, keyData));
```
```
@Override public List<PackerImage> getPackerImages() {
    final List<PackerImage> images = loadCloudResources(SNAPSHOT_URL, new VultrPackerImageParser(configuration.getVersion(), packerService.getPackerPublicKeyHash(), configuration.getJarSha()));
```
Autowiring a class of the same type (Services). And this instance is used only once, while `configuration` is available. Please use `configuration.getBean` instead.
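A self-contained sketch of the pattern being suggested, using a tiny stand-in for the real configuration object; in the codebase `configuration.getBean(...)` is a Spring bean lookup, and `PackerService` here is only an illustrative name:
```
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the real configuration object.
class Configuration {
    private final Map<Class<?>, Object> beans = new HashMap<>();
    <T> void register(Class<T> type, T bean) { beans.put(type, bean); }
    <T> T getBean(Class<T> type) { return type.cast(beans.get(type)); }
}

// Hypothetical collaborator that would otherwise be a second autowired field.
class PackerService {
    void writeImages() { System.out.println("building packer images"); }
}

public class GetBeanSketch {
    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        configuration.register(PackerService.class, new PackerService());

        // Instead of autowiring another field of the same service type,
        // look the bean up from the already-available configuration:
        configuration.getBean(PackerService.class).writeImages();
    }
}
```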
Completely unrelated to ROLE_PATH: this line was returning false in the case where just `activated()` returned true. Was this a bug, or should we leave just this `activated()` check here? Same question on the removed line 117 right after this change.
We don't need the check anymore since roles are no longer stored in localstorage. Can probably remove `activated` everywhere in this class, but will wait on that for now.
If the ansible scripts are the same, then a node started as a restore should execute only the sage roles initially (or at least some tasks from algo and mitm should be excluded). The remaining two roles should be executed after the restore call is run on that restored instance's bubble API.
The changes I recently made to the automation files should be applied to the bubble-server/src/main/resources/ansible/roles files for restoring to work properly as it is implemented now.
The other way would be to use packer for the full backup and restore process instead of the one we have right now. Should we do that - it would take more space, right?
It's an interesting idea to use packer for backup/restore, but (1) it would take a LOT more space and (2) what if you wanted to restore to a different cloud? there's no way to restore a Vultr image to a DigitalOcean droplet, for example. I think the existing way of backing up/restoring through a `CloudService` of type `storage` is nice because it is cloud-independent.
```
// determine lat/lon to find closest cloud region to perform build in
final GeoLocation here = geoService.locate(account.getUuid(), getExternalIp());
final List<CloudRegionRelative> closestRegions = findClosestRegions(configuration, new SingletonList<>(cloud), null, here.getLatitude(), here.getLongitude());
if (empty(closestRegions)) return die("writePackerImages: no closest region could be determined");
```
I couldn't find out why this is needed. And does this influence memory usage in any way? It's currently happening only within the `newNode` method, but its usage is not in any way limited to that in the future (i.e. with a JavaDoc comment which would discourage other usage of this).
In `StandardNetworkService` we re-enabled retries if the SSH command fails due to SSH not yet being available. In this case, the progressMeter was being closed, and no further writes were allowed. Thus, if the second attempt to run ansible succeeded, it would run but you would not see any output and the progress meter would be stuck.
Yeah it's faster to pass it because the caller already has it. Getting it from configuration requires a spring bean lookup. Probably cached, but still.
It would be good to have all the var names here as constants - I was looking for `roles` as I found that one in json.hbs, but there are some others used in this class.
As this is actually using `ctx` to build a value which will be set as an ENV var, should this `env.put` part be moved further down in the code - after the last `ctx.put` call and before the usage of `env` in the command execution (line 221)?
As `ctx` is not really read or used in such a way anywhere in between, it would be safe to move this complete `// set environment variables` part there.
What if within these images we find one that is for all regions (`image.getRegions() == null`, as checked above in line 155)? Should we break there and use that one?
If so, maybe a better check in line 155 would be
```
final var existingForAllRegions = existingForInstallType.stream().filter(i -> i.getRegions() == null).findFirst();
if (existingForAllRegions.isPresent()) {
// found an image for all regions: `existingForAllRegions.get()`
...
} else {
...
}
```
log.info("packer images already exist for "+installType+" for regions: "+existingRegions.stream().map(CloudRegion::getInternalName).collect(Collectors.joining(", ")));
final List<String> existingRegionNames = existingRegions.stream().map(CloudRegion::getInternalName).collect(Collectors.toList());;
I guess I thought of just calling it up to the `map`. Even if this collecting is done and we then do String.join in the log, that String.join is executed without need.
But then again, the collect would have to be done twice, and I'm not sure whether this is optimized properly in Java streams.
Anyhow, the list should be short, right, so you can just skip this. Sorry for this one.
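For reference, a `Stream` can only be consumed once, so collecting the names into a list and reusing that list for both the log line and any later logic is the safe option; a minimal sketch with made-up region names (the real `CloudRegion` accessors differ):
```
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectOnceSketch {
    public static void main(String[] args) {
        Stream<String> regionNames = Stream.of("ams3", "nyc1", "sfo2");

        // Collect once; calling collect() again on the same Stream would throw.
        List<String> existingRegionNames = regionNames.collect(Collectors.toList());

        // Reuse the collected list for the log message and for later logic.
        System.out.println("packer images already exist for regions: "
                + String.join(", ", existingRegionNames));
        System.out.println("count: " + existingRegionNames.size());
    }
}
```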
So this will actually build a packer image for each and every region on the first call. I hope that packer will create those images in parallel so there will be no significant delay.
On the other hand, VultrDriver is calling just `getPackerImage(node.getInstallType())` which takes just the first packer image. Do we have any need for different images in different regions yet?
Just use `Collectors.joining("\", \"")`, and do not forget to add surrounding quotes as in the current implementation if those are really needed around each element.
That would probably be easier. A good change to make in the future. fwiw, the leading/trailing quotes are intentionally omitted, they are present in the template.
This is so we can write valid JSON in `cloudService.json`, for example `"something": "<<listOfValues>>"`, when `listOfValues` is replaced, we want the inner quotes but the outer ones are already there.
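As an illustration of the two approaches discussed here, a small self-contained sketch with a made-up list; the joined string carries only the inner quotes, since the leading/trailing quotes already live in the template as described above:
```
import java.util.List;
import java.util.stream.Collectors;

public class JoinQuotesSketch {
    public static void main(String[] args) {
        List<String> imagesToCreate = List.of("ams3", "nyc1", "sfo2");

        // Inner quotes only; the outer quotes are already present in the template,
        // e.g. "something": "<<listOfValues>>" in cloudService.json.
        String joined = imagesToCreate.stream().collect(Collectors.joining("\", \""));
        // equivalent: String.join("\", \"", imagesToCreate)

        System.out.println(joined);               // ams3", "nyc1", "sfo2
        System.out.println("\"" + joined + "\""); // "ams3", "nyc1", "sfo2"
    }
}
```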
Is this really allowed to return Future<T> where List<PackerImage> is expected???
Anyhow, this `writePackerImages` is used in such a way that it doesn't expect any return value, so void as the return type would be more understandable.
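For context on the question, a minimal sketch with hypothetical method names (not the actual `writePackerImages` code): submitting a value-returning task to an `ExecutorService` yields a `Future`, and if no caller ever inspects that result the method can simply be declared `void`:
```
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureReturnSketch {
    private static final ExecutorService pool = Executors.newFixedThreadPool(2);

    // Returning the Future lets a caller wait for or inspect the result...
    static Future<List<String>> buildImagesAsync() {
        return pool.submit(() -> List.of("image-sage", "image-node"));
    }

    // ...but if every caller ignores the return value, void is clearer.
    static void buildImages() {
        pool.submit(() -> List.of("image-sage", "image-node"));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildImagesAsync().get());
        buildImages();
        pool.shutdown();
    }
}
```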
- Compare with and use the newer version from master: ansible/roles/algo/tasks/main.yml (master was merged afterwards, but these ansible files were moved to another folder, so I guess you just missed this one).
- Same for roles/mitmproxy/tasks/main.yml
- I assume that `packer/roles/...` are executed only when building a new packer image, and so these backup/restore changes are not needed there.
I'll have to take a break here. Continuing tomorrow from `/bubble-server/src/main/resources/ansible/roles/bubble/tasks/main.yml`
- `/bubble-server/src/main/resources/ansible/roles/bubble/tasks/main.yml`: restore.yml is not imported any more, but bubble_restore_monitor.sh was required for restore. Is this monitor started somewhere else now? I couldn't find 'restore.yml' anywhere.
- `Restart dnscrypt-proxy` in `bubble-server/src/main/resources/ansible/roles/mitmproxy/tasks/main.yml` - how do we (and do we at all) build packer instances for restored bubbles? Is algo installed there before the actual restore? Note that algo installs this dnscrypt-proxy, so this will fail if there's no algo.
- Is `API-Key` used inside our code fully replaced with `apiKey`? Please check that activation works (bubble/config/activation.json line 26 still uses `API-Key`).
- Should we use `tags` in DigitalOcean packer builds in `bubble-server/src/main/resources/models/defaults/cloudService.json` as in Vultr? Also in `bubble-server/src/test/resources/models/system/cloudService.json`, while adding yet another tag within this one: `test_instance`.
- `bubble-server/src/main/resources/packer/roles/algo/defaults/main.yml` - where is this used, and why is it added together with `bubble-server/src/main/resources/packer/roles/algo/files/config.cfg.hbs`?
- `bubble-server/src/main/resources/packer/roles/bubble/files/init_bubble_db.sh` lines 83 and 85 - the arguments with indexes 3 and 4 are already read into the consts IS_FORK and INSTALL_MODE in lines 42-43. With that, their usage here is a bit strange: `IS_FORK == "INIT"` and `INSTALL_MODE > .BUBBLE_DB_ENCRYPTION_KEY`. I saw the usage - that's cheating :) Can you please just move this new part of the code above the IS_FORK and INSTALL_MODE definitions, just to be clear. Not sure if you would need to move the used db_exists method also.
- `bubble-server/src/main/resources/packer/roles/bubble/tasks/main.yml` - at the very end - line: `shell: su - bubble bash -c install_packer.sh` - it would be better to use `runuser -l bubble -c` instead of `su - bubble`. And also, do we really need `bash -c` there - will this `runuser` or `su -` actually start a new bash, and therefore there's no need to start yet another inside?
- Do we need `bubble-server/src/main/resources/ansible/roles/bubble/templates/snapshot_ansible.sh.j2` now that we have `bubble-server/src/main/resources/packer/roles/finalizer/files/snapshot_ansible.sh`?
- Yes, `bubble-server/src/main/resources/packer/roles/finalizer/tasks/main.yml` kept just the task `Touch first-time setup file`, while there is no similar task for the case when `restore_key is defined`. Should the one that is left really have the condition `when: restore_key is not defined`? Will this `restore_key` be defined at any time when running the packer ansible scripts? And how will we build instances when a user is trying to restore a bubble - by building a fully clean instance with the full algo and mitmproxy setup, and then altering the setup on it by restoring the backed-up setup?
- `bubble-server/src/main/resources/packer/roles/mitmproxy/files/supervisor_mitmproxy.conf` current line `command=sudo -H -u mitmproxy bash -c "/home/mitmproxy/mitmproxy/run_mitmdump.sh 8888"` - should the port 8888 really be fixed here? `{{ mitm_port }}` is still used in `bubble-server/src/main/resources/ansible/roles/mitmproxy/tasks/route.yml:23` and it is set up in `bubble-server/src/main/resources/ansible/roles/mitmproxy/files/bubble_role.json:6`, and so in `configuration.defaultMitmProxyPort`. If it is fixed, then we can fix/remove all these other references.
- Learning some things would require much more time for me, so I just ran through things like all that algo setup (dnscrypt setup, strongswan & charon, upgrades, ... - is this all just taken from algo's installation?) - looks OK, so I hope it works fine.
- Where can we test this? I would like to run the backup_and_restore test there.
And more:
- DigitalOcean has `available` flags for regions and for sizes within regions. We are not processing those. Should we? If some region or size is not available we shouldn't use/show it, right?
- Migration is missing!
- We don't need `scripts/update_role.json` and `bin/update_role` any more, right? If so, please delete those.
The 1st review iteration is done... I'll start reviewing the new commits now.
Review of newer commit 828cdcf736
- Should we replace `__REGION__` in the config template at line 5 in `bin/aws/init_aws_configs.sh`?
- A new version is available on https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-ec2 (1.11.800). Looks like they are updating regularly, so we should remember to update this version again when the PR is ready.
- It would be better to return an empty collection from `getAllPackerImages` and `getPackerImagesForRegion` instead of null, and set those to be `@NonNull`, so some previous comments I added here may be applicable, and some parts of the code will be simplified (no null checks).
- `ComputeServiceDriverBase.getPackerImage(BubbleNode node)` - a better name would be `getOrCreatePackerImage`, to clearly differentiate it from the simpler `getPackerImage(AnsibleInstallType installType, String region)` which only does the get.
- Also, this other method `getPackerImage(AnsibleInstallType installType, String region)` is used only in this class - set it to private, please.
- Class `AmazonEC2Driver` line 121 - will this return the oldest image? What is the order resulting from `comparing(Image::getCreationDate)` - ASC or DESC for a time(stamp)?
- Looks like this method `getPackerImagesForRegion` should be protected now.
- `VPC_IPV6_CIDR_BLOCK` is not used - Amazon's provided value is used instead.
- Class `AmazonEC2Driver` line 285 - possible NPE for `subnet`, comparing with the `== null` check in the next line. And again, please try to get this default value just by using `orElseGet(SupplierFunction)` in line 284.
- Class `AmazonEC2Driver`: similarly, instead of using `or(() -> Optional.ofNullable(inner_supplier_function)).get()` in line 309, use `orElseGet(inner_supplier_function)`, making it clear that the result should be an existing object. `or()` itself returns an `Optional` again, and so it is not required to have a real existing object as the result of its supplier (see the sketch after this list).
- Class `AmazonEC2Driver`: `AmazonEC2Driver.getSubnetsByRegion` is a huge method. And it looks like it would be easy to extract some of the parts that fetch different objects from Amazon.
- As you can already see, I'm avoiding using null as much as I can - Class `AmazonEC2Driver`, lines 319-323 - just use `.orElseGet(() -> die(...))` at the far end of the stream command. `orElseThrow` would be nicer here, but `die` has some other logging stuff which is the better thing here.
- Class `AmazonEC2Driver` lines 326-331 - dead code?
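A small self-contained sketch of the `Optional` suggestions in this list, with made-up values and a simplified `die` helper (the real `AmazonEC2Driver` code differs): `orElseGet` takes a plain supplier of the fallback value, whereas `or()` wraps the fallback in another `Optional`, and `orElseGet(() -> die(...))` keeps the logging error path when nothing is found.
```
import java.util.List;

public class OptionalFallbackSketch {
    // Simplified stand-in for the codebase's die(...) helper.
    static <T> T die(String msg) { throw new IllegalStateException(msg); }

    public static void main(String[] args) {
        List<String> subnets = List.of("subnet-a", "subnet-default");

        // Instead of .or(() -> Optional.ofNullable(fallback())).get(),
        // orElseGet takes the supplier directly and returns the value itself:
        String subnet = subnets.stream()
                .filter(s -> s.endsWith("-default"))
                .findFirst()
                .orElseGet(() -> "subnet-created-on-demand");

        // And when there is no sensible fallback, fail loudly via die(...):
        String required = subnets.stream()
                .filter(s -> s.startsWith("subnet-"))
                .findFirst()
                .orElseGet(() -> die("no subnet found"));

        System.out.println(subnet + " / " + required);
    }
}
```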
Reached `bubble-server/src/main/java/bubble/cloud/compute/vultr/VultrDriver.java` today. Continuing on Monday...
No new comments on commit 828cdcf736 and no comments on commit d97c7641bc
Continuing CR with next commit 1900999670
- Again, some backup-restore fixes were reverted - `tags: algo_related` was removed from the `Allow MITM private port` task in `bubble-server/src/main/resources/ansible/roles/mitmproxy/tasks/route.yml`, so we should definitely try this out with backup and restore before releasing it.
- Are you updating files like `bubble-server/src/main/resources/packer/roles/algo/files/config.cfg.hbs`, or is this part of the algo version update? Is there a need to upgrade the ubuntu version to 20.04 there for the lightsail and openstack `cloud_provider`s also?
- Good thing I just ran through those charon and dnscrypt-proxy setup files in the initial code review here, as they are removed now :)
CR on currently existing commits done.
Awesome feedback. I've made a bunch of changes and will have one last commit coming, then I'll merge.
jonathan changed title from WIP: (blocked by https://git.bubblev.org/bubblev/bubble/pulls/4 ) cobbzilla/introduce_packer to Introduce packer support (4 years ago)
@kris please review. This is a big one
Break before continuing review, starting from `bubble-server/src/main/java/bubble/cloud/storage/local/LocalStorageDriver.java`
These would be nice-to-haves, but it's working OK for me now -- so maybe in a future PR
this script is only intended to be run in development environments, and it installs openjdk-11-jdk, so we have the `jar` command.
Version 1.6.0 is available on https://www.packer.io/downloads/ - any reason not to use the newer version?
1.5.6 was the latest version when I started this. we can upgrade later.
This one has a new version too: 1.0.9
Maybe it would be good to have these version numbers in an env file so it would be easier to upgrade versions.
Yes, nice additions, for later :)
Remove. There's no UPDATED folder any more.
good catch, I'll remove that line.
Same as above with `ifPresentOrElse`.
Not using `equalsIgnoreCase` here?
can cleanup later
Not using `equalsIgnoreCase` here?
These are vendor keys, they will be case sensitive.
Here also, do we need `equalsIgnoreCase`?
Checking for `name` here while using `slug` in the next line. I'm guessing here we should check for `slug`.
good catch, will fix.
It would be good to return `notSupported("...")` in these 3 methods here also, for now.
Add check for `bandwidth` above.
yup, will fix.
Here also you may use the `ifPresentOrElse` method, and remove `snapshot`.
This method is also `@NonNull`.
And here then `loadCloudResources` is `@NonNull`, so there is no need for the check - just return.
An additional null check that is not needed (even with the current code)... No need for this one.
I had to leave. Continuing tomorrow from `bubble-server/src/main/java/bubble/service/cloud/NodeProgressMeter.java`
log messages' prefixes are wrong in this method.
thanks, will fix.
Also, this `activated` method is used only in this class, and so can be changed to private.
This method is not used outside of this class. Any reason for changing it to protected?
iirc I got some kind of spring error keeping it as `private` and had to change it. some kind of reflection problem i guess.
Autowiring a class of the same type (Service). Please use `configuration.getBean` instead.
Wrong log message prefix here. There is no method name as a prefix in the log messages elsewhere in this method AFAICS.
yup, removed the prefix. it's all packer.
Reached gitea's limit. The following comments will be added as plain comments within the PR.
Only 2 jobs are submitted into this pool. I guess we can set 2 here also.
Used only in one place, where you can get it from the already available `configuration`.
Never used.
good catch, will remove
All constants here except this one are used only within this class. It would be good to show that by setting them to private.
this was some complex stuff to get right and it's working. we can revisit later.
Collect this list above the log line so the mapping here will not be done twice.
Can you call `.collect()` on a `Stream` more than once? I wasn't sure.
`'"' + String.join("\", \"", imagesToCreate) + '"'` instead of creating a new method for this.
Not needed as per the last 2 comments here.
It's returning a Future where a Future is expected. The Future is a Future.
Probably could declare this void.