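Until now, our workflows ran as services inside Docker containers launched directly through the docker-java API. A new production workload is far more resource-hungry, though: left unchecked, a single container can saturate both CPU and memory, which caused servers to go down without any obvious reason. So it was time to put CPU and memory limits on our Docker containers.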
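Implementation

Building the HostConfig

Build a custom HostConfig and set the CPU and memory limits on it. If the pipeline specifies limits, use them; otherwise fall back to the default configuration.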
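The fallback rule and the unit conversions can be sketched on their own, without Docker. The class and method names below are hypothetical helpers, not part of docker-java, and the sample values of 6 GB and 2 cores are assumptions chosen to match the docker inspect output shown later in this post. One point worth knowing: Docker treats CpuShares as a relative weight that only matters when containers compete for CPU, whereas NanoCpus (cores multiplied by 10^9) expresses a hard cap. The `nanoCpus` helper is included for comparison only; recent docker-java versions appear to expose a matching `HostConfig.withNanoCPUs`, but check the version you use.

```java
/** Hypothetical helper (not part of docker-java): converts user-facing limits to Docker units. */
public final class ResourceLimits {
    private static final long BYTES_PER_GB = 1024L * 1024L * 1024L;
    private static final int SHARES_PER_CORE = 1024;
    private static final long NANOS_PER_CORE = 1_000_000_000L;

    private ResourceLimits() {}

    /** Memory limit in bytes: the pipeline value (in GB) if present, otherwise the default. */
    public static long memoryBytes(Double pipelineGb, double defaultGb) {
        double gb = pipelineGb != null ? pipelineGb : defaultGb;
        return (long) (gb * BYTES_PER_GB);
    }

    /** CPU cores: parsed from the pipeline setting if it is non-blank, otherwise the default. */
    public static double cpuCores(String pipelineCpu, double defaultCores) {
        boolean blank = pipelineCpu == null || pipelineCpu.trim().isEmpty();
        return blank ? defaultCores : Double.parseDouble(pipelineCpu.trim());
    }

    /** Relative CPU weight: 1024 shares per core. Not a hard cap. */
    public static int cpuShares(double cores) {
        return (int) (cores * SHARES_PER_CORE);
    }

    /** Hard CPU cap in units of 1e-9 CPUs, the unit Docker uses for NanoCpus. */
    public static long nanoCpus(double cores) {
        return (long) (cores * NANOS_PER_CORE);
    }

    public static void main(String[] args) {
        // Assumed defaults: 6 GB of memory and 2 cores, with no pipeline override.
        System.out.println(memoryBytes(null, 6.0));        // 6442450944
        System.out.println(cpuShares(cpuCores(" ", 2.0))); // 2048
        System.out.println(nanoCpus(cpuCores("1.5", 2.0))); // 1500000000
    }
}
```

Keeping the conversions in pure functions like this makes the defaults easy to unit-test before they reach the Docker daemon.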
The setUp method builds it:

    public void setUp() {
        this.dockerHostConfig = new HostConfig();

        // Memory limit in bytes: the pipeline's value (in GB) if set, otherwise the default
        Double memoryValue = this.pipeline.getMemory() != null
                ? this.pipeline.getMemory() * 1024 * 1024 * 1024
                : this.config.getDefaultMemoryLimitInGb() * 1024 * 1024 * 1024;
        this.dockerHostConfig.withMemory(memoryValue.longValue());

        // CPU cores: the pipeline's value if set, otherwise the default
        double cpu = StringUtils.isNotBlank(this.pipeline.getCpu())
                ? Double.parseDouble(this.pipeline.getCpu())
                : this.config.getDefaultCpuCoreLimit();
        // One CPU is 1024 shares, two CPUs are 2048, and so on
        this.dockerHostConfig.withCpuShares((int) (cpu * 1024));
    }

Building the CreateContainerCmd

    public String startContainer(String image,
                                 String name,
                                 List<ContainerPortBind> portBinds,
                                 List<ContainerVolumeBind> volumeBinds,
                                 List<String> extraHosts,
                                 List<String> envs,
                                 List<String> entrypoints,
                                 HostConfig hostConfig,
                                 String... cmds) {
        List<Volume> volumes = new ArrayList<>();
        List<Bind> volumesBinds = new ArrayList<>();
        ……
        ……
        ……
        CreateContainerCmd cmd = this.client.createContainerCmd(image)
                .withName(name)
                .withVolumes(volumes)
                .withBinds(volumesBinds);
        if (portBinds != null && portBinds.size() > 0) {
            cmd = cmd.withPortBindings(portBindings);
        }
        if (cmds != null && cmds.length > 0) {
            cmd = cmd.withCmd(cmds);
        }
        if (extraHosts != null && extraHosts.size() > 0) {
            cmd.withExtraHosts(extraHosts);
        }
        if (envs != null) {
            cmd.withEnv(envs);
        }
        if (entrypoints != null) {
            cmd.withEntrypoint(entrypoints);
        }
        // This is the key line
        cmd.withHostConfig(hostConfig);

        CreateContainerResponse container = cmd.exec();
        this.client.startContainerCmd(container.getId()).exec();
        return container.getId();
    }

docker inspect containerId

Running docker inspect a436678ccb0c gives the following:

    "HostConfig": {
        "Binds": [],
        "ContainerIDFile": "",
        "LogConfig": {
            "Type": "json-file",
            "Config": {
                "max-file": "3",
                "max-size": "10m"
            }
        },
        "NetworkMode": "default",
        "PortBindings": null,
        "RestartPolicy": {
            "Name": "",
            "MaximumRetryCount": 0
        },
        "CpuShares": 2048,
        "Memory": 6442450944,
        "NanoCpus": 0,
        "CgroupParent": "",
        "BlkioWeight": 0,
        "BlkioWeightDevice": null
    }

CpuShares and Memory now hold the default values we configured, so the API call took effect. Next, let's look at the execution log.
    proc "pipeline_task_4b86c7830e4c4e39a77c454589c9e7e9_1" starting
    2021-09-22 17:30:15 logPath:/mnt/xx/xx/logs/2021/09/22/bfbadf65-ac41-459d-a96d-3dc9a0105c25/job.log
    + java -jar /datavolume/xxx/xx.jar --spring.profiles.active=test
    STDERR: Error: Unable to access jarfile /datavolume/xxx/xx.jar
    5c494aeacb87af3a46a4fedc6e695ae888d4d2b9d7e603f24ef7fe114956c782 finished!
    proc "pipeline_task_4b86c7830e4c4e39a77c454589c9e7e9_1" exited with status 1
    proc "新增节点" error
    start to kill all pipeline task
    pipeline exit with error

The jar file could not be found. Looking back at the inspect output, Binds is empty, so the volume mounts were lost. But why? The logic around withVolumes() and withBinds() had not been touched. Time to read the source.
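Locating and fixing the problem

Before reading the source, a quick word on Docker's hostConfig. The file lives at: /var/lib/docker/containers/{container-id}/hostconfig.json

This file is the host-side configuration the container runs with: bind mounts, CPU and memory limits, DNS, networking, and various other settings all live here. The HostConfig object in docker-java is simply the model for this JSON. Since we now pass in a custom HostConfig, the problem most likely sits in this one line: cmd.withHostConfig(hostConfig);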
The previous binding logic

Previously there were no limits, so we never supplied a custom HostConfig when creating the CreateContainerCmd.
CreateContainerCmd withBinds
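    /**
     *
     * @deprecated see {@link #getHostConfig()}
     */
    @Deprecated
    default CreateContainerCmd withBinds(Bind... binds) {
        Objects.requireNonNull(binds, "binds was not specified");
        getHostConfig().setBinds(binds);
        return this;
    }

Tracing getHostConfig() into the implementation class CreateContainerCmdImpl shows that hostConfig is a fresh object created when the class is instantiated:

    @JsonProperty("HostConfig")
    private HostConfig hostConfig = new HostConfig();

Now look at CreateContainerCmd's withHostConfig() method, which is also implemented in that class:

    @Override
    public CreateContainerCmd withHostConfig(HostConfig hostConfig) {
        this.hostConfig = hostConfig;
        return this;
    }

It replaces the command's existing hostConfig outright. withBinds() had written the binds into the original hostConfig, and because we called withHostConfig() last, that object, binds included, was thrown away. No wonder the mounts disappeared. Since CreateContainerCmd's withBinds method is marked @Deprecated anyway, let's adjust the code to set everything on the HostConfig itself: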
    public String startContainer(String image,
                                 String name,
                                 List<ContainerPortBind> portBinds,
                                 List<ContainerVolumeBind> volumeBinds,
                                 List<String> extraHosts,
                                 List<String> envs,
                                 List<String> entrypoints,
                                 HostConfig hostConfig,
                                 String... cmds) {
        List<Volume> volumes = new ArrayList<>();
        List<Bind> volumesBinds = new ArrayList<>();
        ……
        // This line is critical: the binds now live on the HostConfig we pass in
        hostConfig.withBinds(volumesBinds);
        if (portBinds != null && portBinds.size() > 0) {
            hostConfig.withPortBindings(portBindings);
        }
        if (extraHosts != null && extraHosts.size() > 0) {
            hostConfig.withExtraHosts(extraHosts.toArray(new String[extraHosts.size()]));
        }

        CreateContainerCmd cmd = this.client.createContainerCmd(image)
                .withHostConfig(hostConfig)
                .withName(name)
                .withVolumes(volumes);
        if (cmds != null && cmds.length > 0) {
            cmd = cmd.withCmd(cmds);
        }
        if (envs != null) {
            cmd.withEnv(envs);
        }
        if (entrypoints != null) {
            cmd.withEntrypoint(entrypoints);
        }

        CreateContainerResponse container = cmd.exec();
        this.client.startContainerCmd(container.getId()).exec();
        return container.getId();
    }

OK, done. Checking the container with docker stats, its CPU usage never goes above 200%.
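The root cause can be reproduced without Docker at all. The sketch below uses two hypothetical stand-in classes, FakeHostConfig and FakeCreateCmd, that imitate only the behaviour described in this post: the command starts with its own fresh host config, withBinds writes into whichever host config the command currently holds, and withHostConfig swaps the whole object. They are not docker-java classes, and the bind string is a made-up example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class HostConfigOverwriteDemo {

    /** Hypothetical stand-in for docker-java's HostConfig; models binds only. */
    public static final class FakeHostConfig {
        private List<String> binds = new ArrayList<>();

        public FakeHostConfig withBinds(List<String> binds) {
            this.binds = new ArrayList<>(binds);
            return this;
        }

        public List<String> getBinds() {
            return binds;
        }
    }

    /** Hypothetical stand-in for CreateContainerCmdImpl. */
    public static final class FakeCreateCmd {
        // Created eagerly, like the field initializer quoted in the post
        private FakeHostConfig hostConfig = new FakeHostConfig();

        public FakeCreateCmd withBinds(List<String> binds) {
            hostConfig.withBinds(binds); // writes into the current host config
            return this;
        }

        public FakeCreateCmd withHostConfig(FakeHostConfig hostConfig) {
            this.hostConfig = hostConfig; // replaces the object, dropping earlier binds
            return this;
        }

        public FakeHostConfig getHostConfig() {
            return hostConfig;
        }
    }

    /** Original call order: binds on the command first, custom host config last. */
    public static List<String> buggyOrder(List<String> binds) {
        FakeHostConfig custom = new FakeHostConfig();
        return new FakeCreateCmd().withBinds(binds).withHostConfig(custom)
                .getHostConfig().getBinds();
    }

    /** Fixed call order: binds go on the custom host config before it is attached. */
    public static List<String> fixedOrder(List<String> binds) {
        FakeHostConfig custom = new FakeHostConfig().withBinds(binds);
        return new FakeCreateCmd().withHostConfig(custom)
                .getHostConfig().getBinds();
    }

    public static void main(String[] args) {
        List<String> binds = Collections.singletonList("/host/data:/datavolume");
        System.out.println(buggyOrder(binds)); // []
        System.out.println(fixedOrder(binds)); // [/host/data:/datavolume]
    }
}
```

The empty list from buggyOrder corresponds to the empty Binds array seen in docker inspect. The takeaway is to treat the HostConfig as the single owner of host-side settings and attach it to the command once.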
References

https://github.com/docker-java/docker-java