Emmmmm, an old system here is being refactored. Its current runtime is jdk1.7 + resin-4.0.47 + activemq + redis + mysql, and it's about to go onto the test environment. A pile of servers has been bought, and the base services needed include, but are not limited to, elasticsearch, logstash, kibana, filebeat, zookeeper, activemq, mongodb, redis and MySQL. tomcat/resin are no longer needed: the rewritten project uses Spring Boot, so it just runs as a jar. We never add monitoring to a test environment, so zabbix is set aside for now. What's needed is to stand up the runtime environment, so: straight to Docker Swarm.
Preparation
docker-ce is already installed everywhere, batch-installed with ansible, which I won't go into here. After sorting things out, the base services needed are: an elasticsearch cluster, elasticsearch-head, logstash, kibana, a kafka cluster, a zookeeper cluster, activemq, mongodb, a redis cluster and MySQL.
The test environment can be kept casual, convenience first, so everything goes into containers. Enable swarm and add the base-service servers to the cluster.
Initializing the swarm
[root@docker-manager ~]# docker swarm init --advertise-addr 172.24.90.38
Swarm initialized: current node (bh950ji26br60or076cmwvmu3) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-4uk84scrsf1e0zbwy8mdt9ruub02ojmaeqe1z2igdj3hxv5k76-7yqzw1pw9s8j10qx9utz9gwce 172.24.90.38:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Joining the cluster
[root@docker-manager ~]# docker swarm join-token worker
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-4uk84scrsf1e0zbwy8mdt9ruub02ojmaeqe1z2igdj3hxv5k76-7yqzw1pw9s8j10qx9utz9gwce 172.24.90.38:2377
[root@docker-manager ~]# ansible service -m shell -a "docker swarm join --token SWMTKN-1-4uk84scrsf1e0zbwy8mdt9ruub02ojmaeqe1z2igdj3hxv5k76-7yqzw1pw9s8j10qx9utz9gwce 172.24.90.38:2377"
Everything has joined now. One note in passing: this swarm has no HA, only one manager for the moment; for production I'd recommend at least three managers, i.e. one Leader and two Reachable.
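Before creating anything, it's worth a quick sanity check from the manager that every node actually joined; a minimal sketch:

```shell
# Every joined node should show up with STATUS Ready
docker node ls

# Quick count of Ready nodes; should equal the number of servers you joined
docker node ls | grep -c Ready
```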
Creating the MySQL service
Config file
I create all of these services from yml files, i.e. as stacks; it feels better to keep such services in config files. I can't drive K8S fluently yet, so swarm it is for now. Also note that my placement constraints pin services by hostname, which I don't really recommend: if the pinned server goes down the service throws a no suitable node error and waits forever for that server to come back. For single-instance services there isn't much choice, but for servers running many services labels are the better option; use your own judgment. MySQL is the simple one, as follows.
[root@docker-manager /swarm/mysql]# cat mysql.yml
version: '3.7'
services:
mysql:
image: registry.cn-beijing.aliyuncs.com/rj-bai/mysql:5.7
hostname: mysql
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == mysql]
ports:
- 3306:3306
environment:
MYSQL_ROOT_PASSWORD: passwd
volumes:
- /data/mysql:/var/lib/mysql
networks:
- recharge
networks:
recharge:
external: true
name: recharge
Creating the network
I referenced a network named recharge that doesn't exist yet, so create it first; after that the service can be started.
[root@docker-manager ~]# docker network create --driver overlay --subnet 13.14.15.0/24 --ip-range 13.14.15.0/24 --gateway 13.14.15.1 recharge
jnptx5xp3jhmcn8uw4owrzqv9
Creating the service
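Before deploying the stack, it doesn't hurt to confirm the overlay came out with the intended addressing; a small sketch, assuming the network name above:

```shell
# Print the subnet and gateway swarm recorded for the recharge overlay
docker network inspect recharge \
  --format 'subnet={{(index .IPAM.Config 0).Subnet}} gateway={{(index .IPAM.Config 0).Gateway}}'
```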
Remember to create the data directory first.
[root@docker-manager /swarm/mysql]# ansible mysql -m file -a "path=/data/mysql state=directory"
[root@docker-manager /swarm/mysql]# docker stack deploy -c mysql.yml --with-registry-auth mysql
[root@docker-manager /swarm/mysql]# docker stack ps mysql
To verify, just check whether the data directory has content; if it does, the server started normally.
[root@docker-manager /swarm/mysql]# ansible mysql -m shell -a "ls /data/mysql/"
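For a stronger check than a populated data directory, a connection test works too; a sketch, assuming a mysql client is available on the manager (the routing mesh publishes 3306 on every node, and the password is the one from the yml):

```shell
# Any swarm node publishes 3306 via the ingress mesh, so localhost works from the manager
mysql -h 127.0.0.1 -P 3306 -uroot -ppasswd -e 'SELECT VERSION();'
```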
No problems there. Moving on to activemq.
Creating the ActiveMQ service
Here is the Dockerfile first.
Dockerfile
FROM webcenter/activemq
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone && \
sed -i '/1G/d' /opt/activemq/bin/env
The sed deletes a variable named ACTIVEMQ_OPTS_MEMORY inside that script, which sets the broker's startup heap, 1G by default. It is now supplied as an environment variable instead; adjust it yourself.
Config file
[root@docker-manager /swarm/activemq]# cat activemq.yml
version: '3.7'
services:
activemq:
image: registry.cn-beijing.aliyuncs.com/rj-bai/activemq:5.14.3
hostname: activemq
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == activemq]
ports:
- 8161:8161
- 61616:61616
environment:
ACTIVEMQ_OPTS_MEMORY: -Xms2048M -Xmx2048M
networks:
- recharge
networks:
recharge:
external: true
name: recharge
Creating the service
[root@docker-manager /swarm/activemq]# docker stack deploy -c activemq.yml --with-registry-auth activemq
[root@docker-manager /swarm/activemq]# docker stack ps activemq
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
jcmhuzekzm75 activemq_activemq.1 registry.cn-beijing.aliyuncs.com/rj-bai/activemq:5.14.3 activemq Running Preparing 8 seconds ago
How do you confirm it came up? Just hit port 8161, the web console port.
[root@docker-manager /swarm/activemq]# curl -I -u admin:admin 127.0.0.1:8161
HTTP/1.1 200 OK
Date: Mon, 21 Jan 2019 08:35:41 GMT
X-FRAME-OPTIONS: SAMEORIGIN
Content-Type: text/html
Content-Length: 6047
Server: Jetty(9.2.13.v20150730)
No problems there. Next up: mongodb.
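(One more optional check before leaving activemq: the web console also ships a demo REST messaging API, handy for a quick produce/consume smoke test. A sketch, with TEST as a throwaway queue name:)

```shell
# Enqueue one message onto a throwaway queue via the console's REST API
curl -u admin:admin -d body=hello "http://127.0.0.1:8161/api/message/TEST?type=queue"

# Dequeue it again; this consumes the message
curl -u admin:admin "http://127.0.0.1:8161/api/message/TEST?type=queue"
```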
Creating the mongodb service
Here I used mongo & mongo-express; mongo-express is a web admin tool for mongodb, similar in spirit to phpMyAdmin, and optional. Dockerfiles below.
Dockerfile
mongo
FROM mongo
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
mongo-express
FROM mongo-express
ENV LANG en_US.utf8
RUN apk add -U tzdata
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
Config files
mongo:
[root@docker-manager /swarm/mongodb]# cat mongo.yml
version: '3.7'
services:
mongo:
image: registry.cn-beijing.aliyuncs.com/rj-bai/mongodb:4.0.5
hostname: mongodb
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == mongodb]
# environment:
# MONGO_INITDB_ROOT_USERNAME: root
# MONGO_INITDB_ROOT_PASSWORD: passwd
ports:
- 27017:27017
volumes:
- /data/mongodb:/data/db
networks:
- recharge
networks:
recharge:
external: true
name: recharge
The commented-out lines define the mongodb login username and password. I asked the developers and they said no password is fine, so I left it out. With no password, it goes without saying you should apply access restrictions some other way; that's on you. Next is mongo-express.
mongo-express:
[root@docker-manager /swarm/mongodb]# cat mongo-express.yml
version: '3.7'
services:
mongo-express:
image: registry.cn-beijing.aliyuncs.com/rj-bai/mongodb-express:0.12.0
hostname: mongodb-express
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == mongodb]
ports:
- 8081:8081
environment:
# ME_CONFIG_MONGODB_ADMINUSERNAME: root
# ME_CONFIG_MONGODB_ADMINPASSWORD: passwd
ME_CONFIG_BASICAUTH_USERNAME: admin
ME_CONFIG_BASICAUTH_PASSWORD: Sowhat?
networks:
- recharge
networks:
recharge:
external: true
name: recharge
As for why these two services are split: a stack currently can't define service start order. Normally mongodb should start first and mongo-express second; if mongo-express starts first, its first attempt fails, then recovers once mongo is up. Roughly that, so I separated them.
Creating the services
[root@docker-manager /swarm/mongodb]# ansible mongodb -m file -a "path=/data/mongodb state=directory"
[root@docker-manager /swarm/mongodb]# docker stack deploy -c mongo.yml --with-registry-auth mongo
[root@docker-manager /swarm/mongodb]# docker stack deploy -c mongo-express.yml --with-registry-auth mongo-express
Open the admin page and check that it can reach mongodb; the usual poking around.
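If you'd rather check from the CLI than the web UI, a quick ping works too; a sketch run on the mongodb host, assuming swarm's generated container name starts with mongo_mongo:

```shell
# Ping the server from inside its container; ok: 1 means it is answering
docker exec -it $(docker ps -q --filter name=mongo_mongo) \
  mongo --quiet --eval 'db.runCommand({ ping: 1 })'
```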
No problems. I'm not sure where this page pulls its time from, since both server and container clocks are fine, so ignore it. Next, the redis cluster.
Creating the Redis cluster
I built this image by hand from the latest stable release; the config file is baked into the image and persistence is enabled. Roughly like this.
Dockerfile
The Dockerfile first:
FROM registry.cn-beijing.aliyuncs.com/rj-bai/centos:7.5
RUN yum -y install wget make gcc && yum clean all && \
wget http://download.redis.io/releases/redis-5.0.3.tar.gz && tar zxf redis-5.0.3.tar.gz && rm -f redis-5.0.3.tar.gz && \
cd redis-5.0.3/ && make && make install
COPY start.sh /
COPY redis.conf /
CMD ["/bin/bash", "/start.sh"]
Contents of start.sh:
#!/bin/bash
# Point the data dir at the mounted volume if $DIR is set
# (the config directive is lowercase "dir", so match that)
if [ -n "$DIR" ];
then
sed -i "s#^dir .*#dir $DIR#" /redis.conf
fi
# Swap the listen port if $REDIS_PORT is set, then start the server
if [ -z "$REDIS_PORT" ];
then
redis-server /redis.conf
else
sed -i "s#6379#$REDIS_PORT#g" /redis.conf && redis-server /redis.conf
fi
Main contents of redis.conf:
port 6379
save 900 1
save 300 10
save 60 10000
dbfilename "dump.rdb"
dir ./
The script replaces two values, the port and the data storage directory. Roughly that.
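The two substitutions can be sandboxed outside the image to confirm they do what's intended; a minimal sketch against a throwaway copy of the config (note the config directive is lowercase dir, so the pattern matches that):

```shell
# Recreate the relevant lines of redis.conf in a temp file
conf=$(mktemp)
printf 'port 6379\ndir ./\n' > "$conf"

# The same kind of substitutions start.sh performs
DIR=/data/7000
REDIS_PORT=7000
sed -i "s#^dir .*#dir $DIR#" "$conf"
sed -i "s#6379#$REDIS_PORT#g" "$conf"

# Show the rewritten config, then clean up
result=$(cat "$conf")
echo "$result"
rm -f "$conf"
```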
Config file
[root@docker-manager /swarm/redis]# cat redis.yml
version: '3.7'
services:
redis1:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis1
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-1]
environment:
DIR: /data/7000
REDIS_PORT: 7000
volumes:
- /data/7000:/data/7000
networks:
- host
redis2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis2
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-1]
environment:
DIR: /data/7001
REDIS_PORT: 7001
volumes:
- /data/7001:/data/7001
networks:
- host
redis3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis3
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-2]
environment:
DIR: /data/7002
REDIS_PORT: 7002
volumes:
- /data/7002:/data/7002
networks:
- host
redis4:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis4
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-2]
environment:
DIR: /data/7003
REDIS_PORT: 7003
volumes:
- /data/7003:/data/7003
networks:
- host
redis5:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis5
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-3]
environment:
DIR: /data/7004
REDIS_PORT: 7004
volumes:
- /data/7004:/data/7004
networks:
- host
redis6:
image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
hostname: redis6
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == redis-3]
environment:
DIR: /data/7005
REDIS_PORT: 7005
volumes:
- /data/7005:/data/7005
networks:
- host
networks:
host:
external: true
name: host
The network used here is host. On to creating the services.
Creating the services
[root@docker-manager /swarm/redis]# docker stack deploy -c redis.yml --with-registry-auth redis
[root@docker-manager /swarm/redis]# docker stack ps redis
That's not the end of it; redis is only started, the cluster isn't formed yet. One more step:
[root@docker-manager ~]# docker run --rm -it inem0o/redis-trib create --replicas 1 172.24.89.242:7000 172.24.89.242:7001 172.24.89.241:7002 172.24.89.241:7003 172.24.89.237:7004 172.24.89.237:7005
It asks you to type yes once; when you see the success message it's done.
Log into a container to confirm.
[root@docker-manager /swarm/redis]# ssh redis-1
Last login: Mon Jan 21 18:39:54 2019 from 172.24.90.38
[root@redis-1 ~]# docker exec -it redis_redis1.1.lhe09fkmcs2j5mnvj3ow7uo14 /bin/bash
[root@redis1 /]# redis-cli -c -p 7000
127.0.0.1:7000> cluster nodes
28aa26332cd2799fe7b615865fa0259b9154299a 172.24.89.241:7003@17003 slave 34f738b7eff022be6eeb5c4cceafd52a935b1fc6 0 1548067394712 4 connected
3a0ad1bc61b8e765e7548cc87db6b3bfe9a7f60f 172.24.89.237:7004@17004 master - 0 1548067394211 5 connected 10923-16383
09db8b01f9f9d0b38152af82a8c38fdf85e1a9b3 172.24.89.242:7001@17001 slave 37b6dec48b97e816431d7f2cbe71489c3afdc508 0 1548067395000 3 connected
34f738b7eff022be6eeb5c4cceafd52a935b1fc6 172.24.89.242:7000@17000 myself,master - 0 1548067394000 1 connected 0-5460
a7c5314086b0a7d57d09861da8af31c514f8a167 172.24.89.237:7005@17005 slave 3a0ad1bc61b8e765e7548cc87db6b3bfe9a7f60f 0 1548067396217 6 connected
37b6dec48b97e816431d7f2cbe71489c3afdc508 172.24.89.241:7002@17002 master - 0 1548067395214 3 connected 5461-10922
127.0.0.1:7000> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:1113
cluster_stats_messages_pong_sent:1068
cluster_stats_messages_sent:2181
cluster_stats_messages_ping_received:1063
cluster_stats_messages_pong_received:1113
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:2181
No problems; the cluster is healthy.
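One more end-to-end check worth doing: write a key through one node and read it back through another. With -c, redis-cli follows the MOVED redirects across masters, so the key lands on the right shard regardless of where you connect. A sketch from inside any of the containers:

```shell
# -c enables cluster mode so redirects are followed automatically
redis-cli -c -p 7000 set smoke:test hello
redis-cli -c -p 7001 get smoke:test
redis-cli -c -p 7000 del smoke:test
```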
Creating the zookeeper cluster
Dockerfile
Rebuilt the image with a Dockerfile based on the official image and its docs:
FROM zookeeper:latest
ENV LANG en_US.utf8
RUN apk add -U tzdata
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
Config file
[root@docker-manager /swarm/zookeeper]# cat zookeeper.yml
version: '3.7'
services:
zoo1:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: zoo1
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == zookeeper-1]
ports:
- 2181:2181
environment:
ZOO_MY_ID: 1
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
networks:
- recharge
zoo2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: zoo2
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == zookeeper-2]
ports:
- 2182:2181
environment:
ZOO_MY_ID: 2
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
networks:
- recharge
zoo3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: zoo3
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [ node.hostname == zookeeper-3]
ports:
- 2183:2181
environment:
ZOO_MY_ID: 3
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
networks:
- recharge
networks:
recharge:
external: true
name: recharge
Creating the service
[root@docker-manager /swarm/zookeeper]# docker stack deploy -c zookeeper.yml --with-registry-auth zookeeper
[root@docker-manager /swarm/zookeeper]# docker stack ps zookeeper
Check whether it worked; normally one leader and two followers is the correct effect.
All good, no problems; this zookeeper mainly serves as the project's registry.
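The leader/follower split can be read off without entering any container, since all three client ports are published through the routing mesh. A sketch using zookeeper's stat four-letter command, assuming nc is installed on the manager and the four-letter commands aren't whitelisted away:

```shell
# Each node reports either Mode: leader or Mode: follower
for port in 2181 2182 2183; do
  echo "port $port -> $(echo stat | nc 127.0.0.1 $port | grep Mode)"
done
```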
Creating the kafka cluster
kafka depends on zookeeper, and the earlier zookeeper belongs to the project, so another set is started just for kafka. kafka's Dockerfile is below.
Dockerfile
FROM wurstmeister/kafka
ENV LANG en_US.utf8
RUN apk add -U tzdata
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
zookeeper is built the same way as before.
Config files
I also split the zookeeper and kafka config files, because kafka depends on zookeeper: zookeeper has to start first, then kafka. Config files below.
zookeeper
[root@docker-manager /swarm/kafka]# cat kafka-zookeeper.yml
version: '3.7'
services:
kaf-zoo1:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: kaf-zoo1
deploy:
replicas: 1
placement:
constraints: [node.hostname == kafka-1]
environment:
ZOO_MY_ID: 1
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=kaf-zoo1:2888:3888 server.2=kaf-zoo2:2888:3888 server.3=kaf-zoo3:2888:3888
networks:
- recharge
kaf-zoo2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: kaf-zoo2
deploy:
replicas: 1
placement:
constraints: [node.hostname == kafka-2]
environment:
ZOO_MY_ID: 2
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=kaf-zoo1:2888:3888 server.2=kaf-zoo2:2888:3888 server.3=kaf-zoo3:2888:3888
networks:
- recharge
kaf-zoo3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
hostname: kaf-zoo3
deploy:
replicas: 1
placement:
constraints: [node.hostname == kafka-3]
environment:
ZOO_MY_ID: 3
JVMFLAGS: -Xms1024m -Xmx1024m
ZOO_SERVERS: server.1=kaf-zoo1:2888:3888 server.2=kaf-zoo2:2888:3888 server.3=kaf-zoo3:2888:3888
networks:
- recharge
networks:
recharge:
external: true
name: recharge
kafka
[root@docker-manager /swarm/kafka]# cat kafka.yml
version: '3.7'
services:
kafka1:
image: registry.cn-beijing.aliyuncs.com/rj-bai/kafka:2.1.0
hostname: kafka1
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == kafka-1]
ports:
- 9092:9092
environment:
KAFKA_HEAP_OPTS: -Xmx1G -Xms1G
KAFKA_ADVERTISED_HOST_NAME: kafka1
KAFKA_ZOOKEEPER_CONNECT: "kaf-zoo1:2181,kaf-zoo2:2181,kaf-zoo3:2181"
KAFKA_BROKER_ID: 1
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
networks:
- recharge
kafka2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/kafka:2.1.0
hostname: kafka2
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == kafka-2]
ports:
- 9093:9092
environment:
KAFKA_HEAP_OPTS: -Xmx1G -Xms1G
KAFKA_ADVERTISED_HOST_NAME: kafka2
KAFKA_ZOOKEEPER_CONNECT: "kaf-zoo1:2181,kaf-zoo2:2181,kaf-zoo3:2181"
KAFKA_BROKER_ID: 2
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
networks:
- recharge
kafka3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/kafka:2.1.0
hostname: kafka3
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == kafka-3]
ports:
- 9094:9092
environment:
KAFKA_HEAP_OPTS: -Xmx1G -Xms1G
KAFKA_ADVERTISED_HOST_NAME: kafka3
KAFKA_ZOOKEEPER_CONNECT: "kaf-zoo1:2181,kaf-zoo2:2181,kaf-zoo3:2181"
KAFKA_BROKER_ID: 3
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
networks:
- recharge
networks:
recharge:
external: true
name: recharge
Creating the services
Create zookeeper first, then kafka.
[root@docker-manager /swarm/kafka]# docker stack deploy -c kafka-zookeeper.yml --with-registry-auth kafka-zookeeper
[root@docker-manager /swarm/kafka]# docker stack deploy -c kafka.yml --with-registry-auth kafka
To check whether they started correctly, just look at the kafka logs.
[root@docker-manager ~]# docker service logs <service-name>
Take a look yourself; no errors means no problems.
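Beyond the logs, a topic round-trip is a more direct smoke test; a sketch run on the kafka-1 host (the wurstmeister image puts the kafka CLI tools on PATH, and smoke-test is a throwaway topic name):

```shell
# Create a replicated test topic, then confirm its partition/replica layout
docker exec -it $(docker ps -q --filter name=kafka_kafka1) bash -c '
  kafka-topics.sh --create --zookeeper kaf-zoo1:2181 \
    --replication-factor 3 --partitions 3 --topic smoke-test
  kafka-topics.sh --describe --zookeeper kaf-zoo1:2181 --topic smoke-test'
```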
Creating the elasticsearch cluster
A quick overview first. Here we run es x3, logstash x1, kibana x1, es-head x1, plus the kafka cluster. kafka is needed because project logs are not shipped into es by filebeat or logstash; instead log4j pushes them into kafka, logstash's input is kafka and its output is the es cluster. In plain words: the project pushes logs into kafka, kafka feeds logstash, and logstash pushes to es. I've set this up before, but as a traditional binary install; too tedious, so it never got documented. Roughly that. Step by step then, starting with the es cluster; the Dockerfile first.
Dockerfile
First, the plugins enabled by default; if you need others, install or remove them via your own Dockerfile. Version 6.5.4 is used, which bundles the X-Pack plugin.
[root@695653b6515c elasticsearch]# elasticsearch-plugin list
ingest-geoip
ingest-user-agent
These two plugins are the only ones I'll use, so they're enough. The Dockerfile:
FROM docker.elastic.co/elasticsearch/elasticsearch:6.5.4
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
Config file
Three es nodes form the cluster, so the config file is as follows. Try not to change the service names; kibana references them.
[root@docker-manager /swarm/elasticsearch]# cat elasticsearch.yml
version: '3.7'
services:
elasticsearch:
image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch:6.5.4
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == elasticsearch-1]
environment:
- cluster.name=es
- node.name=es-1
- http.cors.enabled=true
- http.cors.allow-origin=*
- discovery.zen.minimum_master_nodes=2
- "discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3"
- "ES_JAVA_OPTS=-Xms2G -Xmx2G"
volumes:
- /elasticsearch:/usr/share/elasticsearch/data
ports:
- 9200:9200
networks:
- recharge
elasticsearch2:
image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch:6.5.4
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == elasticsearch-2]
environment:
- cluster.name=es
- node.name=es-2
- http.cors.enabled=true
- http.cors.allow-origin=*
- discovery.zen.minimum_master_nodes=2
- "discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3"
- "ES_JAVA_OPTS=-Xms2G -Xmx2G"
volumes:
- /elasticsearch:/usr/share/elasticsearch/data
ports:
- 9201:9200
networks:
- recharge
elasticsearch3:
image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch:6.5.4
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == elasticsearch-3]
environment:
- cluster.name=es
- node.name=es-3
- http.cors.enabled=true
- http.cors.allow-origin=*
- discovery.zen.minimum_master_nodes=2
- "discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3"
- "ES_JAVA_OPTS=-Xms2G -Xmx2G"
volumes:
- /elasticsearch:/usr/share/elasticsearch/data
ports:
- 9202:9200
networks:
- recharge
networks:
recharge:
external: true
name: recharge
That's it; next, create the data directory and adjust a few system parameters.
Creating the service
[root@docker-manager ~/sh]# cat es.sh
#!/bin/bash
cat >>/etc/security/limits.conf<<OEF
* soft nofile 65536
* hard nofile 65536
* soft nproc 2048
* hard nproc 4096
OEF
cat >>/etc/sysctl.conf<<OEF
vm.max_map_count=655360
fs.file-max=655360
OEF
/usr/sbin/sysctl -p
[root@docker-manager ~/sh]# ansible elasticsearch -m script -a "/root/sh/es.sh"
[root@docker-manager ~/sh]# ansible elasticsearch -m file -a "path=/elasticsearch state=directory owner=1000 group=1000 mode=755"
[root@docker-manager /swarm/elasticsearch]# docker stack deploy -c elasticsearch.yml --with-registry-auth elasticsearch
[root@docker-manager /swarm/elasticsearch]# docker stack ps elasticsearch
Check the cluster status:
[root@docker-manager ~]# curl http://127.0.0.1:9200/_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1548133528 05:05:28 es green 3 3 0 0 0 0 0 0 - 100.0%
No problems. Now bring up logstash.
Creating the logstash service
The Dockerfile first, as follows.
Dockerfile
FROM docker.elastic.co/logstash/logstash:6.5.4
USER root
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
USER logstash
RUN logstash-plugin install logstash-input-kafka && logstash-plugin install logstash-output-elasticsearch
COPY kafka.conf /usr/share/logstash/config/
COPY start.sh /
CMD ["/bin/bash","/start.sh"]
Install whatever plugins you need. I pushed the config file straight into the image, though a custom one can also be used. Script contents:
#!/bin/bash
if [ -n "$CONFIG" ];
then
logstash -f "$CONFIG"
else
logstash -f ./config/kafka.conf
fikafka.conf内容
这个文编的编写最好咨询开发人员,用的topics都有什么,然后去创建对应的topics,现在还没定义,我写的default
input{
kafka{
bootstrap_servers => ["kafka1:9092,kafka2:9092,kafka3:9092"]
consumer_threads => 5
topics => ["default"]
decorate_events => true
type => "default"
}
}
filter {
grok {
match => ["message", "%{TIMESTAMP_ISO8601:logdate}"]
}
date {
match => ["logdate", "yyyy-MM-dd HH:mm:ss,SSS"]
target => "@timestamp"
}
mutate {
remove_field => ["logdate"]
}
}
output {
elasticsearch {
hosts => ["elasticsearch:9200","elasticsearch2:9200","elasticsearch3:9200"]
index => "logstash-%{type}-%{+YYYY.MM.dd}"
}
stdout {
codec => rubydebug {}
}
}
Starting logstash
Config file
[root@docker-manager /swarm/logstash]# cat logstash.yml
version: '3.7'
services:
logstash:
image: registry.cn-beijing.aliyuncs.com/rj-bai/logstash:6.5.4
hostname: logstash
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == logstash]
environment:
- "LS_JAVA_OPTS=-Xms1G -Xmx1G"
networks:
- recharge
networks:
recharge:
external: true
name: recharge
To mount a custom config file, mount it as the root user.
Creating the service
[root@docker-manager /swarm/logstash]# ansible logstash -m script -a "/root/sh/es.sh"
[root@docker-manager /swarm/logstash]# docker stack deploy -c logstash.yml --with-registry-auth logstash
[root@docker-manager /swarm/logstash]# docker stack ps logstash
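To exercise the whole kafka → logstash → es path before the project is wired in, a hand-written line can be pushed into the default topic; a sketch from the kafka-1 host:

```shell
# Feed one log-shaped line into the topic logstash consumes
docker exec -i $(docker ps -q --filter name=kafka_kafka1) bash -c \
  'echo "2019-01-22 12:00:00,000 INFO pipeline smoke test" |
   kafka-console-producer.sh --broker-list kafka1:9092 --topic default'

# Then watch logstash pick it up
docker service logs -f logstash_logstash
```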
No problems. Next up, kibana.
Creating the kibana service
Dockerfile as follows.
Dockerfile
FROM docker.elastic.co/kibana/kibana:6.5.4
USER root
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
Config file
version: '3.7'
services:
kibana:
image: registry.cn-beijing.aliyuncs.com/rj-bai/kibana:6.5.4
hostname: kibana
deploy:
replicas: 1
endpoint_mode: vip
placement:
constraints: [node.hostname == logstash]
ports:
- 5601:5601
networks:
- recharge
networks:
recharge:
external: true
name: recharge
Creating the service
[root@docker-manager /swarm/kibana]# docker stack deploy -c kibana.yml --with-registry-auth kibana
Creating es-head
Dockerfile as follows.
FROM mobz/elasticsearch-head:5
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone
Roughly that; it can be started now.
Config file
version: '3.2'
services:
es-head:
image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch-head:5
deploy:
placement:
constraints:
- node.hostname == logstash
ports:
- 9100:9100
networks:
- recharge
networks:
recharge:
external:
name: recharge
Ready to start.
Creating the service
[root@docker-manager /swarm/logstash]# docker stack deploy -c es-head.yml --with-registry-auth es-head
That wraps it up; a final look at all the services.
All services
That's all of them. Now a glance at kibana to see whether the es cluster is running properly; the home page looks like this.
Open Monitoring, enable it, and explore on your own.
The es cluster has no data yet and some services are still untested, so after handing over the connection details I asked the developers for a project build. Looking it over, it touches everything except the mq & mysql connections. The database needs no saying, done N times, absolutely fine; mq was tested from a locally-run project without issues, so this build will do for testing.
Testing phase
Now for an annoying problem: when creating redis I used the host network rather than the recharge network. Our projects reach base services through DNS resolution, i.e. hosts files, and the connection strings in project configs are names like redis1 and kafka1, so the redis entries have to be written into hosts by hand. Why the host network? Because when forming the cluster you can't specify nodes by hostname; support there isn't friendly. If redis ran on the overlay I'd start six containers and have to look up each container's IP to form the cluster, which is genuinely tedious. There's also the occasional need to delete specific cache keys by hand, which means logging into a container. For production I'm considering not running redis in containers at all; for testing this will do.
Now just start a service by hand on the manager node, no config file needed. I pulled a random openjdk1.8 image, used a Dockerfile to copy the build in, and the project's connection settings are these:
spring.dubbo.registry.address=zookeeper://zoo1:2181;zookeeper://zoo2:2181;zookeeper://zoo3:2181
redis.ipPorts=redis1:7000,redis1:7001,redis2:7002,redis2:7003,redis3:7004,redis3:7005
spring.data.mongodb.host=mongo
elk.kafka.urls=kafka1:9092,kafka2:9092,kafka3:9092
I'll skip the image build; making an image that runs a jar should give nobody trouble. It's already built here, so start it directly.
[root@docker-manager ~]# docker service create --name spring-boot --network recharge --constraint node.role==manager --replicas 1 spring-boot
[root@docker-manager ~]# docker service logs -f spring-boot
It started normally with no errors at all, so logstash should now be receiving project log output. Take a look.
[root@docker-manager ~]# ssh logstash
[root@docker-manager ~]# docker service logs -f logstash_logstash
Grabbed one snippet of it; now a glance at kibana.
There it is; after creating the index pattern:
There's data. Of course more than this one index is in use. While at it, let's add nginx log charting. nginx isn't containerized here, so shipping its logs needs filebeat; set that up now.
Nginx log analysis
In kibana, click the kibana icon → Logging → Add log data → Nginx logs and you'll see detailed steps. All my systems are centos7+, so I chose RPM. You can see two plugins are required, both already installed, so just log into the nginx server and run a few commands. For convenience I added es and kibana hosts entries on every server, so the config is as follows.
[root@nginx ~]# rpm -Uvh https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.5.4-x86_64.rpm
[root@nginx ~]# vim /etc/filebeat/filebeat.yml
setup.kibana:
host: "kibana:5601"
output.elasticsearch:
# Array of hosts to connect to.
hosts: ["elasticsearch:9200","elasticsearch2:9201","elasticsearch3:9202"]
I don't ship every log; this nginx only has two files that matter, so it's written like this.
[root@nginx ~]# filebeat modules enable nginx
Enabled nginx
[root@nginx ~]# vim /etc/filebeat/modules.d/nginx.yml
- module: nginx
access:
enabled: true
var.paths: ["/usr/local/nginx/logs/yourlogfile.log","/usr/local/nginx/logs/yourlogfile.log"]
error:
enabled: true
var.paths: ["/usr/local/nginx/logs/error.log"]
Finally, start filebeat.
[root@nginx ~]# filebeat setup
Loaded index template
Loading dashboards (Kibana must be running and reachable)
Loaded dashboards
Loaded machine learning job configurations
[root@nginx ~]# systemctl start filebeat.service
Afterwards a new index named filebeat appears and the charts follow. On the add-data page you can see many supported log types, things like redis, mysql and system logs. Explore on your own; I'm not adding more for now, that's for production.
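If the dashboards stay empty, filebeat's built-in self-checks are worth running; both subcommands exist in the 6.x series:

```shell
# Validate the config file syntax
filebeat test config

# Verify connectivity to the configured elasticsearch outputs
filebeat test output
```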
Final testing
Asking the developers: this system comprises 13 projects, the production database will be Alibaba Cloud RDS, and the test environment has ten application servers for now, so the cluster currently consists of these servers, 27 in total.
Emmmm, honestly this is my first time doing it this way and I don't know whether the base services can take it, so I want to test by starting services, the more the better. Hence the file below.
version: '3.7'
services:
spring-boot:
image: registry.cn-beijing.aliyuncs.com/rj-bai/spring-boot:latest
deploy:
replicas: 50
endpoint_mode: vip
restart_policy:
condition: on-failure
delay: 10s
max_attempts: 3
window: 10s
update_config:
parallelism: 10
delay: 60s
failure_action: rollback
rollback_config:
parallelism: 10
delay: 60s
failure_action: pause
ports:
- 1116:1116
networks:
- recharge
networks:
recharge:
external: true
name: recharge
For writing compose files I suggest the official docs; every parameter used above is documented there. This starts 50 spring-boot replicas outright, with no placement constraints, so any server may run them. Testing this way doesn't prove much, but it gives me a baseline: can the servers above survive 50 services starting at once? So, start it. As before, this build doesn't touch mysql & mq but hits everything else. Here goes.
[root@docker-manager /swarm/spring-boot]# docker stack deploy -c spring-boot.yml --with-registry-auth spring-boot
[root@docker-manager /swarm/spring-boot]# docker stack ps spring-boot
Roughly this effect.
Then watch logstash: logs pour out furiously; once it quiets down it should be done. A look at how much was written.
Roughly 3943 log entries by my count. Everything has started now; check whether any service died.
[root@docker-manager ~]# for i in `docker stack ls | grep -vi "NAME" | awk {'print $1'}`;
> do
> docker stack ps $i --filter desired-state=shutdown
> done
nothing found in stack: activemq
nothing found in stack: elasticsearch
nothing found in stack: es-head
nothing found in stack: kafka
nothing found in stack: kafka-zookeeper
nothing found in stack: kibana
nothing found in stack: logstash
nothing found in stack: mongo-express
nothing found in stack: mongodb
nothing found in stack: mysql
nothing found in stack: redis
nothing found in stack: spring-boot
nothing found in stack: zookeeper
None; all normal. One last glance at everything.
[root@docker-manager ~]# docker stack ls
[root@docker-manager ~]# docker service ls
No real problems. Delete this spring-boot service now, it's no longer useful. Also, don't deploy zabbix with swarm; just bring it up with compose. I've written that up before, so it isn't pasted here. A few words on jenkins below.
Jenkins
Finally, a quick word on the jenkins side. jenkins isn't containerized yet either, because that server has a lot to do. Previously it boiled down to jenkins building the project package and calling a playbook to roll out the update, so installing ansible was enough. The situation is different now.
Updates now have to ship as images. I looked at the docker plugins and none seemed to do what I want, so the interim approach: jenkins builds the package; on success a Dockerfile copies it into a prepared jdk1.8 image; the image is pushed to the Alibaba Cloud registry; then jenkins remotely invokes a script on the manager to perform the update. It's confirmed that every project maps out a distinct port and mounts a directory at /www/logs, the project log directory. So after a successful build jenkins runs the following; the variables are jenkins built-ins and there's no real flow control, so bear with it. The broad idea is this.
#!/bin/bash
### Log in to the private registry
docker login --username=user --password password registry.cn-beijing.aliyuncs.com
### Create a directory for the job name and build number
mkdir -p /data/docker/$JOB_NAME/$BUILD_NUMBER
### Copy the project package into the new directory
cp $WORKSPACE/xxx.jar /data/docker/$JOB_NAME/$BUILD_NUMBER
### Write the Dockerfile and build the image
cd /data/docker/$JOB_NAME/$BUILD_NUMBER
jar=`ls *.jar`
cat >>Dockerfile<<OEF
FROM registry.cn-beijing.aliyuncs.com/xxx/oracle-jdk:1.8
ADD $jar /
OEF
docker build -t registry.cn-beijing.aliyuncs.com/xxx/xxx:$BUILD_NUMBER .
### Push the image to the private registry
docker push registry.cn-beijing.aliyuncs.com/xxx/xxx:$BUILD_NUMBER
sleep 5
### Trigger the update on the manager
ssh <target-ip> "/bin/bash /scripts/deploy-service.sh" "<image>" "<service-name>" "<port>"
The manager-side script is below; this one script serves every test project, as long as the arguments aren't passed wrong.
#!/bin/bash
### Load PATH
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
### Check the argument count; no more detailed validation is done
Image=$1
Server_Name=$2
Port=$3
if [ $# != 3 ] ; then
echo "USAGE: $0 pass the image, service name and port, in that order"
exit 1;
fi
### Log in to the registry and pull the image
docker login --username=user --password password registry.cn-beijing.aliyuncs.com > /dev/null
docker pull "$Image" > /dev/null
### Check whether the pull succeeded
if [ "$?" -ne 0 ]
then
echo "Pull "$Image" Failed"
exit 1
fi
### If the service already exists, update it; otherwise create it
### (exact-name match so e.g. "foo" doesn't match "foo-bar")
docker service ls --format '{{.Name}}' | grep -x "$Server_Name" > /dev/null 2>&1
if [ $? -eq 0 ]
then
docker service update --with-registry-auth --image "$Image" "$Server_Name" > /dev/null && echo "Update "$Server_Name" Success" || echo "Update "$Server_Name" failed"
else
docker service create --name "$Server_Name" \
--replicas 1 \
--constraint node.role==worker \
--with-registry-auth \
--publish "$Port:$Port" \
--update-delay 60s \
--update-parallelism 1 \
--update-failure-action rollback \
--rollback-delay 60s \
--rollback-parallelism 1 \
--rollback-failure-action pause \
--env "environment=test" \
--env "Xms=-Xms512M" \
--env "Xmx=-Xmx512M" \
--network recharge \
--mount type=bind,src=/www/logs,dst=/www/logs \
$Image > /dev/null && echo "Deploy "$Server_Name" Success" || echo "Deploy "$Server_Name" failed"
fi
From my side the testing shows no real issues, and the test environment is already using this. It somehow feels backwards, but there's no better option yet. Production will need more parameters: health checks and the like weren't added here, each project ran only a single container in testing while production will certainly need more, and the JAVA startup heap has to be tuned per project; lots of adjusting, in short. So, first off home for Chinese New Year; next year's problem.