[00:46:30] !log T238305 cp3053.mgmt /admin1-> racadm serveraction hardreset
[00:46:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:46:34] T238305: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305
[00:46:49] Operations, Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (CDanis) `23:22:06 <+icinga-wm> PROBLEM - Host cp3053 is DOWN: PING CRITICAL - Packet loss = 100%` nothing in logs as usual
[00:49:27] RECOVERY - Host cp3053 is UP: PING OK - Packet loss = 0%, RTA = 83.41 ms
[02:54:31] RECOVERY - Debian mirror in sync with upstream on sodium is OK: /srv/mirrors/debian is over 0 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
[03:55:30] (PS1) Ammarpad: Document why we have duplicate false value [mediawiki-config] - https://gerrit.wikimedia.org/r/565751 (https://phabricator.wikimedia.org/T183549)
[04:07:47] (PS2) Ammarpad: Document why we have duplicate false value [mediawiki-config] - https://gerrit.wikimedia.org/r/565751 (https://phabricator.wikimedia.org/T183549)
[05:57:13] Operations, Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (Marostegui) Is there any action plan to investigate these issues?
[06:06:33] Puppet, VPS-project-codesearch, Patch-For-Review: Puppetize codesearch - https://phabricator.wikimedia.org/T242319 (Legoktm) docker instances aren't starting because of: ` Jan 19 06:05:30 codesearch6 dockerd[8955]: time="2020-01-19T06:05:30.388754452Z" level=error msg="Handler for POST /v1.40/contai...
[06:50:24] Puppet, VPS-project-codesearch, Patch-For-Review: Puppetize codesearch - https://phabricator.wikimedia.org/T242319 (Legoktm) >>! In T242319#5815001, @Legoktm wrote: > docker instances aren't starting because of: > > ` > Jan 19 06:05:30 codesearch6 dockerd[8955]: time="2020-01-19T06:05:30.388754452Z"...
[07:07:41] Puppet, VPS-project-codesearch, Patch-For-Review: Puppetize codesearch - https://phabricator.wikimedia.org/T242319 (Legoktm) After reading https://github.com/moby/moby/issues/26824#issuecomment-412309421 which said that running iptables legacy and nftables at the same time was a bad idea, I performed...
[07:23:41] PROBLEM - High average GET latency for mw requests on api_appserver in codfw on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-m
[07:25:20] (PS1) Legoktm: codesearch: Use iptables-legacy for docker compatibility [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319)
[07:27:14] (CR) Legoktm: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/20444/" [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[07:29:27] Puppet, VPS-project-codesearch, Patch-For-Review: Puppetize codesearch - https://phabricator.wikimedia.org/T242319 (Legoktm) OK! https://codesearch6.wmflabs.org/search/ is ready for testing as a fully puppetized codesearch buster instance.
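(Context for the codesearch/Docker thread above: on Debian buster the iptables command defaults to the nftables backend, and Docker's rules can end up split across two rule sets. The Gerrit change above puppetizes a switch to the legacy backend; the manual equivalent is roughly the sketch below. The exact steps Legoktm ran are truncated in the Phabricator quote, so treat this as an illustration rather than a record of what was done.)

```
# Minimal sketch: point the iptables alternatives at the legacy backend on a
# Debian buster host, then restart Docker so it rebuilds its rules there.
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo systemctl restart docker
```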
[07:50:25] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[07:57:49] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[08:03:17] RECOVERY - High average GET latency for mw requests on appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[08:16:07] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[08:17:57] RECOVERY - High average GET latency for mw requests on appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[08:45:31] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[08:54:45] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[08:58:25] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[09:05:45] RECOVERY - High average GET latency for mw requests on appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[09:51:47] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[10:04:35] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[10:12:53] PROBLEM - High average GET latency for mw requests on api_appserver in codfw on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-m
[10:13:45] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[10:16:53] Operations, Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (Vgutierrez) >>! In T238305#5814996, @Marostegui wrote: > Is there any action plan to investigate these issues? Currently T242579 is our only hope of getting more information about this issue
[10:21:05] RECOVERY - High average GET latency for mw requests on appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[10:26:37] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[10:26:53] * effie checking
[10:48:47] RECOVERY - High average GET latency for mw requests on appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[10:56:11] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[11:01:43] RECOVERY - High average GET latency for mw requests on appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[11:09:03] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[11:10:06] effie: o/ - is it change-prop related? Just realized it is about codfw
[11:10:34] it does not look like it, I am still digging here and there
[11:10:49] ack, lemme know if you need help
[11:10:55] I am around for ~30/40 mins :)
[11:11:03] traffic has not increased, but latency to monitoring urls increased sometime after 7 UTC
[11:11:14] tx :)
[11:11:35] I will give it a little more time and then open a task
[11:12:45] RECOVERY - High average GET latency for mw requests on appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[11:16:58] I tried to render a per host latency graph in codfw, the most heavily affected are the mwdebugs afaics
[11:17:01] weird
[11:17:15] maybe because they are vms
[11:18:59] also I am confused, I don't see any requests in the apache logs
[11:19:13] I also saw some tcp attempt fails starting
[11:19:26] so I was wondering if some network device is misbehaving
[11:20:07] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[11:20:11] elukey: https://w.wiki/FkK
[11:20:30] !log restart-php-fpm on mw2181 to rule out temporary php-related issues in codfw
[11:20:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:57] that is a good one, maybe they are coming from a single row
[11:22:28] yep rack A3 from netbox
[11:23:10] ah no also one in A4
[11:23:44] and C4
[11:24:45] very interesting from mw2181's access logs
[11:25:15] time to render the Special:Blank page is 3136583 for the last one that I checked
[11:26:10] that is in microseconds, so could it be that the latency increase is simply the health check taking ages?
[11:26:57] it appears so yes
[11:27:07] I didn't see anything funny on some random logs I checked
[11:27:13] so something between lvs and mw hosts
[11:27:27] RECOVERY - High average GET latency for mw requests on appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[11:27:46] I am curious
[11:32:57] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[11:39:53] effie: need to go, seems something not super critical, will re-check later on!
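(On the 3136583 figure quoted at 11:25 above: Apache records time-to-serve in microseconds when the %D format specifier is in the log format, so that Special:BlankPage health check took roughly 3.1 seconds instead of the usual few milliseconds, which is enough on its own to trip the average-latency alert. A quick sanity check; the access-log name and field position below are assumptions, not the actual appserver log layout:)

```
# 3,136,583 microseconds is about 3.14 seconds
echo 'scale=2; 3136583 / 1000000' | bc

# Hypothetical one-liner: flag BlankPage requests slower than 1s, assuming the
# time-to-serve field (%D, microseconds) is the last whitespace-separated field.
awk '/Special:BlankPage/ && $NF > 1000000 { printf "%.2fs %s\n", $NF/1000000, $0 }' access.log
```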
[11:40:11] yeah I will leave it here too
[11:40:14] tx
[11:40:17] PROBLEM - High average GET latency for mw requests on appserver in codfw on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[11:50:01] Operations, ops-codfw, DBA: db2085 crashed - https://phabricator.wikimedia.org/T243148 (Marostegui)
[11:51:00] Operations, ops-codfw, DBA: db2085 crashed - https://phabricator.wikimedia.org/T243148 (Marostegui) ` racadm>>serveraction powerstatus racadm serveraction powerstatus Server power status: OFF racadm>> `
[11:53:11] Operations, serviceops: Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020) - https://phabricator.wikimedia.org/T243149 (jijiki)
[11:53:51] Operations, ops-codfw, DBA: db2085 crashed - https://phabricator.wikimedia.org/T243148 (Marostegui) Nothing relevant on `centrallog1001` from db2085.
[11:56:51] RECOVERY - High average GET latency for mw requests on appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=appserver&var-method=GET
[11:57:39] Operations, ops-codfw, DBA: db2085 crashed - https://phabricator.wikimedia.org/T243148 (Marostegui) I have powered it back on - no errors on boot. MySQL hasn't been started.
[11:59:39] RECOVERY - High average GET latency for mw requests on api_appserver in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[12:02:37] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2085:3311, db2085:3318 T243148', diff saved to https://phabricator.wikimedia.org/P10210 and previous config saved to /var/cache/conftool/dbconfig/20200119-120236-marostegui.json
[12:02:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:41] T243148: db2085 crashed - https://phabricator.wikimedia.org/T243148
[15:00:03] Operations, Wikibugs: wikibugs needs restart almost everyday - https://phabricator.wikimedia.org/T241109 (valhallasw) Open→Resolved a: valhallasw
[15:05:33] Operations, Wikibugs: wikibugs needs restart almost everyday - https://phabricator.wikimedia.org/T241109 (valhallasw) From what I can see the bot crashes and restarts a few times per day, which -- although not great -- I think is acceptable. About 2/3rd of those are errors retrieving anchors, the rest ar...
[15:11:46] marostegui: o/
[15:12:11] elukey: o/
[15:16:05] marostegui: I was checking the codfw mw appserver latency issue that Effie was working on, and it seems that it went away as soon as you depooled db2085
[15:17:16] the high latency was seen for Special:Blank page
[15:17:18] of enwiki
[15:17:46] so I suppose that the high latency was related to db2085 being overwhelmed
[15:17:49] does it make sense?
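(For the db2085 crash handled above at 11:51–11:57: the Phabricator quotes show the power state being checked and the host powered back on through its iDRAC. A rough sketch of the usual racadm commands; the management hostname below is illustrative:)

```
# From a management/bastion host, query and restore power via the iDRAC.
ssh root@db2085.mgmt.codfw.wmnet    # illustrative mgmt hostname
racadm serveraction powerstatus     # reports "Server power status: OFF"
racadm serveraction powerup         # power the box back on
# For a wedged-but-powered host, "racadm serveraction hardreset" forces a
# reset, as was done for cp3053 at the top of this log.
```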
[15:18:01] check https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&var-datasource=codfw%20prometheus%2Fops&var-cluster=appserver&var-method=GET&var-code=200&fullscreen&panelId=9
[15:20:11] Operations, serviceops: Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020) - https://phabricator.wikimedia.org/T243149 (elukey) This seems to be related to T243148, db2085 was overwhelmed and this explains the high latency (Special:Blank page health checks were taking ages to...
[15:20:20] ok added more info to --^
[15:22:17] all right going afk again, if anybody has a different idea please add it in the task, otherwise I think we are good to close it :)
[15:25:16] (PS1) Ayounsi: Add RPKI whitelist support [homer/public] - https://gerrit.wikimedia.org/r/565771
[15:28:18] (CR) Volans: "one question inline" (1 comment) [homer/public] - https://gerrit.wikimedia.org/r/565771 (owner: Ayounsi)
[15:37:33] elukey: not really, that host doesn't receive reads or anything
[15:37:36] so it is weird
[15:47:20] marostegui: ah snap, then my theory is wrong
[15:47:36] timing is really weird though
[15:51:17] elukey: It is strange, that host isn't even supposed to be checked
[15:51:24] Definitely isn't getting any reads
[17:40:20] Operations, serviceops: Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020) - https://phabricator.wikimedia.org/T243149 (Marostegui) db2085 is a s1 and s8 codfw slave (multi instance). We don't have read traffic on codfw databases, how could it cause those latency issues?
[18:13:13] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 74445528 and 9 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:13:13] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 40505016 and 4 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:15:03] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1620888 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:16:53] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 3992 and 32 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:17:55] PROBLEM - Host cp3061 is DOWN: PING CRITICAL - Packet loss = 100%
[18:26:59] PROBLEM - Host mr1-eqsin.oob is DOWN: PING CRITICAL - Packet loss = 100%
[18:32:55] RECOVERY - Host mr1-eqsin.oob is UP: PING OK - Packet loss = 0%, RTA = 233.36 ms
[19:27:34] Operations, serviceops: Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020) - https://phabricator.wikimedia.org/T243149 (Marostegui) And according to the graph the latency increase indeed starts when db2085 went down ` Jan 19 07:19:49 icinga1001 icinga: SERVICE ALERT: db2085;p...
[19:29:00] Operations, Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (Marostegui) `18:17:56 <+icinga-wm> PROBLEM - Host cp3061 is DOWN: PING CRITICAL - Packet loss = 100%` Might be another case...
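(On the maps100x POSTGRES_HOT_STANDBY_DELAY alerts above: the check reports a byte lag and a time lag for the hot standby. A rough equivalent query run on the standby is sketched below; the function names are for PostgreSQL 10+, and the Icinga plugin's exact internals may differ:)

```
# Byte lag between received and replayed WAL, plus seconds since last replay.
sudo -u postgres psql -x -c "
  SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS byte_lag,
         EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) AS seconds_since_replay;"
```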
[19:32:58] (PS2) Legoktm: codesearch: Use iptables-legacy for docker compatibility [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319)
[19:34:05] (CR) jerkins-bot: [V: -1] codesearch: Use iptables-legacy for docker compatibility [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319) (owner: Legoktm)
[19:35:15] (PS3) Legoktm: codesearch: Use iptables-legacy for docker compatibility [puppet] - https://gerrit.wikimedia.org/r/565752 (https://phabricator.wikimedia.org/T242319)