[00:02:37] <icinga-wm>	 PROBLEM - HHVM rendering on mw2142 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:39] <icinga-wm>	 PROBLEM - HHVM rendering on mw2257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:39] <icinga-wm>	 PROBLEM - HHVM rendering on mw2276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:39] <icinga-wm>	 PROBLEM - PHP7 rendering on mw2205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:39] <icinga-wm>	 PROBLEM - PHP7 rendering on mw2262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:39] <icinga-wm>	 PROBLEM - PHP7 rendering on mw2284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:39] <icinga-wm>	 PROBLEM - HHVM rendering on mw2176 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:39] <icinga-wm>	 PROBLEM - HHVM rendering on mw2201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:40] <icinga-wm>	 PROBLEM - HHVM rendering on mw2238 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:40] <icinga-wm>	 PROBLEM - HHVM rendering on mw2269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:41] <icinga-wm>	 PROBLEM - HHVM rendering on mw2168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:41] <icinga-wm>	 PROBLEM - HHVM rendering on mw2215 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:42] <icinga-wm>	 PROBLEM - HHVM rendering on mw2206 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:03:35] <paladox>	 hmm
[00:03:36] <paladox>	 is that expected?
[00:03:47] <icinga-wm>	 RECOVERY - HHVM rendering on mw2142 is OK: HTTP OK: HTTP/1.1 200 OK - 79963 bytes in 0.282 second response time
[00:03:47] <icinga-wm>	 RECOVERY - HHVM rendering on mw2276 is OK: HTTP OK: HTTP/1.1 200 OK - 79963 bytes in 0.286 second response time
[00:03:47] <icinga-wm>	 RECOVERY - HHVM rendering on mw2257 is OK: HTTP OK: HTTP/1.1 200 OK - 79963 bytes in 0.289 second response time
[00:03:47] <icinga-wm>	 RECOVERY - PHP7 rendering on mw2262 is OK: HTTP OK: HTTP/1.1 200 OK - 80004 bytes in 0.294 second response time
[00:03:47] <icinga-wm>	 RECOVERY - PHP7 rendering on mw2205 is OK: HTTP OK: HTTP/1.1 200 OK - 80004 bytes in 0.299 second response time
[00:03:47] <icinga-wm>	 RECOVERY - PHP7 rendering on mw2284 is OK: HTTP OK: HTTP/1.1 200 OK - 80004 bytes in 0.316 second response time
[00:03:49] <icinga-wm>	 RECOVERY - HHVM rendering on mw2201 is OK: HTTP OK: HTTP/1.1 200 OK - 79963 bytes in 0.288 second response time
[00:03:49] <icinga-wm>	 RECOVERY - HHVM rendering on mw2176 is OK: HTTP OK: HTTP/1.1 200 OK - 79963 bytes in 0.291 second response time
[00:03:49] <icinga-wm>	 RECOVERY - HHVM rendering on mw2215 is OK: HTTP OK: HTTP/1.1 200 OK - 79963 bytes in 0.288 second response time
[00:03:49] <icinga-wm>	 RECOVERY - HHVM rendering on mw2238 is OK: HTTP OK: HTTP/1.1 200 OK - 79963 bytes in 0.291 second response time
[00:03:50] <icinga-wm>	 RECOVERY - HHVM rendering on mw2168 is OK: HTTP OK: HTTP/1.1 200 OK - 79963 bytes in 0.290 second response time
[00:03:50] <icinga-wm>	 RECOVERY - HHVM rendering on mw2206 is OK: HTTP OK: HTTP/1.1 200 OK - 79963 bytes in 0.290 second response time
[00:03:51] <icinga-wm>	 RECOVERY - HHVM rendering on mw2269 is OK: HTTP OK: HTTP/1.1 200 OK - 79963 bytes in 0.354 second response time
[00:03:58] <Platonides>	 this fixes it :P
[00:04:57] <paladox>	 lol
[00:09:47] <wikibugs>	 (PS3) Elukey: Fix ports for wmcs/labs' Prometheus Memcached exporters [puppet] - https://gerrit.wikimedia.org/r/487453
[00:11:10] <wikibugs>	 (CR) GTirloni: [C: +2] Fix ports for wmcs/labs' Prometheus Memcached exporters [puppet] - https://gerrit.wikimedia.org/r/487453 (owner: Elukey)
[00:11:17] <wikibugs>	 (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/14501/" [puppet] - https://gerrit.wikimedia.org/r/487453 (owner: Elukey)
[00:29:49] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:40:32] <wikibugs>	 (PS1) Elukey: Fix wmcs' prometheus memcached exporter args [puppet] - https://gerrit.wikimedia.org/r/487456
[00:40:55] <elukey>	 gtirloni: --^
[00:43:06] <gtirloni>	 ok
[00:48:37] <wikibugs>	 (CR) GTirloni: [C: +2] Fix wmcs' prometheus memcached exporter args [puppet] - https://gerrit.wikimedia.org/r/487456 (owner: Elukey)
[00:50:41] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[01:51:51] <wikibugs>	 (PS1) 20after4: Disallow local_infile for phabricator [puppet] - https://gerrit.wikimedia.org/r/487459 (https://phabricator.wikimedia.org/T214248)
[01:53:17] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:55:53] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[02:03:32] <wikibugs>	 (CR) Paladox: [C: +1] Disallow local_infile for phabricator [puppet] - https://gerrit.wikimedia.org/r/487459 (https://phabricator.wikimedia.org/T214248) (owner: 20after4)
[02:56:16] <wikibugs>	 (CR) Bstorm: "If nothing else, thank you for adding the .py extension in puppet!  That's a great idea.  I have mixed feelings about adding wmcs on *ever" [puppet] - https://gerrit.wikimedia.org/r/487368 (owner: Arturo Borrero Gonzalez)
[02:58:23] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[03:27:09] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[04:04:37] <wikibugs>	 Operations: client-cluster.js -  "no such file or directory, open '/srv/visualdiff/testreduce/testrun.ids" - https://phabricator.wikimedia.org/T215049 (GTirloni)
[04:06:01] <wikibugs>	 Operations: parsoid-vd -  "no such file or directory, open '/srv/visualdiff/testreduce/testrun.ids" - https://phabricator.wikimedia.org/T215049 (GTirloni)
[04:07:06] <gtirloni>	 ^^ T215049
[04:07:07] <stashbot>	 T215049: parsoid-vd -  "no such file or directory, open '/srv/visualdiff/testreduce/testrun.ids" - https://phabricator.wikimedia.org/T215049
[04:07:21] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:20:09] <icinga-wm>	 PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:59:59] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:09:13] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:18:31] <icinga-wm>	 PROBLEM - MegaRAID on db1073 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[05:18:34] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on db1073 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T215050
[05:18:39] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on db1073 - https://phabricator.wikimedia.org/T215050 (ops-monitoring-bot)
[05:29:56] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:38:08] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:56:04] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[06:07:22] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on db1073 - https://phabricator.wikimedia.org/T215050 (Marostegui) p:Triage→Normal a:Cmjohnson Let's get it replaced sooner than later as it is a master on m5
[06:09:13] <wikibugs>	 Operations, ops-eqiad, DBA: Degraded RAID on db1073 - https://phabricator.wikimedia.org/T215050 (Marostegui)
[06:11:57] <wikibugs>	 (Abandoned) MaxSem: WIP: [labs] Puppetize XTools [puppet] - https://gerrit.wikimedia.org/r/368101 (https://phabricator.wikimedia.org/T170514) (owner: MaxSem)
[06:15:04] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:28:12] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:29:30] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:29:36] <icinga-wm>	 PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/apparmor.d/abstractions/ssl_certs]
[06:32:20] <icinga-wm>	 PROBLEM - puppet last run on labmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/apache2/sites-available/00-dummy.conf]
[06:56:02] <icinga-wm>	 RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:58:44] <icinga-wm>	 RECOVERY - puppet last run on labmon1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:13:34] <bawolff_>	 !log reset 2FA on wikitech for [[User:Cicalese]]
[07:13:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:14] <wikibugs>	 Operations, Gerrit, Icinga, monitoring: Investigate why icinga did not report high cpu/load for gerrit - https://phabricator.wikimedia.org/T215033 (Dzahn)  The `check_load` plugin can be used for that. We do use it but only on other servers, API appservers, SWIFT and a passive check for Fundraisi...
[07:39:07] <wikibugs>	 Operations, Maps (Kartotherian): Create discovery entry for Kartotherian - https://phabricator.wikimedia.org/T214672 (Mathew.onipe) Also to further confirm that kartotherian has a discovery entry:  ` onimisionipe@elastic1017:~$ ping kartotherian.discovery.wmnet PING kartotherian.discovery.wmnet (10.2.1.1...
[09:59:08] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:26:30] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[10:59:24] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on db1068 is CRITICAL: 19 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db1068&var-datasource=eqiad+prometheus/ops
[11:08:31] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: graphite: refactor into role/profile [puppet] - https://gerrit.wikimedia.org/r/487481
[11:09:00] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: wmcs: monitoring: refactor code into roles/profiles [puppet] - https://gerrit.wikimedia.org/r/487482
[11:10:02] <wikibugs>	 (CR) jerkins-bot: [V: -1] wmcs: monitoring: refactor code into roles/profiles [puppet] - https://gerrit.wikimedia.org/r/487482 (owner: Arturo Borrero Gonzalez)
[11:11:48] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: wmcs: monitoring: refactor code into roles/profiles [puppet] - https://gerrit.wikimedia.org/r/487482
[11:12:13] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: graphite: refactor into role/profile [puppet] - https://gerrit.wikimedia.org/r/487481
[11:14:37] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "I made this commit while working on" [puppet] - https://gerrit.wikimedia.org/r/487481 (owner: Arturo Borrero Gonzalez)
[11:15:37] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "The change with ID I7f6781aa17ed8924c13e91c83b798bdc59bb9c3c is required by this patch." [puppet] - https://gerrit.wikimedia.org/r/487482 (owner: Arturo Borrero Gonzalez)
[11:17:52] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "Thank you very much folks for your review :-)" [puppet] - https://gerrit.wikimedia.org/r/487368 (owner: Arturo Borrero Gonzalez)
[12:34:31] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: cold-migrate: make it datacenter-aware [puppet] - https://gerrit.wikimedia.org/r/487487
[12:38:48] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: cold-migrate: make it datacenter-aware [puppet] - https://gerrit.wikimedia.org/r/487487 (owner: Arturo Borrero Gonzalez)
[13:28:43] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: cold-migrate: make nova database configurable [puppet] - https://gerrit.wikimedia.org/r/487491
[13:29:52] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:33:24] <wikibugs>	 Operations, LDAP-Access-Requests, WMF-Legal, WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T215066 (alaa_wmde)
[13:33:38] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: cold-migrate: make nova database configurable [puppet] - https://gerrit.wikimedia.org/r/487491 (owner: Arturo Borrero Gonzalez)
[13:36:40] <wikibugs>	 Operations, LDAP-Access-Requests, WMF-Legal, WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T215066 (alaa_wmde) a:alaa_wmde→None
[13:50:35] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: cold-migrate: allow to migrate VM instances in SHUTOFF state [puppet] - https://gerrit.wikimedia.org/r/487493
[13:54:50] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: cold-migrate: allow to migrate VM instances in SHUTOFF state [puppet] - https://gerrit.wikimedia.org/r/487493 (owner: Arturo Borrero Gonzalez)
[13:55:52] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[14:16:49] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: cold-migrate: use python logging [puppet] - https://gerrit.wikimedia.org/r/487496
[14:17:38] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] openstack: cold-migrate: use python logging [puppet] - https://gerrit.wikimedia.org/r/487496 (owner: Arturo Borrero Gonzalez)
[14:20:58] <icinga-wm>	 PROBLEM - puppet last run on wtp1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:52:42] <icinga-wm>	 RECOVERY - puppet last run on wtp1034 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[15:42:35] <wikibugs>	 Operations, DNS, Domains, Traffic, HTTPS: Request to merge wikipedia subdomains into one to discourage censorship - https://phabricator.wikimedia.org/T215071 (Vpab15)
[15:55:53] <wikibugs>	 Operations, DNS, Domains, Traffic, HTTPS: Request to merge wikipedia subdomains into one to discourage censorship - https://phabricator.wikimedia.org/T215071 (Krenair) Well you wouldn't be able to distinguish e.g. English Wikipedia from French Wikipedia traffic by looking at the DNS lookup or...
[15:59:38] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[16:01:42] <wikibugs>	 Operations, DNS, Domains, Traffic, HTTPS: Request to merge wikipedia subdomains into one to discourage censorship - https://phabricator.wikimedia.org/T215071 (Vpab15) Open→Resolved a:Vpab15 Thanks Krenair  I will mark this as resolved then
[16:06:32] <wikibugs>	 Operations, DNS, Domains, Traffic, HTTPS: Request to merge wikipedia subdomains into one to discourage censorship - https://phabricator.wikimedia.org/T215071 (Krenair) I'm not sure it's strictly resolved, I wouldn't say it's invalid and I don't think it would get outright declined either. I f...
[16:08:12] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[16:25:50] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[16:29:42] <wikibugs>	 Operations, LDAP-Access-Requests, WMF-Legal, WMF-NDA-Requests: Request to add alaasarhan to the wmde LDAP group - https://phabricator.wikimedia.org/T215066 (Legoktm)
[16:35:08] <wikibugs>	 Operations, LDAP-Access-Requests, WMF-Legal, WMF-NDA-Requests: Request to add alaasarhan to the wmde LDAP group - https://phabricator.wikimedia.org/T215066 (WMDE-leszek) As an Engineering Manager at WMDE, I endorse this request.
[16:37:30] <wikibugs>	 Operations, DNS, Domains, Traffic, HTTPS: Request to merge wikipedia subdomains into one to discourage censorship - https://phabricator.wikimedia.org/T215071 (Vpab15) Resolved→Open I misunderstood then. I took a look at the ESNI task you mentioned, but couldn't really understand if im...
[16:53:05] <wikibugs>	 Operations, DNS, Domains, Traffic, HTTPS: Request to merge wikipedia subdomains into one to discourage censorship - https://phabricator.wikimedia.org/T215071 (Vpab15) a:Vpab15→None
[17:29:32] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:56:46] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[18:29:44] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[18:59:18] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[19:06:07] <wikibugs>	 Operations, ops-eqiad, Cloud-Services, cloud-services-team: cloudcontrol1004 mgmt HTTPS SSL error - https://phabricator.wikimedia.org/T215075 (Cmjohnson)
[19:06:34] <wikibugs>	 (PS2) 20after4: Disallow local_infile for phabricator [puppet] - https://gerrit.wikimedia.org/r/487459 (https://phabricator.wikimedia.org/T214248)
[19:08:06] <wikibugs>	 (CR) 20after4: [C: +1] Disallow local_infile for phabricator [puppet] - https://gerrit.wikimedia.org/r/487459 (https://phabricator.wikimedia.org/T214248) (owner: 20after4)
[19:26:42] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[19:38:40] <wikibugs>	 Operations, ops-eqiad: cloudstore100{8,9} - Upgrade to 10GbE - https://phabricator.wikimedia.org/T214079 (Cmjohnson) @GTirloni I  do not have room in row A. These can go into Row D racks D2 and D7.  Doing this will require a DNS (ip) change and I will have to fix the servers to use the 10G NIC.   A re-in...
[19:47:05] <wikibugs>	 Operations, ops-eqiad: cloudstore100{8,9} - Upgrade to 10GbE - https://phabricator.wikimedia.org/T214079 (GTirloni) @Cmjohnson that works for me. We can do both if time allows.
[19:51:34] <wikibugs>	 Operations, ops-eqiad, Cloud-Services, cloud-services-team: cloudcontrol1004 mgmt HTTPS SSL error - https://phabricator.wikimedia.org/T215075 (GTirloni) cloudcontrol1004 is currently our standby OpenStack control server so it can be shutdown if needed. The proposed time doesn't conflict with any...
[20:25:41] <wikibugs>	 (CR) Andrew Bogott: [C: +1] "Looks right to me, for v4." [dns] - https://gerrit.wikimedia.org/r/486504 (https://phabricator.wikimedia.org/T214448) (owner: Papaul)
[20:29:52] <icinga-wm>	 PROBLEM - Host mw1299 is DOWN: PING CRITICAL - Packet loss = 100%
[20:51:12] <wikibugs>	 (CR) Gehel: [C: +1] icinga: enable check for psi and omega clusters [puppet] - https://gerrit.wikimedia.org/r/484679 (https://phabricator.wikimedia.org/T212850) (owner: Mathew.onipe)
[20:53:02] <icinga-wm>	 PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:29:20] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:56:40] <icinga-wm>	 RECOVERY - Check systemd state on scandium is OK: OK - running: The system is fully operational
[22:05:41] <wikibugs>	 Operations, DNS, Domains, Traffic, HTTPS: Merge Wikipedia subdomains into one, to discourage censorship - https://phabricator.wikimedia.org/T215071 (Aklapper)
[22:34:27] <marostegui>	 I will try to reboot mw1299 from the mgmt iface
[22:37:50] <icinga-wm>	 RECOVERY - Host mw1299 is UP: PING WARNING - Packet loss = 44%, RTA = 0.25 ms
[23:16:59] <vgutierrez>	 !log restart pdfrender on scb1004
[23:17:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:17:52] <icinga-wm>	 RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.003 second response time
[23:18:47] <wikibugs>	 Operations, DBA, Packaging: db2085 doesn't boot with 4.9.0-8-amd64 - https://phabricator.wikimedia.org/T214840 (Marostegui) p:Triage→Normal
[23:59:35] <icinga-wm>	 PROBLEM - Check systemd state on scandium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.